CS 208 f21 — Buffer Overflow Attacks

1. Background
2. Buffer Overflow In Action
3. Code Injection Attack
- 3.1. Real World examples
4. Countermeasures
- 4.1. System Level
  - 4.1.1. Non-Executable Stack
  - 4.1.2. Stack Randomization
- 4.2. Writing Better Code
  - 4.2.1. Stack Corruption Detection
5. Practice
6. Starting Lab 3

1 Background

1.1 Stack Frame Review

In x86-64 Linux
- stack segment of memory starts at 0x00007fffffffffff and grows down
- code (text) segment starts at 0x400000; the heap sits above the code and data and grows up

1.2 What is a Buffer?

a buffer is an array used to temporarily store data
video buffering, for example, means writing incoming video to a buffer before playing it
buffers are often used to store user input

1.3 What is a Buffer Overflow?

arrays can be stored on the stack alongside procedure data like the return address
C does not prevent writing to elements beyond the end of an array
together, these two facts allow for a buffer overflow where program state on the stack is corrupted
- for example, overwriting the return address pushed on the stack by the caller would cause the program to jump to an unexpected or invalid place

[Figure: stack contents for input that does not overflow the buffer]

[Figure: stack contents for input that overflows the buffer, overwriting part of the return address]

attacker just has to choose the right inputs to overwrite interesting data
simple attack: overwrite the current return address (sometimes called stack smashing)
for a long time this was the #1 technical cause of security vulnerabilities
- I say technical cause because the #1 overall cause is pretty much always humans (social engineering, ignorance, etc.)

1.3.1 Example

#include <stdio.h>

/* Get string from stdin */
char* gets(char* dest) {
    int c = getchar();
    char* p = dest;
    while (c != EOF && c != '\n') {
        *p++ = c;   /* no check against the size of dest! */
        c = getchar();
    }
    *p = '\0';
    return dest;
}

what could go wrong here?
- we could read in a lot more data than dest has room for, overwriting things
- gets has no information about the size of dest (just passed as a pointer to the start of the array)

1.3.2 `gets` Known to be Harmful

From the BUGS section of the gets man page:

Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.

strcpy has the same problem, as do scanf, fscanf, and sscanf when used with an unbounded %s: none of them know how big the destination buffer is
gcc even gives you a warning: the `gets' function is dangerous and should not be used.

2 Buffer Overflow In Action

Consider this (very insecure) code:

/* Echo Line */
void echo() {
    char buf[8];  /* Way too small! */
    gets(buf);
    puts(buf);
}

void call_echo() {
    echo();
}

entering 01234567890123456789012 (23 characters) works fine
- the 23 characters plus the terminating '\0' fill exactly the 24 bytes (0x18) that echo allocates, overwriting only unused space on the stack
entering 012345678901234567890123 (24 characters) causes an illegal instruction error
- the terminating '\0' (0x00) overwrites the least significant byte of the return address 0x400592, making it 0x400500
- the program returns into the middle of another instruction, and the CPU triggers an exception when it tries to execute those bytes as code
entering 0123456789012345678901234 (25 characters) causes a segmentation fault
- the two low-order bytes of the return address are overwritten with '4' (0x34) and '\0', making it 0x400034
- this isn't a valid memory address for our program, triggering a segmentation fault when we try to access it

From objdump -d buf-nsp:

0000000000400566 <echo>:
  400566:       48 83 ec 18             sub    $0x18,%rsp
  40056a:       48 89 e7                mov    %rsp,%rdi
  40056d:       b8 00 00 00 00          mov    $0x0,%eax
  400572:       e8 d9 fe ff ff          callq  400450 <gets@plt>
  400577:       48 89 e7                mov    %rsp,%rdi
  40057a:       e8 b1 fe ff ff          callq  400430 <puts@plt>
  40057f:       48 83 c4 18             add    $0x18,%rsp
  400583:       c3                      retq

0000000000400584 <call_echo>:
  400584:       48 83 ec 08             sub    $0x8,%rsp
  400588:       b8 00 00 00 00          mov    $0x0,%eax
  40058d:       e8 d4 ff ff ff          callq  400566 <echo>
  400592:       48 83 c4 08             add    $0x8,%rsp
  400596:       c3                      retq


3 Code Injection Attack

very common attack to get a program to execute arbitrary code
- over a network, program given a string containing executable code (exploit code) with extra data to overwrite a return address with the location of the exploit
- exploit might use a system call to start a shell giving the attacker access to the system
- exploit might do some mischief, then repair the stack and call ret again, giving the appearance of normal behavior

3.1 Real World examples

buffer overflow exploits are alarmingly common in real programs
- programmers keep making the same mistakes
- recent innovations have improved the situation

3.1.1 Internet Worm (1988)

the finger daemon (fingerd), which answers queries about the users on a machine, used gets to read its argument
worm sent exploit code that executed a root shell on the target machine
scanned other machines to attack, invaded about 6000 computers in a matter of hours (10% of the Internet at that time)
- see June 1989 article in Comm. of the ACM
author (Robert Morris) was first person ever convicted under the Computer Fraud and Abuse Act, now faculty at MIT (so I guess things turned out all right for him)

3.1.2 Heartbleed (2014)

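a bug in OpenSSL's implementation of the TLS heartbeat extension: the server trusted the length field in a heartbeat request and copied that many bytes from its memory into the reply, leaking up to 64 KB of data beyond the request (a buffer over-read rather than an overflow)
affected Tumblr, Google, Yahoo, Intuit (makers of TurboTax), Dropbox, Netflix, Facebook, and many, many smaller sites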

3.1.3 Hacking Cars

in 2010, UW researchers demonstrated wirelessly hacking a car using buffer overflow
overwrote the onboard control system’s code
- disable brakes
- unlock doors
- turn engine on/off

3.1.4 Hacking DNA Sequencing Machines

in 2017, security researchers demonstrated that a buffer overflow exploit could be encoded in DNA
when read by vulnerable sequencing software, the attack could compromise the sequencing machine

4 Countermeasures

4.1 System Level

4.1.1 Non-Executable Stack

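x86-64 added an explicit execute permission bit (the NX, or "no-execute", bit) for memory pages, so the stack can be marked readable and writable but not executable, and code injected there cannot run
- not all systems have hardware support, and it doesn't block all exploits (e.g., attacks that reuse code already in the program, such as return-oriented programming)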

4.1.2 Stack Randomization

in the past, stack addresses were highly predictable, meaning if an attacker could determine addresses for a common web server, then many machines were vulnerable
make it unpredictable by allocating between 0 and \(n\) bytes on the stack at the start of the program
part of a larger class of techniques called address-space layout randomization (ASLR)
in general, this randomization can greatly increase the effort required for a successful attack, but cannot guarantee safety

4.2 Writing Better Code

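Use the safe versions of C library functions: fgets instead of gets, strncpy instead of strcpy (strncpy does not add a terminating '\0' when it truncates, so terminate the destination yourself)
With scanf and its relatives, avoid a bare %s format specifier; provide a max width like %20s
Use a safer programming language! C does not check array bounds; memory-safe languages such as Java, Python, and Rust do, which rules out this class of bug.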

4.2.1 Stack Corruption Detection

detect when stack corruption occurs before it can have harmful effects
gcc now uses stack protectors to detect buffer overflows
- canary value (or guard value) between buffer and rest of the stack
- generated each time the program runs, so hard for attacker to know what it will be
- stored in a read-only segment of memory (so attacker can't modify it)
prevents many common attack strategies
Exercise

vulnerable:
    subq  $0x40, %rsp
     ...
    leaq  0x10(%rsp), %rdi
    call  gets
     ...

What is the minimum number of characters that gets must read in order for us to change the return address to a stack address?

For example, change 0x00 00 00 00 00 40 05 D1 to 0x00 00 7F FF CA FE F0 0D ¹

5 Practice

CS:APP practice problem 3.46 (p. 282)

Return address of 0x400776
%rbx has 0x0123456789ABCDEF
input is "0123456789012345678901234"

char *get_line() {
    char buf[4];
    char *result;
    gets(buf);
    result = malloc(strlen(buf));
    strcpy(result, buf);
    return result;
}

get_line:
  400720:    push %rbx
  400721:    sub $0x10, %rsp
// diagram the stack at this point
  400725:    mov %rsp, %rdi
  400728:    callq gets
// modify your diagram to show stack contents at this point

6 Starting Lab 3

Like lab 2, your goal is to analyze compiled programs in order to devise the right input strings. In this case, the compiled programs have buffer overflow vulnerabilities. The string you input can overflow a char array on the stack and overwrite the return address, as well as writing other data to the stack as necessary for the attack.

6.1 Getting a target

Go to http://awb66333.mathcs.carleton.edu:15513/, enter your Carleton username and email, download the tar file. This lab must be done on Linux, and working on mantis is probably the easiest option. Use VS Code or go to https://mantis.mathcs.carleton.edu:9595/ and upload the tar file. Extract it using the terminal.

6.2 Stages

The target tar file includes two vulnerable programs ctarget and rtarget. You can analyze the assembly code for these using objdump and gdb, though the only relevant function is getbuf which contains the buffer overflow vulnerability you will exploit. The real work of the lab is thinking carefully about how the stack and ret instructions work, and designing the right input strings.

The lab consists of five phases, three for ctarget and two for rtarget. The first (ctarget level 1) is a simple stack-smashing attack (i.e., just overwrite the return address to execute the desired function). The second and third (ctarget levels 2 and 3) require code injection (i.e., putting your own code on the stack via buffer overflow and causing it to execute). The final two (rtarget levels 2 and 3) involve a technique called return-oriented programming (see the writeup for a description). Progress will be automatically recorded, as in lab 2 (this requires a connection to the campus network). See http://awb66333.mathcs.carleton.edu:15513/progress for live progress; run ctarget and rtarget with -q to work offline.

6.3 Tools

6.3.1 `hex2raw`

You will want to write specific hex values to the stack. Your input, however, is an ASCII string. If you want to write the bytes 40, 51, and 2b to the stack, your input will need to include '@', 'Q', and '+'. The tool hex2raw has been provided to do this conversion for you. You put the hex values you want to be written to the buffer in a text file (e.g., type 40 51 2b into ctarget.phase1), and then run

cat ctarget.phase1 | ./hex2raw | ./ctarget

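to convert the hex values into the corresponding ASCII and use that as input to ctarget. Since x86-64 is little-endian, multi-byte values such as addresses must be listed least significant byte first. See Appendix A of the lab writeup for more information.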

6.3.2 Generating machine code

For ctarget levels 2 and 3 you will need to write the hex values for specific assembly instructions to the stack. See Appendix B for how to generate these hex values from a file with hand-written assembly.

Footnotes:

gets would need to read in 54 characters (bytes) to overwrite the return address with a stack address. subq $0x40, %rsp tells us that the stack frame for the function is 64 bytes. leaq 0x10(%rsp), %rdi tells us that the start of the buffer passed to gets is 16 bytes above the top of the stack, making it 48 bytes below the return address, which sits just above the 64-byte frame. The stack address we want to replace the return address with has 6 non-zero bytes, so we need to pass 48 bytes of filler followed by those 6 address bytes (least significant byte first) to gets, for a total of 54 bytes. The '\0' that gets writes after the input lands on the seventh byte of the return address, which is already 0x00.