CS 208 s22 — Buffer Overflow Attacks
1 Background
1.1 Stack Frame Review
- In x86-64 Linux
  - stack segment of memory starts at 0x00007fffffffffff and grows down
  - code segment of memory starts around 0x400000 and grows up
1.2 What is a Buffer?
- array used to temporarily store data
- video buffering is the video being written to a buffer before being played
- buffers are often used to store user input
1.3 What is a Buffer Overflow?
- arrays can be stored on the stack alongside procedure data like the return address
- C does not prevent writing to elements beyond the end of an array
- together, these two facts allow for a buffer overflow where program state on the stack is corrupted
- for example, overwriting the return address pushed on the stack by the caller would cause the program to jump to an unexpected or invalid place
Input that does not overflow the buffer:
Input that does overflow the buffer, overwriting part of the return address:
- attacker just has to choose the right inputs to overwrite interesting data
- simple attack: overwrite the current return address (sometimes called stack smashing)
- for a long time this was the #1 technical cause of security vulnerabilities
- I say technical cause because the #1 overall cause is pretty much always humans (social engineering, ignorance, etc.)
1.3.1 Example
/* Get string from stdin */
char* gets(char* dest) {
    int c = getchar();
    char* p = dest;
    while (c != EOF && c != '\n') {
        *p++ = c;
        c = getchar();
    }
    *p = '\0';
    return dest;
}
- what could go wrong here?
  - we could read in a lot more data than dest has room for, overwriting things
  - gets has no information about the size of dest (just passed as a pointer to the start of the array)
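One way to close this hole is to pass the buffer's capacity along with the pointer. The sketch below is a bounded variant of the loop above. The name read_line_bounded, the FILE* parameter, and the choice to discard characters that do not fit are assumptions made for illustration; this is not a C library function.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical bounded variant of gets: reads one line from `in`,
 * storing at most size - 1 characters plus a terminating '\0' in dest.
 * Characters that do not fit are read and discarded.
 * Returns dest, or NULL if size is 0 or EOF arrives before any character. */
char *read_line_bounded(FILE *in, char *dest, size_t size) {
    if (size == 0) {
        return NULL;
    }
    size_t n = 0;
    int c = fgetc(in);
    if (c == EOF) {
        dest[0] = '\0';
        return NULL;
    }
    while (c != EOF && c != '\n') {
        if (n + 1 < size) {          /* leave room for the terminator */
            dest[n++] = (char)c;
        }
        c = fgetc(in);
    }
    dest[n] = '\0';
    return dest;
}
```

Called as read_line_bounded(stdin, buf, sizeof buf) with an 8-byte buf, a 16-character line is truncated to its first 7 characters instead of running past the end of the array.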
1.3.2 gets Known to be Harmful
- bugs section of gets man page:
  Never use gets(). Because it is impossible to tell without knowing the data in advance how many characters gets() will read, and because gets() will continue to store characters past the end of the buffer, it is extremely dangerous to use. It has been used to break computer security. Use fgets() instead.
- also a problem with strcpy, and with scanf, fscanf, and sscanf when %s is used without a maximum width
- gcc even gives you a warning: the `gets' function is dangerous and should not be used.
2 Buffer Overflow In Action
Consider this (very insecure) code:
/* Echo Line */
void echo() {
    char buf[8]; /* Way too small! */
    gets(buf);
    puts(buf);
}

void call_echo() {
    echo();
}
- full source code here
Running gdb -batch buf-nsp -ex 'disas echo' gives
0x0000000000400537 <+0>:  push %rbx
0x0000000000400538 <+1>:  sub $0x10,%rsp
0x000000000040053c <+5>:  lea 0x8(%rsp),%rbx
0x0000000000400541 <+10>: mov %rbx,%rdi
0x0000000000400544 <+13>: mov $0x0,%eax
0x0000000000400549 <+18>: callq 0x400440 <gets@plt>
0x000000000040054e <+23>: mov %rbx,%rdi
0x0000000000400551 <+26>: callq 0x400430 <puts@plt>
0x0000000000400556 <+31>: add $0x10,%rsp
0x000000000040055a <+35>: pop %rbx
0x000000000040055b <+36>: retq
- So when we call
- entering 123456781234567 works fine
  - overwriting saved %rbx on the stack
- entering 1234567812345678 causes a segmentation fault
  - overwriting least significant byte of return address with '\0' (0x00), making it 0x400500
  - causes program to return into the utility function <__do_global_dtors_aux>, since that happens to start at address 0x400500
  - when that function returns, it uses whatever happens to be on top of the stack as a return address, causing a segmentation fault
- entering 12345678123456781 causes a segmentation fault
  - overwriting two low-order bytes of return address with '1' and '\0', making it 0x400031
  - this isn't a valid memory address for our program, triggering a segmentation fault when we try to access it
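The three cases above can be reproduced without invoking undefined behavior by modeling echo's frame as a byte array. This is only a sketch: the layout follows the disassembly (buf at 0x8(%rsp), then saved %rbx, then the return address), but the function names and the original return address used below (0x40056b) are assumptions for illustration; the notes only tell us it has the form 0x4005xx.

```c
#include <stdint.h>
#include <string.h>

/* Simulated frame for echo, lowest address first:
 *   bytes  0..7   unused (below buf)
 *   bytes  8..15  buf
 *   bytes 16..23  saved %rbx
 *   bytes 24..31  return address (little-endian) */
enum { FRAME_SIZE = 32, BUF_OFFSET = 8, RET_OFFSET = 24 };

/* Zero the frame and store ret_addr in little-endian byte order. */
void frame_init(unsigned char frame[FRAME_SIZE], uint64_t ret_addr) {
    memset(frame, 0, FRAME_SIZE);
    for (int i = 0; i < 8; i++) {
        frame[RET_OFFSET + i] = (unsigned char)(ret_addr >> (8 * i));
    }
}

/* Mimic gets: copy input plus its '\0' starting at buf, ignoring buf's
 * 8-byte size. The copy stops at the end of the simulated frame so the
 * simulation itself never leaves its array. */
void simulated_gets(unsigned char frame[FRAME_SIZE], const char *input) {
    size_t n = strlen(input) + 1;                 /* include the terminator */
    size_t room = (size_t)(FRAME_SIZE - BUF_OFFSET);
    memcpy(frame + BUF_OFFSET, input, n < room ? n : room);
}

/* Reassemble the 8-byte little-endian return address from the frame. */
uint64_t frame_return_address(const unsigned char frame[FRAME_SIZE]) {
    uint64_t addr = 0;
    for (int i = 7; i >= 0; i--) {
        addr = (addr << 8) | frame[RET_OFFSET + i];
    }
    return addr;
}
```

Starting from an assumed return address of 0x40056b, the 15-character input leaves it unchanged, the 16-character input turns it into 0x400500, and the 17-character input turns it into 0x400031, matching the addresses in the list above.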
3 Code Injection Attack
- very common attack to get a program to execute an arbitrary function
- over a network, program given a string containing executable code (exploit code) with extra data to overwrite a return address with the location of the exploit
- exploit might use a system call to start a shell giving the attacker access to the system
- exploit might do some mischief, then repair the stack and execute ret again, giving the appearance of normal behavior
3.1 Real World examples
- buffer overflow exploits are alarmingly common in real programs
- programmers keep making the same mistakes
- recent innovations have improved the situation
3.1.1 Internet Worm (1988)
- protocol for getting the status of a server (fingerd) used gets to read its argument
- worm sent exploit code that executed a root shell on the target machine
- scanned other machines to attack, invaded about 6000 computers in a matter of hours (10% of the Internet at that time)
- see June 1989 article in Comm. of the ACM
- author (Robert Morris) was first person ever convicted under the Computer Fraud and Abuse Act, now faculty at MIT (so I guess things turned out all right for him)
3.1.2 Heartbleed (2014)
- affected Tumblr, Google, Yahoo, Intuit (makers of TurboTax), Dropbox, Netflix, Facebook, and many, many smaller sites
3.1.3 Hacking Cars
- in 2010, UW researchers demonstrated wirelessly hacking a car using buffer overflow
- overwrote the onboard control system's code
- disable brakes
- unlock doors
- turn engine on/off
3.1.4 Hacking DNA Sequencing Machines
- in 2017, security researchers demonstrated that a buffer overflow exploit could be encoded in DNA
- when read by vulnerable sequencing software, the attack could compromise the sequencing machine
4 Countermeasures
4.1 System Level
4.1.1 Non-Executable Stack
- x86-64 added an execute permission for regions of memory, so the stack can be marked non-executable (not all systems have hardware support, and it doesn't block all exploits)
4.1.2 Stack Randomization
- in the past, stack addresses were highly predictable, meaning if an attacker could determine addresses for a common web server, then many machines were vulnerable
- make it unpredictable by allocating between 0 and \(n\) bytes on the stack at the start of the program
- part of a larger class of techniques called address-space layout randomization (ASLR)
- in general, this randomization can greatly increase the effort required for a successful attack, but cannot guarantee safety
4.2 Writing Better Code
- Use the safe versions of C library functions: fgets instead of gets, strncpy instead of strcpy
- Avoid using the bare %s format specifier; provide a max width like %20s
- Use a safer programming language! C has unique vulnerabilities.
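A minimal sketch of the first two recommendations, assuming an 8-byte destination buffer. The helper names copy_bounded and first_word are hypothetical. Note that strncpy on its own does not write a terminating '\0' when the source is too long, so the sketch adds one explicitly.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst (capacity size), truncating if needed.
 * strncpy does not write a '\0' when src has size - 1 or more
 * characters, so the terminator is stored explicitly. */
void copy_bounded(char *dst, size_t size, const char *src) {
    if (size == 0) {
        return;
    }
    strncpy(dst, src, size - 1);
    dst[size - 1] = '\0';
}

/* Read the first whitespace-delimited word of line into an 8-byte buffer.
 * The width 7 leaves one byte for the '\0' that sscanf appends.
 * Returns 1 if a word was read, 0 otherwise. */
int first_word(const char *line, char word[8]) {
    return sscanf(line, "%7s", word) == 1;
}
```

The same idea applies to reading input: fgets(buf, sizeof buf, stdin) stores at most sizeof buf - 1 characters followed by a '\0', so it cannot write past the end of buf.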
4.2.1 Stack Corruption Detection
- detect when stack corruption occurs before it can have harmful effects
- gcc now uses stack protectors to detect buffer overflows
  - canary value (or guard value) between buffer and rest of the stack
  - generated each time the program runs, so hard for attacker to know what it will be
  - stored in a read-only segment of memory (so attacker can't modify it)
  - prevents many common attack strategies
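The canary idea can be sketched with a simulated frame. This is an illustration of the check, not gcc's actual implementation: the layout, the function names, and the canary value passed in by the caller (standing in for the read-only reference copy) are all assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Simulated frame, lowest address first:
 *   bytes  0..7   buf
 *   bytes  8..15  canary
 *   bytes 16..23  return address */
enum { FRAME_SIZE = 24, CANARY_OFFSET = 8 };

/* Prologue: place a copy of the canary just above the buffer. */
void frame_enter(unsigned char frame[FRAME_SIZE], uint64_t canary) {
    memset(frame, 0, FRAME_SIZE);
    memcpy(frame + CANARY_OFFSET, &canary, sizeof canary);
}

/* Unchecked copy into buf, standing in for gets. It is bounded only by
 * the simulated frame so the simulation stays inside its array. */
void unchecked_copy(unsigned char frame[FRAME_SIZE], const char *input) {
    size_t n = strlen(input) + 1;                 /* include the terminator */
    memcpy(frame, input, n < (size_t)FRAME_SIZE ? n : (size_t)FRAME_SIZE);
}

/* Epilogue check: returns 1 if the canary in the frame still matches the
 * reference value, 0 if something has overwritten it. */
int canary_intact(const unsigned char frame[FRAME_SIZE], uint64_t canary) {
    return memcmp(frame + CANARY_OFFSET, &canary, sizeof canary) == 0;
}
```

Because the canary sits between buf and the return address, any contiguous overflow that reaches the return address must pass through the canary first, so the epilogue check fails before the corrupted address is used.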
5 Exercise
vulnerable:
    subq $0x40, %rsp
    ...
    leaq 0x10(%rsp), %rdi
    call gets
    ...
What is the minimum number of characters that gets must read in order for us to change the return address to a stack address?
For example, change 0x00 00 00 00 00 40 05 D1 to 0x00 00 7F FF CA FE F0 0D [1]
6 Practice
CSPP practice problem 3.46 (p. 282)
- Return address of 0x400776
- %rbx has 0x0123456789ABCDEF
- input is "0123456789012345678901234"
char *get_line() {
    char buf[4];
    char *result;
    gets(buf);
    result = malloc(strlen(buf));
    strcpy(result, buf);
    return result;
}
get_line:
  400720: push %rbx
  400721: sub $0x10, %rsp
  // diagram the stack at this point
  400725: mov %rsp, %rdi
  400728: callq gets
  // modify your diagram to show stack contents at this point
Footnotes:
[1] gets would need to read in 54 characters (bytes) to overwrite the return address with a stack address. subq $0x40, %rsp tells us that the stack frame for the function is 64 bytes. leaq 0x10(%rsp), %rdi tells us that the start of the buffer passed to gets is 16 bytes above the top of the stack, leaving 48 bytes between the start of the buffer and the return address. The stack address we want to replace the return address with has 6 non-zero bytes, and the upper 2 bytes of the existing return address are already zero, so we need to pass 48 bytes of filler followed by the 6 address bytes (in little-endian order) to gets in order to execute this part of our attack, for a total of 54 bytes. The '\0' that gets appends lands on a byte that is already zero.
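The arithmetic in this footnote can be checked with a short sketch that lays out the input bytes in order. The function name build_input and the filler character 'A' are assumptions chosen for illustration; only the 48-byte offset and the 6 address bytes come from the exercise.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write `filler` padding bytes followed by the low `addr_bytes` bytes of
 * addr in little-endian order. Returns the number of bytes written, or 0
 * if out (capacity cap) is too small or addr_bytes exceeds 8. */
size_t build_input(unsigned char *out, size_t cap, size_t filler,
                   uint64_t addr, size_t addr_bytes) {
    if (addr_bytes > 8 || filler > cap || addr_bytes > cap - filler) {
        return 0;
    }
    memset(out, 'A', filler);                     /* arbitrary padding byte */
    for (size_t i = 0; i < addr_bytes; i++) {
        out[filler + i] = (unsigned char)(addr >> (8 * i));
    }
    return filler + addr_bytes;
}
```

For the exercise, build_input(buf, sizeof buf, 0x40 - 0x10, 0x00007FFFCAFEF00D, 6) with a 64-byte buf writes 48 filler bytes followed by 0D F0 FE CA FF 7F and returns 54.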