May 3, 2022
gdb Cheat Sheet

This assignment involves generating a total of five attacks on two programs having different security vulnerabilities. Similar to lab 2, this will consist of analyzing pre-compiled executables and devising appropriate inputs. Outcomes you will gain from this lab include: … gdb and objdump.

In this lab, you will gain firsthand experience with methods used to exploit security weaknesses in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature of these security weaknesses so that you can avoid them when you write system code. We do not condone the use of any other form of attack to gain unauthorized access to any system resources.
You will generate attacks (in the form of input strings) for target programs that are custom-generated for you. They are generated on a Linux machine, so you will need to complete this lab on a Linux machine.
Your progress is automatically tracked as in lab 2, so there will not be anything you need to submit. Also as in lab 2, this automatic submission requires that you be connected to the internet. If you are working offline, you can run the target programs with the -q flag to prevent them from trying to contact the grading server, as in
./ctarget -q
You can obtain your files by pointing your Web browser at: http://awb66333.mathcs.carleton.edu:15513/
The server will build your files and return them to your browser in a tar file called targetK.tar, where K is the unique number of your target programs.
Note: It takes a few seconds to build and download your target, so please be patient.
Save targetK.tar in a directory in which you plan to do your work (or upload it to mantis). Then run this terminal command from that directory:
tar -xvf targetK.tar
This will extract a directory targetK containing the files described below.
You should only download one set of files. If for some reason you download multiple targets, choose one target to work on and delete the rest.
The files in targetK include:
- README.txt: A file describing the contents of the directory
- ctarget: An executable program vulnerable to code-injection attacks
- ctarget.phaseN: Empty files for your solutions to the ctarget phases
- rtarget: An executable program vulnerable to return-oriented-programming attacks
- rtarget.phaseN: Empty files for your solutions to the rtarget phases
- cookie.txt: An 8-digit hex code that you will use as a unique identifier in your attacks
- farm.c: The source code of your target’s “gadget farm,” which you will use in generating return-oriented programming attacks
- hex2raw: A utility to generate attack strings

Both ctarget and rtarget read strings from standard input (the terminal). They do so with the function getbuf defined below:
unsigned getbuf()
{
    char buf[BUFFER_SIZE];
    Gets(buf);
    return 1;
}
The function Gets is similar to the standard library function gets: it reads a string from standard input (terminated by '\n' or end-of-file) and stores it (along with a null terminator) at the specified destination. In this code, you can see that the destination is an array buf, declared as having BUFFER_SIZE bytes. At the time your targets were generated, BUFFER_SIZE was a compile-time constant specific to your version of the programs.

Functions Gets() and gets() have no way to determine whether their destination buffers are large enough to store the string they read. They simply copy sequences of bytes, possibly overrunning the bounds of the storage allocated at the destinations.
If the string typed by the user and read by getbuf is sufficiently short, it is clear that getbuf will return 1. If you input a long string, an error typically occurs, and program rtarget has the same behavior. Overrunning the buffer typically corrupts the program state, leading to a memory access error (i.e., a segmentation fault).
Your task is to be more clever with the strings you feed ctarget and rtarget so that they do more interesting things. These are called exploit strings.
Both ctarget and rtarget take several different command line arguments:
- -h: Print list of possible command line arguments
- -q: Don’t send results to the grading server (for working offline)
- -i FILE: Supply input from a file, rather than from the terminal

Your exploit strings will typically contain byte values that do not correspond to the ASCII values for printing characters. The program hex2raw will enable you to generate these raw strings. See Using hex2raw for more information on how to use hex2raw.
Your exploit string must not contain byte value 0x0a at any intermediate position, since this is the ASCII code for newline ('\n'). When Gets encounters this byte, it will assume you intended to terminate the string.
hex2raw expects two-digit hex values separated by one or more whitespace characters. So if you want to create a byte with a hex value of 0, you need to write it as 00. To create the 4-byte value 0xdeadbeef, you should pass ef be ad de to hex2raw (note the reversal required for little-endian byte ordering).
For the first three phases, your exploit strings will attack ctarget. This program is set up in a way that the stack positions will be consistent from one run to the next and so that data on the stack can be treated as executable code. These features make the program vulnerable to attacks where the exploit strings contain the byte encodings of executable code.
For phase 1, you will not inject new code. Instead, your exploit string will redirect the program to execute an existing procedure.
Function getbuf is called within ctarget by a function test having the following C code:
void test()
{
    int val;
    val = getbuf();
    printf("No exploit. Getbuf returned 0x%x\n", val);
}
When getbuf executes its return statement (line 5 of getbuf), the program ordinarily resumes execution within function test (at line 5 of this function). We want to change this behavior. Within the file ctarget, there is code for a function touch1 having the following C representation:
void touch1()
{
    vlevel = 1; /* Part of validation protocol */
    printf("Touch1!: You called touch1()\n");
    validate(1);
    exit(0);
}
Your task is to get ctarget to execute the code for touch1 when getbuf executes its return statement, rather than returning to test. Note that your exploit string may also corrupt parts of the stack not directly related to this stage, but this will not cause a problem, since touch1 causes the program to exit directly.
All the information you need to devise your exploit string for this level can be determined by examining a disassembled version of ctarget. Use objdump -d ctarget to get this disassembled version.
The idea is to position a byte representation of the starting address for touch1 so that the ret instruction at the end of the code for getbuf will transfer control to touch1.
Be careful about byte ordering.
You might want to use gdb to step the program through the last few instructions of getbuf to make sure it is doing the right thing.
The placement of buf within the stack frame for getbuf depends on the value of the compile-time constant BUFFER_SIZE, as well as the stack allocation strategy used by gcc. You will need to examine the disassembled code to determine its position.
Phase 2 involves injecting a small amount of code as part of your exploit string. Within the file ctarget there is code for a function touch2 having the following C representation:
void touch2(unsigned val)
{
    vlevel = 2; /* Part of validation protocol */
    if (val == cookie)
    {
        printf("Touch2!: You called touch2(0x%.8x)\n", val);
        validate(2);
    }
    else
    {
        printf("Misfire: You called touch2(0x%.8x)\n", val);
        fail(2);
    }
    exit(0);
}
Your task is to get ctarget to execute the code for touch2 rather than returning to test. In this case, however, you must make it appear to touch2 as if you have passed your cookie as its argument.
You will want to position a byte representation of the address of your injected code in such a way that the ret instruction at the end of the code for getbuf will transfer control to it.
Recall that the first argument to a function is passed in register %rdi.
Your injected code should set the register to your cookie, and then use a ret instruction to transfer control to the first instruction in touch2.
Do not attempt to use jmp or call instructions in your exploit code. The encodings of destination addresses for these instructions are difficult to formulate. Use ret instructions for all transfers of control, even when you are not returning from a call.
See the discussion in Generating Byte Codes on how to use tools to generate the byte-level representations of instruction sequences.
Phase 3 also involves a code-injection attack, but this time you pass a string as the argument. Within the file ctarget there is code for functions hexmatch and touch3 having the following C representations:
/* Compare string to hex representation of unsigned value */
int hexmatch(unsigned val, char *sval)
{
    char cbuf[110];
    /* Make position of check string unpredictable */
    char *s = cbuf + random() % 100;
    sprintf(s, "%.8x", val);
    return strncmp(sval, s, 9) == 0;
}
void touch3(char *sval)
{
    vlevel = 3; /* Part of validation protocol */
    if (hexmatch(cookie, sval))
    {
        printf("Touch3!: You called touch3(\"%s\")\n", sval);
        validate(3);
    }
    else
    {
        printf("Misfire: You called touch3(\"%s\")\n", sval);
        fail(3);
    }
    exit(0);
}
Your task is to get ctarget to execute the code for touch3 rather than returning to test. You must make it appear to touch3 as if you have passed a string representation of your cookie as its argument.
You will need to include a string representation of your cookie in your exploit string. The string should consist of the eight hexadecimal digits (ordered from most to least significant) without a leading “0x.”
Recall that a string is represented in C as a sequence of bytes followed by a byte with value 0. Type man ascii on any Linux machine to see the byte representations of the characters you need.
Your injected code should set register %rdi to the address of this string.
When functions hexmatch and strncmp are called, they push data onto the stack, overwriting portions of memory that held the buffer used by getbuf. As a result, you will need to be careful where you place the string representation of your cookie.
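Performing code-injection attacks on program rtarget is much more difficult than it is for ctarget, because rtarget uses two techniques to thwart such attacks. First, its stack positions are randomized, so they differ from one run to the next. Second, the section of memory holding the stack is marked as nonexecutable, so injected code cannot run even if you could locate it.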
Fortunately for you (though unfortunately in general), clever people have devised strategies for getting useful things done in a program by executing existing code, rather than injecting new code. The most general form of this is referred to as return-oriented programming (ROP)[2]. The strategy with ROP is to identify byte sequences within an existing program that consist of one or more instructions followed by the instruction ret. Such a segment is referred to as a gadget. The figure below illustrates how the stack can be set up to execute a sequence of n gadgets. In this figure, the stack contains a sequence of gadget addresses. Each gadget consists of a series of instruction bytes, with the final one being 0xc3, encoding the ret instruction. When the program executes a ret instruction starting with this configuration, it will initiate a chain of gadget executions, with the ret instruction at the end of each gadget causing the program to jump to the beginning of the next.
A gadget can make use of code corresponding to assembly-language statements generated by the compiler, especially ones at the ends of functions. In practice, there may be some useful gadgets of this form, but not enough to implement many important operations. For example, it is highly unlikely that a compiled function would have popq %rdi as its last instruction before ret. Fortunately, with a byte-oriented instruction set, such as x86-64, a gadget can often be found by extracting patterns from other parts of the instruction byte sequence.
For example, one version of rtarget contains code generated for the following C function:
void setval_210(unsigned *p)
{
*p = 3347663060U;
}
The chances of this function being useful for attacking a system seem pretty slim. But, the disassembled machine code for this function shows an interesting byte sequence:
0000000000400f15 <setval_210>:
400f15: c7 07 d4 48 89 c7 movl $0xc78948d4,(%rdi)
400f1b: c3 retq
The byte sequence 48 89 c7 encodes the instruction movq %rax, %rdi. (See the tables below for the encodings of useful movq instructions.) This sequence is followed by byte value c3, which encodes the ret instruction. The function starts at address 0x400f15, and the sequence starts on the fourth byte of the function. Thus, this code contains a gadget, having a starting address of 0x400f18, that will copy the 64-bit value in register %rax to register %rdi.
Your code for rtarget contains a number of functions similar to the setval_210 function shown above in a region we refer to as the gadget farm. Your job will be to identify useful gadgets in the gadget farm and use these to perform attacks similar to those you did in Phases 2 and 3.
Important: The gadget farm is demarcated by functions start_farm and end_farm in your copy of rtarget. Do not attempt to construct gadgets from other portions of the program code.
For Phase 4, you will repeat the attack of Phase 2, but do so on program rtarget using gadgets from your gadget farm. You can construct your solution using gadgets consisting of the following instruction types, and using only the first eight x86-64 registers (%rax–%rdi).
- movq: The codes for these are shown in the first table above.
- popq: The codes for these are shown in the second table above.
- ret: This instruction is encoded by the single byte 0xc3.
- nop: This instruction (pronounced “no op,” which is short for “no operation”) is encoded by the single byte 0x90. Its only effect is to cause the instruction pointer to be incremented by 1.

All the gadgets you need for this phase can be found in the region of the code for rtarget demarcated by the functions start_farm and mid_farm. When a gadget uses a popq instruction, it will pop data from the stack. As a result, your exploit string will contain a combination of gadget addresses and data.

Before you take on Phase 5, pause to consider what you have accomplished so far. In Phases 2 and 3, you caused a program to execute machine code of your own design. If ctarget had been a network server, you could have injected your own code into a distant machine. In Phase 4, you circumvented two of the main devices modern systems use to thwart buffer overflow attacks. Although you did not inject your own code, you were able to inject a type of program that operates by stitching together sequences of existing code. You have also earned 58/60 points for the lab (assuming you made a check-in post). That’s a very good score. If you have other pressing obligations, consider stopping right now.
Phase 5 requires you to do an ROP attack on rtarget to invoke function touch3 with a pointer to a string representation of your cookie. The 2 points it is worth are not a true measure of the effort it will likely require. Think of it as more of an extra-credit problem for those who want to go beyond the normal expectations for the course.
To solve Phase 5, you can use gadgets in the region of the code in rtarget demarcated by functions start_farm and end_farm. In addition to the gadgets used in Phase 4, this expanded farm includes the encodings of different movl instructions, as shown in the third table above. The byte sequences in this part of the farm also contain 2-byte instructions that serve as functional no-ops, i.e., they do not change any register or memory values. These include instructions, shown in the fourth table, such as andb %al,%al, that operate on the low-order bytes of some of the registers but do not change their values. There’s also a function that may be useful in its entirety.
Note the effect a movl instruction has on the upper 4 bytes of a register, as described on page 183 of the textbook.

Using hex2raw
hex2raw takes as input a hex-formatted string. In this format, each byte value is represented by two hex digits. For example, the string 012345 could be entered in hex format as 30 31 32 33 34 35 00. (Recall that the ASCII code for decimal digit X is 0x3X, and that the end of a string is indicated by a null byte.)
The hex characters you pass to hex2raw should be separated by whitespace (blanks or newlines). We recommend separating different parts of your exploit string with newlines while you’re working on it. hex2raw supports C-style block comments, so you can mark off sections of your exploit string. For example:
48 c7 c1 f0 11 40 00 /* mov $0x4011f0,%rcx */
Be sure to leave space around both the starting and ending comment strings (/*, */), so that the comments will be properly ignored.
If you generate a hex-formatted exploit string in the file ctarget.phase1, you can apply the raw string to ctarget in several different ways:
1. Pipe the string through hex2raw:
   cat ctarget.phase1 | ./hex2raw | ./ctarget
2. Store the raw string in a file and use I/O redirection:
   ./hex2raw < ctarget.phase1 > ctarget.phase1.raw
   ./ctarget < ctarget.phase1.raw
   This approach can also be used when running from within gdb:
   gdb ctarget
   (gdb) run < ctarget.phase1.raw
3. Store the raw string in a file and supply the file name with the -i option:
   ./hex2raw < ctarget.phase1 > ctarget.phase1.raw
   ./ctarget -i ctarget.phase1.raw
   This approach can also be used when running from within gdb.

Generating Byte Codes

Using gcc as an assembler and objdump as a disassembler makes it convenient to generate the byte codes for instruction sequences. For example, suppose you write a file example.s containing the following assembly code:
# Example of hand-generated assembly code
pushq $0xabcdef # Push value onto stack
addq $17,%rax # Add 17 to %rax
movl %eax,%edx # Copy lower 32 bits to %edx
The code can contain a mixture of instructions and data. Anything to the right of a # character is a comment.
You can now assemble and disassemble this file:
gcc -c example.s
objdump -d example.o > example.d
The generated file example.d contains the following:
example.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <.text>:
0: 68 ef cd ab 00 pushq $0xabcdef
5: 48 83 c0 11 add $0x11,%rax
9: 89 c2 mov %eax,%edx
The lines at the bottom show the machine code generated from the assembly language instructions. Each line has a hexadecimal number on the left indicating the instruction’s starting address (starting with 0), while the hex digits after the : character indicate the byte codes for the instruction. Thus, we can see that the instruction pushq $0xabcdef has hex-formatted byte code 68 ef cd ab 00.
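From example.d, you can determine that the byte sequence for the code is:
68 ef cd ab 00 48 83 c0 11 89 c2
This string can then be passed through hex2raw to generate an input string for the target programs. Alternatively, you can edit example.d to omit extraneous values and to contain C-style comments for readability, yielding: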
68 ef cd ab 00 /* pushq $0xabcdef */
48 83 c0 11 /* add $0x11,%rax */
89 c2 /* mov %eax,%edx */
This is also a valid input you can pass through hex2raw before sending it to one of the target programs.
This lab is graded out of 60 points, with 57 from the various phases as shown below and 3 from the check-in post.
Phase | Program | Method | Function | Points |
---|---|---|---|---|
1 | ctarget | Code injection | touch1 | 21 |
2 | ctarget | Code injection | touch2 | 21 |
3 | ctarget | Code injection | touch3 | 9 |
4 | rtarget | Return-oriented programming | touch2 | 4 |
5 | rtarget | Return-oriented programming | touch3 | 2 |
When you have correctly solved one of the phases, your target program will indicate this. For example:
cat ctarget.phase2 | ./hex2raw | ./ctarget
Cookie: 0x1a7dd803
Type string:Touch2!: You called touch2(0x1a7dd803)
Valid solution for level 2 with target ctarget
PASS: Sent exploit string to server to be validated.
NICE JOB!
The server will test your exploit string to make sure it really works, and it will update the lab progress page indicating that your target has completed this phase.
You can view the progress page by pointing your Web browser at http://awb66333.mathcs.carleton.edu:15513/progress
There is no penalty for making mistakes in this lab. Feel free to fire away at ctarget and rtarget with any strings you like.
This lab is adapted from the Attack Lab developed for Computer Systems: A Programmer’s Perspective by Randal E. Bryant and David R. O’Hallaron, Carnegie Mellon University, available here.