Lab 3: Attack Lab

Aaron Bauer

May 3, 2022

Lab 3: Buffer Overflow Attacks[1]

Introduction

This assignment involves generating a total of five attacks on two programs having different security vulnerabilities. Similar to lab 2, this will consist of analyzing pre-compiled executables and devising appropriate inputs. Outcomes you will gain from this lab include:

In this lab, you will gain firsthand experience with methods used to exploit security weaknesses in operating systems and network servers. Our purpose is to help you learn about the runtime operation of programs and to understand the nature of these security weaknesses so that you can avoid them when you write system code. We do not condone the use of any other form of attack to gain unauthorized access to any system resources.

Logistics

You will generate attacks (in the form of input strings) for target programs that are custom generated for you. They are generated on a Linux machine, so you will need to complete this lab on a Linux machine.

Your progress is tracked automatically, as in lab 2, so there is nothing you need to submit. Also as in lab 2, this automatic submission requires an internet connection. If you are working offline, you can run the target programs with the -q flag to prevent them from trying to contact the grading server, as in
    ./ctarget -q

You can obtain your files by pointing your Web browser at: http://awb66333.mathcs.carleton.edu:15513/

The server will build your files and return them to your browser in a tar file called targetK.tar, where K is the unique number of your target programs.

Note: It takes a few seconds to build and download your target, so please be patient.

Save the targetK.tar in a directory in which you plan to do your work (or upload it to mantis).
Then run this terminal command from that directory: tar -xvf targetK.tar. This will extract a directory targetK containing the files described below.

You should only download one set of files. If for some reason you download multiple targets, choose one target to work on and delete the rest.

The files in targetK include:

Target Programs

Both ctarget and rtarget read strings from standard input (the terminal). They do so with the function getbuf defined below:

unsigned getbuf()
{
    char buf[BUFFER_SIZE];
    Gets(buf);
    return 1;
}

The function Gets is similar to the standard library function gets—it reads a string from standard input (terminated by '\n' or end-of-file) and stores it (along with a null terminator) at the specified destination. In this code, you can see that the destination is an array buf, declared as having BUFFER_SIZE bytes. At the time your targets were generated, BUFFER_SIZE was a compile-time constant specific to your version of the programs.

Functions Gets() and gets() have no way to determine whether their destination buffers are large enough to store the string they read. They simply copy sequences of bytes, possibly overrunning the bounds of the storage allocated at the destinations.
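To see why an unchecked copy is dangerous, consider the following toy model. It is a Python sketch, not the lab's actual code: it treats a stack frame as a flat byte array holding a buffer followed directly by a saved return address. The buffer size of 16, the return address 0x401971, and the helper names are all illustrative assumptions; real frames may contain padding, and your BUFFER_SIZE will differ.

```python
import struct

BUFFER_SIZE = 16          # hypothetical; your target's value differs
SAVED_RET = 0x401971      # hypothetical return address back into test()

def make_frame():
    """Model a frame as buf[BUFFER_SIZE] followed by an 8-byte return address."""
    return bytearray(BUFFER_SIZE) + bytearray(struct.pack("<Q", SAVED_RET))

def unchecked_copy(frame, data):
    """Copy data plus a null terminator into the frame with no bounds check,
    the way Gets() writes past the end of buf."""
    payload = bytes(data) + b"\x00"
    frame[0:len(payload)] = payload
    return frame

def saved_return(frame):
    """Read back the 8-byte little-endian return address slot."""
    return struct.unpack_from("<Q", frame, BUFFER_SIZE)[0]
```

In this model a short input such as b"hello" leaves the return address intact. An input of exactly BUFFER_SIZE characters already changes it to 0x401900, because the null terminator lands on its low byte, and BUFFER_SIZE + 8 characters replace it entirely.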

If the string typed by the user and read by getbuf is sufficiently short, it is clear that getbuf will return 1. Typically an error occurs if you input a long string.

Program rtarget will have the same behavior. Overrunning the buffer typically causes the program state to be corrupted, leading to a memory access error (i.e., a segmentation fault).
Your task is to be more clever with the strings you feed ctarget and rtarget so that they do more interesting things. These are called exploit strings.

Both ctarget and rtarget take several different command line arguments:

Your exploit strings will typically contain byte values that do not correspond to the ASCII values for printing characters. The program hex2raw will enable you to generate these raw strings. See Using hex2raw for more information on how to use hex2raw.

Important Points About Input

Your exploit string must not contain byte value 0x0a at any intermediate position, since this is the ASCII code for newline ('\n'). When Gets encounters this byte, it will assume you intended to terminate the string.
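A quick way to catch this mistake is to scan the raw bytes before feeding them to a target. The helper below is a minimal Python sketch; the function name is hypothetical, and it assumes that a single newline at the very end of the input is acceptable because that one is meant to terminate the string.

```python
def newline_positions(exploit):
    """Return the offsets of every 0x0a byte except a single trailing newline."""
    body = exploit[:-1] if exploit.endswith(b"\n") else exploit
    return [i for i, b in enumerate(body) if b == 0x0A]
```

An empty result means the string is safe in this respect. Any offsets returned point at bytes (often part of an address or an instruction encoding) that need to be chosen differently.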

hex2raw expects two-digit hex values separated by one or more white spaces. So if you want to create a byte with a hex value of 0, you need to write it as 00. To create the input 0xdeadbeef you should pass ef be ad de to hex2raw (note the reversal required for little-endian byte ordering).
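The byte reversal is easy to get wrong by hand, so it can help to let a few lines of code do it. The following Python sketch (the function name is an assumption, not part of the lab tools) formats an integer as little-endian bytes in the notation hex2raw expects. The default width of 8 matches the size of an address on x86-64.

```python
def to_hex2raw(value, width=8):
    """Format an integer as `width` little-endian bytes in hex2raw notation."""
    return " ".join(f"{b:02x}" for b in value.to_bytes(width, "little"))
```

For example, to_hex2raw(0xdeadbeef, 4) returns "ef be ad de", and with the default width the same value becomes "ef be ad de 00 00 00 00".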

Part I: Code Injection Attacks

For the first three phases, your exploit strings will attack ctarget. This program is set up in a way that the stack positions will be consistent from one run to the next and so that data on the stack can be treated as executable code. These features make the program vulnerable to attacks where the exploit strings contain the byte encodings of executable code.

Phase 1

For phase 1, you will not inject new code. Instead, your exploit string will redirect the program to execute an existing procedure.

Function getbuf is called within ctarget by a function test having the following C code:

void test()
{
    int val;
    val = getbuf();
    printf("No exploit.  Getbuf returned 0x%x\n", val);
}

When getbuf executes its return statement (line 5 of getbuf), the program ordinarily resumes execution within function test (at line 5 of this function). We want to change this behavior. Within the file ctarget, there is code for a function touch1 having the following C representation:

void touch1()
{
    vlevel = 1; /* Part of validation protocol */
    printf("Touch1!: You called touch1()\n");
    validate(1);
    exit(0);
}

Your task is to get ctarget to execute the code for touch1 when getbuf executes its return statement, rather than returning to test. Note that your exploit string may also corrupt parts of the stack not directly related to this stage, but this will not cause a problem, since touch1 causes the program to exit directly.
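The general shape of such a string is filler bytes that reach the saved return address, followed by the address you want getbuf to return to. The Python sketch below only illustrates that layout. The function name, the padding length of 4, and the address 0x401234 in the example are made up; you must determine the real distance and the real address of touch1 from your own disassembly.

```python
def redirect_exploit(padding_len, target_addr):
    """Build hex2raw text: filler bytes up to the return-address slot,
    then the target address as 8 little-endian bytes."""
    filler = " ".join(["00"] * padding_len)
    addr = " ".join(f"{b:02x}" for b in target_addr.to_bytes(8, "little"))
    return f"{filler}\n{addr}\n"
```

With the made-up values, redirect_exploit(4, 0x401234) produces a first line of four 00 bytes and a second line reading 34 12 40 00 00 00 00 00.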

Advice

Phase 2

Phase 2 involves injecting a small amount of code as part of your exploit string.

Within the file ctarget there is code for a function touch2 having the following C representation:

void touch2(unsigned val)
{
    vlevel = 2; /* Part of validation protocol */
    if (val == cookie)
    {
        printf("Touch2!: You called touch2(0x%.8x)\n", val);
        validate(2);
    }
    else
    {
        printf("Misfire: You called touch2(0x%.8x)\n", val);
        fail(2);
    }
    exit(0);
}

Your task is to get ctarget to execute the code for touch2 rather than returning to test. In this case, however, you must make it appear to touch2 as if you have passed your cookie as its argument.

Advice

Phase 3

Phase 3 also involves a code injection attack, but this time you pass a string as the argument.

Within the file ctarget there is code for functions hexmatch and touch3 having the following C representations:

/* Compare string to hex representation of unsigned value */
int hexmatch(unsigned val, char *sval)
{
    char cbuf[110];
    /* Make position of check string unpredictable */
    char *s = cbuf + random() % 100;
    sprintf(s, "%.8x", val);
    return strncmp(sval, s, 9) == 0;
}

void touch3(char *sval)
{
    vlevel = 3; /* Part of validation protocol */
    if (hexmatch(cookie, sval))
    {
        printf("Touch3!: You called touch3(\"%s\")\n", sval);
        validate(3);
    }
    else
    {
        printf("Misfire: You called touch3(\"%s\")\n", sval);
        fail(3);
    }
    exit(0);
}

Your task is to get ctarget to execute the code for touch3 rather than returning to test. You must make it appear to touch3 as if you have passed a string representation of your cookie as its argument.
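Note that a string representation is a sequence of ASCII characters stored in reading order, not the cookie's numeric bytes, so no little-endian reversal applies to it. The Python sketch below (the function name is hypothetical) shows the bytes that the "%.8x" format in hexmatch produces, using the sample cookie 0x1a7dd803 that appears in the Grading section. Your own cookie will differ.

```python
def cookie_string_bytes(cookie):
    """Return hex2raw text for the ASCII string that sprintf(s, "%.8x", cookie)
    would produce, including the terminating null byte."""
    text = "%.8x" % cookie                  # same format string as hexmatch
    raw = text.encode("ascii") + b"\x00"    # strings end with a null byte
    return " ".join(f"{b:02x}" for b in raw)
```

For the sample cookie, cookie_string_bytes(0x1a7dd803) returns "31 61 37 64 64 38 30 33 00".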

Advice

Part II: Return-Oriented Programming

Performing code-injection attacks on program rtarget is much more difficult than it is for ctarget, because it uses two techniques to thwart such attacks:

Fortunately for you (though unfortunately in general), clever people have devised strategies for getting useful things done in a program by executing existing code, rather than injecting new code. The most general form of this is referred to as return-oriented programming (ROP)[2]. The strategy with ROP is to identify byte sequences within an existing program that consist of one or more instructions followed by the instruction ret. Such a segment is referred to as a gadget. The figure below illustrates how the stack can be set up to execute a sequence of n gadgets. In this figure, the stack contains a sequence of gadget addresses. Each gadget consists of a series of instruction bytes, with the final one being 0xc3, encoding the ret instruction. When the program executes a ret instruction starting with this configuration, it will initiate a chain of gadget executions, with the ret instruction at the end of each gadget causing the program to jump to the beginning of the next.

Setting up sequence of gadgets for execution. Byte value 0xc3 encodes the ret instruction.
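The chaining behavior can be modeled in a few lines. The Python sketch below is a toy simulation, not how the hardware or the lab tools are implemented: the stack is a list of 8-byte words, each ret pops the next word and "jumps" to it, and each gadget is a small function standing in for the instructions before its ret. The gadget addresses, the two gadgets chosen, and all names are hypothetical.

```python
def run_chain(stack, gadgets, regs):
    """Simulate a chain of ret instructions.

    stack   -- list of 8-byte words, top of stack first
    gadgets -- maps a gadget address to a function(regs, stack) modelling
               the instructions that precede the gadget's final ret
    regs    -- dict of register values, updated in place
    """
    stack = list(stack)
    while stack:
        addr = stack.pop(0)        # ret: pop the next address and jump to it
        if addr not in gadgets:
            break                  # control leaves the gadgets we model
        gadgets[addr](regs, stack)
    return regs

def pop_rax(regs, stack):          # models: popq %rax ; ret
    regs["rax"] = stack.pop(0)

def mov_rax_rdi(regs, stack):      # models: movq %rax,%rdi ; ret
    regs["rdi"] = regs["rax"]

GADGETS = {0x4019AB: pop_rax, 0x4019C5: mov_rax_rdi}   # made-up addresses
```

Running the chain [0x4019AB, 0x2A, 0x4019C5, 0x401800] leaves 0x2A in both rax and rdi. The word 0x2A is data consumed by the popq, which is why a ROP stack can mix gadget addresses with data values.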

A gadget can make use of code corresponding to assembly-language statements generated by the compiler, especially ones at the ends of functions. In practice, there may be some useful gadgets of this form, but not enough to implement many important operations. For example, it is highly unlikely that a compiled function would have popq %rdi as its last instruction before ret. Fortunately, with a byte-oriented instruction set, such as x86-64, a gadget can often be found by extracting patterns from other parts of the instruction byte sequence.

For example, one version of rtarget contains code generated for the following C function:

void setval_210(unsigned *p)
{
    *p = 3347663060U;
}

The chances of this function being useful for attacking a system seem pretty slim. However, the disassembled machine code for this function shows an interesting byte sequence:

0000000000400f15 <setval_210>:
  400f15:	c7 07 d4 48 89 c7    	movl   $0xc78948d4,(%rdi)
  400f1b:	c3                   	retq   

The byte sequence 48 89 c7 encodes the instruction movq %rax, %rdi. (See the tables below for the encodings of useful movq instructions.) This sequence is followed by byte value c3, which encodes the ret instruction. The function starts at address 0x400f15, and the sequence starts on the fourth byte of the function. Thus, this code contains a gadget, having a starting address of 0x400f18, that will copy the 64-bit value in register %rax to register %rdi.
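The address arithmetic above can be checked mechanically. The Python sketch below (the function and constant names are assumptions) searches a function's machine code for an instruction encoding immediately followed by ret and adds the offset to the function's starting address, using the bytes of setval_210 from the listing.

```python
def find_gadget(func_addr, func_bytes, pattern):
    """Return the address where `pattern` followed by ret (0xc3) begins
    inside a function's machine code, or None if it does not occur."""
    offset = func_bytes.find(pattern + b"\xc3")
    return None if offset < 0 else func_addr + offset

SETVAL_210 = bytes.fromhex("c7 07 d4 48 89 c7 c3")   # bytes from the listing
MOVQ_RAX_RDI = bytes.fromhex("48 89 c7")             # movq %rax,%rdi
```

Here find_gadget(0x400f15, SETVAL_210, MOVQ_RAX_RDI) returns 0x400f18: the pattern starts at offset 3 (the fourth byte), and 0x400f15 + 3 = 0x400f18. This sketch only matches a pattern directly followed by ret.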

Your code for rtarget contains a number of functions similar to the setval_210 function shown above in a region we refer to as the gadget farm. Your job will be to identify useful gadgets in the gadget farm and use these to perform attacks similar to those you did in Phases 2 and 3.

Important: The gadget farm is demarcated by functions start_farm and end_farm in your copy of rtarget. Do not attempt to construct gadgets from other portions of the program code.

Encodings of movq instructions.
Encodings of popq instructions.
Encodings of movl instructions.
Encodings of 2-byte “no-operation” (no-op) instructions that functionally have no effect when executed.

Phase 4

For Phase 4, you will repeat the attack of Phase 2, but do so on program rtarget using gadgets from your gadget farm. You can construct your solution using gadgets consisting of the following instruction types, and using only the first eight x86-64 registers (%rax–%rdi).

Advice

Phase 5

Before you take on Phase 5, pause to consider what you have accomplished so far. In Phases 2 and 3, you caused a program to execute machine code of your own design. If ctarget had been a network server, you could have injected your own code into a distant machine. In Phase 4, you circumvented two of the main devices modern systems use to thwart buffer overflow attacks. Although you did not inject your own code, you were able to inject a type of program that operates by stitching together sequences of existing code. You have also earned 58/60 points for the lab (assuming you made a check-in post).
That’s a very good score. If you have other pressing obligations consider stopping right now.

Phase 5 requires you to do an ROP attack on rtarget to invoke function touch3 with a pointer to a string representation of your cookie. The 2 points it is worth are not a true measure of the effort it will likely require. Think of it as more of an extra credit problem for those who want to go beyond the normal expectations for the course.

To solve Phase 5, you can use gadgets in the region of the code in rtarget demarcated by functions start_farm and end_farm. In addition to the gadgets used in Phase 4, this expanded farm includes the encodings of different movl instructions, as shown in the third table above. The byte sequences in this part of the farm also contain 2-byte instructions that serve as functional no-ops, i.e., they do not change any register or memory values. These include instructions, shown in the fourth table, such as andb %al,%al, that operate on the low-order bytes of some of the registers but do not change their values. There’s also a function that may be useful in its entirety.

Advice

Using hex2raw

hex2raw takes as input a hex-formatted string. In this format, each byte value is represented by two hex digits. For example, the string 012345 could be entered in hex format as 30 31 32 33 34 35 00. (Recall that the ASCII code for decimal digit X is 0x3X, and that the end of a string is indicated by a null byte.)

The hex characters you pass to hex2raw should be separated by whitespace (blanks or newlines). We recommend separating different parts of your exploit string with newlines while you’re working on it. hex2raw supports C-style block comments, so you can mark off sections of your exploit string. For example:

48 c7 c1 f0 11 40 00 /* mov    $0x4011f0,%rcx */

Be sure to leave space around both the starting and ending comment strings (/*, */), so that the comments will be properly ignored.
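To make the format concrete, here is a rough Python model of the conversion. It is a sketch under stated assumptions, not the real hex2raw implementation, and the function name is hypothetical: it splits the input on whitespace, treats /* and */ as comment delimiters only when they stand alone as tokens, and converts every remaining two-digit token to one byte.

```python
def hex_to_raw(text):
    """Convert whitespace-separated hex byte pairs to bytes, skipping
    /* ... */ comments whose delimiters stand alone as tokens."""
    out = bytearray()
    in_comment = False
    for token in text.split():
        if token == "/*":
            in_comment = True
        elif token == "*/":
            in_comment = False
        elif not in_comment:
            if len(token) != 2:
                raise ValueError(f"expected two hex digits, got {token!r}")
            out.append(int(token, 16))
    return bytes(out)
```

In this model a comment written without surrounding spaces, such as /*mov, is not recognized as a delimiter and triggers an error. That is one way to see why the spacing matters, though the real tool may fail differently.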

If you generate a hex-formatted exploit string in the file ctarget.phase1, you can apply the raw string to ctarget in several different ways:

Generating Byte Codes

Using gcc as an assembler and objdump as a disassembler makes it convenient to generate the byte codes for instruction sequences. For example, suppose you write a file example.s containing the following assembly code:

# Example of hand-generated assembly code
pushq   $0xabcdef          # Push value onto stack
addq    $17,%rax           # Add 17 to %rax
movl    %eax,%edx          # Copy lower 32 bits to %edx

The code can contain a mixture of instructions and data. Anything to the right of a # character is a comment.

You can now assemble and disassemble this file:
    gcc -c example.s
    objdump -d example.o > example.d
The generated file example.d contains the following:

example.o:     file format elf64-x86-64


Disassembly of section .text:

0000000000000000 <.text>:
   0:	68 ef cd ab 00       	pushq  $0xabcdef
   5:	48 83 c0 11          	add    $0x11,%rax
   9:	89 c2                	mov    %eax,%edx

The lines at the bottom show the machine code generated from the assembly language instructions. Each line has a hexadecimal number on the left indicating the instruction’s starting address (starting with 0), while the hex digits after the : character indicate the byte codes for the instruction. Thus, we can see that the instruction pushq $0xabcdef has hex-formatted byte code 68 ef cd ab 00.

From this file, you can get the byte sequence for the code:
68 ef cd ab 00 48 83 c0 11 89 c2 
This string can then be passed through hex2raw to generate an input string for the target programs. Alternatively, you can edit example.d to omit extraneous values and to contain C-style comments for readability, yielding:
   68 ef cd ab 00   /* pushq  $0xabcdef  */
   48 83 c0 11      /* add    $0x11,%rax */
   89 c2            /* mov    %eax,%edx  */

This is also a valid input you can pass through hex2raw before sending to one of the target programs.
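If you would rather not edit example.d by hand, the reformatting can be scripted. The Python sketch below (the function name is an assumption) relies on objdump separating the offset, the byte codes, and the instruction text with tab characters. It does not handle long instructions whose byte codes objdump continues on a following line.

```python
def objdump_to_hex2raw(listing):
    """Turn `objdump -d` instruction lines into hex2raw lines with comments."""
    lines = []
    for line in listing.splitlines():
        fields = line.split("\t")
        # Instruction lines look like "<offset>:\t<hex bytes>\t<instruction>"
        if len(fields) < 3 or not fields[0].strip().endswith(":"):
            continue
        hex_bytes = fields[1].strip()
        asm = " ".join("\t".join(fields[2:]).split())   # collapse whitespace
        lines.append(f"{hex_bytes} /* {asm} */")
    return "\n".join(lines)
```

Applied to the text of example.d, this yields one line per instruction, for example 68 ef cd ab 00 /* pushq $0xabcdef */, with the header lines skipped.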

Grading

This lab is graded out of 60 points, with 57 from the various phases as shown below and 3 from the check-in post.

Phase  Program  Method                       Function  Points
1      ctarget  Code injection               touch1    21
2      ctarget  Code injection               touch2    21
3      ctarget  Code injection               touch3    9
4      rtarget  Return-oriented programming  touch2    4
5      rtarget  Return-oriented programming  touch3    2

When you have correctly solved one of the phases, your target program will indicate this. For example:

    cat ctarget.phase2 | ./hex2raw | ./ctarget
    Cookie: 0x1a7dd803
    Type string:Touch2!: You called touch2(0x1a7dd803)
    Valid solution for level 2 with target ctarget
    PASS: Sent exploit string to server to be validated.
    NICE JOB!

The server will test your exploit string to make sure it really works, and it will update the lab progress page indicating that your target has completed this phase.

You can view the progress page by pointing your Web browser at http://awb66333.mathcs.carleton.edu:15513/progress

There is no penalty for making mistakes in this lab. Feel free to fire away at ctarget and rtarget with any strings you like.


  1. This lab is adapted from the Attack Lab developed for Computer Systems: A Programmer’s Perspective by Randal E. Bryant and David R. O’Hallaron, Carnegie Mellon University.

  2. Roemer, R., Buchanan, E., Shacham, H., & Savage, S. (2012). Return-oriented programming: Systems, languages, and applications. ACM Transactions on Information and System Security (TISSEC), 15(1), 1-34.
     Schwartz, E. J., Avgerinos, T., & Brumley, D. (2011). Q: Exploit hardening made easy. In 20th USENIX Security Symposium (USENIX Security 11).