Tuesday, June 25, 2013

Reversing Basics Part 3: Dynamically Reversing main()

By Robert Portvliet.

This is the third blog post in a four part series. In the first post, we reviewed the structure of a simple C program. In the second post, we reviewed how that program translated into assembly. In this post we’ll cover dynamic analysis of the main() function with GDB. We’ll run our simple program in GDB and take a look at what happens along the way.

As a refresher, make sure you've compiled our source code with the “-g” argument so debugging info is included in the compiled executable.

gcc -g -o basic basic.c



Ok, so first things first, let’s fire up GDB

gdb -q ./basic



This will leave you at the (gdb) prompt.

First off, I should mention that GDB uses AT&T syntax by default, so if you wish to use Intel syntax (as I do), you can change it by using the command:

set disassembly-flavor intel



Secondly, we’ll cover some of the basic commands in GDB, but if you want to see more, type help and it will list them out for you. Even better is to type help followed by a category, such as help show or help info. This will show you all the subcommands under that category.

There are a couple of interesting things we can do with GDB before we start. We can use the disassemble command to disassemble part of our program, or all of it. We can also use shortcuts: type just enough of the command that GDB knows what you want to do, and hit enter. GDB also has tab autocomplete; start typing your command, and then hit tab. GDB will either finish the command or show you the possible options.

Another thing we can do is list out our source code, using the list command. By default, the list command will print out 10 lines of source code from the position you give it. Our program is 15 lines long, so if we want to see it all in one shot, we need to change the default with the command set listsize 20. You can view the default list size with the command show listsize.

Here's the output of the list command, with line 1 specified as the starting point



Ok, before we run our program, let’s set a couple break points. Set one at the beginning of main(), which is 0x080484bc, and at the beginning of func() which is 0x08048484. We can set these as follows:

break *0x080484bc
break *0x08048484



The asterisk denotes that the argument passed to break is a memory address.
We can view our breakpoints by typing info break. We can delete breakpoints by typing delete with no arguments to delete them all, or delete and the number of the breakpoint we want to delete, such as delete 11. Instead of deleting them, we can simply enable and disable them by typing enable or disable and the number of the breakpoint.

Here we show the output of disassemble func. Then, we are setting a break point at 0x08048484, the beginning of func(). Finally, we are viewing the breakpoints we have set.



One last thing, set the following to display the status of the EBP, ESP, and EIP registers each time we hit a breakpoint:

display /x $esp
display /x $ebp
display /x $eip



Ok, let’s run our program. We can run it by typing run, and we can give it an argument also. Let’s type run AAAA. The program will run and we’ll hit our first break point at 0x080484bc. We can confirm this by typing disas main, which shows that we’re on the first line of the main() function.

Here's line one of the main() function. It’s worth noting that the instruction the arrow points to in disassemble main has not executed yet. When you step through a program the arrow points to the next instruction to be executed, not the one that has just been executed.



We also see that EBP is at 0xbffff5c8, and ESP is at 0xbffff54c. So, we might ask ourselves, how large is our stack frame currently? Well, C8-4C=7C or 124 decimal. So, it looks like our stack frame is 124 bytes (right now).

We can also confirm this another way. Just type x/w $ebp-124 to view the address at 124 bytes down the stack from EBP. It turns out to be the address in ESP. We’re also still in the function prologue for main(), and ESP hasn’t been copied into EBP yet, so we’re actually not looking at the size of the stack frame for main() right now, we’re looking at the previous stack frame.

Let's confirm the size of the stack frame at the time of the first instruction in main().



Incidentally, you see the command x/w being used above to examine memory locations. Here is a quick (and incomplete) rundown on using the ‘x’ command.

  • x/s [location] Allows us to examine the location as a string
  • x/w [location] Allows us to examine the location as a WORD (4 bytes)
  • x/i [location] Allows us to examine the location as an instruction


Anyway, we could type ‘c’ or continue, and the program would run until it hit the next breakpoint, but we want to go one instruction at a time so we’re going to go with stepi instead.

So, once EBP gets pushed onto the stack in the first instruction, the value of ESP now becomes 0xbffff548, and C8-48=80 or 128 decimal, so our stack grew by 4 bytes or one DWORD (Remember, each value pushed onto the stack here is 4 bytes).

In line 1 we push EBP onto the stack. We start with a stack size of 124 bytes (from the previous stack frame). Then EBP is pushed onto the stack, resulting in a stack size of 128 bytes (Each DWORD is 4 bytes):



In line 2 ESP gets copied into EBP as part of the function prologue, and our new stack frame is created. It’s flat as a pancake right now with EBP and ESP at the same memory address:



In line 3 the stack gets aligned on a 16 byte boundary, which in our case also has the effect of moving ESP 8 bytes down the stack (from 0xbffff548 to 0xbffff540). For a quick explanation of stack alignment, check out this article.



In line 4 we move ESP 16 bytes down the stack to allocate some space we will need going forward



In line 5 we’re taking the address of the string, 0x80485ce, and storing it at the location ESP points to. We can confirm which string lives at that address by using the command x/s 0x80485ce



We can confirm that ESP now points at 0x80485ce by using the command x/w $esp.



We’re going to use the nexti command to jump over the puts() function in this case. When we get to func() we’ll use stepi to dive in, but right now I’d like to get to the next instruction in main. That’s the difference between the two: nexti skips over functions, while stepi dives in.

By the way, puts() is just a compiler optimization of printf(), and the result of calling puts() was that the string "Passing user input to func()" was printed to stdout.

Now that we’re past puts(), on lines 7 and 8 we move the value at ebp+0xc into EAX. This is argv, a pointer to an array of pointers whose first entry, argv[0], holds 0xbffff749. We then add 0x4 to EAX, which moves us to the next entry, argv[1]. That entry holds 0xbffff758, the address of “AAAA”, the argument we passed to the program at runtime.



After line 7 has executed we can see that EAX points to 0xbffff749, which contains argv[0] or "/root/bo/basic":



After line 8 has executed we can see that EAX now points to 0xbffff758, which contains argv[1] or "AAAA":



Now on line 9 we get to an interesting instruction. As of line 8 EAX only points to argv[1], as we can prove by using the command x/s $eax. No luck getting back “AAAA” unless we do x/w $eax to get the memory address it points to (0xbffff758), and then use x/s 0xbffff758 to view the string at the memory address (“AAAA”).



However, after we use stepi to execute ‘mov eax, [eax]’ we can then use x/s $eax and get “AAAA” back, because EAX now holds the address of the string itself. On line 10 we copy the contents of EAX into the location ESP points to. We can verify this by using x/s $esp. We see that we get no string of “AAAA” back, but using x/w $esp gives us the memory address, 0xbffff758, that contains argv[1], our string of “A’s”.



In the next installment we’ll dive into the func() function and finish running through the rest of our simple program in a debugger. Hope you enjoyed!

Tuesday, June 18, 2013

Reversing Basics Part 2: Understanding the Assembly

By Robert Portvliet.

This is the second blog post in a four part series. In the first post, we reviewed the structure of a simple C program. In this installment, we will cover disassembling this program, and reviewing the Assembly code generated by the compiler, GCC.

First, let’s once again post our source code, just for reference purposes:

#include <stdio.h>
void func(char *ptr)
{
 char buf[10];
 printf("copy %d bytes of data to buf\n", strlen(ptr));
 strcpy(buf, ptr);
}

int main(int argc, char **argv)
{
 printf("Passing user input to func()\n");
 func(argv[1]);
 return 0;
}



Next, let’s compile it using GCC. I’m going to include the “-fno-stack-protector” switch, to avoid adding a stack canary. This will simplify things, and allow us to walk through a simple stack based buffer overflow later in our series.

Our command line is:

gcc -o basic basic.c -fno-stack-protector



This command takes the basic.c source code as input and outputs the compiled program ‘basic’. We built it; now let’s immediately take it apart :)

To disassemble our program, we’ll use objdump.

objdump -M intel -d basic | grep -A 15 main.:  



main()

Here, we’re disassembling the ‘basic’ program, specifying Intel syntax, and piping the output to grep, where we want the next 15 lines after we see “main.:” (the dot matches any character, in this case the ‘>’ in “<main>:”).

That gives us the following:

080484bc <main>:
 80484bc: 55                    push   ebp
 80484bd: 89 e5                 mov    ebp,esp
 80484bf: 83 e4 f0              and    esp,0xfffffff0
 80484c2: 83 ec 10              sub    esp,0x10
 80484c5: c7 04 24 ce 85 04 08  mov    DWORD PTR [esp],0x80485ce
 80484cc: e8 e7 fe ff ff        call   80483b8 <puts@plt>
 80484d1: 8b 45 0c              mov    eax,DWORD PTR [ebp+0xc]
 80484d4: 83 c0 04              add    eax,0x4
 80484d7: 8b 00                 mov    eax,DWORD PTR [eax]
 80484d9: 89 04 24              mov    DWORD PTR [esp],eax
 80484dc: e8 a3 ff ff ff        call   8048484 <func>
 80484e1: b8 00 00 00 00        mov    eax,0x0
 80484e6: c9                    leave  
 80484e7: c3                    ret    
 



So, let’s look at what’s going on here.

1. 80484bc: 55                    push   ebp
2. 80484bd: 89 e5                 mov    ebp,esp




These first two lines are the function prologue. The first pushes EBP, the base pointer, onto the stack. The second line copies ESP, the existing stack pointer from the previous stack frame, into EBP.

3. 80484bf: 83 e4 f0              and    esp,0xfffffff0




The next line aligns the stack to a 16-byte boundary. This is another instruction added by the compiler.

4. 80484c2: 83 ec 10              sub    esp,0x10




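The fourth line “sub esp,0x10” allocates 16 bytes of space on the stack. Remember that the stack grows towards lower memory addresses, so allocations will use ‘sub’. Since main() declares no local variables of its own, this space is used for the arguments main() hands to puts() and func(), as we’ll see in the next few lines. The buffer ‘buf’ belongs to func(), which carves out its own space for it:

char buf[10];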

5. 80484c5: c7 04 24 ce 85 04 08  mov    DWORD PTR [esp],0x80485ce




The fifth line moves the memory address 0x80485ce into the location pointed to by ESP. It denotes this as being a DWORD or double word, which is 4 bytes (32 bits). In this case, it is the address of the string "Passing user input to func()\n" that is being moved here, as it is setting up for the puts() function.

Note: Whenever you see brackets around something in assembly, such as we see with ESP here, it’s pointing to the value (the actual data) in the memory address, that is being pointed to (in this case by ESP). This is called dereferencing a pointer.

We’re going to see the behavior of something being shoved into a memory address pointed to by ESP right before a function is called a few more times before we’re done. This is the equivalent of pushing a value onto the stack so we can work with it.

6. 80484cc: e8 e7 fe ff ff        call   80483b8 <puts@plt>




The sixth line makes a call to puts(). You may be wondering why this is since it was not in our source code. The answer is that it is a compiler (GCC) optimization. If we were to specify -fno-builtin-printf when we compiled the program, we would see printf() being called here instead.

Anyway, back to puts(). Two things happen here. First, ‘call’ puts the address of the next instruction (80484d1) on the stack so the program can return to it after puts() is done executing. Then it jumps to puts(), which prints the string "Passing user input to func()\n" to stdout.

7. 80484d1: 8b 45 0c              mov    eax,DWORD PTR [ebp+0xc]
8. 80484d4: 83 c0 04              add    eax,0x4



Line seven moves the value at ebp+0xc (12 bytes down the stack from EBP) into EAX, then line eight moves us another 4 bytes. The best way to figure out what that is would be to look at how the stack frame is laid out.



As you can see, EBP+12 holds argv, a pointer to the array of arguments that we are passing in from the command line. However, recall from our first blog post that the first element in the argv array, argv[0], is always the program itself, so we want the second element in the array, argv[1]. Each element is a 4 byte pointer, so argv[1] sits 4 bytes past argv[0], which is why line eight adds 0x4 to EAX.

Note: It’s helpful to remember that each memory address is 4 bytes (32 bits), or one DWORD (double word). So, each pointer and each stack slot that we are covering here equates to 4 bytes.

9. 80484d7: 8b 00                 mov    eax,DWORD PTR [eax]
10. 80484d9: 89 04 24              mov    DWORD PTR [esp],eax
11. 80484dc: e8 a3 ff ff ff        call   8048484 <func>




The next three lines are best looked at together. Line nine dereferences the pointer that EAX points to (grabs the value at the memory address that EAX points to) and stores that value in the EAX register itself. This is our command line argument. Then, line ten moves that value from EAX into a location pointed to by ESP.

Lines nine and ten are setting up the argument for func(), and then on line eleven we call func(). This will once again place the address of the next instruction (80484e1) on the stack, execute func(), and then return to that address, 80484e1, when func() is done.

Finally, lines 12-14 are basically clean up:

12. 80484e1: b8 00 00 00 00        mov    eax,0x0
13. 80484e6: c9                    leave  
14. 80484e7: c3                    ret    




In the first, it zeros out the EAX register, then in the next it invokes ‘leave’ which is basically a shortcut for the function epilogue, and equates to the following:

mov esp, ebp
pop ebp




Here we collapse our stack frame by moving EBP into ESP, and then pop EBP off the stack.

Finally, ‘ret’ pops the return address of the previous stack frame off the stack and returns to it.

func()

Now, let’s take a look at the func() function. Run the following to disassemble basic and grep for the next 20 lines following “func.:”

objdump -M intel -d basic | grep -A20 func.:

08048484 <func>:
 8048484: 55                    push   ebp
 8048485: 89 e5                 mov    ebp,esp
 8048487: 83 ec 28              sub    esp,0x28
 804848a: 8b 45 08              mov    eax,DWORD PTR [ebp+0x8]
 804848d: 89 04 24              mov    DWORD PTR [esp],eax
 8048490: e8 f3 fe ff ff        call   8048388 <strlen@plt>
 8048495: 89 c2                 mov    edx,eax
 8048497: b8 b0 85 04 08        mov    eax,0x80485b0
 804849c: 89 54 24 04           mov    DWORD PTR [esp+0x4],edx
 80484a0: 89 04 24              mov    DWORD PTR [esp],eax
 80484a3: e8 00 ff ff ff        call   80483a8 <printf@plt>
 80484a8: 8b 45 08              mov    eax,DWORD PTR [ebp+0x8]
 80484ab: 89 44 24 04           mov    DWORD PTR [esp+0x4],eax
 80484af: 8d 45 ee              lea    eax,[ebp-0x12]
 80484b2: 89 04 24              mov    DWORD PTR [esp],eax
 80484b5: e8 de fe ff ff        call   8048398 <strcpy@plt>
 80484ba: c9                    leave  
 80484bb: c3                    ret
   




Ok, let’s get started...

1. 8048484: 55                    push   ebp
2. 8048485: 89 e5                 mov    ebp,esp



Once again, the first two lines are the function prologue.

3. 8048487: 83 ec 28              sub    esp,0x28



Next, 0x28 (40) bytes of space are allocated on the stack.

4. 804848a: 8b 45 08              mov    eax,DWORD PTR [ebp+0x8]



Then, the value at EBP+0x08 is loaded into the EAX register. To figure out what that is, let’s refer back to our stack diagram.



At EBP+8 (8 bytes ‘down’ the stack from EBP) is the ‘ptr’ pointer variable. So, it is loading the value of ptr, which is the address of our string, (brackets mean ‘actual data at location in memory’, remember?), into the EAX register.

Note: The value in ptr is argv[1], a pointer to our command line argument. When main() called func() it passed argv[1] as a parameter.

func(argv[1]);

5. 804848d: 89 04 24              mov    DWORD PTR [esp],eax



Here we see it setting up to do something with this data again, as Line 5 copies the contents of EAX (argv[1]), into the memory location pointed to by ESP. I smell a function call coming...

6. 8048490: e8 f3 fe ff ff        call   8048388 <strlen@plt>



Yup, there it is. So, here we are calling strlen() with ‘ptr’ as an argument. As mentioned in our first blog post, strlen() returns the length of a string. It iterates through until it hits a null byte.

So, look at the source code below. We’ve just done the strlen(ptr) part. Now, I’m guessing the value we get out of strlen() is headed for printf() to fill in the %d, don’t you?

printf("copy %d bytes of data to buf\n", strlen(ptr));



Ok, on to the next few lines...

7. 8048495: 89 c2                 mov    edx,eax



EAX is generally used to contain the output of a function, so the output of strlen() (the length of the data in *ptr) is now in EAX. It’s likely moving it into EDX for the moment, so EAX can be used for something else.

Note: EDX – “The data register is an extension to the accumulator (EAX). It is most useful for storing data related to the accumulator's current calculation.”

8. 8048497: b8 b0 85 04 08        mov    eax,0x80485b0



Now that we’ve freed up EAX, we can move the memory address 0x80485b0 into it.

9. 804849c:  89 54 24 04           mov    DWORD PTR [esp+0x4],edx
10. 80484a0: 89 04 24              mov    DWORD PTR [esp],eax



Now, we’re moving the value in EDX (the output of strlen()) into ESP+4. That’s 4 bytes, or one memory address ‘down’ the stack from ESP. Then we move the contents of EAX (the memory address 0x80485b0) into a location pointed to by ESP.

So, two things here: what is now pointed to by ESP is the address of the string "copy %d bytes of data to buf\n", and what is in ESP+4 is the value going into %d. Do you sense a function call being set up here? ;)

11. 80484a3: e8 00 ff ff ff        call   80483a8 <printf@plt>



Yup! Here we call printf() and print "copy %d bytes of data to buf\n" to stdout. The %d being the length (as determined by strlen()) of the argument we provided to the program at runtime.

Now that’s done. We have one more thing to do:

strcpy(buf, ptr);



12. 80484a8: 8b 45 08              mov    eax,DWORD PTR [ebp+0x8]



Line 12 moves the value at EBP+8, which is ptr, into EAX.

13. 80484ab: 89 44 24 04           mov    DWORD PTR [esp+0x4],eax



Line 13 then moves it from EAX into the location pointed to by ESP+4.

14. 80484af: 8d 45 ee              lea    eax,[ebp-0x12]



Line 14 gives us a new instruction, LEA. LEA stands for ‘load effective address’, and it does just that. In this case, it loads the memory address (not the contents) of EBP-0x12 (0x12, or 18, bytes ‘up’ the stack from EBP) into EAX. This is where ‘buf’ lives.

15. 80484b2: 89 04 24              mov    DWORD PTR [esp],eax



Then we move the contents of EAX into the address pointed to by ESP. Again, this is the equivalent of pushing it onto the stack, and means we are setting up for another function...

16. 80484b5: e8 de fe ff ff        call   8048398 <strcpy@plt>



Here we call strcpy() and copy the contents of ‘ptr’ into ‘buf’.

17. 80484ba: c9                    leave  
18. 80484bb: c3                    ret



The last two lines we are already familiar with from reviewing main(). The first instruction, leave, is basically a shortcut for the function epilogue, and equates to the following:

mov esp, ebp
pop ebp




Finally, ‘ret’ pops the return address of the previous stack frame off the stack and returns to it.

So, that wraps up part two of the series. In part 3 we’re going to cover dynamic analysis with GDB. Hope you enjoyed :)

Tuesday, June 11, 2013

Reversing Basics Part 1: Understanding the C Code

By Robert Portvliet.

This is the first in a series of blog posts which will cover basic reversing of a very simple program written in C. The first post will walk through the simple C program and explain how it is constructed and a bit about C syntax and functions. The second post will cover static analysis (disassembly) of our program, and the third will cover dynamic analysis, or walking through the program in GDB.

Ok, let’s get started…

Our program basically just takes one argument from the command line and copies it into a buffer. It has two functions, main() and func(), which we will review.

Let’s first take a look at the source code of our program, basic.c:

#include <stdio.h>

void func(char *ptr)
{
 char buf[10];
 printf("copy %d bytes of data to buf\n", strlen(ptr));
 strcpy(buf, ptr);
}

int main(int argc, char **argv)
{
 printf("Passing user input to func()\n");
 func(argv[1]);
 return 0;
}



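We start our code by including the header ‘stdio.h’, which we’ll need in order to use printf(). Strictly speaking, the other two ‘built in’ C functions we use, strlen() and strcpy(), are declared in ‘string.h’, which this listing leaves out; older versions of GCC typically only warn about that and still build the program.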

We then define the function, func(). This function does not return any value, so we define it as type ‘void’. The argument to func() is a pointer variable called *ptr, and since the data stored at the memory address it will point to is expected to be ASCII, it is defined as type ‘char’.

You may also notice the asterisk before it. This denotes that ptr is a pointer: it holds a memory address, and *ptr is the value stored at that address.

We then allocate a buffer, named ‘buf’ (again, denoted as type ‘char’), with a size of 10 bytes and call the built in C function printf() to print the string "copy %d bytes of data to buf\n", to stdout.

One of the arguments passed to printf() is a call to strlen(), another built in C function, which will return the length of the string pointed to by ptr. It will iterate through the string until it reaches a null byte.

So, putting that together, printf() will print "copy %d bytes of data to buf\n" to stdout, and the output of strlen() will be shoved into the ‘%d’ format placeholder (%d is used since the output of strlen() will be a decimal value).

Lastly, strcpy() is called, which copies the contents of what address ‘ptr’ points to into the buffer (buf) that we had previously allocated. As we’ll see later, using strcpy() is a bad idea, as it doesn’t check the size of data it copies before it shoves it into a buffer. This can lead to a buffer overflow vulnerability if the size of the data being copied is larger than the buffer it’s being copied into.

Ok, so now on to the next function, main(). All C programs must have a main() function. It’s what calls the rest of the functions in the program, and generally dictates the program flow. In our case, main() will be returning an integer value so we define it as type ‘int’. Then, we give two arguments to the function. These are the arguments given at the command line (**argv), and the number of arguments given (argc). Since ‘argc’ is a numerical value, it’s of type ‘int’, and **argv is of type ‘char’ as we are expecting ASCII input.

The two asterisks in front of argv denote it as being a pointer to a pointer. As we will see in a minute, one of its elements, argv[1], will be passed as an argument to func(), where it becomes ptr, which is itself a pointer.

Ok, so next we call printf() again and print "Passing user input to func()\n" to stdout. Then we call func() and pass it the first command line argument, argv[1], as an argument. As I said, argv is also an array, and argv[0] (first spot in the array) denotes the program itself, so the first command line argument will always be denoted by argv[1].

We then return 0 to indicate the program was successful, and we’re done.

You may also note the curly braces used in the program. These denote code blocks in C, and are necessary syntactically for the program to compile. You can see how they enclose the code block for each of our functions.

So, this is the end of the first blog post on this topic. The next will cover disassembling this simple program and reviewing the ASM code to understand what is occurring.

Hope you enjoyed :)