EXPLOIT DEVELOPMENT PART 4

Feb 15, 2023

MEMORY BASED OVERFLOW THEORY AND VARIETIES

In this chapter, memory-based overflow problems are covered theoretically, using sample code, to lay the groundwork for the applications that follow (Source: M. Alparslan Akyıldız Uygulamalarla Siber Güvenliğe Giriş, …th. Edition?, Ankara, Gazi Kitabevi Press, Year ?, pp. …? ). The C code described here is run on the 32-bit KALI 1.0 operating system. Running KALI 1.0 32 bit in a VMware environment is recommended so that the samples behave more consistently. Let’s first look at how a simple C program, shown below, is laid out on the stack:

The sample code defines the variables global_bss and global_data and a sum function that takes two integer arguments and returns an integer. After sum is called in main(), the output is printed on the screen with printf(). The code is represented on the stack as follows:

When the main function is called, the address to return to once main has finished is pushed onto the stack as the return address. In the next step, the current EBP is pushed as the old EBP value, and EBP then serves as the base pointer for the data placed in the rest of the frame. The local variables of the function are placed on the stack next, so the values 10 and 20 are stored there. Then the sum function is called. At that point the address of the instruction at the printf line, where execution continues after sum returns, is pushed onto the stack as the return address, followed by the old EBP value. Because EBP is the base address and the stack grows from high addresses toward low addresses as data is pushed, the arguments sit above EBP: the value 10 is stored at [ebp + 12] and the value 20 at [ebp + 8]. The local variable c is placed at [ebp - 4]. When the function finishes, its variables are popped off the stack, the saved EBP is restored, and the return address is loaded into EIP so that execution continues in the caller.
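The layout described above can be made concrete with a small sketch. It is a hypothetical reconstruction built only from the names the text mentions (global_bss, global_data, sum), not the original listing. The helper run_example, the value of global_data, and the argument order sum(20, 10) are assumptions, and the offsets in the comments assume an unoptimised 32-bit x86 build that uses the cdecl calling convention.

```c
#include <stdio.h>

int global_bss;          /* uninitialised global: placed in .bss, starts as 0 */
int global_data = 5;     /* initialised global: placed in .data (value is arbitrary) */

/* Typical frame of sum() on an unoptimised 32-bit cdecl build:
 *   [ebp+12]  second argument (b)
 *   [ebp+8]   first argument (a)
 *   [ebp+4]   return address pushed by the call instruction
 *   [ebp]     saved (old) EBP
 *   [ebp-4]   local variable c
 */
int sum(int a, int b)
{
    int c = a + b;
    return c;
}

/* Hypothetical stand-in for the main() described in the text:
 * call sum, then print. With this argument order, 20 lands at
 * [ebp+8] and 10 at [ebp+12]. */
int run_example(void)
{
    int result = sum(20, 10);
    printf("sum = %d\n", result);
    return result;
}
```

Compiling such a file for 32-bit x86 without optimisation and disassembling sum in gdb is one way to compare the real offsets with the ones listed in the comment.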

BUFFER OVERFLOW

Now, buffer overflow (BOF) is explained (Source: M. Alparslan Akyıldız Uygulamalarla Siber Güvenliğe Giriş, pp. …? ). Because memory management is left to the programmer in C and its derivatives, a coding error can cause more data to be written to an allocated memory region than it can hold, and the excess overflows into adjacent memory. An overflow of this kind on the stack, which causes the program to crash or its flow to change, is called a stack-based buffer overflow vulnerability. The figure below shows symbolically how a program with a BOF vulnerability runs:

Let’s have a look at the C code below:
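As a rough stand-in for the listing, here is a minimal sketch of the kind of function this section discusses. The names registration and buffer are assumptions, and the unchecked strcpy is deliberate because it is the flaw being studied. The function is only safe to call with names shorter than 8 characters.

```c
#include <stdio.h>
#include <string.h>

/* Deliberately flawed: copies the caller's string into an 8-byte stack
 * array without checking its length. A name of 8 or more characters
 * writes past the end of buffer (the terminating '\0' counts too). */
void registration(const char *name)
{
    char buffer[8];
    strcpy(buffer, name);   /* no bounds check: the bug under study */
    printf("Registered: %s\n", buffer);
}
```

In a complete program, main would pass argv[1] to this function after checking that an argument was supplied.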

The C code takes a name from the command line, passes the address of the name to the registration function through a pointer, allocates an 8-byte array in that function, and copies the name into the array. The program is compiled as follows:

The program is then run with inputs of different lengths:

When an 8-byte input is given to the program, it runs without any visible problem. When a 16-byte input is given, the error above is raised because of the stack-based overflow.

The program is analyzed with gdb on the KALI 32-bit operating system as follows:

When the program is run and the EIP is inspected, the following is observed:

Looking at the stack, it is seen that the stack region has also been filled, as follows:

The memory-based overflow vulnerability occurred because the C program above allocates an 8-byte array and copies the input into it without checking its length. Taking advantage of this vulnerability, a reverse engineer can change the program flow and write an exploit that uses suitable shellcode.
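For contrast, the missing length check can be sketched as follows. This is one possible fix under stated assumptions, not the author's code: the function name registration_checked and the constant NAME_CAPACITY are hypothetical, and the caller is assumed to pass an array of exactly NAME_CAPACITY bytes.

```c
#include <string.h>

#define NAME_CAPACITY 8   /* same size as the array discussed in the text */

/* Copies name into out only if it fits, including the terminating '\0'.
 * Returns 0 on success and -1 if the name is too long. */
int registration_checked(const char *name, char out[NAME_CAPACITY])
{
    size_t len = strlen(name);
    if (len >= NAME_CAPACITY) {
        return -1;              /* reject instead of overflowing */
    }
    memcpy(out, name, len + 1); /* len + 1 also copies the '\0' */
    return 0;
}
```

With this check a 7-character name is accepted, while the 8-byte and 16-byte inputs are both rejected before any write happens.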

Exploit code is the general name for code written for purposes such as taking over the target operating system, escalating privileges on the target, or causing a denial of service on the target. Shellcode is a sequence of machine-language instructions used to run code on the target system. In the lab study below, in light of the information above, the program flow will be changed and execution will be directed to a different address.

Changing the Program Flow

In this lab study, the basic working mechanism of a buffer overflow is explained. Let’s check out the C program whose source code is shown below:

Looking at the code, it is seen that the never_works function is not called in the normal program flow. The program is compiled with gcc as follows. Stack protection is turned off and the preferred stack boundary is set to 2, that is, the stack is kept aligned to 2^2 = 4 bytes:

The program is analyzed with gdb as follows:

When 12 “A” characters (‘\x41’) and 4 “B” characters (‘\x42’) are given as input to the program, the following overflow occurs. The register values are as follows:

It is observed that the EIP value can be controlled after the overflow. For a better understanding of the overflow, let’s examine the following representation of the stack:
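One way to picture this frame is to model it as a C struct and let the compiler do the arithmetic. This is only a sketch: the struct frame_model and the function name are hypothetical, and a real compiler may add padding or reorder locals, so the result illustrates the idealised layout of an 8-byte buffer followed by the saved EBP and the return address.

```c
#include <stddef.h>
#include <stdint.h>

/* Idealised frame, listed from the lowest address to the highest. */
struct frame_model {
    char     buffer[8];        /* the 8-byte local array */
    uint32_t saved_ebp;        /* old EBP, 4 bytes on 32-bit x86 */
    uint32_t return_address;   /* value later loaded into EIP */
};

/* Number of input bytes written before the return address is reached. */
size_t bytes_before_return_address(void)
{
    return offsetof(struct frame_model, return_address);  /* 8 + 4 */
}
```

The function returns 12, which matches the 12 “A” bytes that come before the 4 “B” bytes in the input.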

First the return address is pushed onto the stack, then the old value of EBP is pushed, and an 8-byte field is reserved on the stack for the buffer array. The 8 bytes of the buffer and the 4 bytes of the saved EBP together make 12 bytes, so the 12 “A” bytes (‘\x41’) fill these two fields and the 4 “B” bytes (‘\x42’) that follow overwrite the return address, which is later loaded into EIP. Because the source code was available in this application, it was known that the value loaded into EIP starts 12 bytes into the input. If the source code were not available, the first step would be to feed the program inputs of different lengths and check whether it fails, in order to find out whether there is an overflow problem. For example, let’s assume that we have a file containing character strings of different lengths, as follows:
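Inputs of this kind can also be generated in code. The sketch below is a minimal helper for building a test string of a chosen length. The name make_test_input is hypothetical, and the fill character is a free choice.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap-allocated string made of `length` copies of `fill`,
 * or NULL if allocation fails. The caller must free the result. */
char *make_test_input(size_t length, char fill)
{
    char *s = malloc(length + 1);   /* +1 for the terminating '\0' */
    if (s == NULL) {
        return NULL;
    }
    memset(s, fill, length);
    s[length] = '\0';
    return s;
}
```

Calling it in a loop with lengths such as 4, 8, 12 and 16 produces the same kind of entries as the file described above.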

Let’s run the program with each of the entries in this file:

When the program terminates with an error, it is seen that there is a memory-based overflow problem. Put simply, the process of testing the program with different inputs is called FUZZING. It is the first step of exploit development. Next, it should be determined after how many bytes of data the EIP value changes. This is called offset determination, and it is the second step of exploit development. Let’s have a look at the gdb output below:

It is seen that the EIP value changes after 12 bytes of input. That is, if the address of an instruction in memory is placed right after 12 bytes of data, the program flow is directed to that address. Normally, to jump to the stack region where the shellcode is placed, the address of an instruction such as JMP ESP is searched for in the Windows environment, execution is redirected to the stack through that instruction, and the appropriate shellcode is then run on the stack. Since this application only explains the basics of redirecting a program, the program is opened with GDB and the never_works function is disassembled. Then the address of that function is placed in EIP, and the program flow is directed to a function that would otherwise never be called. Because of the little-endian byte order, the bytes of the address are written in reverse.
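When the source code is not available, the offset is often found with a non-repeating input instead of a run of identical characters, so that the four bytes that end up in EIP identify their own position. The sketch below illustrates that idea under stated assumptions: make_pattern and find_offset are hypothetical names, the "Aa0Aa1Aa2..." scheme is one common convention and not something the text prescribes, and the search string must be given in memory order (on little-endian x86, the bytes of the EIP value as gdb prints it, reversed).

```c
#include <stddef.h>
#include <string.h>

/* Fills out[0..n-1] with "Aa0Aa1Aa2..." and terminates it.
 * out must have room for n + 1 bytes. The scheme yields at most
 * 26 * 26 * 10 * 3 = 20280 characters. */
void make_pattern(char *out, size_t n)
{
    size_t i = 0;
    for (char upper = 'A'; upper <= 'Z' && i < n; upper++) {
        for (char lower = 'a'; lower <= 'z' && i < n; lower++) {
            for (char digit = '0'; digit <= '9' && i < n; digit++) {
                const char triple[3] = { upper, lower, digit };
                for (size_t k = 0; k < 3 && i < n; k++) {
                    out[i++] = triple[k];
                }
            }
        }
    }
    out[i] = '\0';
}

/* Returns the offset of the first occurrence of needle in pattern,
 * or -1 if it does not occur. */
long find_offset(const char *pattern, const char *needle)
{
    const char *hit = strstr(pattern, needle);
    return hit ? (long)(hit - pattern) : -1;
}
```

For the 12-byte offset discussed here, the four pattern bytes starting at position 12 are "Aa4A", so searching for that string returns 12.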

When the disassembled code is examined, it is seen that the function prologue is performed in the first step. After a 4-byte area is allocated on the stack, the value 0x8048530 is written to the location pointed to by the stack pointer. What is stored at this address can be seen with GDB as follows:

When puts is called with the call instruction, the string is printed on the screen. If the starting address of the function can be placed in EIP, the program flow is changed as well.

As seen in the above process, when 12 “A” characters followed by the address of the never_works function are given to the program as input, the stack overflows and the program flow is changed because that address is written into EIP.
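The byte reversal caused by the little-endian layout can be illustrated with a short sketch. The helper name pack_le32 is hypothetical, and the sample value 0x08048530 in the usage note is taken from the text purely as a convenient 32-bit number.

```c
#include <stdint.h>

/* Stores a 32-bit value in little-endian order: the least significant
 * byte goes to the lowest address, which is how x86 keeps addresses
 * in memory. */
void pack_le32(uint32_t value, unsigned char out[4])
{
    out[0] = (unsigned char)(value & 0xFF);
    out[1] = (unsigned char)((value >> 8) & 0xFF);
    out[2] = (unsigned char)((value >> 16) & 0xFF);
    out[3] = (unsigned char)((value >> 24) & 0xFF);
}
```

Packing 0x08048530 yields the bytes 0x30, 0x85, 0x04, 0x08, which is the "reversed" form that has to appear in the input after the 12 filler bytes.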

FORMAT STRING VULNERABILITY

Format strings are used to control how various variables are displayed. For example, in the simple C code int i; printf("%d", i); printf is the format function and "%d" is the format string. Format functions use the format string to convert C data types into their string representation. The format string controls the format function. The following functions are examples of format functions:

• printf
• fprintf
• sprintf
• vfprintf
• snprintf
• vsprintf
• vsnprintf

The functions mentioned above are variadic functions, that is, they accept a variable number of arguments. The arguments are expected to be on the stack. The format string given as input determines how many arguments are read from the stack. For example, each %s causes one argument to be read. Format string vulnerabilities can occur when user-supplied input is used directly as the format string. With this method, the program can be crashed, information can be disclosed by displaying memory contents, and arbitrary data can be written to memory. For a better understanding of the subject, the code below will be examined.
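The point that the format string alone decides how many arguments are read can be sketched with a small counter. This is a simplified illustration and not how any C library is implemented: the name count_conversions is hypothetical, and the function ignores details such as a `*` width, which consumes an extra argument in a real printf.

```c
#include <stddef.h>

/* Counts the conversion specifications in a format string.
 * "%%" is a literal percent sign and reads no argument. */
size_t count_conversions(const char *format)
{
    size_t count = 0;
    for (const char *p = format; *p != '\0'; p++) {
        if (*p != '%') {
            continue;
        }
        if (p[1] == '%') {      /* escaped percent: skip both characters */
            p++;
            continue;
        }
        if (p[1] == '\0') {     /* lone '%' at the end of the string */
            break;
        }
        count++;                /* one argument will be fetched for this */
    }
    return count;
}
```

For the input %s%s%s%s the count is 4, so a format function would fetch four values from the stack even if the caller passed none.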

In this lab study, how a format string works and how it is laid out on the stack is shown with a simple C program, as a theoretical and practical application of the format string vulnerability.

When the code below is examined, it is seen that two character pointers, k1 and k2, are defined to hold the addresses of two character strings, and that the program then prints the first argument entered on the command line to the screen.

The program is compiled as follows:

The argument given as input to the program is printed on the screen, as seen. When the printf function is used, the arguments passed to it are placed on the stack as follows:

At this point, the format string vulnerability makes it possible to disclose addresses, and even run code, with inputs such as %s. When the program is run and %s%s%s%s is given as input, the output is as follows:

As can be seen, the character strings pointed to by k1 and k2 are printed on the screen through the format string vulnerability. The issue can be understood better if the program is analyzed with gdb. A breakpoint is set at the main function.

After the program is run with the input %s%s%s%s using the run command, the main function is disassembled. When the memory regions pointed to by the addresses placed on the stack are inspected, the characters pointed to by the string pointers are seen:

When the stack is inspected while stepping with the s command, it can be seen that the format string has been placed on the stack:

The character addresses shown in the previous output are placed on the stack. The instructions at the first address on the stack are displayed as follows:

Since EDI holds the address at which the write will be performed, inspecting the string at the address that is loaded into EDI shows %s%s%s%s.

Looking at the stack, it is understood that the string pointed to by k1 is printed for the first %s and the string pointed to by k2 for the second %s. In other words, because of the vulnerability, the statement that should simply behave as printf(argv[1]) effectively runs as printf("%s%s", k1, k2); and prints character strings at addresses that should not be visible on the screen. The process is carried out as follows when the printf function is invoked with the call instruction:
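The usual way to avoid this behaviour is to pass user input as an argument to a fixed format string, so that it is treated as data. The sketch below shows that pattern with snprintf. The function name format_as_data is hypothetical and the example is an illustration, not the program analysed above.

```c
#include <stdio.h>

/* Writes user_text into out as plain data. Because the format string
 * is the constant "%s", any '%' characters inside user_text are copied
 * literally and no extra arguments are read from the stack.
 * Returns the result of snprintf. */
int format_as_data(char *out, size_t out_size, const char *user_text)
{
    return snprintf(out, out_size, "%s", user_text);
}
```

With this form the input %s%s%s%s is reproduced character for character. The same idea applies to printf itself: printf("%s", argv[1]) in place of printf(argv[1]).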

HEAP OVERFLOW DEMONSTRATION

malloc and realloc are C functions that allocate memory dynamically. Dynamically allocated space is taken from the heap. An overflow on the heap may cause problems such as remote code execution and crashes. The sample program below shows how an overflow occurs on the heap:

In the above code, 16 bytes are reserved on the heap for the buffer character pointer, then the first argument entered on the command line is copied into this region, and the region is released with free. When the program is compiled and run with a command-line argument longer than 16 bytes, the program fails because of an overflow in the heap region.
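A length check before the copy prevents this overflow. The sketch below is one hedged way to write it: the names copy_argument_checked and HEAP_BUFFER_SIZE are hypothetical, and the 16-byte size is taken from the description above.

```c
#include <stdlib.h>
#include <string.h>

#define HEAP_BUFFER_SIZE 16   /* size of the heap buffer described in the text */

/* Allocates a 16-byte heap buffer and copies argument into it only if
 * it fits together with its terminating '\0'. Returns the buffer, or
 * NULL if the argument is too long or allocation fails.
 * The caller must free the result. */
char *copy_argument_checked(const char *argument)
{
    size_t len = strlen(argument);
    if (len >= HEAP_BUFFER_SIZE) {
        return NULL;                    /* would overflow the chunk */
    }
    char *buffer = malloc(HEAP_BUFFER_SIZE);
    if (buffer == NULL) {
        return NULL;
    }
    memcpy(buffer, argument, len + 1);  /* len + 1 also copies the '\0' */
    return buffer;
}
```

An argument of up to 15 characters is copied, and anything longer is refused before the heap is touched.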

When the program is analyzed with gdb, the changes during the overflow are observed as follows:

The error message appears because of the heap overflow. Looking at the EIP and EDI values, it is seen that execution is inside libc at the point where the error is raised.