Memory Layout of C program
Memory Layout of C program
Created: June 7, 2021 7:49 PM
I think you guys are already familiar with the fundamentals of C programming.
Most of us already learnt what is stack, heap, memory but university was all text books.
Now,let’s study about memory layout of C program.
Although we are not going to study about Assembly, we will introduce some Assembly Instructions and debugging to understand deeply.
Don’t worry.just a few things.You will get it!
Before we begin, you should familiar with Pointer, dereference and address-of operators.
Pointer, Dereference and address-of Operators
Pointer
Pointer is defined by prepending an asterisk (*) to a variable name.
Pointer points to memory address of its referenced variable.
Address-of operator
address-of operator is defined by prepending an ampersand (&) to a variable name.
address-of operator is used in conjunction with pointer.
address-of operator contains memory address of that variable.
Dereference operator
dereference operator is defined by prepending an asterisk (*) to a variable name.
dereference will return the data found in the address the pointer is pointing to.
[ pointer* will return memory address,while dereference* will return the data found in that address ]
Study the following code and output.
If you understand the program, write a similar one to ensure your knowledge.
//ptr-addr-deref.c
#include <stdlib.h>
#include <stdio.h>
int main()
{
int int_var;
int *int_ptr;
int_var = 1337;
int_ptr = &int_var;
printf("int_var is located at %p and contains %d\n",&int_var,int_var);
printf("int_ptr is located at %p , contains %p and points to %d\n",&int_ptr,int_ptr,*int_ptr); // here *int_ptr is dereference.
}
root@zus:Desktop/memory_layout # ./ptr-addr-deref
int_var is located at 0xff97758c and contains 1337
int_ptr is located at 0xff977588 , contains 0xff97758c and points to 1337
The Memory
When a program is executed,the Operating System creates address space for that program and maps the elements of program into memory.
Then compiled program’s segments are loaded into that newly created address space.
Everything about program’s requirements are included in that address space.
Now, we need to know a little bit about how a program’s memory is laid out.
Low memory addresses (0x08000000) :
------------------------
| text segment |
------------------------
| data segment |
------------------------
| bss segment |
------------------------
| heap segment | ↓ heap grows downward (to higher memory addresses)
------------------------
| unused memory |
------------------------
| stack segment | ↑ stack grows upward (to lower memory addresses)
------------------------
High memory addresses (0xbfffffff) :
Memory Segmentation
The compiled program’s memory is divided into 5 segments.
- .text
- .data
- .bss
- .heap
- .stack
text segment
.text segment is used to locate assembly instructions.
.text segment is read-only segment.
.text segment is fixed-size segment.
data segment
.data segment is used to store initialized global variables and initialized static variables.
.data segment is writable segment.
.data segment is fixed-size segment.
bss segment
.bss segment is used to store uninitialized global variables and uninitialized static variables.
.bss segment is writable segment.
.bss segment is fixed-size segment.
heap segment
The heap is First-In-First-Out (FIFO) data structure.
.heap segment can be directly controlled by a programmer.
.heap segment is not fixed-size segment.
.heap segment grows downward (to higher memory addresses).
stack segment
The stack is Last-In-First-Out (LIFO) also called First-In-Last-Out (FILO) data structure.
.stack segment is used to store local variables,function arguments and so on.
.stack segment is not fixed-size segment.
Study the following code and take a look at output memory addresses.
//memory_segments.c
//demo of memory segments
Memory segments
#include <stdio.h>
#include <stdlib.h>
int global_var; //this variable will store in bss segment because this is a global variable.
int global_initialized_var = 1337; //this variable will store in data segment because this is a global initialized variable.
void test_function()
{
int stack_var; //this variable will store in stack because this is a local function.
printf("[.stack segment]-----------------------------\n\n");
printf("test_function's stack_var is located at %p\n",&stack_var);
printf("\n");
}
int main()
{
int stack_var; //this variable will store in stack because this is a local function.
static int static_var; //this variable will store in bss segment because this is a static variable.
static int static_initialized_var = 100; //this variable will store in data segment because this is a static initialized variable.
int *heap_var_ptr;
heap_var_ptr = (int *)malloc(4);
//remember that we can control the heap segment and allocate?
printf("\n[.data segment]------------------------------\n\n");
printf("global_initialized_var is at %p\n",&global_initialized_var);
printf("static_initialized_var is at %p\n",&static_initialized_var);
printf("\n");
printf("[.bss segment]-------------------------------\n\n");
printf("global_var is at %p\n",&global_var);
printf("static_var is at %p\n",&static_var);
printf("\n");
printf("[.heap segment]------------------------------\n\n");
printf("heap_var is at %p\n",heap_var_ptr);
printf("\n");
test_function();
}
root@zus:Desktop/memory_layout # ./memory_segments
[.data segment]------------------------------
global_initialized_var is at 0x5659f028
static_initialized_var is at 0x5659f02c
[.bss segment]-------------------------------
global_var is at 0x5659f034
static_var is at 0x5659f038
[.heap segment]------------------------------
heap_var is at 0x56fa61a0
[.stack segment]-----------------------------
test_function's stack_var is located at 0xff967e9c
As we learnt,lower memory addresses are located at the higher places.
Check the output’s content.
Variables inside .data segment have the lower addresses than the variables’ addresses which are located within other segments.
Now, you understand how memory laid out when a program is executed. Keep reading.
Assembly
Assembly language is a low-level programming language for a computer.
We will use 32-bit Intel processor.
The x86 processor ( 32-bit processor ) has several registers, which are like internal variables for the processor.
Registers are responsible for manipulations that allow modern computers to function, and can be manipulated with assembly instructions.
Registers can be grouped into four categories.
- General purpose
- Segment
- Control
The most critical registers are the General registers.In these, there are further subdivisions into Data registers, Pointer registers, and Index registers.
General Purpose Registers
-
Data registers
EAX ( Accumulator )
EBX ( Base )
ECX ( Counter )
EDX ( Data )
-
Pointer registers
ESP ( Extended Stack Pointer )
EBP ( Extended Base Pointer )
EIP ( Extended Instruction Pointer )
-
Index registers
ESI **( Extended Source Index )
EDI **( Extended Destination Index )
If you are familiar with General Purpose Registers, you can examine memory addresses.
So,let’s skip some parts of assembly.
The assembly instructions in Intel syntax generally follow this style:
operation ,
The destination and source values will either be a register, a memory address, or a value.
operations :
mov operation will move a value from source to destination.
sub operation will subtract.
inc operation will increase.
ret operation will return address**.**
cmp operation will compare values.
jle operation, jump-if-less-than-or-equal, referring to the result of the previous comparison.
jmp operation will jump to another value.
For example :
804838b: 83 7d fc 09 cmp DWORD PTR [ebp-4],0x9
804838f: 7e 02 jle 8048393 <main+0x1f>
8048391: eb 13 jmp 80483a6 <main+0x32>
This assembly instruction will compare 4-byte value located at the address ebp-4 with number 9.
if that value is less than or equal to ( distinction ),it will jump to address 8048393,otherwise,jump to 80483a6.
P.S There left many operations.You should read yourself.
Debugger
It’s better to see for yourself.Let’s use a debugger.
Debuggers are used by programmers to step through compiled programs, examine program memory, and view processor registers.
Debuggers can view the execution from all angles, pause it, and change anything along the way.
We will use GNU debugger,GDB.
We would like to use intel syntax.
root@zus:Desktop/memory_layout # echo "set disassembly-flavor intel" > ~/.gdbinit
root@zus:Desktop/memory_layout # cat ~/.gdbinit
set disassembly-flavor intel
root@zus:Desktop/memory_layout #
Now,GDB is configured to use Intel syntax.
void stack_function(int a, int b, int c, int d)
{
}
int main()
{
stack_function(1, 2, 3, 4);
}
This simple program first declares a function named stack_function which expects 4 integer variables as arguments,then main function called that function.
Remember .stack segment stores function arguments.
Now, compile this program with GCC compiler.
The -g
flag can be used by the GCC compiler to include extra debugging information ,which will give GDB access to the source code.
To see the source code on GDB,use list
cmd.
root@zus:Desktop/memory_layout # gdb -q stack
Reading symbols from stack...
(gdb) list
1 void stack_function(int a, int b, int c, int d)
2 {
3
4 }
5
6 int main()
7 {
8 stack_function(1, 2, 3,4);
9 }
(gdb)
To disassemble in GDB, use disass
or disassemble
command.
Let’s disassemble the main function.
(gdb) disass main
Dump of assembler code for function main:
0x00001199 <+0>: push ebp
0x0000119a <+1>: mov ebp,esp
0x0000119c <+3>: call 0x11bd <__x86.get_pc_thunk.ax>
0x000011a1 <+8>: add eax,0x2e5f
0x000011a6 <+13>: push 0x4
0x000011a8 <+15>: push 0x3
0x000011aa <+17>: push 0x2
0x000011ac <+19>: push 0x1
0x000011ae <+21>: call 0x1189 <stack_function>
0x000011b3 <+26>: add esp,0x10
0x000011b6 <+29>: mov eax,0x0
0x000011bb <+34>: leave
0x000011bc <+35>: ret
End of assembler dump.
(gdb)
This is the dump of assembler code for main function.
We are not going to explain how to use debugger in depth.
For more information, read the official documentation.
Stack
Stack is Last-In-First-Out (LIFO) data structure.
The stack starts with a high address and grows towards lower memory addresses as values are added.
The Base Pointer points to the beginning (base) of the stack.
The Stack Pointer points to the top of the stack.
Data is written to the stack segment by using PUSH
instruction.
Data is removed from the stack segment by using POP
instruction.
Let’s see it.
Run the following program in GDB and learn how they really work.
//stack.c
void stack_function(int a, int b, int c, int d)
{
}
int main()
{
stack_function(1, 2, 3, 4);
}
root@zus:Desktop/memory_layout # gdb -q ./stack
Reading symbols from ./stack...
(gdb) disass main
Dump of assembler code for function main:
0x00001199 <+0>: push ebp
0x0000119a <+1>: mov ebp,esp
0x0000119c <+3>: call 0x11bd <__x86.get_pc_thunk.ax>
0x000011a1 <+8>: add eax,0x2e5f
0x000011a6 <+13>: push 0x4
0x000011a8 <+15>: push 0x3
0x000011aa <+17>: push 0x2
0x000011ac <+19>: push 0x1
0x000011ae <+21>: call 0x1189 <stack_function>
0x000011b3 <+26>: add esp,0x10
0x000011b6 <+29>: mov eax,0x0
0x000011bb <+34>: leave
0x000011bc <+35>: ret
End of assembler dump.
(gdb)
Since we call argument 1,2,3,4 in sequence, integer 1 (0x1) is pushed first, becomes the last item and the stack grows toward lower memory addresses.
Remember .stack segment is Last-In-First-Out (LIFO) and is used to store local variables,function arguments.
Heap
The heap is First-In-First-Out (FIFO) data structure.
The heap starts with a low address and grows downward to high memory addresses as values are added.
Blocks of memory in .heap segment can be allocated and used for whatever we need.
Allocating memory on the heap is done using the malloc()
function.
This function accepts size as its only argument and reserves that much space in .heap segment, return the address to the start of this memory as void pointer.
malloc function returns void pointer which points to newly created memory address ,so we need to typecast to appropriate data type.
The corresponding deallocation function is free()
function.
This function accepts a pointer as its only argument and free that memory space on .heap segment.
These functions are demonstrated below.
//heap.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
int mem_size = 20;
char *char_ptr;
char_ptr = (char *)malloc(mem_size); // typecast malloc's return address
printf("\n[+] Allocating %d bytes on heap.\n", mem_size);
strcpy(char_ptr,"Hello World!");
printf(" char_ptr points to ( %p ) ~> %s\n",char_ptr,char_ptr);
printf("\n[-] Freeing char_ptr's memory\n");
free(char_ptr);
printf(" char_ptr points to ( %p ) ~> %s\n\n",char_ptr,char_ptr);
return 0;
}
root@zus:Desktop/memory_layout # ./heap
[+] Allocating 20 bytes on heap.
char_ptr points to ( 0x5686c1a0 ) ~> Hello World!
[-] Freeing char_ptr's memory
char_ptr points to ( 0x5686c1a0 ) ~>
root@zus:Desktop/memory_layout #
Here we add some data to the heap segment using malloc()
function and then remove using free()
function.
The following part will study how the heap segment store data in memory.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main()
{
printf("\nHeap is FIFO [ First-In-First-Out ] data structure.\nHeap grows downward (towards higher addresses.)\n\nLow memory address\n\n");
int *int_ptr,*int_ptr1,*int_ptr2;
int memory_size;
memory_size = 10;
int_ptr = (int *)malloc(memory_size);
*int_ptr = 127;
printf("[ Allocated %d bytes of memory on heap ] : int_ptr (0x%x) ~> %d\n",memory_size,int_ptr,*int_ptr );
int_ptr1 = (int *)malloc(memory_size);
*int_ptr1 = 10;
printf("[ Allocated %d bytes of memory on heap ] : int_ptr1 (0x%x) ~> %d\n",memory_size,int_ptr1,*int_ptr1 );
int_ptr2 = (int *)malloc(memory_size);
*int_ptr2 = 1337;
printf("[ Allocated %d bytes of memory on heap ] : int_ptr2 (0x%x) ~> %d\n",memory_size,int_ptr2,*int_ptr2 );
printf("\nHigh memory address\n\n");
}
root@zus:Desktop/memory_layout # ./heap_FIFO
Heap is FIFO [ First-In-First-Out ] data structure.
Heap grows downward (towards higher addresses.)
Low memory address
[ Allocated 10 bytes of memory on heap ] : int_ptr (0x568055b0) ~> 127
[ Allocated 10 bytes of memory on heap ] : int_ptr1 (0x568055c0) ~> 10
[ Allocated 10 bytes of memory on heap ] : int_ptr2 (0x568055d0) ~> 1337
High memory address
root@zus:Desktop/memory_layout #
Take a look at those addresses (Highlight in red) on this output.
Every time we call malloc()
function, heap grows toward higher addresses.
These are all I understand.
Try Harder.