The article shows the concepts of memory layout, malloc, pointer, array, 2D array, and void* in C language.
Memory layout
A program consists of the text segment and the data segment.
- The text segment includes executable instructions.
- The data segment is more complex than the text segment.
Fig.1 below is the classic memory layout of a program.
- system: stores command-line arguments.
- text: contains executable instructions.
- data, bss: global variable and static variable. The former stores initialized data, and the latter stores the uninitialized data.
- stack: local variable in function. All local variables in a function appear in memory only when the function is called.
- heap: provide a space for you to dynamic memory allocation. This space is managed by the OS. You can allocate space and release through OS. We usually use
malloc()
to allocate space without initialization, however, you can usecalloc()
to allocate zero-filled space.
1D Array
Before introducing how to use malloc(), we need to review a basic array in C. In fact, an array is a fixed block of memories. For example, we can declare an int array {10, 20, 30} before runtime.
sizeof(int): 4
list[0]: 000000000061FE14
list[1]: 000000000061FE18
list[2]: 000000000061FE1C
Observe that list+1
does not mean list+1byte
! Instead, list+1
means list+sizeof(int)
i.e., list+4bytes
.
The decision of how many bytes C language offsets is the type of variable.
2D Array
How about a 2D array?
sizeof(int): 4
list[0][0]: 000000000061FE00 list[0][1]:000000000061FE04
list[1][0]: 000000000061FE08 list[1][1]:000000000061FE0C
list[2][0]: 000000000061FE10 list[2][1]:000000000061FE14
A 2D array is more complex than a 1D array.
&list[2][1] = *(list+2)+1list = 0x61FE00
list+2 = 0x61FE00+0x10 = 0x61FE10 (base+2*sizeof(int)*2)
*(list+2)+1 = 0x61FE10+0x04 = 0x61FE14 (base+1*sizeof(int))
2 does not mean 2 bytes, instead, 2 means 2* sizeof(int)*2
i.e., 16 (0x10).
It is very important that the type of variable affects the offset of a pointer.
Malloc() & Free()
It is common that we cannot declare an array in bss, data, or stack segment without knowing the length of the list before runtime. Nevertheless, we can use a system call to request our operating system (OS) to give us a contiguous space in the heap segment at runtime.
malloc() returns the first pointer of the contiguous spaces. We can release the same space by calling free(). Notice that free() releases the same spaces you allocate. That is, if you allocate 64 bytes at 0x10, you can call free(0x10) to release 64 bytes — you cannot call free(0x14). There is no function in the standard C library that is able to return the number of bytes you allocate by given the first pointer of the spaces you allocated. However, you shouldn’t worry about that free() does know or does not know how many bytes you allocate.
- malloc(b) allocates b-byte contiguous memory by requesting OS (if need) and returns the first address of the space you allocated. The return type is void* because it is hardly necessary for the type of pointer.
- If there is not sufficient space, malloc() returns NULL. Thus, you must check that the pointer that malloc() returned is non-null.
- free(p) deallocates memories by given the first pointer of contiguous memory.
void*
void*
is a type that can store any pointer. Well, why do we need int*
, char*
, and other pointer types? Wanting to fetch the value to which the pointer points, we do not know how many bytes we should fetch, for example, int
needs to read 4 bytes while char
needs to read only 1 byte.
In pure C language, void* can automatically convert to any type of pointers without explicitly converting. However, in C++, you must convert the type explicitly — but, why don’t you use new?
int* p = (int*)malloc(sizeof(int)); // ok
int* p = malloc(sizeof(int)); // ok and concise
You can not directly fetch the value that void* points to because the compiler does not know how many bytes it should fetch.
int num = 10;
void* p = malloc(sizeof(int));
*p = 10; // ERROR!!
Dynamically allocate the 1D array
1D array is a contiguous memory. For example, if we want to make an integer list that has 5 elements, we can allocate 5*sizeof(int) spaces.
Dynamically allocate the 2D array
How to allocate a 2D array with 3 rows and 5 columns?
We can allocate a list that has 3 rows, i.e., a list can store 3 pointers of integer. Each one of the integer pointers (Fig.3 left) points to a 1d array(Fig. 3 right).
As for release, how do free those spaces? We should free 4 pointers separately.
1. the int** that points to the list
2. the int* that points to list[0]
3. the int* that points to list[1]
4. the int* that points to list[2]
Notice that malloc() and free() may interact with OS by system call which spends more time and reduce performance.
When you fetch array2D[1][3], the program will fetch array2D[1] first, i.e., *(array2D+1)
, and then fetch array2D[1][3] *(*(array2D+1)+3)
Observe that if we want to make a 3*5 2D array, we not only need 15 spaces to store integers but need additional 3 spaces in order to store the pointer of integers. It is caused by our program that unknown what the size of one column is (except for static declaration)
static declaration:
dynamic declaration:
The C compiler sees list1 as int[4].
The C compiler sees list2 as an int pointer.
Dynamically allocate the contiguous 2D array
The previous section needs to call malloc() 4 times when allocating and needs to call free() 4 times when deallocating. Can we just allocate by calling single malloc() and deallocate by calling single free() in order to improve performance?
We can allocate all spaces we need at once. If we need a 3*5 2D array, we at least need sizeof(int)*3*5 bytes to store all elements. Besides, we need additional spaces to store integer pointers that point to the first element of rows, called overhead in the below example code. After allocating, we make the first 8 bytes point to list[0][0], and then make the second 8 bytes point to list[1][0], …
Assume that int** list = 0x00
. When we fetch list[1][3], the program:
- fetch
list+1
where 1 is 1*sizeof(int*) => 8bytes
i.e., fetch 0x00+0x08 = 0x08
0x08 stores the pointer to the list[1] i.e., 0x2C - list[1] = *list+1 = 0x2C (type: int*)
- jump to
*(list+1)+3
=0x2C+3
where 3 is 3*sizeof(int) => 12bytes
i.e., fetch 0x2C+0x0C = 0x38
0x38 stores the value of integer.
&list[1][3] = *(list+1)+3list = 0x00
list+1 = 0x00+0x08 = 0x08 (base+1*sizeof(int))
*(list+1) = 0x2C
*(list+1)+3 = 0x2C+0x0C = 0x38
Conclusion
- The type of variable affects the offset of a pointer.
- malloc() and free() are symmetric. You can only call free() as many times as you call malloc(). In addition, you can only free() the space which is allocated at runtime. i.e., you can NOT free the space in the bss segment, data segment, or stack segment.
- malloc() is a kind of system call so you should call them as little as possible.
void*
is just a type that can store a pointer. If you want to fetch the value pointed to by avoid*
, please convert the type first.- See full code at: https://github.com/ksw2000/Medium/tree/main/how-to-use-malloc-and-free-in-C