#+title: The C++ Compiler
#+date: 2018-11-28
#+description: Learn basics about the C++ compilation process.
#+filetags: :dev:

* A Brief Introduction
[[https://en.wikipedia.org/wiki/C%2B%2B][C++]] is a general-purpose
programming language with object-oriented, generic, and functional
features in addition to facilities for low-level memory manipulation.

C++ source code, such as the snippet below, must be compiled before it
can be executed. There are many steps and intricacies to the compilation
process, and this post was a personal exercise to learn and remember as
much of it as I could.

#+begin_src cpp
#include <iostream>

int main()
{
    std::cout << "Hello, world!\n";
}
#+end_src

** Compilation Process
*** An Overview
Compiling C++ projects is a frustrating task most days. Seemingly
nonexistent errors that keep your program from compiling can be
annoying (especially since you know you wrote it perfectly the first
time, right?).

I'm learning more and more about C++ these days and decided to write
this concept down so that I can cement it even further in my own head.
C++ is not the only compiled language, of course; check out
[[https://en.wikipedia.org/wiki/Compiled_language][the Wikipedia entry
for compiled languages]] for more examples.

I'll start with a wonderful, graphical way to conceptualize the C++
compiler. View
[[https://web.archive.org/web/20190419035048/http://faculty.cs.niu.edu/~mcmahon/CS241/Notes/compile.html][The
C++ Compilation Process]] by Kurt McMahon, an NIU professor, to see the
graphic and an explanation. The goal of the compilation process is to
take the C++ code and produce an executable file or a shared (also
called dynamic) library.

** Compilation Phases
Let's break down the compilation process. There are four major steps to
compiling C++ code.

*** Step 1: Preprocessing
The first step is to expand the source code file so that it contains
everything it depends on. The C++ preprocessor handles the directives
that begin with =#=. Each =#include= directive, such as
=#include <iostream>=, is replaced with the full contents of the named
header file. Now, what does that mean? The previous example includes the
=iostream= header, which tells the compiler that you want to use part of
the C++ standard library, a collection of classes and functions written
in the core language. This specific header declares the input/output
stream facilities, such as =std::cout=. After all this, you'll end up
with a temporary file that contains the expanded source code.

In the example of the C++ code above, the contents of the =iostream=
header (and of every header it includes in turn) would be pasted into
the expanded code.
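
The preprocessor does more than paste in headers. Here is a small sketch
of two other common jobs: =#define= performs plain text substitution, and
=#ifdef=/=#else=/=#endif= decide which lines ever reach the compiler. The
macro names =GREETING= and =SQUARE= are hypothetical, made up for
illustration. With GCC or Clang, the =-E= flag stops after preprocessing
and prints the expanded code. Expect a very long listing once
=<iostream>= is pulled in.

#+begin_src cpp
#include <iostream>

// Hypothetical object-like macro: every occurrence of GREETING is replaced
// with this string literal before the compiler proper sees the code.
#define GREETING "Hello, world!\n"

// Hypothetical function-like macro. Expansion is purely textual, so the
// parentheses matter: without them, SQUARE(1 + 2) would expand to
// 1 + 2 * 1 + 2, which evaluates to 5 instead of 9.
#define SQUARE(x) ((x) * (x))

int main()
{
#ifdef GREETING
    // Kept: GREETING is defined, so this line survives preprocessing.
    std::cout << GREETING;
#else
    // Removed by the preprocessor; the compiler never sees this line.
    std::cout << "No greeting defined\n";
#endif
    // After preprocessing: std::cout << ((1 + 2) * (1 + 2)) << '\n';
    std::cout << SQUARE(1 + 2) << '\n';
}
#+end_src

Compiled and run, this prints =Hello, world!= followed by =9=.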

*** Step 2: Compilation
After the code is expanded, the compiler comes into play. The compiler
takes the expanded C++ code and translates it into assembly language for
the target platform. You can see this in action if you head over to the
[[https://godbolt.org][Godbolt Compiler Explorer]], which shows C++
being converted into assembly as you type.

For example, the =Hello, world!= code snippet above compiles into
something like the following x86-64 assembly. The exact output depends
on the compiler, its version, and the optimization flags:

#+begin_src asm
.LC0:
        .string "Hello, world!\n"
main:
        push    rbp
        mov     rbp, rsp
        mov     esi, OFFSET FLAT:.LC0
        mov     edi, OFFSET FLAT:_ZSt4cout
        call    std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*)
        mov     eax, 0
        pop     rbp
        ret
__static_initialization_and_destruction_0(int, int):
        push    rbp
        mov     rbp, rsp
        sub     rsp, 16
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        cmp     DWORD PTR [rbp-4], 1
        jne     .L5
        cmp     DWORD PTR [rbp-8], 65535
        jne     .L5
        mov     edi, OFFSET FLAT:_ZStL8__ioinit
        call    std::ios_base::Init::Init() [complete object constructor]
        mov     edx, OFFSET FLAT:__dso_handle
        mov     esi, OFFSET FLAT:_ZStL8__ioinit
        mov     edi, OFFSET FLAT:_ZNSt8ios_base4InitD1Ev
        call    __cxa_atexit
.L5:
        nop
        leave
        ret
_GLOBAL__sub_I_main:
        push    rbp
        mov     rbp, rsp
        mov     esi, 65535
        mov     edi, 1
        call    __static_initialization_and_destruction_0(int, int)
        pop     rbp
        ret
#+end_src
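
The listing contains more than =main=. The functions
=_GLOBAL__sub_I_main= and =__static_initialization_and_destruction_0=
run before =main=. They construct a hidden =std::ios_base::Init= object
(=_ZStL8__ioinit=) that sets up the standard streams such as =std::cout=,
and =__cxa_atexit= registers that object's destructor to run when the
program exits. The compiler generates the same kind of startup code for
any namespace-scope object with a non-trivial constructor. The following
minimal sketch shows this. =Announcer= and =announcer_ran= are
hypothetical names, and the generated function names mentioned in the
comments are GCC-specific.

#+begin_src cpp
#include <iostream>

// Hypothetical flag. It is constant-initialized to false before any
// constructor runs, so main() can tell whether the constructor has run.
bool announcer_ran = false;

// A hypothetical type whose constructor has a visible side effect.
struct Announcer {
    Announcer()
    {
        announcer_ran = true;
        std::cout << "Announcer constructed\n";
    }
};

// A namespace-scope object with a non-trivial constructor. GCC emits a
// static initialization function for this translation unit (named along
// the lines of _GLOBAL__sub_I_...) that runs this constructor at startup.
Announcer announcer;

int main()
{
    std::cout << "Inside main, announcer_ran = " << std::boolalpha
              << announcer_ran << '\n';
}
#+end_src

With GCC and Clang, =Announcer constructed= is printed before the line
from =main=, and that line reports =announcer_ran = true=.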

*** Step 3: Assembly
Third, the assembly code generated by the compiler is assembled into
object code for the platform. Essentially, this is when the assembler
translates the assembly code into machine code in a binary format,
stored in an object file (typically with a =.o= or =.obj= extension).
After researching this online, I found that most compilers let you stop
at this step; GCC and Clang, for example, do so with the =-c= flag. This
is useful for compiling each source code file separately: if a single
file changes later, only that file needs to be recompiled before the
program is linked again.
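
Separate compilation works because a translation unit only needs a
function's /declaration/ to compile a call to it; the /definition/ can
live in a different object file. The sketch below squeezes a small
project into one runnable file. The comments mark which part would
normally go into the hypothetical files =add.h=, =main.cpp=, and
=add.cpp=, and the function =add= is likewise made up for illustration.

#+begin_src cpp
#include <iostream>

// --- Would normally live in a hypothetical header, add.h ---
// A declaration only: it gives the compiler the signature, so main() can
// be compiled without seeing the function body.
int add(int a, int b);

// --- Would normally live in main.cpp ---
int main()
{
    // The compiler emits a call to the symbol for add() without knowing
    // its address yet; resolving it is left to the linker.
    std::cout << add(2, 3) << '\n';
}

// --- Would normally live in a separate add.cpp ---
// Compiled on its own (for example, g++ -c add.cpp -o add.o), this
// definition ends up in its own object file. Editing it only requires
// rebuilding add.o and relinking; main.o can be reused as-is.
int add(int a, int b)
{
    return a + b;
}
#+end_src

Running the single-file version prints =5=.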

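*** Step 4: Linking
Finally, the linker combines the object file generated by the assembler
with the object files for any library functions it uses, producing an
executable file or a shared (dynamic) library. Along the way, it
replaces each reference to an undefined symbol, such as =std::cout= in
our example, with the correct address of that symbol's definition. For
symbols that come from a shared library, the final address is filled in
when the program is loaded or run.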
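
The symbols the linker matches up are the names you saw in the assembly
listing, such as =_ZSt4cout= for =std::cout=. C++ compilers encode
namespaces and parameter types into these names, a scheme called /name
mangling/, so that overloaded functions get distinct symbols. The sketch
below uses the hypothetical functions =scale= and =c_scale=. With GCC or
Clang you could compile it with =-c= and list the object file's symbols
with =nm=. Adding =-C= demangles them back into readable C++ names.

#+begin_src cpp
#include <iostream>

// Two overloads of a hypothetical function. Under the Itanium C++ ABI used
// by GCC and Clang on Linux, they become the distinct symbols _Z5scalei
// (scale(int)) and _Z5scaled (scale(double)), so the linker can tell them
// apart even though both are called "scale" in the source.
int scale(int x) { return x * 2; }
double scale(double x) { return x * 2.5; }

// extern "C" turns mangling off: the symbol is simply c_scale, which is
// what lets C code link against this function.
extern "C" int c_scale(int x) { return x * 3; }

int main()
{
    // Each call is a reference to one of the symbols above. Had the
    // definitions lived in another object file, the linker would resolve
    // these references while producing the executable.
    std::cout << scale(2) << ' ' << scale(2.0) << ' ' << c_scale(2) << '\n';
}
#+end_src

Running it prints =4 5 6=.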