LLVM is a statically typed intermediate representation and an associated toolchain for manipulating, optimizing and converting this intermediate form into native code. LLVM code comes in two flavors, a binary bitcode format () and assembly (.ll). The command line tools llvm-dis and llvm-as can be used to convert between the two forms. We’ll mostly be working with the human readable LLVM assembly and will just refer to it casually as IR and reserve the word assembly to mean the native assembly that is the result of compilation. An important note is that the for LLVM bitcode starts with the magic two byte sequence ( 0x42 0x43 ) or “BC”.

    An LLVM module consists of a sequence of toplevel mutually scoped definitions of functions, globals, type declarations, and external declarations.

    A LLVM function consists of a sequence of basic blocks containing a sequence of instructions and assignment to local values. During compilation basic blocks will roughly correspond to labels in the native assembly output.

    First class types in LLVM align very closely with machine types. Alignment and platform specific sizes are detached from the type specification in the data layout for a module.

    While LLVM is normally generated procedurally we can also write it by hand. For example consider the following minimal LLVM IR example.

    What makes LLVM so compelling is it lets us write our assembly-like IR as if we had an infinite number of CPU registers and abstracts away the register allocation and instruction selection. LLVM IR also has the advantage of being mostly platform independent and retargatable, although there are some details about calling conventions, vectors, and pointer sizes which make it not entirely independent.

    As an integral part of Clang, LLVM is very well suited for compiling C-like languages, but it is nonetheless a very adequate toolchain for compiling both imperative and functional languages. Some notable languages using LLVM include:

    GHC has a LLVM compilation path that is enabled with the -fllvm flag. The library can be used to view the IR compilation artifacts.