shithub: mc

Info • Files • Log • Branches
ref: 0c14bdce7f2c20becd103d0c8d0641a038b5d83b
dir: /doc/compiler.txt/
                Structure of the Myrddin Compiler
                            Aug 2012
                          Ori Bernstein

TABLE OF CONTENTS:

    1. OVERVIEW
    2. PARSING
        2.1. Lexing
        2.2. AST Creation
        2.3. Type checking
        2.4. Generic Specialization
        2.5. Serialization
        2.6. Usefiles
    3. FLATTENING
        3.1. Flattening Conditionals
        3.2. Flattening Complex Expressions
        3.3. Building the Control Flow Graph
    4. OPTIMIZATION
        4.1. Constant Folding
    5. CODE GENERATION
        5.1. Instruction Selection
        5.2. Register Allocation
    6. TUTORIAL: ADDING A STATEMENT
        6.1. Stubbing in the node types
        6.2. Parsing
        6.3. Flattening
        6.4. Optimization
        6.5. Instruction Selection

1. OVERVIEW:

    The Myrddin compiler suite consists of a set of binaries, written in C,
    which translate Myrddin source code to the assembly most appropriate for
    the target platform, and subsequently invoke the native assembler on it.
    The linker is not invoked by the compiler, and the final output is an
    object file for the target platform.

    The compilers are named with a single character for the target platform,
    with a single character for the language being compiled. A table of the
    compilers and their names is below:

        Compiler        Platform
        -------------------------
        6m              x86-64


    The compilation is divided into a small number of phases. The first phase
    is parsing. The first phase is parsing, where the source code is first
    tokenized, parsed, and semantically checked. The second phase is the
    machine dependent tree flattening. In this phase, the tree is decomposed
    function by function into simple operations that are relatively close to
    the machine. Sizes are fixed, and all loops, if statements, etc are
    replaced with gotos. The next phase is a machine independent optimizer,
    which currenty does nothing other than simply folding trees. In the final
    phase, the instructions are selected and the registers are allocated.

    So, to recap, the phases are as follows:

        parse           Tokenize, parse and analyze the source.
        flatten         Rewrite the complex nodes into simpe ones
        opt             Optimize the flattened source trees
        gen             Generate the assembly code


2. PARSING:

    This phase takes in a source file, and outputs a tree that is guaranteed
    to be valid.

    2.1. Lexing:
        
        Lexing occurs in parse/tok.c. Because we desire to use this lexer from
        within yacc, the entry point to this code is in 'yylex()'. As required
        by yacc, 'yylex()' returns an integer defining the token type, and
        sets the 'tok' member of yylval to the token that was taken from the
        input stream. In addition, to allow for better error messages, the
        global variable 'curtok' is set to the value of 'yylval.tok'. This
        allows yyerror to print the last token that was seen.

        The tokens that are allowable are generated by Yacc from the '%token'
        definiitions in parse/gram.y, and are placed into the file
        'parse/gram.h'. The lexer and parser code is the only code that
        depends on these token constants.

        The lexer is initalized through 'tokinit(char *file)'. This function
        will open the file passed in, read all the data from it in one go
        and set up the internal data for the tokenizer. The tokenizing is then
        done while the whole file is in memory, which means that this code
        will work poorly on files that are larger than the address space
        available to the compiler. If this is a problem, you deserve all the
        pain that is caused.

        The file data is stored in the three global variables 'fidx', 'fbuf',
        and 'fbufsz'. The actual tokenization happens through 'toknext()' and
        its callees, which operate on these data structures character by
        character, matching the values read, and shoving them into the 'Tok'
        data structure.

    2.2. AST Creation:

        The parser used is a traditional Yacc based parser. It is generated
        from the source in parse/gram.y. The starting production is 'file',
        which fills in a global 'file' tree node. This 'file' tree node must
        be initialized before yyparse() is called.


    2.3. Type Checking:

        Type checking is done through unification of types. It's implemented
        in parse/infer.c. It proceeds through a simple unification algorithm,
        which is documented in lang.txt. As a result, only the internal
        details of this algorithm will be discussed here.

        The first step done is loading and resolving use files. This is
        deferred to the type checking phase for two reasons. First, we
        do not want to force tools to have all dependencies compiled if they
        use this parser, even though type full type checking is impossible
        until all usefiles are loaded. And second, this is when the
        information is actually needed.

        Next, the types declared in the package section are merged with the
        exported types, allowing us to start off with our type information as
        complete as possible, and making sure that the types of globals
        actually match up with the exported types.
        
        The next step is the actual type inference. We do a bottom up walk of
        the tree, unifying types as we go. There are subtleties with the
        member operator, however. Because the '.' operator is used for both
        member lookups and namespace lookups, before we descend into a node
        that has operator Omemb, we need to check if it's a namespaced name,
        or an actual member reference. If it is a namespaced name, we replace
        the expression with an Ovar expression. This check happens in the 
        'checkns()' function. Second, because we need to know the LHS of a
        member expression before we can check if the RHS is valid, and we
        are not guaranteed to know this at the first time that we see it, the
        expression is assumed to be valid, and this asumption is verified in
        a post-processing pass. Casts are validated in a deferred manner
        similarly.

        Generic variables are added to a list of generic callsites to
        specialize when they are seen in as a leaf of an Ovar node.

        The type inference, to this point, has only built up a mapping
        of types. So, for example, if we were to have the inferred types
        for the following set of statements:

            var a
            var b
            var c
            a = b
            c = b + 1

        We would have the mappings:

            $t0 -> $t1
            $t1 -> $t2
            $t2 -> int

        So, in the 'typesub()' function, we iterate over the entire tree,
        replacing every instance of a non-concrete type with the final
        mapped type. If a type does not map to a fully concrete type,
        this is where we error.

        FIXME: DESCRIBE HOW YOU FIXED GENERICS ONCE YOU FIX GENERICS.

    2.4. Generic Specialization:

        After type inference (well, technially, as the final step of it),
        we walk through the list of callsites that need instantiations
        of generics, and create a specialized generic instance for each of
        them. This specialization is done, unsurprisingly, in specialize.c,
        by the simple algorithm of cloning the entire tree that needs to
        be specialized, and walking over all nodes substituting the types
        that are replacing the type parameters.

    2.5. Serialization:

        Trees of all sorts can be serialized and deserialized from files,
        as long as they are fully typed. Trees containing type variables (ie,
        uninferred types) cannot be serialized, as type variables cannot be
        deserialized meaningfully.

        The format for this is only documented in the source, and is a
        straighforward dump of the trees to memory. It is constantly shifting,
        and will not reliably work between compiler versions.

    2.6. Usefiles:

        Usefiles are more or less files that consist of a single character tag
        that tells us what type of tree to deserialize.  Because serialized
        trees are compiler version dependant, so are usefiles.

3. FLATTENING:

    This phase is invoked repeatedly on each top level declaration that we
    want to generate code for. There is a good chance that this flattening
    phase should be made machine independent, and passed as a parameter
    a machine description describing known integer and pointer sizes, among
    other machine attributes. However, for now, it is machine dependent,
    and lives in 6/simp.c, implemented through the callees of simpnode(),
    such as simpif(), simploop(), and so on.

    The goal of flattening a tree is to take semantically involved constructs
    such as looping, and simplify things into something that is easy to
    generate code for, as well as something that is easier to analyze for
    optimization.

    3.1. Flattening Conditionals:

        All if statements, loops, and other complex constructs are simplified
        to jumps and conditional jumps. Loops are generally simplified from
        something that would look like this:

            loop
                init
                cond
                inc
                body

        To something that would look like this:

                init
                jmp cond
            .loop:
                body
                inc
            .cond:
                cjmp cond .loop .end
            .end:

        Boolean expressions are simplified to a location to jump to, as
        described in section 8.4 of the Dragon book[1].

    3.2. Flattening Complex Expressions:

        Complex expressions such as copying types larger than a single
        machine word, pulling members out of structures, emulated
        multiplication and division for larger integers sizes, and similar
        operations are reduced to trees that are expressible in terms of
        simple machine operations.  This is implemented in lval() and rval()
        in 6/simp.c, which reduce expressions to lvals (assignable
        expressions which may appear on the left hand side of an assignemnt)
        or rvals (value-yeilding expressions which may appear on the right
        side of an assignment.

        By the end of the simplification pass, the following operators should
        not be present in the trees:

            Obad Oret Opreinc Opostinc Opredec Opostdec Olor Oland Oaddeq
            Osubeq Omuleq Odiveq Omodeq Oboreq Obandeq Obxoreq Obsleq
            Obsreq Omemb Oslice Oidx Osize Numops Oucon Ouget Otup Oarr
            Oslbase Osllen Ocast


        After flattening, all nodes must either be pure, or impure roots.
        With the exception of assignments to a temporary, no inner nodes
        may be impure. For example, the following sequence of root nodes
        is valid, as all impure nodes are either roots, or children of
        rooted assignments to temporaries:

            Oadd
                Ovar "temp0"
                Olit int(123)
            Oasn
                Ovar "temp1"
                Ocall
                    Ovar "func"
                    Ovar "arg1"
                    Ovar "arg2"
            Ocall
                Ovar "std.put"
                Olit "some string\n"

        The following tree is invalid because the Ocall node is impure, but
        it is not the root node or the direct child of a rooted assignment:

            Oasn
                Ovar "result"
                Oadd
                    Ovar "temp0"
                    Oasn
                        Ovar "temp1"
                        Ocall
                            Ovar "somefunc"
                            Ovar "arg0"
                            Ovar "arg1"
                
        In order to make it valid, it must be transformed to the following
        form:

            Oasn
                Ovar "temp1"
                Ocall
                    Ovar "somefunc"
                    Ovar "arg0"
                    Ovar "arg1"
            Oasn
                Ovar "result"
                Oadd
                    Ovar "temp0"
                    Ovar "temp1"

        Note that Oasn, Oblit, and other assignment nodes may only appear as
        rooted nodes.


    3.3. Building the Control Flow Graph

        Once everything is simplified to relatively flat trees made of
        primitve operations, a control flow graph is built. The
        structureless list is split into basic blocks (sequences of
        instructions where the control flows straight through), and
        put into the 'Cfg' data structure defined in opt/opt.h. The
        fact that each Bb within the Cfg has linear flow makes it easier
        to reason about the code, and thus generate code and implement
        optimizations.

        This pass is machine independent, as it looks only at the leaders
        (ie, labels) and trailers (ie, jumps and conditional jumps) for
        blocks. There are no machine independent components to this process.

4. OPTIMIZATION:

    Currently, there is virtually no optimization done on the trees after
    flattening. The only optimization that is done is constant folding. All
    optimizations currently operate on a per-function basis, and preserve
    [or are required to update] the control flow graph.

    4.2. Constant Folding:

        Expressions with constant values are simplified algebraically. For
        example, the expression 'x*1' is simplified to simply 'x', '0/n' is
        simplified to '0', and so on.


5. CODE GENERATION:

    5.1. Instruction Selection:

        Instruction selection is done via a simple hand written bottom up pass
        over the tree. Common patterns such as scaled or offset indexing are
        recognized by the patterns, but no attempts at finding an optimal
        tiling are made. This is implemented in 6/isel.c and is very machine
        specific.

        The structure of the control flow graph is preserved by the
        instruction selection transformation, with each basic block in the
        flattened-node flow graph mirrored by an assembly node in the flow
        graph. This is because there is no need to make the assembly flow
        graph different when translating, and it is simpler to simply
        preserve the structure of the flow graph than to rebuild the
        structure from scratch.

    5.2. Register Allocation:

        Register allocation is done via the algorithm described in "Iterated
        Regster Coalescing", by Appel and George. As of the time of this
        writing, the register allocator does not yet implement overlapping
        register classes. This will be done as described in "A generalized
        algorithm for graph-coloring register allocation", by Smith, Ramsey,
        and Holloway.

        The full description will not be mirrored here, as the papers do a
        far more thorough job of describing the algorithm than I could.

6: TUTORIAL: ADDING A STATEMENT:

    6.1. Stubbing in the node types:

    6.2. Parsing:
    
    6.3. Flattening:

    6.4. Optimization:

    6.5. Instruction Selection:

[1] Aho, Sethi, Ullman: Compilers: Principles, Techniques, and Tools, 1988.
        ISBN 0-201-10088-6