shithub: mc

ref: fefdce5c957865ebcf2e30c99b5ff1b6e09e0efb
parent: 88608e748f11edcaf898275ce5d7b54cba7be9de
author: Ori Bernstein <ori@eigenstate.org>
date: Sat Jan 14 16:41:13 EST 2017

Start updating the language docs.

	Still out of date and incomplete, but we're moving on it
	again.

--- a/doc/lang.txt
+++ b/doc/lang.txt
@@ -6,23 +6,26 @@
 TABLE OF CONTENTS:
 
     1. ABOUT
-    2. LEXICAL CONVENTIONS
-    3. SYNTAX
-        3.1. Declarations
-        3.2. Literal Values
-        3.3. Control Constructs and Blocks
-        3.4. Expressions
-        3.5. Data Types
-        3.6. Type Inference
-        3.7. Generics
-        3.8. Traits
-        3.9. Packages and Uses
-    4. TOOLCHAIN
-    5. EXAMPLES
-    6. STYLE GUIDE
-    7. STANDARD LIBRARY
-    8. GRAMMAR
-    9. FUTURE DIRECTIONS
+    2. NOTATION
+        2.1. Grammar
+    3. LEXICAL CONVENTIONS
+        3.1. Summary
+    4. SYNTAX
+        4.1. Declarations
+        4.2. Literal Values
+        4.3. Control Constructs and Blocks
+        4.4. Expressions
+        4.5. Data Types
+        4.6. Type Inference
+        4.7. Generics
+        4.8. Traits
+        4.9. Packages and Uses
+    5. TOOLCHAIN
+    6. EXAMPLES
+    7. STYLE GUIDE
+    8. STANDARD LIBRARY
+    9. FULL GRAMMAR
+    10. FUTURE DIRECTIONS
 
 1. ABOUT:
 
@@ -29,67 +32,88 @@
         Myrddin is designed to be a simple, low-level programming
         language.  It is designed to provide the programmer with
         predictable behavior and a transparent compilation model,
-        while at the same time providing the benefits of strong
-        type checking, generics, type inference, and similar.
-        Myrddin is not a language designed to explore the forefront
-        of type theory or compiler technology. It is not a language
-        that is focused on guaranteeing perfect safety. Its focus
-        is on being a practical, small, fairly well defined, and
-        easy to understand language for work that needs to be close
-        to the hardware.
+        while at the same time providing the benefits of strong type
+        checking, generics, type inference, and similar.  Myrddin is
+        not a language designed to explore the forefront of type
+        theory or compiler technology. It is not a language that is
+        focused on guaranteeing perfect safety. Its focus is on being
+        a practical, small, fairly well defined, and easy to
+        understand language for work that needs to be close to the
+        hardware.
 
-        Myrddin is a computer language influenced strongly by C
-        and ML, with ideas from Rust, Go, C++, and numerous other
-        sources and resources.
+        Myrddin is a computer language influenced strongly by C and
+        ML, with ideas from too many other places to name. 
 
 
-2. LEXICAL CONVENTIONS:
+2. NOTATION:
 
-    The language is composed of several classes of tokens. There
-    are comments, identifiers, keywords, punctuation, and whitespace.
+    2.1. Grammar:
 
-    Comments begin with "/*" and end with "*/". They may nest.
+        Syntax is defined using an informal variant of EBNF.
 
-        /* this is a comment /* with another inside */ */
+            token:      /regex/ | "quoted"
+            prod:       prodname ":" [ expr ]
+            expr:       alt ( "|" alt )*
+            alt:        term term*
+            term:       prodname | token | group | opt | rep
+            group:      "(" expr ")"
+            opt:        "[" expr "]"
+            rep:        zerorep | onerep
+            zerorep:    expr "*"
+            onerep:     expr "+"
 
-    Identifiers begin with any alphabetic character or underscore,
-    and continue with any number of alphanumeric characters or
-    underscores. Currently the compiler places a limit of 1024
-    bytes on the length of the identifier.
+3. LEXICAL CONVENTIONS:
 
-        some_id_234__
+    3.1. Summary:
 
-    Keywords are a special class of identifier that is reserved
-    by the language and given a special meaning. The set of
-    keywords in Myrddin are as follows:
+        The language is composed of several classes of tokens. There are
+        comments, identifiers, keywords, punctuation, and whitespace.
 
-        castto          match
-        const           pkg
-        default         protect
-        elif            sizeof
-        else            struct
-        export          trait
-        extern          true
-        false           type
-        for             union
-        generic         use
-        goto            var
-        if              while
+        Comments begin with "/*" and end with "*/". They may nest.
 
+            /* this is a comment /* with another inside */ */
 
-    Literals are a direct representation of a data object within the source of
-    the program. There are several literals implemented within the language.
-    These are fully described in section 3.2 of this manual.
+        Identifiers begin with any alphabetic character or underscore, and
+        continue with alphanumeric characters or underscores. Currently the
+        compiler places a limit of 1024 bytes on the length of the identifier.
 
-    In the compiler, single semicolons (';') and newline (\x10)
-    characters are treated identically, and are therefore interchangeable.
-    They will both be referred to "endline"s throughout this manual.
+            some_id_234__
 
+        Keywords are a special class of identifier that is reserved by the
+        language and given a special meaning. The full set of keywords is
+        listed below. Their meanings will be covered later in this reference
+        manual.
 
-3. SYNTAX OVERVIEW:
+            $noret          _               break
+            castto          const           continue
+            elif            else            extern
+            false           for             generic
+            goto            if              impl
+            in              match           pkg
+            pkglocal        sizeof          struct
+            trait           true            type
+            union           use             var
+            void            while
 
-    3.1. Declarations:
+        Literals are a direct representation of a data object within the
+        source of the program. There are several literals implemented within
+        the language.  These are fully described in section 4.2 of this
+        manual. 
 
+        Single semicolons (';') and newline (\n) characters are synonymous and
+        interchangeable. Both are used to mark the end of logical lines,
+        and will be uniformly referred to as line terminators.
+
+4. SYNTAX OVERVIEW:
+
+    4.1. Declarations:
+
+            decl:       attrs ("var" | "const" | "generic")  decllist
+            attrs:      ("extern" | "pkglocal" | "$noret")+
+            decllist:   declbody ("," declbody)*
+            declbody:   declcore ["=" expr]
+            declcore:   name [":" type]
+
         A declaration consists of a declaration class (i.e., one
         of 'const', 'var', or 'generic'), followed by a declaration
         name, optionally followed by a type and assignment. One thing
@@ -101,8 +125,10 @@
             const:      Declares a constant value, which may not be
                         modified at run time. Constants must have
                         initializers defined.
+
             var:        Declares a variable value. This value may be
                         assigned to, copied from, and modified.
+
             generic:    Declares a specializable value. This value
                         has the same restrictions as a const, but
                         taking its address is not defined. The type
@@ -110,12 +136,21 @@
                         named in the declaration in order for their
                         substitution to be allowed.
 
-        In addition, there is one modifier allowed on declarations:
-        'extern'. Extern declarations are used to declare symbols from
-        another module which cannot be provided via the 'use' mechanism.
-        Typical uses would be to expose a function written in assembly. They
-        can also be used as a workaround for external dependencies.
+        In addition, declarations may accept a number of modifiers which
+        change the attributes of the declarations:
 
+            extern:     Declares a variable as having external linkage.
+                        Assigning a definition to this variable within the
+                        file that contains the extern declaration is an error.
+
+            pkglocal:   Declares a variable which is local to the package.
+                        This variable may be used from other files that
+                        declare the same `pkg` namespace, but referring to
+                        it from outside the namespace is an error.
+
+            $noret:     Declares the function to which this is applied as
+                        a non-returning function.
+
         Examples:
 
             Declare a constant with a value 123. The type is not defined,
@@ -149,113 +184,138 @@
                     -> a + b + c
                 }
 
-    3.2. Literal Values
+    4.2. Literal Values
 
-        Integers literals are a sequence of digits, beginning with a
-        digit and possibly separated by underscores. They are of a
-        generic type, and can be used where any numeric type is
-        expected. They may be prefixed with "0x" to indicate that the
-        following number is a hexadecimal value, or 0b to indicate a
-        binary value. Decimal values are not prefixed, and octal values
-        are not supported.
+        4.2.1. Atomic Literals:
 
-            eg: 0x123_fff, 0b1111, 1234
+                literal:    strlit | chrlit | floatlit |
+                            boollit | voidlit | intlit |
+                            funclit | seqlit | tuplit
 
-        Floating-point literals are also a sequence of digits beginning with
-        a digit and possibly separated by underscores. They are also of a
-        generic type, and may be used whenever a floating-point type is
-        expected. Floating point literals are always in decimal, and
-        as of this writing, exponential notation is not supported[2]
+                strlit:     \"(char|escape)*\"
+                chrlit:     \'(char|escape)\'
+                intlit:     "0x" digits | "0o" digits | "0b" digits | digits
+                floatlit:   digit+"."digit+["e" digit+]
+                boollit:    "true"|"false"
+                voidlit:    "void"
 
-            eg: 123.456
+            Integer literals are a sequence of digits, beginning with a digit and
+            possibly separated by underscores. They are of a generic type, and can
+            be used where any numeric type is expected. They may be prefixed with
+            "0x" to indicate that the following number is a hexadecimal value, 0o
+            to indicate an octal value, or 0b to indicate a binary value. Decimal
+            values are not prefixed.
 
-        String literals represent a compact method of representing a byte
-        array. Any byte values are allowed in a string literal, and will be
-        spit out again by the compiler unmodified, with the exception of
-        escape sequences.
+                eg: 0x123_fff, 0b1111, 0o777, 1234
 
-        There are a number of escape sequences supported for both character
-        and string literals:
-            \n          newline
-            \r          carriage return
-            \t          tab
-            \b          backspace
-            \"          double quote
-            \'          single quote
-            \v          vertical tab
-            \\          single slash
-            \0          nul character
-            \xDD        single byte value, where DD are two hex digits.
+            Floating-point literals are also a sequence of digits beginning with a
+            digit and possibly separated by underscores. They are also of a
+            generic type, and may be used whenever a floating-point type is
+            expected. Floating-point literals are always in decimal, but may
+            have an exponent attached to them.
 
-        String literals begin with a ", and continue to the next
-        unescaped ".
+                eg: 123.456, 10.0e7, 1_000.
 
-            eg: "foo\"bar"
+            String literals represent a compact method of representing a byte
+            array. Any byte values are allowed in a string literal, and will be
+            spit out again by the compiler unmodified, with the exception of
+            escape sequences.
 
-        Multiple consecutive string literals are implicitly merged to create
-        a single combined string literal. To allow a string literal to span
-        across multiple lines, the new line characters must be escaped.
-        
-            eg: "foo" \
-                "bar"
+            There are a number of escape sequences supported for both character
+            and string literals:
+                \n          newline
+                \r          carriage return
+                \t          tab
+                \b          backspace
+                \"          double quote
+                \'          single quote
+                \v          vertical tab
+                \\          single backslash
+                \0          nul character
+                \xDD        single byte value, where DD are two hex digits.
+                \u{xxx}     unicode escape, emitted as utf8.
 
-        Character literals represent a single codepoint in the character
-        set. A character starts with a single quote, contains a single
-        codepoint worth of text, encoded either as an escape sequence
-        or in the input character set for the compiler (generally UTF8).
-        They share the same set of escape sequences as string literals.
+            String literals begin with a ", and continue to the next
+            unescaped ".
 
-            eg: 'א', '\n', '\u{1234}'
+                eg: "foo\"bar"
 
-        Boolean literals are either the keyword "true" or the keyword
-        "false".
+            Multiple consecutive string literals are implicitly merged to create
+            a single combined string literal. To allow a string literal to span
+            across multiple lines, the newline characters must be escaped.
+            
+                eg: "foo" \
+                    "bar"
 
-            eg: true, false
+            Character literals represent a single codepoint in the character
+            set. A character starts with a single quote, contains a single
+            codepoint worth of text, encoded either as an escape sequence
+            or in the input character set for the compiler (generally UTF8).
+            They share the same set of escape sequences as string literals.
 
-        Function literals describe a function. They begin with a '{',
-        followed by a newline-terminated argument list, followed by a
-        body and closing '}'. They will be described in more detail
-        later in this manual.
+                eg: 'א', '\n', '\u{1234}'
 
-            eg: {a : int, b
-                    -> a + b
-                }
+            Boolean literals are either the keyword "true" or the keyword
+            "false".
 
-        Sequence literals describe either an array or a structure
-        literal. They begin with a '[', followed by an initializer
-        sequence and closing ']'. For array literals, the initializer
-        sequence is either an indexed initializer sequence[4], or an
-        unindexed initializer sequence. For struct literals, the
-        initializer sequence is always a named initializer sequence.
+                eg: true, false
 
-        An unindexed initializer sequence is simply a comma separated
-        list of values. An indexed initializer sequence contains a
-        '#number=value' comma separated sequence, which indicates the
-        index of the array into which the value is inserted. A named
-        initializer sequence contains a comma separated list of
-        '.name=value' pairs.
+        4.2.2. Sequence and Tuple Literals:
+            
+            seqlit:     "[" (structelts | arrayelts) "]"
+            structelts: 
+            arrayelts:  
 
-            eg: [1,2,3], [#2=3, #1=2, #0=1], [.a = 42, .b="str"]
+            tuplit:     "(" tupelts ")"
+            tupelts:    expr
 
-        A tuple literal is a parentheses separated list of values.
-        A single element tuple contains a trailing comma.
+        4.2.3. Function Literals
 
-            eg: (1,), (1,'b',"three")
+            Function literals describe a function. They begin with a '{',
+            followed by a newline-terminated argument list, followed by a
+            body and closing '}'. They will be described in more detail
+            later in this manual.
 
-        Finally, while strictly not a literal, it's not a control
-        flow construct either. Labels are identifiers preceded by
-        colons.
+                eg: {a : int, b
+                        -> a + b
+                    }
 
-            eg: :my_label
+            Sequence literals describe either an array or a structure
+            literal. They begin with a '[', followed by an initializer
+            sequence and closing ']'. For array literals, the initializer
+            sequence is either an indexed initializer sequence[4], or an
+            unindexed initializer sequence. For struct literals, the
+            initializer sequence is always a named initializer sequence.
 
-        They can be used as targets for gotos, as follows:
+            An unindexed initializer sequence is simply a comma separated
+            list of values. An indexed initializer sequence contains a
+            '#number=value' comma separated sequence, which indicates the
+            index of the array into which the value is inserted. A named
+            initializer sequence contains a comma separated list of
+            '.name=value' pairs.
 
-            goto my_label
+                eg: [1,2,3], [#2=3, #1=2, #0=1], [.a = 42, .b="str"]
 
-        the ':' is not part of the label name.
+            A tuple literal is a parentheses separated list of values.
+            A single element tuple contains a trailing comma.
 
-    3.3. Control Constructs and Blocks:
+                eg: (1,), (1,'b',"three")
 
+            Finally, labels: while strictly not literals, they are not
+            control flow constructs either. Labels are identifiers preceded
+            by colons.
+
+                eg: :my_label
+
+            They can be used as targets for gotos, as follows:
+
+                goto my_label
+
+            The ':' is not part of the label name.
+
+
+    4.3. Control Constructs and Blocks:
+
             if          for
             while       match
             goto
@@ -366,7 +426,7 @@
             ;;
 
 
-    3.4. Expressions:
+    4.4. Expressions:
 
         Myrddin expressions are relatively similar to expressions in C.  The
         operators are listed below in order of precedence, and a short
@@ -462,7 +522,7 @@
         on overflow. Right shift expressions fill with the sign bit on
         signed types, and fill with zeros on unsigned types.
 
-    3.5. Data Types:
+    4.5. Data Types:
 
         The language defines a number of built in primitive types. These
         are not keywords, and in fact live in a separate namespace from
@@ -473,7 +533,7 @@
         must be explicitly cast if you want to convert, and the casts must
         be of compatible types, as will be described later.
 
-            3.5.1. Primitive types:
+            4.5.1. Primitive types:
 
                     void
                     bool            char
@@ -491,6 +551,10 @@
                 This allows generics to not have to somehow work around void
                 being a toxic type. The void value is named `void`.
 
+                It is interesting to note that these types are not keywords,
+                but are instead merely predefined identifiers in the type
+                namespace.
+
                 bool is a type that can only hold true and false. It can be
                 assigned, tested for equality, and used in the various boolean
                 operators.
@@ -509,7 +573,7 @@
                     var y : float32     declare y as a 32 bit float
 
 
-            3.5.2. Composite types:
+            4.5.2. Composite types:
 
                     pointer
                     slice           array
@@ -533,7 +597,7 @@
                     foo[123]    type: array of 123 foo
                     foo[,]      type: slice of foo
 
-            3.5.3. Aggregate types:
+            4.5.3. Aggregate types:
 
                     tuple           struct
                     union
@@ -567,7 +631,7 @@
                     ;;
 
 
-            3.5.4. Magic types:
+            4.5.4. Magic types:
 
                     tyvar           typaram
                     tyname
@@ -597,7 +661,7 @@
                                                 named '@foo'.
 
 
-    3.6. Type Inference:
+    4.6. Type Inference:
 
         The myrddin type system is a system similar to the Hindley Milner
         system, however, types are not implicitly generalized. Instead, type
@@ -612,7 +676,7 @@
         It begins by initializing all leaf nodes with the most specific
         known type for them as follows:
 
-        3.6.1 Types for leaf nodes:
+        4.6.1 Types for leaf nodes:
 
             Variable        Type
             ----------------------
@@ -682,7 +746,7 @@
             <           <=              >               >=
 
 
-    3.7. Packages and Uses:
+    4.7. Packages and Uses:
 
             pkg     use
 
@@ -724,7 +788,7 @@
         them in the body of the code for readability. Scanning the export
         list is desirable from a readability perspective.
 
-4. TOOLCHAIN:
+5. TOOLCHAIN:
 
     The toolchain used is inspired by the Plan 9 toolchain in name. There
     is currently one compiler for x64, called '6m'. This compiler outputs
@@ -734,9 +798,9 @@
             -I path	Add 'path' to use search path
             -o	Output to outfile
 
-5. EXAMPLES:
+6. EXAMPLES:
 
-    5.1. Hello World:
+    6.1. Hello World:
 
             use std
             const main = {
@@ -746,7 +810,7 @@
 
         TODO: DESCRIBE CONSTRUCTS.
 
-    5.2. Conditions
+    6.2. Conditions
 
             use std
             const intmax = {a, b
@@ -765,7 +829,7 @@
 
         TODO: DESCRIBE CONSTRUCTS.
 
-    5.3. Looping
+    6.3. Looping
 
             use std
             const innerprod = {a, b
@@ -782,9 +846,9 @@
 
         TODO: DESCRIBE CONSTRUCTS.
 
-6. STYLE GUIDE:
+7. STYLE GUIDE:
 
-    6.1. Brevity:
+    7.1. Brevity:
 
         Myrddin is a simple language which aims to strip away abstraction when
         possible, and it is not well served by overly abstract or bulky code.
@@ -795,7 +859,7 @@
         Write for humans, not machines. Write linearly, so that an algorithm
         can be understood with minimal function-chasing.
 
-    6.2. Naming:
+    7.2. Naming:
 
         Names should be brief and evocative. A good name serves as a reminder
         to what the function does. For functions, a single verb is ideal. For
@@ -833,21 +897,17 @@
                 const length_mm = {;...} /* '_' disambiguates returned values.  */
                 const length_cm = {;...}
 
-    6.3. Collections:
+    7.3. Collections:
 
 
 
-7. STANDARD LIBRARY:
+8. STANDARD LIBRARY:
 
     This is documented separately.
 
-8. GRAMMAR:
+9. FULL GRAMMAR:
 
-9. FUTURE DIRECTIONS:
+10. FUTURE DIRECTIONS:
 
 BUGS:
-
-[2] TODO: exponential notation.
-[4] TODO: currently the only sequence literal implemented is the
-          unindexed one