shithub: scc

ref: 25e676d87fef3efa5cd4101ce7f955f477564cfc
parent: 08f9c9fbdba14ce22efed411a31d6d536ea1aa2a
author: Roberto E. Vargas Caballero <k0ga@shike2.com>
date: Fri Feb 2 15:55:20 EST 2018

[doc] Add documentation about myro

--- /dev/null
+++ b/doc/myro.txt
@@ -1,0 +1,179 @@
+Object File Format
+------------------
+
+The object file format is designed to be the simplest format that covers
+all the needs of many modern programming languages, with sufficient support
+for hand-written assembly. All the types are little endian.
+
+File Format
+-----------
+
+	+== Header ======+
+	| signature      | 32 bit
+	+----------------+
+	| format str     | 32 bit
+	|                |
+	+----------------+
+	| entrypoint     | 64 bit
+	|                |
+	+----------------+
+	| stringtab size | 64 bit
+	|                |
+	+----------------+
+	| section size   | 64 bit
+	| 		 |
+	+----------------+
+	| symtab size    | 64 bit
+	|                |
+	+----------------+
+	| reloctab size  | 64 bit
+	|                |
+	+== Metadata ====+
+	| strings...     |
+	| ....           |
+	+----------------+
+	| sections...    |
+	| ...            |
+	+----------------+
+	| symbols....    |
+	| ...            |
+	+----------------+
+	| relocations... |
+	| ...            |
+	+== Data ========+
+	| data...        |
+	| ...            |
+	+================+
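Because the regions are stored back to back, the start of each one can be derived from the header sizes alone. A minimal Python sketch, assuming the header is packed without padding (48 bytes, from the field widths in the diagram) and that every size field is a byte count; `table_offsets` and its parameter names are hypothetical, not part of the format:

```python
# Assumed packed header size: signature (4) + format str (4)
# + entrypoint and four table sizes (5 * 8).
HEADER_SIZE = 4 + 4 + 8 * 5


def table_offsets(strtab_size, sectab_size, symtab_size, reloctab_size):
    """Return the file offset at which each region starts.

    Assumes every size is expressed in bytes.
    """
    strings = HEADER_SIZE
    sections = strings + strtab_size
    symbols = sections + sectab_size
    relocs = symbols + symtab_size
    data = relocs + reloctab_size
    return {
        "strings": strings,
        "sections": sections,
        "symbols": symbols,
        "relocs": relocs,
        "data": data,
    }
```

If the size fields turn out to count entries rather than bytes, each term would need to be multiplied by the entry size of its table.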
+
+The file is composed of three components: the header, the metadata, and
+the data. The header begins with a signature, containing the four bytes
+"uobj", identifying this file as a unified object. It is followed by
+a string offset with a format description (it may be used to indicate
+file format version, architecture, ABI, ...). This is followed by the
+entry point, the size of the string table, the size of the section table,
+the size of the symbol table, and the size of the relocation table.
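The header layout maps directly onto a fixed-size record. The following is a sketch only, assuming the fields are packed in diagram order with no padding; `parse_header` and the dictionary keys are hypothetical names chosen for illustration:

```python
import struct

# Little endian, no padding: 4-byte signature, 32-bit format string
# offset, then five 64-bit fields (entrypoint and the four table sizes).
HEADER = struct.Struct("<4sI5Q")


def parse_header(buf):
    """Decode the header at the start of `buf` and check the signature."""
    (sig, fmt_str, entrypoint, strtab_size, sectab_size,
     symtab_size, reloctab_size) = HEADER.unpack_from(buf, 0)
    if sig != b"uobj":
        raise ValueError("not a unified object file")
    return {
        "format_str": fmt_str,  # offset into the string table
        "entrypoint": entrypoint,
        "strtab_size": strtab_size,
        "sectab_size": sectab_size,
        "symtab_size": symtab_size,
        "reloctab_size": reloctab_size,
    }
```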
+
+Metadata: Strings
+-----------------
+
+The string table directly follows the header. It contains an array of strings.
+Each string is a sequence of bytes terminated by a zero byte. A string may
+contain any characters other than the zero byte. Any reference to a string
+is made using a 32-bit offset into the string table. The value 0FFFFFFFFH
+may be used to indicate "no string".
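A string reference can be resolved by scanning from the offset to the next zero byte. A minimal sketch, assuming the whole string table has already been read into a bytes object; `get_string` and `NO_STRING` are hypothetical names:

```python
NO_STRING = 0xFFFFFFFF  # the 0FFFFFFFFH sentinel described above


def get_string(strtab, offset):
    """Return the bytes of the string at `offset`, or None for "no string"."""
    if offset == NO_STRING:
        return None
    # bytes.index raises ValueError if no terminating zero byte is found.
    end = strtab.index(b"\0", offset)
    return strtab[offset:end]
```

The result is kept as bytes because the format allows any byte other than zero, so no text encoding can be assumed.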
+
+Metadata: Sections
+------------------
+
+The section table follows the string table. The section table defines where
+data in a program goes.
+
+	+== Sect ========+
+	| str 		 | 32 bit
+	+----------------+
+	| flags          | 16 bit
+	+----------------+
+	| fill value     |  8 bit
+	+----------------+
+	| alignment      |  8 bit
+	+----------------+
+	| offset         | 64 bit
+	+----------------+
+	| len            | 64 bit
+	+----------------+
+
+All files must define at least 5 sections, numbered 1 through 5,
+which are implicitly included in every binary:
+
+	.text		SprotRead | SprotWrite | Sload | Sfile | SprotExec
+	.data		SprotRead | SprotWrite | Sload | Sfile
+	.bss		SprotRead | SprotWrite | Sload
+	.rodata		SprotRead | Sload | Sfile
+	.blob		Sblob     | Sfile
+
+A program may have at most 65,535 sections. Sections have the following flags:
+
+	SprotRead	= 1 << 0
+	SprotWrite	= 1 << 1
+	SprotExec	= 1 << 2
+	Sload		= 1 << 3
+	Sfile		= 1 << 4
+	Sabsolute	= 1 << 5
+	Sblob		= 1 << 6
+
+Sblob marks a blob section. It is not loaded into the program memory and may
+be used for debug info, tagging the binary, and other similar uses.
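A section entry and its flag word could be decoded as follows. This is a sketch under the assumption that the fields are packed in diagram order with no padding (24 bytes per entry); `parse_section` and the dictionary keys are hypothetical, while the flag names and values are taken from the list above:

```python
import struct

# str (32), flags (16), fill value (8), alignment (8), offset (64), len (64)
SECTION = struct.Struct("<IHBBQQ")

SECTION_FLAGS = {
    "SprotRead": 1 << 0,
    "SprotWrite": 1 << 1,
    "SprotExec": 1 << 2,
    "Sload": 1 << 3,
    "Sfile": 1 << 4,
    "Sabsolute": 1 << 5,
    "Sblob": 1 << 6,
}


def parse_section(buf, pos=0):
    """Decode one section entry starting at byte `pos` of `buf`."""
    name, flags, fill, align, offset, length = SECTION.unpack_from(buf, pos)
    return {
        "name": name,  # offset into the string table
        "flags": [n for n, bit in SECTION_FLAGS.items() if flags & bit],
        "fill": fill,
        "align": align,
        "offset": offset,
        "len": length,
    }
```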
+
+Metadata: Symbols
+-----------------
+
+The symbol table follows the section table. The symbol table contains an array
+of symbol definitions. Each symbol has the following structure:
+
+
+	+== Sym =========+
+	| str name	 | 32 bit
+	+----------------+
+	| str type       | 32 bit
+	+----------------+
+	| section id     | 8 bit
+	+----------------+
+	| flags          | 8 bit
+	+----------------+
+	| offset         | 64 bit
+	|                |
+	+----------------+
+	| len            | 64 bit
+	|                |
+	+----------------+
+
+A symbol is 26 bytes in size (4 + 4 + 1 + 1 + 8 + 8).
+
+The string is an offset into the string table, pointing to the start
+of the string. The kind describes where in the output the data goes
+and what its role is. The offset describes where, relative to the start
+of the data, the symbol begins. The length describes how many bytes it is.
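The symbol table can be walked as an array of fixed-size records. A sketch only, assuming the fields are packed in diagram order with no padding, which gives 26 bytes per entry; `parse_symbols` and the dictionary keys are hypothetical names:

```python
import struct

# str name (32), str type (32), section id (8), flags (8),
# offset (64), len (64)
SYMBOL = struct.Struct("<IIBBQQ")


def parse_symbols(symtab):
    """Decode every symbol entry in `symtab`.

    iter_unpack raises struct.error if the table length is not a
    multiple of the entry size.
    """
    return [
        {
            "name": name,  # offset into the string table
            "type": typ,   # offset into the string table
            "section": section,
            "flags": flags,
            "offset": offset,
            "len": length,
        }
        for name, typ, section, flags, offset, length
        in SYMBOL.iter_unpack(symtab)
    ]
```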
+
+Currently, the following flags are supported:
+
+	1 << 1: Deduplicate the symbol.
+	1 << 2: Common storage for the symbol.
+	1 << 3: External symbol.
+	1 << 4: Undefined symbol.
+
+Metadata: Relocations
+---------------------
+
+The relocations follow the symbol table. Each relocation has the
+following structure:
+
+
+	+== Reloc =======+
+	| 0 | symbol id  | 32 bit
+	| 1 | section id |
+	+----------------+
+	| flags          | 8 bit
+	+----------------+
+	| rel size       | 8 bit
+	+----------------+
+	| mask size      | 8 bit
+	+----------------+
+	| mask shift     | 8 bit
+	+----------------+
+	| offset         | 64 bit
+	|                |
+	+----------------+
+
+Relocations write the appropriate value into the offset requested.
+The offset is relative to the base of the section where the symbol
+is defined.
+
+The flags may be:
+
+	Rabsolute   = 1 << 0
+	Roverflow   = 1 << 1
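A relocation entry could be decoded along these lines. This is a sketch, assuming the fields are packed in diagram order with no padding (16 bytes per entry). The text does not say which bit of the first field selects between a symbol id and a section id, so the sketch returns that field undecoded; `parse_reloc` and the dictionary keys are hypothetical names:

```python
import struct

# symbol/section id (32), flags (8), rel size (8), mask size (8),
# mask shift (8), offset (64)
RELOC = struct.Struct("<IBBBBQ")

Rabsolute = 1 << 0
Roverflow = 1 << 1


def parse_reloc(buf, pos=0):
    """Decode one relocation entry starting at byte `pos` of `buf`."""
    target, flags, rel_size, mask_size, mask_shift, offset = \
        RELOC.unpack_from(buf, pos)
    return {
        "target": target,  # raw 32-bit value, tag bit left undecoded
        "absolute": bool(flags & Rabsolute),
        "overflow": bool(flags & Roverflow),
        "rel_size": rel_size,
        "mask_size": mask_size,
        "mask_shift": mask_shift,
        "offset": offset,
    }
```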
+
+Data
+----
+
+It's just data. What do you want?