ref: 466cf20d3524b8e42edc333a6d2df2a01e99a95b
dir: /sys/src/cmd/venti/words/notes/
all data is big-endian on disk. arena layout: ArenaPart (first at offset PartBlank = 256kB in the disk file) magic[4] 0xA9E4A5E7 version[4] 3 blockSize[4] arenaBase[4] offset of first ArenaHead structure in the disk file the ArenaMap starts at the first block at offset >= PartBlank+512 bytes. it is a sequence of text lines /* * amap: n '\n' amapelem * n * n: u32int * amapelem: name '\t' astart '\t' asize '\n' * astart, asize: u64int */ the astart and astop are byte offsets in the disk file. they are the offsets to the ArenaHead and the end of the Arena block. ArenaHead [base points here in the C code] size bytes Clumps ClumpInfo blocks Arena Arena magic[4] 0xF2A14EAD version[4] 4 name[64] clumps[4] cclumps[4] ctime[4] wtime[4] used[8] uncsize[8] sealed[1] optional score[20] once sealed, the sha1 hash of every block from the ArenaHead to the Arena is checksummed, as though the final score in Arena were the zeroScore. strangely, the tail of the Arena block (the last one) is not included in the checksum (i.e., the unused data after the score). clumpMax = blocksize/ClumpInfoSize = blocksize/25 dirsize = ((clumps/clumpMax)+1) * blocksize want used+dirsize <= size want cclumps <= clumps want uncsize+clumps*ClumpSize+blocksize < used want ctime <= wtime clump info is stored packed into blocks in order. clump info moves forward through a block but the blocks themselves move backwards. so if cm=clumpMax and there are two blocks worth of clumpinfo, the blocks look like; [cm..2*cm-1] [0..cm-1] [Arena] with the blocks pushed right up against the Arena trailer. ArenaHead magic[4] 0xD15C4EAD version[4] = Arena.version name[64] blockSize[4] size[8] Clump magic[4] 0xD15CB10C (0 for an unused clump) type[1] size[2] uncsize[2] score[20] encoding[1] raw=1, compress=2 creator[4] time[4] ClumpInfo type[1] size[2] uncsize[2] score[20] the arenas are mapped into a single address space corresponding to the index that brings them together. if each arena has 100M bytes excluding the headers and there are 4 arenas, then there's 400M of index address space between them. index address space starts at 1M instead of 0, so the index addresses assigned to the first arena are 1M up to 101M, then 101M to 201M, etc. of course, the assignment of addresses has nothing to do with the index, but that's what they're called. the index is split into index sections, which are put on different disks to get parallelism of disk heads. each index section holds some number of hash buckets, each in its own disk block. collectively the index sections hold ix->buckets between them. the top 32-bits of the score is used to assign scores to buckets. div = ceil(2³² / ix->buckets) is the amount of 32-bit score space per bucket. to look up a block, take the top 32 bits of score and divide by div to get the bucket number. then look through the index section headers to figure out which index section has that bucket. then load that block from the index section. it's an IBucket. the IBucket has ib.n IEntry structures in it, sorted by score and then by type. do the lookup and get an IEntry. the ia.addr will be a logical address that you then use to get the ISect magic[4] 0xD15C5EC7 version[4] name[64] index[64] blockSize[4] blockBase[4] address in partition where bucket blocks start blocks[4] start[4] stop[4] stop - start <= blocks, but not necessarily == IEntry score[20] wtime[4] train[2] ia.addr[8] index address (see note above) ia.size[2] size of uncompressed block data ia.type[1] ia.blocks[1] number of blocks of clump on disk IBucket n[2] next[4] not sure; either 0 or inside [start,stop) for the ISect data[n*IEntrySize] final piece: all the disk partitions start with PartBlank=256kB of unused disk (presumably to avoid problems with boot sectors and layout tables and the like). actually the last 8k of the 256k (that is, at offset 248kB) can hold a venti config file to help during bootstrap of the venti file server.