ref: f31b4ce3b34f7191f7cfdae7b3d88e8e27e6b179
parent: 50cfd4075ac00a6047dbbedd73ee7867ac5d6889
author: Noam Preil <noam@pixelhero.dev>
date: Thu Sep 18 20:04:25 EDT 2025
sufficiently robust for testing!!!
--- a/arena.c
+++ b/arena.c
@@ -27,6 +27,8 @@
vtarenasync(VtArena *arena)
{
// First, check if the directory and the data log are in sync
+ fprint(2, "skipping arena sync, logic needs rethought...\n");
+ return 1;
usize n, m, corruptchain;
char *buf, *ci;
u32int block, perblock, off;
--- a/disk.c
+++ b/disk.c
@@ -353,7 +353,7 @@
nn = len - n;
if(nn > index.arena->blockremain)
nn = index.arena->blockremain;
- fprint(2, "WPTR block %d offset %d, n %d nn %d, block must be at least %d\n", index.arena->block, index.arena->offset, n, nn, n+nn);
+ fprint(2, "WPTR clump %d block %d offset %d, n %d nn %d, block must be at least %d\n", index.arena->arenastats.clumps, index.arena->block, index.arena->offset, n, nn, index.arena->offset+nn);
// Use however much space is available in the block, then move to the next
memcpy(wptr, buf + n, nn);
n += nn;
--- a/notebook
+++ b/notebook
@@ -3494,3 +3494,105 @@
fprint("appending to ...", key)
the later cacheunlock uses s_sect->cacheindex directly, not `key`, so this might just be a bad log (and bad error handling!). cachelookup is also using that cacheindex directly.
+
+Gotta reformat and reproduce now, though, since my write logs are inaccurate. Well - actually, I probably didn't have to, but it's good practice.
+
+Easy to reproduce: `vac reformschem.png`, kill neoventi, reload neoventi, corruption is immediately detected. The question is, is this another read-side bug, or write corruption?
+
+% grep WPTR /dev/text > /tmp/wptr
+that should grab a full trace of writes...
+
+Yep. Now, compare those to addresses from corruption logs?
+
+Need to augment the logs; they're not directly comparable. Rerun this, but first, have both sides report the exact address, clump number, block number, and offset within the block.
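+
+The write side's half of this is the WPTR fprint in the disk.c hunk above; a matching read-side line might look like this sketch, where "RPTR", the variable names, and the blocksize field are guesses rather than the actual neoventi code:
+
+	fprint(2, "RPTR clump %d addr %llud block %llud offset %llud\n",
+		clump, addr, addr/arena->blocksize, addr%arena->blocksize);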
+
+Ah, wait, I'm confusing two concepts - venti _clumps_ may take multiple _blocks._ We get one WPTR per _block write_; one clump may issue multiple. Start by adding the clump number; that should make this clearer.
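+
+(Worked example from the trace below: clump 2 is 8192 bytes and starts at offset 623 of an 8192-byte block, so block 0 only has 8192-623 = 7569 bytes left; that's one WPTR with nn 7569, then a second WPTR for the remaining 623 bytes at offset 0 of block 1.)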
+
+WPTR clump 0 block 0 offset 38, n 0 nn 389, block must be at least 389
+WPTR clump 1 block 0 offset 465, n 0 nn 120, block must be at least 120
+WPTR clump 2 block 0 offset 623, n 0 nn 7569, block must be at least 7569
+WPTR clump 2 block 1 offset 0, n 7569 nn 623, block must be at least 8192
+WPTR clump 3 block 1 offset 661, n 0 nn 7531, block must be at least 7531
+WPTR clump 3 block 2 offset 0, n 7531 nn 661, block must be at least 8192
+WPTR clump 4 block 2 offset 699, n 0 nn 7493, block must be at least 7493
+WPTR clump 4 block 3 offset 0, n 7493 nn 699, block must be at least 8192
+
+...the "block must be at least" measurement is wrong, too.
+
+WPTR clump 0 block 0 offset 38, n 0 nn 389, block must be at least 427
+WPTR clump 1 block 0 offset 465, n 0 nn 120, block must be at least 585
+WPTR clump 2 block 0 offset 623, n 0 nn 7569, block must be at least 8192
+WPTR clump 2 block 1 offset 0, n 7569 nn 623, block must be at least 623
+WPTR clump 3 block 1 offset 661, n 0 nn 7531, block must be at least 8192
+WPTR clump 3 block 2 offset 0, n 7531 nn 661, block must be at least 661
+WPTR clump 4 block 2 offset 699, n 0 nn 7493, block must be at least 8192
+
+Okay.
+
+First clump is data offset 38 - we should probably report clump header offsets, though, since that's what the read side will be noticing for logging - except, this log is for blocks, not clumps. So. Add one for clumps too? Eh, let's just xref this against the corruption logs for now.
+
+clump 1 is broken, at address 49228: arenarepair: magic is incorrect. If this block was not written by old-venti, you may have a problem.
+
+So, clump 0 shows as written correctly, but clump 1 is not. That address seems suspicious, though - clump 0 should be at address 0, and the max clump size shouldn't be that big. This looks like a read-side bug!!
+
+...is it even actually clump 1? It's the second clump _checked_, but that loop starts at zero and goes to the difference between the clumps in arenastats and indexstats.
+
+Easier to conceptualize as "start at end of indexed, keep going".
+
+So addr = indexstats.used, m = indexstats.clumps, n = arenastats.clumps.
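+
+i.e. the check loop, reconceptualized, has this shape - a sketch only, with the field spellings taken from the logs above and clumplen() a hypothetical helper:
+
+	u64int addr;
+	u32int m;
+
+	addr = arena->indexstats.used;
+	for(m = arena->indexstats.clumps; m < arena->arenastats.clumps; m++){
+		// verify the clump header magic at addr; a mismatch here is
+		// where "arenarepair: magic is incorrect" gets reported
+		addr += clumplen(arena, addr); // hypothetical: header + data size
+	}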
+
+Yeah, we're not keeping things in sync on repair correctly. If I disable arenasync (the early return in the arena.c hunk above), can I read back the file?
+
+Yes!! This isn't a write bug, it's a read bug! Love it!
+
+Okay, so - we're starting at the first clump, period. We should repair it _if not indexed_, but for now, let's just do it always; this code needs rewritten anyways. GoodEnough™ for now.
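+
+(Eventually that should be something like `if(m >= indexstats.clumps) repairclump(...)` instead of repairing unconditionally - guessed names again.)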
+
+We start at address zero, and clump zero. Is this accurate? Probably. But why are we trusting the clump info for the first clump? It's most likely wrong...
+
+...wait, if the clump is wrong, how can we even trust the clump info to have the right _size_? For now, disable this. Just see if we got it working...
+
+% vac organizer
+vac:047b9bacdf40fc9878dfcd4712817adbcf837804
+# Kill, switch to venti
+% vacfs
+# No such block exists
+
+....uh.
+
+# Switch to neoventi
+% vacfs $HASH
+% walk /n/vac
+
+Oh. Okay, so this is a compatibility issue. Is it maybe not actually indexing properly in venti on startup? Is it because the clumpinfo is missing?
+
+Try flushicache and flushdcache and then vacfs maybe?
+
+Regardless, if neoventi can write data, and can read data written by either itself or venti, I don't really care enough to make venti work properly right this second. That's followup work.
+
+# Using venti,
+
+
+% vac reformschem.png
+vac:71040226a21604a39fcc04d111a17769f42600f5
+
+# Back to neoventi
+% vacfs vac:71040226a21604a39fcc04d111a17769f42600f5
+vacfs: vacfsopen: neoventi: read error: entry not found in bucket
+
+That tracks, but it's also likely a venti bug; I already know it's writing out of order, and I haven't been flushing it before killing it.
+
+...I think we're good, then?
+
+I mean, not _good_ good. Remaining steps:
+
+- Rewrite the arena sync logic entirely
+- Fix all TODOs / FIXMEs, grep for those
+- Write clump info to the directory; currently we're just not
+- Seal arenas
+ - Test crossing arena boundaries
+
+But we're able to write and read, and I can no longer _trivially_ break neoventi on its own - only in combination with venti bugs, bugs that weren't considered such previously because we didn't have a better implementation to compare against.
+
+Calling this good enough to push out for review; committing current state.
+
+
--
⑨