< jimpo>
gmaxwell: By "like the undo data", do you mean just that there are flat files storing the large values and disk positions stored in LevelDB, or are you suggesting specifically that the filters be computed in validation code and referenced by the block index?
< gmaxwell>
Flat files with the filters, indexed by position.
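The layout gmaxwell describes, with filter bytes in flat files and only their positions kept in the database, can be sketched as follows. This is a minimal C++17 sketch, not Bitcoin Core's code: `FilterPos` and `FlatFilterStore` are hypothetical names, a `std::map` stands in for the LevelDB index, and each record is assumed to be a 4-byte little-endian length followed by the filter bytes. Any `std::iostream` works as the backing file, e.g. a `std::fstream` opened in binary in/out mode.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <limits>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical position record: byte offset of a filter record in the flat file.
struct FilterPos {
    std::uint64_t offset;
};

// Sketch: filters are appended to a flat stream as
// [4-byte little-endian length][filter bytes]; only the small position
// record goes into the key-value index (std::map standing in for LevelDB).
class FlatFilterStore {
public:
    explicit FlatFilterStore(std::iostream& file) : file_(file) {}

    // Append the filter for `block_hash` and record where it was written.
    FilterPos Write(const std::string& block_hash, const std::vector<std::uint8_t>& filter) {
        assert(filter.size() <= std::numeric_limits<std::uint32_t>::max());
        file_.clear();
        file_.seekp(0, std::ios::end);
        FilterPos pos{static_cast<std::uint64_t>(file_.tellp())};
        const auto len = static_cast<std::uint32_t>(filter.size());
        for (int i = 0; i < 4; ++i) {
            file_.put(static_cast<char>((len >> (8 * i)) & 0xff));
        }
        file_.write(reinterpret_cast<const char*>(filter.data()),
                    static_cast<std::streamsize>(filter.size()));
        index_[block_hash] = pos;
        return pos;
    }

    // Look up the position in the index, then read the record from the flat file.
    std::optional<std::vector<std::uint8_t>> Read(const std::string& block_hash) {
        auto it = index_.find(block_hash);
        if (it == index_.end()) return std::nullopt;
        file_.clear();
        file_.seekg(static_cast<std::streamoff>(it->second.offset));
        std::uint32_t len = 0;
        for (int i = 0; i < 4; ++i) {
            len |= static_cast<std::uint32_t>(static_cast<unsigned char>(file_.get())) << (8 * i);
        }
        std::vector<std::uint8_t> out(len);
        file_.read(reinterpret_cast<char*>(out.data()), static_cast<std::streamsize>(len));
        if (file_.gcount() != static_cast<std::streamsize>(len)) return std::nullopt;  // short read
        return out;
    }

private:
    std::iostream& file_;
    std::map<std::string, FilterPos> index_;
};
```

Because records are only appended, each filter is written to the flat file once and the index only holds a small fixed-size entry, which sidesteps LevelDB writing every large value twice.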
< jimpo>
Yeah, that makes sense to me. I'll code that up and compare read/write perf.
< sipa>
i doubt it matters much here; we're not throughput limited
< sipa>
leveldb writes all data twice, which is a reason against writing huge things like blocks and undo data
< gmaxwell>
also has various caching behaviors, also stores it somewhat inefficiently. I think that for recent blocks the filters are about 30KB per block? in any case if you think it's okay it probably is.
< gmaxwell>
I was also thinking about future dependency on leveldb... since we got non-atomic flushing, there are many other things possible for the chainstate.
< echeveria>
mongodb?
< gmaxwell>
it has webscale
< gmaxwell>
I don't think mongo's consistency behavior would be acceptable even with non-atomic flushing. :P
< jimpo>
mongo write consistency could be a decent entropy source
< jimpo>
hmm, seems the whole block tree db could be moved to flat files since it's all read into memory on startup anyway
< sipa>
jimpo: i guess!
< wumpus>
I think the idea is to not read it all into memory at some point
< wumpus>
just like with the wallet, FWIW
< wumpus>
for the block index, the pointers could be handles that prompt fetching some more specific data only on demand
< wumpus>
ken2812221: yes, that is funny
< wumpus>
ken2812221: the argument handling code is pretty weird in some regards, now
< wumpus>
I tried to document it, but I guess I failed
< wumpus>
I guess I'm going to untag #14105 and #14100 from 0.17.0
< wumpus>
we're never going to do a release if we try to solve this first
< luke-jr>
wumpus: aren't block indexes so small that it wouldn't be worth doing fetch-on-demand handles? (as opposed to fetch-on-demand map)
< wumpus>
luke-jr: there is certainly some minimum state that would make no sense to fetch on demand
< wumpus>
luke-jr: on the other hand, the structure per block is growing every release, I'm sure there are also things that don't make sense to read and store persistently
< wumpus>
luke-jr: I just meant I don't want to commit to a flat file because of that; also for updates, that would be much harder to manage
< wumpus>
having a block index database makes sense, no matter how exactly it's managed now
< wumpus>
the linting stage is failing but there are no errors
< wumpus>
of course it all passes perfectly locally
< gmaxwell>
luke-jr: so sizeof(CBlockIndex) is 144 bytes, so that's 78MB (and slowly growing) of memory used for little particular purpose, excluding malloc overheads (which I guess are probably at least another 16 bytes per header). The fact that we also keep so many of them in memory means a longer start time, and constant pressure not to add things to those objects, with the result of reduced functionality.
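The 78MB figure is simply entries times per-entry size. Back-solving, 78MB / 144 bytes implies roughly 540,000 headers; that count is inferred from the quoted numbers rather than stated in the log, and the 16-byte malloc overhead is gmaxwell's own rough guess. The hypothetical helper below (`BlockIndexBytes` is an illustrative name, not a Bitcoin Core function) makes the arithmetic explicit:

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: RAM needed to keep one index entry per header in memory.
// entry_bytes is sizeof(CBlockIndex) as quoted in the log (144 bytes);
// malloc_overhead is an assumed per-allocation cost (the log guesses >= 16).
std::uint64_t BlockIndexBytes(std::uint64_t headers, std::uint64_t entry_bytes,
                              std::uint64_t malloc_overhead) {
    const std::uint64_t per_entry = entry_bytes + malloc_overhead;
    // Guard against multiplication overflow for absurd inputs.
    assert(per_entry == 0 || headers <= std::numeric_limits<std::uint64_t>::max() / per_entry);
    return headers * per_entry;
}
```

With 540,000 headers this gives 77,760,000 bytes (about 78MB) without allocator overhead, and 86,400,000 bytes (about 86MB) with 16 bytes of overhead per entry.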
< gmaxwell>
so I think it would make sense to eventually not keep them in memory.
< gmaxwell>
there should be no particular reason that someone couldn't run a fully functional bitcoin node using a few tens of MB of ram... though obviously not one with the lowest possible latency.
< luke-jr>
gmaxwell: sure, I'm just saying, a handle wouldn't be a big improvement
< luke-jr>
seems to make more sense to just create the index object itself on demand
< luke-jr>
and not store anything in memory per-block
< gmaxwell>
ah, I think I agree with that.
< gmaxwell>
Well really the access to the block index could be intermediated through a caching layer, so that the policy of what is in memory vs not is hidden from the rest of the code.
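A caching layer like the one gmaxwell suggests, which also covers luke-jr's point about building index objects on demand, could look roughly like the LRU sketch below. The names `BlockIndexEntry` and `BlockIndexCache` are hypothetical, the loader callback stands in for a read from the block index database, and LRU eviction is just one possible policy, chosen here for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-block record; the real CBlockIndex has many more fields.
struct BlockIndexEntry {
    int height;
    std::uint32_t time;
};

// LRU sketch: callers ask for entries by hash and never learn whether the
// entry was already in memory or had to be built from disk via the loader.
class BlockIndexCache {
public:
    using Loader = std::function<BlockIndexEntry(const std::string&)>;

    BlockIndexCache(std::size_t capacity, Loader loader)
        : capacity_(capacity), loader_(std::move(loader)) {
        assert(capacity_ > 0);
    }

    // Returned by value so callers never hold a reference into the cache
    // that a later eviction could invalidate.
    BlockIndexEntry Get(const std::string& hash) {
        auto it = map_.find(hash);
        if (it != map_.end()) {
            // Hit: move the entry to the front (most recently used).
            lru_.splice(lru_.begin(), lru_, it->second);
            return it->second->second;
        }
        // Miss: evict the least recently used entry if full, then load.
        if (map_.size() == capacity_) {
            map_.erase(lru_.back().first);
            lru_.pop_back();
        }
        lru_.emplace_front(hash, loader_(hash));
        map_[hash] = lru_.begin();
        return lru_.front().second;
    }

    std::size_t Size() const { return map_.size(); }

private:
    using Item = std::pair<std::string, BlockIndexEntry>;
    std::size_t capacity_;
    Loader loader_;
    std::list<Item> lru_;  // front = most recently used
    std::unordered_map<std::string, std::list<Item>::iterator> map_;
};
```

Since the rest of the code only calls `Get()`, the in-memory policy (capacity, eviction order, or keeping nothing per-block at all) can change without touching callers.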
< luke-jr>
sure
< wumpus>
"so I think it would make sense to eventually not keep them in memory" exactly
< wumpus>
I just meant we shouldn't be making any code changes in the direction of making that more difficult
< wumpus>
not so much 'we should be doing that now'
< wumpus>
I'd agree it's certainly not the biggest memory sink at the moment
< gmaxwell>
maybe one of the least useful ones, however.