< gmaxwell> sipa: did you look at their implementation? https://github.com/armfazh/flo-shani-aesni/blob/master/sha256/flo-shani.c
< sipa> gmaxwell: yes, just interleaving
< provoostenator> sipa: thanks for the extra context. Maintaining a large dbcache is mainly useful during IBD, so can't the problem of reorgs be avoided by only doing the optimization for very deep blocks?
< provoostenator> And just in case, if during IBD an alternative set of headers is found that would trigger a deep reorg, you'd flush the cache and turn off the optimization, before switching to that new branch.
< provoostenator> Right now it seems that 500 MB < dbcache < 7000 MB is a performance dead zone. Though I can try tweaking #11658 to see where the diminishing returns are.
< gribble> https://github.com/bitcoin/bitcoin/issues/11658 | During IBD, when doing pruning, prune 10% extra to avoid pruning again soon after by luke-jr · Pull Request #11658 · bitcoin/bitcoin · GitHub
< provoostenator> Do I understand correctly that the only way for a coin cache entry to be dirty, is if the UTXO existed before the last flush and was spent since then? Would it be worth trying to bypass the cache in those cases and update the disk when spending a UTXO that's not in the cache?
< provoostenator> I wonder if OS's make any effort to optimize a write to the same physical place on disk that you just read from.
< sipa> provoostenator: it can be dirty because it's created after the last flush, or spent after the last flush while it was created before
< sipa> and of course we can bypass the cache... if we don't care about the performance it offers
< sipa> provoostenator: i guess we could only do the background flushing during IBD, but that's still very scary
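(For illustration, a rough sketch of the entry lifecycle sipa describes, loosely modeled on CCoinsViewCache; the struct, the string key type and the function names are made up for the example, not the real interface, which is keyed by COutPoint.)

    #include <string>
    #include <unordered_map>

    // Rough model: DIRTY = differs from the state on disk (must be written on flush),
    // FRESH = the disk has no version of this coin at all.
    struct CacheEntrySketch {
        bool dirty = false;
        bool fresh = false;
        bool spent = false;
    };

    // Illustrative key type; the real cache is keyed by COutPoint.
    std::unordered_map<std::string, CacheEntrySketch> g_cache;

    // A coin created after the last flush: always DIRTY, and FRESH if the
    // backing (disk) view has never seen it.
    void AddCoinSketch(const std::string& outpoint, bool known_on_disk) {
        CacheEntrySketch& e = g_cache[outpoint];
        e.dirty = true;
        e.fresh = !known_on_disk;
    }

    // A coin spent after the last flush: if it was FRESH (created and spent
    // entirely between flushes) it can simply be dropped; otherwise the spend
    // itself makes the entry DIRTY, since the deletion must reach disk.
    void SpendCoinSketch(const std::string& outpoint) {
        auto it = g_cache.find(outpoint);
        if (it == g_cache.end()) return;  // real code would fetch from disk first
        if (it->second.fresh) {
            g_cache.erase(it);
        } else {
            it->second.dirty = true;
            it->second.spent = true;
        }
    }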
< provoostenator> "dirty because it's created after the last flush" - how does that work? I thought they always get the FRESH flag in that case.
< provoostenator> (I meant DIRTY flag, not dirty in general db terminology)
< provoostenator> Of course I do care about the performance impact of such a change. My working theory is that too many DIRTY entries slow things down to a state that's worse than a smaller cache. So perhaps preventing accumulation of DIRTY entries would prevent that.
< provoostenator> (my "aggresive" pruning branch is much slower than master, despite the cache growing much bigger)
< provoostenator> I'm currently running IBD from block 320,000 - 480,000 on my iMac several times with decreasing dbcache (and once from genesis without interrupting) to see what happens.
< provoostenator> My hypothesis, based on what I've seen so far, is that when running from genesis with an "infinite" cache, going from 320K to 480K will be fastest. Followed by starting at 320K with infinite cache. A 3 GB cache will be slower, but a 500 MB cache will be _faster_ than a 3 GB cache. Possibly regardless of pruning.
< gmaxwell> I think that would be very surprising.
< provoostenator> Indeed
< bitcoin-git> [bitcoin] ken2812221 opened pull request #13426: [WIP, bugfix] Add u8path and u8string to boost to fix #13103 (master...u8path_u8string) https://github.com/bitcoin/bitcoin/pull/13426
< sipa> provoostenator: FRESH implies DIRTY
< sipa> provoostenator: too many dirty entries slows things down... there may be a memory locality effect from just having many entries, but i don't see how dirtiness could impact that
< provoostenator> sipa: ah I see, so I should have said "DIRTY but not FRESH"
< provoostenator> Is there any sorting going on when entries are added?
< sipa> no
< sipa> it's a hash table
< sipa> provoostenator: i meant to say "about too many dirty entries slowing things down"
< sipa> i don't believe that can be the case
< sipa> provoostenator: the time to flush itself may be proportional or worse to the number of dirty entries, though
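(A sketch of why the flush itself scales with the number of dirty entries, assuming a leveldb-style write batch; the FakeBatch type and the function names are placeholders, not Bitcoin Core's actual flushing interface.)

    #include <string>
    #include <unordered_map>

    struct CacheEntrySketch { bool dirty = false; bool fresh = false; bool spent = false; std::string data; };

    // Placeholder for a leveldb-style write batch; illustrative only.
    struct FakeBatch {
        void Write(const std::string& /*key*/, const std::string& /*value*/) {}
        void Erase(const std::string& /*key*/) {}
    };

    // A full flush walks the whole map, but only DIRTY entries cause disk work:
    // spent-and-not-FRESH entries become deletions, the rest become writes.
    // So the disk cost (and much of the time) grows with the dirty count.
    void FlushSketch(std::unordered_map<std::string, CacheEntrySketch>& cache, FakeBatch& batch) {
        for (auto it = cache.begin(); it != cache.end(); ) {
            if (it->second.dirty) {
                if (it->second.spent) {
                    if (!it->second.fresh) batch.Erase(it->first);  // FRESH+spent never reached disk
                } else {
                    batch.Write(it->first, it->second.data);
                }
            }
            it = cache.erase(it);  // after a full flush the cache is emptied
        }
    }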
< provoostenator> From what I saw on my AWS nodes, the pruning (which usually coincided with a cache flush) took just minutes and happened just a dozen or so times, on an IBD measured in days.
< sipa> right
< sipa> that seems expected
< provoostenator> So if an entry is not found in the cache, it starts walking through the disk looking for it? But there's no reason to assume that would be slower than without a cache.
< sipa> of course disk will be slower than cache
< sipa> is it possible you're running into swap space?
< provoostenator> Amazon Ubuntu images don't have swap on by default, so I don't think so, but I already deleted those machines.
< provoostenator> At least I can rule that out in this current experiment, since I have 48 GB RAM
< provoostenator> When there's a cache, every time it calls CCoinsViewCache::FetchCoin it walks through the memory cache and if nothing is found walks through the disk cache. So there's potentially some duplicate effort, maybe that becomes a problem?
< provoostenator> Oh no, because it's a hash table, it's not walking, it just fetches it.
< provoostenator> The term "iterator" confused me there.
< sipa> yes
< sipa> and on disk, it just fetches from leveldb, which has indexes and other structure to guide the search - it isn't really iterating either
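(A minimal sketch of that lookup order: one hash-table probe, then a keyed database read on a miss. GetFromDiskSketch is a stand-in for the leveldb-backed view; none of these names match the real FetchCoin signature.)

    #include <optional>
    #include <string>
    #include <unordered_map>

    // Illustrative in-memory cache: outpoint -> serialized coin.
    std::unordered_map<std::string, std::string> g_memcache;

    // Stand-in for the leveldb-backed parent view. leveldb finds keys via its
    // own sorted tables and index blocks, so this is a keyed read, not a scan.
    std::optional<std::string> GetFromDiskSketch(const std::string& outpoint) {
        (void)outpoint;
        return std::nullopt;  // placeholder
    }

    // FetchCoin-style flow: a hash-table probe, then (on a miss) one keyed
    // database read, with the result inserted into the cache for next time.
    std::optional<std::string> FetchCoinSketch(const std::string& outpoint) {
        auto it = g_memcache.find(outpoint);          // O(1) expected: no walking
        if (it != g_memcache.end()) return it->second;
        auto coin = GetFromDiskSketch(outpoint);
        if (coin) g_memcache.emplace(outpoint, *coin);
        return coin;
    }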
< provoostenator> If a big cache causes a slowdown compared to a small cache, it has to be the in-memory stuff I would guess.
< sipa> how long does flushing take?
< sipa> it can be minutes even on high end systems for multi-gb caches
< provoostenator> Minutes as far as I know, let me upload the logs...
< provoostenator> https://ufile.io/tlvv3 (prune3000_sjors.log was the slowest, I gave up after 5 days)
< provoostenator> TIL about OnionShare, so here you go: http://4nzykwc37ncqcwhp.onion/recall-shiftless
< sipa> gmaxwell: i win
< sipa> intel's SSE4 sha256 code, transliterated to sse4 intrinsics... is 8% faster than the asm version
< sipa> (on a Ryzen system)
< sipa> on i7 the intrinsics version is slightly slower (0.7% slower for long hashes, 1.5% slower for double-SHA256, 4% slower for 32-byte hashes)
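(For context, a comparison like this typically boils down to a timing loop of roughly the shape below. This is a generic sketch, not the benchmark sipa actually ran; the callable is assumed to hash 64-byte blocks into an 8-word state, like the asm and intrinsics transforms being compared.)

    #include <chrono>
    #include <cstdint>
    #include <vector>

    // Generic micro-benchmark sketch: time a SHA256 transform over many blocks
    // and return 64-byte blocks hashed per second. The callable is assumed to
    // have the shape transform(state, data, nblocks).
    template <typename Transform>
    double BlocksPerSecondSketch(Transform transform) {
        std::vector<unsigned char> data(64 * 1024, 0xAB);  // 1024 dummy blocks
        uint32_t state[8] = {0};
        const int iters = 10000;
        const auto t0 = std::chrono::steady_clock::now();
        for (int i = 0; i < iters; ++i) transform(state, data.data(), size_t{1024});
        const auto t1 = std::chrono::steady_clock::now();
        const double secs = std::chrono::duration<double>(t1 - t0).count();
        return (1024.0 * iters) / secs;
    }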