#bitcoin-core-dev on 2018-06-10 — searchable irc log

02:48 < gmaxwell> sipa: did you look at their implementation? https://github.com/armfazh/flo-shani-aesni/blob/master/sha256/flo-shani.c

02:52 < sipa> gmaxwell: yes, just interleaving

07:37 < ilufang> quit

09:08 < provoostenator> sipa: thanks for the extra context. Maintaining a large dbcache is mainly useful during IBD, so can't the problem of reorgs be avoided by only doing the optimization for very deep blocks?

09:10 < provoostenator> And just in case, if during IBD an alternative set of headers is found that would trigger a deep reorg, you'd flush the cache and turn off the optimization, before switching to that new branch.

09:14 < provoostenator> Right now it seems that 500 MB < dbcache < 7000 MB is a performance dead zone. Though I can try tweaking #11658 to see where the diminishing returns are.

09:14 < gribble> https://github.com/bitcoin/bitcoin/issues/11658 | During IBD, when doing pruning, prune 10% extra to avoid pruning again soon after by luke-jr · Pull Request #11658 · bitcoin/bitcoin · GitHub

15:58 < provoostenator> Do I understand correctly that the only way for a coin cache entry to be dirty, is if the UTXO existed before the last flush and was spent since then? Would it be worth trying to bypass the cache in those cases and update the disk when spending a UTXO that's not in the cache?

16:01 < provoostenator> I wonder if OSes make any effort to optimize a write to the same physical place on disk that you just read from.

16:02 < sipa> provoostenator: it can be dirty because it's created after the last flush, or spent after the last flush while it was created before

16:03 < sipa> and of course we can bypass the cache... if we don't care about the performance it offers

16:04 < sipa> provoostenator: i guess we could only do the background flushing during IBD, but that's still very scary

16:07 < provoostenator> "dirty because it's created after the last flush" - how does that work? I thought they always get the FRESH flag in that case.

16:08 < provoostenator> (I meant DIRTY flag, not dirty in general db terminology)

16:09 < provoostenator> Of course I do care about the performance impact of such a change. My working theory is that too many DIRTY entries slow things down to a state that's worse than a smaller cache. So perhaps preventing accumulation of DIRTY entries would prevent that.

16:10 < provoostenator> (my "aggressive" pruning branch is much slower than master, despite the cache growing much bigger)

16:11 < provoostenator> I'm currently running IBD from block 320,000 - 480,000 on my iMac several times with decreasing dbcache (and once from genesis without interrupting) to see what happens.

16:18 < provoostenator> My hypothesis, based on what I've seen so far, is that when running from genesis with an "infinite" cache, going from 320K to 480K will be fastest. Followed by starting at 320K with infinite cache. A 3 GB cache will be slower, but a 500 MB cache will be _faster_ than a 3 GB cache. Possibly regardless of pruning.

16:19 < gmaxwell> I think that would be very surprising.

16:20 < provoostenator> Indeed

16:24 < bitcoin-git> [bitcoin] ken2812221 opened pull request #13426: [WIP, bugfix] Add u8path and u8string to boost to fix #13103 (master...u8path_u8string) https://github.com/bitcoin/bitcoin/pull/13426

16:31 < sipa> provoostenator: FRESH implies DIRTY

16:31 < sipa> provoostenator: too many dirty entries slows things down... there may be a memory locality effect from just having many entries, but i don't see how dirtiness could impact that

16:32 < provoostenator> sipa: ah I see, so I should have said "DIRTY but not FRESH"
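
For readers following the DIRTY/FRESH exchange, a simplified C++ model of the two flags may help. It is a sketch under stated assumptions: `EntryFlags`, `CacheEntry`, `CreatedSinceFlush`, `SpendLoadedCoin`, `CanEraseOnSpend` and `IsDirtyNotFresh` are hypothetical names, not Bitcoin Core's `CCoinsViewCache` code. It only encodes the rules stated in the log (a coin created since the last flush is FRESH and therefore DIRTY; a coin that was on disk and got spent is DIRTY but not FRESH), plus the simplifying assumption that a spent FRESH coin can be dropped from the cache instead of being flushed.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified model of a UTXO cache entry's flags.
// Not Bitcoin Core's actual CCoinsCacheEntry.
enum EntryFlags : uint8_t {
    NONE = 0,
    DIRTY = 1 << 0,  // differs from the parent view; must be written on flush
    FRESH = 1 << 1,  // parent view has no unspent copy of this coin
};

struct CacheEntry {
    bool spent;
    uint8_t flags;
};

// Coin created after the last flush: it never existed on disk,
// so it is both FRESH and DIRTY (FRESH implies DIRTY).
CacheEntry CreatedSinceFlush() {
    return CacheEntry{false, DIRTY | FRESH};
}

// Coin loaded from disk and then spent: the disk copy must be
// deleted on the next flush, so it becomes DIRTY but not FRESH.
CacheEntry SpendLoadedCoin(CacheEntry e) {
    e.spent = true;
    e.flags |= DIRTY;
    return e;
}

// Assumed simplification: spending a FRESH coin means the parent never
// saw it, so the entry could be erased rather than written out.
bool CanEraseOnSpend(const CacheEntry& e) {
    assert(!(e.flags & FRESH) || (e.flags & DIRTY));  // FRESH implies DIRTY
    return (e.flags & FRESH) != 0;
}

// The case provoostenator was asking about.
bool IsDirtyNotFresh(const CacheEntry& e) {
    return (e.flags & DIRTY) && !(e.flags & FRESH);
}
```

In this model only the "DIRTY but not FRESH" entries correspond to coins that existed before the last flush and were spent since.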

16:33 < provoostenator> Is there any sorting going on when entries are added?

16:34 < sipa> no

16:34 < sipa> it's a hash table

16:34 < sipa> provoostenator: i meant to say "about too many dirty entries slowing things down"

16:34 < sipa> i don't believe that can be the case

16:39 < sipa> provoostenator: the time to flush itself may be proportional to (or worse than linear in) the number of dirty entries, though

16:42 < provoostenator> From what I saw on my AWS nodes, the pruning (which usually coincided with a cache flush) took just minutes and happened just a dozen or so times, on an IBD measured in days.

16:43 < sipa> right

16:43 < sipa> that seems expected
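
sipa's point that flush time grows with the number of dirty entries can be illustrated with a hypothetical flush loop. `Entry`, `Cache`, `Disk` and `Flush` are illustrative names, a `std::map` stands in for the on-disk LevelDB database, and FRESH handling is omitted; the real batch write in Bitcoin Core is more involved. The loop scans every cached entry, but only DIRTY ones produce writes.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>

// Hypothetical sketch, not Bitcoin Core's BatchWrite.
struct Entry {
    long value;  // placeholder for the coin's data
    bool spent;
    bool dirty;
};

using Cache = std::unordered_map<std::string, Entry>;
using Disk = std::map<std::string, long>;  // stand-in for LevelDB

// Returns the number of writes (puts + deletes) issued to the disk.
std::size_t Flush(Cache& cache, Disk& disk) {
    std::size_t writes = 0;
    for (const auto& kv : cache) {
        const Entry& e = kv.second;
        if (!e.dirty) continue;        // clean entries only cost the scan
        if (e.spent) {
            disk.erase(kv.first);      // spent coin: delete it from disk
        } else {
            disk[kv.first] = e.value;  // new or modified coin: write it
        }
        ++writes;
    }
    cache.clear();  // in this sketch the cache is emptied after flushing
    return writes;
}
```

The scan is linear in the cache size and the write volume is linear in the dirty count; with a real database each write also carries its own cost, which is one way flushing a multi-GB cache can take minutes.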

16:43 < provoostenator> So if an entry is not found in the cache, it starts walking through the disk looking for it? But there's no reason to assume that would be slower than without cache.

16:44 < sipa> of course disk will be slower than cache

16:44 < sipa> is it possible you're running into swap space?

16:45 < provoostenator> Amazon Ubuntu images don't have swap on by default, so I don't think so, but I already deleted those machines.

16:49 < provoostenator> At least I can rule that out in this current experiment, since I have 48 GB RAM

16:52 < provoostenator> When there's a cache, every time it calls CCoinsViewCache::FetchCoin it walks through the memory cache and if nothing is found walks through the disk cache. So there's potentially some duplicate effort, maybe that becomes a problem?

16:53 < provoostenator> Oh no, because it's a hash table, it's not walking, it just fetches it.

16:53 < provoostenator> The term "iterator" confused me there.

16:54 < sipa> yes

16:54 < sipa> and on disk, it just fetches from leveldb, which has indexes and other structure to guide the search - it isn't really iterating either
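
The lookup path discussed above can be sketched as a two-level fetch: probe an in-memory hash table, and only on a miss consult an indexed backing store. `CoinsCache`, `Fetch`, `hits` and `misses` are hypothetical names loosely inspired by the idea behind `CCoinsViewCache::FetchCoin`, not its actual code; a `std::map` stands in for LevelDB, and `std::optional` requires C++17. Neither level iterates over entries.

```cpp
#include <map>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical two-level lookup: O(1) hash probe in memory, then an
// indexed (not linear) lookup in the backing store on a miss.
class CoinsCache {
public:
    explicit CoinsCache(std::map<std::string, long> disk) : disk_(std::move(disk)) {}

    std::optional<long> Fetch(const std::string& outpoint) {
        auto it = cache_.find(outpoint);  // hash lookup, no scan
        if (it != cache_.end()) {
            ++hits_;
            return it->second;
        }
        ++misses_;
        auto dit = disk_.find(outpoint);  // indexed lookup in the "disk"
        if (dit == disk_.end()) return std::nullopt;
        cache_.emplace(outpoint, dit->second);  // keep it for next time
        return dit->second;
    }

    int hits() const { return hits_; }
    int misses() const { return misses_; }

private:
    std::map<std::string, long> disk_;  // stand-in for LevelDB
    std::unordered_map<std::string, long> cache_;
    int hits_ = 0;
    int misses_ = 0;
};
```

A repeated fetch of the same outpoint hits the in-memory table, so a cache miss adds one backing-store lookup rather than a walk over the cache.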

16:56 < provoostenator> If a big cache causes a slowdown compared to a small cache, it has to be the in-memory stuff I would guess.

17:04 < sipa> how long does flushing take?

17:04 < sipa> it can be minutes even on high end systems for multi-gb caches

17:07 < provoostenator> Minutes as far as I know, let me upload the logs...

17:17 < provoostenator> https://ufile.io/tlvv3 (prune3000_sjors.log was the slowest, I gave up after 5 days)

17:20 < provoostenator> TIL about OnionShare, so here you go: http://4nzykwc37ncqcwhp.onion/recall-shiftless

23:35 < sipa> gmaxwell: i win

23:35 < sipa> intel's SSE4 sha256 code, transliterated to sse4 intrinsics... is 8% faster than the asm version

23:35 < sipa> (on a Ryzen system)

23:49 < sipa> on i7 the intrinsics version is slightly slower (0.7% slower for long hashes, 1.5% slower for double-SHA256, 4% slower for 32-byte hashes)