#bitcoin-core-dev on 2018-06-09: IRC log

01:29 < fanquake> Empact Am alive and well :p

05:37 < murrayn> Is there a reason -O2 is specifically enabled in configure with --enable-debug? Should this not be -Og?

16:34 < gmaxwell> sipa: linked on your SHANI pr is an implementation where someone else noticed the throughput/latency relationship that I noticed.. they also do a 4-way and it's faster (by a small amount) than 2-way.

16:34 < gmaxwell> they get 18% speedup for 2-way over 1-way, and 21% for 4-way over 2-way.

16:35 < gmaxwell> I'm not sure if that difference is even worth it, though perhaps throughput might increase for later cpus.

16:37 < sipa> interesting, i'll try that too

16:37 < gmaxwell> Their implementation might be interesting to look at to see if they had some smarter way of dealing with register pressure.

16:38 < sipa> another remarkable thing i noticed: the speedup of 64-specialized shani over variable length shani was close to 2x

16:38 < sipa> far higher than the ratio observed elsewhere

16:38 < sipa> gmaxwell: from what i can see it's just interleaving

16:39 < gmaxwell> (presumably register churn is why their attempt at 8-way was slower than 2/4-way)
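A toy illustration of what "just interleaving" means here, as a sketch only. It assumes a made-up 64-bit mixing round as a stand-in for a SHA-256 round; no SHA-NI intrinsics are used, so it shows the structure of N-way interleaving, not the measured speedup. `Round`, `Hash1` and `Hash2` are hypothetical names. The 1-way loop is latency-bound because every round waits on the previous one; the 2-way loop carries two independent dependency chains, so an out-of-order core can overlap them and get closer to its throughput limit. Widening further (4-way, 8-way) keeps more state live at once, which is where the register pressure discussed above comes in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for one compression round (hypothetical; NOT SHA-256).
// Each call depends on the previous value of `state`: a serial chain.
inline uint64_t Round(uint64_t state, uint64_t k) {
    state ^= k;
    state *= 0x9E3779B97F4A7C15ULL;  // odd constant, so this step is invertible
    state ^= state >> 29;
    return state;
}

// 1-way: one message at a time; every round waits on the last (latency-bound).
uint64_t Hash1(uint64_t seed, std::size_t rounds) {
    uint64_t s = seed;
    for (std::size_t i = 0; i < rounds; ++i) s = Round(s, i);
    return s;
}

// 2-way: two independent messages advanced in lockstep in one loop body.
// Neither chain reads the other's state, so their instructions can overlap.
void Hash2(uint64_t seed_a, uint64_t seed_b, std::size_t rounds, uint64_t out[2]) {
    uint64_t a = seed_a;
    uint64_t b = seed_b;
    for (std::size_t i = 0; i < rounds; ++i) {
        a = Round(a, i);
        b = Round(b, i);
    }
    out[0] = a;
    out[1] = b;
}
```

The interleaved version must produce exactly the same digests as running the 1-way version twice; only the instruction scheduling changes.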

16:40 < gmaxwell> sipa: The 64-specialized saves expander work, which I guess isn't as fast with shani? or maybe it's just that shani is faster so calling overhead (which the specialized reduces) matters more?

16:41 < provoostenator> Memory management is a pain. I have a device with 1 GB RAM, trying to squeeze as much as possible out of it during IBD. Without swap, if I set it slightly too high, it crashes when dbcache gets too large. With swap, it starts using the swap, which presumably defeats the purpose. Is there any way to _have_ swap but prevent dbcache from using it?

16:42 < gmaxwell> provoostenator: I doubt swapping is actually defeating the purpose, at least if it isn't doing it heavily.

16:42 < gmaxwell> The data that gets swapped is infrequently used stuff first...

16:45 < sipa> gmaxwell: SHANI has special instructions both for expansion and transform

16:45 < provoostenator> It indeed didn't seem very slow, so maybe it's not too bad in practice then. 450 MB dbcache (with maxmempool=5) seems about the max without swap.

17:09 < sipa> gmaxwell: 4-way seems a bit slower here, but that may be due to less than perfectly interleaved code being emitted

21:04 < provoostenator> I have a new theory as to why my aggressive pruning IBD branch is _slower_ than master. Namely that dirty CCoinsCacheEntry read/write doesn't perform well for very large cache sizes. See also https://github.com/bitcoin/bitcoin/pull/12404#issuecomment-395998702

21:05 < provoostenator> (theory, still have to measure this)

21:27 < phantomcircuit> provoostenator, aggressive pruning?

21:27 < sipa> phantomcircuit: #12404

21:27 < gribble> https://github.com/bitcoin/bitcoin/issues/12404 | Prune more aggressively during IBD by Sjors · Pull Request #12404 · bitcoin/bitcoin · GitHub

21:28 < phantomcircuit> oh

21:34 < phantomcircuit> sipa, does flushing the cache still remove everything?

21:35 < sipa> yes

21:37 < phantomcircuit> sipa, and there's no way to flush "upto block x" right?

21:38 < sipa> phantomcircuit: indeed, because there may have been entries created before x, but spent after x, which wouldn't be present on disk

21:39 < sipa> it is possible with the non-atomic flushing since 0.15 (which writes to disk a range of blocks rather than a single up-to-x point)

21:39 < sipa> though it's pretty complicated to reason about

21:59 < phantomcircuit> sipa, so to enable that you'd need to keep around entries that are a record of an entry being deleted?

22:01 < sipa> phantomcircuit: you actually don't

22:02 < sipa> you just need to accurately keep track of (a) the block up to which you've flushed everything and (b) the block up to which effects may be present on disk, and at startup replay the blocks' UTXO effects between those 2

22:02 < sipa> that's already implemented even
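A minimal sketch of the replay idea sipa describes, under loudly stated assumptions: a linear chain with no reorgs (the case sipa says gets far more complicated), coins reduced to integer ids, and hypothetical names `Block`, `DiskState`, `ApplyBlock` and `ReplayOnStartup` that are not Bitcoin Core's code. Marker (a) is `flushed_height`, marker (b) is `head_height`. Because an interrupted flush may have written any subset of its inserts and deletes, the replay relies on each insert/erase being safe to repeat.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical block: the coin ids it creates and the coin ids it spends.
struct Block {
    std::vector<int> created;
    std::vector<int> spent;
};

// Hypothetical on-disk state: the coin set plus the two markers.
struct DiskState {
    std::set<int> coins;
    int flushed_height = 0;  // (a) all effects of blocks <= this are on disk
    int head_height = 0;     // (b) effects of blocks <= this MAY be on disk
};

// Apply one block's UTXO effects. Inserting a coin that is already present,
// or erasing one that is already gone, is a no-op, so partially written
// effects can simply be applied again.
void ApplyBlock(std::set<int>& coins, const Block& b) {
    for (int c : b.created) coins.insert(c);
    for (int c : b.spent) coins.erase(c);
}

// At startup, replay blocks (flushed_height, head_height] in order on top of
// whatever part of the interrupted flush actually reached disk.
void ReplayOnStartup(DiskState& disk, const std::vector<Block>& chain) {
    for (int h = disk.flushed_height + 1; h <= disk.head_height; ++h) {
        ApplyBlock(disk.coins, chain[h]);
    }
    disk.flushed_height = disk.head_height;
}
```

For example, with blocks 1..3 creating coins 1, 2, 3 and blocks 2 and 3 spending coins 1 and 2, a flush from height 1 to height 3 that was cut short can leave disk holding {1, 3} or {} depending on which write landed. Replaying blocks 2 and 3 yields {3} in both cases.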

22:02 < sipa> however, once you introduce partial flushing during reorgs which may overlap etc... it becomes far more complicated

22:03 < phantomcircuit> yeah wasn't thinking about reorgs

22:04 < sipa> all of this is doable, and i think i know all the algorithms necessary to implement it

22:05 < sipa> with the goal of being able to have a background process that just periodically (and asynchronously) flushes the oldest dirty UTXO entries (and wipes the oldest non-dirty ones)

22:05 < sipa> but it's a pretty big amount of work without knowing if it'll actually speed things up :)

22:06 < phantomcircuit> sipa, i had a patch which did this, but broke consensus across reorgs

22:06 < phantomcircuit> it was a substantial speed up

22:06 < phantomcircuit> but that was a while ago, so possibly it wouldn't be as large anymore?

22:06 < sipa> since per-txout in 0.15, performance profiles of such things may have shifted drastically

22:07 < sipa> it could be less or more of a speedup now :)

22:11 < phantomcircuit> yeah

22:11 < phantomcircuit> iirc it was really simple to do

23:39 < phantomcircuit> sipa, the FRESH flag looks a bit confusing

23:41 < phantomcircuit> the idea is that if an entry is added and spent before a flush it's effectively a noop ?

23:41 < sipa> it just means "this entry does not exist in the parent cache, so if it is spent, we can just forget about it"

23:41 < sipa> phantomcircuit: it's *the* major performance gain our cache gives

23:41 < phantomcircuit> ok i get that

23:42 < phantomcircuit> yeah

23:43 < sipa> because it avoids entries ever hitting disk at all
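A simplified sketch of why the FRESH flag avoids disk writes, assuming a two-level model where a `std::map` stands in for the parent (on-disk) view. Bitcoin Core's CCoinsCacheEntry does carry flags named DIRTY and FRESH, but the `Cache`, `Entry`, `Add`, `Spend` and `Flush` below are hypothetical names and not its implementation (for instance, `Add` assumes the outpoint is not already in the parent, and `Spend` assumes the coin exists somewhere). The key line is the FRESH branch in `Spend`: an entry the parent has never seen is simply erased, so a coin created and spent between two flushes costs zero parent writes.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

// Hypothetical flags, mirroring the meaning sipa gives above.
enum Flags : unsigned { DIRTY = 1u, FRESH = 2u };

struct Entry {
    int64_t value;
    bool spent;
    unsigned flags;
};

struct Cache {
    std::map<int, Entry> entries;     // outpoint id -> cached entry
    std::map<int, int64_t>* parent;   // stand-in for the on-disk database

    explicit Cache(std::map<int, int64_t>* p) : parent(p) {}

    // A newly created output does not exist in the parent: DIRTY and FRESH.
    void Add(int id, int64_t value) {
        entries[id] = Entry{value, false, DIRTY | FRESH};
    }

    void Spend(int id) {
        auto it = entries.find(id);
        if (it == entries.end()) {
            // Fetched from the parent, so it is NOT fresh.
            it = entries.emplace(id, Entry{parent->at(id), false, 0}).first;
        }
        if (it->second.flags & FRESH) {
            entries.erase(it);  // parent never saw it: just forget about it
        } else {
            it->second.spent = true;
            it->second.flags |= DIRTY;  // the deletion must reach the parent
        }
    }

    // Write DIRTY entries to the parent, then empty the cache.
    // Returns the number of parent writes (inserts plus deletes).
    int Flush() {
        int writes = 0;
        for (auto& kv : entries) {
            const Entry& e = kv.second;
            if (!(e.flags & DIRTY)) continue;
            if (e.spent) {
                parent->erase(kv.first);
            } else {
                (*parent)[kv.first] = e.value;
            }
            ++writes;
        }
        entries.clear();
        return writes;
    }
};
```

Usage: starting from a parent holding only coin 1, adding and spending coin 2 and then spending coin 1 results in a single parent write at flush time (the deletion of coin 1); coin 2 never hits the parent at all.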

23:47 < phantomcircuit> sipa, yup i definitely get it