< Dabs> uh. md5? isn't that "broken" ?
< luke-jr> Dabs: not because of its speed
< jeremyrubin> Dabs: probably not if you use a unique secret salt
< Dabs> yeah, but ... you'll have to always justify using a "broken" primitive. ... anyway. I guess it depends on the purpose.. hmac md5 or something.
< jeremyrubin> Dabs: I'm not seriously advocating using it... but it's really not broken in this use case.
< jeremyrubin> Dabs: For the same reason RIPEMD-160 is OK
< jeremyrubin> oops RIPEMD != RIPEMD-160
< jeremyrubin> Anyways, most of those collisions require the attacker to fully control both documents being hashed
< jeremyrubin> to collide their hashes
< luke-jr> Dabs: MD5 is not a primitive of BLAKE2b, they just have similar speeds
< wumpus> please don't use md5 in new code
< wumpus> it's, at its current level of brokenness, still fine for some continuing legacy usages, but for new designs it should be avoided
< wumpus> there are modern fast and secure hash functions (indeed, such as BLAKE2), and if speed trumps cryptographic security there are tons of other options
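For reference, a minimal sketch of the "unique secret salt" idea jeremyrubin alludes to, and roughly in the spirit of what the signature cache keys already look like: a secret per-node nonce is hashed into every entry, so an attacker cannot precompute colliding inputs. It uses Bitcoin Core's CSHA256, uint256 and GetRandBytes, but the class and its names are made up for this sketch, not the actual src/script/sigcache.cpp code.

```cpp
// Illustrative only: a per-node secret salt keys the cache entries.
#include "crypto/sha256.h"
#include "random.h"
#include "uint256.h"

#include <vector>

class SaltedEntryHasher
{
    uint256 nonce; // secret salt, chosen randomly at startup

public:
    SaltedEntryHasher() { GetRandBytes(nonce.begin(), 32); }

    // entry = SHA256(nonce || sighash || pubkey || sig)
    uint256 ComputeEntry(const uint256& sighash,
                         const std::vector<unsigned char>& pubkey,
                         const std::vector<unsigned char>& sig) const
    {
        uint256 entry;
        CSHA256().Write(nonce.begin(), 32)
                 .Write(sighash.begin(), 32)
                 .Write(pubkey.data(), pubkey.size())
                 .Write(sig.data(), sig.size())
                 .Finalize(entry.begin());
        return entry;
    }
};
```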
< GitHub95> [bitcoin] MarcoFalke pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/223f4c2dd5fa...61d191fbf953
< GitHub95> bitcoin/master 7d8afb4 fanquake: [Doc] Improve GitHub issue template
< GitHub95> bitcoin/master 61d191f MarcoFalke: Merge #8887: [Doc] Improve GitHub issue template...
< GitHub95> [bitcoin] MarcoFalke closed pull request #8887: [Doc] Improve GitHub issue template (master...link-stackexchange) https://github.com/bitcoin/bitcoin/pull/8887
< btcdrak> ah, great it's been made segwitty
< morcos> gmaxwell: sorry i missed discussion yesterday on the cuckoo sigcache. jeremyrubin and i did put a LOT of work into different designs. but i do think this design is far better than the existing sig cache.
< sipa> morcos: i don't think there is doubt that it's better :)
< morcos> it might be fine to reduce the depth limit or untie it from the size of the table. but i seriously doubt it impacts the performance of ATMP. (once we saw how inefficient the old deleting behavior was anyway)
< jeremyrubin> wumpus: it wasn't a serious suggestion ;)
< morcos> one thing to keep in mind is that with a 40MB sigcache, it probably isn't going to get full except in the event of an attack. i ran a simulation over 6 months, and i was getting very close to the maximal possible hit rate on a 40MB cache.
< morcos> the fact that we delete sigs that were in blocks makes a HUGE difference
< sipa> morcos: yes... but we should know how it behaves in case of an attack as well
< sipa> without attack, i expect the size of the cache to not even matter that much
< sipa> as we'll only ever have live entries in it
< morcos> my idea for how to make it even more foolproof against ever getting full would be to implement a rolling version where you have 2 sets, insert and check in both of them, and then alternately clear them after some amount of time
< sipa> morcos: ha, the old rolling bloom filter design :)
< morcos> sipa: in the absence of attack there are still txs that get generated but are never mined (replaced, doublespent, too low fee)
< sipa> right
< morcos> sipa: yeah i just didn't think it was worth the complication for the first PR
< sipa> morcos: completely agreed
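A self-contained sketch of that rolling two-set idea, using plain std::unordered_set with 64-bit stand-in keys rather than the cuckoo table; the clear trigger here is just an insert counter, standing in for "some amount of time".

```cpp
// Sketch: insert into the current set, check both, and alternately clear
// the older set once enough inserts have happened.
#include <array>
#include <cstdint>
#include <unordered_set>

class RollingSigCache
{
    std::array<std::unordered_set<uint64_t>, 2> sets;
    size_t current = 0;              // index of the set receiving inserts
    size_t inserts_since_swap = 0;
    size_t swap_threshold;

public:
    explicit RollingSigCache(size_t threshold) : swap_threshold(threshold) {}

    bool Contains(uint64_t entry) const
    {
        return sets[0].count(entry) != 0 || sets[1].count(entry) != 0;
    }

    void Insert(uint64_t entry)
    {
        sets[current].insert(entry);
        if (++inserts_since_swap >= swap_threshold) {
            current ^= 1;            // switch to the other set...
            sets[current].clear();   // ...and empty it before reuse
            inserts_since_swap = 0;
        }
    }
};
```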
* sipa just hopes 0.13.1 is out the door soon
< jeremyrubin> generations are better than rolling I think
< sipa> jeremyrubin: yes, i think so
< sipa> especially given your entries are already 128-256 bits, adding a few bits for a generation number doesn't add much
< sipa> using two staggered sets effectively makes your entries twice the size
< sipa> or even 4 times
< morcos> sure, sounds good to me too
< sipa> as in the worst case, you just emptied one and the other one is half full, so you have 25% utilization of your entire set
< jeremyrubin> Also fee is maybe even better than generations
< sipa> use an ARC :)
< jeremyrubin> You can also emulate generations
< jeremyrubin> by createnewblocking a 10mb block or something from mempool
< jeremyrubin> after deleting a bunch of things randomly
< jeremyrubin> if you want 0 memory overhead
< jeremyrubin> sipa: ah, not automatic reference counting
< jeremyrubin> ARC seems not to work for this use case?
< jeremyrubin> Things are write once read once?
< sipa> ah, i'm confusing with the coin cache
< gmaxwell> morcos: part of the reason I was asking questions is that I wasn't sure if we knew how it would perform once someone sent in a huge set of junk transactions. But have no doubt, I'm sure it's much better than what we currently have.
< gmaxwell> I'm surprised it didn't benchmark out as faster with only a single thread.
< morcos> gmaxwell: it was ever so slightly faster (1%) for a single thread, but that's within noise. i just think the amount of time taken in either case is small, it was only the lock contention that was a true problem.
< morcos> but now you inspired a hopefully efficient generation keeping design that will never get full, but only delete old generations if it needs to.. :)
< gmaxwell> sipa: re generation number, I had just assumed the entries would be changed to 252 bits, so you'd only lose at most 6% at a time, and would only delete things if it had to.
< morcos> the idea jeremyrubin and i want to try is actually a separate bit vector that just keeps track of whether each entry is in the current generation or not. that's only touched during insert, so it's lock-free. and then use a simple heuristic to trigger an occasional testforcleanup.
< morcos> when doing that, you just loop over that vector and the garbage-collection vector, and if the current generation not marked for deletion is > 25% of capacity, then mark the old generation for deletion and increase the generation.
< morcos> that has the nice property that you don't actually delete old things unless you have to, but under an attack you'll delete them often enough to stop yourself getting full.
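A rough, single-threaded sketch of what morcos describes above; the real version would sit alongside the cuckoo table's own flag vector and preserve the lock-free insert path, which is omitted here. The 25% trigger matches the number given in the log.

```cpp
// Sketch: a per-slot "current generation" bit alongside the cache's
// garbage-collection (marked-for-deletion) bit. Old generations are only
// marked for deletion when live current-generation entries exceed 25% of
// capacity; otherwise nothing is ever deleted.
#include <cstdint>
#include <vector>

class GenerationTracker
{
    std::vector<bool> in_current_gen; // touched only on insert
    std::vector<bool> marked_deleted; // the cache's GC vector
    size_t capacity;
    uint32_t generation = 0;

public:
    explicit GenerationTracker(size_t cap)
        : in_current_gen(cap, false), marked_deleted(cap, false), capacity(cap) {}

    void OnInsert(size_t slot) { in_current_gen[slot] = true; }
    void OnValidatedInBlock(size_t slot) { marked_deleted[slot] = true; }

    // Called occasionally, e.g. every N inserts (the "testforcleanup").
    void MaybeAdvanceGeneration()
    {
        size_t live_current = 0;
        for (size_t i = 0; i < capacity; ++i) {
            if (in_current_gen[i] && !marked_deleted[i]) ++live_current;
        }
        if (live_current > capacity / 4) {
            for (size_t i = 0; i < capacity; ++i) {
                if (!in_current_gen[i]) marked_deleted[i] = true; // retire old generations
                in_current_gen[i] = false; // current entries become "old" for the next round
            }
            ++generation;
        }
    }
};
```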
< gmaxwell> I wish we could easily delete based on an entry being in the top of the mempool or not.
< gmaxwell> hm. I suppose that when we evict something from the mempool, we could revalidate it again with the cache in delete mode.
< gmaxwell> that would be kinda like that.
< gmaxwell> or insert things with a lower feerate 1 or two generations behind the current generation. (assuming many generations)
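A tiny sketch of that last suggestion, assuming a multi-generation cache: entries for low-feerate transactions get tagged a couple of generations behind, so they age out first. The threshold is arbitrary; CFeeRate lives in amount.h in 0.13-era trees.

```cpp
// Illustrative only: low-feerate entries are inserted two generations back.
#include "amount.h" // CFeeRate

#include <cstdint>

uint32_t GenerationForEntry(uint32_t current_gen, const CFeeRate& feerate)
{
    const CFeeRate cheap_threshold(5000); // satoshis per kB, arbitrary
    if (feerate < cheap_threshold && current_gen >= 2) return current_gen - 2;
    return current_gen;
}
```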
< michagogo> Today's lesson in my programming lessons: threading is hard
< sdaftuar> gmaxwell: i was thinking about doing that (revalidate in delete mode) for things that we don't accept to our mempool, as well
< morcos> gmaxwell: with a 40MB cache, it's just not necessary to make further optimizations. someone would have to flood you with 250k excess signatures in order for you to start marking for deletion things older than when the flood started.
< michagogo> (Also, each iteration of a for loop has its own scope for local variables, and those variables stick around even after the iteration is done and the name is reused for something else)
< morcos> if that time frame is recent enough that it's causing a problem for your cache hit rate, then i think that implies your bigger problem is the tx flood and the DoS of checking all those sigs in the first place
< michagogo> (or rather, reused for the next iteration)
< gmaxwell> morcos: okay, for some reason I thought sipa's measurements had shown that we still had a needlessly low hitrate.
< morcos> gmaxwell: hmm, i'd be interested to see that, but i think any low hit rates we have are probably due to txs we never accepted to our mempool in the first place, not sigs that we accidentally evicted..
< gmaxwell> yea, I may be conflating things. it might be that right now it's from things we never accept, and when sipa tried validating everything, we found it tainted the cache.
< morcos> gmaxwell: i attempted an approximation by inserting 10 random signatures for every tx my node saw and deleting those same 10 if the tx appeared in a block
< morcos> i ran that for 6 months and the hit rate was 98.35% on average, whereas perfect would have been 98.38% and the existing algo was 97.99%. with 40MB
< gmaxwell> hm. why was the existing algo lower?
< morcos> in reality many of those txs probably wouldn't have passed ATMP and made it into the cache, leading to a lower hit rate in practice, but not due to cache filling
< morcos> gmaxwell: primarily because you can cram more signatures in 40MB with the new design
< morcos> and partially because on a reorg the old algo doesn't have them, but the new one does with high probability
< gmaxwell> Makes sense.
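A toy version of that measurement, for flavor only: ten synthetic entries per transaction, looked up and erased when the transaction "confirms", with a crude bounded set standing in for the 40MB table. The figures quoted above came from replaying six months of a live node's transaction stream, not from synthetic traffic like this.

```cpp
// Toy harness: measures the hit rate seen at "confirmation" time when each
// transaction inserts ten random entries on arrival. Roughly 10% of
// transactions are never mined, so their entries linger until evicted.
#include <cstdint>
#include <deque>
#include <iostream>
#include <random>
#include <unordered_set>
#include <vector>

int main()
{
    std::mt19937_64 rng(42);
    std::unordered_set<uint64_t> cache;
    const size_t max_entries = 500000; // stand-in for the 40MB budget

    std::deque<std::vector<uint64_t>> pending; // unconfirmed txs, oldest first
    uint64_t hits = 0, misses = 0;

    for (int step = 1; step <= 2000000; ++step) {
        // A transaction arrives: create and insert ten synthetic entries.
        std::vector<uint64_t> entries(10);
        for (uint64_t& e : entries) {
            e = rng();
            if (cache.size() >= max_entries) cache.erase(cache.begin()); // crude eviction
            cache.insert(e);
        }
        pending.push_back(std::move(entries));

        // Every 2000 transactions a "block" drains the oldest pending txs;
        // most confirm (lookup + erase), some are never mined.
        if (step % 2000 == 0) {
            for (int n = 0; n < 2000 && !pending.empty(); ++n) {
                bool mined = (rng() % 10) != 0; // ~90% make it into a block
                if (mined) {
                    for (uint64_t e : pending.front()) {
                        if (cache.erase(e) > 0) ++hits; else ++misses;
                    }
                }
                pending.pop_front();
            }
        }
    }
    std::cout << "hit rate: " << double(hits) / double(hits + misses) << "\n";
}
```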
< gmaxwell> yea, I'm really happy about the delete flag. I really didn't like the current code's behavior under reorg.
< gmaxwell> not good for network convergence for reorg to be much slower.
< morcos> gmaxwell: heh, probably MANY things to improve there
< gmaxwell> yes. but any improvement is good. :)
< gmaxwell> I'd take a wag that the validation performance limiter is now the additional sha256 used in the lookup.
< morcos> gmaxwell: as far as jeremyrubin and i can tell, the biggest limiter now is how much you blow out your various machine caches with different permutations of coinsviewcache size and flushing algo, sig cache lookup pattern, etc.. although this tends to affect performance AFTER validation has finished. i.e. slows down removeForBlock
< morcos> to really get better performance we'll need to switch data representations to eliminate so much allocating, deallocating and copying of vectors.
< morcos> but sdaftuar's idea is that the next lowest-hanging fruit is to just speed up the path from validation finished to actually relaying to new peers, which isn't optimized at all right now
< gmaxwell> well BIP152 allows you to send the block before it's verified.
< gmaxwell> so that would be an obvious thing to actually do.
< morcos> sure. same issue still applies though. you're just deciding to send it earlier.
< sdaftuar> i think gmaxwell's point is that the way you'd do that would be in the handling of eg ProcessNewBlock, so it'd actually go out earlier (rather than in SendMessages)
< sdaftuar> that's basically what i was planning to do
< gmaxwell> well it gets the whole validation process out of that critical path.
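Purely as a sketch of "getting validation out of the critical path": the helpers below (CheckBlockHeaderAndPoW, AnnounceCompactBlock, ConnectAndFullyValidate) are hypothetical names, not actual Bitcoin Core functions; the only point is the ordering, announce via BIP152 after the cheap checks, validate fully afterwards.

```cpp
// Hypothetical ordering sketch, not real Bitcoin Core code.
class CBlock; // Bitcoin Core's block type, declared only for the sketch
bool CheckBlockHeaderAndPoW(const CBlock&);  // hypothetical: cheap checks
void AnnounceCompactBlock(const CBlock&);    // hypothetical: BIP152 announce
bool ConnectAndFullyValidate(const CBlock&); // hypothetical: expensive part

bool ProcessNewBlockSketch(const CBlock& block)
{
    if (!CheckBlockHeaderAndPoW(block)) return false; // header/PoW only
    AnnounceCompactBlock(block);   // peers can start reconstructing/relaying
    return ConnectAndFullyValidate(block); // signatures, UTXO updates, etc.
}
```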
< michagogo> (Go figure... After a bunch of weeks of having something else every time, this week I'm around but the meeting is cancelled)
< jeremyrubin> gmaxwell: note you don't actually have to re-evaluate the signature, just see if its hash is present on eviction (but you still need to eval script to extract signatures)