< Dabs> uh. md5? isn't that "broken" ?
< luke-jr> Dabs: not because of its speed
< jeremyrubin> Dabs: probably not if you use a unique secret salt
< Dabs> yeah, but ... you'll have to always justify using a "broken" primitive. ... anyway. I guess it depends on the purpose.. hmac md5 or something.
< jeremyrubin> Dabs: I'm not seriously advocating using it... but it's really not broken in this use case.
< jeremyrubin> Dabs: For the same reason RIPEMD-160 is OK
< jeremyrubin> oops RIPEMD != RIPEMD-160
< jeremyrubin> Anyways, most of those collisions require the attacker to fully control both documents being hashed
< jeremyrubin> to collide their hashes
< luke-jr> Dabs: MD5 is not a primitive of BLAKE2b, they just have similar speeds
< wumpus> please don't use md5 in new code
< wumpus> it's, at its current level of brokenness, still fine for some continuing legacy usages, but for new designs it should be avoided
< wumpus> there are modern fast and secure hash functions (indeed, such as BLAKE2), and if speed trumps cryptographic security there are tons of other options
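For reference, a minimal sketch of the "unique secret salt" idea jeremyrubin alludes to, and roughly in the spirit of what the signature cache keys already look like: a secret per-node nonce is hashed into every entry, so an attacker cannot precompute colliding inputs. It uses Bitcoin Core's CSHA256, uint256 and GetRandBytes, but the class and its names are made up for this sketch, not the actual src/script/sigcache.cpp code.

```cpp
// Illustrative only: a per-node secret salt keys the cache entries.
#include "crypto/sha256.h"
#include "random.h"
#include "uint256.h"

#include <vector>

class SaltedEntryHasher
{
    uint256 nonce; // secret salt, chosen randomly at startup

public:
    SaltedEntryHasher() { GetRandBytes(nonce.begin(), 32); }

    // entry = SHA256(nonce || sighash || pubkey || sig)
    uint256 ComputeEntry(const uint256& sighash,
                         const std::vector<unsigned char>& pubkey,
                         const std::vector<unsigned char>& sig) const
    {
        uint256 entry;
        CSHA256().Write(nonce.begin(), 32)
                 .Write(sighash.begin(), 32)
                 .Write(pubkey.data(), pubkey.size())
                 .Write(sig.data(), sig.size())
                 .Finalize(entry.begin());
        return entry;
    }
};
```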
< GitHub95> [bitcoin] MarcoFalke pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/223f4c2dd5fa...61d191fbf953
< GitHub95> bitcoin/master 7d8afb4 fanquake: [Doc] Improve GitHub issue template
< GitHub95> bitcoin/master 61d191f MarcoFalke: Merge #8887: [Doc] Improve GitHub issue template...
< GitHub95> [bitcoin] MarcoFalke closed pull request #8887: [Doc] Improve GitHub issue template (master...link-stackexchange) https://github.com/bitcoin/bitcoin/pull/8887
< btcdrak> ah, great it's been made segwitty
< morcos> gmaxwell: sorry i missed discussion yesterday on the cuckoo sigcache. jeremyrubin and i did put a LOT of work into different designs. but i do think this design is far better than the existing sig cache.
< sipa> morcos: i don't think there is doubt that it's better :)
< morcos> it might be fine to reduce the depth limit or untie it from the size of the table. but i seriously doubt it impacts the performance of ATMP. (once we saw how inefficient the old deleting behavior was anyway)
< jeremyrubin> wumpus: it wasn't a serious suggestion ;)
< morcos> one thing to keep in mind is that with a 40MB sigcache, it probably isn't going to get full except in the event of an attack. i ran a simulation over 6 months, and i was getting very close to the maximal possible hit rate on a 40MB cache.
< morcos> the fact that we delete sigs that were in blocks makes a HUGE difference
< sipa> morcos: yes... but we should know how it behaves in case of an attack as well
< sipa> without attack, i expect the size of the cache to not even matter that much
< sipa> as we'll only ever have live entries in it
< morcos> my idea for how to make it even more foolproof against ever getting full would be to implement a rolling version where you have 2 sets, insert and check in both of them, and then alternately clear them after some amount of time
< sipa> morcos: ha, the old rolling bloom filter design :)
< morcos> sipa: in the absence of attack there are still txs that get generated but are never mined (replaced, doublespent, too low fee)
< sipa> right
< morcos> sipa: yeah i just didn't think it was worth the complication for the first PR
< sipa> morcos: completely agreed
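A self-contained sketch of that rolling two-set idea, using plain std::unordered_set with 64-bit stand-in keys rather than the cuckoo table; the clear trigger here is just an insert counter, standing in for "some amount of time".

```cpp
// Sketch: insert into the current set, check both, and alternately clear
// the older set once enough inserts have happened.
#include <array>
#include <cstdint>
#include <unordered_set>

class RollingSigCache
{
    std::array<std::unordered_set<uint64_t>, 2> sets;
    size_t current = 0;              // index of the set receiving inserts
    size_t inserts_since_swap = 0;
    size_t swap_threshold;

public:
    explicit RollingSigCache(size_t threshold) : swap_threshold(threshold) {}

    bool Contains(uint64_t entry) const
    {
        return sets[0].count(entry) != 0 || sets[1].count(entry) != 0;
    }

    void Insert(uint64_t entry)
    {
        sets[current].insert(entry);
        if (++inserts_since_swap >= swap_threshold) {
            current ^= 1;            // switch to the other set...
            sets[current].clear();   // ...and empty it before reuse
            inserts_since_swap = 0;
        }
    }
};
```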
* sipa just hopes 0.13.1 is out the door soon
< jeremyrubin> generations are better than rolling I think
< sipa> jeremyrubin: yes, i think so
< sipa> especially given your entries are already 128-256 bits, adding a few bits for a generation number doesn't add much
< sipa> using two staggered sets effectively makes your entries twice the size
< sipa> or even 4 times
< morcos> sure, sounds good to me too
< sipa> as in the worst case, you just emptied one and the other one is half full, so you have 25% utilization of your entire set
< jeremyrubin> Also fee is maybe even better than generations
< sipa> use an ARC :)
< jeremyrubin> You can also emulate generations
< jeremyrubin> by createnewblocking a 10mb block or something from mempool
< jeremyrubin> after deleting a bunch of things randomly
< jeremyrubin> if you want 0 memory overhead
< jeremyrubin> sipa: ah, not automatic reference counting
< jeremyrubin> ARC seems not to work for this use case?
< jeremyrubin> Things are write once read once?
< sipa> ah, i'm confusing with the coin cache
< gmaxwell> morcos: part of the reason I was asking questions is that I wasn't sure if we knew how it would perform once someone sent in a huge set of junk transactions. But have no doubt, I'm sure it's much better than what we currently have.
< gmaxwell> I'm surprised it didn't benchmark out as faster with only a single thread.
< morcos> gmaxwell: it was ever so slightly faster (1%) for a single thread, but that's within noise. i just think the amount of time taken in either case is small, it was only the lock contention that was a true problem.
< morcos> but now you inspired a hopefully efficient generation keeping design that will never get full, but only delete old generations if it needs to.. :)
< gmaxwell> sipa: re generation number, I had just assumed the entries would be changed to 252 bits, so you'd only lose at most 6% at a time, and would only delete things if it had to.
< morcos> the idea jeremyrubin and i want to try is actually a separate bit vector that just keeps track of whether each entry is in the current generation or not. that's only touched during insert, so it's lock-free. and then use a simple heuristic to trigger an occasional testforcleanup.
< morcos> when doing that, you just loop over that vector and the garbage-collection vector, and if the current generation not marked for deletion is > 25% of capacity, then mark the old generation for deletion and increase the generation.
< morcos> that has the nice property that you don't actually delete old things unless you have to, but under an attack you'll delete them often enough to stop yourself getting full.
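A rough, single-threaded sketch of what morcos describes above; the real version would sit alongside the cuckoo table's own flag vector and preserve the lock-free insert path, which is omitted here. The 25% trigger matches the number given in the log.

```cpp
// Sketch: a per-slot "current generation" bit alongside the cache's
// garbage-collection (marked-for-deletion) bit. Old generations are only
// marked for deletion when live current-generation entries exceed 25% of
// capacity; otherwise nothing is ever deleted.
#include <cstdint>
#include <vector>

class GenerationTracker
{
    std::vector<bool> in_current_gen; // touched only on insert
    std::vector<bool> marked_deleted; // the cache's GC vector
    size_t capacity;
    uint32_t generation = 0;

public:
    explicit GenerationTracker(size_t cap)
        : in_current_gen(cap, false), marked_deleted(cap, false), capacity(cap) {}

    void OnInsert(size_t slot) { in_current_gen[slot] = true; }
    void OnValidatedInBlock(size_t slot) { marked_deleted[slot] = true; }

    // Called occasionally, e.g. every N inserts (the "testforcleanup").
    void MaybeAdvanceGeneration()
    {
        size_t live_current = 0;
        for (size_t i = 0; i < capacity; ++i) {
            if (in_current_gen[i] && !marked_deleted[i]) ++live_current;
        }
        if (live_current > capacity / 4) {
            for (size_t i = 0; i < capacity; ++i) {
                if (!in_current_gen[i]) marked_deleted[i] = true; // retire old generations
                in_current_gen[i] = false; // current entries become "old" for the next round
            }
            ++generation;
        }
    }
};
```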
< gmaxwell> I wish we could easily delete based on an entry being in the top of the mempool or not.
< gmaxwell> hm. I suppose that when we evict something from the mempool, we could revalidate it again with the cache in delete mode.
< gmaxwell> that would be kinda like that.
< gmaxwell> or insert things with a lower feerate 1 or two generations behind the current generation. (assuming many generations)
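A tiny sketch of that last suggestion, assuming a multi-generation cache: entries for low-feerate transactions get tagged a couple of generations behind, so they age out first. The threshold is arbitrary; CFeeRate lives in amount.h in 0.13-era trees.

```cpp
// Illustrative only: low-feerate entries are inserted two generations back.
#include "amount.h" // CFeeRate

#include <cstdint>

uint32_t GenerationForEntry(uint32_t current_gen, const CFeeRate& feerate)
{
    const CFeeRate cheap_threshold(5000); // satoshis per kB, arbitrary
    if (feerate < cheap_threshold && current_gen >= 2) return current_gen - 2;
    return current_gen;
}
```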
< michagogo> Today's lesson in my programming lessons: threading is hard
< sdaftuar> gmaxwell: i was thinking about doing that (revalidate in delete mode) for things that we don't accept to our mempool, as well
< morcos> gmaxwell: with a 40MB cache, it's just not necessary to make further optimizations. someone would have to flood you with 250k excess signatures in order for you to start marking for deletion things older than when the flood started.
< michagogo> (Also, each iteration of a for loop has its own scope for local variables, and those variables stick around even after the iteration is done and the name is reused for something else)
< morcos> if that time frame is recent enough that it's causing a problem for your cache hit rate, then i think that implies your bigger problem is the tx flood and the DoS of checking all those sigs in the first place
< michagogo> (or rather, reused for the next iteration)
< gmaxwell> morcos: okay, for some reason I thought sipa's measurements had shown that we still had a needlessly low hitrate.
< morcos> gmaxwell: hmm, i'd be interested to see that, but i think any low hit rates we have are probably due to txs we never accepted to our mempool in the first place, not sigs that we accidentally evicted..
< gmaxwell> yea, I may be conflating things. it might be that right now it's from things we never accept, and when sipa tried validating everything, we found it tainted the cache.
< morcos> gmaxwell: i attempted an approximation by inserting 10 random signatures for every tx my node saw and deleting those same 10 if the tx appeared in a block
< morcos> i ran that for 6 months and the hit rate was 98.35% on average, whereas perfect would have been 98.38% and the existing algo was 97.99%. with 40MB
< gmaxwell> hm. why was the existing algo lower?
< morcos> in reality many of those txs probably wouldn't have passed ATMP and made it into the cache, leading to a lower hit rate in practice, but not due to cache filling
< morcos> gmaxwell: primarily because you can cram more signatures in 40MB with the new design
< morcos> and partially because on a reorg the old algo doesn't have them, but the new one does with high probability
< gmaxwell> Makes sense.
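A toy version of that measurement, for flavor only: ten synthetic entries per transaction, looked up and erased when the transaction "confirms", with a crude bounded set standing in for the 40MB table. The figures quoted above came from replaying six months of a live node's transaction stream, not from synthetic traffic like this.

```cpp
// Toy harness: measures the hit rate seen at "confirmation" time when each
// transaction inserts ten random entries on arrival. Roughly 10% of
// transactions are never mined, so their entries linger until evicted.
#include <cstdint>
#include <deque>
#include <iostream>
#include <random>
#include <unordered_set>
#include <vector>

int main()
{
    std::mt19937_64 rng(42);
    std::unordered_set<uint64_t> cache;
    const size_t max_entries = 500000; // stand-in for the 40MB budget

    std::deque<std::vector<uint64_t>> pending; // unconfirmed txs, oldest first
    uint64_t hits = 0, misses = 0;

    for (int step = 1; step <= 2000000; ++step) {
        // A transaction arrives: create and insert ten synthetic entries.
        std::vector<uint64_t> entries(10);
        for (uint64_t& e : entries) {
            e = rng();
            if (cache.size() >= max_entries) cache.erase(cache.begin()); // crude eviction
            cache.insert(e);
        }
        pending.push_back(std::move(entries));

        // Every 2000 transactions a "block" drains the oldest pending txs;
        // most confirm (lookup + erase), some are never mined.
        if (step % 2000 == 0) {
            for (int n = 0; n < 2000 && !pending.empty(); ++n) {
                bool mined = (rng() % 10) != 0; // ~90% make it into a block
                if (mined) {
                    for (uint64_t e : pending.front()) {
                        if (cache.erase(e) > 0) ++hits; else ++misses;
                    }
                }
                pending.pop_front();
            }
        }
    }
    std::cout << "hit rate: " << double(hits) / double(hits + misses) << "\n";
}
```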
< gmaxwell> yea, I'm really happy about the delete flag. I really didn't like the current code's behavior under reorg.
< gmaxwell> not good for network convergence for reorg to be much slower.
< morcos> gmaxwell: heh, probably MANY things to improve there
< gmaxwell> yes. but any improvement is good. :)
< gmaxwell> I'd take a wag that the validation performance limiter is now the additional sha256 used in the lookup.
< morcos> gmaxwell: as far as jeremyrubin and i can tell, the biggest limiter now is how much you blow out your various machine caches with different permutations of coinsviewcache size and flushing algo, sig cache lookup pattern, etc.. although this tends to affect performance AFTER validation has finished. i.e. slows down removeForBlock
< morcos> to really get better performance we'll need to switch data representations to eliminate so much allocating, deallocating and copying of vectors.
< morcos> but sdaftuar's idea is that the next lowest-hanging fruit is to just speed up the path from validation finished to actually relaying to new peers, which isn't optimized at all right now
< gmaxwell> well BIP152 allows you to send the block before it's verified.
< gmaxwell> so that would be an obvious thing to actually do.
< morcos> sure. same issue still applies though. you're just deciding to send it earlier.
< sdaftuar> i think gmaxwell's point is that the way you'd do that would be in the handling of eg ProcessNewBlock, so it'd actually go out earlier (rather than in SendMessages)
< sdaftuar> that's basically what i was planning to do
< gmaxwell> well it gets the whole validation process out of that critical path.
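Purely as a sketch of "getting validation out of the critical path": the helpers below (CheckBlockHeaderAndPoW, AnnounceCompactBlock, ConnectAndFullyValidate) are hypothetical names, not actual Bitcoin Core functions; the only point is the ordering, announce via BIP152 after the cheap checks, validate fully afterwards.

```cpp
// Hypothetical ordering sketch, not real Bitcoin Core code.
class CBlock; // Bitcoin Core's block type, declared only for the sketch
bool CheckBlockHeaderAndPoW(const CBlock&);  // hypothetical: cheap checks
void AnnounceCompactBlock(const CBlock&);    // hypothetical: BIP152 announce
bool ConnectAndFullyValidate(const CBlock&); // hypothetical: expensive part

bool ProcessNewBlockSketch(const CBlock& block)
{
    if (!CheckBlockHeaderAndPoW(block)) return false; // header/PoW only
    AnnounceCompactBlock(block);   // peers can start reconstructing/relaying
    return ConnectAndFullyValidate(block); // signatures, UTXO updates, etc.
}
```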
< michagogo> (Go figure... After a bunch of weeks of having something else every time, this week I'm around but the meeting is cancelled)
< jeremyrubin> gmaxwell: note you don't actually have to re-evaluate the signature, just see if its hash is present on eviction (but you still need to eval script to extract signatures)