< xinxi>
I am wondering why Bitcoin Core does not store the blockchain in a distributed way to save space?
< wumpus>
there are some ideas for adding range negotiation to the protocol, so that nodes can store different ranges of blocks
< wumpus>
see e.g. last meeting notes
< wumpus>
the answer to 'why' is always: because no one has designed/implemented this yet
< xinxi>
So theoretically, it is possible.
< wumpus>
in any case if you are short on space you can use pruning
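For reference, the pruning wumpus mentions is available via Bitcoin Core's `-prune` option; a minimal example (550 MiB is the minimum value accepted):

```shell
# Run with pruning enabled: old block files are discarded once retained
# block data would exceed ~550 MiB (the minimum allowed value).
bitcoind -prune=550

# Equivalently, add to bitcoin.conf:
#   prune=550
```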
< xinxi>
And seriously, blocksize won't be a problem.
< wumpus>
you're confounding issues
< xinxi>
Sure, a blocksize increase will cause other issues, like network latency.
< wumpus>
increasing block size is a problem for utxo set growth, for the time it takes to initially sync, for bandwidth requirements of nodes and miners, for CPU validation overhead. Storing the block chain is the least of worries really
< wumpus>
anyhow move this to #bitcoin, this is off topic here
< xinxi>
OK
< xinxi>
Thank you.
< xinxi>
Another question directly about Bitcoin's current protocol.
< xinxi>
Is it possible to increase OP_RETURN's data size with a softfork?
< wumpus>
this topic is about development, discussion about changing the source code, use #bitcoin for general questions
< GitHub91>
[bitcoin] theuni opened pull request #8888: CConnman: Remove global usage in wallet and some gui (master...no-global-connman) https://github.com/bitcoin/bitcoin/pull/8888
< achow101>
the values for MSG_WITNESS_BLOCK and MSG_WITNESS_TX should probably be defined in BIP144...
< sipa>
wumpus: i guess MSG_FILTERED_WITNESS_BLOCK is what we'd need for a bip37 extension with segwit data
< sipa>
NicolasDorier: yes, i'm aware
< wumpus>
sipa: makes sense
< wumpus>
luke-jr: I suspect it's a very young person and someone that doesn't know English very well, and he's just trying to be helpful. But indeed, wtf
< wumpus>
and indeed, in fact most of build-unix.md is about linux distros, and the only UNIX that gets mentioned separately has its own document (openbsd), but no, adding kernel versions will not clarify that in any way :)
< * wumpus>
goes to add a section about FreeBSD to balance it out
< dcousens>
btcdrak: "uncompressed keys which are disallowed under segwit.", interesting, no issue but can't find that in the BIP?
< btcdrak>
dcousens: because it is too late to change the consensus layer without causing delays, so it is implemented as policy for now and considered for a future sf.
< btcdrak>
see last meeting notes
< dcousens>
btcdrak: ok, sweet, will do
< dcousens>
btcdrak: seen in notes, cheers ;)
< btcdrak>
I think we will make a note in the BIP now, thanks.
< wumpus>
maybe, though it should be very clear that it is policy and not consensus then
< gmaxwell>
My suggestion is that it should state that it's policy created with the intent of making it consensus in a future softfork.
< wumpus>
or already create a BIP for such a future softfork, although that may be premature if it's to be combined with other things that may still come to light using this in the wild
< gmaxwell>
Well we can do like we did with CSV, we had three BIPs triggered on one bit.
< wumpus>
indeed. true.
< gmaxwell>
The thing I'd combine it with would be low-S enforcement.
< dcousens>
gmaxwell: agreed
< wumpus>
yes that would make sense, it's a comparable thing
< gmaxwell>
(low-S doesn't have the blowing up anyone's pubkey risk, though, and already IsStandard for non-segwit, so no need to warn about it.)
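A sketch of the low-S rule being referenced (cf. BIP 62): for an ECDSA signature (r, s) over secp256k1, (r, n - s) also verifies for the same key and message, so requiring the low half removes that malleability. The function names here are illustrative, not Core's:

```python
# secp256k1 group order (a public curve constant)
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141

def is_low_s(s: int) -> bool:
    """True if s lies in the canonical (low) half of [1, N-1]."""
    return 1 <= s <= N // 2

def normalize_s(s: int) -> int:
    """Map a valid s to its low-S equivalent."""
    return s if is_low_s(s) else N - s
```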
< dcousens>
btcdrak: that was uncompressed rejection only in witnesses, or even in regular scriptSigs?
< luke-jr>
indeed, node policy isn't something BIPs should cover, except in the case of it being preparation for a softfork
< btcdrak>
dcousens: only for segwit
< dcousens>
or I suppose the question is targeted at the idea in general, would it be a soft-fork to reject all uncompressed keys? Or only uncompressed keys in witness scripts
< luke-jr>
dcousens: only in witnesses; it'd break compatibility otherwise
< dcousens>
ok
< gmaxwell>
dcousens: It would be reasonable for a library to strongly discourage uncompressed keys in whatever ways it can do so without breaking things.
< dcousens>
gmaxwell: indeed, defaults are almost all against it now anyway... but I think by the time that soft-fork came around, it'd be something I'd probably move into the "hard enough to do you have to do it manually" camp
< dcousens>
rather than just flick a switch
< gmaxwell>
we can't, unfortunately, get rid of uncompressed keys for non-segwit, as that would confiscate coins. But for segwit we can so long as it's clearly non-kosher from the start.
< dcousens>
gmaxwell: hmmm, I suppose that is risky, but I can see some actors aggravating the issue for agendas
< gmaxwell>
dcousens: changing things for existing keys is just completely impractical ... people with random uncompressed keys printed out... coins paid to them.. no idea that the keys are uncompressed.
< dcousens>
gmaxwell: I 100% agree its a good move, I just wonder if its worth delaying instead of policy for a short time... probably already discussed in the meeting, I'll look up the logs more in depth
< dcousens>
delaying segwit* (and then putting it in that SF)
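For context, the compressed/uncompressed distinction under discussion is purely a matter of key serialization; a minimal sketch of the checks a policy rule might apply (illustrative, not the actual segwit policy code):

```python
# A compressed secp256k1 pubkey is 33 bytes with prefix 0x02 or 0x03
# (y parity plus the x coordinate); an uncompressed one is 65 bytes
# with prefix 0x04 (full x and y coordinates).

def is_compressed_pubkey(pubkey: bytes) -> bool:
    return len(pubkey) == 33 and pubkey[0] in (0x02, 0x03)

def is_uncompressed_pubkey(pubkey: bytes) -> bool:
    return len(pubkey) == 65 and pubkey[0] == 0x04
```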
< GitHub79>
bitcoin/master a78e542 Luke Dashjr: Bugfix: Trivial: RPC: getblockchaininfo help: pruneheight is the lowest, not highest, block
< GitHub79>
bitcoin/master 223f4c2 Wladimir J. van der Laan: Merge #8884: Bugfix: Trivial: RPC: getblockchaininfo help: pruneheight is the lowest, not highest, block...
< GitHub78>
[bitcoin] laanwj closed pull request #8884: Bugfix: Trivial: RPC: getblockchaininfo help: pruneheight is the lowest, not highest, block (master...trivial_pruneheight_help) https://github.com/bitcoin/bitcoin/pull/8884
< Lauda>
Any estimate for 0.13.1 yet?
< sipa>
soon.
< instagibbs>
Two Weeks(TM)
< sipa>
TW
< sipa>
not TM, i hope
< Lauda>
I dislike the word soon as it can mean anywhere between now and sometime in the distant future. :P
< sipa>
Lauda: that is exactly what it means
< sipa>
there are very few remaining issues
< achow101>
Lauda: as soon as everything in the milestone is taken care of
< Lauda>
tl;dr: Soon.
< Lauda>
I've checked the milestone not long ago.
< achow101>
they keep adding new stuff
< cdecker>
Soon is still better than 'eventually' :-)
< instagibbs>
achow101, people can ask for milestone tags, may not happen tho
< instagibbs>
a number have been pruned off already
< cfields_>
jonasschnelli: ping
< GitHub16>
[bitcoin] jnewbery opened pull request #8894: [Testing] Include fRelay in mininode version messages (master...mininode-relay-field) https://github.com/bitcoin/bitcoin/pull/8894
< gmaxwell>
oh good. someone implemented a Cuckoo Hash Table... now we need a lossy version and can get rid of those cache-destroying huge bloom filters we're using all over the place.
< jeremyrubin>
sipa: latex irc plugin would be dope
< gmaxwell>
whats lossy about it? looks like if there is a collision you ripple, like an ordinary cuckoo hash.
< jeremyrubin>
gmaxwell: once depth is reached, last thing touched goes bye-bye
< jeremyrubin>
gmaxwell: no rehashing
< gmaxwell>
Did you test if that optimization mattered? I think if it makes a difference at all, your table is overpopulated and the performance will be poor.
< gmaxwell>
(because if it makes a difference it means you are hitting the limit often, and if you're doing that you're probably doing it for almost every insert.)
< jeremyrubin>
gmaxwell_: I tested many many optimizations... not that one because I have a fixed size table so rehashing can't help
< jeremyrubin>
gmaxwell: ^^^
< gmaxwell>
jeremyrubin: what you need to do if you hit the maximum then is go delete entries at random.
< jeremyrubin>
gmaxwell: I did however test with more hashes per entry (10)
< gmaxwell>
If you look at a cuckoo hash number of expected accesses as a function of table fullness, you get this behavior where it's basically 2 until the table is half full and then it shoots up like a rocket.
< jeremyrubin>
Yeah
< jeremyrubin>
so fortunately insert performance isn't a large priority
< sipa>
with 3 hashes you can get to 90% full :)
< gmaxwell>
So your depth limit means that you'll never hit it until you're too full, then you'll often hit it.
< jeremyrubin>
and we'd rather spend the compute poking around the table looking for things that have been deleted than clear something that hasn't been deleted
< jeremyrubin>
So the "eager-delete" strategy is a loser
< jeremyrubin>
sipa: yes, but on a non-fixed memory system
< sipa>
?
< jeremyrubin>
*non-fixed
< jeremyrubin>
eg, you can get to 90% before needing a re-hash
< jeremyrubin>
but we never increase memory size
< jeremyrubin>
so we always get to 100% full
< jeremyrubin>
it's just a matter of how fast
< jeremyrubin>
There are gains to be made with more hashes/other things I agree. I'm pretty excited about some of them, but this is minimal & those all add some unneeded complexity
< gmaxwell>
jeremyrubin: what is the depth limit you're using?
< jeremyrubin>
log(size)
< gmaxwell>
yea, so you're going to end up doing log() accesses on insert basically every time once its run long enough.
< jeremyrubin>
Yeah that's fine
< jeremyrubin>
We don't really care about insert performance (non-hot path)
< jeremyrubin>
And we also do care about deleting things that have been erased preferentially (rather than randomly)
< jeremyrubin>
so it's worth it.
< sipa>
jeremyrubin: i see
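The structure being discussed can be sketched roughly as follows. This is an illustrative toy, not the PR's actual code: `hashlib.sha256` stands in for the salted SipHash, the depth limit is log2(size), and once it is hit the in-hand entry is simply dropped (a lossy cache, not a full hash table):

```python
import hashlib
import math

class CuckooCache:
    """Fixed-size 2-hash cuckoo cache with a bounded insertion depth:
    displaced entries ripple to their alternate slot; once the depth
    limit is reached, the last displaced entry is dropped."""

    def __init__(self, size):
        self.size = size
        self.table = [None] * size
        self.max_depth = max(1, int(math.log2(size)))

    def _slots(self, key):
        # Stand-in for a salted SipHash: two hash locations per key.
        d = hashlib.sha256(str(key).encode()).digest()
        return (int.from_bytes(d[:8], "big") % self.size,
                int.from_bytes(d[8:16], "big") % self.size)

    def insert(self, key):
        pos = self._slots(key)[0]
        for _ in range(self.max_depth):
            if self.table[pos] is None:
                self.table[pos] = key
                return True
            # Slot taken: swap in the new key and ripple the old
            # occupant onward to its alternate location.
            key, self.table[pos] = self.table[pos], key
            a, b = self._slots(key)
            pos = b if pos == a else a
        return False  # depth limit hit: the in-hand entry is dropped

    def contains(self, key):
        return any(self.table[i] == key for i in self._slots(key))
```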
< jeremyrubin>
besides, log2(40mb/32bytes per entry) ~~ 21 which is not trivial, but not huge by any means
< gmaxwell>
it's a lot of random memory accesses.
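Checking the arithmetic behind the "~21": with a 40 MB table of 32-byte entries, log2 of the slot count comes out just over 20:

```python
import math

# ~1.31M slots of 32 bytes each fit in 40 MiB; a log(size) depth
# limit therefore allows about 21 probes in the worst case on insert.
depth = math.log2((40 << 20) / 32)
print(round(depth, 1))  # 20.3
```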
< jeremyrubin>
gmaxwell: I tested designs that did less random mem accesses and it wasn't terribly better for much added complexity
< jeremyrubin>
gmaxwell: I'll float those later if this gets through review
< jeremyrubin>
gmaxwell: for instance, one can split keys to an 8-bit tag and 248-bit main value, and do initial lookups on the tags.
< gmaxwell>
what matters is the access, 1-64 bytes (and likely 1-128 bytes) will all end up being basically the same speed.
< jeremyrubin>
gmaxwell: you can also do buckets of two hashes so that k can be stored at h0(k) & ~1 and h0(k) | 1.
< jeremyrubin>
gmaxwell: wrong. l1/l2 matters at that size
< jeremyrubin>
gmaxwell: you can also do linear probing style stuff with 64 entries per bucket (and say, 2 buckets per entry allowed) and 1 byte tags
< jeremyrubin>
(I implemented and tested all of the above, none are a clear winner over the K.I.S.S. solution PR'd)
< sipa>
(h0(k) & 1, h0(k) | 1) won't result in a cuckoo style behaviour, i think
< sipa>
as the collisions are perfectly correlated
< gmaxwell>
no, it won't.
< jeremyrubin>
sipa: that's why you have two hashes still
< jeremyrubin>
sorry if that was unclear
< sipa>
ah, you mean in addition
< sipa>
right
< jeremyrubin>
yes
< gmaxwell>
but this table doesn't really make use of cuckoo style behavior once it fills.
< sipa>
you're turning each entry into a 2-way associative cache
< gmaxwell>
it just becomes a 2-way set assc cache.
< jeremyrubin>
sure
< jeremyrubin>
gmaxwell: more or less the same thing
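The 2-way set-associative cache both are describing can be sketched like this (illustrative Python, not Core code): probe both hash locations on insert, and if both are live, evict one at random with no rippling:

```python
import hashlib
import random

class TwoWayCache:
    """Plain 2-way set-associative cache: no cuckoo rippling."""

    def __init__(self, size):
        self.size = size
        self.table = [None] * size

    def _slots(self, key):
        d = hashlib.sha256(str(key).encode()).digest()
        return (int.from_bytes(d[:8], "big") % self.size,
                int.from_bytes(d[8:16], "big") % self.size)

    def insert(self, key):
        h0, h1 = self._slots(key)
        for i in (h0, h1):
            if self.table[i] is None:   # free (or flagged-deleted) slot
                self.table[i] = key
                return
        # Both slots live: evict one at random and overwrite.
        self.table[random.choice((h0, h1))] = key

    def contains(self, key):
        return any(self.table[i] == key for i in self._slots(key))
```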
< sipa>
what would be the effect if you greatly reduced the max iterations?
< jeremyrubin>
less likely to find garbage on insert
< sipa>
(in the otherwise kiss design)
< sipa>
so why bother doing 21 iterations?
< gmaxwell>
Well I'm saying this cuckoo table is not one; at least once filled it's a two-way set assc cache. I think you can probably make your insert behavior stop at the first collision once you're over some fullness threshold (like 90%) and you'll see performance improve.
< BlueMatt>
remember: we really give 0 shits about insert performance....ATMP isnt our hot-path, and it just did sigchecks, so a bunch of lookups (even hitting memory) isnt all that bad
< gmaxwell>
sorry if I didn't state it clearly above: this data structure is a cuckoo hash table that turns into a 2-way set assc cache once it becomes overfull
< gmaxwell>
BlueMatt: I'm also thinking about other places where we should use this.
< BlueMatt>
ahh, ok, fair enough
< gmaxwell>
And I care about the overall cpu usage of bitcoin, not just the block race. :)
< gmaxwell>
it looks like great work too, I'm not being negative, just thinking it through.
< BlueMatt>
ehh, doing a few more memory hits after doing ATMP checksigs isnt all that much to notice :p
< jeremyrubin>
sipa: Finding garbage means finding a signature you don't need again (unless reorg)
< sipa>
yes, in case it isn't clear, i'm just playing devil's advocate to understand
< jeremyrubin>
sipa: finding !garbage means you delete something maybe valuable
< jeremyrubin>
sipa: (yes :) )
< jeremyrubin>
If i math'd right: (1-((5000/((float(40<<20)/(32.0+1.0/8))))))**21
< gmaxwell>
though I wonder if its code complexity wouldn't be reduced by dropping the cuckoo behavior entirely... thats a question that depends on how full the cache is typically.
< jeremyrubin>
== ~92%
< jeremyrubin>
should be chance of deleting with this design
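jeremyrubin's back-of-envelope can be reproduced directly (40 MiB of 32-byte entries plus 1/8 byte of flag overhead, a block freeing ~5000 slots, up to 21 probes per insert):

```python
# Chance that every one of 21 probes lands on a live entry (forcing a
# real eviction) when a block has just freed ~5000 of ~1.3M slots.
entries = (40 << 20) / (32.0 + 1.0 / 8)  # ~1.3M slots
p_freed = 5000 / entries                 # fraction of freed slots
p_evict_live = (1 - p_freed) ** 21       # all 21 probes miss the freed slots
print(round(p_evict_live, 2))  # 0.92
```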
< jeremyrubin>
gmaxwell: what is the cuckoo behavior? Eg, one hash location?
< gmaxwell>
jeremyrubin: no, cuckoo behavior is the rippling insertion.
< jeremyrubin>
gmaxwell: so not iterating at all?
< gmaxwell>
In a plain 2way set assc cache, you'll probe both locations on insert and if both are full you'll evict one (at random or whatever).
< BlueMatt>
gmaxwell: i think, here, thats mostly there so that you can find entries marked deleted while rippling
< BlueMatt>
if you didnt have the delete flags, probably
< gmaxwell>
okay, I wasn't thinking about the delete flags, hows that work?
< BlueMatt>
its the sigcache - we used to delete things while checking sigs in blocks
< BlueMatt>
now they're marked for deletion in an atomic
< BlueMatt>
so jeremyrubin's thing goes through and treats slots as empty if they have a delete flag on insert
< gmaxwell>
I don't think that matters actually.
< BlueMatt>
this has an interesting benefit of making it so that reorgs at the tip might use sigcache
< gmaxwell>
When you go to insert, if either are free or marked delete, overwrite.
< gmaxwell>
otherwise pick one and overwrite.
< gmaxwell>
rippling at that point doesn't do you any good.
< gmaxwell>
the other entries you would delete will get deleted when something tries to insert in them.
< jeremyrubin>
yes it does? the next one might find trash
< sipa>
jeremyrubin: i see!
< gmaxwell>
I don't.
< jeremyrubin>
k1 and k2 sharing location X are highly unlikely to have another in common
< gmaxwell>
If I insert k1 and one of its locations contains trash, I'll use that location.
< sipa>
gmaxwell: if you iterate 21 times, you've had 21 independent chances of finding an empty/deleted position, so you don't need to evict anything potentially valuable
< gmaxwell>
I don't need someone to have come in front of me and taken out the trash for me, I can do it then.
< gmaxwell>
sipa: yes; thats equal to saying that cuckoo is helpful if the table is not overfull.
< sipa>
well, yes, but it won't be if many entries are marked deleted
< gmaxwell>
which it is, but if the table is normally more than 50% full, it doesn't help.
< sipa>
i think it can be way more than 50% full
< jeremyrubin>
will always go to 100% full
< jeremyrubin>
except after a block
< gmaxwell>
which is what I said above... depending on how full the table is (and by full I am counting trash as deleted), this could just be a set assc cache with no loss in performance.
< sipa>
if there are no deleted positions (or very few), agree
< jeremyrubin>
there are n_sigops(last_block) empty slots
< BlueMatt>
gmaxwell: by recursing you're less likely to throw out non-trash if there are delete positions available, this is esp true if you eg have blocks come in quickly in succession
< gmaxwell>
sipa: not just 'very few'. The benefit of cuckoo drops off exponentially after 50%.
< gmaxwell>
BlueMatt: not if the table is well over 50% full (deleted entries are not full in this measure).
< sipa>
gmaxwell: i believe that if you're 90% full, you have a very high chance that a new insert will end up with you consuming an empty position
< jeremyrubin>
This can be addressed by increasing hashes to 3 or 4 (or 10!)
< jeremyrubin>
not factorial 10, just 10 with excitement
< BlueMatt>
gmaxwell: indeed, though if you have quick blocks it means you're less full...
< gmaxwell>
yes, the table can run fuller if there are more probes, but with a linear slowdown for your lookup.
< BlueMatt>
as an aside: we could also add a few bits to indicate feerate for each entry, which would make the probing actually really matter
< sipa>
BlueMatt: or age
< BlueMatt>
or age
< jeremyrubin>
Yep
< sipa>
add generation numbers like the bloom filter cache
< jeremyrubin>
Yep
< gmaxwell>
sipa: I think you're overestimating it; when you're 90% full, most entries end up in the position they're in because their sibling was already full.
< jeremyrubin>
Or like.. clear them on eviction
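A minimal sketch of the generation-number idea: tag each entry with the generation at which it was inserted, and treat entries older than some age as free, so deletion costs nothing. (Illustrative only; the class name and `MAX_AGE` value are made up, not Core's rolling bloom cache code.)

```python
MAX_AGE = 2  # assumption: entries survive this many generation bumps

class GenerationalSlots:
    """Slots tagged with an insertion generation; old entries expire
    implicitly when the generation counter is bumped."""

    def __init__(self, size):
        self.slots = [None] * size   # each slot: (key, generation)
        self.generation = 0

    def bump_generation(self):
        """Called e.g. once per block; old entries silently expire."""
        self.generation += 1

    def is_free(self, i):
        e = self.slots[i]
        return e is None or self.generation - e[1] > MAX_AGE

    def store(self, i, key):
        self.slots[i] = (key, self.generation)

    def lookup(self, i, key):
        return not self.is_free(i) and self.slots[i][0] == key
```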
< sipa>
gmaxwell: 0.9^21, no?
< gmaxwell>
sipa: no, because when you consider entry X, the current member is more likely to be in X because its alternate Y is full. The sequence of tests is no longer independent.
< jeremyrubin>
Wait what?
< gmaxwell>
this is why the performance falls off so dramatically at over 50% full.
< jeremyrubin>
That's only true if we prefer h0 over h1?
< BlueMatt>
this is only true if your table is small compared to your iteration count?
< jeremyrubin>
Also... so what! If it's really such a big deal we can just clear 50% of things out from this one
< jeremyrubin>
and still be on par with existing :P
< gmaxwell>
BlueMatt: I'm responding to sipa saying that the odds you fail to find a free entry in 21 hops is 0.9^21
< BlueMatt>
gmaxwell: i mean it should be close
< sipa>
gmaxwell: i don't understand
< sipa>
gmaxwell: maybe we have different assumptions
< jeremyrubin>
I agree with sipa
< jeremyrubin>
my math was: ((1-((5000/((float(40<<20)/(32.0+1.0/8))))))**1)
< jeremyrubin>
oops last one should be a 21
< BlueMatt>
anyway, headed out, y'all figure it out and report back :P
< jeremyrubin>
for a block clearing 5k things
< jeremyrubin>
Gonna head out for some dinner; looking forward to hearing more of your thoughts.
< jeremyrubin>
gmaxwell: if you want I have a super templated version where you can enable/disable a ton of things... but it's quite unsightly and needs some bugfixes.
< gmaxwell>
sipa: as a cuckoo hashtable fills, everything initially starts out in its h0 position. then you start getting collisions; when the table is way overfull, and you hit a collision (p=.9) then you look at that entry and go to put it in its alternate... but the reason it was in the entry you looked at was often because the alternate was already full. So the comparison you now make on the alternate
< gmaxwell>
is >p=.9 even though the table is only 90% full overall.
< sipa>
if evictions are independent from insertions (and evictions are responsible for all of the 10% empties), then i think any chain of entries up to 21 deep will have 0.9^21 chance to not have any evicted entries
< sipa>
you're talking about a model where there are only insertions
< gmaxwell>
okay, if we assume that the table is completely full, then evictions happen, then on your next insert I agree with you.
< gmaxwell>
but on the insert after that the performance will be worse than 0.899^21
< sipa>
right.
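Under sipa's independence assumption the number is easy to check: at 90% full, each of the 21 probes independently misses a free slot with probability 0.9:

```python
# Chance that a chain of 21 independent probes finds no free/deleted
# slot in a table that is 90% full: ~11%, i.e. ~89% of inserts would
# still find a hole without evicting anything live.
p_no_free = 0.9 ** 21
print(round(p_no_free, 3))  # 0.109
```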
< sipa>
at fullness, the iterations are just a search for entries that were deleted after the chain being traversed was created
< sipa>
and increasing the iteration count means you're willing to spend more time looking for those
< sipa>
i'm not convinced that making this choice depend on the table size is the best choice
< sipa>
but it looks simple and it works better than what we have
< gmaxwell>
But a 2-way set assc cache would be even simpler.
< jeremyrubin>
sipa: you could make the choice based on the number of sigops in the last block
< jeremyrubin>
eg, st (1-empty%)^iter > .50
< gmaxwell>
and at very full levels it will perform no worse. you give an argument which I can buy that the complexity does improve performance, e.g. at the 90% full level. I dunno what level of fullness would be typical.
< gmaxwell>
if the cache has 40MB of 32-byte entries, and a single block empties 4000 and before a new block arrives 4000 are added, then the low watermark would be 99.6%
< sipa>
yeah, i think in optimal cases you want to artificially keep the fullness low
< gmaxwell>
and 21 probes would be pretty unlikely to find a deleted entry.
< gmaxwell>
meaning a plain 2way set assc would have performed just as well but with simpler code and 40x faster insertion.
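gmaxwell's watermark figures check out roughly (straight division gives ~99.7% rather than 99.6%, so treat the numbers as approximate):

```python
# 40 MiB of 32-byte entries; a block frees 4000 slots which refill
# before the next block. At the low watermark ~99.7% of slots are
# live, so 21 probes find a freed slot only ~6% of the time -- a
# plain 2-way cache would indeed behave about the same.
entries = (40 << 20) // 32                  # 1,310,720 slots
low_watermark = (entries - 4000) / entries
p_find_freed = 1 - low_watermark ** 21
print(round(low_watermark, 3), round(p_find_freed, 2))  # 0.997 0.06
```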
< sipa>
find some function of the fullness for the maximum number of iterations
< sipa>
which is 1 when completely
< sipa>
full
< sipa>
jeremyrubin: can you efficiently keep track of fullness?
< gmaxwell>
well what you want is something that is an insertion time tolerance, and when that tolerance would not be likely to yield an improvement, you just go 1 deep.
< sipa>
it's a balance between chance of evicting a live entry and increasing fullness
< sipa>
perhaps the max iterations should be 1 already at 90%
< sipa>
and $TOLERANCE at 0%
< sipa>
and some nifty fitted function in between
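One illustrative shape for the schedule sipa is describing: $TOLERANCE iterations when empty, tapering to a single probe at 90% fullness and beyond. The linear taper below is a placeholder guess; the "nifty fitted function" is left open:

```python
TOLERANCE = 21  # probe budget for an empty table (assumption)

def max_iterations(fullness: float) -> int:
    """fullness in [0, 1]; returns the insert-time probe budget."""
    if fullness >= 0.9:
        return 1
    # Linear taper from TOLERANCE at 0.0 down to 1 at 0.9.
    return max(1, round(TOLERANCE * (1 - fullness / 0.9)))
```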
< gmaxwell>
well the question you have is the marginal rate of return for each probe.
< gmaxwell>
for low full levels the tolerance exists to prevent cycles.
< gmaxwell>
the probability of going more than N times when you are M full is negligible unless there is a cycle.
< gmaxwell>
so that's normally where the max probe count (which then triggers a resize and/or rehashing of the whole table) comes in, in an ordinary cuckoo hash.
< gmaxwell>
I'm sure tromp could tell us all about those probablities. :P
< sipa>
there is an annoying collision in my brain's cuckoo table between john trump and donald tromp.
< gmaxwell>
... just move one to its alternative location?
< gmaxwell>
In any case, unless there is something surprising about fullness going on, with our current tables, I believe that setting the max depth to 1 (turning this into a 2way set assc cache) would not hurt block validation performance.
< gmaxwell>
but a generation count could make it be effectively 50% free at all time, and then that argument goes out the window.
< jeremyrubin>
you can cheaply do generations for cost of one bit
< gmaxwell>
it would be fine to reduce the key size by a couple bits.
< jeremyrubin>
hell, you can even skip marking garbage entirely and just do generations
< jeremyrubin>
gmaxwell: yeah I built a version like that
< jeremyrubin>
you use the last bit/byte for flags
< gmaxwell>
to be honest the keys could be 16 bytes, since the hash is privately salted... but at least with the old datastructure there wouldn't have been a huge performance improvement from that.
< jeremyrubin>
If memory overhead is really a must
< jeremyrubin>
gmaxwell: agree 100%, but I thought that wasn't reviewable
< gmaxwell>
If we wanted the same code to work as a replacement for the bloom filter elsewhere we'd want to use the location xor trick, where other_index = this_index^H(key) ... this lets you use very small keys.. the original insert tries to go into index = H(bigger_key).
< gmaxwell>
jeremyrubin: yea, I wouldn't bother but it would be good to know what performance we were leaving on the floor, if any. If it's a lot it would be worth considering.
< jeremyrubin>
yeah; the nice thing is I think this patch is a necessary first step to trying out those designs; so my feeling is we should try to review this, then such improvements can be measured
< jeremyrubin>
I didn't try a version with half sized keys FYI.. figured proposing that would be DOA
< gmaxwell>
it would only be an interesting proposal if it showed a big speedup, otherwise it's not worth trying to reason about the security.
< jeremyrubin>
Also if we're going to 128-bit, may as well do md5 or something if you really want things to go faster
< jeremyrubin>
md5+salt should be just as secure
< jeremyrubin>
(iirc)
< sipa>
SipHash128.
< * sipa>
ducks
< jeremyrubin>
haha isn't that experimental tho
< jeremyrubin>
but yeah, computing the hashes is a hot-path thing, so if one really wants to rice this sucker, that would make things faster
< jeremyrubin>
(ComputeEntry called once per lookup)
< GitHub138>
[bitcoin] randy-waterhouse opened pull request #8896: Update INSTALL landing redirection notice for build instructions. (master...master) https://github.com/bitcoin/bitcoin/pull/8896
< tunafizz>
needs moar power al *grunt*grunt*grunt*
< gmaxwell>
well if you want faster, using blake2b would be the same speed as md5, and keeping the ~32 byte output I wouldn't really have much to fuss about the security.
< tunafizz>
needs the binford2000 hash al *grunt*grunt*grunt*