< bitcoin-git> [bitcoin] za-kk reopened pull request #17355: gui: grey out used address in address book (master...oct-19-17174) https://github.com/bitcoin/bitcoin/pull/17355
< bitcoin-git> [bitcoin] jnewbery opened pull request #18113: [coins] Don't allow a coin to spent and FRESH. Improve commenting. (master...2020-01-prune-pruned) https://github.com/bitcoin/bitcoin/pull/18113
< cfields_> gitian builders: detached sigs for v0.19.1rc2 are up.
< fanquake> ?
< bitcoin-git> [bitcoin] achow101 opened pull request #18115: [wallet] Pass in transactions and messages for signing instead of exporting the private keys (master...sign-in-spkman) https://github.com/bitcoin/bitcoin/pull/18115
< jaheller> Hi, some multisig "gurus" here? Got a question regarding multisig, seems like there was a change in how multisig transactions are being created. Running a node using bitcoind v0.16.3 and another one using latest bitcoind (v.0.19.0.1). The multisig transaction created on the old node can be decoded by https://coinb.in correctly (it's displayed as a
< jaheller> multisig 2:3 transaction). Using the newest bitcoind to create transaction coinb.in isn't able to display it correctly. Someone involved into this?
< bitcoin-git> [bitcoin] fanquake pushed 2 commits to master: https://github.com/bitcoin/bitcoin/compare/646f0ada0205...35b7a8e539fa
< bitcoin-git> bitcoin/master 0e0fa27 Pieter Wuille: Get rid of VARINT default argument
< bitcoin-git> bitcoin/master 35b7a8e fanquake: Merge #18087: Get rid of VARINT default argument
< bitcoin-git> [bitcoin] fanquake merged pull request #18087: Get rid of VARINT default argument (master...202002_novarintvarmacro) https://github.com/bitcoin/bitcoin/pull/18087
< sipa> jaheller: ask on bitcoin.stackexchange.com
< bitcoin-git> [bitcoin] hebasto opened pull request #18116: depends: Use clang for Ubuntu 16.04 (master...20200211-clang-ubuntu) https://github.com/bitcoin/bitcoin/pull/18116
< jaheller> sipa Done, thanks.
< bitcoin-git> [bitcoin] fanquake pushed 3 commits to master: https://github.com/bitcoin/bitcoin/compare/35b7a8e539fa...98264e2ccb17
< bitcoin-git> bitcoin/master fa55a25 MarcoFalke: depends: Remove reference to win32
< bitcoin-git> bitcoin/master fae9084 MarcoFalke: build: Skip i686 build by default in guix and gitian
< bitcoin-git> bitcoin/master 98264e2 fanquake: Merge #18104: build: Skip i686 build by default in guix and gitian
< bitcoin-git> [bitcoin] fanquake merged pull request #18104: build: Skip i686 build by default in guix and gitian (master...2002-i686NoBuildByDefault) https://github.com/bitcoin/bitcoin/pull/18104
< elichai2> j #regex
< elichai2> ops
< bitcoin-git> [bitcoin] hebasto opened pull request #18117: build: Fix Qt link issue for macOS target with DEBUG=1 (master...20200211-macos-debug) https://github.com/bitcoin/bitcoin/pull/18117
< wumpus> newer boost::test output is nicely colorful
< jb55> sipa: do we know how slow passing around vector of vectors is in script eval? its the one thing I noticed when writing my own script interpreter, that can't be good for overall perf? or does it not matter much.
< sipa> jb55: where specifically?
< jb55> sipa: I'm working from memory, sec I'll look
< jb55> sipa: the stack, first argument to EvalScript
< sipa> that one is passed bt reference
< sipa> *by
< jb55> sipa: like, is vector moving contiguous chunks of memory on each operations, or just pointers to chunks?
< jb55> talking about the inner vector in stack
< sipa> that depends on the operation
< sipa> hard to say without a specific case to talk about
< jb55> I think I just misunderstood the memory layout... looks like its all points to dynamically allocated memory, so my concern wasn't really valid, modulo you could probably come up with something more cache friendly.
< sipa> oh, you definitely can
< sipa> but most stack affecting operations only happen once during script execution
< sipa> one crazy one is OP_ROLL that can cause a huge amount of memory operations
< jb55> but alas, consensus code. probably won't be able to make drastic changes to that just for a slight perf increase
< sipa> well, if it doesn't hurt it's also not worth optimizing :)
< jeremyrubin> you could imagine making everything COW non atomic shared pointers
< sipa> that wouldn't help with OP_ROLL actually
< jeremyrubin> hang on
< jeremyrubin> had a phone call come up was going to suggest not doing this
< jeremyrubin> anyways my point was going to be that it would help with things like OP_PICK but not OP_ROLL because any time you're moving something it has to move a lot of pointers on the stack.
< jeremyrubin> doing this plus a double linked list would be the space/time tradeoff you'd want to make
< jeremyrubin> (maybe single works too...)
< jeremyrubin> Then you could detach a node and rotate it without having to do O(n) moves
< jeremyrubin> But, then you would have to walk the entire list for a lot of operations which could be just as bad
< aj> i think a deque instead of a vector would fix ROLL for very large stacks on some implementations
< jeremyrubin> not quite
< jeremyrubin> deque remove is O(N)
< aj> http://www.cplusplus.com/reference/deque/deque/erase/ -- only linear in remaining elements on some implementations
< jeremyrubin> which is the same as std::vector no?
< aj> it's a vector of vectors instead sometimes apparently
< jeremyrubin> OP_DEPTH OP_ROLL gets you the last thing on the stack (or OP_DEPTH OP_1 OP_ROLL...)
< jeremyrubin> aj: I mean that remove from vector is also linear in remaining elements
< jeremyrubin> but you can use some efficient backshifting thing IIRC
< jeremyrubin> The benefit of the deque is just at best a 1/2 improvement
< aj> http://www.cplusplus.com/reference/vector/vector/erase/ -- vector is always linear in subsequent elements, deque can be more efficient
< jeremyrubin> It's at most 1/2
< jeremyrubin> err I guess you're saying if it is a vector<vector<T>> then you can be better...
< jeremyrubin> I see
< jeremyrubin> depending on the bucket size
< aj> bucket size is a constant factor, so you ignore that with O notation
< aj> but probably means it's worse for every real script
< jeremyrubin> well ~kinda. If the bucket size is > max stack...
< jeremyrubin> Then for big O we can ignore, but for our intepreter we're already at the maximally bad thing
< jeremyrubin> Buckets are what? 128?
< jeremyrubin> (in normal implementations)
< jeremyrubin> it seems implementations vary widely
< jeremyrubin> something like 200 stack elements
< jeremyrubin> MAX_STACK_SIZE = 1000
< jeremyrubin> (I think it'd be better to stick with something that has more regular implementations so that there isn't divergent performance characteristics)
< sipa> as long as the max stack size isn't increased there is no reason to worry about any of this
< sdaftuar> jeremyrubin: around?
< jeremyrubin> ya
< jeremyrubin> So I thought of a hybrid approach you might like
< sdaftuar> i thought it might be easier to talk about your very detailed comment here rather than go back and forth on your PR
< sdaftuar> first observation: you make a good point about memory blowup
< jeremyrubin> Keep the cache, but only store it if it is larger than 32 elements or something
< * jeremyrubin> listening mode
< sdaftuar> i have not quite worked through your picture examples involving N/2 parents and N/2 children yet, but i had a question for you about your linear example-
< jeremyrubin> ask away!
< sdaftuar> in the linear case, aren't we doing N^2 lookups into GetMempoolChildren, versus something much smaller if we use a cache?
< sdaftuar> I think N
< jeremyrubin> Good question.
< jeremyrubin> At node M (M < N), we look at the cache below us and do (N-M) lookups
< sdaftuar> it's just one lookup?
< sdaftuar> the cache is built from bottom to top
< jeremyrubin> but what's the length of the cache!
< jeremyrubin> We need to iterate over that cache to "read it"
< jeremyrubin> So maybe I'm confusing you when I say lookup
< sdaftuar> well, it's one map lookup, and then we read N-M elements i guess
< jeremyrubin> Yes
< jeremyrubin> That's the point I'm making
< jeremyrubin> Reading those N - M elements also requires visiting them for de-duplication
< jeremyrubin> Pre epochs, this was... really bad
< jeremyrubin> (well in the linear case not so bad, but in general n log n to merge into our set)
< sdaftuar> oh right
< jeremyrubin> Post epochs, you just iterate and visit
< sdaftuar> yeah so if your point is we're already doing N^2 work in this pathological case, that doing a little more N^2 work isn't asymptotically worse
< sdaftuar> then maybe that's a good point
< jeremyrubin> We could maybe make an optimization where if you only have one child we do something smart
< jeremyrubin> sdaftuar: yeah that's my point
< sdaftuar> i wonder if we should be doing something else altogether in these pathological cases
< jeremyrubin> evict everything and reinsert isn't too bad
< sdaftuar> hmm, ok i will give this more thought, thanks for the clarification
< jeremyrubin> I think the case I'm concerned about most is a -> b -> c....-> {big tree of things}
< jeremyrubin> because if the first part is N/2 and the big tree of things is (N/2)^2, it can get kind of annoying
< jeremyrubin> and then you really do want a cache
< sdaftuar> you're saying there's a sweet spot where the memory tradeoff is worth the cpu benefit?
< jeremyrubin> which might be why only caching when you have more than X things not in the cache could be nice
< jeremyrubin> Yeah. essentially any time you are running and you trigger some traversal limit excluding things in the cache, you make a new cache entry
< jeremyrubin> right now we work this way, but the limit is 1
< jeremyrubin> you could imagine preferring to traverse until our traversal is having a poor hit rate and is big. E.g., we traversed 10000 things, but we only saw 100 things uniquely, so we should create a cache line for this sub-unit
< sdaftuar> if i understand you correctly -- something like this is just a constant factor benefit at best, fundamentally we're pretty screwed by pathological chains in reorged blocks, right?
< jeremyrubin> Yeah
< jeremyrubin> But we can maybe make the constants large enough so that the pathological case is bounded
< sdaftuar> so that is disappointing :) i mean, i feel like we need to do something just much smarter
< jeremyrubin> maybe goes from N^3 to N^2
< sdaftuar> N here is the number of transactions in a block though
< jeremyrubin> But overall my opinion is I expect the map lookups and insertions (using ordered or not) to dominate just not using a map
< sdaftuar> which is way too big for even N^2 i think
< jeremyrubin> Well so that's what I did the math on, or tried to
< jeremyrubin> smallest txn is 61 bytes
< sdaftuar> yeah well the memory blowup there is certainly bad, but the cpu blowup is something that is more fundamentally broken i think?
< sdaftuar> i mean, we can take your advice and get rid of the cache to avoid OOM
< jeremyrubin> also fundamentally a memory blowup implies a CPU blowup
< sdaftuar> but we need to rearchitect what happens on a reorg to get rid of the CPU DoS
< jeremyrubin> because you need to touch the memory anywasy
< sdaftuar> sure
< jeremyrubin> it only matters if the memory blowup is less than the CPU blowup
< jeremyrubin> But I agree
< jeremyrubin> We're talking about something very slow to deal with
< jeremyrubin> how long do we think traversal takes?
< jeremyrubin> Because another option is to parallelize it
< sdaftuar> i think maybe the best option is to evict things that have too many descendants? or come up with some lazily updating data structures
< sdaftuar> not sure exactly how to do either of those ideas
< jeremyrubin> Or maybe we run a thread which times out the reorg
< jeremyrubin> and evicts everything if it took too long to update
< jeremyrubin> E.g., optimistically this should be simple. But if it takes too long, do the dumb thing
< sdaftuar> the fundamental problem is that we need to get the mempool back in a consistent state, ideally with some useful fee-paying transactions, so that we can produce a new block template as quickly as possible
< jeremyrubin> Maybe we should spend the effort to time some "worst case" blocks?
< sdaftuar> so evicting everything is nice for consistency, but not really the right thing in the long run
< sdaftuar> yeah i will try to do that. after giving this more thought and reading your writeup, i'm not optimistic about the results
< jeremyrubin> sdaftuar: hence do the right thing optimistically (e.g., 10 seconds? 100 seconds?) and otherwise evict 50% and repeat?
< sdaftuar> but maybe you're right that we're within a constant factor of workable
< sdaftuar> small constant factor*
< jeremyrubin> (100 seconds/((1e6/61)**2)) == 370 ns
< jeremyrubin> but it's actual n-1*n/2 so
< jeremyrubin> 750 ns
< jeremyrubin> We can parallelize traversals (let's assume this helps us by at most a factor of 2 because of concurrency overhead)
< sdaftuar> if you want to be a bit more depressed, we're not bounded by the size of a block here
< jeremyrubin> Ah is it mempool bounded?
< jeremyrubin> I thought we were block bounded
< jeremyrubin> I guess we're updating descendents in mempool?
< sdaftuar> in a multi-block reorg, i think we will process a big batch of transactions in this way
< sdaftuar> and yes we can have an unbounded number of descendants in mempool
< jeremyrubin> Well in multi-block reorg we should definitely go block by block not all at once
< jeremyrubin> But in multi-block I guess we can then get unbounded?
< jeremyrubin> I thought we have a 25 limit presently?
< sdaftuar> the 25 limit does not apply in the case of a reorg
< jeremyrubin> Ah
< jeremyrubin> So in multi block it might not be that bad to at a certain point evict everything and resubmit to mempool
< sdaftuar> the issue is that our main code path for accepting a new transaction to the mempool implicitly assumes no in-mempool descendants
< sdaftuar> jeremyrubin: i think conceptually that is corret
< jeremyrubin> because then we're at least guaranteed to not have too much N^2 stuff
< jeremyrubin> But it still is sucky
< sdaftuar> architecting it might be a bit messy, but maybe the right intuition is to build the mempool from scratch by reinserting things from disconnected blocks in-order for just as long as you need to produce a reasonable new fee-paying block
< sdaftuar> and then let the mining code run, and go back to inserting more stuff, etc
< sdaftuar> i think we should have some alternative to the whole thing hanging if we get some messy blocks reorged out, anyway
< jeremyrubin> I guess as a sidebar, if we know that reorging is already *yikes* I see a few paths forward for this PR: 1) drop entirely because review not worth if we're going to need to re-do it all 2) Accept, if we can show that it's better/not worse 3) Accept half and keep cache so we at least preserve the same failure modes 4) Accept both, knowing we have new/different failure modes 5) time out after 100 seconds, switch algorithms, try
< jeremyrubin> again for unbounded with caching?
< jeremyrubin> the switching between algorithms with some exponential backing on time isn't actually that dumb.
< sdaftuar> i think either 2/4 or 3 are the most likely outcomes IMO? but i want to do some more benchmarking to get a sense of how bad things acn be
< aj> a 100 second "stall" sounds horrible?
< jeremyrubin> Because you would need to have an attack against both the memory version and the time version which maybe we can prove is impossible
< jeremyrubin> aj: scylla charbidis
< jeremyrubin> This isn't a problem that we're introducing, the existing algorithms can already be wicked slow
< jeremyrubin> the 100 second switch is *in the hopes* that a different algorithm could be faster for this reorg
< jeremyrubin> We could measure historical reorg times (e.g., if it's 10s per block, do 20, and then switch back and forth doubling)
< jeremyrubin> The idea is just to time a give up point and switch to a different algorithm (for the rest of the block that is, we're still making monotonic progress)
< aj> if we've got a different algorithm that goes faster, seems better to do that from the start i guess
< jeremyrubin> aj we don't
< jeremyrubin> you can review the problem here
< jeremyrubin> #18063
< gribble> https://github.com/bitcoin/bitcoin/issues/18063 | Improve UpdateForDescendants by using Epochs and Removing CacheMap by JeremyRubin . Pull Request #18063 . bitcoin/bitcoin . GitHub
< jeremyrubin> The algorithms we have with or without caching have different worst cases
< jeremyrubin> And caching opens up to memory based attackers
< jeremyrubin> so short of no new magic bullet, getting rid of caching reduces attack surface to just time based DoS
< aj> all we really need is to from {reorged blocks} + {old mempool} to 4 to 8 megaweight of highish-value txs in a new mempool as quickly as possible
< sdaftuar> aj: agreed
< jeremyrubin> we could filter the to_reorg set by feerate
< jeremyrubin> with a max 8 megaweight ordered map, dropping anything else
< jeremyrubin> Then, we could process the updates for just that max feerate list
< jeremyrubin> But then the descendants could still be unbounded.
< jeremyrubin> didn't even say bye :'(
< jeremyrubin> ragequit
< aj> ragerejoin
< sdaftuar> maybe i quit in disgust at this problem
< sdaftuar> (or maybe my internet is flaky)
< jeremyrubin> hah
< jeremyrubin> Anyways... maybe I should abandon #18603 for the "fast track" and open up some more obvious mempool wins (like eliminating mapLinks)?
< gribble> https://github.com/bitcoin/bitcoin/issues/18603 | HTTP Error 404: Not Found
< jeremyrubin> #18063
< gribble> https://github.com/bitcoin/bitcoin/issues/18063 | Improve UpdateForDescendants by using Epochs and Removing CacheMap by JeremyRubin . Pull Request #18063 . bitcoin/bitcoin . GitHub
< jeremyrubin> It seems like it's going to take some time to ensure we're doing the right thing.
< aj> yay for easy wins?
< sdaftuar> hmm, i think if we're not getting rid of cachemap, it should be the case that your patch is strictly better. i think i need to do a little more work to determine whether removing the cache is strictly better, probably better overall but maybe not strictly better, or somehow worse
< sdaftuar> but eg the first commit seems like a no-brainer?
< jeremyrubin> ok why don't I open up a new PR with just the no-brainer one?
< jeremyrubin> We can keep the broader discussion going on 18063
< sdaftuar> sure, that's fine with me
< jeremyrubin> And then we can keep progress up on fixing all the other hot mess in the mempool ;)
< aj> sdaftuar: does an O( peers * log(txs_announced) ) lookup to retry NOTFOUNDs
< sdaftuar> oh, i did look at it briefly -- my immediate reaction was a small amount of concern at iterating over all peers whenever there's a NOTFOUND
< sdaftuar> because we coudl get up to 100 at once, i think, in one message -- so that's like 10k map lookups?
< sdaftuar> if we spaced out processing though then i think that would be fine, or maybe it's fast enough as-is and i just need to benchmark it to convince myself
< aj> hmm, 100 is also the max in flight from a peer
< sdaftuar> right, that's why i figure we could get 100 at once
< sdaftuar> (we coudl get more from a malicious peer, but we filter out things that weren't in-flight, so 100 should be the right max)
< aj> txs_announced maxes out at 100k
< sdaftuar> oh, right
< bitcoin-git> [bitcoin] JeremyRubin opened pull request #18120: Change UpdateForDescendants to use Epochs (master...epoch-mempool-clean-split-updatefordescendants-pt1) https://github.com/bitcoin/bitcoin/pull/18120
< jeremyrubin> ok feel free to review just that chunk (same as the commit from the other branch)
< sdaftuar> jeremyrubin: will do
< aj> jeremyrubin: ack
< aj> hmm, maybe that's ambiguous :)
< sdaftuar> ready for merge!
< jeremyrubin> lgtm
< jeremyrubin> sdaftuar: good catch with the grandchild it thing btw earlier
< jeremyrubin> We should probably have a test for that...
< sdaftuar> agreed, that code is definitely undertested
< sdaftuar> oh! i misremembered how chain limits get applied in the case of a reorg
< sdaftuar> so there are an unbounded number of in-mempool descendants, but there is a bound on chain limits with respect to what is in the block(s)
< jeremyrubin> I was just thinking more about that
< sdaftuar> so the worst-case scenarios we were talking about are not quite right
< jeremyrubin> The issue with the unbounded in-mempool descendants is also that the cache entries can get HUGE
< jeremyrubin> e.g., the entire mempool
< sdaftuar> yes
< jeremyrubin> so we should definitely get rid of the cache IMO, even if our runtime goes up
< jeremyrubin> because the entire mempool * N is bad
< jeremyrubin> like maybe even brick the network today with a few blocks bad
< sdaftuar> well, one alternative is to memory cap it instead?
< jeremyrubin> Or LRU cache
< sdaftuar> right
< jeremyrubin> IIRC the philosophy that BCash deals with this is not... awful?
< jeremyrubin> step 1: clone the mempool
< jeremyrubin> step 2: fork the reorging
< sdaftuar> it's starting to make more sense to me though that if what is in the mempool is bounded in some reasonable-ish way, and what gets added back from blocks is bounded in some way, that we could get comfortable with some kind of CPU bound on the combination and be comfortable dropping the cache
< jeremyrubin> step3: continue mining on the non-reorg chain
< jeremyrubin> step4; if reorg finishes, switch
< jeremyrubin> if the reorg never finishes then it was done by a malicious miner, so it's not economically good anyways maybe
< sdaftuar> not sure that really makes sense, as until we return a new block template, mining on the old chain is almost certainly what the default behavior of miners would be regardless
< sdaftuar> so i think our job is to make it so that we can produce a valid new block on the new chain as fast as possible
< jeremyrubin> Fair point
< jeremyrubin> I wonder if a randomized algorithm can also help here
< jeremyrubin> The attacker is exploiting some specific structures
< jeremyrubin> I wonder if there's an algorithm with a random order that can prevent them from exposing some of the worst case stuff
< jeremyrubin> but being on average much slower maybe
< bitcoin-git> [bitcoin] hebasto opened pull request #18121: gui: Throttle GUI update pace when -reindex (master...20200211-reindex-gui) https://github.com/bitcoin/bitcoin/pull/18121
< bitcoin-git> [bitcoin] theStack opened pull request #18122: rpc: update validateaddress RPCExamples to bech32 (master...20200211-rpc-update-validateaddress-rpcexamples-to-bech32) https://github.com/bitcoin/bitcoin/pull/18122
< bitcoin-git> [bitcoin] ryanofsky opened pull request #18123: gui: Fix race in WalletModel::pollBalanceChanged (master...pr/pollbug) https://github.com/bitcoin/bitcoin/pull/18123
< bitcoin-git> [bitcoin] practicalswift opened pull request #18124: init: Clarify C and C++ locale assumptions. Add locale sanity check to verify assumptions at run-time. (master...LocaleSanityCheck) https://github.com/bitcoin/bitcoin/pull/18124