< luke-jr>
hebasto: but it's certain to have the same issues without this code
< bitcoin-git>
[bitcoin] amitiuttarwar opened pull request #22306: [test] Improvements to p2p_addr_relay.py (master...2021-06-addr-tests) https://github.com/bitcoin/bitcoin/pull/22306
< hebasto>
luke-jr: ofc, because it was not its goal to change behavior on Linux or Windows
< kanzure>
win 5
< kanzure>
whoops
< robertspigler>
re: travel restrictions for developer meetup, the Kayak travel website has a world map where you can select any origin country, and then see all the restrictions
< Bullit>
Yeah, if you didn't know, Google has 1379 members, and the stockbroker applications Alphabet A and Alphabet C are runaway CEOs. Sadly, what is suggested about Kayak is Google's suggestion for travel from Europe to America by Kayak
< bitcoin-git>
[bitcoin] ariard opened pull request #22310: test: Add functional test for replacement penalty check (master...2021-06-add-rbf5-test) https://github.com/bitcoin/bitcoin/pull/22310
< bitcoin-git>
[bitcoin] MarcoFalke opened pull request #22311: test: Add missing syncwithvalidationinterfacequeue in p2p_blockfilters (master...2106-testsyncwithvalidationinterfacequeue) https://github.com/bitcoin/bitcoin/pull/22311
< bitcoin-git>
[bitcoin] siv2r opened pull request #22312: changes for wait_for_getheaders to include hash_list (master...modify-wait-getheaders) https://github.com/bitcoin/bitcoin/pull/22312
< bitcoin-git>
[bitcoin] MarcoFalke opened pull request #22313: test: Add missing sync_all to feature_coinstatsindex (master...2106-testSync) https://github.com/bitcoin/bitcoin/pull/22313
< jamesob>
I know this is an age old and time honored tradition, but just for fun: when's the last time someone looked seriously at replacing leveldb with something bespoke for our access patterns?
< jamesob>
not saying it's something I want to do but may be a fun project for a new contributor/summer intern or something who's so inclined
< jamesob>
even having a relatively thorough enumeration of those access patterns and what exactly we need from such a library would be interesting
< laanwj>
having the entire utxo set in memory still works best :-)
< laanwj>
could just write it out as a linear file on shutdown (and read it in on startup); the drawback of not using a database is, besides the memory use, that unexpected crashes of the daemon lose the entire state, as it cannot be written incrementally
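(A minimal sketch of that linear-dump idea, with made-up stand-in types rather than Bitcoin Core's actual Coin/COutPoint serialization; as laanwj notes, a crash before the dump completes loses the whole state, since nothing is written incrementally.)

```cpp
// Hypothetical linear dump/load of an in-memory UTXO map. Types and
// on-disk layout are simplified stand-ins, not Bitcoin Core's formats.
#include <cstdint>
#include <cstring>
#include <fstream>
#include <unordered_map>

struct OutPoint { uint8_t txid[32]; uint32_t n; };
struct CoinVal  { int64_t amount; uint32_t height; };

inline bool operator==(const OutPoint& a, const OutPoint& b) {
    return a.n == b.n && std::memcmp(a.txid, b.txid, 32) == 0;
}
struct OutPointHash {
    size_t operator()(const OutPoint& o) const {
        size_t h;
        std::memcpy(&h, o.txid, sizeof(h));  // txids are already uniform
        return h ^ o.n;
    }
};

using UtxoMap = std::unordered_map<OutPoint, CoinVal, OutPointHash>;

// On shutdown: write every entry back-to-back. No index is needed,
// because the only read pattern is "load the whole thing at startup".
void Dump(const UtxoMap& utxos, const char* path) {
    std::ofstream f(path, std::ios::binary | std::ios::trunc);
    uint64_t count = utxos.size();
    f.write(reinterpret_cast<const char*>(&count), sizeof(count));
    for (const auto& [op, coin] : utxos) {
        f.write(reinterpret_cast<const char*>(&op), sizeof(op));
        f.write(reinterpret_cast<const char*>(&coin), sizeof(coin));
    }
}

// On startup: read it all back into memory in one sequential pass.
UtxoMap Load(const char* path) {
    std::ifstream f(path, std::ios::binary);
    uint64_t count = 0;
    f.read(reinterpret_cast<char*>(&count), sizeof(count));
    UtxoMap utxos;
    utxos.reserve(count);
    for (uint64_t i = 0; i < count; ++i) {
        OutPoint op;
        CoinVal coin;
        f.read(reinterpret_cast<char*>(&op), sizeof(op));
        f.read(reinterpret_cast<char*>(&coin), sizeof(coin));
        utxos.emplace(op, coin);
    }
    return utxos;
}
```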
< jamesob>
laanwj: absolutely :)
< jamesob>
but ofc we don't want to make running a full node require 10gb of memory or whatever it is these days
< laanwj>
probably not, and after the initial sync the performance tradeoff becomes different anyway
< laanwj>
after that there are pretty much two use cases: either you want verification to go as quickly as possible (e.g. miners), which warrants keeping everything in memory, or the speed is pretty much irrelevant (personal nodes)
< laanwj>
in which case leveldb is good enough?
< jeremyru_>
jamesob: i think it'd be fun to have some decidedly *worse* databases too -- e.g. a Filesystem tree
< jamesob>
jeremyru_: totally; would be fun to see to what extent that degrades performance
< jamesob>
laanwj: right, absolutely. I guess I'm just compelled to think about it because I wonder if we couldn't come up with something equally robust/performant but simpler. After dealing with the issue underlying #22263 I was reminded that leveldb definitely isn't perfect and drags in some stuff we may not need (e.g. snapshots)
< jeremyru_>
it'd also be interesting to put in sqlite because then you can build out more indexing stuff, and sqlite is already a dependency for wallet
< laanwj>
back in the day we tried some experiments with LMDB but while read performance was somewhat better, write performance was worse
< jamesob>
jeremyru_: problem with that is that would drag sqlite into consensus which we definitely don't want
< laanwj>
snapshots are useful for being able to run utxo statistics or backup in the background
< jamesob>
yeah that's a good point, and probably not something we'd want to implement ourselves
< jeremyru_>
jamesob: true, just thinking more generally about things a node operator might want to have as experimental stuff, or to run as an internal API node
< laanwj>
i doubt we need more indexing stuff, a key/value database is fine for utxos
< jamesob>
right
< sipa>
we tried sqlite
< sipa>
at the time it had terrible performance for this kind of load
< laanwj>
yes
< sipa>
i don't think that has changed; it's just not designed for this
< jeremyru_>
getting everything-in-memory performance should be doable mostly via command-line options on cache sizes, right? I guess the flushes need to be synchronous on some responses.
< sipa>
if you run with -dbcache=infinity and reindex you'll effectively do the entire sync without a single db write
< jamesob>
jeremyru_: mem only a nonstarter for reasons laanwj mentioned; need to have durability for crashes
< sipa>
(the blocks will be written to disk, but no flushes would occur)
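(For anyone wanting to try this: -dbcache is specified in MiB, so "infinity" in practice means something like `bitcoind -reindex -dbcache=16384` on a machine with enough RAM; assuming the cache never fills, the whole resync then proceeds without intermediate chainstate flushes.)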
< jeremyru_>
sipa: what happens on crash?
< laanwj>
jamesob: I mean it's fine for specific scenarios where you have UPS backup, or fallback nodes
< jamesob>
right but we can't assume that as a generality ofc
< laanwj>
sure, but if you want to specialize for use cases
< sipa>
jeremyru_: you'll start over from scratch
< jamesob>
nor can we assume a lot of RAM... but I do like this idea of an optional "sync as fast as you can" mode that ensures sufficient memory and just blazes through an IBD
< sipa>
jamesob: that exists, just set dbcache to all your memory
< jamesob>
(not to mention makes tip maintenance as fast as possible for miners)
< jamesob>
sipa: right
< jamesob>
sipa: although I guess such a mode would preclude periodic flushing?
< jeremyru_>
jamesob: that should give you a good estimate of how much gain there is to be had
< laanwj>
sipa: yes, it just lacks a way to read in the entire database at node restart at the moment
< sipa>
laanwj: right, that would be easy to add
< laanwj>
sure
< jeremyru_>
jamesob: you could make a double buffered thing if memory really = infinity
< jamesob>
jeremyru_: right
< jeremyru_>
so that you just copy the entire utxo cache, and then write the snapshot periodically
< bitcoin-git>
[bitcoin] nourou4them opened pull request #22314: doc: Install WSL on non-system drive and compile Bitcoin (master...patch-1) https://github.com/bitcoin/bitcoin/pull/22314
< jeremyru_>
avoid the restart from scratch issue
< jeremyru_>
or you could add another cache layer temporarily
< jeremyru_>
so that all the reads/writes while flushing are temporarily against another layer while you flush
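(A minimal sketch of that overlay idea, assuming a simple two-layer map rather than anything like Bitcoin Core's actual CCoinsViewCache; as sipa points out next, the naive version below ignores the spent-while-fresh optimization, which is exactly what makes the real thing hard.)

```cpp
// Hypothetical two-layer UTXO cache: while the base layer is being
// flushed, reads and writes go to a fresh overlay, which is merged
// back down once the flush completes.
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

struct Coin { int64_t amount = 0; bool spent = false; };
using Layer = std::unordered_map<std::string, Coin>;  // key = serialized outpoint

class LayeredCache {
    Layer base_;
    Layer overlay_;
    bool flushing_ = false;

    Layer& Active() { return flushing_ ? overlay_ : base_; }

public:
    void Add(const std::string& outpoint, Coin c) { Active()[outpoint] = c; }

    // A spend recorded in the overlay must shadow (tombstone) any
    // unspent copy of the coin sitting in the base layer.
    void Spend(const std::string& outpoint) {
        Active()[outpoint] = Coin{0, /*spent=*/true};
    }

    std::optional<Coin> Get(const std::string& outpoint) const {
        if (auto it = overlay_.find(outpoint); it != overlay_.end()) return it->second;
        if (auto it = base_.find(outpoint); it != base_.end()) return it->second;
        return std::nullopt;
    }

    void BeginFlush() { flushing_ = true; }  // base_ is now being written out

    // Once the base layer is durably on disk, fold the overlay's
    // additions and tombstones back into it.
    void EndFlush() {
        for (auto& [k, v] : overlay_) {
            if (v.spent) base_.erase(k);
            else base_[k] = v;
        }
        overlay_.clear();
        flushing_ = false;
    }
};
```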
< sipa>
that's surprisingly hard to do right
< sipa>
in combination with the "delete entries that are spent if they have been created since the last flush"
< sipa>
optimization (which we benefit massively from, as all utxos are written once, read once, deleted once)
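(A stripped-down illustration of that optimization; the real logic lives in Bitcoin Core's CCoinsViewCache with FRESH/DIRTY flags, and this only shows the core idea that spending a never-flushed coin simply erases it.)

```cpp
// Stripped-down spent-while-fresh optimization: a coin created since
// the last flush is FRESH (disk has never seen it), so spending it
// just erases the cache entry and no write ever happens for it.
#include <cstdint>
#include <string>
#include <unordered_map>

struct CacheEntry {
    int64_t amount = 0;
    bool fresh = false;  // created since the last flush
    bool dirty = false;  // differs from the on-disk state
    bool spent = false;
};

using Cache = std::unordered_map<std::string, CacheEntry>;

void AddCoin(Cache& cache, const std::string& outpoint, int64_t amount) {
    cache[outpoint] = CacheEntry{amount, /*fresh=*/true, /*dirty=*/true, false};
}

void SpendCoin(Cache& cache, const std::string& outpoint) {
    auto it = cache.find(outpoint);
    if (it == cache.end()) return;  // real code would pull the coin from disk first
    if (it->second.fresh) {
        // The create and the spend cancel out: most UTXOs die here,
        // which is why a huge dbcache avoids writes, not just reads.
        cache.erase(it);
    } else {
        // Disk holds an unspent copy; remember to delete it at flush.
        it->second.spent = true;
        it->second.dirty = true;
    }
}
```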
< jamesob>
haha you know I'm now remembering why I arrived at "screw all this ~10% optimization stuff, let's just work on assumeutxo"
< jeremyru_>
jamesob: hmm
< jeremyru_>
one idea:
< jeremyru_>
just write your assumeutxo rolling hashes to disk as you go
< jeremyru_>
fully verified, but you could re-download if you corrupt?
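(A toy sketch of that checkpointing idea; a real implementation would use a proper homomorphic set hash like MuHash3072, and the XOR folding here is only a stand-in to show the bookkeeping.)

```cpp
// Toy rolling-UTXO-hash checkpointing. XOR folding is NOT a safe
// construction; it only illustrates the add/remove/persist pattern.
#include <cstdint>
#include <fstream>
#include <functional>
#include <string>

struct RollingUtxoHash {
    uint64_t acc = 0;
    void Add(const std::string& utxo)    { acc ^= std::hash<std::string>{}(utxo); }
    void Remove(const std::string& utxo) { acc ^= std::hash<std::string>{}(utxo); }
};

// After connecting each block, append (height, hash) so that after a
// crash or corruption the node knows the last fully-verified UTXO-set
// commitment it reached, and could re-acquire state matching it.
void AppendCheckpoint(const char* path, int height, const RollingUtxoHash& h) {
    std::ofstream f(path, std::ios::app | std::ios::binary);
    f.write(reinterpret_cast<const char*>(&height), sizeof(height));
    f.write(reinterpret_cast<const char*>(&h.acc), sizeof(h.acc));
}
```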
< jamesob>
that's kind of interesting, but you've still gotta keep the block data (and index) somewhere
< jamesob>
anyway sounds precarious/complex relative to the benefits
< jamesob>
I think it's one thing if there's a sizable optimization we can make for tip maintenance (like avoiding synchronous flushes), but otherwise maybe not worth it to get too fancy
< sipa>
i had a design a few years ago that would permit concurrent flushing with cache updating (so it doesn't need stop-the-world and wipe-all-memory on every flush), but it was pretty nontrivial
< sipa>
while still maintaining the soon-spend-never-hits-disk optimization within some window of blocks
< jamesob>
also the long-neglected #17487 seems pretty relevant here
< jamesob>
> the soon-spend-never-hits-disk optimization
< jamesob>
yeah this one is pretty interesting, feel like there's some potential there
< sipa>
well, we already use it
< sipa>
it's the reason why -dbcache=huge is so much faster than smaller caches
< sipa>
it's not just avoiding reads from disk - it's mostly preventing things from being written in the first place
< sipa>
but combining it with partial flushing is hard, because it leaves you in an inconsistent state
< sipa>
every utxo individually on disk will be consistent with the state it had at some point between the last fully flushed block and the last processed block
< sipa>
but you can't guarantee they're all consistent with each other
< jamesob>
so basically you'd have to use an ordered map for cacheCoins to do any better than we're doing right now I think, right? or have some index for insertion order
< sipa>
insertion order does not help
< sipa>
or at least, not on its own
< jamesob>
well, younger coins are more likely to be spent, right?
< jeremyru_>
if you were to, say, download N blocks at a time, you could make a lookahead cache that tells you what you should flush to disk and what will be deleted before you actually process the blocks.
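(A sketch of that lookahead, assuming the next N blocks are already downloaded and using placeholder block/tx types: collect every outpoint spent in the window and exclude those entries from a partial flush, since they are about to be deleted anyway.)

```cpp
// Hypothetical lookahead over N already-downloaded blocks: compute the
// set of outpoints that will be spent soon, so a partial flush can
// skip writing entries that are about to be deleted anyway.
#include <string>
#include <unordered_set>
#include <vector>

struct TxIn  { std::string prevout; };  // serialized outpoint
struct Tx    { std::vector<TxIn> vin; };
struct Block { std::vector<Tx> vtx; };

std::unordered_set<std::string> SpentInWindow(const std::vector<Block>& window) {
    std::unordered_set<std::string> spent;
    for (const auto& block : window)
        for (const auto& tx : block.vtx)
            for (const auto& in : tx.vin)
                spent.insert(in.prevout);
    return spent;
}

// A partial flush would then write only dirty entries whose outpoint
// is NOT in the returned set, e.g.:
//   if (entry.dirty && !spent.count(outpoint)) WriteToDisk(outpoint, entry);
```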
< sipa>
the problem is that the order in which you delete never-written-to-disk entries from your cache isn't the same as the order the utxos are created in
< jamesob>
jeremyru_: there's some kind of optimization like that the utreexo guys are doing
< sipa>
jamesob: oh you just mean as an access optimization? i don't think that's the bottleneck
< jamesob>
sipa: no, I mean when partial flushing, avoid flushing coins that are likely soon to be spent
< sipa>
jamesob: there is no solution for that problem
< sipa>
you just need to keep track of the range of blocks that your state on disk corresponds to
< sipa>
and on restart, reprocess those blocks to fix the db
< sipa>
we do that already btw, to a limited extent, if you crash in the middle of a disk flush
< jamesob>
right
< sipa>
but doing it asynchronously makes tracking a lot harder, because you can have reorgs mixing different histories that all get written to disk at once
< sipa>
(and triggering a full flush on reorg would be disastrous for orphan rates)
< sipa>
right now we only have an inconsistent state on disk from the moment a flush begins until the point where it finishes
< sipa>
continuously partial flushing (which would be far better for performance) would mean you're effectively *always* inconsistent on disk, but need to retain the ability to recover from it
< jeremyru_>
sipa: if we crash during that, do we corrupt recoverably? detectably? or does it just require a reindex?
< sipa>
jeremyru_: it gets detected at startup, and reliably recovered without full reindex
< sipa>
(unless you have disk errors of course)
< sipa>
at the start of a flush, a record is written of the form "sync started, block range A...B" where B is the current tip, and A is the old tip that was flushed to
< sipa>
at the end, that record is removed and replaced with "synced to block B"
< sipa>
at startup, if a block range record is present, those blocks are processed again, and their UTXO updates are applied to disk, without any other validation
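(A rough sketch of that record protocol as described; Bitcoin Core's real mechanism is the DB_HEAD_BLOCKS/DB_BEST_BLOCK records together with ReplayBlocks(), and the key names and Db type below are simplified stand-ins.)

```cpp
// Sketch of the flush / crash-recovery protocol described above, with
// a toy in-memory KV store standing in for the real database.
#include <optional>
#include <string>
#include <unordered_map>

struct Db {
    std::unordered_map<std::string, std::string> kv;
    void Put(const std::string& k, const std::string& v) { kv[k] = v; }
    void Erase(const std::string& k) { kv.erase(k); }
    std::optional<std::string> Get(const std::string& k) const {
        auto it = kv.find(k);
        if (it == kv.end()) return std::nullopt;
        return it->second;
    }
};

struct BlockRange { std::string from, to; };  // block hashes A..B

void FlushUtxoSet(Db& db, const BlockRange& range) {
    // 1. Before any coin is written: record "sync started, range A..B".
    db.Put("head_blocks", range.from + ":" + range.to);
    // 2. ... write all dirty/spent cache entries to the database ...
    // 3. Replace the in-progress marker with the new fully-synced tip.
    db.Erase("head_blocks");
    db.Put("best_block", range.to);
}

void MaybeRecover(Db& db) {
    // On startup: if the marker survived a crash, each utxo on disk is
    // consistent with *some* point in A..B, so re-apply those blocks'
    // UTXO effects (no script validation; they were already verified).
    if (auto range = db.Get("head_blocks")) {
        (void)range;  // real code would parse "A:B" and replay those blocks
    }
}
```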
< sipa>
one problem with this is that due to fewer consistency guarantees at that point, certain optimizations can't be used, and this reprocessing is, surprisingly, slower than actual validation
< sipa>
so if the range is too big, it's actually slower than just a reindex
< darosior>
morcos: why does the fee estimator disregard CPFP? You mentioned in https://www.mail-archive.com/bitcoin-development@lists.sourceforge.net/msg06405.html that it would skew estimates, but how so? I don't have statistics but I think CPFP is widely used on the network and might be used even more in the future with package relay and it becoming a first
< darosior>
class citizen in L2 protocols. It seems to me it could bias estimates downward, as you would see a low-fee transaction (or actually a lot of them) being confirmed quickly, whereas you should not rely on their feerate as a decent estimate for being confirmed quickly. Accounting for their parent might be conservative, and accounting for it as a package
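(To make the skew concrete with made-up numbers: a 200 vB parent paying 200 sat confirms quickly only because its 150 vB child pays 3300 sat, i.e. the package pays 10 sat/vB; crediting the confirmation to the parent's own 1 sat/vB would teach the estimator that 1 sat/vB confirms fast.)

```cpp
// Toy numbers for the CPFP skew: the parent alone pays 1 sat/vB, but
// the package that actually got it confirmed paid 10 sat/vB.
#include <cstdio>

int main() {
    double parent_fee = 200,  parent_vsize = 200;  // 1 sat/vB on its own
    double child_fee  = 3300, child_vsize  = 150;  // high-fee child
    double pkg_rate = (parent_fee + child_fee) / (parent_vsize + child_vsize);
    std::printf("parent alone: %.1f sat/vB, package: %.1f sat/vB\n",
                parent_fee / parent_vsize, pkg_rate);
    // An estimator that records the parent at 1 sat/vB learns, wrongly,
    // that 1 sat/vB confirms quickly.
    return 0;
}
```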