< bitcoin-git>
bitcoin/master 6965456 Andrew Chow: Introduce DeferringSignatureChecker and inherit with SignatureExtractor
< bitcoin-git>
bitcoin/master a97a929 Andrew Chow: Test that signrawtx works when a signed CSV and CLTV inputs are present
< bitcoin-git>
bitcoin/master 245a5cd fanquake: Merge #21166: Introduce DeferredSignatureChecker and have SignatureExtract...
< bitcoin-git>
[bitcoin] fanquake merged pull request #21166: Introduce DeferredSignatureChecker and have SignatureExtractorClass subclass it (master...fix-sig-extractor-checker) https://github.com/bitcoin/bitcoin/pull/21166
< fanquake>
Am I misremembering or did someone recently report seeing terrible performance with sqlite based wallets?
< sipa>
i think phantomcircuit was complaining about that a while ago
< fanquake>
I'm currently seeing a 100x performance difference in the wallet tests
< fanquake>
202536866us to run CreateWallet with sqlite vs 2037701us with bdb.
< sipa>
that's no good
< bitcoin-git>
[bitcoin] MarcoFalke merged pull request #21616: [0.21] build: link against -lsocket if required for *ifaddrs (0.21...backport_21486) https://github.com/bitcoin/bitcoin/pull/21616
< bitcoin-git>
[gui] hebasto opened pull request #274: [PoC] [do not merge]: Support runtime appearance adjustment on macOS (master...210407-dark-poc) https://github.com/bitcoin-core/gui/pull/274
< bitcoin-git>
[bitcoin] fanquake opened pull request #21629: build: fix configuring when building depends with NO_BDB=1 (master...fixup_depends_sqlite_only_build) https://github.com/bitcoin/bitcoin/pull/21629
< bitcoin-git>
[bitcoin] vasild opened pull request #21630: fuzz: split FuzzedSock interface and implementation (master...FuzzedSock_move) https://github.com/bitcoin/bitcoin/pull/21630
< bitcoin-git>
[bitcoin] vasild opened pull request #21631: i2p: always check the return value of Sock::Wait() (master...SockWait_usage_fix) https://github.com/bitcoin/bitcoin/pull/21631
< bitcoin-git>
bitcoin/master 3bb6e7b Jon Atack: rpc: add network field to rpc getnodeaddresses
< bitcoin-git>
bitcoin/master 1b91898 Jon Atack: rpc: simplify/constify getnodeaddresses code
< bitcoin-git>
bitcoin/master 5c44678 Jon Atack: rpc: improve getnodeaddresses help
< bitcoin-git>
[bitcoin] laanwj merged pull request #21594: rpc: add network field to getnodeaddresses (master...getnodeaddresses-network) https://github.com/bitcoin/bitcoin/pull/21594
< wumpus>
jeremyrubin: from what i understand, it is about simply running the wallet tests with sqlite, don't think fuzzing is involved here
< achow101>
maybe it's a performance regression in some version? I don't see this issue on my syste with sqlite 3.35.4
< sipa>
achow101: perhaps it is related to disk i/o speed?
< achow101>
I'm thinking it is related to that
< achow101>
I just tried 3.34.1 and did not see the same performance issue
< sipa>
if it"s doing lots of fsyncs, it may be super slow on network-connected spinning disk storage, but barely noticable on fast nvme ssd
< wumpus>
i remember some issues with fsyncing being really slow when used inside VMs
< wumpus>
yes that
< wumpus>
maybe it's doing a lot of write transactions separately
< achow101>
well the test datadir is /tmp, which IIRC is a ramdisk
< achow101>
*is in /tmp
< sipa>
not on my system
< achow101>
so disk shouldn't matter there?
< achow101>
hmm
< sipa>
hmm
< wumpus>
/tmp is definitely not a ramdisk on all linux distros
< sipa>
as soon as descriotor wallets were merged, the parallellism i could run the tests with dramatically reduced
< sipa>
went from 60 without problems to 10 or so
< sipa>
and i often still see a few time out that i need to rerun manually
< achow101>
I will try this on a slower drive
< sipa>
it's always descriptor wallet tests that time out for me
< achow101>
being disk i/o bound does make sense to me
< sipa>
(not all of them, and not deterministically, so i nevrr thought too much about it)
< phantomcircuit>
sipa, the performance issue i saw wasn't in a vm btw
< phantomcircuit>
wait actually it was im dumb
< sipa>
so if this is related, i don't think it's a regression
< achow101>
how do I make test_bitcoin use a different datadir?
< sipa>
chroot? :p
< sipa>
or just moujt another fs over /tmp
< sipa>
mkdir /non-tmpfs-tmp && sudo mount --bind /non-tmpfs-tmp /tmp
< sipa>
achow101: oh it's the functional tests that time out for m; unit tests are never a problrm
< achow101>
I have other things appear to be using tmp, so I'd rather not mount over it. I think I'll just modify the code
< wumpus>
or pass a different TMPDIR
< achow101>
it seems like the the unit tests can reliably show the performance issue, so I'll go with that
< wumpus>
(assuming nothing is hardcoding /tmp which would be bad)
< wumpus>
alternatively a chroot with everything bind-mounted execept tmp, but that seems overkill in this case
< achow101>
hard coding the path was easier
< wumpus>
i think setting TMPDIR should work i had to do it once for a system with only a small /tmp partition
< achow101>
definitely appears to be disk io
< achow101>
100% disk utilization and the CreateWallet test is now taking a very long time
< achow101>
we may be using sqlite subotimally
< jonatack>
"it's always descriptor wallet tests that time out for me" --> same (apart from other timeouts/races)
< sipa>
achow101: i can give you ssh access on my system if that helps, but it sounds like you're already able to reproduce?
< achow101>
yes, I believe I have reproduced it
< jeremyrubin>
can you just pass a ramdisk in for where to create the DB for testing lol
< jeremyrubin>
ah i guess this is already discussed above my b
< achow101>
the functional tests let you specify the tempdir the datadirs are made in
< wumpus>
that does require a lot of memory though, the bitcoin functional tests (especially with high parallelism) tend to create many files
< sipa>
with tmpfs /tmp, functional tests at -j60: 126 s runtime, all tests pass
< phantomcircuit>
sipa, through the magic of fsync() {}
< sipa>
with normal /tmp, the same takes 488 s, and 3 tests fail
< sipa>
(this is a 16-core/32-thread system with 32G RAM)
< phantomcircuit>
there's a bunch of functional tests that randomly timeout iirc
< achow101>
We are currently enabling fullfsync (someone suggested we do this to ensure that everything is truly flushed), but the sqlite docs says this "But the implementation of fullfsync involves resetting the disk controller. And so not only is it profoundly slow, it also slows down other unrelated disk I/O. So its use is not recommended."
< achow101>
but that would only affect macs
< achow101>
I would guess that the slow down is because sqlite does an (or two) fsync per write
< luke-jr>
doesn't fsync on Linux also wait for everything else writing to get to disk too?
< luke-jr>
or at least things queued before you
< phantomcircuit>
achow101, performance sensitive sqlite requires using the write ahead log
< achow101>
and a write ahead log is why we don't want bdb
< phantomcircuit>
luke-jr, yes, fsync and fdatasync are for the entire filesystem despite being called on a file or directory
< phantomcircuit>
achow101, which journal mode are we using?
< achow101>
rollback
< phantomcircuit>
achow101, you mean DELETE ?
< achow101>
we use the rollback journal in PERSIST mode
< achow101>
PERSIST is because that's what it does when locking_mode=EXCLUSIVE
< phantomcircuit>
achow101, i guess the real issue is that there basically two categories of 'written' we care about, key material and not-key material
< phantomcircuit>
i don't think with PERSIST requires using synchronous=FULL, but can be NORMAL instead
< sipa>
if it's just creation that's painful, these paranoid modes could also just be enabled after creation
< phantomcircuit>
which are separate from the journal_mode and the synchronous pragmas
< sipa>
but i suspect it's a more general sign of a i/o boundness issue
< luke-jr>
achow101: well, corruption is the reason we don't want bdb… is sqlite's WAL just as prone to loss?
< phantomcircuit>
sipa, everywhere that key material is written could be sandwiched in a bunch of flushing logic, but also that's kinda of asking for missing that somewhere
< phantomcircuit>
luke-jr, if you lose the WAL you lose what was written, but don't corrupt the database
< achow101>
luke-jr: WAL (in general) means we have to do the thing where we constantly flush the log back to the db file. this adds additional overhead, another thread, and is just kind of a pain. It's one of the things bdb does that results in a lot of complexity in that code
< achow101>
sipa: I think if we did all of setup as a single transaction a large part of this would go away.
< achow101>
and doing setup as a single transaction is probably safe
< luke-jr>
achow101: but so long as it doesn't corrupt, it's tolerable IMO
< phantomcircuit>
yes sqlite performance is almost entirely based on commits
< phantomcircuit>
a thousand writes with one commit is going to be faster than hundreds of writes with 10 commits
< achow101>
luke-jr: WAL has the risk of losing data if that log is lost, and I think that isn't tolerable for a completely new db system
< phantomcircuit>
achow101, wrapping the entire setup in a transaction has the issue of making the logic tied more closely to the data layer so
< phantomcircuit>
iono options etc
< sipa>
aright; is that hard to do?
< achow101>
shouldn't be hard to transaction-ize large portions of setup
< achow101>
especially the parts that write a few thousand records
< luke-jr>
won't break abstractions?
< achow101>
no, WalletBatch already has transaction create and commit
< achow101>
although now that I think about it, descriptor wallets don't have a whole lot of records to write
< achow101>
unless the test is making a legacy wallet with sqlite
< phantomcircuit>
do they not pregenerate a bunch of keys?
< achow101>
they're generated on the fly and kept in memory only
< sipa>
what?
< sipa>
the key cache isn't written?
< achow101>
Only the parent xpub is written
< sipa>
oh, right
< sipa>
that's nice
< sipa>
is that recent?
< achow101>
that was in the original
< sipa>
interesting
< sipa>
what if you have a blabla/*h descriptor?
< achow101>
then it stores all of the derived keys
< sipa>
i see, ok
< achow101>
but you have to import such a descriptor explicitly, so the CreateWallet test shouldn't be doing that
< sipa>
right
< sipa>
still, a 1000x slowdown... kind of suggests it's more than a few records?
< achow101>
... it's making a legacy wallet with sqlite
< achow101>
sipa: which descriptor wallet functional tests timeout?
< achow101>
the functional tests shouldn't be making legacy wallets with sqlite
< sipa>
ha.
< sipa>
rpc_fundrawtransaction.py --descriptors | ✖ Failed | 488 s
< sipa>
wallet_create_tx.py --descriptors | ✖ Failed | 94 s
< sipa>
wallet_keypool.py --descriptors | ✖ Failed | 116 s
< sipa>
what about phantomcircuit's slow test?
< phantomcircuit>
i was definitely using a descriptor wallet
< phantomcircuit>
yeah format sqlite descriptors true
< achow101>
phantomcircuit: what were you testing specifcally that was slow?
< phantomcircuit>
achow101, just creating the wallet took a very long time
< achow101>
ok..
< achow101>
sipa: do those timeout on a createwallet call or something else?
< achow101>
phantomcircuit: were you using an older version of the descriptor wallet pr? there was an iteration that wrote a few thousand records.
< achow101>
it seems like functional test timeout might be from writing all of the transactions that generatetoaddress produces
< wumpus>
if it's only a performance problem in the tests, what about disabling the fsyncing in the tests?
< wumpus>
i'm not sure there is any real life scenario where the wallet would generate so many writes