< bitcoin-git>
[bitcoin] fanquake opened pull request #22621: make ParseOutputType return a std::optional<OutputType> (master...parse_output_type_optional) https://github.com/bitcoin/bitcoin/pull/22621
< maaku>
I have a couple of nodes that are stuck on block #692260. Calling reconsiderblock on #692261 (0000000000000000000f14c35b2d841e986ab5441de8c585d5ffe55ea1e395ad) generates the following error: "ERROR: ConnectBlock: CheckQueue failed"
< maaku>
Is there a way to get better debugging info about which script checks are failing, and why?
< sipa>
maaku: -par=1 may help (that avoids outsourcing the validation to other threads)
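A simplified Python model of why -par=1 gives a more specific error. It assumes, as the log suggests, that parallel script checks only report pass/fail back to ConnectBlock, while a check run inline on the validating thread can surface its own error string. All function names here are hypothetical; this is not Bitcoin Core's actual check-queue code.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a single script check: returns (ok, error_message).
def check_input(witness):
    if not witness:
        return False, "Witness program was passed an empty witness"
    return True, ""

def connect_block_parallel(inputs, workers=4):
    # Worker threads only hand back a boolean, so the specific reason is lost.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda w: check_input(w)[0], inputs))
    return "OK" if all(results) else "ConnectBlock: CheckQueue failed"

def connect_block_inline(inputs):
    # With no extra threads, each check runs inline and its error stays visible.
    for i, witness in enumerate(inputs):
        ok, err = check_input(witness)
        if not ok:
            return f"CheckInputScripts on input {i} failed ({err})"
    return "OK"

inputs = [[b"sig"], []]  # the second input has an empty witness
print(connect_block_parallel(inputs))
print(connect_block_inline(inputs))
```

The parallel path only learns that some check failed; the inline path knows which input failed and why, which is what maaku gets once -par=1 is set.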
< maaku>
sipa: thank you. I can at least get an error message now
< maaku>
"ERROR: ConnectBlock(): CheckInputScripts on b10c007c60e14f9d087e0291d4d0c7869697c6681d979c6639dbd960792b4d41 failed with non-mandatory-script-verify-flag (Witness program was passed an empty witness)"
< maaku>
well I've fixed the problem by regressing to 0.21.0, which seems to let that block through fine
< maaku>
i'm running with my own patches though, which is probably the source of the issue
< maaku>
This transaction includes a spend from the taproot address "51200101010101010101010101010101010101010101010101010101010101010101" -- someone is having fun :\
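The hex maaku quotes is an output script (scriptPubKey) rather than an address: 0x51 is OP_1 and 0x20 pushes 32 bytes, which under BIP141/BIP341 makes it a version-1, 32-byte witness program, i.e. a taproot output. A node enforcing taproot rejects a spend of it with an empty witness, which matches the error above. A minimal Python sketch of the decoding; the helper name is hypothetical.

```python
def parse_witness_program(script_hex):
    """Return (witness_version, program) for a native segwit output script, else None.

    Follows the BIP141 shape: one version opcode (OP_0 or OP_1..OP_16)
    followed by a single direct push of 2 to 40 bytes.
    """
    script = bytes.fromhex(script_hex)
    if not 4 <= len(script) <= 42:
        return None
    opcode = script[0]
    if opcode == 0x00:              # OP_0
        version = 0
    elif 0x51 <= opcode <= 0x60:    # OP_1 .. OP_16
        version = opcode - 0x50
    else:
        return None
    push_len = script[1]
    if push_len != len(script) - 2:  # the push must cover the rest of the script
        return None
    return version, script[2:]

spk = "5120" + "01" * 32
version, program = parse_witness_program(spk)
is_taproot = version == 1 and len(program) == 32
print(version, len(program), is_taproot)  # 1 32 True
```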
< maaku>
But anyway, it looks like my client started enforcing taproot early. I'm still looking into why, but obviously it's something specific to my patchset
< maaku>
Thanks for the help sipa
< b10c>
maaku: yes, if your clients are enforcing taproot already, then it failed to process the transaction b10c007c60e14f9d087e0291d4d0c7869697c6681d979c6639dbd960792b4d41. I'm to blame for this tx. I'll dm you
< laanwj>
mounting a tmpfs on /tmp for running the bitcoin functional tests seems to do wonders for stability on systems with slow i/o (doesn't even cost that much memory)
< laanwj>
i've had no reports of nodes getting stuck
< muhblockchain>
is it a bug that some RPC commands like "reconsiderblock" are not listed by the RPC call "help"? They also are not found on https://bitcoincore.org/en/doc/0.21.0/. Or is the user supposed to get a list of all RPC commands elsewhere?
< jonatack>
muhblockchain: these are hidden RPCs, see for instance the bottom of the src/rpc/blockchain.cpp file
< muhblockchain>
uhh. can we give users a way to view them? it's quite confusing
< jonatack>
./src/bitcoin-cli help reconsiderblock
< jonatack>
^ can show the help doc, but AFAIK the hidden RPCs are for development and not intended for general use
< muhblockchain>
right, but it confuses the user when they try to invalidate/reconsider a block because they almost remember the name, but -cli help | grep invalid comes out empty. One might display that category instead, with a comment "(expert option, do not use normally)". Does this concept make sense?
< jonatack>
that discussion predates me but i surmise it may have been along the lines of "if they are ok with grepping the codebase, they are expert enough to use them"
< jonatack>
e.g. git grep reconsiderblock
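A toy Python model of the behaviour jonatack describes, assuming hidden RPCs are registered under a "hidden" category that the help overview skips while help for a named command still answers for them, plus muhblockchain's proposal as an opt-in flag. The command table and the include_hidden parameter are hypothetical; this is not Bitcoin Core's RPC table.

```python
# Hypothetical, heavily simplified command table: name -> category.
COMMANDS = {
    "getbestblockhash": "blockchain",
    "getblockchaininfo": "blockchain",
    "invalidateblock": "hidden",
    "reconsiderblock": "hidden",
}

def help_overview(commands, include_hidden=False):
    """List commands as a help overview would; hidden ones only on request."""
    lines = []
    for name in sorted(commands):
        if commands[name] == "hidden":
            if not include_hidden:
                continue  # current behaviour: hidden commands are simply omitted
            lines.append(f"{name} (expert option, do not use normally)")
        else:
            lines.append(name)
    return lines

def help_command(commands, name):
    """Per-command help works for any registered command, hidden or not."""
    return f"{name} [{commands[name]}]" if name in commands else None

print([line for line in help_overview(COMMANDS) if "invalid" in line])  # []
print(help_overview(COMMANDS, include_hidden=True))
print(help_command(COMMANDS, "reconsiderblock"))  # reconsiderblock [hidden]
```

Filtering the overview for "invalid" comes out empty, as muhblockchain saw, while asking for the command by name still works.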
< muhblockchain>
as a user, I had a node that apparently damaged its own UTXO store; I can repair it by redownloading from zero. But now I wonder about the development side: are state files not checksummed to avoid this? Perhaps there is a bug, or perhaps it would make sense to protect UTXO files against hw failures at least a bit.
< laanwj>
if your UTXO store is damaged you can use -reindex-chainstate, this is faster than a full redownload
< laanwj>
don't try to fix it with reconsiderblock
< laanwj>
(it won't work, all the commands assume the current state is correct)
< muhblockchain>
I will try. The strangest thing happened: one PC runs 2 nodes, and at the same time node1's blk*.dat file changed (bit rot?) and the other process, node2, apparently damaged its own UTXO set and noticed it around 20 blocks later. Maybe a strange hw failure
< laanwj>
"protecting against hw failures" is impossible in user software, you don't have the kind of introspection into hardware failures needed to detect and correct them, e.g. the most common problem resulting in corruption is CPU overheating, which will simply cause computations to return invalid results; besides doing every operation two or three times NASA-style, there's nothing to be done
< muhblockchain>
laanwj: of course "to some degree", eg from silent bitrot of files
< laanwj>
that would be incredibly slow and, without detection of invalid components, also pointless
< laanwj>
e.g. if a CPU is broken in a certain way, that may be deterministic
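A toy Python illustration of laanwj's two points, with hypothetical adder functions standing in for a faulty CPU: running an operation three times and taking a majority vote outvotes a one-off (transient) fault, but cannot help when the hardware is broken deterministically, because all three runs agree on the same wrong answer.

```python
def majority(values):
    """Return the value reported by at least two of three runs, else None."""
    a, b, c = values
    if a == b or a == c:
        return a
    if b == c:
        return b
    return None

def make_transient_adder(faulty_run):
    """Hypothetical adder whose result has its low bit flipped on one specific run only."""
    calls = 0
    def add(x, y):
        nonlocal calls
        calls += 1
        result = x + y
        return result ^ 1 if calls == faulty_run else result
    return add

def deterministic_faulty_add(x, y):
    """Hypothetical adder that is broken in a fixed way: it always gets 2 + 2 wrong."""
    result = x + y
    return result ^ 1 if (x, y) == (2, 2) else result

def run_three_times(add, x, y):
    return majority([add(x, y) for _ in range(3)])

print(run_three_times(make_transient_adder(faulty_run=2), 2, 2))  # 4: the bad run is outvoted
print(run_three_times(deterministic_faulty_add, 2, 2))            # 5: all three runs agree on the wrong answer
```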
< laanwj>
silent bitrot of files is rare, but it is detected in the leveldb databases (through a CRC check), and blocks are self-checksumming
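Blocks are self-checksumming because a block is identified by the double SHA-256 of its 80-byte header, and the header commits to the transactions through the merkle root, so a corrupted block no longer matches the hash it is supposed to have. A minimal Python sketch recomputing the well-known genesis block hash; the helper names are hypothetical, not Bitcoin Core's.

```python
import hashlib
import struct

def serialize_header(version, prev_hash_hex, merkle_root_hex, time, bits, nonce):
    """Serialize an 80-byte block header.

    Hashes are displayed big-endian but serialized little-endian, hence the [::-1].
    """
    return struct.pack(
        "<i32s32sIII",
        version,
        bytes.fromhex(prev_hash_hex)[::-1],
        bytes.fromhex(merkle_root_hex)[::-1],
        time,
        bits,
        nonce,
    )

def block_hash(header):
    """Double SHA-256 of the header, shown in the usual reversed hex form."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()

genesis = serialize_header(
    1,
    "00" * 32,
    "4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    1231006505,
    0x1d00ffff,
    2083236893,
)
print(block_hash(genesis))
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f

# Flipping a single bit anywhere in the header yields a different hash.
corrupted = bytes([genesis[0] ^ 0x01]) + genesis[1:]
print(block_hash(corrupted) != block_hash(genesis))  # True
```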
< laanwj>
in any case if your hardware is even the least bit broken: stop running bitcoin on it immediately, you're risking your funds
< muhblockchain>
it's a node that has blocks/ copied over, and is offline. Its blk02009 (AFAIK years old) was modified on disk (without updating its mtime); but what also puzzles me is why around 20 other files in blocks/index/ were being changed. This node started serving an invalid version of block 293215 (the other node noticed it), which I presume was in blk02009, because after replacing that file from a pendrive it started serving the right version of this block
< muhblockchain>
so I wonder whether these kinds of damage should be detected by leveldb or something, and whether I should investigate more; and I wonder why index/ files are being updated by a node that is offline and is not getting any new blocks or changes to the utxo set
< laanwj>
damage is detected by leveldb but only when the record is being accessed, it doesn't perform a full check of the database at start (that would take a long time)
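A toy Python sketch of "detected only when the record is accessed", assuming per-record checksums; zlib.crc32 stands in for the CRC32C that LevelDB actually uses, and the class is hypothetical. Opening the store does no scan, so a flipped bit goes unnoticed until the damaged record is read.

```python
import zlib

class ChecksummedStore:
    """Hypothetical key-value store that checksums each record and verifies lazily on read."""

    def __init__(self):
        self._records = {}  # key -> (crc, value); no full scan is ever done here

    def put(self, key, value):
        self._records[key] = (zlib.crc32(value), value)

    def get(self, key):
        crc, value = self._records[key]
        # Verification happens only here, when the record is actually accessed.
        if zlib.crc32(value) != crc:
            raise OSError(f"checksum mismatch reading {key!r}")
        return value

    def corrupt_on_disk(self, key):
        """Simulate silent bit rot: flip one bit without updating the stored CRC."""
        crc, value = self._records[key]
        self._records[key] = (crc, bytes([value[0] ^ 0x01]) + value[1:])

store = ChecksummedStore()
store.put("a", b"coin-a")
store.put("b", b"coin-b")
store.corrupt_on_disk("b")      # nothing notices yet
print(store.get("a"))           # b'coin-a': undamaged records read fine
try:
    store.get("b")
except OSError as err:
    print(err)                  # checksum mismatch reading 'b'
```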
< muhblockchain>
laanwj: so it is not checked for purposes of sending blocks to peers?
< laanwj>
block storage is completely separate from the utxo set, but yes, a block is not checked before sending it to peers (for performance reasons)
< muhblockchain>
this PC I considered stable, and memcheck was running for days with no issue. Is it worthwhile to somehow look into how node2 there got a damaged UTXO set? It is a strange setup: node2 has connect= to node1, and node1 is 99% downloaded and has whitebind=download@localhost
< laanwj>
a node that is offline (as in: disconnected from the network) can still write to index files, for example when flushing the cache, or for leveldb internal administration; always shut it down before manually manipulating files
< laanwj>
bitcoind (and its tests) seem to be a much better burn-in check for hardware than many programs designed for the purpose :)
< muhblockchain>
oooor a rare corner case where receiving an invalid block from your only peer somehow damages the utxo set \o/ *rubshands*
< laanwj>
most programs only stress-test one component at a time, not CPU, disk, memory and network together
< jonatack>
running tests and builds, while bitcoind nodes with debug_addrman on are running, has definitely had my 2 physical cores of cpu complaining and fans whirring, maybe i should ease up :))
< laanwj>
hehe, wouldn't be the first time running test_runner.py with full parallelism in a loop uncovers a hardware or kernel issue
< muhblockchain>
I hope when open hw cpus are popular, our reasonable blocksize will make it feasible to run core on such boxes
< jnewbery>
jonatack: DEBUG_ADDRMAN is very expensive. It's iterating over every entry in addrman, constructing sets, looking up in those sets, etc every time a call is made into addrman.
< laanwj>
muhblockchain: that's already feasible, i'm running a node on RISC-V (sifive unmatched), and plan to do guix builds on that, some ppl are running a node on POWER (Raptor Talos II)
< muhblockchain>
very cool. and cheers to some people -jr
< jonatack>
jnewbery: yes indeed, i see and hear it. i've been running it for a couple of months but am turning it off now; if/when your PR is in, using it will be easier, because re-building with/without it defined takes a pretty long time for me
< jonatack>
good, at least, that grep "ADDRMAN CONSISTENCY CHECK FAILED" ~/.bitcoin/debug.log still turns up no results
< jnewbery>
yes, and #20233 would allow you to run consistency checks only every n addrman operations, so you could set it to something like 1000, still have consistency checks running, and not have your cpus burn out.
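A toy Python sketch of the trade-off jnewbery describes, with a hypothetical container standing in for addrman: a full O(n) consistency check on every operation is expensive, while running it only every n operations (the idea behind #20233 as described above) keeps some checking at a fraction of the cost. The class and its check_ratio parameter are illustrative only.

```python
class CheckedIndex:
    """Hypothetical container with a list plus a set index that must agree.

    check_ratio == 0 disables the consistency check, 1 runs it on every
    operation (like an always-on debug build), and N runs it every N operations.
    """

    def __init__(self, check_ratio):
        self.check_ratio = check_ratio
        self.items = []
        self.index = set()
        self.ops = 0
        self.checks_run = 0

    def add(self, item):
        if item not in self.index:
            self.items.append(item)
            self.index.add(item)
        self._maybe_check()

    def _maybe_check(self):
        self.ops += 1
        if self.check_ratio and self.ops % self.check_ratio == 0:
            self._check()

    def _check(self):
        # O(n) full scan: rebuild a set from the list and compare it with the index.
        self.checks_run += 1
        if len(self.items) != len(self.index) or set(self.items) != self.index:
            raise RuntimeError("consistency check failed")

for ratio in (0, 1, 1000):
    c = CheckedIndex(ratio)
    for i in range(2000):
        c.add(i)
    print(ratio, c.checks_run)
# 0 0
# 1 2000
# 1000 2
```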
< bitcoin-git>
[bitcoin] MarcoFalke merged pull request #22577: Close minor startup race between main and scheduler threads (master...2021-07-28-startup-race) https://github.com/bitcoin/bitcoin/pull/22577
< sipa>
muhblockchain: fwiw, the utxo database and all blockchain files do have checksums; if they were invalid, you'd get a read error rather than a failed block validation; in your case it appears that incorrect data was written, with its correspondingly correct checksum
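A minimal Python illustration of sipa's distinction, again with zlib.crc32 as a stand-in checksum and hypothetical helper names: corruption after the write (bit rot) fails the checksum and shows up as a read error, while data that was already wrong when written gets a matching checksum and reads back cleanly, so only higher-level validation can catch it.

```python
import zlib

def write_record(value):
    """The checksum is computed over whatever bytes are being written."""
    return zlib.crc32(value), value

def read_record(record):
    crc, value = record
    if zlib.crc32(value) != crc:
        raise OSError("read error: checksum mismatch")
    return value

good = b"utxo: 50 BTC"

# Bit rot after the write: the stored bytes no longer match the stored checksum.
crc, _ = write_record(good)
rotted = (crc, b"utxo: 51 BTC")

# Corruption before the write (e.g. a bad computation in RAM or on the CPU):
# the wrong bytes are checksummed, so the record reads back "successfully".
miscomputed = write_record(b"utxo: 51 BTC")

print(read_record(miscomputed))  # b'utxo: 51 BTC' (no error raised)
try:
    read_record(rotted)
except OSError as err:
    print(err)                   # read error: checksum mismatch
```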