< achow101>
cfields: I saw that the detached sigs were pushed, but they don't seem to be working rn
< cfields>
achow101: not quite yet, still working on it
< achow101>
oh, ok
< cfields>
this is the first release with the new key, and the first that's not just me signing. So it's a bit kludgy
< achow101>
we haven't done the mpc rsa thing yet, have we?
< cfields>
no, didn't make it in time
< cfields>
ok, pushed the tag. should work now
< cfields>
off to find food before everything closes, bbl
< cfields>
gitian builders: v0.16.0rc1 detached sigs are pushed. Please ping me if there are any issues
< dx25>
with 0.16.0rc1 on exit i'm seeing "IO Error: ...chainstate/244997.ldb: Bad file descriptor", system error while flushing: Database I/O error.
< dx25>
Does not seem to be happening with v0.15.2 or older versions
< gmaxwell>
dx25: can you tell us some about your system? What OS, etc.
< gmaxwell>
dx25: we believe that's a new issue, which will likely block the release until we fix it.
< gmaxwell>
dx25: we've had one developer hit it but don't have a reliable reproduction yet, and maybe there is something in common between systems that have hit it which would help track it down
< gmaxwell>
e.g. what OS, kernel version, etc.
< bitcoin-git>
[bitcoin] murrayn opened pull request #12322: Docs: Remove step making cloned repository world-writable for Windows build. (master...doc_change) https://github.com/bitcoin/bitcoin/pull/12322
< dafuq>
a PR change to the --help CLI options would fall under what category?
< wumpus>
docs
< dafuq>
ok, seems most suitable but does involve code changes
< wumpus>
yes, that doesn't matter, as long as it is message changes
< provoostenator>
I'm getting "Fatal Internal Error" during IBD on testnet3 with 0.16.0rc1, twice. Will investigate.
< bitcoin-git>
[bitcoin] kekimusmaximus opened pull request #12325: Use dynamic_cast for downcasting instead of static_cast. (master...use_dynamic_cast_to_downcast) https://github.com/bitcoin/bitcoin/pull/12325
< provoostenator>
Might be a disk permission thing. "Pre-allocating up to position 0x3000000 in blk00002.dat" "ERROR: WriteBlockToDisk: ftell failed" "*** Failed to write block"
< wumpus>
disk full?
< provoostenator>
Also seeing a bunch of "socket recv error Bad file descriptor (9)" messages, not sure what that's about.
< wumpus>
oof bad file descriptor
< provoostenator>
No, plenty of space. It's an external SSD though, so can't rule out a hardware problem.
< dx25>
mine's also an external drive, not ssd tho
< wumpus>
no, it's possibly a regression in 0.16
< provoostenator>
Getting this crash every other minute now on testnet. Will try local drive just to rule out hardware issue.
< dx25>
i'm on qubes, fedora25 vm, 4.9.56 kernel
< wumpus>
I really wonder why the bad file descriptor thing happens, normally that happens if either a fd is used that was not returned by open, or a file descriptor that was already closed
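(A minimal sketch, assuming a POSIX system, of the two EBADF causes wumpus lists above; illustrative code, not anything from Bitcoin Core.)

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <unistd.h>

int main()
{
    // Case 1: a descriptor that open() never returned.
    char buf[16];
    if (read(12345, buf, sizeof(buf)) == -1 && errno == EBADF) {
        std::printf("read on bogus fd: %s\n", std::strerror(errno));
    }

    // Case 2: a descriptor that was valid but has already been closed.
    int fd = dup(STDOUT_FILENO);
    close(fd);
    if (write(fd, "x", 1) == -1 && errno == EBADF) {
        std::printf("write on closed fd: %s\n", std::strerror(errno));
    }
    return 0;
}
```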
< provoostenator>
Same "bad file descriptor" on a fresh testnet3 IBD (MacOS) on the built in harddisk which has plenty of space. Waiting for it to crash.
< wumpus>
I seriously doubt this is a hardware issue
< wumpus>
are you doing anything special with the node? regular RPC requests, for example?
< wumpus>
or any fd related settings in bitcoin.conf?
< provoostenator>
I was running QT production and testnet at the same time for different users on my system, both in separate directories. I also suspended the computer (though it should keep syncing in that case). So trying to reproduce with fewer variables now.
< provoostenator>
I used a symlink to point to the SSD drive.
< dx25>
i have some weird walletnotify and alertnotify thing calling curl for some reason i can't remember. haven't tried turning that off yet.
< provoostenator>
Ah there we go again: crash.
< provoostenator>
(no symlink involved this time, nor an external drive). Crash happened at or after block 64709 (testnet)
< wumpus>
provoostenator: so it happens even without listening
< dx25>
i was doing no rpc stuff
< wumpus>
provoostenator: that rules out some p2p races I guess, but what can it be then...
< wumpus>
provoostenator: you're able to reliably reproduce this? could try git bisecting, if it was ok with 0.15.1
< provoostenator>
Meanwhile I had production QT running all night without a problem, so I suspect it's sync related (my testnet node was doing an IBD the first time it crashed)
< wumpus>
(or find some later commit where the issue doesn't exist)
< provoostenator>
I just shut down the other mainnet QT instance, will try once more. Then I'll compare versions after that.
< dx25>
also sync related here, but on mainnet
< provoostenator>
Pretty reliable so far
< provoostenator>
It was built with: ./configure --disable-tests --disable-bench --with-miniupnpc=no
< provoostenator>
Hooray, another crash. Ok, should be easy to bisect.
< provoostenator>
Height 8089 (testnet)
< wumpus>
yes, would make a backup of the data directory to make sure you start with the same state every time
< wumpus>
at height 8089 at least that isn't too much data
< provoostenator>
I get this crash with a fresh testnet3 directory.
< provoostenator>
I'll keep a copy for forensics
< wumpus>
I'll hold up on uploading executables for rc1 for now
< provoostenator>
Mmm, I just found a zombie lightningd instance in the background. Maybe it was making RPC requests, not sure. I'm going to kill it just in case.
< provoostenator>
(no difference)
< provoostenator>
Height 401 :-)
< wumpus>
I wouldn't expect it to be so predictable in that case
< provoostenator>
Other than ccache and skipping tests and bench, any hints on how to make it compile faster?
< wumpus>
do only 'make -j<x> src/bitcoind'
< wumpus>
you don't really need to rebuild cli and such
< provoostenator>
Right, I should test if this is QT related first, and otherwise just build bitcoind
< wumpus>
yes, you could also build only -qt with a similar command, but it takes much longer
< provoostenator>
I know, I was doing that (make src/qt/bitcoin-qt)
< provoostenator>
Is there a reason the binaries end up in the /src path rather than in e.g. /dist?
< wumpus>
that's common for automake build systems, if you want to build somewhere else you can do an out-of-tree build, if you want to put the binaries somewhere set a prefix and do `make install`
< morcos>
just to clarify, we think the crash is caused by what? a Bad file descriptor issue occurring on block write?
< morcos>
We've also seen it on ldb which causes a crash
< morcos>
and on socket send/recv which doesn't
< provoostenator>
Mmm, bitcoind doesn't show "Bad file descriptor" messages (with -debug=1). It does exit with " A fatal internal error occurred, see debug.log for details"
< provoostenator>
I wonder what "Interrupting HTTP server" is about.
< wumpus>
there is no error there - you didn't simply send a stop command?
< provoostenator>
I didn't. And a stop command wouldn't explain bitcoind exiting with "Error: Error: A fatal internal error occurred, see debug.log for details"
< wumpus>
no, that's true, normally that's accompanied by an error being logged, this looks like a successful shutdown
< wumpus>
you did paste the right debug.log? :)
< provoostenator>
Not sure actually, double checking
< provoostenator>
Default log locations changed between 0.15 and 0.16. Will try again.
< wumpus>
default log location changed?!
< wumpus>
should not be the case, it's still <datadir>/debug.log, sounds more likely you've set a different datadir for -qt
< provoostenator>
Or accidentally used mainnet for bitcoind.
< provoostenator>
Ok, now I'm seeing the "Bad file descriptor" messages again. Will wait for crash and upload correct debug log. Then continue with bisect.
< BlueMatt>
provoostenator: if you're trying to bisect, I'd recommend focusing on any changes to net
< provoostenator>
It then happily processes a few more blocks and shuts down
< wumpus>
so the buckshot hit CAutoFile's descriptor this time
< wumpus>
this is so weird, looks like some evil background thread is randomly closing fds
< provoostenator>
Unfortunately that took almost 20 minutes to crash, so this bisect will take a while, but probably worth it.
< cfields>
maybe some callback isn't taking cs_main while touching block files?
< cfields>
provoostenator: it'd be great if you could catch it in gdb
< cfields>
that should allow a 'freeze' long enough to ld your fd's
< cfields>
*ls
< provoostenator>
gdb?
< cfields>
provoostenator: debugger
< wumpus>
cfields: right, it could well be something else than net calls, I remember there is a PR that changes locking around block files
< provoostenator>
Since morcos is able to reproduce, maybe it's easier if he looks at the debugger, while I just try to pinpoint which commit caused this.
< cfields>
sure
< provoostenator>
That should also provide more assurance that the fix actually fixes it.
< morcos>
cfields: now i hit the first assert
< cfields>
morcos: if it's some fd leak, it'd make sense that you'd get EBADFD randomly, all over the place
< cfields>
morcos: any chance you can catch it in gdb?
< morcos>
yeah, ok so if i run in gdb, then you want what, the list of what's in /proc/pid, or what
< cfields>
yea
< cfields>
try to break on assert (might be _assert or __assert) in gdb, so it hangs before exit is called
< wumpus>
no problems with testnet sync here w/ cfields's assertions
< morcos>
cfields: no i saved it
< wumpus>
(at least at block 319000)
< cfields>
morcos: could you do a 'thread apply all bt' for that one?
< morcos>
how do i easily output that to a file
< cfields>
the send is more interesting because it may be an optimistic send. in that case, it'd be coming from the message handler thread rather than the sockethandler, so it might show a little more
< wumpus>
"Hello. CloseSocket may be called with hSocket uninitialised, at net.cpp:448 (not confirmed to be the cause of this bug, but it seems likely)"
< cfields>
morcos: ok, so you're not running through fd's. Closing a random one makes way more sense
< morcos>
oh that was stupid, i changed the mainnet dns seeds but was running on testnet
< cfields>
heh
< cfields>
morcos: try setting it to 0.0.0.0 instead. Not sure if the resolver will actually hand that out, though
< cfields>
actually, if that's the case, I should be able to repro too instead of asking you to :p
< cfields>
off to test
< morcos>
cfields: ok yeah it did load dns seeds before this crash though
< wumpus>
well it also depends on what is in the uninitialized memory
< morcos>
btw, before i forget, it seemed that running in testnet was reading peers.dat from .bitcoin and not testnet3
< wumpus>
if there happen to be zeroes there, or some value that is larger than max fd, it will go unnoticed, it still doesn't have to trigger every time
< morcos>
i deleted both of them to force dnsseeds
< cfields>
wumpus: true, but an assert on a successful close() should point it out quickly i should think
< cfields>
wumpus: wouldn't anything other than -1 in memory cause a problem?
< wumpus>
cfields: it might, though closing fd 0 (stdin) is harmless in our case
< cfields>
hmm
< wumpus>
at least the first time. Once you close stdin, the next time you use open() you might get that fd, and if it then randomly gets closed again, it will still interfere. So, yeah. The only harmless values would be very large ones that can't be a fd, ever.
< cfields>
makes sense
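(To make the suspected failure mode concrete: a sketch with hypothetical names, not the actual net.cpp code, of how closing an uninitialized socket can hit an unrelated, live descriptor.)

```cpp
#include <unistd.h>

using SOCKET = int; // on POSIX a SOCKET is just an int fd

static bool ResolveAndConnect(SOCKET&) { return false; } // stub: seed is down

void ConnectBuggy()
{
    SOCKET hSocket; // uninitialized: may hold 0 (stdin) or any live fd value
    if (ResolveAndConnect(hSocket)) {
        return; // connected path would hand the socket off here
    }
    // The cleanup path runs with garbage in hSocket. If that value matches
    // a descriptor owned by LevelDB, a block file, or a peer socket, the
    // descriptor is silently closed and "Bad file descriptor" errors show
    // up later in completely unrelated code.
    close(hSocket);
}
```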
< cfields>
morcos: fyi, there's -forcednsseed
< morcos>
cfields: I added an assert in netbase.cpp CloseSocket that ret != error and I hit it
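(A hypothetical reconstruction of the kind of assert morcos describes; the real CloseSocket lives in netbase.cpp, and names and details here are assumptions.)

```cpp
#include <cassert>
#include <unistd.h>

using SOCKET = int;
static const SOCKET INVALID_SOCKET = -1;

bool CloseSocket(SOCKET& hSocket)
{
    if (hSocket == INVALID_SOCKET) return false;
    int ret = close(hSocket);
    // Diagnostic: stop at the first failing close(), while the state that
    // caused it is still inspectable in a debugger.
    assert(ret != -1);
    hSocket = INVALID_SOCKET;
    return ret == 0;
}
```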
< cfields>
morcos: great! can you print hSocket there ?
< morcos>
0?
< morcos>
looks like addrConnect is 0
< morcos>
looks like this is all a test from provoostenator as it's coming from his seed
< cfields>
morcos: you didn't set yours in /etc/hosts ?
< provoostenator>
Interesting, is my testnet seed doing something funny?
< morcos>
actually i'm not sure about that, since i'm not familiar with this code
< morcos>
0x00005555555ed33a in CConnman::ConnectNode (this=this@entry=0x555556bc8530, addrConnect=..., pszDest=0x0, pszDest@entry=0x7fffac08c150 "seed.testnet.bitcoin.sprovoost.nl", fCountFailure=fCountFailure@entry=false) at net.cpp:448
< provoostenator>
Mmm, it might actually be down. Let me check
< morcos>
cfields i did make changes in /etc/hosts, but isn't seed.bitcoin.sipa.be a mainnet seed?
< cfields>
morcos: yea, but you said you had all of em. didn't know what you set in there
< sdaftuar>
i don't know if this is related, but i have a lot of lines like this in my debug.log: "trying connection seed.testnet.bitcoin.sprovoost.nl lastseen=0.0hrs"
< cfields>
provoostenator: it's down for me...
< provoostenator>
For me as well.
< cfields>
but that should return 0 addresses and not try a connection. It shouldn't end up trying to connect to 0...
< provoostenator>
I still need to setup monitoring.
< cfields>
provoostenator: leave it down while we're testing :)
< sdaftuar>
i just added an else {} clause in net.cpp (before the suspicious line 448), and it triggered for me on -testnet when running with -forcednsseed
< morcos>
huh, the fix didn't fix it?
< sdaftuar>
wasn't running with the fix -- just verifying that hSocket could indeed be uninitialized
< cfields>
or are you saying that you've verified that you can hit the else branch?
< sdaftuar>
^ that
< cfields>
ok, great
< cfields>
morcos: have you been on testnet every time you've hit this?
< cfields>
i realize that a mainnet seed could've been returning 0 as well, but that doesn't seem like it'd affect a mainnet node that's been up for more than a few minutes
< cfields>
but i guess that does jibe with your complaints that we're querying the seeds too often
< cfields>
heh, in fact, you would've been noticing that because there'd be an entry at the end of every log file
< morcos>
no all the prior times were mainnet
< provoostenator>
Doesn't it pick a seed at random?
< morcos>
but yes i was querying dns seeds occasionally
< morcos>
i don't know what you mean about seeing an entry at the end of every log file
< provoostenator>
Or does it ping them all? Because I didn't get crashes just 1 in ~5 times, I got them every time, with a fresh datadir.
< morcos>
the problem is if there is a dnsseed that is returning garbage somehow, it'll periodically retry it right?
< morcos>
so if every time it does that, it results in me closing fd 0
< morcos>
which at that point has been reused for something else
< morcos>
it'll cause an error
< morcos>
but the socket errors aren't fatal, so it's only the leveldb or blockwriting errors that cause a crash and show up at the end of the log
< morcos>
but maybe thats what you're saying
< cfields>
morcos: it only hits the seeds if we don't have enough peers
< cfields>
it shouldn't keep retrying anything just because it failed
< cfields>
wait, it might now!
< provoostenator>
Should there be a functional test for dealing with broken DNS seeds?
< cfields>
provoostenator: does your seed support filtering?
< cfields>
it would make sense if a connection was tried because it was a oneshot()...
< morcos>
cfields: it looked to me that it adds it to oneshot
< morcos>
isn't that what it does with dnsseeds
< morcos>
assuming you need them
< cfields>
that's a new change, sec
< cfields>
so we're not differentiating between a failed resolve, and a resolve with 0 results
< morcos>
cfields: yeah can you look at line 390 in net.cpp
< morcos>
i don't know what that does, but what happens if it fails
< cfields>
#11512
< gribble>
https://github.com/bitcoin/bitcoin/issues/11512 | Use GetDesireableServiceFlags in seeds, dnsseeds, fixing static seed adding by TheBlueMatt · Pull Request #11512 · bitcoin/bitcoin · GitHub
< provoostenator>
cfields: I use sipa's tool with the default settings, see my 0bin paste above for the full command
< cfields>
morcos: ok, red herring. I see what's happening.
< cfields>
it fails on the filtered resolve, so it does a oneshot for the unfiltered one. working as intended
< cfields>
but they're both down, so you get a random socket
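(A sketch of the shape of the remedy; illustrative only, the actual fix is the PR tested later in this log, #12326: initialize the handle, and never close a descriptor this code did not open.)

```cpp
#include <unistd.h>

using SOCKET = int;
static const SOCKET INVALID_SOCKET = -1;

static bool ResolveAndConnect(SOCKET&) { return false; } // stub: seed is down

void ConnectFixed()
{
    SOCKET hSocket = INVALID_SOCKET; // always start from a known value
    if (ResolveAndConnect(hSocket)) {
        return; // connected path hands the socket off
    }
    // Only close what was actually opened; a garbage value can never
    // reach close() now.
    if (hSocket != INVALID_SOCKET) {
        close(hSocket);
    }
}
```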
< cfields>
wumpus: i'm more and more confident that this is the issue
< wumpus>
cfields: great!
< cfields>
and very sorry that i introduced it :(
< wumpus>
means we can do rc2 soon
< wumpus>
heh no worries
< wumpus>
happy if it's this and not some problem deep in leveldb
< morcos>
provoostenator is right though we should have a test for bad dns seed.
< cfields>
we could add that to the travis cron job
< wumpus>
I agree, it would be somewhat tricky to spin up a fake dns server in the test framework, though maybe python has something easy for that, I don't know
< cfields>
oh, i thought the concern was not knowing that dns seeds were down
< sdaftuar>
i think the concern is making sure we handle it when they are? or really both i guess...
< provoostenator>
I'll keep the bisect going just in case. Leaving it running for 25 minutes until I "git bisect good" a commit, so it will take a few hours.
< cfields>
2018-02-01 16:23:47 Closing bad socket: 1266668816
< cfields>
2018-02-01 16:24:30 Closing bad socket: 100640016
< cfields>
yep, that's it
< provoostenator>
Mmm, if it's DNS related bisect might end up finding the PR where my seed was merged. Anyway, we'll see.
< provoostenator>
I added the log to the above issue.
< ProfMac>
"should" the DNS discover use IPv4 when onlynet=IPv6?
< wumpus>
AFAIK there's no way in the libc API to do DNS resolving only over either IPv4 or IPv6
< wumpus>
and as many modern linux distros run a DNS cache on localhost, on the IPv4 loopback, that'd effectively mean that DNS seeding cannot be used when onlynet=ipv6
< wumpus>
(hm, how many ISPs give out IPv6 DNS servers in the first place?)
< wumpus>
probably only those that only give clients an IPv6 address
< wumpus>
I don't know, it's an interesting thought experiment, but I think in practice it's good that use of dns seeding or not is a separate option
< cfields>
wumpus: you can kinda fudge it with AI_ADDRCONFIG, as we do
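(Roughly what cfields is referring to; a standalone sketch of the libc interface, not Bitcoin Core's own lookup wrappers. AI_ADDRCONFIG suppresses answers for address families the host has no address in, and ai_family can pin results to IPv6; the transport of the DNS query itself is still left to libc, which is wumpus's point.)

```cpp
#include <arpa/inet.h>
#include <cstdio>
#include <cstring>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main()
{
    struct addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;     // only return IPv6 results
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG; // skip families we have no address for

    struct addrinfo* res = nullptr;
    int rc = getaddrinfo("seed.bitcoin.sipa.be", nullptr, &hints, &res);
    if (rc != 0) {
        std::fprintf(stderr, "resolve failed: %s\n", gai_strerror(rc));
        return 1;
    }
    for (struct addrinfo* p = res; p != nullptr; p = p->ai_next) {
        char out[INET6_ADDRSTRLEN];
        const auto* sin6 = reinterpret_cast<const sockaddr_in6*>(p->ai_addr);
        inet_ntop(AF_INET6, &sin6->sin6_addr, out, sizeof(out));
        std::printf("%s\n", out);
    }
    freeaddrinfo(res);
    return 0;
}
```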
< wumpus>
so with regard to 0.16.0 status, there have already been some issues that came up with rc1, so I think it makes sense to skip uploading binaries for that and go to rc2 soon
< BlueMatt>
ack
< achow101>
what issues?
< cfields>
agreed
< cfields>
achow101: see backlog for the last ~3hrs
< wumpus>
there's another issue with onetry connections being re-tried forever, resulting in potential DoS on DNS seeds in the case they temporarily fail
< jonasschnelli>
by the way, is it a policy that a DNS seed also runs a node (same ip) for the oneshot?
< wumpus>
jonasschnelli: no, that's not necessary
< wumpus>
jonasschnelli: it looks up the DNS seed which will return a (the first?) node
< jonasschnelli>
wumpus: okay. My seeders will refuse connections to 8333
< jonasschnelli>
wumpus: okay.
< wumpus>
jonasschnelli: that is not the IP of the DNS server. I was confused about that too at some point in the past.
< jonasschnelli>
I think it also has something to do with tor mode
< provoostenator>
Using A records is what makes it confusing
< cfields>
yea, it's just some random peer
< BlueMatt>
jonasschnelli: yes, you'd have to include your own ip in the dnsseed to (maybe) get the oneshot to be you, but that would be bad, and a violation of dnsseed policy (IIRC)
< jonasschnelli>
BlueMatt: sure.
< wumpus>
yes, in tor mode no resolving is used to get multiple results (that'd require some SOCKS5 extension being used), so it uses a one shot to a random node as replacement
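(The branch wumpus is describing, sketched with stubbed helpers; the names are assumptions, not the real ThreadDNSAddressSeed code.)

```cpp
#include <string>
#include <vector>

// Stubs standing in for the real node plumbing:
static bool HaveNameProxy() { return true; }            // Tor-only setup
static void AddOneShot(const std::string&) {}           // queue oneshot peer
static std::vector<std::string> Resolve(const std::string&) { return {}; }
static void AddToAddrman(const std::string&) {}

static void SeedFromDNS(const std::string& seed)
{
    if (HaveNameProxy()) {
        // No local resolving behind the proxy: dial the seed's hostname
        // through Tor as a "oneshot" peer, fetch addresses via getaddr,
        // then disconnect.
        AddOneShot(seed);
        return;
    }
    // Normal path: resolve the seed name and feed the results to addrman.
    for (const std::string& ip : Resolve(seed)) {
        AddToAddrman(ip);
    }
}
```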
< jonasschnelli>
But in tor only mode, don't we do a oneshot to the seeds?
< wumpus>
no
< provoostenator>
BlueMatt: and an effective way to DDOS yourself
< jonasschnelli>
wumpus: okay. Thanks... never looked that up properly
< wumpus>
it never looks up the IP of the DNS server at all, that's all happening below the libc abstraction
< wumpus>
any topics?
< jonasschnelli>
Everyone already back at work it seems
< wumpus>
ok.. seems not... well please review anything under the 0.16.0 milestone, and anything added in the next day too
< sdaftuar>
if we don't have more pressing things to discuss, i'd like to solicit feedback on #11739 (backdating p2sh/segwit v0 script rules to genesis)
< sdaftuar>
mostly i want to know if there are any concept NACKs
< wumpus>
#topic enforce SCRIPT_VERIFY_P2SH and SCRIPT_VERIFY_WITNESS from genesis (sdaftuar)
< sdaftuar>
and i guess the other question is confirming whether/how such a change should be documented
< cfields>
+0
< cfields>
sdaftuar: going forward, you mean? or this one?
< sdaftuar>
both?
< sdaftuar>
i drafted a BIP for this one
< cfields>
well going forward, i think we could specify this intention as part of a soft-fork bip?
< wumpus>
no NACK from me, if the code can be simplified that way then it's great
< BlueMatt>
+1
< wumpus>
it doesn't change the rules enforced for current blocks, does it?
< wumpus>
how is it a softfork?
< sdaftuar>
no effect on current blocks
< wumpus>
if it's a softfork I am misunderstanding
< BlueMatt>
its a soft spoon - only prevents a 6-month reorg from removing segwit :p
< luke-jr>
it restricts the rules on older blocks
< sdaftuar>
it's a softfork under a technical definition
< BlueMatt>
not a fork
< sdaftuar>
of making valid things now invalid
< cfields>
sorry, my fault.
< morcos>
+1 as well.. but i do have concerns about how we could do this on a going forward basis
< wumpus>
oh, right
< provoostenator>
Or just Buried Deployment?
< wumpus>
but it makes no difference because the old blocks all qualify
< luke-jr>
the question comes down to, are we limiting soft/hardfork definitions to only ones that affect future blocks?
< morcos>
it seems like if this is always our intention, then as soon as we announce a future soft fork
< morcos>
some jack ass is going to mine violations just to make us annoyed
< BlueMatt>
luke-jr: yes, we should start calling buried deployments spoons
< luke-jr>
or do we consider this an implementation detail?
< cfields>
morcos: true
< sdaftuar>
morcos: i think it's not really worth worrying about that
< wumpus>
I see this as an implementation detail to validation
< wumpus>
there's no need to cause a lot of ruckus about it
< wumpus>
if you call it a softfork you'll have the miners up in arms, and whatnot
< cfields>
luke-jr: well if there's an absolutely massive reorg, it's not just an implementation detail, no?
< morcos>
well it addresses cfields' point about having the original BIP specify the intention. i think we should always only consider backdating after the fact
< sdaftuar>
morcos: oh i see your point
< wumpus>
because then it also needs to be signaled some way, I guess
< luke-jr>
cfields: if there's an absolutely massive reorg, it's unclear what the outcome will be period
< luke-jr>
cfields: for example, Knots has a checkpoint on a Segwit block
< cfields>
well isn't the intention here to clarify that outcome?
< BlueMatt>
if there's a 6 month reorg there will be debate as to whether to follow it...whether we follow it or not ends up being a community question anyway :p
< wumpus>
if there's a reorg that big that all segwit blocks are reorged out, well...
< sdaftuar>
i think in this case, it's clear that segwit transactors do not intend for their funds to be spendable on a segwit-inactive chain
< wumpus>
yes, I'm sure there will be discussion enough in that case
< sdaftuar>
so backdating the segwit rules matches consensus, in that sense
< BlueMatt>
so, definitely not a fork
< wumpus>
right
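(For context, the simplification #11739 proposes looks roughly like this; an illustrative sketch, not the exact patch. The flag values mirror script/interpreter.h, and the activation lookups are stubbed.)

```cpp
#include <cstdint>

static const uint32_t SCRIPT_VERIFY_NONE = 0;
static const uint32_t SCRIPT_VERIFY_P2SH = (1U << 0);
static const uint32_t SCRIPT_VERIFY_WITNESS = (1U << 11);

// Stubs for the per-block activation lookups the old code needs:
static bool IsWitnessEnabled(int /*height*/) { return true; }
static int64_t Bip16SwitchTime() { return 1333238400; } // April 1, 2012

// Before: activation state consulted for every connected block.
static uint32_t GetBlockScriptFlagsOld(int height, int64_t block_time)
{
    uint32_t flags = SCRIPT_VERIFY_NONE;
    if (block_time >= Bip16SwitchTime()) flags |= SCRIPT_VERIFY_P2SH;
    if (IsWitnessEnabled(height)) flags |= SCRIPT_VERIFY_WITNESS;
    return flags;
}

// After: both rule sets enforced unconditionally, back to genesis. All
// historical blocks already satisfy them, so validation of the existing
// chain is unchanged; only a massive reorg could observe the difference.
static uint32_t GetBlockScriptFlagsNew(int /*height*/, int64_t /*block_time*/)
{
    return SCRIPT_VERIFY_P2SH | SCRIPT_VERIFY_WITNESS;
}
```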
< luke-jr>
in which case, I don't see that we need a BIP for it. I suggest we make a new repo for Core-specific documentation like this.
< cfields>
morcos: i half agree about after-the-fact. Not mentioning backdating with the intention of doing so anyway is a bit... iffy
< luke-jr>
BIPs are for cross-software standards, which doesn't really include implementation details
< BlueMatt>
seems fine to me, I also appreciate gmaxwell's partially-joking suggestion of calling it a spoon
< wumpus>
hehe
< luke-jr>
(actually, we have a repo for gitian docs already, right?)
< sdaftuar>
i personally think that it's helpful to put it in a BIP, because it affects the implementation of existing BIPs
< BlueMatt>
and then doing a bip and just saying "Soft Spoon"
< sdaftuar>
but i don't feel strongly
< BlueMatt>
i mean we use BIPs for things that are core-specific anyway, like getblocktemplate
< wumpus>
but in the end it doesn't matter whether people implement this BIP
< wumpus>
because it's an implementation detail
< luke-jr>
BlueMatt: GBT isn't Core-specific
< sdaftuar>
wumpus: agreed
< sdaftuar>
it's an informational BIP
< cfields>
luke-jr: there are several post-mortem BIPs
< wumpus>
BlueMatt: well that's an interface! interfaces need documentation
< kanzure>
hi.
< luke-jr>
maybe we can put it in an annex for the BIPs it affects or something? just seems like it will get old to have two BIPs for every fork
< wumpus>
if a softspoon drops at 300000 blocks deep and no one hears it, did it happen at all?
< luke-jr>
one for the deployment and implementation, and another for the reinterpretation of the deployment
< sdaftuar>
luke-jr: that seems reasonable to me as well, if the BIP authors agree?
< wumpus>
luke-jr: agree
< luke-jr>
none of the authors appear to be here now, but I doubt they'd object
< luke-jr>
at least for Segwit
< provoostenator>
And the winner of the git bisect game is....
< provoostenator>
Meanwhile I'm checking if #12326 actually makes the crash go away now, as well as whether removing my seed from v0.16.0rc1 makes it go away (very likely yes).
< jcorgan>
hey guys, just fyi, i've passed on the management/maintainer/architect roles in gnuradio to new hands (after 12 years), to focus on bitcoin-related work full-time. part of that will be getting back into core dev process. very much looking forward to the change of pace.
< gmaxwell>
jcorgan: awesome!
< cfields>
wumpus: and 12329 ?
< wumpus>
jcorgan: congratulations!
< gmaxwell>
(I mean, nooo sucks for gnuradio; and all my SDR projects)
< cfields>
jcorgan: very cool. 12 years is a long time
< wumpus>
cfields: huh I meant that one
< jcorgan>
i think that's over half of some people here's lifetime
< gmaxwell>
Would it be entirely crazy to make debug=1 not enable all debugging options, but a "many" option, that turns off absurdly chatty stuff (leveldb internals would be the one thing now).
< gmaxwell>
?
< provoostenator>
+1
< gmaxwell>
I don't think any of us have ever found the leveldb internals logging useful so far. And I know I always disable it and curse when I accidentally level it on.
< sdaftuar>
that seems to be everyone's experience afaik
< gmaxwell>
er leave it on.
< gmaxwell>
I suppose I've learned a bit about how much background stuff leveldb does due to it. :)
< wumpus>
I don't like the chatty net logging either
< gmaxwell>
At least I've found that stuff useful. I'd be okay with it off in debug=1 though I'd often turn it on.
< wumpus>
which is why I opened #12219, I think we need a DEBUG..ERROR axis as well
< provoostenator>
The flickering makes it hard to watch an IBD in real time.
< gmaxwell>
or at least chatty net doesn't bother me so much since upgrading nodes to 1TB ssds...
< wumpus>
apart from the category we also need a log level, that'd make logging things more sane without singling out specific categories
< wumpus>
or splitting up categories
< wumpus>
I think debug=1 should remain 'log everything possible', which doesn't rule out more selective logging options
< gmaxwell>
I'd buy a debug vs error sort of axis though I'm doubtful about a really granular level, as people will irritatingly set them pretty subjectively.
< gmaxwell>
or I suppose if we just had a debug=many or debug=most that would be fine too. I seem to screw up debug=1 and then excluding for some reason.
< wumpus>
or something like debug=1 debug=-leveldb
< gmaxwell>
also having a single setting is more useful when I'm asking users to set things.
< wumpus>
allowing categories to be disabled
< cfields>
gmaxwell: agreed. -debug and -debug=all don't have to be the same thing
< gmaxwell>
we have -debugexclude
< wumpus>
but yes it is subjective and depends on what you want to debug
< wumpus>
which is why making a single selection for -debug=1 seems weird to me
< wumpus>
some messages might be less interesting for what you're debugging, but that's why we have categories in the first place
< gmaxwell>
Though also for some of this user stuff, I think it would be very useful to have a circular buffer in ram that always gets a MUCH higher debug level than what goes to disk (e.g. every debug option that isn't computationally expensive)
< gmaxwell>
and on crashes we could make a best effort attempt to dump the circular buffer to a file in a crash handler.
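(A minimal sketch of that idea; an assumed shape, not an existing Bitcoin Core facility: verbose lines go to a fixed-size ring in RAM, and a crash handler makes a best-effort dump.)

```cpp
#include <array>
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <string>

static std::array<std::string, 4096> g_ring; // last N verbose log lines
static size_t g_next = 0;
static std::mutex g_ring_mutex;

static void RingLog(std::string line)
{
    std::lock_guard<std::mutex> lock(g_ring_mutex);
    g_ring[g_next++ % g_ring.size()] = std::move(line);
}

// Best effort only: a real handler must worry about async-signal safety;
// this sketch just tries to get the context out before dying.
static void DumpRing(int sig)
{
    if (FILE* f = std::fopen("crash-context.log", "w")) {
        for (size_t i = 0; i < g_ring.size(); ++i) {
            const std::string& s = g_ring[(g_next + i) % g_ring.size()];
            if (!s.empty()) std::fprintf(f, "%s\n", s.c_str());
        }
        std::fclose(f);
    }
    std::_Exit(128 + sig);
}

int main()
{
    std::signal(SIGSEGV, DumpRing);
    RingLog("chatty line that never touches disk in normal operation");
    // ... run the node ...
    return 0;
}
```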
< wumpus>
debug=1 is super-overkill last resort for when you really don't know where to look
< wumpus>
gmaxwell: yes, we need that too
< gmaxwell>
wumpus: or when round trips are expensive. if I have a user reporting an issue I don't want to iterate on a bunch of options. I just want all the info that might be relevant, but the leveldb stuff is so far useless and very bloaty.
< gmaxwell>
promag: you can already -debug=category to add categories and -debugexclude to remove them.
< wumpus>
gmaxwell: still you need a selection of categories then
< promag>
but you can't have levels right?
< gmaxwell>
right now I tell users to debug=1 and debugexclude leveldb.
< wumpus>
gmaxwell: it's unlikely that you want debugging for, say, torcontrol, even though that logging is extremely useful if you're debugging that
< gmaxwell>
wumpus: tor control isn't that chatty however.
< gmaxwell>
net is chatty but one of the most useful thing to log.
< wumpus>
but it might become so, or another chatty category could be added
< wumpus>
that's pretty subjective though
< wumpus>
because you're interested in net
< gmaxwell>
probably if I could grab a circular buffer with everything the question wrt support would go away.
< wumpus>
currently I have the problem that I'm interested in high-level network stuff, e.g. incoming connections, outgoing connections, what clients are connecting, what are their IPs, when do they disconnect. I don't need to see every single packet.
< wumpus>
but debug=net is way too chatty
< gmaxwell>
I agree that net messages and net-activity should be split.
< gmaxwell>
I do frequently like net-activity for debugging because from that I can more or less trace the state that the node is in.
< wumpus>
sure, I don't mean that detailed net logging should go away or such
< wumpus>
it certainly has good uses when you're debugging network things
< wumpus>
in any case I think a log level would resolve some of these problems
< wumpus>
'I want to see all categories, but only at INFO level, not DEBUG and below'
< wumpus>
where DEBUG would be the chatty stuff
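(A sketch of what a category-plus-severity filter could look like; a hypothetical shape in the spirit of #12219, not the merged design.)

```cpp
#include <cstdint>
#include <cstdio>

enum class Level { Debug, Info, Warning, Error };            // severity axis
enum Category : uint32_t { NET = 1u << 0, LEVELDB = 1u << 1, TOR = 1u << 2 };

static uint32_t g_categories = NET | TOR;  // e.g. -debug=net -debug=tor
static Level g_min_level = Level::Info;    // hide the chatty DEBUG tier

static void LogPrint(Category cat, Level lvl, const char* msg)
{
    if (!(g_categories & cat)) return; // category not enabled
    if (lvl < g_min_level) return;     // below the severity floor
    std::printf("%s\n", msg);
}

int main()
{
    LogPrint(NET, Level::Debug, "per-message spam");        // suppressed
    LogPrint(NET, Level::Info, "new inbound connection");   // shown
    LogPrint(LEVELDB, Level::Error, "compaction failure");  // category off
    return 0;
}
```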
< promag>
hence -debug-net=X
< gmaxwell>
Please lets not make more than three levels though.
< wumpus>
then if you want net DEBUG, you enable net debug
< wumpus>
I agree, but let's not get into bikeshedding about the number of levels
< gmaxwell>
my concern there is if there are a dozen levels, people will argue over the levels, or worse not argue over them and set them randomly and then I'll just have to be debugging everything to avoid inexplicably missing log entries.
< wumpus>
ERROR is clear, INFO/DEBUG can be set depend on chattiness
< wumpus>
I don't think we need more
< gmaxwell>
At least my experience is that I usually want almost everything or nothing. (basically, everything that isn't so chatty that it makes handling the logs a burden)
< gmaxwell>
Yes, thats why I said three. I think three we can handle easily.
< gmaxwell>
I guess one question is about "error", there are "peer violated the protocol" sorts of errors, and "omg our state is corrupted" sorts of errors.
< wumpus>
ERROR would be potentially dangerous but not fatal errors, maybe WARNING is a better name
< wumpus>
peer violated the protocol is INFO imo
< wumpus>
it's not dangerous to us
< gmaxwell>
We might want to adapt our language to call the first things "abnormal" (e.g. info level log, and the log text should not use the word error but perhaps use the word abnormal).
< wumpus>
right
< provoostenator>
I'm switching my testnet seed back on tomorrow unless something really surprising happens.
< provoostenator>
Or I can do it in 20 minutes if we want rc1 complaints to stop coming in.
< wumpus>
but anyhow, if you come up with a specific combination of categories that would be useful as single -debug= option, I wouldn't be against it
< wumpus>
provoostenator: we should just tag rc2
< provoostenator>
I'll know in ~10 minutes whether today's fixes made my crash go away (fairly certain they did).
< gmaxwell>
wumpus: okay. Well right now I think all minus leveldb internals is useful, and that seems like what a lot of us are using much of the time.
< wumpus>
gmaxwell: I think it needs a better definition though, something like debug=lowtraffic
< wumpus>
not everythingbutleveldb lol
< wumpus>
it also gives guidance on whether people should add categories to that in the future
< wumpus>
or to remove categories once they become noisy
< gmaxwell>
well net is high traffic but useful, leveldb is high traffic but so far I've not found it useful.
< wumpus>
but what is the rationale of the combination then?
< gmaxwell>
omit useless chatty things.
< wumpus>
let's kill leveldb logging completely if it's so useless
< wumpus>
libevent too, probably
< meshcollider>
it seems my gitian issue occurs when it is trying to download zeromq-4.2.2.tar.gz from bitcoincore.org/depends-sources
< wumpus>
it can be even more hilariously useless
< gmaxwell>
It's at least of conjectural use e.g. if we were chasing some leveldb internal bug.
< wumpus>
oh sure it can be useful, esp when adding custom debugging to leveldb to troubleshoot some issue
< wumpus>
that's why a bridge exists
< cfields>
meshcollider: you probably had all other sources already and your lxc is failing to make any net connections
< gmaxwell>
I mean we could also leave the code for the bridge there but remove the logging level.
< meshcollider>
cfields: that's true, was the version of zeromq bumped for 0.16
< cfields>
believe so
< cfields>
yes, it was
< gmaxwell>
I opened the conversation basically with the idea of having leveldb (and maybe later other things) being log categories you never get unless you explicitly ask for them.
< gmaxwell>
maybe libevent would fall into that too.
< gmaxwell>
I've noticed it being useless too but just never had it be chatty enough to bother me.
< wumpus>
so I still think =lowtraffic makes sense
< gmaxwell>
I suppose we could do lowtraffic and then turn net-messages back on.
< gmaxwell>
(I mean to achieve my normal desired logging config which is basically low traffic things plus net messages)
< wumpus>
we should just include net in that
< gmaxwell>
I'm imagining net divided into per-message logging and the rest (e.g. connections)
< wumpus>
with the future remark that if there is a low-traffic net category, that should be in instead
< gmaxwell>
right
< meshcollider>
cfields: yep you're right, it works fine if I make the depends directory beforehand, the new zeromq downloaded successfully
< cfields>
great
< meshcollider>
cfields: thanks :)
< cfields>
np
< gmaxwell>
Another thing to think about is the performance impacts of logging, some of our logs cause computation that is probably pretty bad for performance.
< wumpus>
I think that's true for libevent and leveldb logging too, they require a special enabling flag, which causes those libraries to send the messages at all
< wumpus>
gmaxwell: that'd only be problematic for high-volume messages, I'm sure e.g. computing the BENCH messages takes some cycles, but they only happen once per block
< wumpus>
and in the total validation time that's probably negligible
< gmaxwell>
The leveldb stuff looks kind of expensive.
< wumpus>
I don't think we have low-traffic messages that take significant computation
< gmaxwell>
And I recall that there were some moderate traffic messages that did stuff like an extra iteration over all inputs in transactions or something.
< wumpus>
for high traffic, yes, I wouldn't be surprised if the net logging slowed some things down
< wumpus>
if you have to write a message for every incoming packet to a file, it becomes disk bound
< wumpus>
oh I didn't know that
< zelest>
sorry for asking in here, but I did some quick googling and it seems like OneFixt is known in here? May I ask who he/she is? :o
< wumpus>
can we have some review on #12329 please, I'd like to tag rc2 before I go to bed
< sipa>
oops, seems i accidentally left this channel and forgot about the meeting as well
< luke-jr>
sipa: there was discussion of how to handle Segwit-back-to-the-start type stuff, and I thought perhaps it would be better as an annex to the Segwit BIP(s) rather than an entirely new BIP (making 2 BIPs for each fork); would you be okay with that, as an author of the BIP in question?
< gmaxwell>
and then you're gonna add (hardfork) to the segwit bips and collect your cheque from Ver? :P
< jnewbery>
re logging - we already have an alias -debug=all which aliases to -debug=1 (~0). Doesn't seem like too much of a stretch to add a new alias debug=gmax* which aliases to all the categories that greg wants to see (*name TBD)
< jnewbery>
I also agree that logging should log to a circular buffer in memory and then have a background thread flushing to disk. I bet there are plenty of places where we're logging to disk while holding cs_main for example
< jnewbery>
(in fact I started implementing that a few months ago, but never finished)
< gmaxwell>
I'd like to get to a state where our logs by default can log virtually nothing, and on fault we can dump a good amount of context.
< gmaxwell>
this also would address a lot of privacy concerns, where we avoid logging detailed data that would make node logs attractive for trying to trace transactions.
< luke-jr>
gmaxwell: as an annex, it's clearer to consider it an implementation detail ;)
< luke-jr>
ie, "you can tighten the rules using method A or method B, and the outcome is the same"