< achow101>
cfields: I saw that the detached sigs were pushed, but they don't seem to be working rn
< cfields>
achow101: not quite yet, still working on it
< achow101>
oh, ok
< cfields>
this is the first release with the new key, and the first that's not just me signing. So it's a bit kludgy
< achow101>
we haven't done the mpc rsa thing yet, have we?
< cfields>
no, didn't make it in time
< cfields>
ok, pushed the tag. should work now
< cfields>
off to find food before everything closes, bbl
< cfields>
gitian builders: v0.16.0rc1 detached sigs are pushed. Please ping me if there are any issues
< dx25>
with 0.16.0rc1 on exit i'm seeing "IO Error: ...chainstate/244997.ldb: Bad file descriptor", system error while flushing: Database I/O error.
< dx25>
Does not seem to be happening with v0.15.2 or older versions
< gmaxwell>
dx25: can you tell us some about your system? What OS, etc.
< gmaxwell>
dx25: we believe that's a new issue, which will likely block the release until we fix it.
< gmaxwell>
dx25: we've had one developer hit it but don't have a reliable reproduction yet, and maybe there is something in common between systems that have hit it which would help track it down
< gmaxwell>
e.g. what OS, kernel version, etc.
< bitcoin-git>
[bitcoin] murrayn opened pull request #12322: Docs: Remove step making cloned repository world-writable for Windows build. (master...doc_change) https://github.com/bitcoin/bitcoin/pull/12322
< dafuq>
a PR change to the --help CLI options would fall under what category?
< wumpus>
docs
< dafuq>
ok, seems most suitable but does involve code changes
< wumpus>
yes, that doesn't matter, as long as it is message changes
< provoostenator>
I'm getting "Fatal Internal Error" during IBD on testnet3 with 0.16.0rc1, twice. Will investigate.
< bitcoin-git>
[bitcoin] kekimusmaximus opened pull request #12325: Use dynamic_cast for downcasting instead of static_cast. (master...use_dynamic_cast_to_downcast) https://github.com/bitcoin/bitcoin/pull/12325
< provoostenator>
Might be a disk permission thing. "Pre-allocating up to position 0x3000000 in blk00002.dat" "ERROR: WriteBlockToDisk: ftell failed" "*** Failed to write block"
< wumpus>
disk full?
< provoostenator>
Also seeing a bunch of "socket recv error Bad file descriptor (9)" messages, not sure what that's about.
< wumpus>
oof bad file descriptor
< provoostenator>
No, plenty of space. It's an external SSD though, so can't rule out a hardware problem.
< dx25>
mine's also an external drive, not ssd tho
< wumpus>
no, it's possibly a regression in 0.16
< provoostenator>
Getting this crash every other minute now on testnet. Will try local drive just to rule out hardware issue.
< dx25>
i'm on qubes, fedora25 vm, 4.9.56 kernel
< wumpus>
I really wonder why the bad file descriptor thing happens, normally that happens if either a fd is used that was not returned by open, or a file descriptor that was already closed
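(A minimal sketch, assuming a POSIX system, of the two EBADF causes wumpus lists above; illustrative code, not anything from Bitcoin Core.)

```cpp
#include <cerrno>
#include <cstdio>
#include <cstring>
#include <unistd.h>

int main()
{
    // Case 1: a descriptor that open() never returned.
    char buf[16];
    if (read(12345, buf, sizeof(buf)) == -1 && errno == EBADF) {
        std::printf("read on bogus fd: %s\n", std::strerror(errno));
    }

    // Case 2: a descriptor that was valid but has already been closed.
    int fd = dup(STDOUT_FILENO);
    close(fd);
    if (write(fd, "x", 1) == -1 && errno == EBADF) {
        std::printf("write on closed fd: %s\n", std::strerror(errno));
    }
    return 0;
}
```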
< provoostenator>
Same "bad file descriptor" on a fresh testnet3 IBD (MacOS) on the built in harddisk which has plenty of space. Waiting for it to crash.
< wumpus>
I seriously doubt this is a hardware issue
< wumpus>
are you doing anything special with the node? regular RPC requests, for example?
< wumpus>
or any fd related settings in bitcoin.conf?
< provoostenator>
I was running QT production and testnet at the same time for different users on my system, both in separate directories. I also suspended the computer (though it should keep syncing in that case). So trying to reproduce with fewer variables now.
< provoostenator>
I used a symlink to point to the SSD drive.
< dx25>
i have some weird walletnotify and alertnotify thing calling curl for some reason i can't remember. haven't tried turning that off yet.
< provoostenator>
Ah there we go again: crash.
< provoostenator>
(no symlink involved this time, nor an external drive). Crash happened at or after block 64709 (testnet)
< wumpus>
provoostenator: so it happens even without listening
< dx25>
i was doing no rpc stuff
< wumpus>
provoostenator: that rules out some p2p races I guess, but what can it be then...
< wumpus>
provoostenator: you're able to reliably reproduce this? could try git bisecting, if it was ok with 0.15.1
< provoostenator>
Meanwhile I had production QT running all night without a problem, so I suspect it's sync related (my testnet node was doing an IBD the first time it crashed)
< wumpus>
(or find some later commit where the issue doesn't exist)
< provoostenator>
I just shut down the other mainnet QT instance, will try once more. Then I'll compare versions after that.
< dx25>
also sync related here, but on mainnet
< provoostenator>
Pretty reliable so far
< provoostenator>
It was built with: ./configure --disable-tests --disable-bench --with-miniupnpc=no
< provoostenator>
Hooray, another crash. Ok, should be easy to bisect.
< provoostenator>
Height 8089 (testnet)
< wumpus>
yes, would make a backup of the data directory to make sure you start with the same state every time
< wumpus>
at height 8089 at least that isn't too much data
< provoostenator>
I get this crash with a fresh testnet3 directory.
< provoostenator>
I'll keep a copy for forensics
< wumpus>
I'll hold up on uploading executables for rc1 for now
< provoostenator>
Mmm, I just found a zombie lightningd instance in the background. Maybe it was making RPC requests, not sure. I'm going to kill it just in case.
< provoostenator>
(no difference)
< provoostenator>
Height 401 :-)
< wumpus>
I wouldn't expect it to be so predictable in that case
< provoostenator>
Other than ccache and skipping tests and bench, any hints on how to make it compile faster?
< wumpus>
do only 'make -j<x> src/bitcoind'
< wumpus>
you don't really need to rebuild cli and such
< provoostenator>
Right, I should test if this is QT related first, and otherwise just build bitcoind
< wumpus>
yes, you could also build only -qt with a similar command, but it takes much longer
< provoostenator>
I know, I was doing that (make src/qt/bitcoin-qt)
< provoostenator>
Is there a reason the binaries end up in the /src path rather than in e.g. /dist?
< wumpus>
that's common for automake build systems, if you want to build somewhere else you can do an out-of-tree build, if you want to put the binaries somewhere set a prefix and do `make install`
< morcos>
just to clarify, we think the crash is caused by what? a Bad file descriptor issue occurring on block write?
< morcos>
We've also seen it on ldb which causes a crash
< morcos>
and on socket send/recv which doesn't
< provoostenator>
Mmm, bitcoind doesn't show "Bad file descriptor" messages (with -debug=1). It does exit with " A fatal internal error occurred, see debug.log for details"
< provoostenator>
I wonder what "Interrupting HTTP server" is about.
< wumpus>
there is no error there - you didn't simply send a stop command?
< provoostenator>
I didn't. And a stop command wouldn't explain bitcoind exiting with "Error: Error: A fatal internal error occurred, see debug.log for details"
< wumpus>
no, that's true, normally that's accompanied by an error being logged, this looks like a successful shutdown
< wumpus>
you did paste the right debug.log? :)
< provoostenator>
Not sure actually, double checking
< provoostenator>
Default log locations changed between 0.15 and 0.16. Will try again.
< wumpus>
default log location changed?!
< wumpus>
should not be the case, it's still <datadir>/debug.log, sounds more likely you've set a different datadir for -qt
< provoostenator>
Or accidentally used mainnet for bitcoind.
< provoostenator>
Ok, now I'm seeing the "Bad file descriptor" messages again. Will wait for crash and upload correct debug log. Then continue with bisect.
< BlueMatt>
provoostenator: if you're trying to bisect, I'd recommend focusing on any changes to net
< provoostenator>
It then happily processes a few more blocks and shuts down
< wumpus>
so the buckshot hit CAutoFile's descriptor this time
< wumpus>
this is so weird, looks like some evil background thread is randomly closing fds
< provoostenator>
Unfortunately that took almost 20 minutes to crash, so this bisect will take a while, but probably worth it.
< cfields>
maybe some callback isn't taking cs_main while touching block files?
< cfields>
provoostenator: it'd be great if you could catch it in gdb
< cfields>
that should allow a 'freeze' long enough to ld your fd's
< cfields>
*ls
< provoostenator>
gdb?
< cfields>
provoostenator: debugger
< wumpus>
cfields: right, it could well be something else than net calls, I remember there is a PR that changes locking around block files
< provoostenator>
Since morcos is able to reproduce, maybe it's easier if he looks at the debugger, while I just try to pinpoint which commit caused this.
< cfields>
sure
< provoostenator>
That should also provide more assurance that the fix actually fixes it.
< morcos>
cfields: now i hit the first assert
< cfields>
morcos: if it's some fd leak, it'd make sense that you'd get EBADFD randomly, all over the place
< cfields>
morcos: any chance you can catch it in gdb?
< morcos>
yeah, ok so if i run in gdb, then you want what, the list of what's in /proc/pid, or what
< cfields>
yea
< cfields>
try to break on assert (might be _assert or __assert) in gdb, so it hangs before exit is called
< wumpus>
no problems with testnet sync here w/ cfields's assertions
< morcos>
cfields: no i saved it
< wumpus>
(at least at block 319000)
< cfields>
morcos: could you do a 'thread apply all bt' for that one?
< morcos>
how do i easily output that to a file
< cfields>
the send is more interesting because it may be an optimistic send. in that case, it'd be coming from the message handler thread rather than the sockethandler, so it might show a little more
< wumpus>
"Hello. CloseSocket may be called with hSocket uninitialised, at net.cpp:448 (not confirmed to be the cause of this bug, but it seems likely)"
< cfields>
morcos: ok, so you're not running through fd's. Closing a random one makes way more sense
< morcos>
oh that was stupid, i changed the mainnet dns seeds but was running on testnet
< cfields>
heh
< cfields>
morcos: try setting it to 0.0.0.0 instead. Not sure if the resolver will actually hand that out, though
< cfields>
actually, if that's the case, I should be able to repro too instead of asking you to :p
< cfields>
off to test
< morcos>
cfields: ok yeah it did load dns seeds before this crash though
< wumpus>
well it also depends on what is in the uninitialized memory
< morcos>
btw, before i forget, it seemed that running in testnet was reading peers.dat from .bitcoin and not testnet3
< wumpus>
if there happen to be zeroes there, or some value that is larger than max fd, it will go unnoticed, it still doesn't have to trigger every time
< morcos>
i deleted both of them to force dnsseeds
< cfields>
wumpus: true, but an assert on a successful close() should point it out quickly i should think
< cfields>
wumpus: wouldn't anything other than -1 in memory cause a problem?
< wumpus>
cfields: it might, though closing fd 0 (stdin) is harmless in our case
< cfields>
hmm
< wumpus>
at least the first time. Once you close stdin, the next time you use open() you might get that fd, and if it then randomly gets closed again, it will still interfere. So, yeah. The only harmless values would be very large ones that can't be a fd, ever.
< cfields>
makes sense
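(To make the suspected failure mode concrete: a sketch with hypothetical names, not the actual net.cpp code, of how closing an uninitialized socket can hit an unrelated, live descriptor.)

```cpp
#include <unistd.h>

using SOCKET = int; // on POSIX a SOCKET is just an int fd

static bool ResolveAndConnect(SOCKET&) { return false; } // stub: seed is down

void ConnectBuggy()
{
    SOCKET hSocket; // uninitialized: may hold 0 (stdin) or any live fd value
    if (ResolveAndConnect(hSocket)) {
        return; // connected path would hand the socket off here
    }
    // The cleanup path runs with garbage in hSocket. If that value matches
    // a descriptor owned by LevelDB, a block file, or a peer socket, the
    // descriptor is silently closed and "Bad file descriptor" errors show
    // up later in completely unrelated code.
    close(hSocket);
}
```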
< cfields>
morcos: fyi, there's -forcednsseed
< morcos>
cfields: I added an assert in netbase.cpp CloseSocket that ret != error and I hit it
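(A hypothetical reconstruction of the kind of assert morcos describes; the real CloseSocket lives in netbase.cpp, and names and details here are assumptions.)

```cpp
#include <cassert>
#include <unistd.h>

using SOCKET = int;
static const SOCKET INVALID_SOCKET = -1;

bool CloseSocket(SOCKET& hSocket)
{
    if (hSocket == INVALID_SOCKET) return false;
    int ret = close(hSocket);
    // Diagnostic: stop at the first failing close(), while the state that
    // caused it is still inspectable in a debugger.
    assert(ret != -1);
    hSocket = INVALID_SOCKET;
    return ret == 0;
}
```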
< cfields>
morcos: great! can you print hSocket there ?
< morcos>
0?
< morcos>
looks like addrConnect is 0
< morcos>
looks like this is all a test from provoostenator as it's coming from his seed
< cfields>
morcos: you didn't set yours in /etc/hosts ?
< provoostenator>
Interesting, is my testnet seed doing something funny?
< morcos>
actually i'm not sure about that, since i'm not familiar with this code
< morcos>
0x00005555555ed33a in CConnman::ConnectNode (this=this@entry=0x555556bc8530, addrConnect=..., pszDest=0x0, pszDest@entry=0x7fffac08c150 "seed.testnet.bitcoin.sprovoost.nl", fCountFailure=fCountFailure@entry=false) at net.cpp:448
< provoostenator>
Mmm, it might actually be down. Let me check
< morcos>
cfields i did make changes in /etc/hosts, but isn't seed.bitcoin.sipa.be a mainnet seed?
< cfields>
morcos: yea, but you said you had all of em. didn't know what you set in there
< sdaftuar>
i don't know if this is related, but i have a lot of lines like this in my debug.log: "trying connection seed.testnet.bitcoin.sprovoost.nl lastseen=0.0hrs"
< cfields>
provoostenator: it's down for me...
< provoostenator>
For me as well.
< cfields>
but that should return 0 addresses and not try a connection. It shouldn't end up trying to connect to 0...
< provoostenator>
I still need to setup monitoring.
< cfields>
provoostenator: leave it down while we're testing :)
< sdaftuar>
i just added an else {} clause in net.cpp (before the suspicious line 448), and it triggered for me on -testnet when running with -forcednsseed
< morcos>
huh, the fix didn't fix it?
< sdaftuar>
wasn't running with the fix -- just verifying that hSocket could indeed be uninitialized
< cfields>
or are you saying that you've verified that you can hit the else branch?
< sdaftuar>
^ that
< cfields>
ok, great
< cfields>
morcos: have you been on testnet every time you've hit this?
< cfields>
i realize that a mainnet seed could've been returning 0 as well, but that doesn't seem like it'd affect a mainnet node that's been up for more than a few minutes
< cfields>
but i guess that does jibe with your complaints that we're querying the seeds too often
< cfields>
heh, in fact, you would've been noticing that because there'd be an entry at the end of every log file
< morcos>
no all the prior times were mainnet
< provoostenator>
Doesn't it pick a seed at random?
< morcos>
but yes i was querying dns seeds occasionally
< morcos>
i don't know what you mean about seeing an entry at the end of every log file
< provoostenator>
Or does it ping them all? Because I didn't get crashes just 1 in ~5 times, I got them every time, with a fresh datadir.
< morcos>
the problem is if there is a dnsseed that is returning garbage somehow, it'll periodically retry it right?
< morcos>
so if every time it does that, it results in me closing fd 0
< morcos>
which at that point has been reused for something else
< morcos>
it'll cause an error
< morcos>
but the socket errors aren't fatal, so it's only the leveldb or blockwriting errors that cause a crash and show up at the end of the log
< morcos>
but maybe thats what you're saying
< cfields>
morcos: it only hits the seeds if we don't have enough peers
< cfields>
it shouldn't keep retrying anything just because it failed
< cfields>
wait, it might now!
< provoostenator>
Should there be a functional test for dealing with broken DNS seeds?
< cfields>
provoostenator: does your seed support filtering?
< cfields>
it would make sense if a connection was tried because it was a oneshot()...
< morcos>
cfields: it looked to me that it adds it to oneshot
< morcos>
isn't that what it does with dnsseeds
< morcos>
assuming you need them
< cfields>
that's a new change, sec
< cfields>
so we're not differentiating between a failed resolve, and a resolve with 0 results
< morcos>
cfields: yeah can you look at line 390 in net.cpp
< morcos>
i don't know what that does, but what happens if it fails
< cfields>
#11512
< gribble>
https://github.com/bitcoin/bitcoin/issues/11512 | Use GetDesireableServiceFlags in seeds, dnsseeds, fixing static seed adding by TheBlueMatt · Pull Request #11512 · bitcoin/bitcoin · GitHub
< provoostenator>
cfields: I use sipa's tool with the default settings, see my 0bin paste above for the full command
< cfields>
morcos: ok, red herring. I see what's happening.
< cfields>
it fails on the filtered resolve, so it does a oneshot for the unfiltered one. working as intended
< cfields>
but they're both down, so you get a random socket
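(A sketch of the shape of the remedy; illustrative only, the actual fix is the PR tested later in this log, #12326: initialize the handle, and never close a descriptor this code did not open.)

```cpp
#include <unistd.h>

using SOCKET = int;
static const SOCKET INVALID_SOCKET = -1;

static bool ResolveAndConnect(SOCKET&) { return false; } // stub: seed is down

void ConnectFixed()
{
    SOCKET hSocket = INVALID_SOCKET; // always start from a known value
    if (ResolveAndConnect(hSocket)) {
        return; // connected path hands the socket off
    }
    // Only close what was actually opened; a garbage value can never
    // reach close() now.
    if (hSocket != INVALID_SOCKET) {
        close(hSocket);
    }
}
```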
< cfields>
wumpus: i'm more and more confident that this is the issue
< wumpus>
cfields: great!
< cfields>
and very sorry that i introduced it :(
< wumpus>
means we can do rc2 soon
< wumpus>
heh no worries
< wumpus>
happy if it's this and not some problem deep in leveldb
< morcos>
provoostenator is right though we should have a test for bad dns seed.
< cfields>
we could add that to the travis cron job
< wumpus>
I agree, it would be somewhat tricky to spin up a fake dns server in the test framework, though maybe python has something easy for that, I don't know
< cfields>
oh, i thought the concern was not knowing that dns seeds were down
< sdaftuar>
i think the concern is making sure we handle it when they are? or really both i guess...
< provoostenator>
I'll keep the bisect going just in case. Leaving it running for 25 minutes until I "git bisect good" a commit, so it will take a few hours.
< cfields>
2018-02-01 16:23:47 Closing bad socket: 1266668816
< cfields>
2018-02-01 16:24:30 Closing bad socket: 100640016
< cfields>
yep, that's it
< provoostenator>
Mmm, if it's DNS related bisect might end up finding the PR where my seed was merged. Anyway, we'll see.
< provoostenator>
I added the log to the above issue.
< ProfMac>
"should" the DNS discover use IPv4 when onlynet=IPv6?
< wumpus>
AFAIK there's no way in the libc API to do DNS resolving only over either IPv4 or IPv6
< wumpus>
and as many modern linux distros run a DNS cache on localhost, on the IPv4 loopback, that'd effectively mean that DNS seeding cannot be used when onlynet=ipv6
< wumpus>
(hm, how many ISPs give out IPv6 DNS servers in the first place?)
< wumpus>
probably only those that only give clients an IPv6 address
< wumpus>
I don't know, it's an interesting thought experiment, but I think in practice it's good that use of dns seeding or not is a separate option
< cfields>
wumpus: you can kinda fudge it with AI_ADDRCONFIG, as we do
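(Roughly what cfields is referring to; a standalone sketch of the libc interface, not Bitcoin Core's own lookup wrappers. AI_ADDRCONFIG suppresses answers for address families the host has no address in, and ai_family can pin results to IPv6; the transport of the DNS query itself is still left to libc, which is wumpus's point.)

```cpp
#include <arpa/inet.h>
#include <cstdio>
#include <cstring>
#include <netdb.h>
#include <netinet/in.h>
#include <sys/socket.h>

int main()
{
    struct addrinfo hints;
    std::memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_INET6;     // only return IPv6 results
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_ADDRCONFIG; // skip families we have no address for

    struct addrinfo* res = nullptr;
    int rc = getaddrinfo("seed.bitcoin.sipa.be", nullptr, &hints, &res);
    if (rc != 0) {
        std::fprintf(stderr, "resolve failed: %s\n", gai_strerror(rc));
        return 1;
    }
    for (struct addrinfo* p = res; p != nullptr; p = p->ai_next) {
        char out[INET6_ADDRSTRLEN];
        const auto* sin6 = reinterpret_cast<const sockaddr_in6*>(p->ai_addr);
        inet_ntop(AF_INET6, &sin6->sin6_addr, out, sizeof(out));
        std::printf("%s\n", out);
    }
    freeaddrinfo(res);
    return 0;
}
```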
< wumpus>
so with regard to 0.16.0 status, there have already been some issues that came up with rc1, so I think it makes sense to skip uploading binaries for that and go to rc2 soon
< BlueMatt>
ack
< achow101>
what issues?
< cfields>
agreed
< cfields>
achow101: see backlog for the last ~3hrs
< wumpus>
there's another issue with onetry connections being re-tried forever, resulting in potential DoS on DNS seeds in the case they temporarily fail
< jonasschnelli>
by the way, is it a policy that a DNS seed also runs a node (same ip) for the oneshot?
< wumpus>
jonasschnelli: no, that's not necessary
< wumpus>
jonasschnelli: it looks up the DNS seed which will return a (the first?) node
< jonasschnelli>
wumpus: okay. My seeders will refuse connections to 8333
< jonasschnelli>
wumpus: okay.
< wumpus>
jonasschnelli: that is not the IP of the DNS server. I was confused about that too at some point in the past.
< jonasschnelli>
I think it also has something to do with tor mode
< provoostenator>
Using A records is what makes it confusing
< cfields>
yea, it's just some random peer
< BlueMatt>
jonasschnelli: yes, you'd have to include your own ip in the dnsseed to (maybe) get the oneshot to be you, but that would be bad, and a violation of dnsseed policy (IIRC)
< jonasschnelli>
BlueMatt: sure.
< wumpus>
yes, in tor mode no resolving is used to get multiple results (that'd require some SOCKS5 extension being used), so it uses a one shot to a random node as replacement
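(The branch wumpus is describing, sketched with stubbed helpers; the names are assumptions, not the real ThreadDNSAddressSeed code.)

```cpp
#include <string>
#include <vector>

// Stubs standing in for the real node plumbing:
static bool HaveNameProxy() { return true; }            // Tor-only setup
static void AddOneShot(const std::string&) {}           // queue oneshot peer
static std::vector<std::string> Resolve(const std::string&) { return {}; }
static void AddToAddrman(const std::string&) {}

static void SeedFromDNS(const std::string& seed)
{
    if (HaveNameProxy()) {
        // No local resolving behind the proxy: dial the seed's hostname
        // through Tor as a "oneshot" peer, fetch addresses via getaddr,
        // then disconnect.
        AddOneShot(seed);
        return;
    }
    // Normal path: resolve the seed name and feed the results to addrman.
    for (const std::string& ip : Resolve(seed)) {
        AddToAddrman(ip);
    }
}
```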
< jonasschnelli>
But in tor only mode, don't we do a oneshot to the seeds?
< wumpus>
no
< provoostenator>
BlueMatt: and an effective way to DDOS yourself
< jonasschnelli>
wumpus: okay. Thanks... never looked that up properly
< wumpus>
it never looks up the IP of the DNS server at all, that's all happening below the libc abstraction
< wumpus>
any topics?
< jonasschnelli>
Everyone already back at work it seems
< wumpus>
ok.. seems not... well please review anything under the 0.16.0 milestone, and anything added in the next day too
< sdaftuar>
if we don't have more pressing things to discuss, i'd like to solicit feedback on #11739 (backdating p2sh/segwit v0 script rules to genesis)
< sdaftuar>
mostly i want to know if there are any concept NACKs
< wumpus>
#topic enforce SCRIPT_VERIFY_P2SH and SCRIPT_VERIFY_WITNESS from genesis (sdaftuar)
< sdaftuar>
and i guess the other question is confirming whether/how such a change should be documented
< cfields>
+0
< cfields>
sdaftuar: going forward, you mean? or this one?
< sdaftuar>
both?
< sdaftuar>
i drafted a BIP for this one
< cfields>
well going forward, i think we could specify this intention as part of a soft-fork bip?
< wumpus>
no NACK from me, if the code can be simplified that way then it's great
< BlueMatt>
+1
< wumpus>
it doesn't change the rules enforced for current blocks, does it?
< wumpus>
how is it a softfork?
< sdaftuar>
no effect on current blocks
< wumpus>
if it's a softfork I am misunderstanding
< BlueMatt>
its a soft spoon - only prevents a 6-month reorg from removing segwit :p
< luke-jr>
it restricts the rules on older blocks
< sdaftuar>
it's a softfork under a technical definition
< BlueMatt>
not a fork
< sdaftuar>
of making valid things now invalid
< cfields>
sorry, my fault.
< morcos>
+1 as well.. but i do have concerns about how we could do this on a going forward basis
< wumpus>
oh, right
< provoostenator>
Or just Buried Deployment?
< wumpus>
but it makes no difference because the old blocks all qualify
< luke-jr>
the question comes down to, are we limiting soft/hardfork definitions to only ones that affect future blocks?
< morcos>
it seems like if this is always our intention, then as soon as we announce a future soft fork
< morcos>
some jack ass is going to mine violations just to make us annoyed
< BlueMatt>
luke-jr: yes, we should start calling buried deployments spoons
< luke-jr>
or do we consider this an implementation detail?
< cfields>
morcos: true
< sdaftuar>
morcos: i think it's not really worth worrying about that
< wumpus>
I see this as an implementation detail to validation
< wumpus>
there's no need to cause a lot of ruckus about it
< wumpus>
if you call it a softfork you'll have the miners up in arms, and whatnot
< cfields>
luke-jr: well if there's an absolutely massive reorg, it's not just an implementation detail, no?
< morcos>
well it addresses cfields' point about having the original BIP specify the intention. i think we should always only consider backdating after the fact
< sdaftuar>
morcos: oh i see your point
< wumpus>
because then it also needs to be signaled some way, I guess
< luke-jr>
cfields: if there's an absolutely massive reorg, it's unclear what the outcome will be period
< luke-jr>
cfields: for example, Knots has a checkpoint on a Segwit block
< cfields>
well isn't the intention here to clarify that outcome?
< BlueMatt>
if there's a 6 month reorg there will be debate as to whether to follow it...whether we follow it or not ends up being a community question anyway :p
< wumpus>
if there's a reorg that big that all segwit blocks are reorged out, well...
< sdaftuar>
i think in this case, it's clear that segwit transactors do not intend for their funds to be spendable on a segwit-inactive chain
< wumpus>
yes, I'm sure there will be discussion enough in that case
< sdaftuar>
so backdating the segwit rules matches consensus, in that sense
< BlueMatt>
so, definitely not a fork
< wumpus>
right
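(For context, the simplification #11739 proposes looks roughly like this; an illustrative sketch, not the exact patch. The flag values mirror script/interpreter.h, and the activation lookups are stubbed.)

```cpp
#include <cstdint>

static const uint32_t SCRIPT_VERIFY_NONE = 0;
static const uint32_t SCRIPT_VERIFY_P2SH = (1U << 0);
static const uint32_t SCRIPT_VERIFY_WITNESS = (1U << 11);

// Stubs for the per-block activation lookups the old code needs:
static bool IsWitnessEnabled(int /*height*/) { return true; }
static int64_t Bip16SwitchTime() { return 1333238400; } // April 1, 2012

// Before: activation state consulted for every connected block.
static uint32_t GetBlockScriptFlagsOld(int height, int64_t block_time)
{
    uint32_t flags = SCRIPT_VERIFY_NONE;
    if (block_time >= Bip16SwitchTime()) flags |= SCRIPT_VERIFY_P2SH;
    if (IsWitnessEnabled(height)) flags |= SCRIPT_VERIFY_WITNESS;
    return flags;
}

// After: both rule sets enforced unconditionally, back to genesis. All
// historical blocks already satisfy them, so validation of the existing
// chain is unchanged; only a massive reorg could observe the difference.
static uint32_t GetBlockScriptFlagsNew(int /*height*/, int64_t /*block_time*/)
{
    return SCRIPT_VERIFY_P2SH | SCRIPT_VERIFY_WITNESS;
}
```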
< luke-jr>
in which case, I don't see that we need a BIP for it. I suggest we make a new repo for Core-specific documentation like this.
< cfields>
morcos: i half agree about after-the-fact. Not mentioning backdating with the intention of doing so anyway is a bit... iffy
< luke-jr>
BIPs are for cross-software standards, which doesn't really include implementation details
< BlueMatt>
seems fine to me, I also appreciate gmaxwell's partially-joking suggestion of calling it a spoon
< wumpus>
hehe
< luke-jr>
(actually, we have a repo for gitian docs already, right?)
< sdaftuar>
i personally think that it's helpful to put it in a BIP, because it affects the implementation of existing BIPs
< BlueMatt>
and then doing a bip and just saying "Soft Spoon"
< sdaftuar>
but i don't feel strongly
< BlueMatt>
i mean we use BIPs for things that are core-specific anyway, like getblocktemplate
< wumpus>
but in the end it doesn't matter whether people implement this BIP
< wumpus>
because it's an implementation detail
< luke-jr>
BlueMatt: GBT isn't Core-specific
< sdaftuar>
wumpus: agreed
< sdaftuar>
it's an informational BIP
< cfields>
luke-jr: there are several post-mortem BIPs
< wumpus>
BlueMatt: well that's an interface! interfaces need documentation
< kanzure>
hi.
< luke-jr>
maybe we can put it in an annex for the BIPs it affects or something? just seems like it will get old to have two BIPs for every fork
< wumpus>
if a softspoon drops at 300000 blocks deep and no one hears it, did it happen at all?
< luke-jr>
one for the deployment and implementation, and another for the reinterpretation of the deployment
< sdaftuar>
luke-jr: that seems reasonable to me as well, if the BIP authors agree?
< wumpus>
luke-jr: agree
< luke-jr>
none of the authors appear to be here now, but I doubt they'd object
< luke-jr>
at least for Segwit
< provoostenator>
And the winner of the git bisect game is....
< provoostenator>
Meanwhile I'm checking if #12326 actually makes the crash go away now, as well as whether removing my seed from v0.16.0rc1 makes it go away (very likely yes).
< jcorgan>
hey guys, just fyi, i've passed on the management/maintainer/architect roles in gnuradio to new hands (after 12 years), to focus on bitcoin-related work full-time. part of that will be getting back into core dev process. very much looking forward to the change of pace.
< gmaxwell>
jcorgan: awesome!
< cfields>
wumpus: and 12329 ?
< wumpus>
jcorgan: congratulations!
< gmaxwell>
(I mean, nooo sucks for gnuradio; and all my SDR projects)
< cfields>
jcorgan: very cool. 12 years is a long time
< wumpus>
cfields: huh I meant that one
< jcorgan>
i think that's over half of some people here's lifetime
< gmaxwell>
Would it be entirely crazy to make debug=1 not enable all debugging options, but a "many" option, that turns off absurdly chatty stuff (leveldb internals would be the one thing now).
< gmaxwell>
?
< provoostenator>
+1
< gmaxwell>
I don't think any of us have ever found the leveldb internals logging useful so far. And I know I always disable it and curse when I accidentally level it on.
< sdaftuar>
that seems to be everyone's experience afaik
< gmaxwell>
er leave it on.
< gmaxwell>
I suppose I've learned a bit about how much background stuff leveldb does due to it. :)
< wumpus>
I don't like the chatty net logging either
< gmaxwell>
At least I've found that stuff useful. I'd be okay with it off in debug=1 though I'd often turn it on.
< wumpus>
which is why I opened #12219, I think we need a DEBUG..ERROR axis as well
< provoostenator>
The flickering makes it hard to watch an IBD in real time.
< gmaxwell>
or at least chatty net doesn't bother me so much since upgrading nodes to 1TB ssds...
< wumpus>
apart from the category we also need a log level, that'd make logging things more sane without singling out specific categories
< wumpus>
or splitting up categories
< wumpus>
I think debug=1 should remain 'log everything possible', which doesn't rule out more selective logging options
< gmaxwell>
I'd buy a debug vs error sort of axis though I'm doubtful about a really granular level, as people will irritatingly set them pretty subjectively.
< gmaxwell>
or I suppose if we just had a debug=many or debug=most that would be fine too. I seem to screw up debug=1 and then excluding for some reason.
< wumpus>
or something like debug=1 debug=-leveldb
< gmaxwell>
also having a single setting is more useful when I'm asking users to set things.
< wumpus>
allowing categories to be disabled
< cfields>
gmaxwell: agreed. -debug and -debug=all don't have to be the same thing
< gmaxwell>
we have -debugexclude
< wumpus>
but yes it is subjective and depends on what you want to debug
< wumpus>
which is why making a single selection for -debug=1 seems weird to me
< wumpus>
some messages might be less interesting for what you're debugging, but that's why we have categories in the first place
< gmaxwell>
Though also for some of this user stuff, I think it would be very useful to have a circular buffer in ram that always gets a MUCH higher debug level than what goes to disk (e.g. every debug option that isn't computationally expensive)
< gmaxwell>
and on crashes we could make a best effort attempt to dump the circular buffer to a file in a crash handler.
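(A minimal sketch of that idea; an assumed shape, not an existing Bitcoin Core facility: verbose lines go to a fixed-size ring in RAM, and a crash handler makes a best-effort dump.)

```cpp
#include <array>
#include <csignal>
#include <cstdio>
#include <cstdlib>
#include <mutex>
#include <string>

static std::array<std::string, 4096> g_ring; // last N verbose log lines
static size_t g_next = 0;
static std::mutex g_ring_mutex;

static void RingLog(std::string line)
{
    std::lock_guard<std::mutex> lock(g_ring_mutex);
    g_ring[g_next++ % g_ring.size()] = std::move(line);
}

// Best effort only: a real handler must worry about async-signal safety;
// this sketch just tries to get the context out before dying.
static void DumpRing(int sig)
{
    if (FILE* f = std::fopen("crash-context.log", "w")) {
        for (size_t i = 0; i < g_ring.size(); ++i) {
            const std::string& s = g_ring[(g_next + i) % g_ring.size()];
            if (!s.empty()) std::fprintf(f, "%s\n", s.c_str());
        }
        std::fclose(f);
    }
    std::_Exit(128 + sig);
}

int main()
{
    std::signal(SIGSEGV, DumpRing);
    RingLog("chatty line that never touches disk in normal operation");
    // ... run the node ...
    return 0;
}
```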
< wumpus>
debug=1 is super-overkill last resort for when you really don't know where to look
< wumpus>
gmaxwell: yes, we need that too
< gmaxwell>
wumpus: or when round trips are expensive. if I have a user reporting an issue I don't want to iterate on a bunch of options. I just want all the info that might be relevant, but the leveldb stuff is so far useless and very bloaty.
< gmaxwell>
promag: you can already -debug=category to add categories and -debugexclude to remove them.
< wumpus>
gmaxwell: still you need a selection of categories then
< promag>
but you can't have levels right?
< gmaxwell>
right now I tell users to debug=1 and debugexclude leveldb.
< wumpus>
gmaxwell: it's unlikely that you want debugging for, say, torcontrol, even though that logging is extremely useful if you're debugging that
< gmaxwell>
wumpus: tor control isn't that chatty however.
< gmaxwell>
net is chatty but one of the most useful thing to log.
< wumpus>
but it might become so, or another chatty category could be added
< wumpus>
that's pretty subjective though
< wumpus>
because you're interested in net
< gmaxwell>
probably if I could grab a circular buffer with everything the question wrt support would go away.
< wumpus>
currently I have the problem that I'm interested in high-level network stuff, e.g. incoming connections, outgoing connections, what clients are connecting, what are their IPs, when do they disconnect. I don't need to see every single packet.
< wumpus>
but debug=net is way too chatty
< gmaxwell>
I agree that net messages and net-activity should be split.
< gmaxwell>
I do frequently like net-activity for debugging because from that I can more or less trace the state that the node is in.
< wumpus>
sure, I don't mean that detailed net logging should go away or such
< wumpus>
it certainly has good uses when you're debugging network things
< wumpus>
in any case I think a log level would resolve some of these problems
< wumpus>
'I want to see all categories, but only at INFO level, not DEBUG and below'
< wumpus>
where DEBUG would be the chatty stuff
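(A sketch of what a category-plus-severity filter could look like; a hypothetical shape in the spirit of #12219, not the merged design.)

```cpp
#include <cstdint>
#include <cstdio>

enum class Level { Debug, Info, Warning, Error };            // severity axis
enum Category : uint32_t { NET = 1u << 0, LEVELDB = 1u << 1, TOR = 1u << 2 };

static uint32_t g_categories = NET | TOR;  // e.g. -debug=net -debug=tor
static Level g_min_level = Level::Info;    // hide the chatty DEBUG tier

static void LogPrint(Category cat, Level lvl, const char* msg)
{
    if (!(g_categories & cat)) return; // category not enabled
    if (lvl < g_min_level) return;     // below the severity floor
    std::printf("%s\n", msg);
}

int main()
{
    LogPrint(NET, Level::Debug, "per-message spam");        // suppressed
    LogPrint(NET, Level::Info, "new inbound connection");   // shown
    LogPrint(LEVELDB, Level::Error, "compaction failure");  // category off
    return 0;
}
```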
< promag>
hence -debug-net=X
< gmaxwell>
Please lets not make more than three levels though.
< wumpus>
then if you want net DEBUG, you enable net debug
< wumpus>
I agree, but let's not get into bikeshedding about the number of levels
< gmaxwell>
my concern there is if there are a dozen levels, people will argue over the levels, or worse not argue over them and set them randomly and then I'll just have to be debugging everything to avoid inexplicably missing log entries.
< wumpus>
ERROR is clear, INFO/DEBUG can be set depend on chattiness
< wumpus>
I don't think we need more
< gmaxwell>
At least my experience is that I usually want almost everything or nothing. (basically, everything that isn't so chatty that it makes handling the logs a burden)
< gmaxwell>
Yes, thats why I said three. I think three we can handle easily.
< gmaxwell>
I guess one question is about "error", there are "peer violated the protocol" sorts of errors, and "omg our state is corrupted" sorts of errors.
< wumpus>
ERROR would be potentially dangerous but not fatal errors, maybe WARNING is a better name
< wumpus>
peer violated the protocol is INFO imo
< wumpus>
it's not dangerous to us
< gmaxwell>
We might want to adapt our language to call the first things "abnormal" (e.g. info level log, and the log text should not use the word error but perhaps use the word abnormal).
< wumpus>
right
< provoostenator>
I'm switching my testnet seed back on tomorrow unless something really surprising happens.
< provoostenator>
Or I can do it in 20 minutes if we want rc1 complaints to stop coming in.
< wumpus>
but anyhow, if you come up with a specific combination of categories that would be useful as single -debug= option, I wouldn't be against it
< wumpus>
provoostenator: we should just tag rc2
< provoostenator>
I'll know in ~10 minutes whether today's fixes made my crash go away (fairly certain they did).
< gmaxwell>
wumpus: okay. Well right now I think all minus leveldb internals is useful, and that seems like what a lot of us are using much of the time.
< wumpus>
gmaxwell: I think it needs a better definition though, something like debug=lowtraffic
< wumpus>
not everythingbutleveldb lol
< wumpus>
it also gives guidance on whether people should add categories to that in the future
< wumpus>
or to remove categories once they become noisy
< gmaxwell>
well net is high traffic but useful, leveldb is high traffic but so far I've not found it useful.
< wumpus>
but what is the rationale of the combination then?
< gmaxwell>
omit useless chatty things.
< wumpus>
let's kill leveldb logging completely if it's so useless
< wumpus>
libevent too, probably
< meshcollider>
it seems my gitian issue occurs when it is trying to download zeromq-4.2.2.tar.gz from bitcoincore.org/depends-sources
< wumpus>
it can be even more hilariously useless
< gmaxwell>
It's at least of conjectural use e.g. if we were chasing some leveldb internal bug.
< wumpus>
oh sure it can be useful, esp when adding custom debugging to leveldb to troubleshoot some issue
< wumpus>
that's why a bridge exists
< cfields>
meshcollider: you probably had all other sources already and your lxc is failing to make any net connections
< gmaxwell>
I mean we could also leave the code for the bridge there but remove the logging level.
< meshcollider>
cfields: that's true, was the version of zeromq bumped for 0.16
< cfields>
believe so
< cfields>
yes, it was
< gmaxwell>
I opened the conversation basically with the idea of having leveldb (and maybe later other things) being log categories you never get unless you explicitly ask for them.
< gmaxwell>
maybe libevent would fall into that too.
< gmaxwell>
I've noticed it being useless too but just never had it be chatty enough to bother me.
< wumpus>
so I still think =lowtraffic makes sense
< gmaxwell>
I suppose we could do lowtraffic and then turn net-messages back on.
< gmaxwell>
(I mean to achieve my normal desired logging config which is basically low traffic things plus net messages)
< wumpus>
we should just include net in that
< gmaxwell>
I'm imagining net divided into per-message logging and the rest (e.g. connections)
< wumpus>
with the future remark that if there is a low-traffic net category, that should be in instead
< gmaxwell>
right
< meshcollider>
cfields: yep you're right, it works fine if I make the depends directory beforehand, the new zeromq downloaded successfully
< cfields>
great
< meshcollider>
cfields: thanks :)
< cfields>
np
< gmaxwell>
Another thing to think about is the performance impacts of logging, some of our logs cause computation that is probably pretty bad for performance.
< wumpus>
I think that's true for libevent and leveldb logging too, they require a special enabling flag, which causes those libraries to send the messages at all
< wumpus>
gmaxwell: that'd only be problematic for high-volume messages, I'm sure e.g. computing the BENCH messages takes some cycles, but they only happen once per block
< wumpus>
and in the total validation time that's probably negligible
< gmaxwell>
The leveldb stuff looks kind of expensive.
< wumpus>
I don't think we have low-traffic messages that take significant computation
< gmaxwell>
And I recall that there were some moderate traffic messages that did stuff like an extra iteration over all inputs in transactions or something.
< wumpus>
for high traffic, yes, I wouldn't be surprised if the net logging slowed some things down
< wumpus>
if you have to write a message for every incoming packet to a file, it becomes disk bound
< wumpus>
oh I didn't know that
< zelest>
sorry for asking in here, but I did some quick googling and it seems like OneFixt is known in here? May I ask who he/she is? :o
< wumpus>
can we have some review on #12329 please, I'd like to tag rc2 before I go to bed
< sipa>
oops, seems i accidentally left this channel and forgot about the meeting as well
< luke-jr>
sipa: there was discussion of how to handle Segwit-back-to-the-start type stuff, and I thought perhaps it would be better as an annex to the Segwit BIP(s) rather than an entirely new BIP (making 2 BIPs for each fork); would you be okay with that, as an author of the BIP in question?
< gmaxwell>
and then you're gonna add (hardfork) to the segwit bips and collect your cheque from Ver? :P
< jnewbery>
re logging - we already have an alias -debug=all which aliases to -debug=1 (~0). Doesn't seem like too much of a stretch to add a new alias debug=gmax* which aliases to all the categories that greg wants to see (*name TBD)
< jnewbery>
I also agree that logging should log to a circular buffer in memory and then have a background thread flushing to disk. I bet there are plenty of places where we're logging to disk while holding cs_main for example
< jnewbery>
(in fact I started implementing that a few months ago, but never finished)
< gmaxwell>
I'd like to get to a state where our logs by default can log virtually nothing, and on fault we can dump a good amount of context.
< gmaxwell>
this also would address a lot of privacy concerns, where we avoid logging detailed data that would make node logs attractive for trying to trace transactions.
< luke-jr>
gmaxwell: as an annex, it's clearer to consider it an implementation detail ;)
< luke-jr>
ie, "you can tighten the rules using method A or method B, and the outcome is the same"