< luke-jr> I wonder if it would be better to prune debug.log with fallocate/FALLOC_FL_PUNCH_HOLE
< bitcoin-git> [bitcoin] tjps closed pull request #10059: [trivial] Removed one Boost usage and headers in util.cpp (master...tjps_boost) https://github.com/bitcoin/bitcoin/pull/10059
< wumpus> achow101: there's been discussion of having a signal to prune the debug log on command, none exists though. SIGHUP does reopen it, so you could rotate it using a stock log rotator
< wumpus> with default settings the debug log grows only very slowly so this is not an issue for most users
< wumpus> and developers that enable extra debug options tend to not want to automatically lose information
< wumpus> e.g. to collect info over a complete sync cycle
< gmaxwell> the way the prune thing works kinda stinks in any case. when I get log data from users it very frequently has omitted the interesting part, because they restart before asking for help.
< gmaxwell> and so I can't see why their node has rejected a valid block.
< jonasschnelli> BlueMatt: I can't follow your comment: https://github.com/bitcoin/bitcoin/pull/9681/files#r108679775 ... can you elaborate?
< bitcoin-git> [bitcoin] jtimon opened pull request #10119: Util: Remove ArgsManager wrappers: (master...0.14-args-wrappers) https://github.com/bitcoin/bitcoin/pull/10119
< bitcoin-git> [bitcoin] MarcoFalke pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/f34cdcbd806d...8ac804128671
< bitcoin-git> bitcoin/master 159fe88 John Newbery: Remove SingleNodeConnCB...
< bitcoin-git> bitcoin/master 8ac8041 MarcoFalke: Merge #10109: Remove SingleNodeConnCB...
< bitcoin-git> [bitcoin] MarcoFalke closed pull request #10109: Remove SingleNodeConnCB (master...remove_single_node_conn_cb) https://github.com/bitcoin/bitcoin/pull/10109
< wumpus> FALLOC_FL_PUNCH_HOLE wouldn't improve that; though maybe the aggressive name gives some outlet when there's yet another issue with useless debug logs
< wumpus> fanquake: it's eating blocks instead of syncing them!
< wumpus> (yes yes '100% progress' is a moving target, so staying in the same place moves you backwards, relatively, you have to run to stay in the same place)
< wumpus> I guess we should add a max(0, progress)
< fanquake> wumpus any schedule for 0.14.1 ? I'm going to remove 9464, not important enough to be fixed.
< wumpus> fanquake: no precise shedule, though agreement seems to be that there should be a 0.14.1 'soon'
< fanquake> Only two issues tagged now, are there any more sitting around that should be making it in?
< wumpus> it's still blocked on the out of memory problems, we're going to PR some workarounds for 0.14
< wumpus> yep I'm working on one and sipa should have one
< fanquake> ok
< wumpus> agree about #9464, as a display problem it's fairly low priority and should at least not block 0.14.1
< gribble> https://github.com/bitcoin/bitcoin/issues/9464 | [GUI] No GUI updates when running with --reindex or --reindex-chainstate · Issue #9464 · bitcoin/bitcoin · GitHub
< wumpus> but if someone fixes it it and the solution is not too invasive should be backported
< bitcoin-git> [bitcoin] fanquake closed pull request #10089: Fix shadowing of 'what' as described in #10080. (master...fix-what-shadowing) https://github.com/bitcoin/bitcoin/pull/10089
< wumpus> hrm some RPCs such as getmemoryinfo could be available from the point the RPC server is launched, not just after iniitalization finishes
< wumpus> also 'stop'
< bitcoin-git> [bitcoin] laanwj opened pull request #10120: util: Work around (virtual) memory exhaustion on 32-bit w/ glibc (master...2017_03_address_space_exhaustion_workaround) https://github.com/bitcoin/bitcoin/pull/10120
< luke-jr> wumpus: maybe debug.log should only be pruned when the node considers itself synced + some time has passed?
< luke-jr> (FALLOC_FL_PUNCH_HOLE possible improvement is that it can do it without rewriting the kept data)
< jonasschnelli> fanquake: does #9464 fixes the negative progress (https://imgur.com/a/Q5Dsq)?
< gribble> https://github.com/bitcoin/bitcoin/issues/9464 | [GUI] No GUI updates when running with --reindex or --reindex-chainstate · Issue #9464 · bitcoin/bitcoin · GitHub
< fanquake> jonasschnelli no, 9464 is just an issue I created that cover a few different GUI issues/quirks, I should add the negative progress output there as well.
< wumpus> btw travis is being flaky on master
< wumpus> not sure if this is caused by #9294
< gribble> https://github.com/bitcoin/bitcoin/issues/9294 | Use internal HD chain for change outputs (hd split) by jonasschnelli · Pull Request #9294 · bitcoin/bitcoin · GitHub
< jonasschnelli> oh... i'll check
< jonasschnelli> assumevalid.py failed...
< jonasschnelli> maybe a rebase/merge issue.
< wumpus> travis is failing on #10120 as well, but on wallet-hd.py
< gribble> https://github.com/bitcoin/bitcoin/issues/10120 | util: Work around (virtual) memory exhaustion on 32-bit w/ glibc by laanwj · Pull Request #10120 · bitcoin/bitcoin · GitHub
< wumpus> could also be a test framework related issue
< fanquake> jonasschnelli what are your thoughts on the OSX Growl notification code? I'm not sure that it's even being developed/supported anymore. They moved to selling it through the mac app store, and last update was in October 2013.
< wumpus> test_framework.authproxy.JSONRPCException: non-JSON HTTP response with '503 Service Unavailable' from server (-342)
< wumpus> while waiting for the node to come up... I think you'll get that if no handler is installed
< wumpus> there is a small window before handlers are registered
< wumpus> so not completely unexpected
< fanquake> wumpus did you just restart the build? I was in the middle on reading the log heh
< wumpus> yes I restarted the build, sorry :)
< wumpus> may well be that travis is just being slow and brings this issue to the surface
< jonasschnelli> wumpus: hmmm.. assumevalid.py failed/failes on master (CRON travis): https://travis-ci.org/bitcoin/bitcoin/jobs/216360061#L5396
< jonasschnelli> Can't reproduce locally
< jonasschnelli> (I try now on Ubuntu 14.04
< wumpus> I couldn't reproduce it locally either
< jnewbery> #10114 should fix assumevalid.py (and make assumptions in test cases more robust in general)
< gribble> https://github.com/bitcoin/bitcoin/issues/10114 | An error has occurred and has been logged. Please contact this bot's administrator for more information.
< wumpus> jnewbery: great!
< jnewbery> Travis is good (too good!) at uncovering sync and timing issues
< wumpus> yes
< BlueMatt> jonasschnelli: ie the user might call bumpfee, get an error saying they need to pay more, then call bumpfee again with a higher number, and get another error saying they need to pay /even/ more
< BlueMatt> which is ok, but kinda annoying
< BlueMatt> if they're already checked back-to-back, might as well merge them into one error
< BlueMatt> so that that cant happen
< bitcoin-git> [bitcoin] shitakee opened pull request #10121: Add missing header file (master...patch-1) https://github.com/bitcoin/bitcoin/pull/10121
< compumatrix> Is it really this silent in here?
< compumatrix> hi magicwund...
< magicwund> hey
< bitcoin-git> [bitcoin] jnewbery opened pull request #10123: Allow debug logs to be excluded from specified component (master...debugexclude) https://github.com/bitcoin/bitcoin/pull/10123
< bitcoin-git> [bitcoin] jnewbery opened pull request #10124: [test] Suppress test logging spam (master...suppress_test_logging_spam) https://github.com/bitcoin/bitcoin/pull/10124
< jonasschnelli> Reminder for the european CEST people. Meeting will be in 2h 10' (summer time now!).
< kanzure> i thought we had consensus to eliminate time zones?
< gmaxwell> kanzure: we did, but the rest of the world hasn't yet.
< bitcoin-git> [bitcoin] TheBlueMatt opened pull request #10125: Exit bitcoind immediately on shutdown instead of 200ms later (master...2017-03-faster-shutdown) https://github.com/bitcoin/bitcoin/pull/10125
< bitcoin-git> [bitcoin] jnewbery closed pull request #9630: Add logging to GetAncestor if pindex->pprev == NULL (master...crashlogging) https://github.com/bitcoin/bitcoin/pull/9630
< sdaftuar> wumpus: i think #9959 is ready for merge
< gribble> https://github.com/bitcoin/bitcoin/issues/9959 | Mining: Prevent slowdown in CreateNewBlock on large mempools by sdaftuar · Pull Request #9959 · bitcoin/bitcoin · GitHub
< wumpus> sdaftuar: ok!
< bitcoin-git> [bitcoin] laanwj pushed 4 new commits to master: https://github.com/bitcoin/bitcoin/compare/8ac804128671...cde9b1a864c1
< bitcoin-git> bitcoin/master eed816a Suhas Daftuar: Mining: return early when block is almost full
< bitcoin-git> bitcoin/master 42cd8c8 Suhas Daftuar: Add benchmarking for CreateNewBlock
< bitcoin-git> bitcoin/master 011124a Suhas Daftuar: Update benchmarking with package statistics
< bitcoin-git> [bitcoin] laanwj closed pull request #9959: Mining: Prevent slowdown in CreateNewBlock on large mempools (master...2017-03-cnb-optimizations) https://github.com/bitcoin/bitcoin/pull/9959
< sdaftuar> thanks!
< BlueMatt> wumpus: what restictions are there on what code you can call from signal handlers?
< sipa> BlueMatt: as far as I know, only modify simple variables
< wumpus> BlueMatt: almost nothing can be invoked from signal handlers because signals arrive synchronously, so can be called in any context from any thread
< BlueMatt> mmm
< wumpus> everything that has the slightest risk of crashing when called reentrantly can't be used
< wumpus> so yes, in practice most programs restrict to setting variables
< BlueMatt> hmm, alright
< wumpus> another used technique is a pipe
< jonasschnelli> *dong*
< gmaxwell> sdaftuar: plz backport #9959
< gribble> https://github.com/bitcoin/bitcoin/issues/9959 | Mining: Prevent slowdown in CreateNewBlock on large mempools by sdaftuar · Pull Request #9959 · bitcoin/bitcoin · GitHub
< gmaxwell> #bitcoin-core-dev Meeting: wumpus sipa gmaxwell jonasschnelli morcos luke-jr btcdrak sdaftuar jtimon cfields petertodd kanzure bluematt instagibbs phantomcircuit codeshark michagogo marcofalke paveljanik NicolasDorier
< kanzure> hi.
< sdaftuar> gmaxwell: will do (and here!)
< paveljanik> Hi!
< wumpus> *reading* from a pipe that has one byte in it, or something like that
< michagogo> Oh, right, these are back to 10 PM now that DST is back
< jtimon> hi
< BlueMatt> wumpus: yea, I get it, but man that sucks
< cfields> hi
< wumpus> yes, UNIX signals suck
< wumpus> #meetingstart
< wumpus> #startmeeting
< lightningbot> Meeting started Thu Mar 30 19:01:18 2017 UTC. The chair is wumpus. Information about MeetBot at http://wiki.debian.org/MeetBot.
< lightningbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
< instagibbs> hi
< jeremyrubin> here
< wumpus> BlueMatt: it shouldn't matter for the AbortNode() case though. Just make the conditional wait 200 milliseconds or so
< jtimon> topics?
< wumpus> doesn't matter if signals are slightly delayed
< BlueMatt> wumpus: the wait is already 200ms
< wumpus> yes, topics?
< BlueMatt> talk about abortnode
< BlueMatt> thats a reasonable topic
< wumpus> BlueMatt: yes, but make it a wait on a conditional
< BlueMatt> mmm, fair
< BlueMatt> yes, for sigterm its reasonable to wait a few 100 ms
< BlueMatt> for AbortNode it'll wake it immediately
< wumpus> right.
< BlueMatt> in other news, so we got 1/6 "Review Priority" PRs merged this week, that's ok, but we can do better :)
< BlueMatt> oh, nvm, got 2/6
< gmaxwell> BlueMatt: 2/6.
< wumpus> removed the two that have been merged
< jtimon> I don't think anyone commented in mine, at least I got review in others
< gmaxwell> I forgot about the project tracker thing, but reviewed some of them anyways.
< wumpus> doesn't matter, we'll only add pulls that were mentioned to have priority in the meeting to that project tracker, so if you pay attention at the meeting you don't need it :)
< wumpus> anyhow, everyone go review them
< wumpus> topic: 0.14.1?
< bitcoin-git> [bitcoin] sipa opened pull request #10126: Compensate for memory peak at flush time (master...peak_compensation) https://github.com/bitcoin/bitcoin/pull/10126
< gmaxwell> Yes, please.
< jeremyrubin> topic: slow tests?
< wumpus> #topic 0.14.1
< achow101> already?
< gmaxwell> yes. lots of nice fixes to ship. Minor releases are minor.
< sipa> we have 11 merged PRs marked 0.14.1
< wumpus> sipa: only one still has the "needs backport" tag, I think MarcoFalke ported the rest
< gmaxwell> I think when we get those three in and those and the recent ones backported, we should be goof for a 0.14.1.
< gmaxwell> good too.
< wumpus> agreed
< wumpus> ok, next topic I guess
< wumpus> #topic slow tests
< wumpus> I made an overview here: https://github.com/bitcoin/bitcoin/issues/10026 of the slowest unit tests
< jeremyrubin> A lot of it is my fault it seems. https://github.com/bitcoin/bitcoin/pull/10099 is ready to go
< wumpus> some of those have already been worked on or even PRs open to make them faster
< gmaxwell> jnewbery: if you're around, if 10106 doesn't move forward in a few hours, I recommend you create a new pr, cherry picking that one fix, with your new tests and the fix for the other function. (just because the PR wasn't opened by a regular so they may not be responsive or may not know how to use github well enough to pull your changes.)
< jnewbery> gmaxwell: I'll do that this afternoon
< wumpus> yes, tend to agree, they may not know how to cherry-pick them. I could cherry-pick them into his branch but meh
< MarcoFalke_lab> wumpus: Yes, dropping GetRand() gave a 4* speedup.
< gmaxwell> jnewbery: and thanks for the awesome response.
< jnewbery> np
< wumpus> MarcoFalke_lab: that'll at least move it down a bit in the top 20
< jeremyrubin> The cuckoocache tests require a bit more discussion to decrease the parameters; I can pick something arbitrary
< jeremyrubin> The checkqueue tests should also speed up a lot with #9938, but I'm preparing some tweaks to those
< gribble> https://github.com/bitcoin/bitcoin/issues/9938 | Lock-Free CheckQueue by JeremyRubin · Pull Request #9938 · bitcoin/bitcoin · GitHub
< Chris_Stewart_5> re tests: Has anyone given any thought to #8469 , obviously this pull request sacrafices speed for exhaustiveness
< gribble> https://github.com/bitcoin/bitcoin/issues/8469 | [POC] Introducing property based testing to Core by Christewart · Pull Request #8469 · bitcoin/bitcoin · GitHub
< jeremyrubin> If anyone has high-core machines, they should try benching how that works
< wumpus> jeremyrubin: alternatively we could have an -extended mode for the unit tests, like we have for the functional tests, to do extra-thorough tests that shouldn't run every time
< MarcoFalke_lab> jeremyrubin: If the parameters matter, you could provide a switch to test against quick_run and an expensive one
< wumpus> yes ^^
< gmaxwell> I think we should have an extended test, and make running it part of the release process.
< jeremyrubin> Agree
< wumpus> but for the standard 'make check' there should be a guideline of max ~1 second per test case
< MarcoFalke_lab> Everyone agrees! :)
< gmaxwell> I don't care if the tests take an hour to run if it's only a mandatory once in release thing.
< cfields> sgtm
< jeremyrubin> no, 2 seconds!
< wumpus> gmaxwell: yes or once per day or so on master
< jonasschnelli> (which is failing currently)
< gmaxwell> jonasschnelli: what is failing?
< wumpus> yes, travis is broken again, but there's a pull to fix that
< cfields> note to self: gitian should run whatever extended tests it can
< gmaxwell> oh travis
< MarcoFalke_lab> Jup, will merge the travis fix tonight when I have access to my keys. (If no one beats me to it)
< wumpus> does anyone have the # perhaps?
< sdaftuar> i think #10114?
< MarcoFalke_lab> Though, we should be cautious with putting too much into the cron jobs. The extended test rely on the cache being up to date
< gribble> https://github.com/bitcoin/bitcoin/issues/10114 | An error has occurred and has been logged. Please contact this bot's administrator for more information.
< MarcoFalke_lab> Otherwise they would break the 49 minute limit on traivs
< wumpus> there's also #10072
< gribble> https://github.com/bitcoin/bitcoin/issues/10072 | Remove sources of unreliablility in extended functional tests by jnewbery · Pull Request #10072 · bitcoin/bitcoin · GitHub
< wumpus> but yes 114 was the one I meant
< gmaxwell> We should probably contemplate having some seperate process against master that does continual gitian builds or something.
< MarcoFalke_lab> jonasschnelli: Is doing that, gmaxwell
< gmaxwell> oh good.
< gmaxwell> ignore me.
< jonasschnelli> gmaxwell: https://bitcoin.jonasschnelli.ch
< wumpus> yes, jonasschnelli has a build server with a very nice web UI
< MarcoFalke_lab> jonasschnelli: Are you running the tests?
< jonasschnelli> (it's currently by manual trigger)
< jonasschnelli> no.. just gitian
< gmaxwell> somehow I missed this. cool.
< wumpus> travis already runs the tests, this is to get executables for testing
< jnewbery> travis is only broken now because we've set it to run the extended tests once per day, so we're currently flushing out all the extended tests that have always failed on travis. I think once #10114 and #10072 are merged the daily runs should succeed reliably
< gribble> https://github.com/bitcoin/bitcoin/issues/10114 | An error has occurred and has been logged. Please contact this bot's administrator for more information.
< gribble> https://github.com/bitcoin/bitcoin/issues/10072 | Remove sources of unreliablility in extended functional tests by jnewbery · Pull Request #10072 · bitcoin/bitcoin · GitHub
< jonasschnelli> thanks jnewbery for the info (and the fixes)
< bitcoin-git> [bitcoin] sdaftuar opened pull request #10127: [0.14 backport] Mining: Prevent slowdown in CreateNewBlock on large mempools (0.14...2017-03-backport-cnb-optimizations) https://github.com/bitcoin/bitcoin/pull/10127
< wumpus> ok, any other topics?
< bitcoin-git> [bitcoin] laanwj pushed 3 new commits to master: https://github.com/bitcoin/bitcoin/compare/cde9b1a864c1...edc62c959ab3
< bitcoin-git> bitcoin/master 6426716 John Newbery: Add send_await_disconnect() method to p2p-compactblocks.py...
< bitcoin-git> bitcoin/master 6a18bb9 John Newbery: [tests] sync_with_ping should assert that ping hasn't timed out...
< bitcoin-git> bitcoin/master edc62c9 Wladimir J. van der Laan: Merge #10114: [tests] sync_with_ping should assert that ping hasn't timed out...
< gmaxwell> I should make notes for topics...
< BlueMatt> oh, new prs
< BlueMatt> for review priority
< BlueMatt> looks like jonasschnelli already added his
< bitcoin-git> [bitcoin] laanwj closed pull request #10114: [tests] sync_with_ping should assert that ping hasn't timed out (master...improve_sync_with_ping) https://github.com/bitcoin/bitcoin/pull/10114
< BlueMatt> anyone else have something to add (aside from 0.14.1-tagged things)
< wumpus> fine with me, but don't forget the current bunch :p
< wumpus> yes please review 0.14.1 tagged things
< wumpus> although those should be easy
< jonasschnelli> BlueMatt: Yes. The bumpfee refactor (to get a QT fee bump function)
< sipa> 0.14.1 has priority, but i'd like to see #9792 in at some point to further the get-rid-of-openssl thing :)
< BlueMatt> wumpus: there is a "For backport" column there...
< gribble> https://github.com/bitcoin/bitcoin/issues/9792 | FastRandomContext improvements and switch to ChaCha20 by sipa · Pull Request #9792 · bitcoin/bitcoin · GitHub
< wumpus> but I think we should be able to do 0.14.1 tomorrow so to have something to review for the rest of the week :D
< wumpus> BlueMatt: yes good point
< BlueMatt> ok, so morcos and sdaftuar get to propose new ones since they dont have ones up, anyone else want to propose ones?
< gmaxwell> oh someone opened a PR to do something with debug log excludes, and it adds more insane string processing in the debugging critical path. Would anyone mind if I brought back the PR I shelved to make debug categories use an enum?
< BlueMatt> gmaxwell: wfm, the pr was jnewbery's
< sipa> gmaxwell: ack enum debug categories
< wumpus> gmaxwell: not sure
< jnewbery> #10123
< gribble> https://github.com/bitcoin/bitcoin/issues/10123 | Allow debug logs to be excluded from specified component by jnewbery · Pull Request #10123 · bitcoin/bitcoin · GitHub
< gmaxwell> (the PR was just shelved because I opened it right before a release, and it is a conflict-the-universe PR)
< wumpus> gmaxwell: well no I guess it makes sense
< BlueMatt> gmaxwell: go for it now, I'd say
< wumpus> yes, do it
< sdaftuar> i'd like to get this DisconnectTip improvement wrapped up (#9208) but i need some help on resolving this annoying issue BlueMatt raised
< gribble> https://github.com/bitcoin/bitcoin/issues/9208 | Improve DisconnectTip performance by sdaftuar · Pull Request #9208 · bitcoin/bitcoin · GitHub
< cfields> gmaxwell: +1.
< jonasschnelli> Yes. ACK
< cfields> along the same lines, I'd like to do something similar for net messages
< wumpus> then you can also map the enum values to strings and automate the help message generation
< gmaxwell> #9424
< gribble> https://github.com/bitcoin/bitcoin/issues/9424 | Change LogAcceptCategory to use uint32_t rather than sets of strings. by gmaxwell · Pull Request #9424 · bitcoin/bitcoin · GitHub
< BlueMatt> sdaftuar: want to bring it up now/
< jnewbery> gmaxwell: +1. I'll happily review that one
< BlueMatt> ?
< wumpus> cfields: yes, should be fairly easy to do, I already changed the syntax for net messages to be enum-like at some point
< jtimon> since #8855 didn't got new review nor re-review I think just leave that (just remind to potential reviewers that #8994 continues it, in case you "don't see the point")
< gribble> https://github.com/bitcoin/bitcoin/issues/8855 | Use a proper factory for creating chainparams by jtimon · Pull Request #8855 · bitcoin/bitcoin · GitHub
< gribble> https://github.com/bitcoin/bitcoin/issues/8994 | Testchains: Introduce custom chain whose constructor... by jtimon · Pull Request #8994 · bitcoin/bitcoin · GitHub
< wumpus> cfields: went as far as I could without making it an actual enum :)
< gmaxwell> cfields: I looked at adding a perfect hash for net messages... but didn't know if that would be a way you'd want to go. :)
< BlueMatt> new for review this week is 9792 and 9208
< cfields> wumpus: yes, it sure looks like an enum :)
< sipa> gmaxwell: just make sure all debug category names have different length
< gmaxwell> sipa: well we don't need to do any runtime lookup of category names at all.. so no need to do anything performant there.
< wumpus> at least give them consecutive values so a bit field can be used to represent the set
< wumpus> intead of a per-thread map :-(
< cfields> gmaxwell: i had considered making it an array<char, 12>. But I really don't care, as long as it's switchable and a compile-time constant
< gmaxwell> wumpus: that is what PR 9424 does.
< wumpus> gmaxwell: ok, great
< gmaxwell> wumpus: it assigns them each to a bit.
< wumpus> the current code that assigns it a thread-local variable is really strange
< bitcoin-git> [bitcoin] JeremyRubin opened pull request #10128: Speed Up CuckooCache tests (master...cuckoo-tests-faster) https://github.com/bitcoin/bitcoin/pull/10128
< cfields> (and now that I think about it, std::array probably isn't switchable anyway)
< gmaxwell> yes. horrors from beyond time.
< sdaftuar> topic suggestion: dealing with abortnode / ConnectTip / DisconnectTip failures
< gmaxwell> in any case, if people support-- 9424 is hard to rebase because it touches the whole codebase.
< wumpus> cfields: in some modern languages it's possible to switch on compound types and arrays and strings, but no, not c++XX before XX=50 or so :)
< gmaxwell> but I think it also makes jnewbery's exclude trivial. (in fact: no runtime impact...)
< wumpus> gmaxwell: yes, it's how it should be done
< sipa> in haskell it's pretty much the only way you can inspect a compound type at all :p
< jnewbery> gmaxwell: agree
< wumpus> sipa: indeed
< sipa> ok, sdaftuar's topic?
< gmaxwell> cfields: my intution for that would be using gperf to map the messages to integers. then switching on them... but perhaps overkill.
< wumpus> #topic dealing with abortnode / ConnectTip / DisconnectTip failures
< BlueMatt> context: #9208
< gribble> https://github.com/bitcoin/bitcoin/issues/9208 | Improve DisconnectTip performance by sdaftuar · Pull Request #9208 · bitcoin/bitcoin · GitHub
< sipa> sdaftuar: i saw i was pinged in the PR, but haven't read it
< sdaftuar> so this issue might be hard to grasp without reviewing #9208
< gribble> https://github.com/bitcoin/bitcoin/issues/9208 | Improve DisconnectTip performance by sdaftuar · Pull Request #9208 · bitcoin/bitcoin · GitHub
< wumpus> gmaxwell: will that work for unknown messages too?
< sdaftuar> but basically matt brought up that we have some edge cases in our code if ConnectTip or DisconnectTip return false
< wumpus> gmaxwell: does a perfect hash handle collisions? I don't remember
< cfields> wumpus: perfect hash means no collisions :)
< wumpus> cfields: for known values
< jeremyrubin> wouldn't that not be backwards compatible?
< gmaxwell> wumpus: gperf can check, it's an option.
< wumpus> cfields: but what if 'moo' maps to the same 32-bit value as 'block'
< jeremyrubin> someone sends you a new type of message that hashes to something else
< wumpus> scary
< sipa> sdaftuar: hmm, i'll review the PR
< sdaftuar> the last issue i'm trying to resolve is if ConnectTip fails (but the block is not invalid), then we should be in an AbortNode() state. what consistency are we aiming for in this situation?
< cfields> wumpus: ah, true
< gmaxwell> wumpus: making it never have collisions for unknows is just an extra comparison with the correct value.
< gmaxwell> wumpus: which IIRC the gperf emitted code will do, if enabled.
< gmaxwell> (actually, looks like it's the default)
< BlueMatt> lol, ok, so lets pick a topic instead of two
< gmaxwell> sdaftuar: we should be able to shut down and come back up with the underlying issue resolved and continue.
< wumpus> yes, sorry, I was just curous about gperf
< gmaxwell> sdaftuar: I don't care if we lose a bunch of recent blocks.
< sipa> sdaftuar: it's fine if we lose progress in that case, but if at all avoidable, the on disk state should not be corrupted
< BlueMatt> anyway, so I further observed that shutdown upon AbortNode can take up to 200ms (see recent PR), which is somewhat frightening, given that mempool and chainstate may not be consistent which we assume in many places :/
< sdaftuar> gmaxwell: sipa: what should we do with the mempool?
< sipa> sdaftuar: who cares?
< BlueMatt> so we keep running for a while normally and possibly have incorrect assumptions
< gmaxwell> I don't care.
< wumpus> just drop it
< sipa> sdaftuar: at restart we'll try to reimport what is on disk
< sipa> and re-run all necessary consistency checks
< sipa> so mempool.dat doesn't need to be consistent with the chainstate
< BlueMatt> and wallet?
< sdaftuar> i think the concern matt was raising was if before shutdown, it's possible for eg an RPC call to get incorrect results
< sdaftuar> or the wallet
< wumpus> yes, no tight coupling, good design
< BlueMatt> wumpus: point is we *have* tight coupling
< wumpus> wallet doesn't need to be in sync either, though it should not be ahead
< sipa> sdaftuar: ugh
< BlueMatt> and continue running after realizing something is corrupted
< gmaxwell> well abortnode should be instant-ish.
< gmaxwell> BlueMatt: so you're fixing that, what is there to discuss?
< wumpus> a bit behind (as long as it's not further than the prune distance) is ok, it will catch up in next start
< BlueMatt> gmaxwell: i mean the new pr is an improvement, but its not a fix
< BlueMatt> its not like you cant have something pending cs_main when coming out of Disconnect/ConnectTip
< wumpus> but if the wallet is ahead of the chain it will trigger a rescan IIRC
< gmaxwell> so perhaps abortnode needs to Abort()?
< BlueMatt> gmaxwell: thats pretty much the question
< BlueMatt> thats super shit, though, of course
< gmaxwell> and not faffabout witht politely shutting off threads.
< wumpus> I don't think so
< sdaftuar> i inadvertently introduced an assert failure in one of these situations. maybe that's a feature not a flaw! :)
< gmaxwell> Why is it shit? we should be durable across a power off, which is worse than aborting.
< wumpus> the point of abortnode is to be able to show a GUI dialog to the user
< gmaxwell> ah
< wumpus> if you abort() the user will never know to check the debug.log
< BlueMatt> wumpus: well we can show a gui dialog and abort() prior to unlocking cs_main
< BlueMatt> :P
< jeremyrubin> have a graceful shutdown falg
< jeremyrubin> *flag
< gmaxwell> wumpus: but it's not unlikely that we can't show a gui... if we can't allocate (likely cause).
< BlueMatt> which would effectively be sufficient
< wumpus> I always thought that AbortNode was for errors that could tolerate a graceful shutdown
< wumpus> the really terrible errors we already assert() or abort() on
< jeremyrubin> and if we're shutting down gracefully, write to a file "was graceful"
< BlueMatt> gmaxwell: AbortNode() is usually disk corruption
< jeremyrubin> On next start, show dialogue first thing?
< wumpus> please don't make it routine to abort() for everything
< wumpus> only for things that really warrant it
< wumpus> not crash on every single error
< BlueMatt> or too little disk space
< sdaftuar> or too little disk space, not quite teh same as corruption
< wumpus> that's just a mess
< gmaxwell> okay, sure.
< wumpus> for some problems it's fine to try to clean up normally
< wumpus> but we still need to exit with a message
< wumpus> that's what abortnode is for
< wumpus> if BlueMatt can make it work faster that's great, but don't silently kill the program on every error
< gmaxwell> wumpus: how about every other error?
< gmaxwell> :P
< wumpus> gmaxwell: hehe
< sipa> do #9208 and #9725 interact?
< gribble> https://github.com/bitcoin/bitcoin/issues/9208 | Improve DisconnectTip performance by sdaftuar · Pull Request #9208 · bitcoin/bitcoin · GitHub
< gribble> https://github.com/bitcoin/bitcoin/issues/9725 | CValidationInterface Cleanups by TheBlueMatt · Pull Request #9725 · bitcoin/bitcoin · GitHub
< gmaxwell> Y'all so worried about the dressings on the coffin. "It's already dead!" But sure. Sorry I was only thinking of OOM, it's just a recent subject.
< BlueMatt> sipa: they dont, afaik, aside from there now being two structs that can be merged
< jeremyrubin> I think deleting a file on graceful shutdown should work
< jeremyrubin> And then starting when that file is present shows the dialouge rather than starting
< BlueMatt> gmaxwell: yea, AbortNode isnt used for thigns like OOM and total death
< BlueMatt> disk space also does it
< BlueMatt> but it can also be db corruption
< jeremyrubin> This means you don't do anything at all on the Abort, but requires the user to restart their node to see the message
< BlueMatt> so maybe the solution is AbortNode gets renamed to ShutdownSoon() and use make sure disk corruption is something different?
< wumpus> BlueMatt: yes, makes sense to me
< wumpus> BlueMatt: or add a flag
< morcos> I haven't really thought this all the way through, but doesn't it seem that as we move more and more towards things happening in other threads, that you'd rather let those threads finish what they are doing
< gmaxwell> BlueMatt: you could also harden things up against your current concer by having the rpc stuff always check the shutdown flag right after taking cs_main (whenever it does)?
< morcos> If you're committing wallet changes for instance
< gmaxwell> BlueMatt: and if it finds out that its on its way down, just returning at that point.
< wumpus> in the long run morcos we should move toward looser coupling of things
< wumpus> I agree
< BlueMatt> gmaxwell: yea, I'm worried that if we start down that path we have to check it everywhere
< BlueMatt> eg wallet may make decisions based on some corrupted mempool state
< morcos> Let's say you just got some new address from the wallet, and the wallet hasn't committed that state yet and then you Abort()
< BlueMatt> (I havent thought all the way through all the potential issues here, just a potential concern)
< wumpus> morcos: luckily we have the keypool to handle that specific contingency
< jeremyrubin> Every thread could have their own unique ungraceful-close file that it should delete (via RAII) on clean exit. Starting with any present would show error. Uncoupled!
< gmaxwell> morcos: well at least the worst you get there is replay (due to keypool/hdwallets).. though replay can be pretty bad.
< wumpus> pretty bad but well at least they will never lose live keys
< BlueMatt> jeremyrubin: I have a feeling we can be more agressive on the super-strange issues, afaiu this stuff is pretty much hit just with out-of-disk everything else is rare enough.....
< sipa> maybe we should just have some std::atomic<bool> shits_on_fire_yo; which when set prevents RPCs etc
< wumpus> jeremyrubin: that's not what I mean with uncoupling, what I mean is that if one thread messes up it doesn't need to take the others with it because it can operate more-or-less independently
< gmaxwell> sipa: well shutdown can be more or less that.
< sipa> that can even be set from a signal handler
< BlueMatt> <BlueMatt> so maybe the solution is AbortNode gets renamed to ShutdownSoon() and use make sure disk corruption is something different?
< jeremyrubin> sipa: what do you do with an in progress RPC?
< BlueMatt> in ShutdownSoon cases (eg out of disk) we're ok to continue and just shut down, i think
< sipa> jeremyrubin: we can only do so much
< wumpus> you can't generally break off an in-progress RPC
< BlueMatt> in other cases, we can just assert(false)
< wumpus> although some will pay attention to fShutdown
< BlueMatt> because they're super rare anyway
< jeremyrubin> wumpus: ^ yes, O
< gmaxwell> but really, what probably should be done is having the rpcs being able to handle being told "No, you can't have access to [whatever chain state you wanted] right now." then these error conditions that could leave them unstable... could set them.
< BlueMatt> and if your disk is corrupted we probably shouldnt be flushing wallet and chainstate as a part of shutdown anyway
< BlueMatt> gmaxwell: yea, but thats a huge rewrite for cases that should be super rare
< gmaxwell> Or we take a whole different approach to failure messages, .e.g. wrap bitcoin-qt in another program that when qt exists it can go through the logs and give you a useful error. (though this doesn't answer morcos' wallet flush, but really that should be in another process.)
< wumpus> BlueMatt: I think there should be a clear separation between (A) I/O issues such as out of disk spae, which just happen regularly (B) rare implementation issues such as internal consistency errors
< cfields> iirc rpc has a IsRPCShuttingDown() or so, but only a few things (gbt only, maybe?) checks it
< jeremyrubin> wumpus: * yes, I'm aware that isn't quite what you meant, just making noise about having a graceful shutdown file because I think it's a reasonable way to mark if a node shut down dirty
< wumpus> (A) normal errors should just give the user an error as AbortNode does now and shut down properly
< wumpus> (B) internal state corruption errors could break off the process immediately
< gmaxwell> It's obnoxious that we can't preallocate resources such that we can guarentee that we never take a failure at an inoppturtune time.
< gmaxwell> having to slather the code in error handling to deal with that is not ideal.
< jeremyrubin> gmaxwell: you can preallocate lots of things?
< gmaxwell> wumpus: "I had to abort processing a block" is a weak kind of internal state corruption. It leaves assumed invariants violated.
< gmaxwell> jeremyrubin: in particular we can't make leveldb's disk usage constant.
< BlueMatt> wumpus: yes, thats my proposal
< BlueMatt> wumpus: but, specifically, B would include things like chainstate-on-disk-corruption
< BlueMatt> which it already partially does, just not completely
< wumpus> having code to handle normal errors is perfectly normal and all code has that, I agree that paranoid memory/disk corruption errors tend not to be possible to handle in a sane way
< wumpus> BlueMatt: yes
< BlueMatt> ok, soooo, acks on:<BlueMatt> <BlueMatt> so maybe the solution is AbortNode gets renamed to ShutdownSoon() and use make sure disk corruption is something different?
< wumpus> BlueMatt: as I said beforer, yes, that or add a flag/criticality level
< BlueMatt> s/use make sure/make sure/
< jeremyrubin> BlueMatt: maybe if you paste it again
< BlueMatt> sure, that too
< BlueMatt> jeremyrubin: ok, <BlueMatt> ok, soooo, acks on:<BlueMatt> <BlueMatt> so maybe the solution is AbortNode gets renamed to ShutdownSoon() and use make sure disk corruption is something different?
< gmaxwell> wumpus: for example, if we run out of space while flushing our view of the blockchain will be corrupt, and information about the blockchain from rpcs will be wrong. In a world where error handling is free, every code path that gets access to the blockchain data would be able to gracefully handle being told it's just not available. But I think that would be an unreasonable and dangerous amount
< gmaxwell> of complexity. :(
< BlueMatt> wumpus: I missed that statement, but, yes
< gmaxwell> BlueMatt: that sounds okay to me. but I don't know that diskfull is really a shutdown soon though we really do need to give it a nice message. :(
< wumpus> gmaxwell: note that the out-of-disk error happens *before* the disk is entirely full
< jeremyrubin> BlueMatt: sgtm
< wumpus> gmaxwell: there's a threshold
< wumpus> gmaxwell: so that should already be mitigated
< gmaxwell> wumpus: yes, but it's not atomic. you can't check and then successfully allocate space.
< wumpus> no, it's not perfect, but it works pretty well
< wumpus> I've never had corruption on full disk
< wumpus> also leveldb write failing shouldn't generally be fatal
< gmaxwell> on my desktop, which runs with debug=1 it almost always gets checked at the start of the flush. It doesn't corrupt things on disk, but as matt points out the rpc would return somewhat incorrect results during that time.
< wumpus> it just means the last transaction is not committed
< jtimon> it seems it's time to abort the meeting
< wumpus> #endmeeting
< lightningbot> Meeting ended Thu Mar 30 20:01:06 2017 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
< BlueMatt> wumpus: we need to change that to #abort()
< gmaxwell> But I wanted to cleanly flush!
< bitcoin-git> [bitcoin] theuni opened pull request #10129: scheduler: fix sub-second precision with boost < 1.50 (master...fix-scheduler-millisecs) https://github.com/bitcoin/bitcoin/pull/10129
< cfields> BlueMatt: ^^ fixes the ramped-up cpu issue
< wumpus> cfields: 10129 needs backport I suppose?
< BlueMatt> cfields: ahh, lol, I dont have old boost
< BlueMatt> wumpus: only if we also backport the wallet-flush-in-cscheduler, probably?
< BlueMatt> (which i think we should, but....)
< BlueMatt> questionable given boost issues
< wumpus> eh no, let's not do that, sorry
< cfields> wumpus: won't hurt, but doesn't currently matter
< BlueMatt> wumpus: fair, yea
< cfields> wumpus: probably best to go ahead and backport, just in case any other stuff is backported in the future that relies on the correct behavior
< BlueMatt> yea, i mean if it turns out there are complications and its not dead simple, not backporting seems like the best idea
< gmaxwell> wallet-flush-in-cscheduler removes a thread?
< wumpus> cfields: I assued that was the point because one of your other pulls changes the times to std::chrono instead
< wumpus> this one is backportable at least
< cfields> wumpus: yes, and BlueMatt was discussing possibly backporting other scheduler changes. Was just being proactive :)
< gmaxwell> if wallet-flush-in-cscheduler reduces the thread count, then it's also perhaps interesting in a backport. (less memory / address space!)
< wumpus> gmaxwell: it does, but imo it's too risky to backport, we're not entirely sure of it yet this might be only the first issue with it
< gmaxwell> aww
< gmaxwell> :P
< gmaxwell> K
< BlueMatt> gmaxwell: yea, thats why i was proposing backport....but its not simple, apparently, because boost :(
< wumpus> I've always been kind of wary of cscheduler
< BlueMatt> wumpus: yea.....
< gmaxwell> yea... well we've had a lot of problems with it in the past.
< wumpus> there's been lots of issues with it historically at least in the tests
< wumpus> yes
< cfields> wumpus: same. even moreso after diving in to debug that one.
< cfields> in fairness though, i think many issues have been boost-related
< gmaxwell> the general idea of having all the rare periodic stuff in a thread that handles them, is perfectly reasonsble.
< wumpus> yes, and most bugs don't show up in our actual single-threaded usage
< cfields> wumpus: right. I'd be very wary of adding a second thread. I'd expect to uncover a bug or two for sure.
< wumpus> yes, the idea is reasonable
< gmaxwell> oh we were thinking about adding another thread to service the schedueler?
< cfields> gmaxwell: that's how it's designed, anyway
< wumpus> gmaxwell: nonono, please not, it's just that it was designed to be able to work with multiple threads, but that functionality has never been properly verified
< gmaxwell> yea, well I think thats a bad idea.
< wumpus> there used to be a test that tested it with multiple threads, but that failed randomly and we never found out why
< gmaxwell> At least it always seemed to be that the whole utility for such a thing is periodic things that don't have strong realtime requirements.
< BlueMatt> wumpus: sounds like its time to just rewrite the thing from scratch just because....
< wumpus> BlueMatt: or just drop the mult-thread functionality
< BlueMatt> well, ok, that too
< wumpus> I think it could be shortened a lot with c++11 functionality like lambdas and tasks etc
< BlueMatt> i mean whatever, leave it there, I plan on moving lots of things to cscheduler, just will have to make it sane before turning it on
< BlueMatt> ie c++11 stuff, probably
< gmaxwell> the periodic wallet flush is seperate from the realtime wallet flush that happens after issuing a key or whatever, right?
< cfields> std::map<std::chrono::time_point, std::packaged_task> done :)
< wumpus> same for the closure stuff and worker queue in http_server
< gmaxwell> well it should be a min-heap, :P I'm sure there is one of those in stl no?
< BlueMatt> gmaxwell: by "wallet flush", I mean "wallet compaction"
< wumpus> no, stl does not have heap types afaik
< BlueMatt> wallet flush, indeed, is the thing that happens when you write something directly
< wumpus> boost does *ducks*
< wumpus> yes please call it compaction not flush
< gmaxwell> BlueMatt: okay sounds fine then... not the stuff that we need a critical realtime response on.
< wumpus> yes
< BlueMatt> indeed, the function name changed to compaction from flush when it moved into cscheduler
< BlueMatt> because thats a more accurate description
< bitcoin-git> [bitcoin] gmaxwell reopened pull request #9424: Change LogAcceptCategory to use uint32_t rather than sets of strings. (master...log_category_simplify) https://github.com/bitcoin/bitcoin/pull/9424
< wumpus> I don't think I remember ever seeing that pull before. Then again, so many pulls get openened, it's easy to lose track :/
< wumpus> oh end of dec 2016, yea, wasn't too active at that time
< bitcoin-git> [bitcoin] jnewbery opened pull request #10130: bitcoin-tx input verification (master...bitcoin_tx_input_sanitiser) https://github.com/bitcoin/bitcoin/pull/10130
< phantomcircuit> wait there's an issue with flush in csceduler?
< cfields> phantomcircuit: only with old boost
< phantomcircuit> oh
< achow101> if a ping message is sent, but a pong is not received, will other messages still be sent to the pinged node until the ping timeout?
< sipa> yes
< achow101> interesting, I'm not seeing that on wireshark..
< achow101> for that matter, I don't see the ping in question being sent, I just see it in the debug.log
< bitcoin-git> [bitcoin] fanquake closed pull request #10106: bitcoin-tx: Fix missing range check (master...bitcointx-addoutaddr) https://github.com/bitcoin/bitcoin/pull/10106