< sipa> but the progress estimation code was changed significantly in 0.14
< gmaxwell> reindexing spends something like 20 minutes up front scanning for headers, which might be distorting your numbers.
< pfeerpedr> who do i need to talk to in order to speed up my transaction?
< bitcoin-git> [bitcoin] MarcoFalke opened pull request #9846: doc: Small release notes fixups in the list of pulls (0.14...Mf1702-014doc) https://github.com/bitcoin/bitcoin/pull/9846
< bitcoin-git> [bitcoin] sipa opened pull request #9847: Extra test vector for BIP32 (master...bip32up) https://github.com/bitcoin/bitcoin/pull/9847
< achow101> cfields: just reset my gitian and got 8d4bb27b5ab1916f04b74a2bcdccf8781c46fea96a3d5eb4a4a7f587577df64c bitcoin-0.14.0-osx-unsigned.dmg
< achow101> does that match yours?
< achow101> It's probably doing the alternating thing again
< fanquake> achow101 looks like it does match
< fanquake> So you've got the alternating builds again? I'm just about to finish mine.
< bitcoin-git> [bitcoin] appop opened pull request #9848: update (master...master) https://github.com/bitcoin/bitcoin/pull/9848
< bitcoin-git> [bitcoin] fanquake closed pull request #9848: update (master...master) https://github.com/bitcoin/bitcoin/pull/9848
< fanquake> achow101 Interestingly, my osx gitian results now match cfields. Which is weird, because nothings changes since rc1 that could have fixed gitian issues.
< achow101> actually, just ran gitian again and it got cfields's results. I'll run it a few more times to make sure it is deterministic
< bitcoin-git> [bitcoin] luke-jr opened pull request #9849: Qt: Network Watch tool (master...gui_netwatch) https://github.com/bitcoin/bitcoin/pull/9849
< cfields> achow101: it'd be really helpful if you could upload the .o files from a non-matching build
< achow101> I think I can give you the kvm image of the non-matching build. I just need to make sure it is the right one
< cfields> achow101: "on-target" after the build gives you a shell
< achow101> cfields: well that build ended a while ago and I have since done other builds. right now I am trying to start the vm with that image of the mismatching build which I saved and then ssh'ing into it, but it doesn't seem to be working now
< cfields> achow101: ok, let me know if you manage to get them. I'll check back in the morning
< achow101> cfields: I got all of the build stuff off of the vm and tar'ed it. It should contain all of the .o files. Download: https://drive.google.com/file/d/0Bxw3ip9QfNOUVzkwUnlhMTExYjg/view?usp=sharing
< achow101> also I can give you the vm which contains all of that stuff too. I'm waiting for the upload of that to finish
< achow101> cfields: vm with the mismatching build: https://drive.google.com/file/d/0Bxw3ip9QfNOUN0E2aDZZQU1Pd2s/view?usp=sharing
< cfields> achow101: er, you sure that's a broken build?
< luke-jr> (we have 3 sigs on rc2)
< luke-jr> oh, but not all matching
< cfields> luke-jr: yea, i think i'll delay signing until morning once a few more are in
< cfields> now that we have achow101's objects for comparison, I'm hoping it'll point to the culprit
< achow101> cfields: I'm pretty sure that's the broken build
< achow101> luke-jr: my matching osx ones are pr'ed
< cfields> achow101: are you positive? All of my object files are identical as far as i can tell
< achow101> yes.
< achow101> you can fire up the vm image I gave you to check as well
< cfields> achow101: ok nm, got the diff now
< achow101> cool
< cfields> achow101: mmm, they're different kernels
< cfields> that's the only obvious thing i see
< cfields> maybe qt embeds uname output?
< achow101> but why would it only affect osx?
< achow101> also, how are they different kernels? I thought the vms were built exactly the same
< cfields> they should be
< achow101> oh, maybe the upgrade that happens every time was failing some of the time?
< cfields> -uname -r = 3.13.0-108-generic
< cfields> +uname -r = 3.13.0-77-generic
< luke-jr> LXC uses the host's kernel
< luke-jr> so no matter what, we can't rely on kernels to match
< achow101> luke-jr: I'm using kvm
< cfields> luke-jr: well the fact that the kernels don't match is indicative that they're not using the same base
< cfields> in which case glibc (or something) may be different
< luke-jr> hm
< cfields> so it seems to be some kind of gitian issue
< luke-jr> jonasschnelli: what kind of locking issues? can you elaborate?
< jonasschnelli> luke-jr: the app is unresponsive. I had to force shut down... will take a closer look
< jonasschnelli> luke-jr: but I like the PR
< wumpus> so it looks like someone had the test_bitcoin issue outside of travis: #9850
< gribble> https://github.com/bitcoin/bitcoin/issues/9850 | test_bitcoin: /usr/include/boost/thread/pthread/recursive_mutex.hpp:104: boost::recursive_mutex::~recursive_mutex(): Assertion `!pthread_mutex_destroy() failed. · Issue #9850 · bitcoin/bitcoin · GitHub
< jonasschnelli> yes
< jonasschnelli> I tried to reproduce in ubuntu 14.04. but did not had the issue
< wumpus> same here.
< wumpus> did a depends build, just like travis, on 14.04, just like travis
< wumpus> so that means the same version of boost, gcc, etc
< wumpus> this is really strange
< jonasschnelli> Oh. Even that.
< gmaxwell> hurray! (?)
< jonasschnelli> I ran test_bitcoin in valgrind and I could see some uninitialised value
< jonasschnelli> invoked by the toggle_network RPC tests
< wumpus> jonasschnelli: that is a potential concern, however what happens in the RPC tests shouldn't affect test_bitcoin?
< jonasschnelli> wumpus: I meant the RPC unit tests...
< wumpus> no valgrind errors in test_bitcoin?
< wumpus> ooh!
< jonasschnelli> look for rpc_togglenetwork
< jonasschnelli> rpc_tests.cpp
< jonasschnelli> Not sure if its related... we have added this a couple of weeks (or even months) ago
< jonasschnelli> Here's my valgrind run: https://0bin.net/paste/2xS-7aRGhWA11BlS#uwUOiDB9X4h+puz6AxdtnWiMXF5KJlUhC-WFL8bCy4k
< jonasschnelli> This also frightens me: ==59692== Conditional jump or move depends on uninitialised value(s)
< gmaxwell> what version are you running it against those line number do not agree with my code here.
< jonasschnelli> 9949ebfa6a548260858df429f4d0e716e0a26065
< jonasschnelli> I think this is 0.14.0rc1
< jonasschnelli> my setup: ./configure --enable-zmq --enable-glibc-back-compat --enable-reduce-exports CPPFLAGS=-DDEBUG_LOCKORDER --with-incompatible-bdb
< jonasschnelli> (same as the failing travis setup)
< gmaxwell> oh geesh we have source files with the same name. bet that'll be fun for anyone trying to build with msvc.
< jonasschnelli> you mean the problem when we removed the rpc_ prefix and moved them into the rpc/ folder?
< gmaxwell> yea, at least last time I used it MSVC couldn't handle source files having the same name even if they were in different directories. :)
< jonasschnelli> My IDEs find by filename also doesn't like this
< jonasschnelli> We could have kept the rpc_ prefix even after moving them into the specific folder
< gmaxwell> so in that rpc tests I don't see anything that sets up the conman object. But if it's executing those objects it's not null. How is the g_conman setup in the tests?
< fanquake> jonasschnelli I can see the same results with valgrind
< jonasschnelli> TestingSetup() jas a g_connman = std::unique_ptr<CConnman>(new CConnman(0x1337, 0x1337)); // Deterministic randomness for tests.
< jonasschnelli> Thanks fanquake
< cfields> should fix the connman issue, though i seriously doubt that's the crasher
< cfields> (thanks marcofalke for pointing that out earlier)
< cfields> i'll PR that in the morning
< bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/692c9eddba67...00285cece814
< bitcoin-git> bitcoin/master f81f0d0 Russell Yanofsky: Update sendfrom RPC help to correct coin selection misconception
< bitcoin-git> bitcoin/master 00285ce Wladimir J. van der Laan: Merge #9840: Update sendfrom RPC help to correct coin selection misconception...
< bitcoin-git> [bitcoin] laanwj closed pull request #9840: Update sendfrom RPC help to correct coin selection misconception (master...pr/fromacct) https://github.com/bitcoin/bitcoin/pull/9840
< bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/00285cece814...dd6e0d630167
< bitcoin-git> bitcoin/master ef9f495 Marko Bencun: Trivial: fix comments referencing AppInit2...
< bitcoin-git> bitcoin/master dd6e0d6 Wladimir J. van der Laan: Merge #9833: Trivial: fix comments referencing AppInit2...
< bitcoin-git> [bitcoin] laanwj closed pull request #9833: Trivial: fix comments referencing AppInit2 (master...stalecomments) https://github.com/bitcoin/bitcoin/pull/9833
< paveljanik> FWIW - I'm not able to reproduce test_bitcoin failures on any of my machines (different unices) :-(
< wumpus> darn
< bitcoin-git> [bitcoin] laanwj closed pull request #9846: doc: Small release notes fixups in the list of pulls (0.14...Mf1702-014doc) https://github.com/bitcoin/bitcoin/pull/9846
< wumpus> there seems to be nothing *special* in the config.log posted in #9850
< gribble> https://github.com/bitcoin/bitcoin/issues/9850 | test_bitcoin: /usr/include/boost/thread/pthread/recursive_mutex.hpp:104: boost::recursive_mutex::~recursive_mutex(): Assertion `!pthread_mutex_destroy() failed. · Issue #9850 · bitcoin/bitcoin · GitHub
< wumpus> standard ubuntu 16.04 versions of everything
< wumpus> no arguments to configure
< paveljanik> I suspect some travis issue
< paveljanik> (even if it was reproduced outside of it)
< wumpus> I forgot something in my testing yesterday; the travis build passes, --enable-glibc-back-compat --enable-reduce-exports and LDFLAGS=-static-libstdc++" . No difference in reproduction, though
< wumpus> I also test it faster now, launch test_bitcoin and kill it after a second (after all, the problem happens just before the Running ... line so there's no need to go all the way)
< wumpus> in any case it just works perfectly, every time, no matter what I do. Almost feels like travis is trolling us
< gmaxwell> "Why do the patterns of failuers seem to be spelling ascii digits? ...'wouldnt want to give yo..'"
< wumpus> hehe, yes that would be a giveaway
< * wumpus> threw DEBUG_LOCKORDER into the mix. No, that didn't help either
< wumpus> never felt so unhappy to see "*** No errors detected"
< wumpus> well, so much for trying to reproduce locally, going to try set up a trap for this on travis
< wumpus> ok my gdb script is working, this should work
< bitcoin-git> [bitcoin] laanwj opened pull request #9851: [do not merge] travis gdb parachute for #9825 (master...2017_02_travisissue) https://github.com/bitcoin/bitcoin/pull/9851
< bitcoin-git> [bitcoin] zcc0721 opened pull request #9852: Merge remote-tracking branch 'refs/remotes/bitcoin/master' (master...master) https://github.com/bitcoin/bitcoin/pull/9852
< bitcoin-git> [bitcoin] laanwj closed pull request #9852: Merge remote-tracking branch 'refs/remotes/bitcoin/master' (master...master) https://github.com/bitcoin/bitcoin/pull/9852
< bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/dd6e0d630167...f19afdbfb4cb
< bitcoin-git> bitcoin/master dc222f8 Karl-Johan Alm: Trivial: Rephrase the definition of difficulty in the code.
< bitcoin-git> bitcoin/master f19afdb Wladimir J. van der Laan: Merge #9612: [trivial] Rephrase the definition of difficulty....
< bitcoin-git> [bitcoin] laanwj closed pull request #9612: [trivial] Rephrase the definition of difficulty. (master...clarify-difficulty) https://github.com/bitcoin/bitcoin/pull/9612
< wumpus> wth, one of the builds in #9825 is rebuilding all the dependencies?
< gribble> https://github.com/bitcoin/bitcoin/issues/9825 | Intermittent FAIL: test/test_bitcoin in Travis · Issue #9825 · bitcoin/bitcoin · GitHub
< wumpus> eh #9851
< gribble> https://github.com/bitcoin/bitcoin/issues/9851 | [do not merge] travis gdb parachute for #9825 by laanwj · Pull Request #9851 · bitcoin/bitcoin · GitHub
< wumpus> Everything that can go wrong is going wrong, man, it's hard to think of a more nightmarish way to debug things. Well maybe debugging the kernel for GPU cache issues wins by a bit :/
< wumpus> I'm going to cancel all other travis builds to give this one priority, sorry
< wumpus> ah the builds are starting, let's see what surprises await this time
< wumpus> NOOOOOOO don't start building ccache :(
< wumpus> cfields: what would be the best way to skip buildling of dependencies for a PR, for debugging?
< wumpus> I don't understand why all three builds of #9851 trigger a complete dependency rebuild, but this way it's not going to work, I need a fast iteration time to have any chance of reproducing the issue
< gribble> https://github.com/bitcoin/bitcoin/issues/9851 | [do not merge] travis gdb parachute for #9825 by laanwj · Pull Request #9851 · bitcoin/bitcoin · GitHub
< wumpus> oh not all three, just #3, which is the nowallet one. Could just remove that one.
< gribble> https://github.com/bitcoin/bitcoin/issues/3 | Encrypt wallet · Issue #3 · bitcoin/bitcoin · GitHub
< achow101> did the signed binary detached sigs come out yet?
< BlueMatt> wumpus: you could do it on your own personal fork?
< jonasschnelli> Any idea why the LXC gitian initialization takes that long?
< jonasschnelli> Here it takes >5mins during "Upgrading system, may take a while"... seems to be very long
< jonasschnelli> (step between "install.log" and starting of "build.log")
< cfields> wumpus: note that DEBUG=1 is used for the crash case. That adds the extra bounds checking from libstdc++
< cfields> wumpus: as for rebuilding depends, the travis cache depends on the env vars set. So if you change an env var, it will create a new cache because it looks like a new build that it shouldn't clobber
< cfields> where "change" also includes adding/removing env vars
< cfields> gitian builders: sigs for v0.14.0rc2 are pushed
< wumpus> ah so the env vars are the secret :)
< bitcoin-git> [bitcoin] jnewbery opened pull request #9853: Fix error codes from various RPCs (master...fixerrorcodes) https://github.com/bitcoin/bitcoin/pull/9853
< bitcoin-git> [bitcoin] jnewbery closed pull request #9713: Fix error causes and messages in rpc/net.cpp (master...fixsetbanerrormessages) https://github.com/bitcoin/bitcoin/pull/9713
< bitcoin-git> [bitcoin] jnewbery closed pull request #9714: Return correct error codes from bumpfee() (master...bumpfeeerrormessages) https://github.com/bitcoin/bitcoin/pull/9714
< BlueMatt> so now that we have named args someone should probably do a pass and fix the million places that we reject args that are null even when they have a default value, I suppose?
< wumpus> BlueMatt: yes - null should be interpreted as the default value, on a call by call basis
< wumpus> I intend to get around to that for 0.15
< wumpus> in most cases it's trivial
< wumpus> there are a few such as getbalance that have slightly different functionality based on the number of arguments, some discussion will be needed there
< sipa> i can't file an issue right now, but my RPi3 bitcoind OOMed, and marked a block invalid as a result
< sipa> that's very bad...
< sipa> on 0.14.0rc1
< cfields> sipa: yikes
< cfields> sipa: any idea where it oom'd?
< sipa> cfields: #9854
< gribble> https://github.com/bitcoin/bitcoin/issues/9854 | Bitcoind 0.14.0rc1: OOM -> block marked invalid · Issue #9854 · bitcoin/bitcoin · GitHub
< cfields> sipa: seems i just managed to bring down my dev box while testing a fix (forcing OOM). Hope you're happy :)
< cfields> woohoo, rescued
< BlueMatt> cfields: so you have a fix? or should I go look into it?
< cfields> BlueMatt: yea, i have a patch ready. I'm uneasy about it though, so debate welcome
< cfields> sec
< BlueMatt> k
< cfields> BlueMatt: see 9854
< BlueMatt> oh
< BlueMatt> hmmmm, I like that
< BlueMatt> wait, does this apply to more than bad_alloc?
< cfields> no
< gmaxwell> cfields: next time replace malloc with a wrapper. :P
< BlueMatt> if we can make it apply only to std::bad_alloc then I'm all for it (or is there a list of all the things this could apply to?)
< BlueMatt> lol
< gmaxwell> BlueMatt: sipa pointed out the error to me earlier in private, my comment:
< gmaxwell> 11:55 <gmaxwell> God damnit. it really should not reject the block because of a fucking exception!
< cfields> gmaxwell: you mean new? :)
< gmaxwell> 11:55 <gmaxwell> I hate that we use exceptions for error handling in the seralization.
< gmaxwell> 11:56 <gmaxwell> maybe we can wrap the allocator so that failures kill the process.
< gmaxwell> cfields: well I mean the underlying libc function new calls, which is malloc. (same way tcmalloc replaces the allocator)
< cfields> gmaxwell: this isn't our exception. This is a c++ feature.
< cfields> gmaxwell: right, this overrides what happens when "new" fails. So this is essentially what you're asking for
< gmaxwell> cfields: no no: Our mistake is that a var int decode failure is an exception. Because of this we cannot wrap block processing with a catch * {tell user their hardware is befucked or someting bad happened}.
< cfields> gmaxwell: oh, i see what you mean
< gmaxwell> Which basically means that random programming errors that throw exceptions can cause blocks to be rejected intead of the node shutting down, which is exactly what produced the bdb locks as a fork rather than a brief DOS.
< BlueMatt> wait, ok, so has someone identified what actually happened here?
< gmaxwell> There are basically three states for block processing: "I have a valid block", "I have an invalid block.", and "I notice that I am confused." the latter should shut down without marking the block invalid.
< sipa> gmaxwell: i think you're overgeneralizing
< cfields> gmaxwell: i completely agree. but this is a specific case that can be easily detected
< sipa> gmaxwell: problems during deserialization shouldn't _ever_ cause a block to be marked invalid
< gmaxwell> yes, this one we can work around. But where is the next one? this is the second one of those btw.
< gmaxwell> Leveldb internal errors also used to do this to us.
< gmaxwell> Third if you count bdb's internal errors.
< cfields> gmaxwell: so let's fix that independently :)
< gmaxwell> I'm fine with your general fix approach for now.
< cfields> there's one gotcha there, though... prevector calls malloc directly
< gmaxwell> I am lamenting that C++ code randomly calls exceptions without documenting the possiblity clearly. And that we make use of exceptions to mark invalidity. Which means that random internal errors can mark invalidity. And you all know I hate exceptions, so that bias is not in question. :)
< gmaxwell> cfields: why are you replacing new and not malloc? (I don't have a strong opinion, it's just a question)
< sipa> cfields: it could use new[] instead, i think
< sipa> gmaxwell: how do you replace malloc?
< gmaxwell> glibc has a specific override. But perhaps there is no portable way?
< sipa> you'd need to do it with link-time magic, and hope that libstdc++ doesn't bypass it somehow
< BlueMatt> I'm still confused, where do we use such exceptions to mark invalidity?
< sipa> BlueMatt: i don't know!
< sipa> we shouldn't!
< BlueMatt> yes, I dont see the specific issue here, yet
< gmaxwell> sipa: your logs showed we did exactly that.
< BlueMatt> gmaxwell: no they dont
< BlueMatt> "ERROR: ConnectBlock(): inputs missing/spent"
< cfields> BlueMatt: my take-away from the above was that if we didn't throw in deserialization, we could just wrap acceptblock and activatebestchain in try/catch(), and abort any time something's caught
< BlueMatt> that was after the bad_alloc
< cfields> gmaxwell: memory allocation failed, then the _next block_ was rejected
< BlueMatt> cfields: yes, and we should do that, probably still
< sipa> gmaxwell: my assumption is that the error _is_ caught somewhere, not passed up, and as a result a normal "fail" return value is returned, and a higher layer interprets that as invalid block
< sipa> gmaxwell: i don't think we have anywhere a direct "exception? mark invalid!" logic
< BlueMatt> sipa: script interpreter does
< BlueMatt> but thats it i believe
< BlueMatt> (in the debug log you posted I do not believe that was the error, either)
< gmaxwell> we do all over the place! we have a generic catch that returns false on functions that must be true for validity.
< BlueMatt> gmaxwell: we do?
< gmaxwell> Open up validation.cpp basically every catch in there does this.
< cfields> sipa: my take was that the block was accepted, but we didn't switch to the new tip, so the next block failed when looking up inputs
< BlueMatt> only script interpreter i believe
< gmaxwell> okay it's not as bad as I thought.
< sipa> did we in 0.14 introduce the SendRejectsAndCheckIfBanned(pfrom, connman) call in net_processing:2754 ?
< BlueMatt> well now that i check it is worse than I thought :p
< sipa> which before used to be inside the catch block?
< BlueMatt> some disk reads shit that probably should be smarter than it is
< BlueMatt> sipa: yes, and no, before it didnt exist
< BlueMatt> (was only in SendMessages)
< cfields> sipa: it's new, we used to only send rejects+ban from SendMessages()
< sipa> i see
< cfields> imo the throw happened somewhere around SetBestChain, it was just caught in ProcessMessages because that's the only place we do a generic catch(...)
< gmaxwell> BlueMatt: well I thought it was _every_ one of them, but I checked readblockfromdisk and it's not.