#bitcoin-core-dev on 2017-02-24 — searchable irc log

00:00 < sipa> but the progress estimation code was changed significantly in 0.14

00:02 < gmaxwell> reindexing spends something like 20 minutes up front scanning for headers, which might be distorting your numbers.

00:35 < pfeerpedr> who do i need to talk to in order to speed up my transaction?

00:39 < bitcoin-git> [bitcoin] MarcoFalke opened pull request #9846: doc: Small release notes fixups in the list of pulls (0.14...Mf1702-014doc) https://github.com/bitcoin/bitcoin/pull/9846

02:02 < bitcoin-git> [bitcoin] sipa opened pull request #9847: Extra test vector for BIP32 (master...bip32up) https://github.com/bitcoin/bitcoin/pull/9847

02:42 < achow101> cfields: just reset my gitian and got 8d4bb27b5ab1916f04b74a2bcdccf8781c46fea96a3d5eb4a4a7f587577df64c bitcoin-0.14.0-osx-unsigned.dmg

02:42 < achow101> does that match yours?

02:42 < achow101> It's probably doing the alternating thing again

02:56 < fanquake> achow101 looks like it does match

02:56 < fanquake> So you've got the alternating builds again? I'm just about to finish mine.

03:11 < bitcoin-git> [bitcoin] appop opened pull request #9848: update (master...master) https://github.com/bitcoin/bitcoin/pull/9848

03:12 < bitcoin-git> [bitcoin] fanquake closed pull request #9848: update (master...master) https://github.com/bitcoin/bitcoin/pull/9848

03:21 < fanquake> achow101 Interestingly, my osx gitian results now match cfields'. Which is weird, because nothing has changed since rc1 that could have fixed gitian issues.

03:42 < achow101> actually, just ran gitian again and it got cfields's results. I'll run it a few more times to make sure it is deterministic

04:01 < bitcoin-git> [bitcoin] luke-jr opened pull request #9849: Qt: Network Watch tool (master...gui_netwatch) https://github.com/bitcoin/bitcoin/pull/9849

04:03 < cfields> achow101: it'd be really helpful if you could upload the .o files from a non-matching build

04:03 < achow101> I think I can give you the kvm image of the non-matching build. I just need to make sure it is the right one

04:04 < cfields> achow101: "on-target" after the build gives you a shell

04:09 < achow101> cfields: well that build ended a while ago and I have since done other builds. right now I am trying to start the vm with that image of the mismatching build which I saved and then ssh'ing into it, but it doesn't seem to be working now

04:16 < cfields> achow101: ok, let me know if you manage to get them. I'll check back in the morning

04:39 < achow101> cfields: I got all of the build stuff off of the vm and tar'ed it. It should contain all of the .o files. Download: https://drive.google.com/file/d/0Bxw3ip9QfNOUVzkwUnlhMTExYjg/view?usp=sharing

04:40 < achow101> also I can give you the vm which contains all of that stuff too. I'm waiting for the upload of that to finish

04:46 < achow101> cfields: vm with the mismatching build: https://drive.google.com/file/d/0Bxw3ip9QfNOUN0E2aDZZQU1Pd2s/view?usp=sharing

05:25 < cfields> achow101: er, you sure that's a broken build?

05:30 < luke-jr> (we have 3 sigs on rc2)

05:30 < luke-jr> oh, but not all matching

05:31 < cfields> luke-jr: yea, i think i'll delay signing until morning once a few more are in

05:31 < cfields> now that we have achow101's objects for comparison, I'm hoping it'll point to the culprit

05:31 < achow101> cfields: I'm pretty sure that's the broken build

05:35 < achow101> luke-jr: my matching osx ones are pr'ed

05:36 < cfields> achow101: are you positive? All of my object files are identical as far as i can tell

05:37 < achow101> yes.

05:37 < achow101> you can fire up the vm image I gave you to check as well

05:44 < cfields> achow101: ok nm, got the diff now

05:48 < achow101> cool

05:53 < cfields> achow101: mmm, they're different kernels

05:53 < cfields> that's the only obvious thing i see

05:54 < cfields> maybe qt embeds uname output?

05:56 < achow101> but why would it only affect osx?

05:56 < achow101> also, how are they different kernels? I thought the vms were built exactly the same

05:56 < cfields> they should be

05:57 < achow101> oh, maybe the upgrade that happens every time was failing some of the time?

05:57 < cfields> -uname -r = 3.13.0-108-generic

05:57 < cfields> +uname -r = 3.13.0-77-generic

05:57 < luke-jr> LXC uses the host's kernel

05:57 < luke-jr> so no matter what, we can't rely on kernels to match

05:57 < achow101> luke-jr: I'm using kvm

05:59 < cfields> luke-jr: well the fact that the kernels don't match is indicative that they're not using the same base

05:59 < cfields> in which case glibc (or something) may be different

05:59 < luke-jr> hm

06:00 < cfields> so it seems to be some kind of gitian issue

07:42 < luke-jr> jonasschnelli: what kind of locking issues? can you elaborate?

07:43 < jonasschnelli> luke-jr: the app is unresponsive. I had to force shut down... will take a closer look

07:43 < jonasschnelli> luke-jr: but I like the PR

07:48 < wumpus> so it looks like someone had the test_bitcoin issue outside of travis: #9850

07:48 < gribble> https://github.com/bitcoin/bitcoin/issues/9850 | test_bitcoin: /usr/include/boost/thread/pthread/recursive_mutex.hpp:104: boost::recursive_mutex::~recursive_mutex(): Assertion `!pthread_mutex_destroy() failed. · Issue #9850 · bitcoin/bitcoin · GitHub

07:50 < jonasschnelli> yes

07:51 < jonasschnelli> I tried to reproduce on ubuntu 14.04, but did not have the issue

07:51 < wumpus> same here.

07:51 < wumpus> did a depends build, just like travis, on 14.04, just like travis

07:51 < wumpus> so that means the same version of boost, gcc, etc

07:52 < wumpus> this is really strange

07:52 < jonasschnelli> Oh. Even that.

07:53 < gmaxwell> hurray! (?)

07:53 < jonasschnelli> I ran test_bitcoin in valgrind and I could see some uninitialised value

07:54 < jonasschnelli> invoked by the toggle_network RPC tests

08:00 < wumpus> jonasschnelli: that is a potential concern, however what happens in the RPC tests shouldn't affect test_bitcoin?

08:00 < jonasschnelli> wumpus: I meant the RPC unit tests...

08:00 < wumpus> no valgrind errors in test_bitcoin?

08:00 < wumpus> ooh!

08:01 < jonasschnelli> look for rpc_togglenetwork

08:01 < jonasschnelli> rpc_tests.cpp

08:01 < jonasschnelli> Not sure if it's related... we added this a couple of weeks (or even months) ago

08:02 < jonasschnelli> Here's my valgrind run: https://0bin.net/paste/2xS-7aRGhWA11BlS#uwUOiDB9X4h+puz6AxdtnWiMXF5KJlUhC-WFL8bCy4k

08:02 < jonasschnelli> This also frightens me: ==59692== Conditional jump or move depends on uninitialised value(s)

08:05 < gmaxwell> what version are you running it against? those line numbers do not agree with my code here.

08:06 < jonasschnelli> 9949ebfa6a548260858df429f4d0e716e0a26065

08:06 < jonasschnelli> I think this is 0.14.0rc1

08:07 < jonasschnelli> my setup: ./configure --enable-zmq --enable-glibc-back-compat --enable-reduce-exports CPPFLAGS=-DDEBUG_LOCKORDER --with-incompatible-bdb

08:08 < jonasschnelli> (same as the failing travis setup)

08:08 < gmaxwell> oh geesh we have source files with the same name. bet that'll be fun for anyone trying to build with msvc.

08:09 < jonasschnelli> you mean the problem when we removed the rpc_ prefix and moved them into the rpc/ folder?

08:09 < gmaxwell> yea, at least last time I used it MSVC couldn't handle source files having the same name even if they were in different directories. :)

08:09 < jonasschnelli> My IDE's find-by-filename also doesn't like this

08:10 < jonasschnelli> We could have kept the rpc_ prefix even after moving them into the specific folder

08:13 < gmaxwell> so in those rpc tests I don't see anything that sets up the connman object. But if it's executing those objects it's not null. How is g_connman set up in the tests?

08:13 < fanquake> jonasschnelli I can see the same results with valgrind

08:13 < jonasschnelli> TestingSetup() has a g_connman = std::unique_ptr<CConnman>(new CConnman(0x1337, 0x1337)); // Deterministic randomness for tests.

08:14 < fanquake> https://0bin.net/paste/DLBX7+ZYaQ79TrRS#ACJ-Fp8c8aAZrLW2jDShhRMKbnTlxlnRJDkCRhXfpcI

08:15 < jonasschnelli> Thanks fanquake

08:33 < cfields> https://github.com/theuni/bitcoin/commit/72aa3324bc69640937f2fda6a63634bcf1e8c6c1

08:33 < cfields> should fix the connman issue, though i seriously doubt that's the crasher

08:33 < cfields> (thanks marcofalke for pointing that out earlier)

08:35 < cfields> i'll PR that in the morning

09:22 < bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/692c9eddba67...00285cece814

09:22 < bitcoin-git> bitcoin/master f81f0d0 Russell Yanofsky: Update sendfrom RPC help to correct coin selection misconception

09:22 < bitcoin-git> bitcoin/master 00285ce Wladimir J. van der Laan: Merge #9840: Update sendfrom RPC help to correct coin selection misconception...

09:22 < bitcoin-git> [bitcoin] laanwj closed pull request #9840: Update sendfrom RPC help to correct coin selection misconception (master...pr/fromacct) https://github.com/bitcoin/bitcoin/pull/9840

09:54 < bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/00285cece814...dd6e0d630167

09:54 < bitcoin-git> bitcoin/master ef9f495 Marko Bencun: Trivial: fix comments referencing AppInit2...

09:54 < bitcoin-git> bitcoin/master dd6e0d6 Wladimir J. van der Laan: Merge #9833: Trivial: fix comments referencing AppInit2...

09:54 < bitcoin-git> [bitcoin] laanwj closed pull request #9833: Trivial: fix comments referencing AppInit2 (master...stalecomments) https://github.com/bitcoin/bitcoin/pull/9833

09:58 < paveljanik> FWIW - I'm not able to reproduce test_bitcoin failures on any of my machines (different unices) :-(

10:03 < wumpus> darn

10:03 < bitcoin-git> [bitcoin] laanwj closed pull request #9846: doc: Small release notes fixups in the list of pulls (0.14...Mf1702-014doc) https://github.com/bitcoin/bitcoin/pull/9846

10:05 < wumpus> there seems to be nothing *special* in the config.log posted in #9850

10:05 < gribble> https://github.com/bitcoin/bitcoin/issues/9850 | test_bitcoin: /usr/include/boost/thread/pthread/recursive_mutex.hpp:104: boost::recursive_mutex::~recursive_mutex(): Assertion `!pthread_mutex_destroy() failed. · Issue #9850 · bitcoin/bitcoin · GitHub

10:06 < wumpus> standard ubuntu 16.04 versions of everything

10:08 < wumpus> no arguments to configure

10:22 < paveljanik> I suspect some travis issue

10:23 < paveljanik> (even if it was reproduced outside of it)

10:24 < wumpus> I forgot something in my testing yesterday: the travis build passes --enable-glibc-back-compat, --enable-reduce-exports and LDFLAGS="-static-libstdc++". No difference in reproduction, though

10:26 < wumpus> I also test it faster now, launch test_bitcoin and kill it after a second (after all, the problem happens just before the Running ... line so there's no need to go all the way)

10:30 < wumpus> in any case it just works perfectly, every time, no matter what I do. Almost feels like travis is trolling us

10:35 < gmaxwell> "Why do the patterns of failures seem to be spelling ascii digits? ...'wouldnt want to give yo..'"

10:36 < wumpus> hehe, yes that would be a giveaway

10:38 < * wumpus> threw DEBUG_LOCKORDER into the mix. No, that didn't help either

10:45 < wumpus> never felt so unhappy to see "*** No errors detected"

10:54 < wumpus> well, so much for trying to reproduce locally, going to try set up a trap for this on travis

11:05 < wumpus> ok my gdb script is working, this should work

11:10 < bitcoin-git> [bitcoin] laanwj opened pull request #9851: [do not merge] travis gdb parachute for #9825 (master...2017_02_travisissue) https://github.com/bitcoin/bitcoin/pull/9851

11:41 < bitcoin-git> [bitcoin] zcc0721 opened pull request #9852: Merge remote-tracking branch 'refs/remotes/bitcoin/master' (master...master) https://github.com/bitcoin/bitcoin/pull/9852

11:42 < bitcoin-git> [bitcoin] laanwj closed pull request #9852: Merge remote-tracking branch 'refs/remotes/bitcoin/master' (master...master) https://github.com/bitcoin/bitcoin/pull/9852

11:49 < bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/dd6e0d630167...f19afdbfb4cb

11:49 < bitcoin-git> bitcoin/master dc222f8 Karl-Johan Alm: Trivial: Rephrase the definition of difficulty in the code.

11:49 < bitcoin-git> bitcoin/master f19afdb Wladimir J. van der Laan: Merge #9612: [trivial] Rephrase the definition of difficulty....

11:49 < bitcoin-git> [bitcoin] laanwj closed pull request #9612: [trivial] Rephrase the definition of difficulty. (master...clarify-difficulty) https://github.com/bitcoin/bitcoin/pull/9612

12:00 < wumpus> wth, one of the builds in #9825 is rebuilding all the dependencies?

12:00 < gribble> https://github.com/bitcoin/bitcoin/issues/9825 | Intermittent FAIL: test/test_bitcoin in Travis · Issue #9825 · bitcoin/bitcoin · GitHub

12:01 < wumpus> eh #9851

12:01 < gribble> https://github.com/bitcoin/bitcoin/issues/9851 | [do not merge] travis gdb parachute for #9825 by laanwj · Pull Request #9851 · bitcoin/bitcoin · GitHub

12:04 < wumpus> Everything that can go wrong is going wrong, man, it's hard to think of a more nightmarish way to debug things. Well maybe debugging the kernel for GPU cache issues wins by a bit :/

12:06 < wumpus> I'm going to cancel all other travis builds to give this one priority, sorry

12:11 < wumpus> ah the builds are starting, let's see what surprises await this time

12:11 < wumpus> NOOOOOOO don't start building ccache :(

12:41 < wumpus> cfields: what would be the best way to skip building of dependencies for a PR, for debugging?

12:43 < wumpus> I don't understand why all three builds of #9851 trigger a complete dependency rebuild, but this way it's not going to work, I need a fast iteration time to have any chance of reproducing the issue

12:43 < gribble> https://github.com/bitcoin/bitcoin/issues/9851 | [do not merge] travis gdb parachute for #9825 by laanwj · Pull Request #9851 · bitcoin/bitcoin · GitHub

12:49 < wumpus> oh not all three, just #3, which is the nowallet one. Could just remove that one.

12:49 < gribble> https://github.com/bitcoin/bitcoin/issues/3 | Encrypt wallet · Issue #3 · bitcoin/bitcoin · GitHub

14:18 < achow101> did the signed binary detached sigs come out yet?

14:27 < BlueMatt> wumpus: you could do it on your own personal fork?

14:45 < jonasschnelli> Any idea why the LXC gitian initialization takes that long?

14:46 < jonasschnelli> Here it takes >5mins during "Upgrading system, may take a while"... seems to be very long

14:46 < jonasschnelli> (step between "install.log" and starting of "build.log")

15:36 < cfields> wumpus: note that DEBUG=1 is used for the crash case. That adds the extra bounds checking from libstdc++

15:41 < cfields> wumpus: as for rebuilding depends, the travis cache depends on the env vars set. So if you change an env var, it will create a new cache because it looks like a new build that it shouldn't clobber

15:42 < cfields> where "change" also includes adding/removing env vars
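
The cache behaviour cfields describes can be modelled with a toy example. This is a sketch only, assuming the cache key is derived from the job's complete set of environment variables; `CacheKey` is a hypothetical helper, and this is not Travis's actual algorithm.

```cpp
#include <map>
#include <string>

// Purely illustrative: builds a cache key from every environment variable
// of a build job. Hypothetical helper, not Travis's real implementation.
std::string CacheKey(const std::map<std::string, std::string>& env)
{
    std::string key;
    // std::map iterates in sorted key order, so the result does not depend
    // on the order in which the variables were declared.
    for (const auto& kv : env) {
        key += kv.first + "=" + kv.second + ";";
    }
    return key;
}
```

Under this model, adding, removing or changing any single variable yields a different key, so the job starts from an empty cache and rebuilds its dependencies.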

15:48 < cfields> gitian builders: sigs for v0.14.0rc2 are pushed

16:13 < wumpus> ah so the env vars are the secret :)

16:29 < bitcoin-git> [bitcoin] jnewbery opened pull request #9853: Fix error codes from various RPCs (master...fixerrorcodes) https://github.com/bitcoin/bitcoin/pull/9853

16:29 < bitcoin-git> [bitcoin] jnewbery closed pull request #9713: Fix error causes and messages in rpc/net.cpp (master...fixsetbanerrormessages) https://github.com/bitcoin/bitcoin/pull/9713

16:29 < bitcoin-git> [bitcoin] jnewbery closed pull request #9714: Return correct error codes from bumpfee() (master...bumpfeeerrormessages) https://github.com/bitcoin/bitcoin/pull/9714

16:39 < BlueMatt> so now that we have named args someone should probably do a pass and fix the million places that we reject args that are null even when they have a default value, I suppose?

16:41 < wumpus> BlueMatt: yes - null should be interpreted as the default value, on a call by call basis
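
The "null means default" rule can be sketched as follows. `RpcArg` and `ArgOrDefault` are hypothetical stand-ins invented for illustration; the real RPC code uses its own JSON value type, which is not shown in this log.

```cpp
// Hypothetical stand-in for a JSON-RPC argument: either null or an integer.
struct RpcArg {
    bool is_null;
    int value;
};

// Returns the caller-supplied value, or the call's documented default when
// the argument was passed as null (for example, when a named-argument call
// leaves a gap before a later parameter).
int ArgOrDefault(const RpcArg& arg, int default_value)
{
    return arg.is_null ? default_value : arg.value;
}
```

Each RPC would still have to choose its own default, which matches the "call by call basis" remark above.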

16:42 < wumpus> I intend to get around to that for 0.15

16:46 < wumpus> in most cases it's trivial

16:46 < wumpus> there are a few such as getbalance that have slightly different functionality based on the number of arguments, some discussion will be needed there

21:17 < sipa> i can't file an issue right now, but my RPi3 bitcoind OOMed, and marked a block invalid as a result

21:17 < sipa> that's very bad...

21:17 < sipa> on 0.14.0rc1

21:28 < cfields> sipa: yikes

21:28 < cfields> sipa: any idea where it oom'd?

22:23 < sipa> cfields: #9854

22:23 < gribble> https://github.com/bitcoin/bitcoin/issues/9854 | Bitcoind 0.14.0rc1: OOM -> block marked invalid · Issue #9854 · bitcoin/bitcoin · GitHub

23:10 < cfields> sipa: seems i just managed to bring down my dev box while testing a fix (forcing OOM). Hope you're happy :)

23:11 < cfields> woohoo, rescued

23:16 < BlueMatt> cfields: so you have a fix? or should I go look into it?

23:17 < cfields> BlueMatt: yea, i have a patch ready. I'm uneasy about it though, so debate welcome

23:17 < cfields> sec

23:18 < BlueMatt> k

23:22 < cfields> BlueMatt: see 9854

23:22 < BlueMatt> oh

23:22 < BlueMatt> hmmmm, I like that

23:22 < BlueMatt> wait, does this apply to more than bad_alloc?

23:23 < cfields> no

23:23 < gmaxwell> cfields: next time replace malloc with a wrapper. :P

23:23 < BlueMatt> if we can make it apply only to std::bad_alloc then I'm all for it (or is there a list of all the things this could apply to?)

23:23 < BlueMatt> lol

23:23 < gmaxwell> BlueMatt: sipa pointed out the error to me earlier in private, my comment:

23:23 < gmaxwell> 11:55 <gmaxwell> God damnit. it really should not reject the block because of a fucking exception!

23:23 < cfields> gmaxwell: you mean new? :)

23:24 < gmaxwell> 11:55 <gmaxwell> I hate that we use exceptions for error handling in the serialization.

23:24 < gmaxwell> 11:56 <gmaxwell> maybe we can wrap the allocator so that failures kill the process.

23:24 < gmaxwell> cfields: well I mean the underlying libc function new calls, which is malloc. (same way tcmalloc replaces the allocator)

23:24 < cfields> gmaxwell: this isn't our exception. This is a c++ feature.

23:24 < cfields> gmaxwell: right, this overrides what happens when "new" fails. So this is essentially what you're asking for
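
A minimal sketch of overriding what happens when `new` fails, using the standard `std::set_new_handler`. The handler name and message are assumptions made for illustration, and this is not necessarily the patch under discussion. It only covers allocations that go through `operator new`; code that calls `malloc` directly is unaffected.

```cpp
#include <cstdio>
#include <cstdlib>
#include <new>

// Hypothetical handler: operator new calls it when an allocation fails.
[[noreturn]] static void oom_terminate()
{
    // Clear the handler first so that a failing allocation while reporting
    // the error cannot re-enter this function.
    std::set_new_handler(nullptr);
    std::fputs("Error: out of memory, terminating.\n", stderr);
    // Abort rather than let std::bad_alloc propagate into code that might
    // treat it as an ordinary failure.
    std::abort();
}

// Installs the handler and returns the one that was previously set.
std::new_handler install_oom_handler()
{
    return std::set_new_handler(oom_terminate);
}
```

Calling `install_oom_handler()` once at startup would turn an out-of-memory condition into an immediate process exit instead of an exception.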

23:25 < gmaxwell> cfields: no no: Our mistake is that a var int decode failure is an exception. Because of this we cannot wrap block processing with a catch * {tell user their hardware is befucked or something bad happened}.

23:25 < cfields> gmaxwell: oh, i see what you mean

23:26 < gmaxwell> Which basically means that random programming errors that throw exceptions can cause blocks to be rejected instead of the node shutting down, which is exactly what produced the bdb locks as a fork rather than a brief DOS.

23:26 < BlueMatt> wait, ok, so has someone identified what actually happened here?

23:26 < gmaxwell> There are basically three states for block processing: "I have a valid block", "I have an invalid block.", and "I notice that I am confused." the latter should shut down without marking the block invalid.
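
The three states can be sketched as a tri-state result. All names here are hypothetical. The sketch assumes the validation callback returns `false` only for a block that truly breaks a rule and throws only on internal faults, which, as noted above, is not how deserialization currently behaves.

```cpp
#include <functional>
#include <new>

// Hypothetical names for the three states of block processing.
enum class BlockOutcome {
    Valid,         // "I have a valid block"
    Invalid,       // "I have an invalid block"
    InternalError  // "I notice that I am confused"
};

// Runs a validation callback. Any exception is classified as a local fault,
// never as evidence that the block itself is bad.
BlockOutcome ClassifyBlock(const std::function<bool()>& validate)
{
    try {
        return validate() ? BlockOutcome::Valid : BlockOutcome::Invalid;
    } catch (const std::bad_alloc&) {
        return BlockOutcome::InternalError; // out of memory
    } catch (...) {
        return BlockOutcome::InternalError; // any other unexpected failure
    }
}

// Only a genuinely invalid block may be marked invalid; a confused node
// should shut down and leave the block's status untouched.
bool ShouldMarkInvalid(BlockOutcome o) { return o == BlockOutcome::Invalid; }
bool ShouldShutDown(BlockOutcome o) { return o == BlockOutcome::InternalError; }
```

The point of the separate `InternalError` value is that a caller cannot accidentally collapse "confused" into "invalid" the way a plain `bool` return allows.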

23:26 < sipa> gmaxwell: i think you're overgeneralizing

23:27 < cfields> gmaxwell: i completely agree. but this is a specific case that can be easily detected

23:27 < sipa> gmaxwell: problems during deserialization shouldn't _ever_ cause a block to be marked invalid

23:27 < gmaxwell> yes, this one we can work around. But where is the next one? this is the second one of those btw.

23:27 < gmaxwell> Leveldb internal errors also used to do this to us.

23:27 < gmaxwell> Third if you count bdb's internal errors.

23:27 < cfields> gmaxwell: so let's fix that independently :)

23:27 < gmaxwell> I'm fine with your general fix approach for now.

23:28 < cfields> there's one gotcha there, though... prevector calls malloc directly

23:28 < gmaxwell> I am lamenting that C++ code randomly throws exceptions without documenting the possibility clearly. And that we make use of exceptions to mark invalidity. Which means that random internal errors can mark invalidity. And you all know I hate exceptions, so that bias is not in question. :)

23:29 < gmaxwell> cfields: why are you replacing new and not malloc? (I don't have a strong opinion, it's just a question)

23:29 < sipa> cfields: it could use new[] instead, i think

23:29 < sipa> gmaxwell: how do you replace malloc?

23:29 < gmaxwell> glibc has a specific override. But perhaps there is no portable way?

23:29 < sipa> you'd need to do it with link-time magic, and hope that libstdc++ doesn't bypass it somehow

23:29 < BlueMatt> I'm still confused, where do we use such exceptions to mark invalidity?

23:29 < sipa> BlueMatt: i don't know!

23:29 < sipa> we shouldn't!

23:30 < BlueMatt> yes, I dont see the specific issue here, yet

23:30 < gmaxwell> sipa: your logs showed we did exactly that.

23:30 < BlueMatt> gmaxwell: no they dont

23:30 < BlueMatt> "ERROR: ConnectBlock(): inputs missing/spent"

23:30 < cfields> BlueMatt: my take-away from the above was that if we didn't throw in deserialization, we could just wrap acceptblock and activatebestchain in try/catch(), and abort any time something's caught

23:30 < BlueMatt> that was after the bad_alloc

23:31 < cfields> gmaxwell: memory allocation failed, then the _next block_ was rejected

23:31 < BlueMatt> cfields: yes, and we should do that, probably still

23:31 < sipa> gmaxwell: my assumption is that the error _is_ caught somewhere, not passed up, and as a result a normal "fail" return value is returned, and a higher layer interprets that as invalid block

23:31 < sipa> gmaxwell: i don't think we have anywhere a direct "exception? mark invalid!" logic

23:31 < BlueMatt> sipa: script interpreter does

23:31 < BlueMatt> but thats it i believe

23:32 < BlueMatt> (in the debug log you posted I do not believe that was the error, either)

23:32 < gmaxwell> we do all over the place! we have a generic catch that returns false on functions that must be true for validity.

23:33 < BlueMatt> gmaxwell: we do?

23:33 < gmaxwell> Open up validation.cpp basically every catch in there does this.

23:33 < cfields> sipa: my take was that the block was accepted, but we didn't switch to the new tip, so the next block failed when looking up inputs

23:33 < BlueMatt> only script interpreter i believe

23:34 < gmaxwell> okay it's not as bad as I thought.

23:34 < sipa> did we in 0.14 introduce the SendRejectsAndCheckIfBanned(pfrom, connman) call in net_processing:2754 ?

23:34 < BlueMatt> well now that i check it is worse than I thought :p

23:34 < sipa> which before used to be inside the catch block?

23:34 < BlueMatt> some disk reads shit that probably should be smarter than it is

23:35 < BlueMatt> sipa: yes, and no, before it didnt exist

23:35 < BlueMatt> (was only in SendMessages)

23:35 < cfields> sipa: it's new, we used to only send rejects+ban from SendMessages()

23:36 < sipa> i see

23:38 < cfields> imo the throw happened somewhere around SetBestChain, it was just caught in ProcessMessages because that's the only place we do a generic catch(...)

23:38 < gmaxwell> BlueMatt: well I thought it was _every_ one of them, but I checked readblockfromdisk and it's not.