< achow101>
For some reason, all of my builds locally continuously fail p2p-fullblocktest.py (it's been happening for a while now, I've just been ignoring it). any ideas why?
< achow101>
it just times out
< achow101>
and it's unrelated to anything that I am working on, happens on master and checked out PRs that are passing Travis
< wumpus>
well it used to be that such issues were related to the cache, but I think that's properly cleared now
< wumpus>
(e.g. sticky issues that seem to happen only in one environment)
< gmaxwell>
how do timeouts even work if not running in travis?
< wumpus>
there are some timeouts during the test as well, say if it's waiting for two nodes to be synced and it takes too long, or when an RPC command takes too long
< gmaxwell>
makes sense.
< gmaxwell>
achow101: sounds concerning.
< wumpus>
these should be really high but just make sure it finishes some day even if there's some bug preventing forward progress
< wumpus>
in some cases they're set too tight though and they can trigger if a system is just at high load
< wumpus>
(then again that tends to happen on travis not locally)
< achow101>
It's a consistent test failure, happens on test 98 of p2p-fullblocktest.py and it is a timeout
< achow101>
I think it's literally the same line every single time, but I haven't actually bothered to investigate
< wumpus>
they're among the most annoying failures to debug, at least with wrong output there's something clear to investigate
< gmaxwell>
Which test is "98"?
< gmaxwell>
I don't see a number 98 in the file, so... (sorry, I'm still kind of embarassingly ignorant about a lot of the test framework)
< achow101>
gmaxwell: I dunno. It just says "Test 98: PASS" and then "(ERROR): Assertion failed"
< achow101>
ehh, I guess it's failing on test 99 then..
< wumpus>
any specifics on the assertion that fails?
< bitcoin-git>
bitcoin/master 5ef3b69 Wladimir J. van der Laan: Merge #11524: [net] De-duplicate connection eviction logic...
< bitcoin-git>
[bitcoin] practicalswift opened pull request #11634: wallet: Add missing cs_wallet/cs_KeyStore locks to wallet (master...missing-wallet-locks) https://github.com/bitcoin/bitcoin/pull/11634
< instagibbs>
achow101, same issue for months locally
< instagibbs>
"glad" im not the only one
< morcos>
can you guys open an issue or something for these test failures. they should not be failing. they can be annoying to debug, but i feel like we are slowly improving them so lets not become immune to failures
< jnewbery>
#11468 should make comp test framework tests more debuggable (and we should remove them entirely at some point)
< luke-jr>
I noticed timeouts give a useless assertion error with regard to times.. I wonder what it would take to have the message describe the actual condition
< luke-jr>
(otoh, the traceback typically includes that, so not a huge deal)
< jnewbery>
luke-jr: yeah, it's buried in there, but assert message could def be improved
< luke-jr>
just not sure how to do that kind of thing with Python. I guess catch it and analyze the traceback object to get info to re-throw with?
< MarcoFalke>
re: test failure. Agree that an issue should be opened in case the fix is not trivial.
< MarcoFalke>
Also, the command line switch to jump into pdb only works on shutdown/teardown of nodes
< MarcoFalke>
Currently not possible where the assertion hits, iirc
< BlueMatt>
luke-jr: care to close #10593 or at least provide some justification? there were objections about it in principal and you didn't materially address them or explain what the pr was even trying to do.....
< gmaxwell>
I'm still in favor of 0.15.1 though I do think the urgency is reduced a little.
< Chris_Stewart_5>
MarcoFalke: It was my understanding that is the release for segwit wallet support because of s2x stuff
< jnewbery>
I reckon the improvements in v0.15.1 are helpful to fortify against any hostile hard fork. Si vis pacem, para bellum.
< luke-jr>
I thought we moved it to an early 0.16 *shrug*
< MarcoFalke>
Yeah. 15.1 has other bugfixes in as well. Though, not all that I liked to put in
< MarcoFalke>
We should get it out ntl
< sipa>
luke-jr: undecided in last meeting, but i think that's a reasonable thing to do - push 0.16 forward a bit rather than a 0.15.2 in between
< gmaxwell>
I hope a lot rather than a bit. :)
< gmaxwell>
that would be pretty awesome.
< MarcoFalke>
Can move it at most 1 or 2 months ahead. Moving it to december would be way too rushed, imo
< aj>
is it bringing 0.16 features forward or renaming 0.15.2 as 0.16 and 0.16 as 0.17?
< luke-jr>
meh, when it's ready it's ready..?
< luke-jr>
aj: IMO the latter
< aj>
MarcoFalke: 0.16 timeline has two months for translations, that seems like it puts an absolute minimum? ie, make the decision on friday, open translations this month some time, then no release prior to mid january?
< MarcoFalke>
aj: Segwit wallet review/testing is still needed
< aj>
MarcoFalke: yeah. and needed prior to merging and hence prior to translations?
< gmaxwell>
MarcoFalke: there are thousands of b2x nodes that will go off in space still, and also still make bcash nodes that are off in space.
< gmaxwell>
MarcoFalke: so I don't think removing the service flag thing makes sense.
< gmaxwell>
unless I'm missing something.
< achow101>
gmaxwell: for 0.16, not 0.15.x I think
< achow101>
hopefully they'll be gone by then
< gmaxwell>
they're not gone yet, and if they're not by then we need to add it back. Plus people run master.
< gmaxwell>
(some people)
< MarcoFalke>
Jup, this is meant for 0.16. We can merge it in a couple of weeks. That should be fine, no?
< gmaxwell>
I don't expect it'll make it for 0.16. There are still hundreds of bcash nodes that use the old magic.
< gmaxwell>
no harm in having a pull req, I suppose.
< achow101>
maybe we should hold off on that until shortly before 0.16 and re-evaluate then
< cfields>
oh, commented on the PR before i saw the discussion here. Agree it's too early for that.
< achow101>
can we remove the "good first issue" tag from the release schedule issue? It apparently confuses the people that I talk to about how they can start contributing to core
< sdaftuar>
luke-jr: regarding #10593 -- as i read the code, a peer serving a header that doesn't connect will be assigned 20 dos points. is that correct?
< sdaftuar>
luke-jr: actually, scratch that (i just looked at it again)
< sdaftuar>
luke-jr: an outbound peer that serves a header that doesn't connect will be disconnected?
< luke-jr>
sdaftuar: right
< luke-jr>
we want our outbound peers to always be on the same chain ideally
< sdaftuar>
so in particular someone mining a block 2 hours ahead can split the network
< luke-jr>
disconnecting outbound peers only should never split the network, am I wrong?
< sdaftuar>
i think, at the least, that being so aggressive is risky
< luke-jr>
it might make sense to be softer on the time rule I guessw
< luke-jr>
not sure that's related to this PR though
< sdaftuar>
the point of that unconnecting headers PR was to avoid (eventual) disconnection in the situation where peers weren't on an incompatible chain
< sdaftuar>
that's why i called your PR a reversion of that logic
< luke-jr>
it only disconnects when the peer *is* on an incompatible chain, though?
< sdaftuar>
i don't think so? it disconnects if they serve a header that doesn't connect
< sdaftuar>
which could be for a number of reasons
< sdaftuar>
including the intermediate block being just over 2 hrs in the future
< sdaftuar>
or because they sent a header when they shoudl have sent an inv or something
< sdaftuar>
which is a p2p error, not a consensus error
< luke-jr>
not sure disconnecting for p2p errors is a bad thing; are there any cases where it's neither a p2p issue nor a consensus error (which the 2h limit kindof is)?
< sdaftuar>
but the original way i saw this was on testnet, where a peer using headers announcements for blocks (eg via the sendheaders p2p protocol) would announce a block, via header, that didn't connect, because the prior block was on the wrong side of the 2hr rule
< luke-jr>
and don't we disconnect after N such headers anyway already?
< sdaftuar>
we do, but we set N to be big-ish to account for this occasionally happening iirc
< sdaftuar>
the thing that scares me is if an attacker can get us to disconnect an outbound peer
< sdaftuar>
i mean, disconnect an outbound peer for spurious reasons
< sdaftuar>
because that makes us more vulnerable to sybil attack
< luke-jr>
hmm
< luke-jr>
sdaftuar: can you think of any cases besides the time one this might be an issue for?
< sdaftuar>
well i think matt hated the idea that peers couldn't just switch to announcing everything via header, rather than having to send an inv if they weren't sure the header would connect
< sdaftuar>
the idea that we might disconnect a peer for an unconnecting header is an undocumented p2p behavior
< sdaftuar>
so i think it's sort of a question of how the p2p layer ought to work, and what we should tolerate
< luke-jr>
sdaftuar: so long as you know your peers have header X, sending X+1, X+2, etc should be safe so long as they're on the same chain?
< luke-jr>
(and if they're not, you don't want them as a primary peer)
< sdaftuar>
i'm not sure i understand your question, but i think the answer is yes. can you remind me what case you're concerned about addressing?
< sdaftuar>
i think the recently merged PRs for 0.15.1 address all the cases of bad outbound peer that i was able to come up with, with the goal of ensuring that not all of the outbound peers are on bogus chains
< sdaftuar>
(i'm actually working on a document now that explains all the cases and our logic, which i can share when i'm done)
< luke-jr>
sdaftuar: "This is necessary to avoid banning peers that merely run old formerly-full nodes, after a softfork. We disconnect primary peers because we want compatible full nodes for that role, but allow non-full nodes to remain connected to inbound slots so they can sync the correct chain from us."
< luke-jr>
I suppose there's no reason we couldn't leave the counter in too.
< sdaftuar>
luke-jr: ok yes, i agree with that objective (sorry i haven't paged in your PR in a while). i will try to take a look with fresh eyes, i've lost track a little of where we are (i seem to recall some other PRs that try to address this as well)
< sdaftuar>
to rephrase -- i think we have enough protection in place for outbound peers now (to ensure that we're not solely connected to bogus outbound peers). but we can do more to protect inbound peers from disconnection, eg in the event that they are unaware of a softfork
< BlueMatt>
luke-jr: you might consider rebasing on #11639
< gribble>
https://github.com/bitcoin/bitcoin/issues/11639 | Rewrite the interface between validation and net_processing wrt DoS by TheBlueMatt · Pull Request #11639 · bitcoin/bitcoin · GitHub
< BlueMatt>
then you get nice things like invalid reasons (including SOFT_FORK and BAD_TIME) so you can avoid splitting the network
< BlueMatt>
plus the resulting pr will be much clearer
< p3scobar>
Roger, you in here?
< luke-jr>
BlueMatt: from the description, I wonder if 11639 makes 10593 unnecessary
< BlueMatt>
it might, i still dont fully understand 10593
< BlueMatt>
but 11639 (mostly) doesnt change behavior
< luke-jr>
BlueMatt: the goal of 10593 is essentially to not ban pre-softfork nodes
< BlueMatt>
I doubt its fully fixed by 11639, but I dont know exactly which case you're handling differently there
< promag>
ryanofsky: ping
< ryanofsky>
i'm here
< promag>
does the qt test timeout?
< promag>
if one callback stays pending
< ryanofsky>
it will hang if an expected signal never arrives
< ryanofsky>
existing wallettests also does this, probably there should be a global test timeout
< promag>
ok, something to improve later if relevant
< boreddanman>
Hey all, my name is Dan, and I'm a software engineer & crypto analyst. Figured i'd stop by the dev chat to see what's cookin'
< BlueMatt>
not much, usually, usually pretty quiet unless people are talking about specific things - keeps the backlog short so folks can stay caught up easily
< boreddanman>
gotchya. any other channels i should check out?
< BlueMatt>
#bitcoin is more active, #bitcoin-wizards is occasionally interesting