< bitcoin-git>
[bitcoin] MarcoFalke merged pull request #21840: test: Misc refactor to get rid of &foo[0] raw pointers (master...2105-testRefactor) https://github.com/bitcoin/bitcoin/pull/21840
< bitcoin-git>
[bitcoin] MarcoFalke opened pull request #21848: refactor: Make CFeeRate constructor arch-independent (master...2105-feerate) https://github.com/bitcoin/bitcoin/pull/21848
< bitcoin-git>
[bitcoin] MarcoFalke opened pull request #21849: fuzz: Limit toxic test globals to their respective scope (master...2105-fuzzToxic) https://github.com/bitcoin/bitcoin/pull/21849
< hugohn>
I saw some chatter earlier about folks wanting to change the BIP process, but until that becomes more clear I assume the process stays the same.
< hugohn>
(also yay on Taproot signaling progress)
< wumpus>
hugohn: nothing, it should just get a BIP number assigned
< wumpus>
yes, moving to names might make sense too, in any case please don't let bitcoin innovation be stuck on having to centrally assign numbers, this shouldn't be an issue
< hugohn>
I've already named the branch "bip-hugonguyen-bsms". Does it need to be included in the spec itself?
< hugohn>
on a side note: I think using BIP numbers or acronyms like BSMS / PSBT as unique identifiers for a spec is still preferable. Using personal names as identifiers for a standard seems... weird :)
< hugohn>
not to mention the long ID is a mouthful / harder to share
< hugohn>
although the idea of a more decentralized process does sound good
< hugohn>
maybe the process could be decoupled, but the ID aspect could stay centralized (e.g. there could be a BIP on BIP number generation). just thinking out loud.
< wumpus>
an idea we had early on was to simply use the PR number as the BIP number
< wumpus>
some projects (rust, I think?) use this approach
< wumpus>
but also there was some effort to make BIP numbers meaningful, certain ranges map to certain things, it is not simply a sequence generator
< wumpus>
agree that having an author name in it is weird, on the other hand, namespacing is hard, many projects eventually settle on organization/pseudonym based namespacing to prevent name squatting etc
< hugohn>
right. if it's not a simple sequence generator, a BIP on BIPs probably makes more sense.
< wumpus>
this is mostly a problem if it would be managed by a script bot
< wumpus>
you mean like BIP1 and 2?
< hugohn>
yes
< wumpus>
in any case, this discussion is kind of off-topic here, the bitcoin core project is one implementation, that the current BIPs repository is in the same organization is a historical artifact, it does not mean BIPs 'belong' to bitcoin core or something like that
< michaelfolkson>
I think there will be future discussions on a revised BIP process once (hopefully) Taproot activation is completed hugohn if you'd like to engage with that https://github.com/bitcoin/bips/pull/1015
< michaelfolkson>
A few things I think are probably in need of a revamp (or at least a discussion)
< hugohn>
thanks @michaelfolkson. I'll take a look.
< wumpus>
ideally it should be a fast and low-friction process
< michaelfolkson>
Right, there are a few edge cases that need ironing out too eg what "Rejected" means and how you get out of "Rejected" status (if indeed you ever do)
< hugohn>
@wumpus I understand. I think we should continue with this "historical artifact", as unfortunate as it is, until there's a better way.
< hugohn>
As the Bitcoin protocol slowly ossifies, the bulk of new specs will likely be in the Application layer. It does make less sense to bundle them with the Core project.
< michaelfolkson>
hugohn: They aren't really bundled with the Core project. It is true the BIPs repo is under the Bitcoin Core GitHub org but no Bitcoin Core maintainers merge BIP PRs afaik so I think it is fine under the Core org
< michaelfolkson>
hugohn: I personally think it would be overkill to give the BIPs repo its own org
< michaelfolkson>
wumpus: There already is "An issue or feature request related to the GUI" option when you open an issue. Is that what you mean?
< michaelfolkson>
wumpus: You click on that option and you get "Any report, issue or feature request related to the GUI should be reported at
< jonatack>
MarcoFalke: question regarding fuzzed_data_provider.ConsumeBool(), which exists but isn't used yet...is it ok to add a line to an enum to use it? e.g. adding "kMaxValue = NET_MAX," at the end of netaddress.h::enum Network
< fanquake>
I see 169 uses of fuzzed_data_provider.ConsumeBool() ?
< jonatack>
er, ConsumeEnum, mistyped...or do we prefer to avoid changing enums to use it
< jonatack>
will propose using ConsumeEnum() and see what reviewers say
< bitcoin-git>
[bitcoin] kiminuo opened pull request #21850: Remove `GetDataDir(net_specific)` function (master...feature/2021-05-get-data-dir-step-2) https://github.com/bitcoin/bitcoin/pull/21850
< bitcoin-git>
[bitcoin] fanquake opened pull request #21851: [WIP] build: support cross-compiling for arm64-apple-darwin20 (Apple M1) in depends (master...m1_support_depends) https://github.com/bitcoin/bitcoin/pull/21851
< jonatack>
update on FuzzedDataProvider::ConsumeEnum(), proposed to allow passing it a std::optional<T> max_value, so ConsumeEnum() may be used without needing to change existing enums.
< vasild>
does it assume contiguous values starting from 0?
< jonatack>
yes, same as before
< jonatack>
only avoids having to add an alias to the enum in the codebase
< vasild>
yeah, can't be otherwise :/
< jonatack>
otherwise, there's the existing ConsumeWeakEnum(values list...
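(For context, a rough sketch of the two options being weighed above, using a simplified stand-in enum rather than the real netaddress.h::Network; the include path and names are illustrative, not a patch to the actual fuzz code.)

    #include <fuzzer/FuzzedDataProvider.h> // or the copy vendored in Bitcoin Core's test/fuzz tree

    // Simplified stand-in for an enum like netaddress.h::Network.
    enum Network {
        NET_UNROUTABLE = 0,
        NET_IPV4,
        NET_IPV6,
        NET_ONION,
        NET_MAX,
        kMaxValue = NET_MAX, // alias that LLVM's FuzzedDataProvider::ConsumeEnum() looks for
    };

    // Option 1: touch the enum (add kMaxValue) and use the stock helper.
    // Note this can also yield NET_MAX itself, since kMaxValue is inclusive.
    Network ConsumeNetworkViaEnum(FuzzedDataProvider& fdp)
    {
        return fdp.ConsumeEnum<Network>();
    }

    // Option 2: leave the enum untouched and pick from an explicit value list,
    // in the spirit of the existing ConsumeWeakEnum(values...) helper.
    Network ConsumeNetworkViaList(FuzzedDataProvider& fdp)
    {
        constexpr Network ALL_NETWORKS[] = {NET_UNROUTABLE, NET_IPV4, NET_IPV6, NET_ONION};
        return fdp.PickValueInArray(ALL_NETWORKS);
    }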
< bitcoin-git>
[bitcoin] jonatack opened pull request #21855: fuzz: enable passing a max value to FuzzedDataProvider::ConsumeEnum() (master...ConsumeEnum-enable-passing-max-value) https://github.com/bitcoin/bitcoin/pull/21855
< hugohn>
@luke-jr the inclusion of things like NO_ENCRYPTION mode and BIP39-like PBKDF2 function parameters in the spec is to help with backward compatibility. IMO the main thing that might not be backward compatible with all vendors is the stateful nature of the Signers, which I have stated in the Compatibility section.
< gmaxwell>
Current bitcoin core causes really old nodes to be unable to sync and to DoS attack you. The issue is that prior to 0.7-ish the size of headers messages wasn't limited, and the node requests headers from everyone. They blast you with a multi-megabyte headers message, disconnect, and go do it to someone else. These old nodes seem to have also formed a fork at an early height, nodes that have this
< gmaxwell>
fork will send it to you and get rejected for sending a fork before the highest checkpoint-- another product of requesting headers.
< gmaxwell>
Fixing this appears to be trivial: Just don't ever request headers from peers that aren't NODE_WITNESS-- the node will never request blocks from them anyways. But I changed this and it breaks like a dozen tests, some of which didn't seem trivial to fix.
< gmaxwell>
Someone who cares about p2p might want to fix this.
< gmaxwell>
(until I stopped sending headers requests to non-NODE_WITNESS peers, this garbage was a few percent of my (pruned) node's bandwidth usage)
< gmaxwell>
(my correct behavior might be slowly fixing the second forked-chain problem since I should be healing the partition)
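(A minimal sketch of the fix described above, with hypothetical types rather than the actual net_processing code: gate the headers request on the peer advertising NODE_WITNESS, since blocks are never downloaded from non-witness peers anyway.)

    #include <cstdint>

    static constexpr uint64_t NODE_WITNESS = (1 << 3); // P2P service bit (BIP144)

    struct PeerInfo {
        uint64_t services{0}; // service bits the peer advertised in its version message
    };

    // Only ask a peer for headers if it can also serve witness data; we would
    // never download blocks from it otherwise.
    bool ShouldRequestHeadersFrom(const PeerInfo& peer)
    {
        return (peer.services & NODE_WITNESS) != 0;
    }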
< gleb>
sipa: thank you! I'm working on more or less finalizing erlay to invite people running nodes. Fully went through antoine's review, gonna make one protocol tweak it seems.
< gleb>
*to invite people to run nodes.
< sipa>
gleb: cool, will run
< glozow>
\o/
< jnewbery>
sipa: are you looking for any specific help?
< sipa>
not in particular, but review is of course always welcome
< sipa>
with this PR https://github.com/sipa/minisketch/pull/20 (making us able to just build field size 32 instead of everything) the binary size goes from 3 MB to 64 KiB
< jnewbery>
gmaxwell: I'm surprised there are a dozen tests with nodes that aren't signalling NODE_WITNESS. Do you have a branch?
< jnewbery>
as far as I'm aware, the only functional test with a node that doesn't signal NODE_WITNESS is p2p_segwit.py (and that should be removed in #21090)
< gmaxwell>
(I say might because the patch I run locally is much larger, but it looks like I previously saved out that chunk while trying to run the tests.)
< gmaxwell>
gleb: what protocol change?
< jnewbery>
Did you try gating on fHaveWitness instead of fClient?
< gmaxwell>
No, though we shouldn't be sending headers requests to non-NODE_NETWORK peers either. But ah, I see your point wrt the implicated tests.
< sipa>
gleb: was it needed to include minisketch in bitcoin-tx?
< gmaxwell>
(my sequence was that I just fixed it for myself to see that it fixed the behavior... then just tried running the tests to see if it looked like it would be easy to submit upstream, it failed a number of tests, so I just dropped it into the patch I carry.)
< gleb>
gmaxwell: Ah hold on, maybe ignore that.
< gleb>
I just realized maybe it's fine all the way.
< gleb>
sipa: sorry, are you referring to the way I integrate minisketch in the erlay PR?
< sipa>
gleb: yeah, cherry picking that
< sipa>
just wonder if there was a reason for that
< gleb>
gmaxwell: yeah indeed, I withdraw my comment, no protocol tweaks. Will finish last touches per antoine feedback tomorrow, and freeze the version to run nodes.
< gmaxwell>
gleb: good, because I was about to respond to you here after reading that, saying I didn't understand the issue there. :P
< gleb>
gmaxwell: I'm glad you're following the work anyway.
< gmaxwell>
gleb: I think in general when we talk about erlay (e.g. in the paper and elsewhere) we use this "non-reachable peer" terminology, but non-reachability doesn't exist explicitly anywhere in the implementation of bitcoin core or the bitcoin protocol... so although non-reachable peers were a good concept for modeling, it can be a little confusing when it comes to the implementation.
< gleb>
gmaxwell: Agree, I should think if I could gracefully disconnect the "general intuition" from protocol language.
< sipa>
right, the protocol should only concern itself with "inbound" and "outbound" and "recon sender" and "recon receiver"
< gleb>
Also, I should think about non-reachable nodes with a node behind the same NAT or something, I just realized our current approach might be vulnerable in that case?
< gleb>
Say my "non-reachable" node A (as the peers can easily tell by probing IP) has my own trusted inbound node B and that's the only connection for B, then the peers can easily tell that transactions flooded from B are either from B or from A.
< gleb>
I should admit I never considered this corner case.
< sipa>
hmm, do we just want to disable reconciliation on connections to/from non-public IPs?
< gmaxwell>
that shouldn't be necessary.
< sipa>
i think the problem is: you have internal node A, only connected to bridge node B, which is connected to the public network
< sipa>
if A relays a tx to B, B will treat it the way a reachable node would and flood it further... which it would never do if B were just a non-reachable node without internal nodes behind it?
< gleb>
sipa: right. Or maybe if there are N bridges, but A is still generally unavailable.
< sipa>
sorry, i swapped notation; my A and B are the B and A in your explanation
< gmaxwell>
I don't want to comment much because I've probably forgotten everything, but this seems confused to me. The privacy loss comes from conditional flooding; using the reconciliation itself is just additive and harmless.
< sipa>
gleb: is the logic "only use recon for transaction received from inbound connections" now?
< gleb>
sipa: only use flood in that case.
< gleb>
Ok, there are 2 relevant flags.
< sipa>
gmaxwell: the choice whether to relay through recon or flood reveals information
< gmaxwell>
sipa: It should always relay through recon (otherwise it's potentially just adding missed items to the sets), it might choose to additionally relay through flooding.
< sipa>
gmaxwell: that's not what erlay proposes
< gmaxwell>
I don't remember it being broken. :P
< gleb>
sipa is correct, it was always either or
< sipa>
come on
< gmaxwell>
sipa: failing to add it to recon strictly wastes bandwidth, unless I'm confused.
< gmaxwell>
Quite possible I'm confused.
< gmaxwell>
If you don't add it to recon but your remote peer does, you just end up with a fake difference.
< gmaxwell>
As your remote peer won't make the same flood/no-flood decision you do.
< gmaxwell>
Am I confused?
< sipa>
i'm trying to swap in the whole idea again :)
< gleb>
Yeah same here, gimme a second.
< sipa>
gleb: what happens with locally generated transactions?
< gmaxwell>
I don't think this changes the fundamental issue gleb is looking at though.
< gmaxwell>
But I'm trying to remember how any of this works so every piece of confusion is standing out for me.
< gleb>
sipa: The idea was to always reconcile local transactions. It's done to preserve privacy of non-reachable, but you won't understand why unless you know the full protocol...
< sipa>
gleb: so in which cases do you only flood a tx?
< gleb>
sipa: if the transaction is received from inbound AND the peer is 1 of 8 outbound flooding peers.
< gleb>
The idea is: non-reachable nodes have no inbounds, so they never flood (so local txs should also not be flooded)
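(Roughly, the rule being described could be sketched as follows; the names are hypothetical, not the Erlay PR's actual code: flood a tx to a peer only if it was learned from an inbound peer and the target is one of the 8 outbound flooding peers, otherwise queue it for reconciliation.)

    struct PeerLink {
        bool inbound{false};      // did the peer connect to us?
        bool flood_target{false}; // one of the 8 chosen outbound flooding peers?
    };

    // Decide how to relay a tx to `to`, given where we learned it. Non-reachable
    // nodes have no inbound peers, so under this rule they never flood; everything
    // they relay goes through reconciliation instead.
    bool ShouldFloodTo(const PeerLink& to, bool tx_learned_from_inbound)
    {
        return tx_learned_from_inbound && !to.inbound && to.flood_target;
    }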
< sipa>
which for now is effectively all outbound peers?
< gleb>
sipa: yes, now. And my confusion was the case after the bump, but now I think that case is fine.
< gleb>
The bridge case is not fine though. When this non-reachable is a bridge.
< gmaxwell>
If a transaction that comes in from an inbound peer is only flooded out and not reconciled, how will those txn ever end up in reconciliation? :P
< sipa>
i believe the solution is just changing the criterion to "received from inbound *with public IP* AND ..."
< gmaxwell>
ehhh
< sipa>
but i still want to understand this again
< sipa>
because i don't remember why this flooding is done in the first place
< gmaxwell>
I still don't believe that it floods instead of reconciling rather than in addition to, I think that is either transparently wrong or I'm badly corrupted. :)
< sipa>
gleb: you say "received", is that both when received through flooding and through recon?
< gleb>
sipa: currently there is no distinction, since we added an extra round for INV (just non-duplicate INV). It's possible to compare by short id, but currently I don't.
< gleb>
It all looks like inbound inv.
< sipa>
ok, just trying to reconcile my memory
< gleb>
gmaxwell: I have to think why I didn't do additive
< lightlike>
isn't recon/flooding a property of the connection instead of the transaction? So that a tx received from an inbound would be flooded to outbounds, but reconciled with other inbound peers?
< sipa>
lightlike: no
< sipa>
(well, both)
< gmaxwell>
My stored approximation of erlay is that you reconcile with peers, and also flood each transaction you receive, regardless of how you receive them, with low fanout to outbound peers (obeying the normal rules about not sending a peer a txn you already know they have). I know this isn't exact in the details.
< gleb>
yeah, both.
< sipa>
so the decision to flood and not reconcile should correspond to a prediction that you will have that tx while your peer won't
< sipa>
how does "received from an inbound connection" lead to that prediction?
< gleb>
gmaxwell: The problem with that is that non-reachable nodes will do too much useless flooding. They learn a transaction from 1 reconciliation, and then they flood it to all the rest of their outbound peers?
< gmaxwell>
gleb: If nodeA and nodeB both receive tx X, and nodeA decides to recon it, and nodeB decides to flood it, then the recon wastes bandwidth. The process I understood you to be describing above wouldn't have peers making the same recon decision for the same txn...
< gleb>
gmaxwell: ok, my approach works, and let me explain.
< gmaxwell>
gleb: that is why the flooded relay is low fanout.
< gleb>
Node A doesn't add tx to the set_B because it flooded the tx to B. Node B doesn't add it to the set_A because it was received from A.
< gleb>
That's why both sets won't have the tx at the end
< gmaxwell>
gleb: But node_b also could have received it from someone else first.
< sipa>
yes, it's clear to me you don't want to both flood and add to the recon set, because the peer won't relay the tx back to you, so adding it to the set would hurt reconciliation
< gmaxwell>
sipa: Why wouldn't they add the flooded txn to it, if it was new to them? It's free if both sides have it.
< sipa>
the question is why is there an assumption in this received-from-inbound-and-relay-to-outbound case that there won't be a relay in the other direction (as in: why can't the peer have learned the tx from elsewhere)
< sipa>
gmaxwell: any tx you actually receive announced through flooding you can delete from that peer's recon set (not sure if the current implementation does that), but if it does, that seems strictly better than adding
< sipa>
(in case you flood)
< gmaxwell>
Consider: nodeb gets a txn from c, nodea also gets the txn from c. Nodea elects to flood it to b, nodeb elects to recon it to a. Now there is an excess set difference. If instead nodea flooded it and added it to recon, and b did as well (including for flooded txn received from a), the only time there would be a recon difference is during a race with the flooding.
< sipa>
delete if it was in that set in the first place, of course
< gmaxwell>
okay I agree that would also work.
< gleb>
Yeah, I see the point indeed. We should get rid of this extra assumption
< gmaxwell>
I think they're ... the same? in both cases you get an extra item in recon during a race between a flood and a recon.
< sipa>
gleb: do you do this currently in the code? if i INV a tx to you for whatever reason, and that tx was in your recon set with me, do you delete it from that set?
< gleb>
sipa: no, but I'm realizing now I should.
< sipa>
gmaxwell: yeah, you either want to both add it, or both delete it when flooded- deleting seems slightly more efficient
< gmaxwell>
In any case, the model where everything is recon and some flooding occurs at low order doesn't have the issue that outbound-only nodes behave differently, other than that they just happened to not receive any txn via flooding so they'll get txn later. I'm not arguing that it should work that way, just trying to remember the motivation for it not working that way.
< gmaxwell>
sipa: Agreed. Though I'm not clear why deleting is slightly more efficient? just that never adding it on one side takes less computation?
< sipa>
gmaxwell: yes, it's a ~ns-scale difference
< sipa>
not adding on both sides
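(A hedged sketch of the bookkeeping being converged on here, with hypothetical names: anything announced to or by a peer via a plain INV is erased from that peer's reconciliation set, so it cannot show up later as a spurious set difference.)

    #include <cstdint>
    #include <set>

    using Wtxid = uint64_t; // stand-in for the real transaction id type

    struct ReconState {
        std::set<Wtxid> set_for_peer; // txs queued for the next reconciliation round with this peer
    };

    // We flooded the tx to this peer ourselves: no need to reconcile it later.
    void OnFloodedTo(ReconState& peer, Wtxid wtxid)
    {
        peer.set_for_peer.erase(wtxid);
    }

    // The peer announced the tx to us via INV: it already has it, so drop it.
    void OnInvReceivedFrom(ReconState& peer, Wtxid wtxid)
    {
        peer.set_for_peer.erase(wtxid);
    }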
< gleb>
gmaxwell: well, I chose to reconcile in a queued manner every 2 seconds (16 seconds per peer). And the queue consists of outbounds.
< gleb>
If we stick to these rules, we also want to flood some to make relay across the "backbone" faster.
< gleb>
gmaxwell: how would you suggest to flood?
< gleb>
My suggestion is to flood to a fixed 8 outbounds, and only what was received from inbound. In that case, it just gets relayed through the network of reachable nodes super fast, without non-reachable nodes ever doing useless stuff.
< gmaxwell>
sipa: GOOD. I am glad to agree!
< gmaxwell>
gleb: okay, I don't understand how I was ever okay with that. lol
< gmaxwell>
The issue is that it imposes a strong assumption on the existence of a "non-reachable node" -- which as you are realizing now (and maybe did before?) -- no such clean distinction really exists.
< gmaxwell>
(e.g. the NATed node with some lan peers)
< gleb>
gmaxwell: not really, I never have "if non-reachable". It just makes this implicit categorization by looking at "txs received inbound"
< gleb>
Which is, yeah, not true for NAT stuff.
< gmaxwell>
I don't have any of our old discussions-- lost the server :( -- it might be useful to see how we changed from "flood each new txn to a few outbound peers" to "flood to 8 outbound peers (currently all) but only things learned from inbound peers"?
< sipa>
gmaxwell: by new do you mean "locally created", or "any new tx learned through whatever means at all" ?
< gleb>
gmaxwell: despite us not liking it, non-reachable nodes are the majority of the network. If they flood, it's very likely to be useless, because their public peers likely already know the tx.
< sipa>
(i assume the second, as the first is a pretty bad privacy leak)
< gmaxwell>
gleb: yes, it's often useless. But it is still a constant amount of data per accepted transaction.
< gmaxwell>
(and not an amount that grows with the number of peers it has)
< gmaxwell>
if fanout > 1 the vast majority of flooding will always be useless.
< gleb>
A reconciliation happens every 16 seconds, the flood trigger is every 1/2 seconds or something. If we flood only to outbound, non-reachable nodes will flood almost all transactions to their reachable peers. So the recon set will be empty?
< gleb>
You suggest trying fanout=2 or 3? If fanout=8, we end up getting 0 gain today (just better scaling)
< sipa>
the fact that flooding is only outbound does imply that more well-reachable nodes will learn about transactions faster than less-reachable nodes
< gmaxwell>
Right, at least at one point I know I was thinking in terms of fanout=2.
< gmaxwell>
perhaps we should reconsider that asymmetry, and instead do something like flooding only happens in one direction on each link, and whichever side knew more transactions in the last recon is the potential flooder.
< sipa>
so in that sense i can see how deciding to not flood outbound-received transactions may be a useful optimization... the network structure implies that you're receiving generally from a group of peers who likely are better connected than you are
< gmaxwell>
sipa: yes I am starting to see how things ended up here.
< sipa>
i don't really see the problem with it
< gleb>
And we can always switch to something different if at some point NAT landscape changes?
< gmaxwell>
Basically nowhere else in the bitcoin protocol is there this in/out distinction -- just in tx relay privacy timers (which were 'recently' introduced relatively speaking) and in peer eviction.
< gmaxwell>
for one thing it takes 80% or whatever of bitcoin nodes out of participation in rapidly forwarding transactions, which seems ... like a really big step to take without an obvious reason.
< gmaxwell>
I'm not sure why we'd intentionally do that.
< gleb>
gmaxwell: the timings were 1s for 95% reachable, 5s for 95% non-reachable. Something along those numbers.
< gleb>
(the time it takes to spread a transaction). Saying "rapidly forwarding" is not so different
< gleb>
Again, assuming the topology.
< gmaxwell>
Like, have we made a series of single steps that were logical but give a weird outcome? Step 1. relay faster to outbound peers because we control them, so they're less likely to be spies monitoring our txn. Step 2. Introduce reconciliation, but make flooding only happen for reachable nodes because it has some moral similarity to step 1. Conclusion: 80% of bitcoin nodes no longer participate
< gmaxwell>
in fast propagation? But that wasn't the original goal, it seems like a side-effect though perhaps a benign one?
< gmaxwell>
but not completely benign since now we have problems with txn originating 'near' these flooding non-participants.
< gleb>
gmaxwell: I believe I made those steps, I even had 1-3 steps above. I always tried to keep you and pieter in sync, but you might have got lost at some point.
< sipa>
but even without the asymmetry in flood relay speed between outbound/inbound, it is the case that better connected nodes (which is correlated with connections to them) will learn about transactions faster
< gmaxwell>
sipa: indeed, that was why the strawman alternative I suggested above elected whichever direction sourced more txn to be the flooding direction.
< sipa>
figure 19 in the paper shows bandwidth/latency simulations for a number of configurations
< gmaxwell>
gleb: I probably wasn't lost, I probably was right there with you at the time, following along with whatever reasoning got us there.
< gmaxwell>
:P
< gmaxwell>
but I don't really get the reasoning now. :(
< sipa>
"flooding is not as useful to peers which are better connected than you are" is the summary i think
< gmaxwell>
sipa: re: better connected, esp if outbound connections are increased, it's not even unlikely that a NATed node could end up better connected than some arbitrarily selected inbound-only node.
< gleb>
sipa: in those experiments, I never tried to pick the flood peers in a smart way. It was always N, M% random across inbound/outbound.
< sipa>
gleb: it is surprising to me that in fig 19 in the paper, increasing the number of outbound flood peers improves both bandwidth and latency
< sipa>
gmaxwell: that's fair
< gleb>
sipa: I can't explain that now.
< sipa>
gleb: is that graph using the same logic (only flood transactions received on incoming connections)?
< gleb>
sipa: yes I think so.
< sipa>
that suggests that that decision (relay tx received via inbound through flooding to outbounds) is (in your simulation model) a very good predictor of who won't have a tx already
< sipa>
basically you want to use flooding when you have reasons to believe you had a tx earlier than the recipient, while recon is for when you and your peer are equals
< gmaxwell>
maybe I should step back and also try to explain my general unease with the hard in/out split. Essentially, it makes a strong assumption on the global structure of the network. Recently in dogecoin there were network collapse issues related to a similar asymmetry during IBD... where almost the entire network was all in IBD because it had fallen behind, and as a result wouldn't serve blocks
< gmaxwell>
and only requested them from peers they connected out to (which were mostly themselves in the same broken state)
< sipa>
and i guess "receiving through inbound" is a very good predictor for that (again, in your model, which may not actually correspond to reality)
< gmaxwell>
sipa: If you flood txn received through flooding, then indeed, those are going to be the txn that are new. I don't actually know that in/out really matters for that to be true, except when only flooding to outbound makes it true.
< gleb>
sipa: yeah, and that's achieved in the following ways. Flooding is mainly used across reachable nodes. We cap at 8 outbounds, so even though "having it earlier" is only true 50% of the time, that's fine because it's capped.
< gleb>
sipa: For reconciliation it matters mainly for non-reachable nodes, and efficiency is achieved by reconciling every 2 seconds via a queue. When you hit the 2*8=16 second interval for a given peer, you have already reconciled with the 7 other peers since last time, so you are almost equal to that 8th peer.
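(A toy illustration of the schedule described above, hypothetical structure rather than the PR's code: one reconciliation is initiated every 2 seconds, round-robin over the 8 outbound reconciliation peers, so each individual peer is reconciled with roughly every 2 * 8 = 16 seconds.)

    #include <cstddef>
    #include <vector>

    struct ReconQueue {
        std::vector<int> outbound_peer_ids; // e.g. the 8 outbound reconciliation peers
        std::size_t next_index{0};

        // Called on a 2-second timer: returns the peer to reconcile with on this
        // tick. With 8 peers and a 2 s tick, any given peer is visited roughly
        // every 16 seconds.
        int NextPeerToReconcile()
        {
            const int peer = outbound_peer_ids[next_index];
            next_index = (next_index + 1) % outbound_peer_ids.size();
            return peer;
        }
    };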
< gleb>
Sorry for keeping the reachable terminology, I'm just explaining how sipa logic maps to my design.
< gleb>
Yeah, if the topology changes, it would no longer be as efficient. We might see more bandwidth (unlikely more than currently)
< gmaxwell>
that's fine. my comment about the term being confusing was because it looked like it was causing confusion in the review.
< gleb>
well, i mean you also would prefer to avoid reasoning about it in the design i thought?
< gmaxwell>
gleb: like I think with the currently proposed design, the bandwidth per node on reachable nodes may go up a lot... because non-reachable nodes, which previously participated actively in forwarding, turn into passive participants and don't forward much txn at all.
< gleb>
gmaxwell: you mean sending tx bodies?
< gmaxwell>
gleb: maybe, that's my impression now but it could just be that I forgot why I used to think it was okay. But I'm not confused by its current existence in the design
< gmaxwell>
yes
< sipa>
perhaps it is the case that even today most relay done by non-reachable nodes doesn't actually contribute much to tx propagation?
< gmaxwell>
nah, it's like a 2:1 asymmetry as seen on my nodes.
< sipa>
by "contribute" in don't mean in terms of bandwidth, but in terms of how much propagation speed would suffer if they stopped
< gmaxwell>
(if you go through your IRC logs with me you'll see maybe a year ago I saw that and thought something was wrong because it was so asymmetric, then realized it was just that nodes had finally updated to the most recent broadcast timing logic)
< gmaxwell>
sipa: I actually mean bandwidth though both might be true.
< gleb>
gmaxwell: yes, I think this is very true. Reachable will take more work on forwarding tx bodies, making it even more asymmetrical. I think that's fair to expect.
< sipa>
i don't see why that would be the case
< gmaxwell>
just thinking through the implications of taking the majority of nodes out of the general forwarding process.
< gmaxwell>
sipa: for discussion, assume 80% of nodes are behind NAT. Right now, they probably carry the majority of the work transmitting txs (yes, they're slower, but there are many more of them). Post erlay, they won't relay tx bodies except fairly rarely.
< gleb>
sipa: consider an average non-reachable node. The time between any reconciliations for it is 2 seconds. And the time of relaying across *all reachable* is 1 second. I think this implies most tx body work is done by reachable.
< gmaxwell>
^
< sipa>
ok let me be more specific about what i'm not convinced of
< gmaxwell>
that doesn't seem desirable to me, ... some asymmetry is unavoidable because they have fewer connections each.
< gleb>
I probably was like "but they will still save a lot of announcements, so it's fine to increase tx body work"
< gmaxwell>
gleb: I don't think I ever considered that particular aspect before.
< gmaxwell>
(lol well, I really don't know what I thought before, it's been too long)
< sipa>
i believe it is possible (but don't know) that if we changed the code today so that non-reachable nodes drastically reduce how much tx announcing they perform, then (a) bandwidth usage of reachable nodes won't be affected much and (b) propagation delays won't suffer much
< sipa>
basically i'm saying that it could be the case that effectively a lot of tx propagation work performed today by non-reachable nodes is redundant
< gleb>
I think greg's 2:1 figure doesn't agree with you.
< gleb>
I assume, non-reachable sends 0.5x of reachable's tx body traffic?
< gmaxwell>
sipa: sending tx _bodies_ is never redundant.
< gmaxwell>
lemme go get current figures one sec.
< sipa>
gmaxwell: ah good point, every body only arrives at every node once
< sipa>
whatever non-reachable nodes are doing in that regard must be compensated by others
< sipa>
ok, that's convincing
< gmaxwell>
crap I lost my shell history on my node and now need to remember how to use jq... I want to look at the ratio of tx message sizes on inbound peers.
< gmaxwell>
(it's just data in getpeerinfo)
< gleb>
So potentially 2 problems with current design: 1) NAT privacy (could probably be fixed) 2) deepening the asymmetry
< gmaxwell>
my very bad informal memory says that it was asymmetric, but except for bognon peers not THAT asymmetric.
< gmaxwell>
bogon*
< gmaxwell>
also: fwiw, we created most of this asymmetry 'recently' with the relay grouping stuff... and it wasn't an intended effect. When I did notice it I was confused/surprised. Though I don't think the current level is a problem.
< gmaxwell>
I had automation that was identifying spies to ban based in part on that ratio, and I noticed when it started flagging lots more peers.
< gleb>
gmaxwell: was it symmetrical before groups? Even when we just had 2s/5s independent timers?
< gmaxwell>
gleb: So, it wasn't asymmetrical enough for me to notice, but I think I was treating 2:1 as the threshold for unusual. It's a little hard to say for absolute sure because deployments happen over time.
< gmaxwell>
It's not impossible that the independent 2/5sec change was the cause, and it just took a long time to be deployed enough for me to notice (or some other factor made it more visible). But I think it's likely that making all inbounds share a group was the cause.
< gmaxwell>
I didn't *really* realize something was going on that I didn't understand until I saw a big asymmetry on a connection between sipa's node and mine.
< gmaxwell>
which I couldn't dismiss as being some weird peer.
< gmaxwell>
well, sipa is weird but his bitcoin node isn't. :P
< gleb>
gmaxwell: sad we didn't notice that during the shared timer design, that's the stuff i like to model... but probably didn't know how to at the time
< gleb>
I think that could have been my first PR to core in 2018.
< gmaxwell>
Well you don't know what you don't know.
< gmaxwell>
You could easily model the txdata burden, I just never thought of it.
< gmaxwell>
The tx serving burden has come up before, but in a different context-- like if you get multiple INVs for the same tx at almost the same time, it's better to pick the source more randomly rather than always go to the first.
< gmaxwell>
sorry.
< gleb>
But that stuff we also changed in the context of invblock or tx overhaul or something recently? :)
< gleb>
like, this exact suggestion, making it random and not first received
< gleb>
So, if we want to make it more symmetrical, we indeed could direct some flooding toward inbound peers, and independently of the tx source (maaaybe based on prev performance). I expect we lose some overall efficiency that way, but if that's desired.
< gleb>
Yeah, that's exactly figure 19, except it chooses the flood peers randomly.
< gmaxwell>
these need to be normalized though because naturally you'll only receive once.
< gleb>
sipa: I'm looking at my simulator, and I'm realizing I had no "received from inbound" policy at the time. Same in the paper. It was always just a "flood to 8 outbounds" policy.
< sipa>
gleb: that's surprising
< gmaxwell>
that is anti-surprising to me! :P
< sipa>
it conflicts with my intuition for why increasing flooding reduces bandwidth
< gleb>
oh, hold on, sorry, the simulator is old code and hard to understand now :(
< gleb>
It seems like at the time the idea was to only flood what's announced by reconciliation initiator to reconciliation responder (not backwards). Sorry again for this thought bouncing. Yeah, it seems i always had this idea in mind: try not to flood from non-reachable nodes.
< gleb>
And since inbound always initiates, we effectively announce "what is learned from inbound". The latest simulator seems to have that.
< gleb>
Anyway, so should we try other configurations again and see how much bandwidth gain we lose by not making topology assumptions?
< sipa>
just trying "flood relay everything to 2-3-4-5-6-7-8 outbound peers" ?
< gmaxwell>
sipa: so my node has 65 peers that I've exchanged txn with, and overall has sent 3.26x the tx data it has received. I think that ratio is pretty low, considering it only receives once but sends up to 65 times.
< gmaxwell>
gleb: also maybe add to the simulator something to get information on how much tx body data will be sent/received? I'm pretty sure the current scheme will behave pretty poorly in that respect.
< gleb>
gmaxwell: you mean poor in the sense of asymmetry?
< gmaxwell>
gleb: yeah, poor in the sense that total traffic on reachable nodes would increase a lot, while traffic at each non-reachable node decreases a bit.
< gmaxwell>
(because the latter outnumber the former)
< gleb>
Okay, maybe it's time to re-implement the simulator so that other people can actually review and use it easier. I'll start looking into it tomorrow.
< gmaxwell>
:( I feel bad.
< sipa>
this is a fairly simple policy change, so i don't think this needs to hold up code review much
< sipa>
(if we'd make it)
< gmaxwell>
yea I don't think it really changes the implementation much except for a few lines here and there.
< sipa>
oh, and don't forget the "remove from recon sets whatever is sent/received through flooding" - i think that matters
< gleb>
sipa: i agree, most of the module remains the same, this stuff is even outside the module.
< gleb>
sipa: i noted it, don't worry
< sipa>
gleb: thanks!
< gmaxwell>
in the meantime, minisketch could get merged, and dropped from the erlay PR. That would be nice progress.
< gleb>
okay, thank you, i'll come back once i have the results. I agree on the plan ^
< gmaxwell>
sipa: I have this vague idea that since my tx body ratio is 3.26x, my inv data ratio should also be 3.26x, ideally. (of course it won't be, it'll be between 32 and 64x because I have 64 peers and invs are flooded)
< gmaxwell>
oh, no it won't -- the inv ratio will be near 1, but both sides will be 64 times larger. duh.