< achow101>
I was getting a bunch of errors earlier
< jonasschnelli>
sipa, gmaxwell: I think we should keep 2^31 for the maximal message size, reducing to 3bytes (23bits + rekey bit) seems too small and not worth saving a byte
< jonasschnelli>
BTW: -maxreceivebuffer is also respected with the new message types... (this is a level deeper, more on the socket side)
< jonasschnelli>
Ah... nm. I was wrong. The buffer check only happens when a complete message has been loaded
< wumpus>
why would you ever need such a big message? in my experience it's usually better to split something up into smaller messages in that case
< wumpus>
(for example, to be able to continue an interrupted transfer; if it's all-or-nothing that won't work)
< jonasschnelli>
3 bytes minus one rekey bit results in a max size of 8’388’608 bytes (hardcoded in the protocol). I agree that we want smaller chunks enforced on the application layer... but unsure about the protocol layer
< wumpus>
so if you really need >24MB network messages, that suggests to me that something is wrong with the design at a higher level
< wumpus>
right
< wumpus>
if you also want to cram other data into the length word, it changes, though you could reserve the bits instead
< jonasschnelli>
Yes. Thats also an idea.
< jonasschnelli>
I also fail to see why someone should not be able to mem-DoS a current peer with an incomplete message up to MAX_SIZE (33'554'432)...
< jonasschnelli>
The -maxreceivebuffer hits only when the message is complete
< wumpus>
so still leave 24 bits for the message length, a bit for rekey, and leave the rest for future use--it seems a bit wasteful but if your protocol is byte-based you don't get around such things
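The packed length word described above can be sketched as follows. This is a hypothetical layout (24-bit length, one rekey bit, remaining bits reserved, in a 4-byte little-endian word); the actual BIP encoding may differ, and the function names are mine:

```python
# Hypothetical v2 length word: bits 0-23 = payload length,
# bit 24 = rekey flag, bits 25-31 = reserved for future use.
MAX_LEN = (1 << 24) - 1

def pack_length_word(length, rekey):
    # Refuse lengths that do not fit the 24-bit field.
    if not 0 <= length <= MAX_LEN:
        raise ValueError("length exceeds 24-bit field")
    word = length | (int(rekey) << 24)
    return word.to_bytes(4, "little")

def unpack_length_word(data):
    word = int.from_bytes(data[:4], "little")
    return word & MAX_LEN, bool((word >> 24) & 1)
```

A byte-based framing like this spends whole bytes on the header either way, so reserving the unused bits costs nothing extra on the wire.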
< wumpus>
right
< wumpus>
though as attacker you have to actually send the data, so if so it's a very slow DoS, but yes DoS potential is another good reason to not allow very large messages
< jonasschnelli>
[22:56:32] <gmaxwell>I think as you have it now thats probably a trivial DOS attack, e.g. I can make your node use 32mb * npeers ram... so 4GB additional usage in the default configuration
< jonasschnelli>
wumpus: I think that attack is possible today
< jonasschnelli>
npeers * MAX_SIZE
< wumpus>
yes
< jonasschnelli>
The question is, why is MAX_SIZE 0x2000000
< jonasschnelli>
Ah. That constant is also used at http level
< jonasschnelli>
and other places
< wumpus>
for http it's pretty arbitrary
< wumpus>
but less important, as the http interface is meant for internal use, there are probably tons of ways to DoS it
< jonasschnelli>
I mean as an attacker, you can create an IPv6 peer, open 128 connections to a peer (assuming the remote peer has no used ports) and send the same incomplete 32MB message over each connection, consuming 4GB
< wumpus>
as buffers are per connection, the only way around that would be to account, on a process level, for the total size of those buffers and go into some kind of DoS mode if too much is consumed in total
< wumpus>
you *need* a fairly large receive buffer (though maybe not that large) to handle all the current messages, multiplied with 120+ that's always going to be a lot !
< jonasschnelli>
I think there should be a combined view on those buffers and, as you said, some sort of DoS prevention mode or a max total buffer size that makes per-connection queues pause
< wumpus>
indeed
< wumpus>
one way to handle going over the total buffer cap would be to simply stop reading until it's processed, another way would be to drop connections with low priority and large buffers
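The "stop reading until it's processed" approach can be sketched as process-wide accounting over the per-connection buffers. All names here are hypothetical; this is only a sketch of the back-pressure idea, not Bitcoin Core's implementation:

```python
# Sketch of process-wide receive-buffer accounting: a connection may
# keep reading only while the global total is under the cap.
class RecvBufferAccounting:
    def __init__(self, total_cap):
        self.total_cap = total_cap
        self.per_conn = {}  # connection id -> currently buffered bytes

    def add(self, conn_id, nbytes):
        # Called when bytes arrive and are buffered for a connection.
        self.per_conn[conn_id] = self.per_conn.get(conn_id, 0) + nbytes

    def release(self, conn_id, nbytes):
        # Called when a complete message has been processed and freed.
        self.per_conn[conn_id] -= nbytes

    def total(self):
        return sum(self.per_conn.values())

    def may_read(self, conn_id):
        # Back-pressure: pause all reads once the global cap is reached.
        return self.total() < self.total_cap
```

The alternative mentioned (evicting low-priority connections with large buffers) would replace the pause in `may_read` with a victim-selection pass.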
< jonasschnelli>
Yes.
< wumpus>
might be out of scope for what you're doing; *not making it worse* would be nice
< wumpus>
(re: bitcoin P2P specifically: do we ever get *unexpected* big messages?)
< wumpus>
my intuition would be that the larger messages are always replies to requests such as getblock, getaddr, etc
< jonasschnelli>
The buffer attack problem is orthogonal to the v2 protocol I'm implementing
< wumpus>
if so, someone connecting to you and instantly offering a large message should probably be blasted
< wumpus>
yes, agree
< jonasschnelli>
I guess the attack would be: a) connect with a bunch of fake peers, do a proper version handshake, send an incomplete 32MB message (only the header will be parsed) via all connections.
< jonasschnelli>
The attacked peer could reject messages based on per command max sizes
< jonasschnelli>
But then, unknown messages.. (unknown commands)
< wumpus>
well if someone wants to send you an unknown message, out of the blue, larger than N -> bye
< wumpus>
there's some scope for behavior modeling here, though I think keeping track of total buffer sizes is more predictable and easier to implement
< jonasschnelli>
Yes. That is implemented, though N is 0x2000000
< jonasschnelli>
wumpus: yes. agree
< wumpus>
it's even possible that libevent already has this functionality, as it's not protocol specific
< jonasschnelli>
good point...
< jonasschnelli>
Anyone know how I can solve this gcc -fPIC error while compiling test_bitcoin or bench_bitcoin:
< jonasschnelli>
CXXLD test/test_bitcoin
< jonasschnelli>
/usr/bin/ld: crypto/libbitcoin_crypto_base.a(crypto_libbitcoin_crypto_base_a-chacha.o): relocation R_X86_64_32 against `.rodata.cst16' can not be used when making a PIE object; recompile with -fPIC
< jonasschnelli>
Is it because I added extern C code?
< jonasschnelli>
I think it's clang not gcc, sry
< wumpus>
"You can assign bufferevents to a rate limiting group if you want to limit their total bandwidth usage." seems like a different thing though
< wumpus>
rate limiting versus total buffer limiting
< wumpus>
jonasschnelli: you're probably linking to a static library that wasn't compiled with -fPIC
< wumpus>
either link to a dynamic library instead or re-compile it with -fPIC
< jonasschnelli>
wumpus: could it be because test_bitcoin has LDFLAGS -static? It works on master, but when I add my chacha20/poly1305 code to LIBBITCOIN_CRYPTO, it fails
< wumpus>
on a general level it happens when some objects are built without -fPIC while requesting a position independent library/executable at link time-- '-static' will, on most platforms, generate a position dependent executable so no I don't think that's the reason
< wumpus>
in this specific case, it looks like crypto_libbitcoin_crypto_base_a-chacha.o is built without -fPIC so the linker cannot include it
< jonasschnelli>
wumpus: thanks... I'll take a closer look
< jonasschnelli>
wumpus: I guess I found it: the added code (chacha/poly1305) is in .c files and compiled with CC (CFLAGS), not with CXX
< wumpus>
ahh the ancient problem
< luke-jr>
can probably just rename to .cpp?
< jonasschnelli>
luke-jr: trying right now...
< wumpus>
yes, 'porting' it to C++ might be the most straightforward solution, unless you need to do it repeatedly (such as when there's an upstream)
< jonasschnelli>
I guess porting is fine... but the compile cache is killing me
< wumpus>
both the name of the file and the compile flags will have changed, why would the cache pick that up?
< wumpus>
please, don't do this: #14087. We have enough PRs to cope with
< wumpus>
the code is correct and clear, no need to fiddle with it
< wumpus>
ok, apparently the code was not correct and clear, it was assigning nullptr to a std::function, sorry
< wumpus>
btw: I intend to change my git mail to laanwj@protonmail.com (from laanwj@gmail.com), is this going to give any problems or set off alarms? (I'll keep using the same gpg key FWIW)
< gmaxwell>
02:07:36 < jonasschnelli> sipa, gmaxwell: I think we should keep 2^31 for the maximal message size, reducing to 3bytes (23bits + rekey bit) seems too small and not worth saving a byte
< gmaxwell>
jonasschnelli: I agree that the byte is unimportant, but we will never have such big messages.
< wumpus>
indeed
< gmaxwell>
Even sending around 2MB messages is pretty terrible. Anything where a message would be that big should be split into smaller messages.
< gmaxwell>
I would say that we should introduce new messages to do this for blocks already, except that other than in IBD/catchup blocks are almost never sent whole anymore.
< wumpus>
yep, said the same thing
< gmaxwell>
E.g. an N-way split block message could send the merkle branches for the remaining parts, so you can verify the merkle root for each chunk as you get it, and not bother buffering something that is never going to be useful.
< wumpus>
similar to how bittorrent uses merkle trees IIRC
< wumpus>
but yes, good idea
< jonasschnelli>
yes. agree.
< gmaxwell>
Really the message sizes should be driven by being large enough to make overheads low, and large enough to usefully process something incrementally (it's not so useful to send a message where we can do nothing at all with it except wait for more), and otherwise as small as possible to avoid creating large buffering and head-of-line blocking problems.
< sipa>
wumpus: add a new uid to your gpg key and upload it somewhere
< achow101>
waxwing: that could happen in the case that someone adds a non-witness utxo to a witness one because it is p2sh wrapped. Then someone else could, seeing that there was no witness utxo, add a witness one without removing the non-witness one
< achow101>
that would of course be an implementation error, but it should still be a case that needs to be handled
< sipa>
waxwing: both are added, and the irrelevant one is later deleted (because it's the signing code that determines which one was needed)
< sipa>
it will never make it out to the serialization
< waxwing>
it's technically more lax than what was spec'd, right? which was to reject anything with both. but the example you give is a plausible case for allowing it.
< sipa>
waxwing: this is just internals
< sipa>
it's not observable behavior
< waxwing>
ah ok, thanks
< sipa>
(scroll down a bit, see lines 280-287)
< gmaxwell>
jonasschnelli: re memory usage, see MAX_PROTOCOL_MESSAGE_LENGTH. -- you can't cause 32MB usage today.
< gmaxwell>
I think if we were to redo things from scratch, we'd end up with a 256KB maximum message size or something like that. The only reason I don't suggest making the encrypted transport limited to that sort of size is because we'd have to introduce split block messages first.
< gmaxwell>
and as I mentioned before, due to compact blocks I think it isn't particularly important to add split blocks.
< ryanofsky>
MarcoFalke, any suggestion on how to handle "AppVeyor was unable to build non-mergeable pull request" in 9381? there are no conflicts. maybe close and reopen?
< gmaxwell>
I recommend a goat sacrifice.
< wumpus>
yes, a goat should do in this case
< luke-jr>
who has one? someone is more rural than me?
< Chris_Stewart_5>
any idea what is going on with testnet?
< Chris_Stewart_5>
or ¯\_(ツ)_/¯
< jonasschnelli>
gmaxwell: oh. Right. I missed MAX_PROTOCOL_MESSAGE_LENGTH.
< jonasschnelli>
That check is also there for the v2 protocol (in my PR)
< jonasschnelli>
But it's imprecise in v2 because one package can contain multiple messages (and this checks a "package")
< gmaxwell>
We might want to log the command, length, number for each message we send, and see if the ability to batch multiple messages even potentially pays for the extra 3/4 bytes of length.
< gmaxwell>
jonasschnelli: what do you mean that it's imprecise?
< gmaxwell>
The package payload should be limited to MAX_PROTOCOL_MESSAGE_LENGTH.
< jonasschnelli>
gmaxwell: if you combine 10 blocks into a single encryption package it gets rejected
< jonasschnelli>
gmaxwell: Yes. I agree.
< jonasschnelli>
I guess it's a wording thing...
< jonasschnelli>
message is IMO the v2 inner message (the actual command&size&payload)
< willyko_>
hi all, i'm wondering what's the current gitian build tech stack people are using right now? i haven't been able to gitian build since the bionic upgrade
< gmaxwell>
Right good. I understand, you're just saying that by applying it as max package length you can't always use a message of that size.
< jonasschnelli>
Where the outer message is the "package"
< jonasschnelli>
If combining messages becomes a thing, people would expect that MAX_PROTOCOL_MESSAGE_LENGTH is per inner message
< gmaxwell>
jonasschnelli: if you would otherwise buffer more than MAX_PROTOCOL_MESSAGE_LENGTH then that just goes into another package.
< jonasschnelli>
gmaxwell: but you need to read-process the whole package to check the MAC before you can look at the inner messages
< gmaxwell>
Yes. I am referring to the sender.
< jonasschnelli>
oh. yes.
< jonasschnelli>
Yeah. The question is if we want to drop the multi-message thing and save 4 bytes (and some complications in size checks)
< jonasschnelli>
I guess originally cfields came up with that idea
< jonasschnelli>
(the idea to use multi-message envelopes)
< gmaxwell>
But I'm wondering if the ability to put multiple messages in a package even makes sense. It adds 4 bytes (maybe we change to 3) overhead to every message, ... but only saves 16 bytes when we bundle. And except for ping no common message is especially small.
< gmaxwell>
A headers message, we want to minimize latency, so it will almost never be bundled.
< gmaxwell>
INVs get batched internally.
< cfields>
jonasschnelli: I believe I argued against :)
< cfields>
jonasschnelli: iirc my request was that IF we bundle, to throw an index up-front so that we could do optimistic preallocation
< gmaxwell>
It seemed like sipa was for some reason thinking that we could incrementally process inner messages before buffering the whole thing. But we can't or otherwise that moots the MAC.
< jonasschnelli>
Yeah...
< jonasschnelli>
I say drop it (the envelope/inner-message wrapping)
< jonasschnelli>
And reduce the length to 3 bytes (minus the rekey bit), resulting in a 2^23 (8’388’608) max message size
< gmaxwell>
Does the inner length as done now include the variable length command?
< jonasschnelli>
no
< gmaxwell>
It should. So you don't have to read the command in order to process the mac.
< gmaxwell>
in the above the crypto layer just becomes completely oblivious to commands. it's all just payload to it.
< gmaxwell>
how is the variable length command implemented? the bip just says varlen which is not very clear.
< jonasschnelli>
varstr (a varint plus a string)... I think in practice always one length byte plus the string
< jonasschnelli>
0x03 'i' 'n' 'v'
< gmaxwell>
yes, it should be at most one byte. Allowing more to be encoded is just begging for bugs.
< jonasschnelli>
Yes. We could enforce a one-byte length in the spec
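The one-byte varstr command encoding under discussion (e.g. `0x03 'i' 'n' 'v'`) can be sketched as below. The function names and the 12-byte upper bound are taken from the discussion; this is an illustrative sketch, not the final BIP wire format:

```python
# Sketch: command encoded as a single length byte followed by the
# ASCII command string, e.g. b"\x03inv" for "inv".
def encode_command(cmd):
    raw = cmd.encode("ascii")
    if not 1 <= len(raw) <= 12:
        raise ValueError("command must be 1-12 bytes")
    return bytes([len(raw)]) + raw

def decode_command(data):
    # Returns (command, bytes consumed); rejects lengths a parser
    # would otherwise have to guard against.
    n = data[0]
    if not 1 <= n <= 12 or len(data) < 1 + n:
        raise ValueError("invalid command encoding")
    return data[1:1 + n].decode("ascii"), 1 + n
```

Capping the length byte at 12 follows the point made below: make bad values impossible to encode rather than asking every parser to defend against them.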
< gmaxwell>
We could also change to a command byte where values 0-12 mean 0-12 byte explicit commands, and other values are mapped to some existing commands (like ping/inv/header) and the rest are undefined.
< gmaxwell>
but at least the size should just be made a byte.
< gmaxwell>
in general we should watch for cases where a parser would need to impose limits to protect itself from bad values, and just make those bad values impossible to encode.
< jonasschnelli>
Yeah.. would save another byte. :)
< gmaxwell>
well for ping the packed commands would save 4 bytes. For inv 3 bytes. for headers it would save 7 bytes...
< jonasschnelli>
gmaxwell: Aha. I see now. Dropping the str-based command for a number-to-command mapping...
< jonasschnelli>
But it smells a bit of central planning (I guess developers will then tend to assign new values to new commands)
< gmaxwell>
jonasschnelli: str-based exists for some messages; it's needed for extensibility.
< jonasschnelli>
Yes. Sure. We can set 0xFF for str or something
< jonasschnelli>
or as you said, just use 0-12 for the very known ones
< gmaxwell>
Well I was suggesting that we have values 1-12 be for strings of length 1-12.
< gmaxwell>
right.
< jonasschnelli>
Or that. Yes.
< gmaxwell>
then use values for the few very common small protocol messages (ping, inv, header probably) and the rest just stay strings.
< jonasschnelli>
Jup. I like that.
< gmaxwell>
as far as central planning goes there isn't a reason that nodes couldn't use different numbers on different links.
< gmaxwell>
we wouldn't implement support for that unless needed, but it wouldn't be a problem.
< gmaxwell>
the 1-byte commands would really only be needed for commands which are frequent and small... So, for example, if we introduced a new kind of headers message, we might want to use one for that.
< gmaxwell>
But I don't see any reason to use a 1-byte command for a block.
< jonasschnelli>
Your argument is then more about latency than bandwidth, right?
< jonasschnelli>
I guess if your peer gets heavily used for IBDing, saving a couple of bytes on the block command has the same effect as on the inv command (in terms of total bandwidth)
< gmaxwell>
But say, for example, we decide we want to have a HEADERS2 message coded as value 27 in bitcoin core. We're going to negotiate its activation with something like SENDHEADERS. Say another implementation BitcoinConnectUnlimitedClassic (BCUX) wants to use 27 for XHEADERS. An implementation could support talking to both kinds of peers just by assigning the meaning of 27 based on the negotiation.
< gmaxwell>
jonasschnelli: not in terms of percentage, however.
< gmaxwell>
Over the life of a node, even one that serves IBDing peers, it should service a lot more INV messages than block messages.
< jonasschnelli>
agree
< gmaxwell>
we can even specify that any usage of short types other than the ones initially set should be negotiated. So then there is basically no collision problem.
< jonasschnelli>
By negotiating, you mean some "proprietary" message?
< gmaxwell>
I mean just that you don't use them unless you know the other side can handle them, based on something else they've sent.
< jonasschnelli>
Yes. We should mention that...
< gmaxwell>
So for example, we'll likely introduce a new ADDR message in the future. So your peer can send you a SENDADDR2 command, and then after they get that they know they can send you value=22 messages, per the ADDR2 BIP.
< jonasschnelli>
Got it. Yes.
< jonasschnelli>
gmaxwell: since the length is now capped at 2^23, what would decomposing a message distributed over two packages look like?
< gmaxwell>
That's specific to the message. It should be decomposed in a way that allows incremental processing.
< jonasschnelli>
Aha.. PARTIALBLOCK or something
< jonasschnelli>
Good. Then there is no extra handling on the transport layer.
< gmaxwell>
So for example, say we wanted to support breaking blocks up: you'd split a block into N messages, where the first message might be the header, tx count, and N hashes, the N hashes being the merkle roots for the N chunks of the block. Then additional messages for each chunk. But the transport layer doesn't care.
< gmaxwell>
The rationale is that the right splitting is message-type dependent.
< gmaxwell>
For an INV, for example, there is _just no reason_ to ever make a single INV that big, just make another INV.
< gmaxwell>
so the splitting for INVs is more invs.
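The chunked-block idea above (a first message committing to per-chunk merkle roots, so each chunk can be verified as it arrives) can be sketched with Bitcoin-style double-SHA256 merkle trees. This is only an illustration of the commitment scheme; no such message exists, and the function names are hypothetical:

```python
import hashlib

def dsha256(b):
    # Bitcoin's standard double-SHA256.
    return hashlib.sha256(hashlib.sha256(b).digest()).digest()

def merkle_root(hashes):
    # Bitcoin-style merkle: pair up hashes level by level,
    # duplicating the last hash when a level has odd length.
    level = list(hashes)
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [dsha256(level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]

def chunk_roots(txids, n_chunks):
    # Split the txids into n_chunks pieces and commit to each piece
    # separately, so a receiver can verify chunk messages incrementally
    # instead of buffering the whole block first.
    size = -(-len(txids) // n_chunks)  # ceiling division
    return [merkle_root(txids[i:i + size])
            for i in range(0, len(txids), size)]
```

A receiver holding the announced `chunk_roots` could drop any chunk whose recomputed root does not match, without ever buffering the full block.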
< gmaxwell>
jonasschnelli: interesting to note: 4(magic) + 12(command) + 4(length) + 4(sha2) = 24 which is more than 3 (length) + 1(type) + 16(mac) = 20 for messages that can use the 1-byte type.
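The overhead comparison above can be sanity-checked (figures as stated in the discussion; the v2 side assumes the 3-byte length plus 1-byte short type under consideration):

```python
# Per-message framing overhead, per the figures in the discussion.
v1_overhead = 4 + 12 + 4 + 4   # magic + command + length + sha2 checksum
v2_overhead = 3 + 1 + 16       # 3-byte length + 1-byte type + 16-byte MAC
savings = v1_overhead - v2_overhead  # bytes saved per message
```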
< Chris_Stewart_5>
luke-jr: I could get my hands on a goat -- don't think i could sacrifice it though.