< Jmabsd>
wait, where in the code again is the logic for checking if the PoW is on-target - i think there are two locations that do this comparison.
< sipa>
pow.cpp, CheckProofOfWork
< phantomcircuit>
sipa, can pnode->GetRefCount() be == instead of <= 0
< phantomcircuit>
you put in an assert that it's >= 0 in 2013 soo
< sipa>
phantomcircuit: i'm probably the wrong person to ask
< ossifrage>
gmaxwell, it seems adding MADV_RANDOM to the leveldb mmap would help reduce the memory footprint, but it isn't going to help performance if there is any locality or larger reads
< sipa>
there is no locality
< sipa>
they're sorted by txid
< gmaxwell>
ossifrage: I think it's somewhat likely to help performance specifically because there is no locality in reads.
< gmaxwell>
ossifrage: I think the thing we really need to do is start measuring dirty pages, rather than RES. And I expect MADV_RANDOM will make it faster.
< ossifrage>
gmaxwell, it all depends on the timeframe: over enough time, does it hit contiguous blocks, and is that timeframe shorter than the life of the page
< ossifrage>
pay now vs pay later
< ossifrage>
looking at: pmap -x $(pidof bitcoin-qt) | grep .ldb | awk '{if($3 == 0){print}}' shows that many of the ldb files aren't using any RSS memory (after being up for almost a month)
< ossifrage>
But I tend to OOM the POS box far too often
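The MADV_RANDOM change discussed above would live in leveldb's C++ mmap-based file reader, but the call itself can be sketched with Python's `mmap` module as a stand-in (a hedged illustration only; the temp-file setup is just to make it self-contained):

```python
import mmap
import os
import tempfile

# Sketch of the advice being discussed: after mmap'ing a file that will be
# read with no locality (leveldb .ldb files are sorted by txid), hint the
# kernel not to read ahead, so untouched pages don't inflate the footprint.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"\0" * mmap.PAGESIZE * 4)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    if hasattr(mmap, "MADV_RANDOM"):   # Linux/BSD, Python >= 3.8
        mm.madvise(mmap.MADV_RANDOM)   # disable readahead for this mapping
    first_byte = mm[0]
    mm.close()

os.unlink(path)
```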
< Jmabsd>
crypto/sha256.* is *single-iteration* SHA256? (would be great if this was marked out in the file)
< sipa>
Jmabsd: of course; SHA256 is SHA256, not double SHA256
< Jmabsd>
(y)
< sipa>
double SHA256 is called "Hash" in the sourcecode
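The distinction sipa draws can be shown in a few lines (a sketch with `hashlib`; the function names mirror the source's convention but are otherwise hypothetical):

```python
import hashlib

def sha256(data: bytes) -> bytes:
    """Single SHA256 -- what crypto/sha256.* implements."""
    return hashlib.sha256(data).digest()

def hash256(data: bytes) -> bytes:
    """Double SHA256 -- what the source code calls Hash()."""
    return sha256(sha256(data))

single = sha256(b"").hex()
double = hash256(b"").hex()
```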
< Jmabsd>
, you first shift-right a 32bit uint by 24 positions - this means the highest 8 bits become the lowest 8 bits, and the remainder of bits will be zero
< Jmabsd>
then you bitwise-and the whole thing with 0x007fffff - but that's useless, isn't it?
< Jmabsd>
here that's the same as bitwise-anding with 0xff, which was already implied by the shift-right ??
< sipa>
Jmabsd: read again
< Jmabsd>
>> is right-shift, line 208 does "int nSize = nCompact >> 24;".
< Jmabsd>
nCompact is uint32_t which is afaik an unsigned 32bit integer.
< Jmabsd>
right-shifting 24 bits means what i suggested above. "int" here doesn't matter because it's at least 32bits and we're decimating the size here.
< Jmabsd>
OH
< Jmabsd>
sipa: ah you're right, the AND is on the original compact value. thx.
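The point of confusion resolves exactly as sipa says: both the shift and the mask operate on the original nCompact. A minimal Python sketch of the core of `SetCompact` (sign and overflow handling from the real C++ omitted):

```python
def decode_compact(n_compact: int) -> int:
    """Sketch of the nBits decoding being discussed: the high byte is a
    base-256 exponent, and the low 23 bits of the *original* value are
    the mantissa -- the & 0x007fffff is not applied to the shifted value,
    so it is not redundant with the >> 24."""
    n_size = n_compact >> 24          # top 8 bits: size (exponent)
    n_word = n_compact & 0x007fffff   # low 23 bits of the original value
    if n_size <= 3:
        return n_word >> (8 * (3 - n_size))
    return n_word << (8 * (n_size - 3))

# the genesis-block nBits decodes to the familiar maximum target
target = decode_compact(0x1d00ffff)
```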
< luke-jr>
FWIW, the most non-mmap files open I've seen is 39
< luke-jr>
seems like it's txindex-related, as that's using 47 mmaps itself
< luke-jr>
err, not txindex, I have that disabled apparently
< luke-jr>
blocks/index :p
< luke-jr>
I imagine it'd be worse with txindex
< Jmabsd>
Wait, just a funny detail on the "showing block hash in reverse hex byte order in order for the PoW zeroes to be represented in the string's leading position":
< Jmabsd>
Do you see there that zeroes at the end of the block hash, which do count as PoW zeroes, are actually represented in the hex string *BEHIND* the digit 3,
< Jmabsd>
yeah so this is not a big deal but the reversing (done for illustrative purposes) is only partial.
< luke-jr>
Jmabsd: pretty sure you are wrong
< luke-jr>
endianness deals with the bytes, not the bits
< sipa>
Jmabsd: you're getting far too worked up about endianness issues
< sipa>
they're conventions, and usually every convention sucks for some reason, but you still need to pick one
< sipa>
deal with it
< sipa>
if you get it wrong, try the other way
< luke-jr>
only downside to big endian AFAIK is that you can't truncate by a simple cast :P
< sipa>
but yes, endianness is about the bytes (or symbols), not the individual bits
< Jmabsd>
the 5th byte in the reverse order there is 0x30; the 0 there is zeroes, however the representation is byte-reversed only, not bit-reversed, and so the last bits in that byte end up in the second hex position, behind the 3.
< Jmabsd>
this is trivial but anyhow.
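luke-jr's point is easy to demonstrate: the display convention reverses the *bytes*, while the two hex nibbles within each byte keep their order (the sample bytes below are hypothetical, chosen to include a 0x30 like the one discussed):

```python
# Block hashes are displayed with bytes reversed, not bits or nibbles:
# a byte 0x30 still prints as "30" (nibble order unchanged) wherever
# it lands in the reversed string.
raw = bytes.fromhex("00112330")   # hypothetical internal (little-endian) order
shown = raw[::-1].hex()           # byte-reversed for display
```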
< sipa>
Jmabsd: i think it's getting offtopic here
< sipa>
we can discuss the general principle, but at some point you just have to try and see
< Jmabsd>
yep.
< ossifrage>
luke-jr, the original reason for the leveldb mmap change was that it was using a large number of file descriptors, enough to push incoming connections out of the <1024 fd select() limit. Bumping up the mmap count was sorta a workaround for that problem
< luke-jr>
ossifrage: yes, I understand why now
< sdaftuar>
luke-jr: thanks for diving into the mmap leveldb issue. i was also confused about how those limits interact and was doing some measurements of mmap usage.
< sdaftuar>
with txindex off it looked like i'd never get more than 1000 mmaps, but with it on i'd get up to 2000
< sdaftuar>
one thing i was curious about was whether we have a good handle on the maximum number of non-mmap'ed files
< sdaftuar>
to make sure there's not some remotely-triggerable leveldb behavior where we end up opening 1000 fd's and breaking select()
< luke-jr>
sdaftuar: looks like we'd be safe doing something like (4096 - 1) / (2 + txindex?1:0) for the file limit
< luke-jr>
(4096 only with the patched mmap limit ofc)
< luke-jr>
in theory, we could probably burn fds on top of mmaps, but that might hurt performance more than it's worth
< luke-jr>
(since we know 64 * 3 was safe)
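luke-jr's formula above can be written out as arithmetic (a hedged reading of it: split the patched mmap budget across the databases holding .ldb files open, one extra slot when txindex is enabled, minus one for slack):

```python
def file_limit(mmap_limit: int = 4096, txindex: bool = False) -> int:
    """Back-of-the-envelope from the discussion above:
    (4096 - 1) / (2 + (txindex ? 1 : 0)), integer division."""
    return (mmap_limit - 1) // (2 + (1 if txindex else 0))

limit_no_txindex = file_limit()
limit_txindex = file_limit(txindex=True)
```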
< jnewbery>
Review beg for #14023. It's not urgent, but it requires frequent rebase and it should be a very easy, mechanical review.
< jnewbery>
It'd be nice to get a critical mass of people to review it once rather than have lots of re-ACKs as it gets rebased
< jnewbery>
wumpus: not sure if it makes the bar for high priority, but if you think it does, please add it :)
< wumpus>
jnewbery: sure, added!
< jnewbery>
thanks!
< jamesob>
gmaxwell sipa: so are we thinking at this point that RSS is kind of a useless metric because clean ldb pages will just be pushed out under memory pressure?
< sipa>
jamesob: possibly
< gmaxwell>
jamesob: That is my view.
< jamesob>
do we want to have bitcoinperf count dirty pages as an additional metric?
< * wumpus>
waves farewell to account API (finally)
< jnewbery>
\o/
< gmaxwell>
jamesob: I think we should.
< jnewbery>
thanks wumpus!
< sipa>
wumpus: it will be a bittersweet goodbye
< sipa>
i think i've developed somewhat of a stockholm syndrome with it
< instagibbs>
sipa, be healed
< wumpus>
yes that's what happens :(
< instagibbs>
I spent hours trying to write a sensible comment about balances with accounts in the code; failed. No love lost there.
< wumpus>
it was the same for me for getinfo
< wumpus>
but it just had to happen, after all that time, and I'm happy this particular knot is finally cut
< jamesob>
gmaxwell: so basically we want to poll `pmap -x $(pidof bitcoind) | tail -n 1 | tr -s ' ' | cut -d' ' -f 4` for a max value throughout ibd/reindex
< gmaxwell>
jamesob: I think that is what we want, yes.
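One caveat on the pipeline above: which field `cut -f4` grabs from the totals line depends on the installed pmap flavor ("total kB ..." vs "total ..."), so it may pick RSS rather than Dirty on some versions. A small parser that takes the Dirty column explicitly is less fragile (the sample output below is hypothetical, shaped like `pmap -x` from procps-ng):

```python
# Parse the totals line of `pmap -x` output and return the Dirty column.
SAMPLE = """\
Address           Kbytes     RSS   Dirty Mode  Mapping
0000561a2c000000    1024     512     384 rw--- [ anon ]
00007f1c30000000    2048    1024       0 r--s- 012345.ldb
total kB            3072    1536     384
"""

def total_dirty_kb(pmap_output: str) -> int:
    totals = pmap_output.strip().splitlines()[-1].split()
    # totals: ['total', 'kB', Kbytes, RSS, Dirty] on this pmap flavor;
    # Dirty is the last numeric column either way.
    return int(totals[-1])

dirty = total_dirty_kb(SAMPLE)
```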
< jnewbery>
The PR that was just merged removed the accounts RPCs. There's a bunch of dead code left behind that's removed in #13825. Feel free to review that if you want to help remove all traces of accounts. (non-urgent and I'm happy to keep rebasing)
< gmaxwell>
14023 was merged without any actual answer to the locked fund issue.
< sipa>
what locked fund issue?
< sipa>
ah
< sipa>
a reference would be useful
< rockhouse>
gmaxwell: Would you say this definition is correct? "CTs don't use homomorphic encryption. They use "additively homomorphic commitments" which are, essentially, cryptographic proofs."
< gmaxwell>
sipa: I don't know if there is an issue on github anywhere. But as I pointed out, in the past people have attached accounts to addresses, gotten funds assigned to accounts, then found themselves unable to use send to address to send funds until they MOVEed the funds back.
< gmaxwell>
rockhouse: They're additively homomorphic commitments, but I wouldn't say that the commitments themselves are "cryptographic proofs", they're just commitments. CT also uses cryptographic proofs.
< rockhouse>
thx
< gmaxwell>
I'm sorry I didn't see that jnewbery responded to me (weird, I thought I had looked yesterday too!)
< sipa>
gmaxwell: that's certainly possible with accounts based RPCs... there were some that did balance checks
< sipa>
gmaxwell: but now the whole concept of balances is gone
< gmaxwell>
What PR removed the balance checks?
< sipa>
accounts are gone.
< sipa>
without accounts, there are no more RPCs that do balance checks
< sipa>
because the concept of balance doesn't exist anymore
< gmaxwell>
Okay, what I'm asking is where were the balance checks removed? I looked at 13825 and I didn't see that.
< sipa>
well the account RPCs are gone, no?
< gmaxwell>
My concern is that people assigned funds to labels in the past, then load up one of their wallets in new code. So they don't need accounts rpcs to get in that state, if the backend code still checks balances.
< sipa>
i think we really need to see a report
< sipa>
because the only things that should be doing balance checks is accounts RPCs
< sipa>
even in the past
< gmaxwell>
There are people in my IRC logs reporting issues sending with sendtoaddress then luke telling them to move and then them saying it worked.
< sipa>
that seems like a very serious bug, if true
< sipa>
first time i hear about it, though
< gmaxwell>
"sendtoaddress only subtracts the amount from the default account,"
< jnewbery>
what are you quoting?
< gmaxwell>
IRC logs.
< gmaxwell>
If that changed (or was never the case) OKAY, but I also thought it was the case, and I don't remember a PR changing it.
< sipa>
sendtoaddress subtracts the default account, but doesn't check its balance
< sipa>
it doesn't do anything with accounts at all
< sipa>
as balances are just defined implicitly
< gmaxwell>
it might be the case that the reporters were actually using sendmany, which did check, but which has since had it removed.
< gmaxwell>
I found the PR disabling it in sendmany.
< sipa>
right
< gmaxwell>
and/or people have been uselessly telling people to use move when their real problem was something else.
< gmaxwell>
(like needing to wait for unconfirmed funds to confirm)
< jnewbery>
yes, sendmany would call getLegacyBalance() and earlier getAccountBalance(). sendtoaddress has always used getBalance(), which doesn't check accounts/labels at all
< jnewbery>
sendmany now uses getLegacyBalance() without an account argument
< jnewbery>
if 14023 was done right, then none of the RPCs should now know anything about account balances at all
< phantomcircuit>
so i notice select has a 50ms timeout, that seems kind of short given that its only purpose is to consume pnode->vSend when the initial attempt failed
< gmaxwell>
I assume it was made short so that it wouldn't block sending new messages.
< phantomcircuit>
gmaxwell, send() is called on the full buffer before anything is added to pnode->vSend
< phantomcircuit>
actually wait this is still doing the thing of calling send() for each message
< gmaxwell>
oh we didn't fix the thing where the header and payload go out in separate packets? :(
< phantomcircuit>
gmaxwell, i think that's fixed i mean sipa changed this in 2013 from the buffer per connection to a list of messages per connection
< phantomcircuit>
and i cant remember why
< phantomcircuit>
doesn't change the optimistic send logic meaningfully though
< phantomcircuit>
so im still wondering about the 50ms timeout
< jonasschnelli>
You mean a 32byte pubkey with missing first byte to test for?
< gmaxwell>
BIP151 sends a 32 byte pubkey, not a 33 byte one. Right? so a test that gives a junk value to the first byte doesn't actually test any case that could be triggered by BIP151. (though it's fine to test that too) The test I'm going for is an x value not on the curve, since failing to check that is a common implementation bug in ECDH implementations.
< gmaxwell>
jonasschnelli: make sense now?
< jonasschnelli>
Ah. I see! Will fix thanks gmaxwell
< jonasschnelli>
gmaxwell, sipa: what was the conclusion on the inefficiency of the ChaCha20Poly1305@openssh AEAD?
< jonasschnelli>
(and this is SHA256 asm vs non-asm ChaCha)
< jonasschnelli>
But I agree that the chacha20 rounds (one for the poly key, one for the length) is not very very efficient...
< jonasschnelli>
But compared to dbl-SHA256 is probably faster
< jonasschnelli>
(once optimized)
< gmaxwell>
jonasschnelli: so there is an obvious optimization to get the length key from bytes 33-36 of the polykey chacha run. the only apparent argument against doing that is that its less compatible with existing constructs. it sounds like openssl didn't do it that way so they could just call a function that implemented that TLS style AEAD.
< jonasschnelli>
gmaxwell: Yes. That would work,... but would reduce the AAD len to 4 (currently flexible, but not required in our case).
< jonasschnelli>
My only worry is that we start cooking our own crypto...
< jonasschnelli>
There is still the option to use AES-GCM *duck*
< jonasschnelli>
Though non-NI implementations will bring back cache-timing attacks (or inefficient implementations)
< jonasschnelli>
I guess ChaCha20 is preferable if we can't guarantee for AES-NI (which I guess we can't)
< gmaxwell>
jonasschnelli: for AES-GCM there are very fast non-NI that are cache time fine, for machines with SIMD.
< gmaxwell>
It's only CBC mode that really stinks without NI.
< jonasschnelli>
Okay. Yeah. Unsure then if switching "back" (we once discussed AES-GCM vs ChaChaPoly back in 2015) would make sense then.
< gmaxwell>
basically chacha20 is the fastest on ARM by a wide margin. Basically tied with non-NI on x86 if you use SIMD, and somewhat worse than NI GCM. Our reasoning, which I assume is the same as others using chacha-- is that it makes sense to optimize for the least powerful computers.
< gmaxwell>
I think thats still true, though the 3x runs of chacha for small messages shifts the equation further in favor of AES simply because most of our messages are pretty small.
< jonasschnelli>
Yes. Indeed. I think optimizing for an AAD-len of 4 and reducing thus to 2x could make sense then
< gmaxwell>
jonasschnelli: I'm not sure why it would reduce the AAD len? poly1305 uses 32 bytes of key regardless of the auth token size.
< jonasschnelli>
The 2nd round is currently used to derive the 2nd ChaCha20 key for the AAD encryption (the 4 byte length in our case).
< gmaxwell>
oh I see what you're saying, the length of the lengths. Well there are 64 bytes available.
< gmaxwell>
32 get used for poly1305's key, leaving 32 that get thrown away right now.
< jonasschnelli>
Yes. We can set the max AAD size to 32
< jonasschnelli>
But I'm not a cryptographer and I'm not sure if this would require some analysis first (isn't it basically a new AEAD)?
< jonasschnelli>
Where I'm unsure: currently, there are two ChaCha20 contexts (two keys), one context is for the actual stream _AND_ the poly1305 key, the 2nd context for the AAD (the payload length)... and now
< jonasschnelli>
We would drop the second context,... and take the AAD key from the main context
< jonasschnelli>
Reducing back to 32bytes of required key material...
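The layout being discussed can be sketched in pure Python (a hedged illustration, not Bitcoin Core or OpenSSH code; the ChaCha20 block function follows RFC 8439, and the byte-range names are hypothetical): one keystream run yields 64 bytes, of which the openssh construct keeps only the first 32 for Poly1305; the proposal is to take the length-encryption bytes from the otherwise-discarded half instead of running a second keyed ChaCha20 instance.

```python
import struct

def _rotl32(v, c):
    return ((v << c) & 0xffffffff) | (v >> (32 - c))

def _quarter_round(s, a, b, c, d):
    s[a] = (s[a] + s[b]) & 0xffffffff; s[d] = _rotl32(s[d] ^ s[a], 16)
    s[c] = (s[c] + s[d]) & 0xffffffff; s[b] = _rotl32(s[b] ^ s[c], 12)
    s[a] = (s[a] + s[b]) & 0xffffffff; s[d] = _rotl32(s[d] ^ s[a], 8)
    s[c] = (s[c] + s[d]) & 0xffffffff; s[b] = _rotl32(s[b] ^ s[c], 7)

def chacha20_block(key: bytes, counter: int, nonce: bytes) -> bytes:
    """One ChaCha20 block (RFC 8439): 64 bytes of keystream."""
    state = (list(struct.unpack("<4I", b"expand 32-byte k"))
             + list(struct.unpack("<8I", key))
             + [counter]
             + list(struct.unpack("<3I", nonce)))
    w = state[:]
    for _ in range(10):  # 10 double rounds = 20 rounds
        _quarter_round(w, 0, 4, 8, 12); _quarter_round(w, 1, 5, 9, 13)
        _quarter_round(w, 2, 6, 10, 14); _quarter_round(w, 3, 7, 11, 15)
        _quarter_round(w, 0, 5, 10, 15); _quarter_round(w, 1, 6, 11, 12)
        _quarter_round(w, 2, 7, 8, 13); _quarter_round(w, 3, 4, 9, 14)
    return struct.pack("<16I", *((a + b) & 0xffffffff for a, b in zip(w, state)))

key = bytes(range(32))                             # RFC 8439 test key
nonce = bytes.fromhex("000000090000004a00000000")  # RFC 8439 test nonce
block = chacha20_block(key, 1, nonce)

poly1305_key = block[:32]    # bytes 0..31: one-time Poly1305 key (used today)
length_bytes = block[32:36]  # bytes 32..35: currently discarded; could
                             # encrypt the 4-byte length instead of a 2nd key
```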
< gmaxwell>
At least the discussion I linked to before said that they designed it that way so they could use an unmodified implementation of the TLS AEAD to implement the openssl one. (this seems dumb to me, but it's the rationale they gave)
< jonasschnelli>
Which does _not_ sound like a problem... but again, IF we do that, it'd probably be good if a cryptography expert takes a closer look at this
< gmaxwell>
the defense against the length oracle just depends on you not being able to shift the input.
< jonasschnelli>
I see...
< jonasschnelli>
Yes. Indeed. Reducing the performance in our system due to legacy stuff from the AEAD-origin system seems indeed inefficient
< gmaxwell>
also, does the openssl usage actually send the full 128 bit poly1305 tag? I had thought it used a 96 bit tag.
< jonasschnelli>
Too bad this is not an optimisation we can do later. :/
< jonasschnelli>
gmaxwell: you mean openssh?
< gmaxwell>
well thats why I'm looking at it now. What got me going down this path was checking to see if anything would get in the way of getting the most out of AVX2/SSE2 in the future.
< gmaxwell>
jonasschnelli: ye.
< gmaxwell>
yes.
< jonasschnelli>
The openssh function produces a 128bit tag,... but not if they wired it over (probably)
< gmaxwell>
There is another change we might want to consider, but I wanted to research if anyone else has proposed it-- that we should chain the unused part of the tag from each message to the AAD (additional authenticated data) in the next message, so we get the full tag's worth of protection, but then noticed we were using the whole tag output. When I thought it was truncated.
< jonasschnelli>
gmaxwell: with the goal of saving 4 bytes per message, right?
< jonasschnelli>
(in case we trunc down to 96bits)
< jonasschnelli>
gmaxwell: where I'm also unsure – in case we would not reduce the AEAD to use a single 256bit key: the length is pretty much known (known plaintext); right now, in the AEAD@openssh, compromising the second, AAD key is possibly not too critical
< jonasschnelli>
*in case we would reduce
< gmaxwell>
There is nothing wrong with the lengths being known. 99% of the protocol is known. It's very obvious from traffic analysis when pings are sent, for example. And the sequences of messages we send at the start are the same each time.
< jonasschnelli>
Yes. That's true,...
< gmaxwell>
jonasschnelli: yes, making the tag shorter reduces overhead. Right now for short messages like a tx inv, almost half the bandwidth is used by overhead (the 4 byte length and the 16 byte tag). If I was imagining it and no one else is truncating the tags, then I guess we shouldn't just to avoid the security analysis.
< jonasschnelli>
I think it's def. worth seeing if we can.
< jonasschnelli>
Compared to the v1 protocol (I agree that comparing is not the best metric), the encryption-overhead is not too bad. -4byte magic, -4bytes SHA-chksm,-~8bytes cmd, +16bytes MAC + 4byte length
< gmaxwell>
still ends up being about twice as much. Which is not unacceptable. It seems I was dreaming about the 96-bit tag, I don't see anyone using poly1305 with truncated tags.
< jonasschnelli>
twice as much? v1 versus v2? v1 inv: 24(hdr)+36(msg) == 60 bytes < versus > v2: 4(outer-len)+4(cmd)+4(inner-len)+36(msg)+16MAC == 64
< jonasschnelli>
So an inv is 4 bytes longer
< jonasschnelli>
Also,... multiple messages can be combined into a single encrypted package
< jonasschnelli>
reducing the "weight" of the AAD (payload length) and the MAC tag
< gmaxwell>
oh I forgot we could do that, well that basically kills any overhead concerns I had.
< gmaxwell>
on your message above, an INV is 4+12+1+36+4 on the wire today.
< gmaxwell>
The twice comment was 4+4 vs 4+16 (-command savings) ~= 2x more overhead data.
< jonasschnelli>
didn't you miss the message size in your calc?
< jonasschnelli>
gmaxwell: AFAIK a v1 header is 24 bytes (4+12+!4!+4)
< gmaxwell>
and in the new protocol we made the lengths implicit to the command type?
< jonasschnelli>
10x v1 invs result in 600 bytes where a 10x v2 inv (packed) could result in 388 bytes
< jonasschnelli>
gmaxwell: the v2 protocol uses varstr for the command rather than fixed 12 bytes
< jonasschnelli>
but that reduction above comes at the cost of combining 10 invs (time lag)
< gmaxwell>
I know that the command is variable in v2. But do we have a second length then now?
< gmaxwell>
jonasschnelli: if you're going to combine 10 invs, you'd combine them in the inv message. :)
< jonasschnelli>
inner length, outer length
< jonasschnelli>
gmaxwell: oh. yes. :/
< jonasschnelli>
outer length is the encrypted payload length that can combine multiple messages
< jonasschnelli>
(where the inner length is the message payload length)
< gmaxwell>
right so it's not like we saved coding that length.
< jonasschnelli>
you mean by using varlen for the inner message size?
< jonasschnelli>
*varint
< gmaxwell>
No.
< gmaxwell>
the 16 byte tag replaces magic and the checksum but the additional length at the encryption layer is just additional.
< jonasschnelli>
yes
< jonasschnelli>
Its somehow required since you first need to check the MAC before processing the inner messages
< gmaxwell>
Sure. from the above I thought you were saying that it was the only length we have. generally. But sadly we can't use implicit lengths because we need to be able to handle unknown messages.
< jonasschnelli>
yeah
< gmaxwell>
Whats the maximum size of an encrypted message? the outer length. Just the same as the maximum message size?
< jonasschnelli>
2^31
< jonasschnelli>
(bit 32 is used for the rekey)
< gmaxwell>
I sure hope not, else it'll be awfully easy to exhaust your ram. :P
< gmaxwell>
I know what the length can encode, but there needs to be a smaller limit in the implementation or otherwise there is a trivial memory exhaustion DOS.
< jonasschnelli>
gmaxwell: of course there is the MAX_SIZE check before decrypting the payload
< gmaxwell>
So, if it's the same MAX_SIZE as messages, does this end up effectively reducing MAX_SIZE by a few bytes?
< jonasschnelli>
gmaxwell: Yes. I guess that would require fixing once we combine multiple messages (or expect it will be combined).
< jonasschnelli>
So,... I guess it needs fixing now
< jonasschnelli>
It's indeed missing bytes,... especially if we assume combined messages into a single enc package
< jonasschnelli>
(will fix)
< gmaxwell>
well yes/no, so max_size is much larger than any message we allow today.
< gmaxwell>
so it's not actually a limitation. But I think it actually needs to be set to the largest message plus a few bytes for the overhead.
< gmaxwell>
I think as you have it now thats probably a trivial DOS attack, e.g. I can make your node use 32mb * npeers ram... so 4GB additional usage in the default configuration.
< jonasschnelli>
hmm...
< sipa>
use 3 byte lengths :)
< gmaxwell>
3 byte is kinda limiting, I really wanted to suggest that previously. Realistically if we ever had messages that needed to be bigger we'd have to break them up for processing reasons.
< jonasschnelli>
How does the current network code prevent that (there is the same MAX_SIZE check)?
< sipa>
the serialization code will only reallocate in blobs of 5 MB as the data is read
< gmaxwell>
jonasschnelli: there are size limits on the particular messages.
< jonasschnelli>
ah... we don't need to auth before processing there, right
< sipa>
it never allocates more than 5 MB in addition to what it has already read from the stream
< gmaxwell>
Like if we were to support (say) 40MB blocks, practically we should have messages that split the blocks into chunks, because transferring and allocating 40MB at a time would be kind of absurd.
< gmaxwell>
so 3byte lengths might actually be reasonable.
< jonasschnelli>
ack
< jonasschnelli>
But due to the rekey approach, it would be 2^23
< jonasschnelli>
But I guess we don't want more than that per encryption package that needs auth (doesn't mean we don't want larger messages)