#bitcoin-core-dev on 2018-08-25 — searchable irc log

00:00 < gmaxwell> right. well there is also no need to discard bytes from K_2 but it does that too. the performance hit is especially gratuitous for lengths.

00:04 < gmaxwell> sipa: oh geesh, for every message we do _another_ chacha20 run to derive the poly key.

00:05 < gmaxwell> so encrypting a single small message requires 3 runs of the chacha20 function, one to encrypt the length, one to establish the poly1305 key, and one to encrypt the payload.

00:06 < gmaxwell> this seems pants on head stupid.

00:07 < gmaxwell> The polykey needs to be per packet for poly1305's requirements, so I suppose it's only throwing out 32 bytes of chacha output.
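The per-message cost described above can be tallied in 64-byte ChaCha20 blocks. This C sketch assumes the OpenSSH-style chacha20-poly1305 layout under discussion: the length is encrypted with its own key (4 bytes used), the Poly1305 one-time key comes from the first block of the payload key (32 bytes used), and the payload starts at the next block. `blocks_per_message_combined` is a hypothetical variant in the spirit of the "combine the poly1305 block with the length encryption" suggestion, not a specified construction; all function names are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* ChaCha20 generates keystream in 64-byte blocks. */
#define CHACHA20_BLOCK_BYTES 64

/* Blocks needed to cover n payload bytes (ceiling division). */
static size_t payload_blocks(size_t payload_len) {
    return (payload_len + CHACHA20_BLOCK_BYTES - 1) / CHACHA20_BLOCK_BYTES;
}

/* Assumed layout: one block under the length key (4 bytes used),
 * one block under the payload key for the Poly1305 key (32 bytes used),
 * then the payload blocks. */
size_t blocks_per_message(size_t payload_len) {
    return 1 + 1 + payload_blocks(payload_len);
}

/* Hypothetical variant: Poly1305 key and 4-byte length mask both taken
 * from a single block, saving one block per message. */
size_t blocks_per_message_combined(size_t payload_len) {
    return 1 + payload_blocks(payload_len);
}

/* Keystream bytes computed but never used, for the assumed layout. */
size_t discarded_keystream_bytes(size_t payload_len) {
    size_t generated = blocks_per_message(payload_len) * CHACHA20_BLOCK_BYTES;
    size_t used = 4 + 32 + payload_len;
    assert(generated >= used);
    return generated - used;
}
```

Under these assumptions a 12-byte payload costs 3 blocks (192 keystream bytes generated, 144 thrown away), while the combined variant would need 2.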

00:14 < gmaxwell> https://www.ietf.org/mail-archive/web/secsh/current/msg01224.html seems other people have suggested combining the poly1305 chacha block with the length encryption.

00:14 < gmaxwell> and it seems that they didn't so that you could just use a RFC5116 implementation of it.

00:14 < * gmaxwell> cries

00:24 < gmaxwell> So encrypting a 12 byte message will run on the order of 109 cycles/byte... which means that for small messages a straightforward implementation of AES-GCM would likely be faster, even on hardware without AES instructions.

00:24 < gmaxwell> (109 cycles/byte for the chacha20 part alone)
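To see why the fixed overhead dominates small messages, the 109 cycles/byte figure can be turned into a per-block cost. This sketch assumes the ChaCha20 work is entirely the block function and that a message needs 2 + ceil(n/64) blocks; with n = 12 (3 blocks) that implies roughly 436 cycles per block, a derived number rather than a measurement. Function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CHACHA20_BLOCK_BYTES 64

/* Assumed block count per message: length block + Poly1305-key block
 * + ceil(n / 64) payload blocks. */
static size_t blocks_for(size_t payload_len) {
    return 2 + (payload_len + CHACHA20_BLOCK_BYTES - 1) / CHACHA20_BLOCK_BYTES;
}

/* Per-block cost implied by a cycles/byte figure measured at one size. */
double implied_cycles_per_block(size_t payload_len, double cycles_per_byte) {
    assert(payload_len > 0);
    return cycles_per_byte * (double)payload_len / (double)blocks_for(payload_len);
}

/* Cycles per payload byte at a given size, assuming a fixed cost per block. */
double cycles_per_payload_byte(size_t payload_len, double cycles_per_block) {
    assert(payload_len > 0);
    return (double)blocks_for(payload_len) * cycles_per_block / (double)payload_len;
}
```

With that implied per-block cost, a 1024-byte payload needs 18 blocks, about 7.7 cycles/byte, so the two extra blocks matter mostly for small messages.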

00:31 < sipa> gmaxwell: but what is the average message length for us?

00:34 < sipa> it seems we don't keep stats on message counts

01:24 < cfields> luke-jr: hmm?

01:24 < cfields> luke-jr: if qt's copy is missing the files that need patching, what's to patch?

01:55 < luke-jr> cfields: libpng has optimisations for ARM and POWER in separate files missing in Qt, but Qt's copy of the normal files still tries to link them

01:57 < cfields> luke-jr: armhf/aarch64 build fine, what's different about power?

01:58 < luke-jr> cfields: I don't know how ARM works

01:58 < cfields> (not arguing, just trying to understand)

01:59 < cfields> luke-jr: anyway, breaking out libpng is fine with me. IIRC I didn't do that because it requires zlib, as does qt, so that would've meant 2 copies of zlib. But we've since broken zlib out anyway I believe.

02:00 < luke-jr> yeah

02:01 < cfields> luke-jr: while you're at it, feel free to flip -qt-jpeg to -disable-jpeg too

02:01 < cfields> something like those options, anyway

02:01 < cfields> I think we've had no need for jpegs for a long time

02:04 < luke-jr> did we ever? O.o

02:05 < cfields> pretty sure we had some at some point

02:11 < cfields> luke-jr: https://github.com/bitcoin/bitcoin/commit/f9124587ccea723dbd743e3877a7071fbb6c5732

02:12 < cfields> 0.8, heh

02:16 < gmaxwell> sipa: the most common message by far is transaction inv.

02:18 < gmaxwell> sipa: it's just so weird that it uses 3 chacha runs, the poly1305 run has 32 bytes totally unused.

04:01 < Jmabsd> wait, so Bitcoin has the tendency to print (256 & 160bit) hashes in *reverse* order, right - block hashes, transaction hashes and merkle root hashes.

04:02 < Jmabsd> What about pubkey hashes (20B), pubkeys (32B) and signatures (64B) - are those printed in normal or reverse byte order? so, I have a P2SH pubkey script, say. in there is a 20B hash of my redeemscript, right. when I use Bitcoin Core's script disassembly function, will it print that hash in reverse or normal order? i mean there is an outer extent to what Core prints in reverse order - for instance, binary transaction dumps (in hex) are in

04:02 < Jmabsd> *normal* order, not reverse.

04:06 < sipa> Jmabsd: that's just printing the bytes one by one

04:07 < sipa> it's only when a hash is interpreted as a number the printing gets reversed

04:07 < sipa> because the bytes are interpreted as little-endian number, but then printed in big endian for human consumption (humans want to see numbers in big endian)
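A minimal sketch of the distinction sipa describes: byte blobs are hex-encoded in storage order, while a hash treated as a little-endian number is printed most significant byte first, i.e. reversed (this is how `uint256::GetHex()` in Bitcoin Core behaves). The helper names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

static const char HEX_DIGITS[] = "0123456789abcdef";

/* Byte blobs (scripts, signatures, raw transactions): hex-encode in
 * storage order. out needs room for 2 * len + 1 chars. */
void hex_bytes(const unsigned char *data, size_t len, char *out) {
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        out[2 * i] = HEX_DIGITS[data[i] >> 4];
        out[2 * i + 1] = HEX_DIGITS[data[i] & 0x0f];
    }
    out[2 * len] = '\0';
}

/* Hashes treated as little-endian numbers (block/txid hashes): print the
 * most significant byte, which is stored last, first. */
void hex_le_number(const unsigned char *data, size_t len, char *out) {
    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        unsigned char b = data[len - 1 - i];
        out[2 * i] = HEX_DIGITS[b >> 4];
        out[2 * i + 1] = HEX_DIGITS[b & 0x0f];
    }
    out[2 * len] = '\0';
}
```

For the bytes `01 ab ff`, the first function yields `01abff` and the second `ffab01`.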

04:07 < sipa> but a script is a number

04:08 < luke-jr> isn't*

04:08 < Jmabsd> gotcha.

04:08 < sipa> *indeed, isn't

04:11 < Jmabsd> aha. so let's see - if you print a hex dump of a signature (71/72/73B), that's not a hash and hence printed in normal order

04:11 < Jmabsd> a P2SH hash, for instance when printing the disassembly of a P2SH pubkey script - will the 20B hash there be printed in reverse order?

04:12 < Jmabsd> also if a pubkey (32B) is printed out, could that ever be in reverse order?

04:13 < luke-jr> why don't you just try it and see? -.-

04:14 < sipa> Jmabsd: pubkeys are not 32 bytes, and they're not hashes

04:15 < Jmabsd> sipa: so the hex printer for other byte structures are never printed in reverse orders.

04:15 < sipa> indeed

04:15 < sipa> only for things that are internally treated as numbers

04:15 < Jmabsd> but.. a P2SH 20B hash, that's a hash right. for printing purposes, is it considered a hash or a byte blob?

04:15 < sipa> nope!

04:15 < sipa> because the printer cannot know it is a hash

04:16 < sipa> you'd need to execute the script to know it is treated as such

04:16 < sipa> the script opcode is just "put some bytes on the stack"

04:17 < sipa> so, not reversed there
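To illustrate sipa's point: a disassembler only sees `OP_HASH160` (0xa9), a direct 20-byte push (0x14) and `OP_EQUAL` (0x87), so it hex-encodes the pushed bytes in script order. This hypothetical helper handles only the exact P2SH template; it is not Bitcoin Core's `ScriptToAsmStr`.

```c
#include <assert.h>
#include <stddef.h>

#define OP_HASH160 0xa9
#define OP_EQUAL 0x87
#define PUSH_20_BYTES 0x14

static const char HEX_DIGITS[] = "0123456789abcdef";

static size_t append(char *out, size_t pos, const char *s) {
    while (*s)
        out[pos++] = *s++;
    return pos;
}

/* Disassemble exactly OP_HASH160 <20 bytes> OP_EQUAL. The pushed bytes are
 * printed in script order, since nothing marks them as a hash.
 * out needs at least 61 chars. Returns 1 on success, 0 if not P2SH. */
int disasm_p2sh(const unsigned char *script, size_t len, char *out) {
    assert(script != NULL && out != NULL);
    if (len != 23 || script[0] != OP_HASH160 || script[1] != PUSH_20_BYTES ||
        script[22] != OP_EQUAL)
        return 0;
    size_t pos = append(out, 0, "OP_HASH160 ");
    for (size_t i = 0; i < 20; i++) {
        out[pos++] = HEX_DIGITS[script[2 + i] >> 4];
        out[pos++] = HEX_DIGITS[script[2 + i] & 0x0f];
    }
    pos = append(out, pos, " OP_EQUAL");
    out[pos] = '\0';
    return 1;
}
```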

04:21 < Jmabsd> (sorry disconnect)

04:21 < Jmabsd> last, > interesting. except for the HD wallet root seed (160b=20B), there is no instance ever where a 20B hash e.g. in P2SH pubkeyscript, is printed in reverse order.

04:21 < Jmabsd> > sipa, right and when getting a disassembly printout in Bitcoin Core and related tools, those 20B:s are printed in normal order

06:16 < Jmabsd> the proper way to phrase Core's reversing policy is something like, "any hash that is not part of another binary blob or produced as script data, is hex-serialized in reverse byte order."

06:17 < Jmabsd> i'd hope any hash values introduced in the future will not be reversed though.

06:26 < sipa> i don't see why not

06:26 < sipa> we've always treated hash outputs as numbers and printed them as such

06:26 < sipa> if byte swapping is the hardest problem to deal with, i'm not very worried :)

20:39 < jamesob> re: memory usage increase: preliminary bisections are in and MarcoFalke and I are betting it's the leveldb changes. https://i.imgur.com/8aXRzwe.png

20:42 < gmaxwell> jamesob: wait. how are we measuring memory usage in that benchmark?

20:42 < jamesob> gmaxwell: time -f %M (ie resident set size)

20:49 < jamesob> https://github.com/chaincodelabs/bitcoinperf/blob/master/runner/main.py#L659-L668
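For reference, `time -f %M` reports the maximum resident set size in kilobytes, a peak value rather than the current one, and resident file-backed mapped pages count toward it. The same kind of number can be read with POSIX `getrusage()`; this sketch assumes Linux, where `ru_maxrss` is in kilobytes (macOS reports bytes). Function names are illustrative.

```c
#include <assert.h>
#include <sys/resource.h>
#include <sys/time.h>

/* Peak RSS of the calling process (kilobytes on Linux). */
long peak_rss_self_kb(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Peak RSS of the largest waited-for child (kilobytes on Linux), closer to
 * what a wrapper like `time` observes for the process it launches. */
long peak_rss_children_kb(void) {
    struct rusage ru;
    if (getrusage(RUSAGE_CHILDREN, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}
```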

20:53 < sipa> so every mmap causes 150 kB RSS increase? :s

20:54 < gmaxwell> no because we're not actually using the new maximum.

20:54 < gmaxwell> (I mean not using all of)

20:54 < sipa> aha.

20:54 < gmaxwell> So even MOAR.

20:55 < gmaxwell> which is suspect

20:55 < sipa> mmap by default will prefetch mapped pages into memory

20:55 < sipa> by default, 31 4kB blocks

20:55 < gmaxwell> oh interesting. but those pages are clean they'll just get evicted, they really shouldn't be counted by RSS :(

20:57 < gmaxwell> so MADV_RANDOM

20:57 < gmaxwell> ...

20:57 < sipa> madvise(MADV_DONTNEED) will disable the reading entirely

20:58 < sipa> which may be useful to diagnose the issue

20:58 < jamesob> (for those following along at home: https://github.com/bitcoin/bitcoin/pull/13925/files)

20:59 < gmaxwell> Also setting the maximum maps really low, like..2 might be interesting.

21:02 < gmaxwell> but if this is the problem, MADV_RANDOM is probably the fix to the extent that it's an actual problem at all.

21:02 < gmaxwell> Though we should do a reindex benchmark to make sure MADV_RANDOM doesn't hurt performance.

21:03 < wumpus> PSA: if after the latest merge you get a linker error "/usr/local/include/boost/smart_ptr/shared_ptr.hpp:728: undefined reference to `translationInterface", you need to do a 'make clean' and re-do the make and it will work

21:04 < luke-jr> wumpus: is there a reason the gitian linux yml has g++-riscv64-linux-gnu as a dep? seems to pull in GCC 7 when we're using GCC 8 now?

21:04 < luke-jr> wumpus: if `make clean` ever fixes something, that means there's a bug in the build system :/

21:06 < sipa> gmaxwell, jamesob: LMDB uses MADV_RANDOM it seems

21:06 < sipa> (though its design is different, i don't know their access patterns)

21:06 < wumpus> luke-jr: yes, it must be missing some changes in dependency detection between source and header files (another one is if you change something in univalue, it won't detect it)

21:06 < sipa> jamesob: try this:

21:06 < sipa> void* base = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);

21:06 < sipa> if (base != MAP_FAILED) {

21:06 < sipa> + madvise(base, size, MADV_DONTNEED);

21:07 < sipa> in src/leveldb/util/env_posix.cc

21:07 < sipa> (and/or MADV_RANDOM)
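A self-contained version of the experiment sipa suggests, written as a hypothetical helper rather than a patch to LevelDB's `env_posix.cc`: map a file read-only with `MAP_SHARED`, then pass an `madvise()` hint. `MADV_RANDOM` asks the kernel not to read around faulting pages; `MADV_DONTNEED` drops resident pages of the range (they are re-read on next access), so it mainly helps to gauge how much RSS comes from the mapping.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only and shared, then apply an madvise() hint.
 * Returns the mapping (length via *size_out) or NULL on failure. */
void *map_file_with_advice(const char *path, int advice, size_t *size_out) {
    assert(path != NULL && size_out != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    size_t size = (size_t)st.st_size;
    void *base = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after closing the descriptor */
    if (base == MAP_FAILED)
        return NULL;
    /* The hint is advisory; if madvise() fails the mapping is still usable. */
    (void)madvise(base, size, advice);
    *size_out = size;
    return base;
}
```

Release the mapping with `munmap(base, size)`. Whether `MADV_RANDOM` helps or hurts a reindex would still need benchmarking.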

21:07 < wumpus> luke-jr: it pulls in both gcc 7 and 8, I think that's necessary due to some strangeness with the packages (some symlink will only exist when g++-riscv64-linux-gnu is also installed)

21:07 < luke-jr> DONTNEED sounds wrong?

21:07 < luke-jr> wumpus: ah, weird

21:08 < wumpus> luke-jr: you might be able to get around it, but I noticed and tried as well and ran into a dead end

21:08 < sipa> luke-jr: to diagnose

21:08 < sipa> luke-jr: it would be interesting to see what the effect on RSS is with DONTNEED, to have an idea to what extent our memory usage is due to mmap caching

21:09 < jamesob> sipa: giving it a shot now

21:09 < wumpus> luke-jr: at least it's not MADV_HWPOISON!

21:10 < luke-jr> wumpus: lol

21:11 < gmaxwell> It's plausible to me that MADV_RANDOM will help performance or at least be neutral.

21:11 < wumpus> yes, to me too

21:11 < wumpus> our access pattern is more or less random

21:12 < gmaxwell> I don't recall now, though I know I researched this before... does leveldb's bisection interpolate assuming keys are uniform and that their values are uniformly sized or does it plain bisect?

21:12 < wumpus> (except in the rare times it's iterating over the whole utxo set in order, like when computing statistics)

21:13 < gmaxwell> esp in the case of plain bisection, prefetching is a bad behavior.

21:13 < wumpus> I don't know

21:15 < sipa> gmaxwell: there is an index at the end of each ldb file

21:15 < sipa> i assume it bisects in that index in a naive way, but i'm not sure
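On the bisection question, a simplified model (not LevelDB's actual block and restart-point code) of looking up a key in an SSTable-style index by plain binary search. Each index entry is assumed to be a separator key that is at or above every key in its data block; the lookup returns the first block whose separator is not less than the target. Successive probes jump around the index instead of walking it sequentially.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* index_keys: n separator keys, sorted ascending (hypothetical layout).
 * Returns the first block that could contain target, or n if target is
 * greater than every separator. */
size_t find_block(const char *const *index_keys, size_t n, const char *target) {
    assert(index_keys != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (strcmp(index_keys[mid], target) < 0)
            lo = mid + 1; /* every key in block mid is below target */
        else
            hi = mid; /* block mid may hold target; look left for an earlier one */
    }
    return lo;
}
```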

21:21 < jamesob> sipa: bench is running; we'll know how your change works in about six hours

21:22 < sipa> jamesob: awesome

21:22 < luke-jr> hm, I think I will regret relatime when I try to prune gitian caches