< gmaxwell> right. well there is also no need to discard bytes from K_2 but it does that too. the performance hit is especially gratuitous for lengths.
< gmaxwell> sipa: oh geesh, for every message we do _another_ chacha20 run to derive the poly key.
< gmaxwell> so encrypting a single small message requires 3 runs of the chacha20 function: one to encrypt the length, one to establish the poly1305 key, and one to encrypt the payload.
< gmaxwell> this seems pants on head stupid.
< gmaxwell> The polykey needs to be per packet for poly1305's requirements, so I suppose it's only throwing out 32 bytes of chacha output.
< gmaxwell> https://www.ietf.org/mail-archive/web/secsh/current/msg01224.html seems other people have suggested combining the poly1305 chacha block with the length encryption.
< gmaxwell> and it seems that they didn't so that you could just use a RFC5116 implementation of it.
 * gmaxwell cries
< gmaxwell> So encrypting a 12 byte message will run on the order of 109 cycles/byte... which means that for small messages a straightforward implementation of AES-GCM would likely be faster, even on hardware without AES instructions.
< gmaxwell> (109 cycles/byte for the chacha20 part alone)
< sipa> gmaxwell: but what is the average message length for us?
< sipa> it seems we don't keep stats on message counts
< cfields> luke-jr: hmm?
< cfields> luke-jr: if qt's copy is missing the files that need patching, what's to patch?
< luke-jr> cfields: libpng has optimisations for ARM and POWER in separate files missing in Qt, but Qt's copy of the normal files still tries to link them
< cfields> luke-jr: armhf/aarch64 build fine, what's different about power?
< luke-jr> cfields: I don't know how ARM works
< cfields> (not arguing, just trying to understand)
< cfields> luke-jr: anyway, breaking out libpng is fine with me. IIRC I didn't do that because it requires zlib, as does qt, so that would've meant 2 copies of zlib. But we've since broken zlib out anyway I believe.
< luke-jr> yeah
< cfields> luke-jr: while you're at it, feel free to flip -qt-jpeg to -disable-jpeg too
< cfields> something like those options, anyway
< cfields> I think we've had no need for jpegs for a long time
< luke-jr> did we ever? O.o
< cfields> pretty sure we had some at some point
< cfields> 0.8, heh
< gmaxwell> sipa: the most common message by far is transaction inv.
< gmaxwell> sipa: it's just so weird that it uses 3 chacha runs, the poly1305 run has 32 bytes totally unused.
< Jmabsd> wait, so Bitcoin has the tendency to print (256 & 160bit) hashes in *reverse* order, right - block hashes, transaction hashes and merkle root hashes.
< Jmabsd> What about pubkey hashes (20B), pubkeys (32B) and signatures (64B) - are those printed in normal or reverse byte order? so, I have a P2SH pubkey script, say. in there is a 20B hash of my redeemscript, right. when I use Bitcoin Core's script disassembly function, will it print that hash in byte or normal order? i mean there is an outer extent to what Core prints in reverse order - for instance, binary transaction dumps (in hex) are in
< Jmabsd> *normal* order, not reverse.
< sipa> Jmabsd: that's just printing the bytes one by one
< sipa> it's only when a hash is interpreted as a number that the printing gets reversed
< sipa> because the bytes are interpreted as little-endian number, but then printed in big endian for human consumption (humans want to see numbers in big endian)
< sipa> but a script is a number
< luke-jr> isn't*
< Jmabsd> gotcha.
< sipa> *indeed, isn't
< Jmabsd> aha. so let's see - if you print a hex dump of a signature (71/72/73B), that's not a hash and hence printed in normal order
< Jmabsd> a P2SH hash, for instance when printing the disassembly of a P2SH pubkey script - will the 20B hash there be printed in reverse order?
< Jmabsd> also if a pubkey (32B) is printed out, could that ever be in reverse order?
< luke-jr> why don't you just try it and see? -.-
< sipa> Jmabsd: pubkeys are not 32 bytes, and they're not hashes
< Jmabsd> sipa: so the hex printer never prints other byte structures in reverse order.
< sipa> indeed
< sipa> only for things that are internally treated as numbers
< Jmabsd> but.. a P2SH 20B hash, that's a hash right. for printing purposes, is it considered a hash or a byte blob?
< sipa> nope!
< sipa> because the printer cannot know it is a hash
< sipa> you'd need to execute the script to know it is treated as such
< sipa> the script opcode is just "put some bytes on the stack"
< sipa> so, not reversed there
< Jmabsd> (sorry disconnect)
< Jmabsd> last, > interesting. except for the HD wallet root seed (160b=20B), there is no instance ever where a 20B hash e.g. in P2SH pubkeyscript, is printed in reverse order.
< Jmabsd> > sipa, right and when getting a disassembly printout in Bitcoin Core and related tools, those 20B:s are printed in normal order
< Jmabsd> the proper way to phrase Core's reversing policy is something like, "any hash that is not part of another binary blob or produced as script data, is hex-serialized in reverse byte order."
< Jmabsd> i'd hope any hash values introduced in the future will not be reversed though.
< sipa> i don't see why not
< sipa> we've always treated hash outputs as numbers and printed them as such
< sipa> if byte swapping is the hardest problem to deal with, i'm not very worried :)
< jamesob> re: memory usage increase: preliminary bisections are in and MarcoFalke and I are betting it's the leveldb changes. https://i.imgur.com/8aXRzwe.png
< gmaxwell> jamesob: wait. how are we measuring memory usage in that benchmark?
< jamesob> gmaxwell: time -f %M (ie resident set size)
< sipa> so every mmap causes 150 kB RSS increase? :s
< gmaxwell> no because we're not actually using the new maximum.
< gmaxwell> (I mean not using all of)
< sipa> aha.
< gmaxwell> So even MOAR.
< gmaxwell> which is suspect
< sipa> mmap by default will prefetch mapped pages into memory
< sipa> by default, 31 4kB blocks
< gmaxwell> oh interesting. but those pages are clean, they'll just get evicted, they really shouldn't be counted by RSS :(
< gmaxwell> so MADV_RANDOM
< gmaxwell> ...
< sipa> madvise(MADV_DONTNEED) will disable the reading entirely
< sipa> which may be useful to diagnose the issue
< jamesob> (for those following along at home: https://github.com/bitcoin/bitcoin/pull/13925/files)
< gmaxwell> Also setting the maximum maps really low, like..2 might be interesting.
< gmaxwell> but if this is the problem, MADV_RANDOM is probably the fix to the extent that its an actual problem at all.
< gmaxwell> Though we should do a reindex benchmark to make sure MADV_RANDOM doesn't hurt performance.
< wumpus> PSA: if after the latest merge you get a linker error "/usr/local/include/boost/smart_ptr/shared_ptr.hpp:728: undefined reference to `translationInterface", you need to do a 'make clean' and re-do the make and it will work
< luke-jr> wumpus: is there a reason the gitian linux yml has g++-riscv64-linux-gnu as a dep? seems to pull in GCC 7 when we're using GCC 8 now?
< luke-jr> wumpus: if `make clean` ever fixes something, that means there's a bug in the build system :/
< sipa> gmaxwell, jamesob: LMDB uses MADV_RANDOM it seems
< sipa> (though its design is different, i don't know their access patterns)
< wumpus> luke-jr: yes, it must be missing some changes in dependency detection between source and header files (another one is if you change something in univalue, it won't detect it)
< sipa> jamesob: try this:
< sipa> void* base = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, 0);
< sipa> if (base != MAP_FAILED) {
< sipa> + madvise(base, size, MADV_DONTNEED);
< sipa> in src/leveldb/util/env_posix.cc
< sipa> (and/or MADV_RANDOM)
< wumpus> luke-jr: it pulls in both gcc 7 and 8, I think that's necessary due to some strangeness with the packages (some symlink will only exist when g++-riscv64-linux-gnu is also installed)
< luke-jr> DONTNEED sounds wrong?
< luke-jr> wumpus: ah, weird
< wumpus> luke-jr: you might be able to get around it, but I noticed and tried as well and ran into a dead end
< sipa> luke-jr: to diagnose
< sipa> luke-jr: it would be interesting to see what the effect on RSS is with DONTNEED, to have an idea to what extent our memory usage is due to mmap caching
< jamesob> sipa: giving it a shot now
< wumpus> luke-jr: at least it's not MADV_HWPOISON!
< luke-jr> wumpus: lol
< gmaxwell> It's plausible to me that MADV_RANDOM will help performance or at least be neutral.
< wumpus> yes, to me too
< wumpus> our access pattern is more or less random
< gmaxwell> I don't recall now, though I know I researched this before... does leveldb's bisection interpolate assuming keys are uniform and that their values are uniformly sized or does it plain bisect?
< wumpus> (except in the rare times it's iterating over the whole utxo set in order, like when computing statistics)
< gmaxwell> esp in the case of plain bisection, prefetching is a bad behavior.
< wumpus> I don't know
< sipa> gmaxwell: there is an index at the end of each ldb file
< sipa> i assume it bisects in that index in a naive way, but i'm not sure
< jamesob> sipa: bench is running; we'll know how your change works in about six hours
< sipa> jamesob: awesome
< luke-jr> hm, I think I will regret relatime when I try to prune gitian caches