< bitcoin-git>
[bitcoin] practicalswift opened pull request #12665: Add compile time checking for all run time locking assertions (master...compile-time-checking-of-runtime-assertions) https://github.com/bitcoin/bitcoin/pull/12665
< eklitzke>
the memory use for heap 7 (where the loadblk thread is running) is basically the same before and after the 2GB flush
< sipa>
eklitzke: how do i view that?
< eklitzke>
bitcoin-cli getmemoryinfo mallocinfo
< sipa>
o
< eklitzke>
you can get global stats with mallinfo(3) but the XML output from malloc_info(3) seems to be the only way to get the stats broken down by arena (for bitcoin each thread effectively gets its own arena)
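A minimal illustration of pulling that per-arena XML report straight from glibc (malloc_info(3) is glibc-specific; this is roughly the report the mallocinfo mode exposes over RPC):

    #include <malloc.h>   // glibc-specific: malloc_info()
    #include <cstdio>

    int main() {
        // Writes an XML report to stdout with one <heap> element per arena,
        // including counts of free, mmapped and total bytes.
        malloc_info(0, stdout);
        return 0;
    }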
< gmaxwell>
eklitzke: by "do so hot", you're expecting usage to go down... I wouldn't expect it to, due to fragmentation. the cache is kazillions of tiny sparse allocations that all happened at different times.
< eklitzke>
it may not be fragmentation per se but it's not releasing the memory back to the os
< eklitzke>
i was experimenting with trying to dynamically change the dbcache size and it was ineffective for this reason, even when using rehash etc.
< eklitzke>
it would be nice if the dbcache could be given its own arena since it uses so much memory (on many configurations) and has a somewhat unique allocation pattern
< gmaxwell>
Instead, I'd rather it didn't allocate at all.
< eklitzke>
you mean like with a slab allocator?
< gmaxwell>
what I'd previously proposed is that we make it into an open hash table, with all data internal (and an exception map for rare, unusually large entries)
< eklitzke>
i was thinking about that too
< gmaxwell>
everything inline, so no pointer chasing in lookups.
< gmaxwell>
(except for the exception entries)
< eklitzke>
nearly everything in the cache is the same size, the data could fit very compactly
< gmaxwell>
right.
< gmaxwell>
just make every entry the same size, and for the rare ones that don't fit, the storage is a union with a pointer, and a flag tells you to look in a map for the payload.
< gmaxwell>
or something along those lines.
< eklitzke>
makes sense
< sipa>
but avoiding allocating inside the LevelDB batch creation would need serious refactoring
< eklitzke>
leveldb isn't really optimized for the way bitcoin writes to it
< eklitzke>
you could make it a lot more efficient
< gmaxwell>
and entries could never be deleted from the open hash table, just flagged... and make the insert routine understand that it can just write over the first non-dirty entry it encounters.
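A rough sketch of the layout being described (names, sizes and probing details are made up for illustration, not taken from any existing code): fixed-size slots held inline in one flat array, a union whose pointer member handles the rare oversized payloads, and flag bits so "deleting" an entry is just marking its slot reusable.

    #include <cstdint>
    #include <vector>

    struct Slot {
        static constexpr size_t kInlineBytes = 40;       // sized for the common case
        enum : uint8_t { OCCUPIED = 1, DIRTY = 2, INDIRECT = 4 };

        uint64_t key = 0;
        uint32_t size = 0;
        uint8_t flags = 0;
        union {
            unsigned char inline_data[kInlineBytes];     // common case: no allocation
            unsigned char* indirect;                     // rare oversized entries live elsewhere
        };
    };

    struct FlatCache {
        std::vector<Slot> slots;                         // one big allocation up front
        explicit FlatCache(size_t n) : slots(n) {}

        // Linear probing; inserts may simply overwrite the first slot that is
        // not marked DIRTY, so eviction is just clearing flag bits.
        Slot& FindSlotForInsert(uint64_t key) {
            size_t i = key % slots.size();
            while ((slots[i].flags & Slot::DIRTY) && slots[i].key != key)
                i = (i + 1) % slots.size();
            return slots[i];
        }
    };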
< eklitzke>
it would require kind of serious surgery to the leveldb guts, but leveldb already represents the data in ordered ranges for the sstables, and the data to be flushed can be sorted, so i think you could redo the merge algorithm to zip through the on-disk data and the data being flushed and do the merge without doing a lot of allocations
< luke-jr>
IIRC, at least at one point, Linux didn't *support* processes releasing memory back to the OS. But that may have changed (it's been years)
< eklitzke>
linux has supported releasing memory back to the operating system for 20+ years if you munmap the data
< eklitzke>
you can't do it with brk/sbrk though
< eklitzke>
glibc mixes both which is kind of weird
< eklitzke>
from what i can tell the main arena is allocated with sbrk and the other arenas are allocated with mmap
< sipa>
iirc it's just a threshold
< sipa>
memory ranges above a certain size are anonymous mmaps
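Concretely (glibc behavior; the 64 KiB value below is just an illustration): requests above M_MMAP_THRESHOLD get their own anonymous mmap and go straight back to the kernel on free(), while smaller ones come from the sbrk'd main heap or the per-thread mmap'd arenas and are only trimmed opportunistically.

    #include <malloc.h>   // glibc-specific: mallopt(), malloc_trim()
    #include <cstdlib>

    int main() {
        // Serve anything above 64 KiB with its own mmap, so free() unmaps it.
        mallopt(M_MMAP_THRESHOLD, 64 * 1024);

        void* big = std::malloc(1 << 20);    // individually mmap'd
        void* small = std::malloc(256);      // lives in the heap/arena
        std::free(big);                      // returned to the kernel immediately
        std::free(small);                    // stays with the process

        // Ask glibc to hand back whatever free heap memory it can.
        malloc_trim(0);
        return 0;
    }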
< eklitzke>
i think step 1 though would be to implement greg
< eklitzke>
s idea, since that doesn't require messing with the allocator used by the rest of the code
< sipa>
i believe greg is notoriously hard to implement
< sipa>
ah.
< sipa>
eklitzke: i have a flame graph!
< eklitzke>
did you get it color coded?
< sipa>
yes; it's inside libsecp256k1 though, in a microbenchmark
< eklitzke>
for microbenchmarks you should look at "perf stat" (man perf-stat)
< sipa>
ah
< sipa>
the same scripts apply?
< eklitzke>
no it just gives you detailed information about things like cache hits/misses, cpu cycles, page faults, etc.
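For example (standard perf event names; the benchmark binary path is illustrative):

    perf stat -e cycles,instructions,cache-references,cache-misses,page-faults ./src/bench/bench_bitcoin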
< eklitzke>
so much gmpz realloc
< sipa>
the problem i'm trying to investigate is why this call to GMP takes 600 cpu cycles when run inside the microbenchmark, but 6000 cpu cycles when called from a higher-level benchmark in Core
< sipa>
and at this point i have no better guess than cache effects... but i'm very surprised it would be that much difference
< eklitzke>
one guess: in the benchmark there's one thread so everything stays on the same core and NUMA node, but in Core the code may have to access memory on another NUMA node
< eklitzke>
there's some stuff in numactl you could use to try to force Core to run entirely on one node, to test that hypothesis
< sipa>
the whole thing should be one thread even when called from bench_bitcoin
< eklitzke>
try perf stat -a --per-core
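Something along these lines (the benchmark binary path is illustrative; --cpunodebind/--membind are standard numactl flags):

    # pin both CPU and memory to NUMA node 0, then rerun the benchmark
    numactl --cpunodebind=0 --membind=0 ./src/bench/bench_bitcoin

    # system-wide counters, broken down per core, while the benchmark runs
    perf stat -a --per-core -e cycles,cache-misses ./src/bench/bench_bitcoin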
< jbnj>
hi
< jbnj>
hi everybody
< bitcoin-git>
[bitcoin] luke-jr opened pull request #12666: configure: UniValue 1.0.4 is required for pushKV(, bool) (master...univalue-1.0.4-required) https://github.com/bitcoin/bitcoin/pull/12666