< Luke-Jr>
How important is memory comparison for #6851 ? I get a std::bad_alloc with my test wallet, so I'll need to find something less spammy and start over :/
< gmaxwell>
Luke-Jr: how much memory does the host you're testing on have?
< Luke-Jr>
gmaxwell: 16 GB, but 32-bit can only alloc <4 GB
< Luke-Jr>
I'm not surprised that a wallet with dice addresses overflows it.
< gmaxwell>
Luke-Jr: did you try turning down caching? ... we're already pretty close to the wire address space wise on 4gb to begin with.
< Luke-Jr>
I didn't. But IIRC dice uses >4 GB of transactions in the blockchain, so really I don't expect anything to help except remaking the wallet with a lower volume address
< Luke-Jr>
ok, so I did the memory comparison on Eligius's mining key
< Luke-Jr>
= 10184 wtxns
< Luke-Jr>
ps showed lower memory usage (in all three of SIZE, VSZ, and SZ) for the optimised/cached bitcoind. Obviously it couldn't have been actually reduced, so I conclude it's unmeasurably small.
< gmaxwell>
10000 isn't that many. :(
< Luke-Jr>
gmaxwell: not enough to at least show a measurable difference if it were a problematic increase?
< * Luke-Jr>
ponders how he has no block sources with 10 connections :/
< Luke-Jr>
cute, my next block is inflight from an XT node which is apparently not sending it?
< wumpus>
curious
< wumpus>
how openbsd, even with the recent release, still manages to package a BDB that is *older* than the 4.8 required to build bitcoin core
< Luke-Jr>
lol
< midnightmagic>
netcraft confirms it..?
< * Luke-Jr>
ponders if it would be difficult to gitian-build bitcoind for Android <.<
< wumpus>
not difficult, probably, a matter of pointing the depends at the NDK toolchain, then the type of slog work that cross-building usually is
< wumpus>
I vaguely remember cfields_ has done it at some point
< wumpus>
midnightmagic: netcraft?
< midnightmagic>
wumpus: Ancient meme. BSD is dying, netcraft confirms it. Used to be the repeated mantra on Slashdot, turn of the century.
< GitHub43>
bitcoin/0.11 0720324 Alex Morcos: Make fee aware of min relay in pruning.py RPC test...
< GitHub65>
[bitcoin] MarcoFalke opened pull request #6860: Drop minRelayTxFee to 1000 (master...MarcoFalke-2015-minRelayTxFeeDrop) https://github.com/bitcoin/bitcoin/pull/6860
< jgarzik>
wumpus, You are confusing an issue with a PR
< wumpus>
this is completely crazy
< jgarzik>
wumpus, calm down please. The issue is that the univalue conversion causes unintended behavior changes that are as yet not documented or audited
< wumpus>
it is simply not a bug
< jgarzik>
wumpus, objecting to a solution that was not proposed as a PR is fine
< wumpus>
EWONTFIX
< jgarzik>
wumpus, it is a user noticeable behavior change that potentially breaks people in the field
< jgarzik>
wumpus, who knows what else breakage was caused, until analysis is complete
< wumpus>
if you want to change documentation, be my guest; I don't think there is any documentation that says you can provide numbers there that exceed the 32-bit numeric range
< wumpus>
well at most you'll get an error now
< wumpus>
instead of dumbly ANDing off the upper bits
< wumpus>
which can cause much worse problems, ask the openssl devs
< jgarzik>
wumpus, not disagreeing, please read what I wrote
< jgarzik>
wumpus, user breakage - breakage was noticed - let's be methodical
< wumpus>
well, do you intend on doing the analysis there?
< wumpus>
if not, I want to close the issue and move on
< jgarzik>
wumpus, yes the issue should stay open until this is analyzed
< wumpus>
I disagree that it is so important
< jgarzik>
wumpus, without analysis it is provably impossible to say what the impact is
< wumpus>
but anyhow, if you look into this this week, I'm fine with keeping it open
< wumpus>
I don't think you'll find anyone that thinks the new behavior doesn't make sense
< wumpus>
I can ask a few people if you think jonasschnelli's, dcousens's, and my opinion isn't enough
< wumpus>
arguably if you were using large values in the field it was already broken. You *thought* you were parsing a large value, but you could have been passing a negative value for all you know
< jonasschnelli>
i think the out of range exception behavior is okay and expected
< jonasschnelli>
But we need to write some univalue release notes anyways and could mention this there
< wumpus>
1000000000000 turns into -727379968, for example
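[Illustration: the arithmetic behind wumpus's example. Casting the parsed 64-bit value down to a 32-bit int just keeps the low bits, which is how 1000000000000 becomes -727379968. Standalone sketch, not code from the tree:]

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        int64_t wide = 1000000000000LL;              // what the caller thought they passed
        // Out-of-range conversion to a signed type is implementation-defined before
        // C++20, but on the usual platforms it simply keeps the low 32 bits:
        int32_t narrow = static_cast<int32_t>(wide);
        std::printf("%lld -> %d\n", static_cast<long long>(wide), narrow);
        // prints: 1000000000000 -> -727379968
        return 0;
    }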
< jgarzik>
wumpus, Please consider the wider picture. You are focusing on one impacted callsite - and I consistently agree with you that the new behavior at that callsite is better - the issue is that that is not the only .get_int() in the entire codebase. There may be other user visible breakage where the impact is not so fortunate. Do not blindly assume.
< wumpus>
on other callsites the breakage would be exactly the same
< jgarzik>
you hope and assume...
< wumpus>
no, I'm sure
< wumpus>
get_int() simply does a range check now
< wumpus>
that is safer
< wumpus>
if you have one example where this causes dangerous behavior, just say so
< wumpus>
if not, please stop this nonsense
< jonasschnelli>
release notes -> mention integers will now face a range check -> done?
< jgarzik>
the new behavior is throwing an exception versus returning a range bound value without an exception
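[Illustration: a hedged sketch of the behavioral difference under discussion, not the actual UniValue code. The old path narrowed silently; the new path range-checks and throws, so callers that used to get a wrapped number now get an error:]

    #include <cstdint>
    #include <limits>
    #include <stdexcept>

    // Old behavior (roughly): the parsed 64-bit value was silently narrowed,
    // so 1000000000000 came back as -727379968.
    int get_int_old_style(int64_t parsed)
    {
        return static_cast<int>(parsed);
    }

    // New behavior (roughly): values outside the 32-bit range are rejected,
    // which surfaces as a different RPC error message than before.
    int get_int_new_style(int64_t parsed)
    {
        if (parsed < std::numeric_limits<int>::min() ||
            parsed > std::numeric_limits<int>::max()) {
            throw std::runtime_error("JSON integer out of range");
        }
        return static_cast<int>(parsed);
    }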
< wumpus>
the old behavior was dangerous, not the new one
< jgarzik>
throwing an exception causes different error return messages and different downstream behaviors
< jgarzik>
which in turn cause user visible interface changes
< jonasschnelli>
jgarzik: i agree: it's an API change. But in my opinion one that is worth taking.
< wumpus>
but only if the input *was already invalid*
< jgarzik>
jonasschnelli, yes
< jgarzik>
jonasschnelli, and my point has always been it needs to be analyzed and documented
< wumpus>
at least now you are told that the input is invalid, instead of replacing it with an effectively random number
< jgarzik>
not simply make the assumption it is OK
< jgarzik>
that is extremely poor software engineering - hope and pray
< wumpus>
well then do something better
< jgarzik>
keeping an issue open until it is analyzed fully is not a huge burden
< wumpus>
opening lots of issues that no one ever looks at just doesn't cut it either
< wumpus>
it is a burden, though
< jonasschnelli>
Mentioning it in the release notes is okay, but not mandatory IMO. If an API consumer was relying on the range-bound value he needs to be slapped anyway. :)
< jgarzik>
don't make mountains out of molehills.
< wumpus>
well possibly they didn't even know the range was not 64 bits. This bug could even be exploitable in some cases
< wumpus>
at least now they are told
< jgarzik>
the issue is assigned to me now. please stop hyperventilating over a minor issue.
< wumpus>
there could be cases where *application side* range checks can be bypassed by providing large numbers, unaware that bitcoind cuts them off
< morcos>
wumpus: I'm a bit out of my depth in the memory "leak" related to getblocktemplate, but i have an idea on the cause
< morcos>
in CreateNewBlock, we have a vecPriority that is approximately equal to the size of the mempool.
< morcos>
the first time you call getblocktemplate your resident usage jumps by the size of your mempool
< morcos>
repeated calls to getblocktemplate will continue to cause that to happen, up to the number of rpc threads you are running
< morcos>
this is still mostly guesswork at this point, but it's not clear to me why the memory used by vecPriority appears to be sticking around (a copy per thread)
< wumpus>
morcos: strange; I don't think we use any thread-local storage at all
< morcos>
well, like i said, still guess work, could be wrong.
< morcos>
but vecPriority is a local variable, so that would be thread-local storage, right? It should just go away when the function exits
< morcos>
but maybe it's some STL "optimization" to keep it allocated or something
< jgarzik>
In some thread systems (not sure about RPC threads), the threads do not die - they are recycled
< jgarzik>
so a huge stack would remain
< morcos>
but vecPriority is local to the function CreateNewBlock
< jgarzik>
indeed, that should go out of scope
< jgarzik>
as should CCoinsViewCache
< wumpus>
the threads certainly stay around
< jgarzik>
also, in terms of allocator behavior, large amounts of memory are not necessarily reclaimed at the process level. In the C library (not sure about C++), lots of free(3)'d memory is recycled back into the allocator, not returned to the system via munmap(2)
< wumpus>
but the stack can't grow that far, a vector is allocated on the heap not the stack
< wumpus>
(also, the default pthreads stack size is 8 MB per thread; hardly possible to cause so much 'leaking' that way)
< jgarzik>
So even though the app has released memory to the allocator, the allocator will not necessarily release memory to the system, as shown in top(1)
< wumpus>
right, freed memory is not returned to the system immediately
< wumpus>
sometimes never, depending on the size of the allocation
< jgarzik>
Process virtual memory size therefore sometimes just sits at its maximum, even if unused by the app
< jgarzik>
(recent libraries do release memory if chunks are large enough)
< wumpus>
(but it's reused if the application allocates again, so it should not result in growing process size over time in itself)
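[Illustration: a small glibc-specific standalone program showing the effect described above. Many small blocks are allocated and then freed, yet resident size as shown by top(1)/ps(1) typically stays high until malloc_trim(3) is called, because the freed blocks stay in the arena. Sizes and thresholds depend on the malloc tunables:]

    #include <malloc.h>    // malloc_trim (glibc-specific)
    #include <unistd.h>    // sleep
    #include <cstdio>
    #include <string>
    #include <vector>

    // Make lots of small heap allocations, roughly like the per-call STL churn
    // in CreateNewBlock, then free them all by leaving the scope.
    static void churn()
    {
        std::vector<std::string> chunks;
        for (int i = 0; i < 4 * 1000 * 1000; ++i)
            chunks.emplace_back(64, 'x');    // ~64-byte blocks, a few hundred MB total
        std::printf("allocated; check RSS in top/ps now\n");
        sleep(10);
    }   // everything is freed here, but glibc tends to keep the small blocks in its arena

    int main()
    {
        churn();
        std::printf("freed; RSS is usually still high\n");
        sleep(10);
        malloc_trim(0);    // explicitly hand free heap back to the kernel (glibc only)
        std::printf("after malloc_trim(0); RSS should drop noticeably\n");
        sleep(10);
        return 0;
    }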
< morcos>
we're talking about resident memory, not virtual memory, but you're saying it could still reflect there
< jgarzik>
nod
< wumpus>
do multiple getblocktemplates run at the same time in different threads? that'd explain it
< morcos>
wumpus: that's why i put "leak" in quotes, it seems entirely possible to me that the memory usage is limited to an excess of #rpcthreads * maxseenmempoolsize
< jgarzik>
per-thread heaps are certainly possible in C++
< jgarzik>
does memory exceed RPC thread count
< morcos>
wumpus: i was running them serially
< jgarzik>
seems like the steady state could be reached once all RPC threads are "huge" :)
< wumpus>
morcos: ok
< morcos>
yeah, so there might be a steady state, but the steady state is particularly inefficient if you have a lot of RPC threads
< wumpus>
the behavior seems consistent with per-thread heap, yes
< wumpus>
default rpcthreads is only 4, there shouldn't usually be a reason to increase that
< wumpus>
(especially as everything holds the cs_main lock anyway, so there is only very little scope for parallelism)
< morcos>
yes but 4 * max_mempool_usage is a lot of extra memory right?
< jgarzik>
not if you have a studly machine
< * jgarzik>
runs
< wumpus>
would be interesting to use google heap profiler to see if the memory gets released in between the calls
< wumpus>
if it doesn't show up in there, it's the matter of memory not being released to the OS
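[Illustration: one way to do what wumpus suggests, assuming gperftools/tcmalloc is linked in. The explicit HeapProfilerStart/Dump/Stop API brackets the code of interest; alternatively the whole binary can be run with the HEAPPROFILE environment variable set. Generic gperftools usage, not something present in bitcoind:]

    #include <gperftools/heap-profiler.h>   // gperftools; link with -ltcmalloc

    // Bracket the code under suspicion with explicit profiler calls; each dump
    // writes a /tmp/gbt.NNNN.heap file that pprof can read.
    void profile_one_gbt_call()
    {
        HeapProfilerStart("/tmp/gbt");
        // ... run the code under investigation here, e.g. one CreateNewBlock() ...
        HeapProfilerDump("after-call");     // snapshot of live allocations at this point
        HeapProfilerStop();
        // Inspect afterwards with: pprof --text ./bitcoind /tmp/gbt.0001.heap
    }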
< jgarzik>
Amusingly I bet you could actually go faster with fork(2) and a lockless COW mempool copy that disappears upon child exit(2). (unrealistic due to portability... not a real proposal)
< wumpus>
well if that means forking a child for every http connection, I don't think it'd be faster, but yeah you could use the processes a few times and then have them exit...
< wumpus>
I'm fairly sure apache does as much
< jgarzik>
nah just fork off for that one RPC call, and have the thread wait for results
< jgarzik>
when you're making a complete copy of the process you can do stuff like that :)
< jgarzik>
but doesn't work at all on Windows so...
< jgarzik>
on Windows, you have to fake it by executing your own .exe again
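[Illustration: a bare-bones POSIX sketch of the fork-per-call idea jgarzik describes. The child gets a copy-on-write view of the parent's memory (mempool included), writes its result to a pipe, and all of its memory vanishes on exit. do_gbt_work() is a hypothetical stand-in, and as noted this is not a real proposal and does not port to Windows:]

    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <stdexcept>
    #include <string>

    // Hypothetical stand-in for building a block template against the
    // copy-on-write view of the parent's mempool.
    static std::string do_gbt_work() { return "{\"result\":\"...\"}"; }

    // Run one getblocktemplate-style call in a forked child; the child's copy
    // of the process (mempool and all) disappears when it exits.
    static std::string gbt_in_child()
    {
        int fds[2];
        if (pipe(fds) != 0) throw std::runtime_error("pipe failed");
        pid_t pid = fork();
        if (pid < 0) throw std::runtime_error("fork failed");
        if (pid == 0) {                         // child: COW copy of the whole process
            close(fds[0]);
            std::string res = do_gbt_work();
            if (write(fds[1], res.data(), res.size()) < 0) _exit(1);
            _exit(0);                           // child memory is released here
        }
        close(fds[1]);                          // parent: wait for the result
        std::string out;
        char buf[4096];
        ssize_t n;
        while ((n = read(fds[0], buf, sizeof(buf))) > 0) out.append(buf, n);
        close(fds[0]);
        waitpid(pid, nullptr, 0);
        return out;
    }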
< jgarzik>
And correct - Apache re-uses both processes and threads for a limited time, then closes them - for reasons precisely like the ones we are seeing
< jgarzik>
limited recycling prevents memory from building up
< wumpus>
morcos: can you try export MALLOC_ARENA_MAX=1 , this is supposed to disable per-thread arenas for newer glibc
< wumpus>
by default it creates an arena for every CPU core
< jgarzik>
yep - that's so thread local structures can run lock free simply by accessing arena_vector[my_cpu_core_number]
< wumpus>
from a performance point of view it makes a lot of sense
< wumpus>
at least when there is actual parallelism :)
< jgarzik>
The cs_main lock has amusing historical parallels. Both the FreeBSD and Linux kernels had a Big Kernel Lock - a single lock that nearly everything would grab - during the early days of their conversion to SMP/parallel kernels
< wumpus>
yes and don't forget python's big lock :)
< jgarzik>
that too
< jgarzik>
python is so bad at threading you -have- to fork ;p
< wumpus>
but I hope bitcoin's is more like the BSD/Linux kernel lock than the python one, at least the former got rid of it at some point
< wumpus>
with python it's such an essential part of the design that it's unrealistic to get rid of it. At least that was the case in the 2.x days, I don't know about 3.x
< GitHub87>
bitcoin/master e253e83 Matt Corallo: Update debian/changelog and slight tweak to debian/control
< GitHub87>
bitcoin/master c7b36cc Matt Corallo: Change URLs to https in debian/control
< GitHub87>
bitcoin/master c6de5cc Wladimir J. van der Laan: Merge pull request #6796...
< GitHub160>
[bitcoin] laanwj closed pull request #6796: Update debian/changelog and slight tweak to debian/control (master...debian) https://github.com/bitcoin/bitcoin/pull/6796
< jgarzik>
yep
< jgarzik>
and we're well on our way to peeling off the cs_main (bsd/linux road)
< jgarzik>
imo
< morcos>
wumpus: sorry it's taking me a while, i was trying to get a full mempool but not also spam the rest of the network... i haven't run the getblocktemplates yet, but it sure does make bitcoind use a lot less memory in the first place (using MALLOC_ARENA_MAX=1)
< morcos>
normally i'm using close to 1G shortly after starting, with this, it was about 250M
< wumpus>
morcos: good to know! thanks for testing, I always had this issue where only very little memory usage would show up in the heap profiler, but top showed much more, now we understand why!
< wumpus>
and the more cores, the more it will use
< wumpus>
should add that one to the use-less-memory tips
< morcos>
wumpus: yep, after repeated calls to getblocktemplate, it only jumped in memory usage once
< morcos>
i think concentrating on vecPriority was misguided (it's not actually that big, just ptrs to txs) but between all the stuff that's STL-allocated in CNB, that probably explains the big jump. i haven't really gone through to see if it adds up yet
< morcos>
it does still seem like a big jump.
< morcos>
whether it happens 1x or #rpcthreads times
< morcos>
i suppose it might depend on how much the CCoinsViewCache allocates?
< wumpus>
that depends; they will cache everything that is queried through them
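[Illustration: at the time of this log both CCoinsViewCache and CTxMemPool expose DynamicMemoryUsage() helpers (the mempool one is where getmempoolinfo's "usage" figure comes from), so one way to answer the question is to log the view's usage from inside CreateNewBlock. Hypothetical instrumentation; check the exact names against the tree in use:]

    // Hypothetical instrumentation, not existing code: log how much the view
    // and the mempool are holding right before CreateNewBlock() returns.
    #include "coins.h"       // CCoinsViewCache
    #include "txmempool.h"   // CTxMemPool
    #include "util.h"        // LogPrintf

    static void LogTemplateMemoryUsage(const CCoinsViewCache& view, const CTxMemPool& pool)
    {
        LogPrintf("CreateNewBlock: coins view cache %u bytes, mempool %u bytes\n",
                  view.DynamicMemoryUsage(), pool.DynamicMemoryUsage());
    }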
< jgarzik>
wumpus, any remaining merge blockers for #6722 in your view?
< morcos>
+1 for merging it sooner rather than later. the only reason not to merge, i think, is if we can coerce more people into review/test. but beyond that, let's get it out into the wild to find more bugs.
< morcos>
however it's worth considering merging the lower chain limits soon as well
< jgarzik>
yeah this sort of stuff is perfect for master
< GitHub85>
[bitcoin] jtimon reopened pull request #6625: BLOCKING: Consensus: Move blocksize and related parameters to consensusparams ...without removing consensus/consensus.h [#6526 alternative] (master...consensus-blocksize-0.12.99) https://github.com/bitcoin/bitcoin/pull/6625
< jgarzik>
the Internet is always a better test lab than anything we cook up in private
< jgarzik>
merge easy, revert easy
< jgarzik>
I would have merged it myself if it were a normal change, given the ACK quotient, but for a change of this notability I want wumpus, gmaxwell, sipa etc. approval of some sort (sipa has already ACK'd FTR)
< GitHub107>
[bitcoin] jtimon closed pull request #6672: Consensus: Separate most consensus functions to consensus.cpp (master...consensus-moveonly-0.12.99) https://github.com/bitcoin/bitcoin/pull/6672
< morcos>
wumpus: do you think that's a sufficient "fix" for the memory usage?
< morcos>
still not clear exactly how much heap allocation is happening in getblocktemplate
< morcos>
but let's say for the sake of argument it's on the order of your mempool usage size
< morcos>
now you're saying every node out there (that doesn't run with your low mem tip) is going to use 5x maxmempoolsize
< morcos>
at least
< morcos>
i guess only if they run getblocktemplate
< morcos>
but maybe we can do something to try to reduce that. we know, for instance, that there is no need for this particular memory to be allocated in multiple threads at the same time
< morcos>
if that was even a static variable that was just cleared between calls, it would persist the memory all the time, but only 1x right?
< morcos>
so maybe vecPriority and view and whatever's using memory in CreateNewBlock
< gmaxwell>
the memory usage I'm seeing is certainly beyond 5x.
< gmaxwell>
3.77GB res compared to "usage": 90103744
< morcos>
gmaxwell: i think it turns out this arena-per-core thing will affect a lot of bitcoind mem usage... also, i think it's the max-ever mempool usage that matters, not the current one
< gmaxwell>
ah! reading more backscroll, so the idea is that the per-thread (or whatever) arena is ending up allocating but not sbrking data in each thread (or core?) that runs a GBT and thus has a high peak usage because it's copied the mempool?
< gmaxwell>
Having control of stuff like this is why firefox uses jemalloc
< morcos>
gmaxwell: yeah to the limit of my understanding. except still not exactly clear what is taking up the memory, b/c the mempool isn't copied. but could be the CoinsViewCache.
< wumpus>
coinsviewcache has methods to measure the size
< gmaxwell>
(not that it wouldn't also be good to reduce virtual usage, at least on 32-bit.. but I'm seeing about 4x the resident usage I expect)
< morcos>
gmaxwell: yeah that confused me as well. but from my very ad hoc testing it seemed to affect RES
< wumpus>
gmaxwell: well the memory is at least temporarily resident when it is used
< wumpus>
it's not just reserving an arena, it's actually returning memory from it, to be used and later freed
< wumpus>
after which it sticks around, no longer touched so it doesn't have to stay resident, but that depends on mem pressure
< wumpus>
(also if you keep sending getblocktemplate it probably reuses them in more-or-less round-robin fashion)
< gmaxwell>
oh i bet that interacts super well with people who've cranked rpc threads to get around the issues with rpcblocking.
< wumpus>
nah the number of threads is not that relevant, just the number of cores
< wumpus>
it tries to balance arenas over cores
< wumpus>
(which makes a lot of sense for performance, but in our case it's not useful)
< gmaxwell>
I wonder if it's ever useful for large allocations. hm. then again the usage is probably made out of zillions of tiny allocations.
< wumpus>
we had a workstation with a crazy NUMA motherboard at uni, where each processor had its own memory. I guess that's the only case where it makes sense for large, infrequent allocations.
< wumpus>
(but yes in bitcoin we use tons of small allocations)
< CodeShark>
wumpus: it seems test_bitcoin behavior has changed recently. not too long ago, if I ran src/test/test_bitcoin --run_tests=<testname> from the root project directory, it would only run that specific test - but now it seems to run all of them
< CodeShark>
am I tripping?
< wumpus>
--run-tests= takes a test suite name, not an individual test name
< CodeShark>
I mean singular
< CodeShark>
--run_test=
< wumpus>
if the test suite doesn't exist, it somehow gets them all
< wumpus>
that's not possible
< CodeShark>
let me verify what I'm saying to make sure it's accurate...
< wumpus>
yes it's singular run-test, but it takes a suite name
< CodeShark>
if the test suite doesn't exist I get an error
< * wumpus>
test_bitcoin -l test_suite will show exactly what test suites and tests it's executing, and how long they take
< wumpus>
that behavior is part of boost test, so if it changed, you probably changed your boost version
< wumpus>
-run-tests= isn't picked up at all here, -t is
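[Illustration: the suite/case distinction comes from Boost.Test. Each src/test/*.cpp file wraps its cases in a named suite, and --run_test matches the suite name (or suite/case), not a bare case name. A sketch of the shape, using getarg_tests/boolarg as example names:]

    // Shape of a typical suite (e.g. src/test/getarg_tests.cpp); the real files
    // use BOOST_FIXTURE_TEST_SUITE with a testing fixture, but naming works the
    // same, and the module/main definition lives in test_bitcoin.cpp.
    #include <boost/test/unit_test.hpp>

    BOOST_AUTO_TEST_SUITE(getarg_tests)      // <- this name is what --run_test= matches

    BOOST_AUTO_TEST_CASE(boolarg)            // an individual case inside the suite
    {
        BOOST_CHECK(true);
    }

    BOOST_AUTO_TEST_SUITE_END()

    // Whole suite:      src/test/test_bitcoin --run_test=getarg_tests
    // One case in it:   src/test/test_bitcoin --run_test=getarg_tests/boolarg
    // List what runs:   src/test/test_bitcoin -l test_suite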
< jgarzik>
(apologies if dup; didn't see it in scrollback)
< * jgarzik>
looks for some way to cap the number of arenas besides MALLOC_ARENA_MAX
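[Illustration: glibc exposes the same knob programmatically via mallopt(3); M_ARENA_MAX is glibc-specific and needs a reasonably recent glibc. A minimal sketch, assuming it runs at startup before any threads exist:]

    #include <malloc.h>   // mallopt, M_ARENA_MAX (glibc-specific)

    // Programmatic equivalent of exporting MALLOC_ARENA_MAX=1; call it early,
    // before any threads are created, so glibc never sets up extra arenas.
    static void LimitMallocArenas()
    {
    #ifdef M_ARENA_MAX
        mallopt(M_ARENA_MAX, 1);
    #endif
    }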
< wumpus>
don't think I saw that one yet
< Luke-Jr>
heh, when I wanted to run a specific test, I edited the makefile to not include the rest <.<
< wumpus>
"Freed blocks near the end of heaps in an arena are given back to the system either by using madvise(MADV_DONTNEED) or by explicitly unmapping." that doesn't seem to happen for us, probably due to heap fragmentation
< jgarzik>
nod - lots of little allocations will do that :/
< gmaxwell>
Change to jemalloc which should also reduce fragmentation.