< cfields>
sipa: want me to PR that? Or you have something else in mind?
< BlueMatt>
sipa: ok, here's a theory... "10:39:08 Prune:" is the prune called by AcceptBlock's FlushStateToDisk call (seems obvious given the Pre-allocating blk just above)... now, that flush call hit fDoFullFlush, triggering a pcoinsTip->Flush(). OK, so now in CCoinsViewDB::BatchWrite we iterate over the entries in mapCoins, allocating memory for each one in the leveldb batch, erasing *AS WE GO*... at some point it ran out of memory, throwing
< BlueMatt>
bad_alloc and leaving pcoinsTip in an inconsistent state, but std::bad_alloc is not an std::runtime_error so the catch at the end of FlushStateToDisk didn't catch it. OK, phew, so now we go to connect the next block and fail to connect its inputs (without allocating new disk space). I /think/ that explains your debug.log without hand-waving about hardware errors
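A minimal sketch of the erase-as-we-go pattern described above (illustrative stand-in types; not the actual CCoinsViewDB::BatchWrite code):

    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    // Illustrative stand-ins for the real coins-cache and leveldb-batch types.
    using CoinsMap = std::map<std::string, std::string>;
    using Batch    = std::vector<std::pair<std::string, std::string>>;

    // If appending to the batch throws std::bad_alloc part-way through, the
    // entries already erased are gone from the in-memory cache but were never
    // written to disk: the cache is left in an inconsistent state.
    void FlushEraseAsWeGo(CoinsMap& cache, Batch& batch)
    {
        for (auto it = cache.begin(); it != cache.end();) {
            batch.emplace_back(it->first, it->second); // allocates; may throw
            it = cache.erase(it);                      // entry already removed
        }
        // Only reached if nothing threw; the caller would then commit the batch.
    }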
< sipa>
BlueMatt: sounds like a plausible theory
< sipa>
cfields: PR what?
< cfields>
BlueMatt: that was my reading of it as well
< sipa>
if that's the problem (and that seems to become more plausible as dbcache gets set higher, as you get the duplication only at flush time)
< sipa>
we should probably make the batch construction not erase things for ccoinscache
< BlueMatt>
sipa: yes, an easy way to make this particular, very specific issue less likely is to not erase as we go
< BlueMatt>
ofc that means more memory usage, but...better than blowing up
< sipa>
BlueMatt: alternatively, we could fail to flush, and still delete everything
< sipa>
from the cache
< BlueMatt>
sipa: hmm?
< sipa>
... and then reset pcoinsTip->hashBlock and chainActive to the last successfully flushed block (which may be 1000s ago)
< BlueMatt>
sipa: in that case we must AbortNode, I think?
< sipa>
oh, absolutely
< sipa>
but i'm worried we may not be able to instantly abort (due to locks being held all over the place while the OOM occurs)
< BlueMatt>
yea, I mean ideally we'd broaden the catch in FlushStateToDisk and use the AbortNode there
< BlueMatt>
just this time don't let the coins map get into an inconsistent state
< sipa>
cfields: looks good
< BlueMatt>
cfields: oh, somehow I missed that that /is/ only for bad_alloc
< BlueMatt>
yea, lets totally do that!
< cfields>
<BlueMatt> wait, does this apply to more than bad_alloc?
< cfields>
<cfields> no
< cfields>
:)
< BlueMatt>
oh, reading comprehension fail
< cfields>
sipa: want to switch prevector to new[] and ditch realloc()? Or call get_new_handler()() if they return null?
< cfields>
(or just leave it alone and let something write to 0 :p)
< sipa>
cfields: just switch it to new[]
< cfields>
ok
< BlueMatt>
it would suck to drop realloc?
< BlueMatt>
just assert(new_ptr)
< sipa>
how often does realloc actually help?
< BlueMatt>
no idea, but the difference between get_new_handler()() and assert(new_ptr) is minimal
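A sketch of the two options being weighed for the reallocation path (illustrative only; not the actual prevector code):

    #include <cassert>
    #include <cstdlib>
    #include <new>

    // Option A: keep realloc, and on failure defer to whatever new-handler the
    // application installed (it may free memory, throw std::bad_alloc, or terminate).
    void* grow_buffer_handler(void* old_ptr, std::size_t new_size)
    {
        void* new_ptr = std::realloc(old_ptr, new_size);
        if (new_ptr == nullptr && std::get_new_handler() != nullptr) {
            std::get_new_handler()();   // application-level OOM policy decides
        }
        return new_ptr;
    }

    // Option B: treat a failed allocation as fatal right where it happens.
    void* grow_buffer_assert(void* old_ptr, std::size_t new_size)
    {
        void* new_ptr = std::realloc(old_ptr, new_size);
        assert(new_ptr != nullptr);
        return new_ptr;
    }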
< sipa>
well i don't like making prevector assume that OOMs are fatal
< sipa>
that's something application level
< BlueMatt>
sipa: yea, and in our application they always are :p
< sipa>
still, it's nicer to have that knowledge in only one place
< BlueMatt>
sipa: internet says realloc is useful, though in our usage of prevector it's likely not super noticeable
< BlueMatt>
so I'm fine with whatever
< sipa>
BlueMatt: oh, i just realized... calling get_new_handler()() in prevector doesn't imply it's going to fail
< BlueMatt>
huh? you mean it doesn't imply it's going to std::terminate instead of throwing or so?
< sipa>
yes
< sipa>
so, i don't care.
< BlueMatt>
ok, if you prefer that, do that
< bitcoin-git>
[bitcoin] jtimon opened pull request #9855: RPC: Use integer satoshis instead BTC with decimals (master...0.15-rpc-satoshis) https://github.com/bitcoin/bitcoin/pull/9855
< sipa>
cfields, BlueMatt: crap, it seems i reported this exact same bug a year ago already...
< cfields>
sipa: heh. Well, at least we can get it in for 0.14
< cfields>
sipa: does that mean it takes 1+ years to sync your rpi? :)
< sipa>
cfields: it's a new one
< sipa>
it took 2 days to sync to the point where it was
< cfields>
sipa: i have no clue how representative the CCheckQueueSpeedPrevectorJob bench is, but there's a noticeable difference when switching to new[] and dropping realloc
< cfields>
sipa: for the sake of being conservative this late in the release cycle, i think get_new_handler()() is likely to have much less impact
< sipa>
cfields: ack
< bitcoin-git>
[bitcoin] theuni opened pull request #9856: Terminate immediately when allocation fails (master...bad_alloc_terminate2) https://github.com/bitcoin/bitcoin/pull/9856
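A minimal sketch of the terminate-on-allocation-failure approach via std::set_new_handler (names and logging are illustrative; the actual PR may differ in detail):

    #include <cstdio>
    #include <exception>
    #include <new>

    // Installed via std::set_new_handler: operator new calls this instead of
    // throwing std::bad_alloc when an allocation fails. Logging is best-effort;
    // the point is to stop before anything continues in an inconsistent state.
    [[noreturn]] static void new_handler_terminate()
    {
        std::fputs("Error: out of memory, terminating\n", stderr);
        std::terminate();
    }

    int main()
    {
        std::set_new_handler(new_handler_terminate);
        // ... normal startup continues here ...
    }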
< cfields>
damn fanquake, you're fast!
< cfields>
sipa: well i guess that discussion has been rendered moot. assert for 0.14, and do an efficient specialized unsigned char* prevector for master?
< wumpus>
I understand wanting to terminate immediately on a memory error in some cases, but can't you just catch the exception at some point instead of registering a global handler?
< sipa>
wumpus: the problem is exactly that it is being caught... and then we continue in an inconsistent state
< wumpus>
well then crash the program at the catch site!
< sipa>
that's what this does
< sipa>
tell the program to crash whenever new fails
< wumpus>
no, this uses some poorly documented feature of c++ to change a global callback for every failed allocation
< wumpus>
instead of just in the consensus code or wherever it is actually dangerous
< wumpus>
some allocation errors, either in bitcoin itself or in libraries may actually be recoverable, and they would now cause a crash
< sipa>
oh, sure, longer term i think we can find something more specific
< sipa>
but as long as we have "catch std::exception& e" everywhere, which catches everything, that's pretty hard
< wumpus>
I think this is a bad direction to go in with the code base
< wumpus>
shouldn't we be addressing that then?
< sipa>
in 0.14?
< wumpus>
for 0.14 this is ok, but this is a PR to master
< sipa>
hmm, you think this should go in just 0.14?
< wumpus>
the way the PR is now? yes, I think so. This is just too aspecific a hack
< sipa>
i think terminating when OOM is a very reasonable thing, and certainly more reasonable than trying to continue
< wumpus>
I think terminating on OOM in select cases is a reasonable thing
< sipa>
there may be more user-friendly things to do in some cases, but it's very hard... allocation failures can occur pretty much everywhere, and it seems really hard to avoid inconsistent states
< wumpus>
sure, it's difficult
< wumpus>
but this just feels like giving up on any kind of sane error handling
< sipa>
i don't think so... OOM is different from disk errors, for example
< sipa>
you can cleanly recover from that, usually
< wumpus>
I really dislike global handlers like this, and I don't think much software uses this
< wumpus>
it just changes the expectation of the c++ standard that failed allocations throw an exception
< wumpus>
which is adding surprising behavior at a low level
< sipa>
i'm surprised to read that
< gmaxwell>
on many systems allocation failures are just silently ignored and the program just crashes when it tries to use the memory.
< sipa>
i think that most software just doesn't try to catch bad_alloc
< sipa>
and it just crashes
< gmaxwell>
In virtually none of the software could an allocation failure even possibly be handled, esp. as almost everything we do involves allocation.
< wumpus>
ok, never mind
< wumpus>
seems you all agree, just do this
< gmaxwell>
but I agree that just fixing the bad_alloc is not a full solution. What about other exceptions? If bad_alloc can cause spurious block invalidations, how long until the next thing?
< sipa>
well, i think we should get rid of the catch std::exception and catch ... everywhere
< wumpus>
yes if the code is not exception-safe, that should be addressed
< sipa>
and only catch expected exceptions
< gmaxwell>
So I am of the opinion that bad alloc should just crash but also that the software needs to deal better with unexpected exceptions in block validation... as I mentioned before this is the third case of exceptions causing bogus block rejection that we've had.
< sipa>
but by doing that, we'd still just crash in case of an OOM
< sipa>
at least with this PR, we write to debug.log what happened
< wumpus>
but unlike AbortNode() no attempt is made to show a dialog to the user
< gmaxwell>
can't show a dialog without allocating.
< sipa>
well in many cases, that would be possible
< wumpus>
well the point is: maybe it needs less allocation than the original allocation request
< wumpus>
not all allocations are equal
< sipa>
indeed
< sipa>
typically those that trip an OOM are large ones
< wumpus>
someone might be trying to allocate a 4GB buffer, which fails, then after that the program could continue keeping in mind that it can't have that buffer
< gmaxwell>
That is a fair point. Also if it did something like shut down the mempool it might free enough to display a message.
< gmaxwell>
I wonder what the distribution of vm_overcommit settings is, it used to be ubiquitous; the fact that sipa even saw this suggests that it isn't anymore.
< sipa>
for this specific issue, it may be possible to catch the bad_alloc inside the leveldb wrapper, and if so, mark the db object as read-only
< gmaxwell>
apparently some heuristic is now the default.
< wumpus>
in many cases showing a simple modal dialog would still work, especially on windows where it's sort of a direct system operation. It should at least be attempted
< sipa>
that's fair
< sipa>
but i think it requires too many changes for 0.14
< gmaxwell>
I have no issue with that, so long as if it fails the system still shuts down.
< wumpus>
and we have a general 'A fatal issue has occurred, check debug.log' already in AbortNode
< sipa>
i really want to think harder about possible inconsistent states
< wumpus>
well an allocation failure in qt will just propagate the std::bad_alloc
< wumpus>
qt is exception safe
< gmaxwell>
sipa: I'm kinda surprised that cfields's patch works. IIRC from gdbing boost asio it used to try to allocate 4TB of memory and then fail. I'm surprised nothing else in boost does.
< wumpus>
so just catch that and abort() if there is one
< sipa>
gmaxwell: it's c++11
< sipa>
boost does not assume c++11
< sipa>
for example, do we want to prevent writing to mempool.dat after some unhandled system error, to prevent corrupting it?
< wumpus>
yes
< sipa>
if catching of the exception occurs higher up, there is less risk of those things
< sipa>
but what about writes that happen in unrelated background threads
< sipa>
the wallet gets flushed in a background thread
< wumpus>
set a flag and don't flush anything anymore
< wumpus>
I mean all the databases are atomic right?
< wumpus>
you can call writes, and as long as you don't flush, they'll be ignored on shutdown
< sipa>
but a flag where? an exception can only propagate up to its thread's main function
< wumpus>
a global flag
< wumpus>
checked before each database flush
< sipa>
if an OOM occurs inside the background runner thread, it will - if uncaught - just kill the background runner
< wumpus>
a kind of alarm state, which makes sure nothing is permanently committed
< wumpus>
catch the exception and call a handler to set that flag?
< sipa>
that's really ugly...
< wumpus>
why is that ugly? you should catch exceptions such as std::bad_alloc anyway to warn the user, log something, and initiate shutdown
< sipa>
doesn't leveldb have some background writer thread?
< wumpus>
why is leveldb's background thread a problem here?
< wumpus>
we're concerned with corruptions of our own state, right?
< wumpus>
leveldb won't be writing anything that we haven't committed
< wumpus>
so as soon as there is the suspicion of inconsistent state, don't write anything to leveldb anymore
< sipa>
i'm worried about an OOM caused by a thread we don't control
< wumpus>
then the inconsistent state won't be committed to disk, and leveldb should be fine. Same for the wallet
< sipa>
but i assume that would just cause a crash, as we can't catch a bad_alloc there anyway
< wumpus>
well yes, the writer thread could die in the middle of writing, sure, but there's no way to avoid that. The database needs to be able to handle that, also for general crash robustness
< wumpus>
what it won't do, however, is write inconsistent state if we haven't told it to
< sipa>
yes, indeed
< sipa>
i think you're right - with some changes it may be possible to do this safely
< sipa>
if we make sure there is no code that may unintentionally catch a bad_alloc
< wumpus>
right
< sipa>
and that all the catches that do, also call AbortNode, which sets this flag
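A rough sketch of that flag idea (g_fatal_error and the function names are illustrative, not existing Bitcoin Core symbols):

    #include <atomic>

    // Global "alarm state": once set, nothing may be committed to disk anymore.
    std::atomic<bool> g_fatal_error{false};

    // Called from any catch site that sees an unexpected exception (or from an
    // AbortNode-style helper): record the condition, then log/warn/shut down.
    void NotifyFatalError()
    {
        g_fatal_error.store(true);
        // ... log the error, show a dialog if possible, initiate shutdown ...
    }

    // Checked before every database flush (chainstate, wallet, mempool.dat).
    bool FlushAllowed()
    {
        return !g_fatal_error.load();
    }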
< wumpus>
I think there should be a rule: if you catch a general std::exception, you can't continue there, and need to stop the program
< wumpus>
as you have no idea what kind of error you have and whether it's recoverable
< sipa>
right
< wumpus>
(so that's for top-level handlers only)
< sipa>
we have 53 places where std::exception is caught
< sipa>
and 22 where ... is caught (even worse, as you can't even inspect the object)
< wumpus>
the potential worry is in the networking code... some parsing problems throw exceptions, we need to be really sure to catch them and continue, otherwise there's a DoS
< wumpus>
e.g. serialization throws std::runtime_error instead of something specific
< wumpus>
okay, that's always an i/o error. Better than I thought
< sipa>
there are some other things that can throw that
< wumpus>
catching that and just reporting it and continuing in the network code would be ok
< sipa>
network code or net_handling?
< sipa>
net handling should catch serialization failures, and disconnect the offending peer
< wumpus>
well, the packet processing. It should make sure that the one node is disconnected and not that the whole program comes tumbling down
< wumpus>
yes
< sipa>
right
< sipa>
in fact, it's that exact catch that is the culprit here
< wumpus>
right, it should be more specific
< wumpus>
catching ios_base::failure and potentially a few others instead of std::exception would go a long way
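A self-contained sketch of the narrowed catch (Peer, ParseMessage and HandleMessage are illustrative stand-ins, not the real net-processing code):

    #include <ios>
    #include <string>

    struct Peer { bool disconnected = false; }; // toy stand-in for a peer

    // Serialization failures are signalled here as std::ios_base::failure
    // (the ios_base::failure exception mentioned above).
    void ParseMessage(const std::string& payload)
    {
        if (payload.size() < 4) throw std::ios_base::failure("message too short");
        // ... real parsing would follow ...
    }

    void HandleMessage(Peer& peer, const std::string& payload)
    {
        try {
            ParseMessage(payload);
        } catch (const std::ios_base::failure&) {
            // Expected, per-peer problem: disconnect this peer and carry on.
            peer.disconnected = true;
        }
        // std::bad_alloc and anything else unexpected is deliberately not caught
        // here; it propagates to a top-level handler that aborts the node.
    }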
< sipa>
agree
< wumpus>
and some exceptions like std::bad_alloc should initiate a quick shutdown. It could still be a DoS, but the program is probably in such a bad state when it happens that continuing with the other nodes is hopeless.
< wumpus>
(though theoretically it could still be a botched alloc due to the remote peer specifying a very large buffer, which would never occur in reality)
< wumpus>
so I guess uploading executables for rc2 is no longer necessary and we should merge #9856 and do rc3?
< wumpus>
uhm. Caught that issue in the act now. Unfortunately, the backtrace is useless. Gdb near-crashes while generating it!
< wumpus>
"Error while running hook_stop: Cannot access memory at address 0x24"
< wumpus>
okay, so much for gdb...
< wumpus>
that leaves trying to make it cough up a core dump, and uploading that somewhere...
< jeremyrubin>
wumpus: cuckoocache in theory may have increased memory usage a bit
< jeremyrubin>
(if you're curious for all steadystate memory usage increases)
< wumpus>
by how much?
< jeremyrubin>
unclear
< jeremyrubin>
it always uses 100% of whatever it's allocated
< jeremyrubin>
whereas the previous one only uses space when it has entries (in theory)
< sipa>
the default max is 32 MB instead of 40 MB though
< sipa>
so at peak, it's smaller
< jeremyrubin>
so at startup it would be using 32 MB, whereas the prior one was using 0
< jeremyrubin>
it's just unclear how long you have to be on the network to experience a full cache w/ old code
< jeremyrubin>
anyways I'm pretty sure this isn't the cause :)
< sipa>
1GB of memory is not much
< wumpus>
indeed, running a node with 1GB of memory has always required tweaking
< sipa>
it's more surprising to me that 0.13.2 worked, though
< sipa>
he's even reducing dbcache
< wumpus>
I do see increased reporting of out-of-memory errors with 0.14
< jeremyrubin>
yeah, totally could be the sigcache, could definitely put someone over the edge
< jeremyrubin>
especially if they set the value higher
< jeremyrubin>
e.g., 128 mb
< jeremyrubin>
prior version you could set it to a GB and run happily for a while
< wumpus>
they don't touch that setting, just dbcache
< wumpus>
possibly they should. We should update the recommendations for low-memory nodes.
< jeremyrubin>
I don't think practically they'd need to adjust it down, just be less aggressive adjusting it up -- not much benefit in making it too large
< jeremyrubin>
Can we (by any chance) fast-track sipa's hack to get cuckoocache back to being non-power-of-2?
< jeremyrubin>
it sucks if they're picking between 8, 16, 32 MB sizes, some more resolution there would be good
< wumpus>
jeremyrubin: would help to have more than a concept ACK there from you
< jeremyrubin>
k I'll do some more thorough review
< wumpus>
thanks
< wumpus>
also like gmaxwell I wonder what the performance difference would be, 8 multiplications would be slower than 8 logical ANDs
< jeremyrubin>
I'm not currently set up to run great performance tests (it will take some work to re-set up my benching environment)
< jeremyrubin>
But honestly performance was fine even when it was %
< wumpus>
'fine' hehe
< jeremyrubin>
*shrug*
< sipa>
jeremyrubin: i'm sure that for the sigcache it doesn't matter
< sipa>
but maybe it does for other things in the future
< wumpus>
well it's obviously a compromise between memory usage and performance
< jeremyrubin>
I mostly changed it to hash_mask to make sipa happy :p
< wumpus>
neither '&' nor '* >>' is 'better', but we need to be careful what to choose
< sipa>
well you violated the core tenet of never using integer division by a variable!
< sipa>
it invokes the wrath of maxwell
< jeremyrubin>
It was checked to not be 0 ;)
< jeremyrubin>
anyways...
< wumpus>
division? where?
< jeremyrubin>
(old version of cuckoocache used % in the hash computation)
< wumpus>
I saw only multiplies, if there's a division, I'm going to NACK this one
< wumpus>
ooh :p
< sipa>
and that PR brings back the flexibility of variable sizes, but with a * and a >> instead of a %
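The "* and >>" mentioned here maps a 32-bit hash onto an arbitrary range without an integer division; a minimal sketch of the idea (the actual CuckooCache code may differ in detail):

    #include <cstdint>

    // Map a 32-bit hash onto [0, size) with a multiply and a shift instead of %,
    // so size no longer has to be a power of two and no division is needed.
    inline uint32_t ReduceToRange(uint32_t hash, uint32_t size)
    {
        return static_cast<uint32_t>((static_cast<uint64_t>(hash) * size) >> 32);
    }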
< jeremyrubin>
I recall benchmarking and not being able to see a difference between %, &, >>
< wumpus>
yes, using % would be more straightforward readability-wise, but division is even worse than multiplication
< jeremyrubin>
yeah
< jeremyrubin>
so I think it's fairly safe to take this one.
< wumpus>
yes
< jeremyrubin>
My one suggestion sipa is to convert size to a uint64_t
< jeremyrubin>
like the member
< jeremyrubin>
so you can get rid of all the casts in the hash part
< wumpus>
would this result in more efficient code on 32-bit though?
< wumpus>
as it guarantees that the top bits are zero
< jeremyrubin>
I'm not sure
< jeremyrubin>
but they have to be cast to uint64_t in hashing part now
< jeremyrubin>
so my thought is not having to do that cast every hash is slightly more efficient code
< wumpus>
casts don't generally generate code
< jeremyrubin>
uint32_t to uint64_t even?
< jeremyrubin>
there are instructions that auto-zero it for the ALU?
< jeremyrubin>
that's fine then
< wumpus>
no. on 32-bit, it just means 'assume top bits are 0'. On 64-bit everything is stored in 64-bit registers and the top bits are always 0
< jeremyrubin>
cool, then that's fine as is
< wumpus>
at least on x86_64 you can't load something into the lower 32 bits without zeroing the upper bits
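For illustration, the widening cast in question typically costs nothing on x86_64, since writing a 32-bit register already zeroes the upper half:

    #include <cstdint>

    // On x86_64 this usually compiles to a plain 32-bit register move (or to
    // nothing at all if the value already sits in a register); no separate
    // zero-extension instruction is needed.
    uint64_t widen(uint32_t x)
    {
        return static_cast<uint64_t>(x);
    }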
< jeremyrubin>
I'd consider it an Ack from me then.
< jeremyrubin>
The tests are pretty aggressive on cuckoocache, so if the hashing idea were broken in some way it would either catch it or it would be broken in a very minor way.
< wumpus>
of course the considerations depend on the specific architecture, but in general casts are very cheap or generate no extra code at all, especially between big types. Arithmetic with 8-bit and 16-bit values can result in extra 'ANDs' on some architectures, you're right
< wumpus>
but then it's not so much the cast that generates the code
< wumpus>
yes, having good tests is great!
< wumpus>
the state of testing in bitcoin core improved a lot last year
< jeremyrubin>
*cough* merge my checkqueue tests *cough* :P
< wumpus>
usually test-only pulls are merged very quickly
< wumpus>
then again I cannot really keep up anymore with the volume of pulls
< jeremyrubin>
It's actually fine that they weren't, I found a mixed up loop condition (such that one test wasn't doing much) when I was sitting with kalle
< wumpus>
sipa: yeah, that would classify as 'bad' in my eyes :)
< sipa>
wumpus: the mul or the div?
< wumpus>
the div
< jeremyrubin>
nehalem?
< sipa>
yes, absolutely
< sipa>
but 'even worse' sounds like multiplication is already pretty bad... every 2 cycles is pretty good i'd say :)
< wumpus>
divs have historically been really bad, some CPUs don't even have a specific instruction for it, but even on those that do it's generally terrible
< wumpus>
because of the underlying computational structure of course, not because those CPUs are badly implemented or so :)
< jeremyrubin>
ah, nehalem: an intel arch name I'd thoroughly forgotten :)
< wumpus>
2 cycles is pretty good, especially if it's *every* 2 cycles
< jeremyrubin>
and that hash routine should pipeline well
< sipa>
the latency is higher than 2 cycles
< wumpus>
usually mul is quite quick but the number of multiplication units is limited, so issuing a lot will bottleneck on that
< wumpus>
but not on nehalem, apparently :)
< sipa>
but the throughput is once every 2 cycles
< sipa>
same on skylake, btw
< wumpus>
what would be the throughput for logical and?