< aj>
right, different VMs, but same test_bitcoin.exe, under wine coinselector_tests/knapsack_solver_test takes 581.8 seconds, under windows server 2016 w/ updates it takes 31.4s
< sipa>
is there a way to do profiling in wine?
< aj>
i figure i'll try a newer version of wine than what's in trusty first
< aj>
same version of wine on xenial seems no better, unsurprisingly; but, wine 3.0 at home seems much faster
< aj>
total of 236s for the entire test suite under wine 3.0 vs 193s under windows server; different hardware though
< aj>
wine1.8 from some random ppa on xenial also gets the entire test suite in 377s or so, different hardware yet again
< aj>
oh, bugger, that's wine3.0 as well, should have paid more attention when installing
< karl2>
sipa: you can use gprof with wine binaries as normal, I found out the other day. It wasn't particularly exciting stuff when I tried to track down the job #2 timeout thing the other day, though. Maybe running different wine versions will trigger it, like aj is saying!
< bitcoin-git>
[bitcoin] sdaftuar opened pull request #12902: [qa] Handle potential cookie race when starting node (master...2018-04-improve-dbcrash-restarts) https://github.com/bitcoin/bitcoin/pull/12902
< stevenroose>
Does bitcoin core have a utxo cache? If so, could anyone point me to the file where it is defined?
< stevenroose>
Aha, I found -dbcache, which is the size in MiB for the utxo db cache
< stevenroose>
I thought the cuckoocache was used as a sigcache.
< stevenroose>
Is it also used for the utxo cache?
< sdaftuar>
stevenroose: no, it is not. see src/coins.h and src/coins.cpp for more information about the utxo cache.
< stevenroose>
sdaftuar: thanks!
< stevenroose>
from the init code, I see that the -dbcache is split in 3, of which one is the "chain state cache"; what is that one for?
< bitcoin-git>
[bitcoin] sdaftuar opened pull request #12904: [qa] Ensure bitcoind processes are cleaned up when tests end (master...2018-04-always-kill-bitcoind) https://github.com/bitcoin/bitcoin/pull/12904
< sipa>
stevenroose: for the UTXO set the layers are: (1) disk (with OS cache etc) (2) LevelDB's cache (3) Bitcoin Core's pcoinsTip (a CCoinsViewCache object)
< setpill>
is there any way to actively check for conflicting transactions in the mempool for a given tx?
< sipa>
stevenroose: there is another data set, the block index which is loaded entirely in memory, but is stored in a separate LevelDB database
< sipa>
so the 3 pieces -dbcache is split over are the block index leveldb cache, the chainstate leveldb cache (=utxo), and pcoinsTip
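A rough sketch of how one -dbcache budget could be carved up among those three consumers; the fractions, caps, and names below are illustrative assumptions, not the actual init.cpp logic.

    #include <algorithm>
    #include <cstdint>
    #include <cstdio>

    struct CacheSizes {
        int64_t block_index_db; // LevelDB cache for the block index database
        int64_t coins_db;       // LevelDB cache for the chainstate (UTXO) database
        int64_t coins_mem;      // in-memory CCoinsViewCache (pcoinsTip) budget
    };

    CacheSizes SplitDbCache(int64_t total)
    {
        CacheSizes s{};
        s.block_index_db = std::min<int64_t>(total / 8, int64_t{2} << 20); // small, capped slice
        total -= s.block_index_db;
        s.coins_db = std::min<int64_t>(total / 2, int64_t{8} << 20);       // chainstate leveldb cache
        total -= s.coins_db;
        s.coins_mem = total;                                               // the rest goes to pcoinsTip
        return s;
    }

    int main()
    {
        CacheSizes s = SplitDbCache(int64_t{450} << 20); // e.g. a 450 MiB budget
        std::printf("%lld %lld %lld\n", (long long)s.block_index_db,
                    (long long)s.coins_db, (long long)s.coins_mem);
    }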
< setpill>
i suppose trying to rebroadcast would give an error in some cases, but i am not sure how reliable that is, and i would prefer simply checking without broadcasting
< bitcoin-git>
[bitcoin] sdaftuar opened pull request #12905: [rpcwallet] Clamp walletpassphrase value at 100M seconds (master...2018-04-wallet-encryption-timeout) https://github.com/bitcoin/bitcoin/pull/12905
< sdaftuar>
setpill: unfortunately i don't think we have great tools right now. in the next release, we'll have an rpc called "testmempoolaccept" which you could use to determine whether a given transaction would be accepted to your mempool, which might be along the lines of what you'd want?
< sdaftuar>
but it's tricky in general, because dealing with transaction chains is not easy
< sdaftuar>
for instance, if someone sends you a transaction that depends on another unconfirmed transaction, and then a third transaction conflicts with the parent and evicts it from the mempool, it's hard to tell that your transaction is indirectly conflicted as well
< stevenroose>
sipa: the block index I understand
< stevenroose>
"the chainstate leveldb cache (=utxo), and pcoinTip" -> so what exactly is the difference there?
< stevenroose>
they are both used to cache the utxo set, right?
< jamesob>
does POTENTIAL DEADLOCK DETECTED being logged by a node during functional test runs indicate something is definitely out of the ordinary?
< stevenroose>
sipa: oh you meant with (=utxo) that it's just the leveldb cache for the store that has the utxo data
< stevenroose>
ok, so the only thing core maintains itself is the CCoinsViewCache, right?
< sipa>
stevenroose: yes, but that's the most important source of speedups
< sipa>
the leveldb cache helps on systems with very slow i/o
< stevenroose>
sipa: the coinsCache map is from outpoint to utxo entry, right?
< stevenroose>
Doesn't that mean that for a new tx, the txhash is potentially added a lot of times?
< stevenroose>
in btcd, we have a structure where we map txid to utxoentry that has a (potentially sparse) map from index to output
< stevenroose>
but it's currently not cached, and indeed, that's one of the major performance bottlenecks
< drexl>
when an opcode has inputs, do these come from the stack?
< sipa>
stevenroose: i believe btcd's design was based on bitcoin core's previous one
< sipa>
we switched from per-tx to per-txout in 0.15
< sipa>
leveldb compresses keys that share a prefix anyway, so on disk it's not all that impactful
< sipa>
and it simplifies the in-memory cache and serialization overhead for read/writes significantly
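For reference, a simplified sketch of the per-txout layout being described: one cache entry per unspent output, keyed by (txid, vout), with the tx-level metadata repeated in each entry. The types are stand-ins, not the real Coin/CCoinsViewCache classes.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <tuple>
    #include <vector>

    struct OutPointKey {
        std::string txid;      // stand-in for a uint256
        uint32_t vout;
        bool operator<(const OutPointKey& o) const {
            return std::tie(txid, vout) < std::tie(o.txid, o.vout);
        }
    };

    struct CoinEntry {
        int64_t amount;
        std::vector<unsigned char> script;
        uint32_t height;       // creation height, duplicated per output
        bool coinbase;         // likewise duplicated per output
    };

    // one flat map: (txid, vout) -> coin; no variable-length per-tx array needed
    using CoinsCache = std::map<OutPointKey, CoinEntry>;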
< stevenroose>
simplifies it a lot indeed
< stevenroose>
any numbers on extra memory usage per utxo stored?
< sipa>
500 MB on disk extra
< stevenroose>
is disk more important than memory?
< sipa>
imho, no
< sipa>
it's a small constant factor extra disk
< stevenroose>
I mean for initial sync mostly, a big memory cache can be very significant, no?
< sipa>
but it makes the cache faster, and memory usage more effective and efficient
< stevenroose>
hmm
< sipa>
because now we don't need to load unrelated other unspent outputs into memory
< sipa>
when one output is being spent
< stevenroose>
oh, but you don't have to do that anyway, right?
< stevenroose>
the entry map (index -> output) is sparse
< stevenroose>
also on disk, you don't store entire txs, only the unspent outputs
< sipa>
yes
< sipa>
but then you need a complex operation to write changes to disk
< sipa>
and i don't see how you can easily perform the freshness optimization on that
< sipa>
(that's the idea that if you create a utxo in memory, and then later spend it, in memory, it can be deleted from the cache, and nothing ever needs to hit disk, because both the create-txout and spent-txout operations are idempotent)
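A minimal sketch of that freshness trick, using made-up names: an entry created in the cache and unknown to the parent view is marked fresh, so spending it again before a flush just erases it and nothing ever hits disk.

    #include <cstdint>
    #include <map>

    struct CacheEntry {
        // coin data omitted for brevity
        bool dirty = false;  // differs from what the parent view has
        bool fresh = false;  // the parent view does not have this coin at all
    };

    using Cache = std::map<uint64_t /* outpoint id */, CacheEntry>;

    void AddCoin(Cache& cache, uint64_t outpoint)
    {
        CacheEntry e;
        e.dirty = true;
        e.fresh = true;                 // created here; parent has never seen it
        cache[outpoint] = e;
    }

    void SpendCoin(Cache& cache, uint64_t outpoint)
    {
        auto it = cache.find(outpoint);
        if (it == cache.end()) return;  // the real code would fetch from the parent here
        if (it->second.fresh) {
            cache.erase(it);            // create + spend cancel out: nothing hits disk
        } else {
            it->second.dirty = true;    // remember to write the spend at flush time
        }
    }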
< stevenroose>
hmm, I only just started looking into this more deeply, but I don't see how storing utxos grouped per tx changes that
< sipa>
well it means you can only do that optimization if all utxos of a tx are spent before a flush
< sipa>
i guess you can have a hybrid where you store them per-txout on disk, but with shared txids in memory
< sipa>
i looked into doing that, but the memory usage savings are tiny
< stevenroose>
I don't know what optimization you're talking about tbh. let's say you add a tx with two outputs (you always add whole txs right? I don't think single outpoints make much sense), so you get txid -> (o1, o2)
< sipa>
yup
< stevenroose>
then, before a flush, o2 gets spent, so you keep txid -> (o1) and then when you flush, I don't see the overhead of writing txid:1 -> o1 versus txid -> (o1)
< stevenroose>
(opposite overhead)
< sipa>
that's actually a pretty significant overhead
< sipa>
you need a dynamically allocated structure for the variable-length output array
< sipa>
it's hard to compare what we're talking about because there are so many variations, depending on how you do things in memory vs on disk
< stevenroose>
I realize; I'm trying to see if it's worth turning the whole structure here upside down :)
< sipa>
what could be done is change the in-memory representation slightly where you have two maps, one txid->(int,coinbase,height,#unspents) and another (int,vout)->(amount,script)
< sipa>
rather than the single map we currently have (txid,vout)->(amount,script,coinbase,height)
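That two-map layout might look roughly like this; the struct names and fields are hypothetical, just mirroring the tuples above.

    #include <cstdint>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    struct TxMeta {
        int id;                  // small shared integer key
        bool coinbase;
        uint32_t height;
        uint32_t unspent_count;  // how many outputs of this tx are still cached
    };

    struct TxOutData {
        int64_t amount;
        std::vector<unsigned char> script;
    };

    std::map<std::string, TxMeta> tx_meta;                     // txid -> (int, coinbase, height, #unspents)
    std::map<std::pair<int, uint32_t>, TxOutData> tx_outputs;  // (int, vout) -> (amount, script)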
< sipa>
but it becomes a lot more complicated in the presence of more complicated cache flushing strategies
< sipa>
perhaps if average txs gain more txouts over time, this becomes better
< stevenroose>
well, mostly to avoid changing all the db code etc, I will probably start with a cache on the current system
< stevenroose>
the argument of txs becoming bigger could make sense, though
< stevenroose>
if we want actual unlinkability with CT, coinjoin-like structures will become increasingly common
< stevenroose>
but yeah the simplicity of a txout based structure is also very compelling
< sipa>
in any case, i'm very skeptical that any attempts to share the txids and other tx metadata in memory are worthwhile
< sipa>
and unfortunately the CCoinsViewCache code is pretty complicated as it takes advantage of a bunch of tricks that are specific to utxo data
< sipa>
so it's not trivial to just drop in another cache design
< sipa>
you may want to talk to eklitzke
< stevenroose>
"a bunch of tricks that are specific to utxo data" hmm
< sipa>
well in particular the freshness optimization
< sipa>
that i mentioned above
< stevenroose>
I was going through it a bit, will certainly do some more consideration before i dive into coding
< stevenroose>
yeah, I'll look into that
< sipa>
it seems like a very hard first project if you're not already somewhat familiar with the codebase :)
< stevenroose>
how does it handle crashes? keeping latest flushed height or so and rebuilding newer blocks from disk in case of crash?
< stevenroose>
when talking about cached "chain state", I suppose new blocks are always directly written to disk, no?
< sipa>
the chain state is the utxo set
< sipa>
blocks are stored completely independently
< stevenroose>
ok, that clears it up
< sipa>
basically on disk there is a marker with a hash of a block that means "utxos created or spent after this block MAY be present" and another that means "all utxos created or spent before this block MUST be on disk"
< sipa>
and at startup all blocks between the first and the second hash are replayed and applied to the UTXO set
< sipa>
this approach means we don't need to write the whole cached utxo set in one atomic operation, and also don't need to remember old deleted entries
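A toy sketch of that recovery idea, with invented types: after a crash, every block in the (fully-flushed, possibly-flushed] range is simply re-applied to the UTXO set, which is safe because each create/spend is idempotent.

    #include <set>
    #include <utility>
    #include <vector>

    struct OutRef { int tx; int index; };                      // toy stand-in for an outpoint
    struct Block { int height; std::vector<OutRef> creates, spends; };

    struct UtxoSet {
        std::set<std::pair<int, int>> coins;
        void Apply(const Block& b) {
            for (const auto& c : b.creates) coins.insert({c.tx, c.index}); // idempotent
            for (const auto& s : b.spends)  coins.erase({s.tx, s.index});  // idempotent
        }
    };

    // replay everything in (fully_flushed, possibly_flushed]; re-applying
    // blocks that were already on disk before the crash is harmless
    void ReplayAfterCrash(UtxoSet& utxo, const std::vector<Block>& chain,
                          int fully_flushed, int possibly_flushed)
    {
        for (const Block& b : chain) {
            if (b.height > fully_flushed && b.height <= possibly_flushed) utxo.Apply(b);
        }
    }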
< ThinkOfANick>
sipa: Wait, why not remember old entries?
< sipa>
ThinkOfANick: because you want to save memory
< stevenroose>
sipa: I'm having difficulty to see how those two block locators are not the same
< setpill>
sdaftuar: perhaps ill have to resort to adding all the "from" addresses of the entire chain as watch-only addresses...
< sipa>
stevenroose: you have 1 million modified UTXO entries in memory
< sipa>
stevenroose: you can't construct a single batch to write them all at once to disk, as that batch would be gigabytes in size
< sipa>
so you write part of it
< sipa>
say your last full flush was at block 400000
< sipa>
the current tip is 450000
< sipa>
that means your cache may include entries from up to block 450000
< stevenroose>
(partial flush, right?)
< stevenroose>
ah full
< stevenroose>
sorry, didn't see
< sipa>
you're now doing a partial flush, writing just some subset of UTXO cache entries to disk
< sipa>
that may include UTXOs created up to block 450000 (or may miss things that were created between 400000 and 450000)
< sipa>
but it's not guaranteed to contain everything up to 450000 (in fact, because you know it's partial, it can't be)
< sipa>
so the range you write is 400000..450000
< sipa>
and if a crash happens then and there
< sipa>
at startup you'll need to replay everything between 400000 and 450000
< sipa>
because all utxo creation and spending operations are idempotent, it never hurts to replay an operation that was already applied
< stevenroose>
I see, I just don't see how the 450000 is not just the same as the tip.. you also said "utxos created or spent after this block MAY be present" that must have been "before" then?
< sipa>
450000 is the tip
< sipa>
in this scenario
< stevenroose>
yeah
< sipa>
no
< sipa>
utxos created or spent after 400000 MAY be present
< sipa>
utxos created or spent before 450000 MUST be present
< sipa>
so the range is 400000..450000
< sipa>
eh, i guess this is vague
< stevenroose>
> utxos created or spent before 450000 MUST be present
< stevenroose>
yeah that's where you lost me
< sipa>
let me reformulate
< sipa>
all operations up to block 400000 are guaranteed to be on disk
< stevenroose>
you said "MUST be before" and "MAY be after" so I assume that is the same one
< sipa>
all operations between 400000 and 450000 may be present on disk, but are not guaranteed to be
< sipa>
my "may" and "must" were very confusing before
< stevenroose>
yeah, that's what confused me
< sipa>
but things after 450000 are guaranteed to not be on disk
< stevenroose>
so now let's assume there is already a persistent chain tip indicator, then you only need to keep one, right?
< sipa>
well this is the chain tip indicator
< sipa>
instead of a tip, it's a range
< stevenroose>
that's why I was confused about it being two. ok ok, yeah I got it. I just assumed you would always need a chaintip anyways
< sipa>
in the ideal scenario the two are the same
< sipa>
after a full flush
< stevenroose>
one last thing I'm missing
< stevenroose>
how to know when to update that consistency height
< stevenroose>
(the first one of the range)
< sipa>
ah, right now it's very simple
< sipa>
when we start a flush operation, we check what the previous lower-height (the 400000) was, and update it to (that lower height, current tip)
< sipa>
after a flush operation completes, it's replaced with (current tip, current tip)
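In pseudocode-ish C++ (names invented), that bookkeeping looks something like the following: widen the on-disk range to (old low, tip) before writing anything, and collapse it to (tip, tip) only once the whole flush has succeeded.

    #include <cstdio>

    struct Markers { int low; int high; };  // "all ops <= low are on disk" .. "ops <= high may be"

    // stand-ins for durable writes
    void WriteMarkers(const Markers& m) { std::printf("markers %d..%d\n", m.low, m.high); }
    void WriteDirtyEntriesInBatches()   { /* many partial leveldb batches */ }

    void Flush(Markers& markers, int current_tip)
    {
        markers = {markers.low, current_tip};   // anything up to the tip MAY land on disk
        WriteMarkers(markers);

        WriteDirtyEntriesInBatches();           // a crash anywhere in here is recoverable by replay

        markers = {current_tip, current_tip};   // now everything up to the tip IS on disk
        WriteMarkers(markers);
    }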
< stevenroose>
oh, but then you still need full flushes?
< stevenroose>
so you can't update in partial writes?
< sipa>
well we only have full flushes now
< sipa>
but they're implemented as a sequence of partial flushes
< sipa>
longer term i want a system where we have a background thread that's constantly flushing
< sipa>
and is always "running behind" on the tip
< sipa>
to give the memory db a chance to cache creates/spends that cancel each other out before writing
< sipa>
but keeping track of which is the lower hash in the range in that system is more complicated
< sipa>
it's basically the lowest height of which you either have an unwritten create or unwritten spend
< sipa>
but it's more tricky in the presence of reorganizations
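A tiny sketch of that bookkeeping under the background-flushing idea (hypothetical structure): track a last-modified height per dirty entry and take the minimum to know how far back a replay would have to start.

    #include <algorithm>
    #include <climits>
    #include <map>

    struct DirtyEntry { int modified_height; /* plus the coin data */ };

    int LowestUnwrittenHeight(const std::map<long long, DirtyEntry>& dirty)
    {
        int lowest = INT_MAX;  // INT_MAX means nothing is pending
        for (const auto& kv : dirty) lowest = std::min(lowest, kv.second.modified_height);
        return lowest;
    }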
< stevenroose>
ok, I thought you were doing a partial flush when the cache was full
< stevenroose>
(like LRU fashion or so)
< sipa>
nope
< stevenroose>
but it's just a way to reduce the size of the ldb transaction
< sipa>
i've experimented with various approaches for MRU eviction from the cache etc
< stevenroose>
oh yeah, then you can just do all batches and update the pointer
< sipa>
but they're basically all slower than what we're doing now (on fast hw at least)
< stevenroose>
sipa: yeah you'd have the problem of knowing to what hash it's consistent. you'd need to keep heights in entries and iterate over all entries once in a while to see what the most recent dirty one is
< sipa>
the *least* recent dirty one, yes
< sipa>
thankfully, utxo entries already have a height
< stevenroose>
my apologies :p
< sipa>
unfortunately, that's the creation height and not really the modification height (which may differ in the case of a spend or a reorg)
< stevenroose>
yeah I keep having a hard time picturing reorg handling there
< stevenroose>
because when you don't have a txindex and you delete an entry, it's impossible to get it back :D
< sipa>
oh, we have undo data
< sipa>
the *.rev files
< stevenroose>
do you keep like "revert objects" for the last X blocks?
< stevenroose>
ah
< stevenroose>
how many are there? (thinking hardfork races here where two chains constantly catch up with each other and fuck old nodes)
< sipa>
there is one per block
< sipa>
and we prune them along with the blocks themselves
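Conceptually (a simplified guess at the shape, not the real CBlockUndo serialization), the per-block undo data just records every coin a block spent, so a reorg can restore them:

    #include <cstdint>
    #include <vector>

    struct SpentCoin {
        int64_t amount;
        std::vector<unsigned char> script;
        uint32_t creation_height;
        bool coinbase;
    };

    // one of these per block (the *.rev files), pruned together with the block
    struct BlockUndo {
        std::vector<std::vector<SpentCoin>> per_tx_spent; // one list per non-coinbase tx
    };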
< stevenroose>
(technically wouldnt be a hardfork in that case though)
< stevenroose>
wait, why keep them for all blocks? or are they only like pointers to the actual data?
< sipa>
because we need to be able to reorg?
< stevenroose>
I haven't seen any of that data in btcd's codebase, let me dig into that tomorrow :)
< stevenroose>
yeah I know, but well, reorging over half the chain is kinda unlikely, isn't it?
< sipa>
yes
< stevenroose>
I'd say reorging to before the last checkpoint, at least, makes no sense..
< sipa>
checkpoints need to go away
< stevenroose>
oh
< sipa>
but yes, sure, it's unlikely that deep reorgs happen
< stevenroose>
what's the fundamental problem with checkpoints?
< sipa>
they confuse people
< sipa>
:)
< sipa>
they're seen as a security measure
< stevenroose>
more as an efficiency tool :p I mean you can skip verification up to that point
< sipa>
we have assumevalid for that now, which is far less invasive
< stevenroose>
(e.g. when syncing a node I always "ask my friends for the latest block they trust" (i.e. check some explorers) and do --addcheckpoint)
< sipa>
it doesn't restrict which chain is valid
< sipa>
it just skips validation for any block that is an ancestor of a known valid block
< sipa>
but if the best chain we see is different than the assumevalid one, we'll accept it (after validating)
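A stripped-down sketch of that assumevalid behaviour (the real check in validation also looks at header ancestry and accumulated work; the helper below is invented): script checks are skipped only for ancestors of the assumed-valid block, and everything else is validated in full.

    #include <set>
    #include <string>

    bool ShouldCheckScripts(const std::string& block_hash,
                            const std::set<std::string>& ancestors_of_assumevalid)
    {
        // ancestor of the known-good block: assume its scripts were valid
        if (ancestors_of_assumevalid.count(block_hash)) return false;
        // any other block, including one on a competing chain, is fully validated
        return true;
    }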
< stevenroose>
ok yeah that's a better version of a checkpoint
< sipa>
yes, assumevalid is updated from time to time, but we haven't modified checkpoints in years
< stevenroose>
but checkpoints are also useful against eclipse attacks when you're just getting started
< sipa>
no
< stevenroose>
at least they let you know something is up, no?
< sipa>
they're useful against being spammed with low difficulty headers
< sipa>
but that's independent of eclipse attacks
< sipa>
we need backward headers sync to remove that dependency on checkpoints
< stevenroose>
backward header sync?
< sipa>
first learn the best header, and only then download headers along that path
< sipa>
as opposed to downloading whatever headers people give you, hoping that they'll indeed turn out to form a chain with more work than your current one
< sipa>
(that's how it works now)
< stevenroose>
how can you learn the best header?
< sipa>
using a yet to be devised protocol :)
< stevenroose>
asking all peers? (highest checkpoint? :p)
< stevenroose>
oh
< sipa>
there are some ideas about random sampling, where someone can send you a merkle sum tree over all their headers, and then you randomly query it a number of times to see if they indeed have the distribution of pow they claim
< stevenroose>
I recently thought about a backwards sync mechanism for initial utxo building. but I guess it's kinda not worth the effort when there is a good utxo cache
< sipa>
and once you've done enough queries, you know they actually have a chain with a certain amount of work
< sipa>
and if that amount of work is good enough, you can start downloading the actual headers
< setpill>
sipa: wouldn't just believing the claimed amount of accumulated work, plus a ban if it turns out to be a lie, work?
< stevenroose>
are checkpoints really that bad?
< sipa>
stevenroose: people seem to misunderstand that if checkpoints ever have an effect, bitcoin is broken
< sipa>
i'm much more comfortable to have much weaker assumptions about correctness of the code
< sipa>
(which includes the checkpoints)
< sipa>
setpill: how is that better than what we have now?
< sipa>
stevenroose: i don't think they're terrible, but we also don't really need them anymore, except for this tiny DoS concern
< setpill>
sipa: the last blocks are likely to have more pow behind them, so they're more expensive to maliciously craft
< stevenroose>
"if checkpoints ever have an effect, bitcoin is broken" you mean that when code breaks validity of an old block that no one will validate because checkpoints?
< sipa>
stevenroose: i mean that if checkpoints ever prevent the network from reorging to an attacker chain, it's clear that the concept of PoW itself is broken
< stevenroose>
and what if they prevent *a new node* from syncing a wrong chain? that's what their main use is imho
< sipa>
stevenroose: that's not what they do
< sipa>
stevenroose: they just prevent OOM
< stevenroose>
I mean even as a spam vector, a decent miner right now can prob create a quite significantly long chain that is huge in size (1MB blocks) and has legit work
< sipa>
yes, absolutely - that's exactly the one thing they still do
< stevenroose>
OOM?
< sipa>
out of memory
< sipa>
also, not actually blocks, just headers
< sipa>
we don't download block data until a chain of validated headers is actually the best chain
< stevenroose>
yeah true, so that would only work if eclipsed
< sipa>
oh, and there is a known min amount of work
< sipa>
independently of checkpoints
< stevenroose>
"min amount of work at height x"?
< sipa>
so we never accept a headers chain until it passes that point
< stevenroose>
oh, like that
< stevenroose>
cumulative
< stevenroose>
thats neat
< setpill>
sipa: interesting, i hadn't heard of that; is that documented somewhere?
< sipa>
-minimumchainwork cmdline option
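The idea, sketched with simplified 64-bit work values instead of the real 256-bit arithmetic: a headers chain isn't treated as a usable candidate until its cumulative work passes the configured floor.

    #include <cstdint>
    #include <vector>

    bool ChainPassesMinimumWork(const std::vector<uint64_t>& per_header_work,
                                uint64_t minimum_chain_work)
    {
        uint64_t total = 0;
        for (uint64_t w : per_header_work) total += w;  // sum work over the header chain
        return total >= minimum_chain_work;             // below the floor: keep waiting/syncing
    }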
< stevenroose>
well, thanks for the insights :)
< sipa>
yw!
< setpill>
sipa: ahh, so it's another "checkpoint-esque" thing, as in a hardcoded value that gets updated periodically?
< setpill>
for a second i was under the impression pow inflation somehow had a lower bound ^^'
< sipa>
setpill: there is a minimum difficulty, but it's trivial
< setpill>
yeah and wont help much against a malicious chain
< sipa>
a single modern CPU thread can create a minimum-difficulty block in a few minutes
< sipa>
modern HW miners can make 1000s per second
< setpill>
yeah, so i thought there was a higher-than-that, actually useful lower bound on difficulty based on something i was unaware of
< sipa>
there is also the max-divide-by-4 rule for difficulty changes
< stevenroose>
sipa: is that a consensus rule??
< sipa>
yes
< stevenroose>
was not aware of that
< sipa>
always has been
< stevenroose>
is there a max-multiply-by-x one?
< sipa>
yes, by 4
< stevenroose>
aha
< sipa>
neither rule has ever been hit
< sipa>
on mainnet at least
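Sketch of the clamp being discussed (simplified; the real code in pow.cpp works on 256-bit targets, and overflow is ignored in this toy 64-bit version): the measured retarget timespan is clamped to [expected/4, expected*4], so difficulty can move at most 4x either way per period.

    #include <cstdint>

    uint64_t NextTarget(uint64_t old_target, int64_t actual_timespan, int64_t expected_timespan)
    {
        if (actual_timespan < expected_timespan / 4) actual_timespan = expected_timespan / 4;
        if (actual_timespan > expected_timespan * 4) actual_timespan = expected_timespan * 4;
        // a larger target means lower difficulty; scale in proportion to the (clamped) timespan
        return old_target * static_cast<uint64_t>(actual_timespan) / static_cast<uint64_t>(expected_timespan);
    }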
< stevenroose>
I hate the min-diff-after-20-mins testnet rule
< sipa>
haha
< stevenroose>
I used to try and testnet mine with an old butterfly labs jalapeno but somehow was never hitting one
< sipa>
you need to set your timestamp in the future :)
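Sketch of that testnet special case (simplified; the parameter names are made up): if a block arrives more than 2 * 10 minutes after its parent, it may use the minimum difficulty instead of the normal retarget value.

    #include <cstdint>

    uint32_t RequiredBits(uint32_t normal_bits, uint32_t min_difficulty_bits,
                          int64_t block_time, int64_t parent_time,
                          bool allow_min_difficulty /* testnet-style networks only */)
    {
        const int64_t target_spacing = 10 * 60;  // seconds between blocks
        if (allow_min_difficulty && block_time > parent_time + 2 * target_spacing)
            return min_difficulty_bits;          // the "20 minutes" escape hatch
        return normal_bits;
    }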
< stevenroose>
btcd didn't notify gbt long polls on the 20 minute hit
< stevenroose>
also
< jtimon>
sipa: re protocol to know the best chain: couldn't compact pow proofs be better for that than random sampling?
< jtimon>
hi stevenroose
< sipa>
jtimon: compact pow proofs need consensus rules
< sipa>
bitcoin doesn't have those
< jtimon>
sure, I mean assuming a compact-proofs softfork
< intcat>
is there any (u)txo commitment design (close to becoming) a BIP?
< intcat>
i've read some mailing-list interaction between peter todd and bram cohen but it's a few years old and i'm not sure how much has happened on that since