< randy-waterhouse>
ok, another 'devil's advocate' tester on board; segwit tested smoothly ... thanks gmaxwell, sipa, wumpus, etc ... and many others, good job.
< Lauda>
Does the dbcache=N parameter only start using up additional memory during block processing?
< gmaxwell>
Lauda: I think what you're asking is whether it will use it right away; no, it won't -- only as there is more data read into the cache
< Lauda>
gmaxwell: all that is required is "dbcache=3000" in bitcoin.conf, right (since mine is practically empty), and I want to run another reindex overnight?
< gmaxwell>
right!
< Lauda>
Okay great. Thanks!
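For reference, a minimal bitcoin.conf along the lines discussed above might look like the following sketch; the dbcache value is just the one Lauda mentions (in MiB), and bitcoind -help is the authoritative source for the option descriptions.

    # bitcoin.conf -- illustrative settings for the reindex experiment above
    dbcache=3000        # database/UTXO cache size in MiB; only fills as data is read in
    # the reindex itself is easier to trigger once from the command line:
    #   bitcoind -reindex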
< Lauda>
The reindex issue is independent of the blockchain, i.e. it doesn't matter whether one tests on testnet or mainnet (someone asked me this)?
< midnightmagic>
i can convert my testnet node(s) to segwitty. is it time?
< jl2012>
What would happen if I upgrade my node after segwit is active and there are already segwit blocks on the chain?
< GitHub136>
[bitcoin] paveljanik opened pull request #8261: The bit field is shown only when status is "started" (master...20160625_sw_getblockchaininfo_bit) https://github.com/bitcoin/bitcoin/pull/8261
< gmaxwell>
jl2012: it reorgs back to before segwit activated.
< gmaxwell>
jl2012: then will download blocks as needed.
< jl2012>
Same for other soft forks like csv?
< jl2012>
I think it has to be
< gmaxwell>
jl2012: it arguably should be, but we haven't done that before.
< gmaxwell>
for segwit it had to go get the witness data in order to serve blocks
< gmaxwell>
we'd talked about doing it in the past for other soft forks but I think we thought it would be harder to implement than it turned out to be.
< gmaxwell>
(and unlikely to really matter that much unless the reason you were upgrading was to fight a network split, in which case the invalidateblock rpc can be used)
< jl2012>
But theoretically I may be following an invalid chain
< gmaxwell>
Potentially; though if there were a large invalid chain we would make loud announcements to recommend the use of the invalidateblock command. I think now that the code exists, we'd likely use it in the future.
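A hedged example of the invalidateblock usage gmaxwell refers to: the RPC marks a given block, and everything built on it, as invalid so the node reorgs away from that chain, and reconsiderblock undoes it; <blockhash> is a placeholder.

    bitcoin-cli invalidateblock <blockhash>      # abandon the chain containing this block
    bitcoin-cli reconsiderblock <blockhash>      # undo the above if it was a mistake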
< Lauda>
What is the main bottleneck for reindex, storage speed or CPU processing?
< gmaxwell>
probably depends on the hardware; on my laptop I think it's IO. on systems with faster IO, I think it's cpu inside leveldb code... at least with default dbcache. with dbcache cranked, it's likely cpu elsewhere in bitcoin.
< Lauda>
Hmm, seems like the last 20-30 weeks are taking forever
<@wumpus>
Lauda: unless you have a very fast disk/ssd, or increase the dbcache, i/o will be your main bottleneck
< Lauda>
I've just checked in on my node, and some bans say e.g. until June 19th but the nodes are still banned. Do these get removed after a restart, or did something go wrong?
<@wumpus>
Lauda: you can easily check though: is CPU maxed out?
< Lauda>
It isn't
< Lauda>
DBcache 3GB
<@wumpus>
(unless you have changed the number of script verification threads, bitcoind will max out your CPU cores in initial sync when it's not i/o bound)
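The script verification thread count wumpus mentions is controlled by the -par option; a sketch with an illustrative value, to be checked against bitcoind -help.

    # bitcoin.conf: use 4 script verification threads (default 0 = auto-detect cores)
    par=4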
< Lauda>
Okay so even with 3GB dbcache that's still not enough
<@wumpus>
but the utxo and block index are simply stored in a flat file, which is loaded at startup and written at shutdown
< Lauda>
That's interesting and those times are amazing in comparison to leveldb
<@wumpus>
if you can afford the memory :)
<@wumpus>
though it uses less memory than keeping everything in the dbcache, and doesn't have the issue that the cache is not seeded at startup
<@wumpus>
I think research and experimentation on how best to store the utxo set is in order
< Lauda>
The move towards SSDs should definitely help with this, but the industry is not there yet..
< Lauda>
I can afford 4 GB on this machine, but it still takes a fair amount of time.
<@wumpus>
memristor would be nice
<@wumpus>
but no matter what, at some point the unrestrained growth of the utxo set also needs to be addressed
< Lauda>
^
< sipa>
wumpus: we should try switching to a model where all utxos are stored as separate db entries, rather than in a vector of unspends per txid
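A rough C++ sketch of the layout change sipa is suggesting, using invented type names, just to make the two models concrete: the current chainstate keeps one record per txid holding all of its unspent outputs, while the proposed model keys each unspent output individually.

    // Invented types contrasting the two chainstate layouts discussed here.
    #include <cstdint>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    struct TxOut { int64_t value; std::vector<unsigned char> script; };

    // Per-txid model: one record per txid, holding that tx's remaining unspent outputs.
    using PerTxid = std::map<std::string, std::vector<TxOut>>;

    // Per-output model: one record per unspent output, keyed by (txid, output index).
    using Outpoint = std::pair<std::string, uint32_t>;
    using PerOutput = std::map<Outpoint, TxOut>;

    int main() {
        PerOutput utxo;
        Outpoint a0{"txid_a", 0}, a1{"txid_a", 1};
        utxo[a0] = TxOut{50, {}};
        utxo[a1] = TxOut{25, {}};
        // Spending one output is a single-key erase here; in the per-txid model the
        // whole "txid_a" record would have to be read, modified, and rewritten.
        utxo.erase(a0);
    }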
<@wumpus>
but it may well be we're running against the limits of what databases can (with good performance) handle, which means there is no room for scaling there at all
< gmaxwell>
gigantic cuckoo hash table. with a update log. :P
<@wumpus>
sipa: yes, that would be an interesting experiment too
<@wumpus>
also the access pattern is essentially random, so the only type of caching that helps very well is to keep everything
< gmaxwell>
sipa: the ripple people have claimed that leveldb performance falls off a cliff with more than some threshold number of entries (I believe they were storing every transaction in it)
< sipa>
gmaxwell: i think they don't have application level caching
<@wumpus>
well I'm not actually sure how random the access pattern is, but it looks like that from a disk perspective with the current organization
<@wumpus>
maybe optimizations are possible based on smartly sorting utxos that are expected to be accessed together? I don't know
< gmaxwell>
sipa: sure, but that wouldn't change the performance of the underlying database.
<@wumpus>
it's not like the other databases that we tried performed better
<@wumpus>
leveldb still seems, all in all, the best performing on-disk database for utxo storage
< gmaxwell>
Ripple folks created their own.
< gmaxwell>
(and also suggested we might be interested in using it)
<@wumpus>
lmdb looked promising but it has its own performance cliff
<@wumpus>
(depending on the amount of memory in the system, it seems)
< GitHub121>
bitcoin/master b0be3a0 Wladimir J. van der Laan: doc: Mention Windows XP end of support in release notes...
< GitHub121>
bitcoin/master 63fbdbc Wladimir J. van der Laan: Merge #8240: doc: Mention Windows XP end of support in release notes...
< GitHub49>
[bitcoin] laanwj closed pull request #8240: doc: Mention Windows XP end of support in release notes (master...2016_06_windows_xp) https://github.com/bitcoin/bitcoin/pull/8240
< gmaxwell>
as every read would simply be one or two random disk accesses... and it's hard to do better than that. it's just that writing is awful. (e.g. you end up with read-write-write to update a log, with both sequential reads and writes, and if the table needs to be resized, woe is you).
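As a toy illustration of the read-side point, here is a minimal in-memory two-choice cuckoo table sketch; it is not gmaxwell's actual proposal (no update log, no external value storage, no resizing), but it shows why a lookup is at most two probes while inserts may have to displace other entries.

    // Toy two-choice cuckoo hash table: every lookup is at most two probes.
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <optional>
    #include <string>
    #include <utility>
    #include <vector>

    class CuckooTable {
        struct Slot { std::string key; std::string value; bool used = false; };
        std::vector<Slot> slots;

        // Derive two candidate indices per key; crude mixing is fine for a sketch.
        size_t index(const std::string& key, int which) const {
            size_t h = std::hash<std::string>{}(key);
            if (which) h = h * 0x9e3779b97f4a7c15ULL + 0x2545F4914F6CDD1DULL;
            return h % slots.size();
        }

    public:
        explicit CuckooTable(size_t n) : slots(n) {}

        // At most two random accesses, regardless of table size.
        std::optional<std::string> get(const std::string& key) const {
            for (int i = 0; i < 2; ++i) {
                const Slot& s = slots[index(key, i)];
                if (s.used && s.key == key) return s.value;
            }
            return std::nullopt;
        }

        // Inserts may displace existing entries along a chain; this is where the
        // write cost (and the resizing "woe") lives.
        bool insert(std::string key, std::string value) {
            Slot cur{std::move(key), std::move(value), true};
            size_t idx = index(cur.key, 0);
            for (int kicks = 0; kicks < 64; ++kicks) {
                Slot& s = slots[idx];
                if (!s.used || s.key == cur.key) { s = std::move(cur); return true; }
                std::swap(cur, s);                      // evict the current occupant
                size_t other = index(cur.key, 0);
                idx = (idx == other) ? index(cur.key, 1) : other;  // try its other slot
            }
            return false;                               // chain too long: table needs resizing
        }
    };

    int main() {
        CuckooTable table(1024);
        table.insert("outpoint-a", "coin-a");
        table.insert("outpoint-b", "coin-b");
        if (auto v = table.get("outpoint-a")) std::cout << *v << "\n";
    }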
<@wumpus>
going to try nudb when I have some time
<@wumpus>
unfortunately we also do a lot of writing, at least during initial sync, every utxo read is updated and written back
<@wumpus>
so making reading much faster at the expense of writing is going to give you mixed results
<@wumpus>
has research been done on utxo access patterns? e.g. are more recent blocks more often accessed, or the other way around, or are there other regularities that could be used?
< gmaxwell>
Spending is more frequently from recently created utxos.
<@wumpus>
interesting
< gmaxwell>
I would expect naively that the expected lifetime of a utxo is how long it's lived so far. If something had made it a year without being spent, you should expect it to last another year. But beyond knowing that an unusually large number of utxos have short lives, I've not done anything to try to verify this hypothesis.
< gmaxwell>
we could probably construct fairly elaborate predictions using other features like how many txouts were in the creating transactions, reuse of the pubkey, and the amount of the coin.
< gmaxwell>
(or even using non-fungibility-- a coin is likely to be spent soon if its recent ancestors were spent soon)
< sipa>
wumpus: that's why the "fresh" optimization helps a lot... we create utxo entries inside the cache, and fully spend them before they even hit disk
< gmaxwell>
with the cache turned way up, the whole initial sync runs without writing the chainstate until the end.
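A minimal sketch of the "fresh" idea with invented names (not the real CCoinsViewCache code): coins created inside the cache are flagged fresh, and if they are fully spent before the cache is ever flushed they can simply be dropped, so they never cost a disk write.

    // Minimal sketch of a coins cache with a "fresh" flag (invented names).
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <string>

    struct CacheEntry {
        int64_t value = 0;
        bool fresh = false;   // created in this cache; the backing store has never seen it
        bool spent = false;   // must be written as a deletion on flush
    };

    class CoinsCache {
        std::map<std::string, CacheEntry> entries;   // key: "txid:n" as a string, for brevity
    public:
        void AddCoin(const std::string& outpoint, int64_t value) {
            entries[outpoint] = CacheEntry{value, /*fresh=*/true};
        }
        void SpendCoin(const std::string& outpoint) {
            auto it = entries.find(outpoint);
            if (it == entries.end()) return;   // a real cache would consult its parent here
            if (it->second.fresh) {
                entries.erase(it);             // never hit disk, so nothing to write or delete
            } else {
                it->second.spent = true;       // will become a database deletion on flush
            }
        }
        size_t Flush() {
            size_t writes = entries.size();    // each surviving entry costs a database write
            entries.clear();
            return writes;
        }
    };

    int main() {
        CoinsCache cache;
        cache.AddCoin("txa:0", 50);
        cache.SpendCoin("txa:0");              // created and fully spent before any flush
        cache.AddCoin("txb:0", 25);
        std::cout << "writes at flush: " << cache.Flush() << "\n";  // prints 1, not 2
    }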
< gmaxwell>
oh, seems nudb is a big hashtable (uses external storage for values)
< sipa>
it keeps the entire keyset in memory?
< gmaxwell>
no, the keys are in a file. sounds like it's chunked so it can independently resize sub tables.
<@wumpus>
isn't the fact that nudb is insert-only a problem? we delete and change entries a lot
<@wumpus>
gmaxwell: would be a good research project to investigate that hypothesis in detail, and see if it is possible to optimize storage based on those predictions/assumptions. Maybe one huge key/value store is not the best way to handle this
< gmaxwell>
hm. I thought it could delete keys but not the values in external storage.
< gmaxwell>
Oh I see what you mean there... I hadn't caught that implication before.. that effect is more or less why caching smaller than the utxo set in memory is still effective, but depending on the geometry of the effect it might make sense to have two databases.. so that the high access parts are in something with low log(n) costs.
< Lauda>
Comparing the partial data shows that reindex is much faster on the newer version than on the one before the reindex changes (at least with a custom dbcache).
< sipa>
yes, for a fair comparison you need to disable checkpoints
< sipa>
before the reindex changes, signatures were always checked
< Lauda>
How do I disable checkpoints?
< sipa>
after, they're only checked past the last checkpoint
< sipa>
-nocheckpoints i think
< Lauda>
So I should delete this data (version before re-index changes) and run it again with that flag?
< MarcoFalke>
no need to delete data
< Lauda>
It's still re-indexing the build from 16-05.
< sipa>
you can start over
< Lauda>
How would I add that within the .conf file?
< sipa>
checkpoints=0
< sipa>
or nocheckpoints=1
< sipa>
but please consult the help
< sipa>
(bitcoind -help)
< Lauda>
Okay thanks!
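Putting the pieces together, the checkpoint-free reindex comparison could be started with something along these lines; flag spellings as suggested above, to be verified against bitcoind -help.

    bitcoind -reindex -checkpoints=0 -dbcache=3000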
< Lauda>
sipa is it normal that the wallet shows weird/non-existing transactions (date-wise) during reindex?