< randy-waterhouse>
ok, another 'devil's advocate' tester on board; segwit tested smoothly ... thanks gmaxwell, sipa, wumpus, etc ... and many others, good job.
< Lauda>
Does the dbcache=N parameter only start using up additional memory during block processing?
< gmaxwell>
Lauda: I think what you're asking is whether it will use it right away; no, it won't -- only as there is more data read into the cache
< Lauda>
gmaxwell: all that is required is "dbcache=3000" in bitcoin.conf, right (since mine is practically empty), and I want to run another reindex overnight?
< gmaxwell>
right!
< Lauda>
Okay great. Thanks!
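For reference, a minimal bitcoin.conf along the lines discussed above might look like the following sketch; the dbcache value is just the one Lauda mentions (in MiB), and bitcoind -help is the authoritative source for the option descriptions.

    # bitcoin.conf -- illustrative settings for the reindex experiment above
    dbcache=3000        # database/UTXO cache size in MiB; only fills as data is read in
    # the reindex itself is easier to trigger once from the command line:
    #   bitcoind -reindex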
< Lauda>
The reindex issue is independent of the blockchain, i.e. it doesn't matter whether one tests on testnet or mainnet (someone asked me this)?
< midnightmagic>
i can convert my testnet node(s) to segwitty. is it time?
< jl2012>
What would happen if I upgrade my node after segwit is active and there are already segwit blocks on the chain?
< GitHub136>
[bitcoin] paveljanik opened pull request #8261: The bit field is shown only when status is "started" (master...20160625_sw_getblockchaininfo_bit) https://github.com/bitcoin/bitcoin/pull/8261
< gmaxwell>
jl2012: it reorgs back to before segwit activated.
< gmaxwell>
jl2012: then will download blocks as needed.
< jl2012>
Same for other soft forks like csv?
< jl2012>
I think it has to be
< gmaxwell>
jl2012: it arguably should be, but we haven't done that before.
< gmaxwell>
for segwit it had to go get the witness data in order to serve blocks
< gmaxwell>
we'd talked about doing it in the past for other soft forks but I think we thought it would be harder to implement than it turned out to be.
< gmaxwell>
(and unlikely to really matter that much unless the reason you were upgrading was to fight a network split, in which case the invalidateblock rpc can be used)
< jl2012>
But theoretically I may be following an invalid chain
< gmaxwell>
Potentially; though if there were a large invalid chain we would make loud announcements to recommend the use of the invalidateblock command. I think now that the code exists, we'd likely use it in the future.
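A hedged example of the invalidateblock usage gmaxwell refers to: the RPC marks a given block, and everything built on it, as invalid so the node reorgs away from that chain, and reconsiderblock undoes it; <blockhash> is a placeholder.

    bitcoin-cli invalidateblock <blockhash>      # abandon the chain containing this block
    bitcoin-cli reconsiderblock <blockhash>      # undo the above if it was a mistake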
< Lauda>
What is the main bottleneck for reindex, storage speed or CPU processing?
< gmaxwell>
probably depends on the hardware; on my laptop I think it's IO. on systems with faster IO, I think it's cpu inside leveldb code... at least with default dbcache. with dbcache cranked, it's likely cpu elsewhere in bitcoin.
< Lauda>
Hmm, seems like the last 20-30 weeks are taking forever
<@wumpus>
Lauda: unless you have a very fast disk/ssd, or increase the dbcache, i/o will be your main bottleneck
< Lauda>
I've just checked in on my node, and some bans say e.g. until June 19th but the nodes are still banned. Do these get removed after a restart, or did something go wrong?
<@wumpus>
Lauda: you can easily check though: is CPU maxed out?
< Lauda>
It isn't
< Lauda>
DBcache 3GB
<@wumpus>
(unless you have changed the number of script verification threads, bitcoind will max out your CPU cores in initial sync when it's not i/o bound)
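The script verification thread count wumpus mentions is controlled by the -par option; a sketch with an illustrative value, to be checked against bitcoind -help.

    # bitcoin.conf: use 4 script verification threads (default 0 = auto-detect cores)
    par=4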
< Lauda>
Okay so even with 3GB dbcache that's still not enough
<@wumpus>
but the utxo and block index are simply stored in a flat file, which is loaded at startup and written at shutdown
< Lauda>
That's interesting and those times are amazing in comparison to leveldb
<@wumpus>
if you can afford the memory :)
<@wumpus>
though it uses less memory than keeping everything in the dbcache, and doesn't have the issue that the cache is not seeded at startup
<@wumpus>
I think research and experimentation on how best to store the utxo set is in order
< Lauda>
The move towards SSDs should definitely help with this, but the industry is not there yet..
< Lauda>
I can afford 4 GB on this machine, but it still takes a fair amount of time.
<@wumpus>
memristor would be nice
<@wumpus>
but no matter what, at some point the unrestrained growth of the utxo set also needs to be addressed
< Lauda>
^
< sipa>
wumpus: we should try switching to a model where all utxos are stored as separate db entries, rather than in a vector of unspends per txid
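A rough C++ sketch of the layout change sipa is suggesting, using invented type names, just to make the two models concrete: the current chainstate keeps one record per txid holding all of its unspent outputs, while the proposed model keys each unspent output individually.

    // Invented types contrasting the two chainstate layouts discussed here.
    #include <cstdint>
    #include <map>
    #include <string>
    #include <utility>
    #include <vector>

    struct TxOut { int64_t value; std::vector<unsigned char> script; };

    // Per-txid model: one record per txid, holding that tx's remaining unspent outputs.
    using PerTxid = std::map<std::string, std::vector<TxOut>>;

    // Per-output model: one record per unspent output, keyed by (txid, output index).
    using Outpoint = std::pair<std::string, uint32_t>;
    using PerOutput = std::map<Outpoint, TxOut>;

    int main() {
        PerOutput utxo;
        Outpoint a0{"txid_a", 0}, a1{"txid_a", 1};
        utxo[a0] = TxOut{50, {}};
        utxo[a1] = TxOut{25, {}};
        // Spending one output is a single-key erase here; in the per-txid model the
        // whole "txid_a" record would have to be read, modified, and rewritten.
        utxo.erase(a0);
    }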
<@wumpus>
but it may well be we're running against the limits of what databases can (with good performance) handle, which means there is no room for scaling there at all
< gmaxwell>
gigantic cuckoo hash table. with a update log. :P
<@wumpus>
sipa: yes, that would be an interesting experiment too
<@wumpus>
also the access pattern is essentially random, so the only type of caching that helps very well is to keep everything
< gmaxwell>
sipa: the ripple people have claimed that leveldb performance falls off a cliff with more than some threshold number of entries (I believe they were storing every transaction in it)
< sipa>
gmaxwell: i think they don't have application level caching
<@wumpus>
well I'm not actually sure how random the access pattern is, but it looks like that from a disk perspective with the current organization
<@wumpus>
maybe optimizations are possible based on smartly sorting utxos that are expected to be accessed together? I don't know
< gmaxwell>
sipa: sure, but that wouldn't change the performance of the underlying database.
<@wumpus>
it's not like the other databases that we tried performed better
<@wumpus>
leveldb still seems, all in all, the best performing on-disk database for utxo storage
< gmaxwell>
Ripple folks created their own.
< gmaxwell>
(and also suggested we might be interested in using it)
<@wumpus>
lmdb looked promising but it has its own performance cliff
<@wumpus>
(depending on the amount of memory in the system, it seems)
< GitHub121>
bitcoin/master b0be3a0 Wladimir J. van der Laan: doc: Mention Windows XP end of support in release notes...
< GitHub121>
bitcoin/master 63fbdbc Wladimir J. van der Laan: Merge #8240: doc: Mention Windows XP end of support in release notes...
< GitHub49>
[bitcoin] laanwj closed pull request #8240: doc: Mention Windows XP end of support in release notes (master...2016_06_windows_xp) https://github.com/bitcoin/bitcoin/pull/8240
< gmaxwell>
as every read would simply be one or two random disk accesses... and it's hard to do better than that. it's just that writing is awful. (e.g. you end up with read-write-write to update a log, with both sequential reads and writes, and if the table needs to be resized, woe is you).
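As a toy illustration of the read-side point, here is a minimal in-memory two-choice cuckoo table sketch; it is not gmaxwell's actual proposal (no update log, no external value storage, no resizing), but it shows why a lookup is at most two probes while inserts may have to displace other entries.

    // Toy two-choice cuckoo hash table: every lookup is at most two probes.
    #include <cstdint>
    #include <functional>
    #include <iostream>
    #include <optional>
    #include <string>
    #include <utility>
    #include <vector>

    class CuckooTable {
        struct Slot { std::string key; std::string value; bool used = false; };
        std::vector<Slot> slots;

        // Derive two candidate indices per key; crude mixing is fine for a sketch.
        size_t index(const std::string& key, int which) const {
            size_t h = std::hash<std::string>{}(key);
            if (which) h = h * 0x9e3779b97f4a7c15ULL + 0x2545F4914F6CDD1DULL;
            return h % slots.size();
        }

    public:
        explicit CuckooTable(size_t n) : slots(n) {}

        // At most two random accesses, regardless of table size.
        std::optional<std::string> get(const std::string& key) const {
            for (int i = 0; i < 2; ++i) {
                const Slot& s = slots[index(key, i)];
                if (s.used && s.key == key) return s.value;
            }
            return std::nullopt;
        }

        // Inserts may displace existing entries along a chain; this is where the
        // write cost (and the resizing "woe") lives.
        bool insert(std::string key, std::string value) {
            Slot cur{std::move(key), std::move(value), true};
            size_t idx = index(cur.key, 0);
            for (int kicks = 0; kicks < 64; ++kicks) {
                Slot& s = slots[idx];
                if (!s.used || s.key == cur.key) { s = std::move(cur); return true; }
                std::swap(cur, s);                      // evict the current occupant
                size_t other = index(cur.key, 0);
                idx = (idx == other) ? index(cur.key, 1) : other;  // try its other slot
            }
            return false;                               // chain too long: table needs resizing
        }
    };

    int main() {
        CuckooTable table(1024);
        table.insert("outpoint-a", "coin-a");
        table.insert("outpoint-b", "coin-b");
        if (auto v = table.get("outpoint-a")) std::cout << *v << "\n";
    }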
<@wumpus>
going to try nudb when I have some time
<@wumpus>
unfortunately we also do a lot of writing, at least during initial sync, every utxo read is updated and written back
<@wumpus>
so making reading much faster at the expense of writing is going to give you mixed results
<@wumpus>
has research been done on utxo access patterns? e.g. are more recent blocks more often accessed, or the other way around, or are there other regularities that could be used?
< gmaxwell>
Spending is more frequently from recently created utxos.
<@wumpus>
interesting
< gmaxwell>
I would expect naively that the expected lifetime of a utxo is how long it's lived so far. If something had made it a year without being spent, you should expect it to last another year. But beyond knowing that an unusually large number of utxos have short lives, I've not done anything to try to verify this hypothesis.
< gmaxwell>
we could probably construct fairly elaborate predictions using other features like how many txouts were in the creating transactions, reuse of the pubkey, and the amount of the coin.
< gmaxwell>
(or even using non-fungibility-- a coin is likely to be spent soon if its recent ancestors were spent soon)
< sipa>
wumpus: that's why the "fresh" optimization helps a lot... we create utxo entries inside the cache, and fully spend them before they even hit disk
< gmaxwell>
with the cache turned way up, the whole initial sync runs without writing the chainstate until the end.
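A minimal sketch of the "fresh" idea with invented names (not the real CCoinsViewCache code): coins created inside the cache are flagged fresh, and if they are fully spent before the cache is ever flushed they can simply be dropped, so they never cost a disk write.

    // Minimal sketch of a coins cache with a "fresh" flag (invented names).
    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <string>

    struct CacheEntry {
        int64_t value = 0;
        bool fresh = false;   // created in this cache; the backing store has never seen it
        bool spent = false;   // must be written as a deletion on flush
    };

    class CoinsCache {
        std::map<std::string, CacheEntry> entries;   // key: "txid:n" as a string, for brevity
    public:
        void AddCoin(const std::string& outpoint, int64_t value) {
            entries[outpoint] = CacheEntry{value, /*fresh=*/true};
        }
        void SpendCoin(const std::string& outpoint) {
            auto it = entries.find(outpoint);
            if (it == entries.end()) return;   // a real cache would consult its parent here
            if (it->second.fresh) {
                entries.erase(it);             // never hit disk, so nothing to write or delete
            } else {
                it->second.spent = true;       // will become a database deletion on flush
            }
        }
        size_t Flush() {
            size_t writes = entries.size();    // each surviving entry costs a database write
            entries.clear();
            return writes;
        }
    };

    int main() {
        CoinsCache cache;
        cache.AddCoin("txa:0", 50);
        cache.SpendCoin("txa:0");              // created and fully spent before any flush
        cache.AddCoin("txb:0", 25);
        std::cout << "writes at flush: " << cache.Flush() << "\n";  // prints 1, not 2
    }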
< gmaxwell>
oh, seems nudb is a big hashtable (uses external storage for values)
< sipa>
it keeps the entire keyset in memory?
< gmaxwell>
no, the keys are in a file. sounds like it's chunked so it can independently resize sub tables.
<@wumpus>
isn't the fact that nudb is insert-only a problem? we delete and change entries a lot
<@wumpus>
gmaxwell: would be a good research project to investigate that hypothesis in detail, and see if it is possible to optimize storage based on those predictions/assumptions. Maybe one huge key/value store is not the best way to handle this
< gmaxwell>
hm. I thought it could delete keys but not the values in external storage.
< gmaxwell>
Oh I see what you mean there... I hadn't caught that implication before.. that effect is more or less why caching smaller than the utxo set in memory is still effective, but depending on the geometry of the effect it might make sense to have two databases.. so that the high access parts are in something with low log(n) costs.
< Lauda>
Comparing the partial data shows that reindex is much faster on the newer version than on the one before the reindex changes (at least with a custom dbcache).
< sipa>
yes, for a fair comparison you need to disable checkpoints
< sipa>
before the reindex changes, signatures were always checked
< Lauda>
How do I disable checkpoints?
< sipa>
after, they're only checked past the last checkpoint
< sipa>
-nocheckpoints i think
< Lauda>
So I should delete this data (version before re-index changes) and run it again with that flag?
< MarcoFalke>
no need to delete data
< Lauda>
It's still re-indexing the build from 16-05.
< sipa>
you can start over
< Lauda>
How would I add that within the .conf file?
< sipa>
checkpoints=0
< sipa>
or nocheckpoints=1
< sipa>
but please consult the help
< sipa>
(bitcoind -help)
< Lauda>
Okay thanks!
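Putting the pieces together, the checkpoint-free reindex comparison could be started with something along these lines; flag spellings as suggested above, to be verified against bitcoind -help.

    bitcoind -reindex -checkpoints=0 -dbcache=3000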
< Lauda>
sipa is it normal that the wallet shows weird/non-existing transactions (date-wise) during reindex?