< gmaxwell>
sipa: ah that might have been why I didn't bother trying to make it interpolate-- the seeks are in an index.
< jamesob>
sipa gmaxwell: mem usage increase is for sure the leveldb change (https://i.imgur.com/yMHJI37.png) and sipa's change didn't change the elevated usage
< sipa>
jamesob: so it's still elevated with my patch?
< sipa>
or it's not elevated anymore?
< jamesob>
still elevated :(
< gmaxwell>
might want to make a microbenchmark to figure out if it's mmap behavior or something else in leveldb.
< gmaxwell>
also because it'll be easier to experiment (e.g. see if MAP_NORESERVE does anything) when we don't have to wait 5 hours.
< gmaxwell>
just a loop to mmap all the files in a directory then sleep 30 seconds should test the earlier hypothesis.
< jamesob>
gmaxwell: yup, agree
< jamesob>
will write if I have time tomorrow
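A minimal sketch of the microbenchmark gmaxwell describes (a loop that mmaps every file in a directory, then sleeps so RES/VIRT can be inspected with top or pmap). This is POSIX-only illustrative code, not anything from Bitcoin Core or leveldb; the directory path is a placeholder argument.

    // mmap_bench.cc: map every regular file in a directory read-only, then sleep
    // so resident/virtual memory can be inspected with top, ps, or pmap.
    #include <dirent.h>
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>
    #include <string>
    #include <utility>
    #include <vector>

    int main(int argc, char** argv) {
        if (argc < 2) { std::fprintf(stderr, "usage: %s <dir>\n", argv[0]); return 1; }
        std::string dir = argv[1];
        std::vector<std::pair<void*, size_t>> maps;
        DIR* d = opendir(dir.c_str());
        if (!d) { perror("opendir"); return 1; }
        while (dirent* ent = readdir(d)) {
            std::string path = dir + "/" + ent->d_name;
            struct stat st;
            if (stat(path.c_str(), &st) != 0 || !S_ISREG(st.st_mode) || st.st_size == 0) continue;
            int fd = open(path.c_str(), O_RDONLY);
            if (fd < 0) continue;
            void* p = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
            close(fd);  // the mapping survives the close, as with leveldb's mmapped tables
            if (p != MAP_FAILED) maps.emplace_back(p, st.st_size);
        }
        closedir(d);
        std::printf("mapped %zu files; sleeping 30s, inspect RES with top/pmap\n", maps.size());
        sleep(30);
        for (auto& m : maps) munmap(m.first, m.second);
        return 0;
    }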
< ossifrage>
It seems like MADV_RANDOM is useful if there is less locality to the reads than the size of the prefetch window, but I was under the impression that the prefetch is triggered by the page fault and not by the mmap(), so you actually have to do reads to get the extra pages
< sipa>
luke-jr: mmap'ed files aren't kept open after creating the mapping
< sipa>
so they don't count towards the 1000 files limit iirc
< luke-jr>
sipa: but we increased the file limit to 1000 *because* mmaps didn't count toward the fd limit
< luke-jr>
the limit I'm talking about is max_open_files, not ulimit
< sipa>
luke-jr: yes, but there is a separate mmap limit it seems
< luke-jr>
sipa: max_open_files effectively limits the number of mmaps too
< luke-jr>
so it seems like the LevelDB mmap limit change is effectively a no-op since we keep the 1000 max_open_files limit :x
< gmaxwell>
luke-jr: it's very much not a no op change.
< luke-jr>
gmaxwell: how does it have an effect?
< gmaxwell>
it causes more mmaps to be open, and fewer files. You can see this easily with lsof.
< luke-jr>
the mmap limit is enforced by NewRandomAccessFile, which is only ever called by TableCache which enforces the max_open_files limit :/
< gmaxwell>
Well I'm not telling you what the source code says, I'm telling you what it does. :P
< luke-jr>
is there a bug in the LRU cache stuff? :/
< luke-jr>
gmaxwell: is there an easy way to reproduce what you observe?
< gmaxwell>
yeah, start a node up with no connections, with and without the change, lsof it, and look at the number of mmapped files. You might need to crank -checkblocks up a bit.
< wumpus>
luke-jr: the only reason that the change to num files was acceptable is because leveldb, at least the version in our tree, doesn't actually keep more files open (on platforms with mmap) see also https://github.com/bitcoin/bitcoin/pull/12495#issuecomment-377228329
< luke-jr>
gmaxwell: cannot reproduce with -checkblocks=10000 :/
< luke-jr>
wumpus: I understand that. I just don't understand why we'd increase the mmap-specific limit without also increasing the num-files limit that any mmap requires
< luke-jr>
we will never have more than 1001 mmaps because the files limit is 1000
< wumpus>
--I don't think mappings into closed files count against the open files ulimit
< sipa>
wumpus: luke's point is that internally in leveldb, the mmapped files count towards the open file limit
< wumpus>
but we increased leveldb's limit
< wumpus>
though to be honest I'm completely confused now with all the different limits, how they interact, and how they're supposed to interact
< sipa>
wumpus: yeah...
< wumpus>
given all the confusion about this, I'm starting to think it might be better to revert these changes, I like understanding things even if it means not getting every last possible % of performance
< sipa>
so there is a configurable leveldb max open files limit (which is 1000), and a mmap limit (which is hardcoded to 1000, but we raised it to 4096)
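For context, a hedged sketch of where the two knobs live: the open-files limit is an ordinary leveldb::Options field set by the embedding application, while the mmap cap is a compile-time constant inside leveldb's util/env_posix.cc (its exact name depends on the leveldb version). This is illustrative only, not Bitcoin Core's actual dbwrapper code.

    // Sketch only: how an embedding application (e.g. Bitcoin Core's dbwrapper)
    // sets the configurable limit. The mmap cap is not exposed through Options;
    // it is a constant inside leveldb's util/env_posix.cc, which the change under
    // discussion raised from 1000 to 4096.
    #include <string>
    #include <leveldb/db.h>
    #include <leveldb/options.h>

    leveldb::DB* OpenDb(const std::string& path) {
        leveldb::Options options;
        options.create_if_missing = true;
        // Table-cache limit; per the discussion, mmapped tables are claimed to
        // count toward this same limit.
        options.max_open_files = 1000;
        leveldb::DB* db = nullptr;
        leveldb::Status status = leveldb::DB::Open(options, path, &db);
        return status.ok() ? db : nullptr;
    }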
< wumpus>
and seemingly it now relies on all kinds of undocumented behavior
< sipa>
luke says (i haven't verified) that mmapped files count towards that limit... which is inconsistent with our observations
< sipa>
i'm just clarifying my understanding as there seems to be some confusion
< wumpus>
thanks
< gmaxwell>
right if we revert though we should revert both changes.
< wumpus>
yes
< gmaxwell>
I wouldn't oppose doing that for 0.17.
< sipa>
both? what other change?
< sipa>
i saw the mmap increase
< gmaxwell>
sipa: the fd count change and the mmap limit change.
< gmaxwell>
first the FD count was increased with the argument that it basically wouldn't use more FDs, due to using mmap. But then we found out that when it hit the mmap limit, it actually started using more FDs.
< gmaxwell>
Since the lack of mmaps is probably also bad for performance (though I don't think anyone benchmarked specifically for that), it seemed like it made sense to increase mmaps rather than revert the fd count change.
< wumpus>
yes
< wumpus>
"By default LevelDB will only mmap() up to 1000 ldb files for reading and then fall back
< wumpus>
to using file descriptors."
< wumpus>
that pull 19 is *supposed* to change that
< wumpus>
(from 1000 to 4096, which is still an arbitrary value) -- maybe luke-jr is using a system leveldb?
< wumpus>
I don't know anymore, sorry, should probably get coffee
< gmaxwell>
wumpus: 12495, as you say.
< sipa>
ah yes
< sipa>
hmm
< sipa>
we were using 64 files only before
< sipa>
but using 1000 there seems pretty standard practice; it's leveldb's own default
< luke-jr>
wumpus: not for this testing
< wumpus>
sipa: correct
< gmaxwell>
sipa: after increasing it, we were using a lot more file descriptors.
< luke-jr>
wumpus: but in any case, for testing this, I'm changing the 4096 back to 1000, and can't get LDB to use a fd
< gmaxwell>
luke-jr: are you using some system leveldb?
< luke-jr>
gmaxwell: like I said, not for this; but in any case, it would be the same 1000 mmaps
< gmaxwell>
luke-jr: dunno why it didn't reproduce for you, I saw leveldb using FDs, after using 1000 maps... and not anymore after increasing the map limit.
< luke-jr>
hmm
< gmaxwell>
sipa also saw it using fd's on his node.
< luke-jr>
testnet or mainnet?
< gmaxwell>
mainnet
< sipa>
mainnet
< luke-jr>
I wonder if that's why. :x
< gmaxwell>
testnet probably doesn't have enough ldb files.
< luke-jr>
ah
< sipa>
how large is testnet's utxo set?
< wumpus>
"disk_size": 1025042445, as of height 1384188 (it is at 1392776 now, don't know how much it changed since then)
< luke-jr>
hopefully I won't "damage" my mainnet .bitcoin trying this.. :x
< luke-jr>
(I still run 0.13 wallets from time to time)
< luke-jr>
looks like the UTXO db upgrade isn't triggering it, at least
< * luke-jr>
facepalms. After UTXO upgrade, it says I need to reindex >_<
< wumpus>
does it say why / give an error message? the whole point of the utxo upgrade would be to not reindex, unless it fails
< wumpus>
provoostenator: nit: you could pass NO_UPNP to the depends build instead of --with-miniupnpc=no
< wumpus>
not that libminiupnp takes long to compile
< provoostenator>
wumpus: thanks, in this case it's probably easier to keep the configure line flexible.
< provoostenator>
I first tried copying bitcoind over and just running it locally but that resulted in a world of pain, probably because the two systems aren't perfectly identical.
< wumpus>
yes copying executables between linux systems is fraught with problems
< provoostenator>
It works fine with c-lightning though.
< wumpus>
though it should work with depends and the glibc compat enabled
< provoostenator>
I tried with depends as well, haven't tried glibc compat, might try that next time. But this also prevents me from accidentally syncing the chain over 4G.
< wumpus>
-static-libstdc++ helps as well
< wumpus>
(but only if all c++ dependencies are static, which is the case with a depends build but not otherwise)
< wumpus>
then you end up with an executable that is as-good-as-static and it should be portable to non-ancient linux versions with the same CPU architecture
< wumpus>
of course, that's the theory, there are still some things that can interfere for example if glibc is built with different options (I vaguely remember redhat and some security/hardening flag), or the target platform has a different libc
< wumpus>
this is one reason they came up with "snap"s which are executables that essentially carry around part of the system dependencies with them in a container
< wumpus>
(along with some security/isolation features)
< provoostenator>
Being able to run locally is especially useful for Qt, and with the multiprocess stuff it would be nice to run bitcoind remotely and the Qt GUI locally.
< wumpus>
anyhow, if your two systems have the same release of the same distro, and are at the same update level, it should work too; that's probably easiest for most...
< luke-jr>
wumpus: I reflink'd the blocks/ and chainstate/ dirs to a new datadir. I'm guessing that simply wasn't sufficient somehow
< luke-jr>
oh well, I'll let it sync overnight and see if it hits the breakpoint
< luke-jr>
env_posix.cc:379 if anyone else who can reproduce it wants to try
< luke-jr>
(if you hit it, maybe see if you can work out why it got there..)
< luke-jr>
wumpus: heh, I was tired last night; the reason it needed a reindex (or I guess re-IBD) seems to be that I named the blocks directory "newblocks" XD
< luke-jr>
err, "tmpblocks"
< Chris_Stewart_5>
BIP157 doesn't currently have an open PR does it?
< jimpo>
Oh, thanks, will open a PR to fix that. Need to add a test block anyway.
< Chris_Stewart_5>
Thanks!
< luke-jr>
gmaxwell: sipa: wumpus: sync'd up to 341411 and still no non-mmap opens from LDB..
< sipa>
luke-jr: ?
< sipa>
what does that mean
< gmaxwell>
341411 is a long time ago.
< gmaxwell>
sipa: he's trying to reproduce the behavior ossifrage, you, and I saw where leveldb would hit the mmap limit and start using more FDs.
< sipa>
you'll only exceed 1000 files/mmaps once you're above ~2GB UTXO set size
< sipa>
as each file is 2MB ish
< luke-jr>
hmm
< ossifrage>
pmap 24199 | grep \.ldb | wc -l == 1894 (for a node up ~26 days)
< sipa>
luke-jr: so you'll need to get to somewhere in 2017 at least
< * luke-jr>
retries the UTXO upgrade, this time with the blocks dir named correctly
< ossifrage>
That includes the txindex and the chainstate
< sipa>
ossifrage: per database; txindex and chainstate are separate
< gmaxwell>
has anyone just tried making a dummy program that mmaps a couple thousand files to see what the RES shows up as?
< ossifrage>
is having a larger resident size really an issue? you only get those pages if memory is available; the OS should evict them right quick under memory pressure
< sipa>
presumably, indeed
< sipa>
it may be just a methodology problem; what RSS measures doesn't exactly correspond to what we care about
< gmaxwell>
ossifrage: if that's actually the case, then things are fine.
< gmaxwell>
and if so we probably need to figure out how to start tracking "total dirty pages" or something.
< ossifrage>
leveldb only uses mmap for reading, so the dirty pages from all the maps should be 0
< gmaxwell>
Right.
< luke-jr>
personally, I only look at ps's "size" metric
< luke-jr>
size SIZE approximate amount of swap space that would be required if the process were to dirty all
< luke-jr>
writable pages and then be swapped out. This number is very rough!
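A rough sketch of what tracking "total dirty pages" could look like: sum the Shared_Dirty and Private_Dirty counters from /proc/<pid>/smaps, which is the per-mapping data that pmap -x summarizes. The helper name below is made up; the field names are as documented in proc(5), and newer kernels also offer /proc/<pid>/smaps_rollup.

    // dirty_kb.cc: sum the dirty-page counters for a process from /proc/<pid>/smaps.
    #include <fstream>
    #include <iostream>
    #include <sstream>
    #include <string>

    long TotalDirtyKb(const std::string& pid) {
        std::ifstream smaps("/proc/" + pid + "/smaps");
        std::string line;
        long total_kb = 0;
        while (std::getline(smaps, line)) {
            if (line.rfind("Private_Dirty:", 0) == 0 || line.rfind("Shared_Dirty:", 0) == 0) {
                std::istringstream iss(line);
                std::string key, unit;
                long kb = 0;
                iss >> key >> kb >> unit;  // e.g. "Private_Dirty:  12 kB"
                total_kb += kb;
            }
        }
        return total_kb;
    }

    int main(int argc, char** argv) {
        std::cout << TotalDirtyKb(argc > 1 ? argv[1] : "self") << " kB dirty\n";
        return 0;
    }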
< gmaxwell>
ossifrage: so to catch you up. We have infrastructure that benchmarks the software regularly and charts performance... so we find out if some commit accidentally hurts performance.
< gmaxwell>
ossifrage: we noticed around the time of the 0.17 feature freeze that the claimed memory usage suddenly went up 20%!
< gmaxwell>
during the week where a lot of last minute stuff got merged.
< gmaxwell>
Bisection turned up your change to increase the leveldb maps count.
< ossifrage>
gmaxwell, where does the claimed memory measurement come from?
< gmaxwell>
That's what we're trying to determine. After finding out that mmap automatically pages in some of the file, we tried madvising DONTNEED on all the maps, but that didn't seem to fix it.
< gmaxwell>
It might well be that the increased usage is totally fictional... that it's just counting paged-in non-dirty pages.... and they're not real memory pressure and we don't need to worry about them.
< ossifrage>
I'm writing a mmap() test case out of morbid curiosity
< gmaxwell>
Or maybe, though we haven't spotted it yet, leveldb is using 200kb of god knows what per open map. :P
< gmaxwell>
Meanwhile, luke-jr went and read more of the leveldb code and believes that it can't have more mmaps open than the FD limit, and so he thinks the mmap increase should be a noop.
< gmaxwell>
(which it clearly isn't, but it would be good if everyone was on the same page about how leveldb works)
< ossifrage>
I currently have 992 chainstate ldb files mapped and 859 txindex and 43 blocks/indexes ldb files, but it took a fairly long uptime for the original problem to show up
< ossifrage>
I don't see much of a change between the htop reported RES size between mapping 1 txindex ldb file and mmaping all 2041 of them
< luke-jr>
gmaxwell: well, I can reproduce it now, but I haven't figured out why yet
< ossifrage>
(but I never read from the map)
< gmaxwell>
ossifrage: k? try reading like, the last byte of them or something?
< ossifrage>
The VIRT line goes up as expected but RES stays fairly small
< luke-jr>
hmm, is everyone reproducing this using txindex?
< ossifrage>
gmaxwell, yeah that caused the RES size to approach the VIRT size (read the last byte)
< sipa>
gmaxwell: "it can't have more mmaps open than the FD limit" -> s/FD limit/max_open_files setting/
< gmaxwell>
now, try MADV_RANDOM ?
< gmaxwell>
ossifrage: ^
< ossifrage>
I read the last byte, so I'm not sure what it would prefetch :-)
< gmaxwell>
I'm guessing that it's prefetching the whole thing.
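A sketch of the experiment under discussion, assuming the hypothesis that readahead on the first fault is what inflates RES: map one file, optionally madvise(MADV_RANDOM), fault in a single byte, then use mincore() to count how many of the file's pages actually became resident. Argument handling is a placeholder; run after dropping the page cache or prior caching will hide the effect.

    // touch_bench.cc: measure how many pages one read faults in, with or
    // without MADV_RANDOM. Best run after `echo 3 > /proc/sys/vm/drop_caches`.
    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <cstdio>
    #include <vector>

    int main(int argc, char** argv) {
        if (argc < 2) { std::fprintf(stderr, "usage: %s <file> [random]\n", argv[0]); return 1; }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0) { perror("open/fstat"); return 1; }
        char* p = static_cast<char*>(mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0));
        close(fd);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }
        if (argc > 2) madvise(p, st.st_size, MADV_RANDOM);  // hint: disable readahead
        volatile char c = p[st.st_size - 1];                // fault in the last byte
        (void)c;
        long page = sysconf(_SC_PAGESIZE);
        size_t npages = (st.st_size + page - 1) / page;
        std::vector<unsigned char> vec(npages);
        mincore(p, st.st_size, vec.data());
        size_t resident = 0;
        for (unsigned char v : vec) resident += v & 1;
        std::printf("%zu of %zu pages resident after one read\n", resident, npages);
        munmap(p, st.st_size);
        return 0;
    }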
< gmaxwell>
andytoshi: HI!
< andytoshi>
gmaxwell: hiya, sorry, my server shit out this morning .. i have no idea why, the logs show everything clear, but it was apparently completely locked up even to a local terminal user
< andytoshi>
so i called my dad to hard-boot it and it seems ok now
< ossifrage>
MADV_RANDOM didn't seem to change the bump in RES caused by reading last byte
< gmaxwell>
ossifrage: sadness. ... so run pmap on the process, and see how many dirty pages it says you have, presumably like none.
< sipa>
gmaxwell: mmap is only used in leveldb for read-only files; none should ever be dirty
< gmaxwell>
Interesting, there are only 41.3 BTC stored in bare multisig outputs, but there are 410k of them in total.
< gmaxwell>
sipa: I know but I also would have expected that increasing the mmap limit wouldn't have increased RES. :)
< ossifrage>
One sec, I might have screwed up...
< gmaxwell>
I'm going to suggest we start charting the total dirty pages from pmap... and not worry about RES...
< luke-jr>
it looks like the mmap limit is global
< ossifrage>
Okay, it looks like MADV_RANDOM isn't helping, but just reading the last byte does cause the RES size to jump quite a bit (but I'm not sure how much I expect it to go up)
< ossifrage>
at least 1 page per file?
< sipa>
yes, 1 page per file is expected
< gmaxwell>
I expect that 1 page per file is what you should get, which is a modest amount of memory.
< gmaxwell>
in bitcoin we're seeing an increase of hundreds of meg.
< gmaxwell>
way more than one page per file.
< ossifrage>
When reading the first byte, MADV_RANDOM does decrease the RES usage
< ossifrage>
Even with MADV_RANDOM, it would depend entirely on how many pages are faulted in by ldb
< ossifrage>
Err, it looks like for me it is prefetching ~64K
< gmaxwell>
64K with MADV_RANDOM?
< ossifrage>
I think I might be suffering from previous state problems
< ossifrage>
Okay, dropping the cache and MADV_RANDOM does seem to only read 4K for most of the files mapped (some have more for some odd reason)
< gmaxwell>
oh interesting, so if you cause the files to be paged in, then start a program with MADV_RANDOM, it will reflect higher RES because you mmapped files that were already in cache?
< ossifrage>
Yeah, without MADV_RANDOM all the files have 64K RES when I read the 1st byte
< ossifrage>
echo 3 > /proc/sys/vm/drop_caches between runs
< ossifrage>
Yeah I was getting confusing results until I realized that
< gmaxwell>
Cool so also we might not have seen the effects of MADV_RANDOM when testing with actual leveldb due to that.
< ossifrage>
Depends heavily on the specific hardware yeah
< ossifrage>
But drop_caches is a mighty big hammer
< gmaxwell>
sure, just in terms of "huh why did MADV_RANDOM seem to do nothing"... now we know.
< ossifrage>
I suspect the minimum amount read without MADV_RANDOM depends on things like the filesystem and the underlying block device
< ossifrage>
Like reading 1 page with a SSD is no big deal vs 1 page with a RAID5 md array
< ossifrage>
Any ideas how much leveldb actually reads at a time?
< ossifrage>
It would have to touch at least some of the index and then the actual data it wants to read
< gmaxwell>
it does a bisection search in the index and then I think it only reads the record of interest.
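A simplified illustration of that lookup shape, not leveldb's actual data structures: binary-search a sorted in-memory index of block handles, then read (or fault in via the mmap) only the one data block the matching handle points at.

    // Simplified analogue of an sstable lookup: find the single block that can
    // contain the key via binary search over the index, then touch only it.
    #include <algorithm>
    #include <cstdint>
    #include <string>
    #include <vector>

    struct IndexEntry {
        std::string last_key;  // largest key stored in the block
        uint64_t offset;       // block offset within the .ldb file
        uint64_t size;         // block size in bytes
    };

    // Returns the handle of the only block that can contain `key`, or nullptr.
    const IndexEntry* FindBlock(const std::vector<IndexEntry>& index, const std::string& key) {
        auto it = std::lower_bound(index.begin(), index.end(), key,
            [](const IndexEntry& e, const std::string& k) { return e.last_key < k; });
        return it == index.end() ? nullptr : &*it;
    }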