#bitcoin-core-dev on 2018-08-02 — searchable irc log

02:40 < phantomcircuit> gmaxwell, https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

02:40 < phantomcircuit> apparently the answer is dont use that

02:44 < sipa> Great, WSAPoll doesn't report socket failures

02:53 < luke-jr> right

03:02 < gmaxwell> again: we can stay with select on windows. It doesn't have the 1024 FD problem.

03:17 < ossifrage> The fix for my specific problem was to just modify how many mmaps() leveldb will make

03:18 < gmaxwell> ossifrage: do you have any idea why the number of mmaps would be limited at all, on 64 bit systems?

03:19 < ossifrage> The comment is about "performance reasons" for large databases... But 1000 mmaps is in the noise I think

03:19 < ossifrage> I changed src/leveldb/util/env_posix.cc mmap_limit from 1000 to 4096

03:21 < gmaxwell> I don't understand what they mean there.. are they thinking in terms of TLB load or something?

03:21 < ossifrage> mmap() is a great way to generate a very large amount of write pressure, but it seems like most of the leveldb use in bitcoin has a very low change rate

03:23 < ossifrage> But lots of memory and slow IO will do that just fine without a single mmap()

03:25 < gmaxwell> leveldb's writes are very structured, basically it's an append only thing, that periodically rewrites whole files.

03:38 < sipa> mmap is only used for readonly things

03:38 < sipa> afaik

03:38 < sipa> the files are produced in one go, by dumping a sorted table to disk

03:41 < gmaxwell> I wish the leveldb project were more active, it would be nice if we could ask if there is a reason we shouldn't just kill the limit on 64-bit.

04:05 < ossifrage> sipa, that is a good idea, especially if the writes are streaming

04:26 < phantomcircuit> sipa, gmaxwell: mmap should be used only for reads

04:27 < phantomcircuit> in which case increasing the limit to... infinity shouldn't be an issue on 64bit systems

04:31 < gmaxwell> doesn't replace using poll instead of select.

04:31 < gmaxwell> phantomcircuit: hows the PR coming? :P

04:32 < phantomcircuit> well i had one that looked like it worked but then fucking wsapoll is broken

04:32 < phantomcircuit> soooo

04:32 < phantomcircuit> try again

04:32 < sipa> phantomcircuit: don't do wsapoll

04:32 < sipa> just poll on sane OSes

04:32 < sipa> keep using select on windows

04:34 < gmaxwell> the main reason to not use select is the stupid fd value limit, but that doesn't apply for windows.

04:34 < phantomcircuit> i mean yeah but that's actually a bigger change

04:35 < gmaxwell> it's just "keep the existing code, add the poll in an ifdef", no

04:39 < phantomcircuit> gmaxwell, it's slightly different, there's no FD_ISSET

04:40 < phantomcircuit> you iterate through a list of fd, events pairs

04:40 < phantomcircuit> the select logic iterates over all the nodes and calls FD_ISSET

04:40 < phantomcircuit> (which iirc is insane cause FD_ISSET iterates over all the fds)

04:44 < sipa> phantomcircuit: in windows it does

04:44 < sipa> in linux it's a bitfield test

04:45 < phantomcircuit> sipa, oh

04:46 < phantomcircuit> well either way there isn't a trivial way to do that mapping with poll()

04:50 < sipa> phantomcircuit: which is btw the reason why select is restricted to fds below 1024

04:50 < sipa> fdset id a 128 byte array

04:50 < sipa> *is

04:52 < phantomcircuit> yeah makes sense
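
The 128-byte figure sipa mentions follows from one bit per descriptor: 1024 bits / 8 = 128 bytes. A minimal Python model of that bitfield is sketched below. The `FdSet` class is hypothetical and only illustrates the layout; the real `fd_set` is a C structure, and in C setting a descriptor at or above `FD_SETSIZE` is undefined behaviour rather than a clean error.

```python
FD_SETSIZE = 1024  # value discussed in the log for Linux; treated as an assumption here


class FdSet:
    """Toy model of select()'s fd_set: one bit per file descriptor number."""

    def __init__(self):
        # 1024 bits packed into bytes gives the 128-byte array from the discussion.
        self.bits = bytearray(FD_SETSIZE // 8)

    def set(self, fd):
        # The model raises instead of corrupting memory, unlike the C macro.
        if not 0 <= fd < FD_SETSIZE:
            raise ValueError(f"fd {fd} does not fit in a {FD_SETSIZE}-bit set")
        self.bits[fd // 8] |= 1 << (fd % 8)

    def isset(self, fd):
        # A single bit test: constant time, no iteration over other descriptors.
        if not 0 <= fd < FD_SETSIZE:
            return False
        return bool(self.bits[fd // 8] & (1 << (fd % 8)))
```

The limit is on the descriptor's numeric value, not on how many descriptors are watched, which is why a process with many open files can hit it even with few sockets.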

04:54 < phantomcircuit> i'll add the iteration needs to be reversed for epoll or kqueue also anyways

05:07 < fanquake> cfields Looking forward to the turtles!

05:09 < cfields> fanquake: heh, I just pushed it so that dongcarl can get his hands dirty. It's still an absolute disaster.

05:13 < phantomcircuit> cfields, is there a map from fd to CNode ?

05:14 < cfields> phantomcircuit: don't believe so. IIRC we always just iterate.

05:15 < phantomcircuit> how does that work with libevent stuff? iirc it's just calling a callback with the fd right?

05:16 < cfields> phantomcircuit: the libevent stuff hasn't been merged. You mean in my branches?

05:16 < phantomcircuit> yeah

05:16 < cfields> anyway, yea, callback with fd and a few other things, and a caller-supplied pointer

05:17 < phantomcircuit> oh i see the caller supplied pointer

05:17 < phantomcircuit> right so libevent is basically keeping that map for you

05:18 < cfields> well everything's done in reverse, so there shouldn't be any need to ever lookup an fd

05:18 < cfields> so, i suppose :)

05:21 < phantomcircuit> cfields, well the underlying epoll thing requires you can map fd to cnode

05:21 < phantomcircuit> just with libevent it's doing it for you implicitly with the callback data

05:21 < phantomcircuit> for poll() you need the same but it's explicit

05:23 < cfields> I thought you had to iterate through the fd list anyway with poll similar to select. Am I completely misremembering?

05:25 < cfields> after it wakes for active fds, I mean.

05:30 < phantomcircuit> cfields, you iterate through the list of fd's you gave it

05:30 < phantomcircuit> yes

05:31 < cfields> phantomcircuit: right, so why the need for a map? You've got the pointers to the CNodes that you pulled the fds from, and you need to test them anyway

05:33 < cfields> or are you just trying to eliminate the overhead of the iteration of nodes that didn't wake?

05:33 < cfields> (I don't remember if poll gives you anything to help avoid that)

05:35 < phantomcircuit> cfields, i mean i can make the map right there, but it's awkward
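
The explicit fd-to-node mapping phantomcircuit describes can be sketched with Python's `select.poll` (Unix only). This is an illustration, not Bitcoin Core code: `wait_for_readable`, `demo`, and the string "nodes" are hypothetical stand-ins for CNode handling. Note one difference from C: Python's `poll()` returns only the ready `(fd, events)` pairs, whereas C's `poll(2)` fills in `revents` on the array you passed, so there you still walk the whole list.

```python
import select
import socket


def wait_for_readable(nodes_by_fd, timeout_ms):
    """Return the nodes whose sockets are readable, using an explicit fd -> node map."""
    poller = select.poll()
    for fd in nodes_by_fd:
        poller.register(fd, select.POLLIN)
    ready = []
    for fd, events in poller.poll(timeout_ms):
        if events & select.POLLIN:
            # The map is what turns a bare descriptor back into a node.
            ready.append(nodes_by_fd[fd])
    return ready


def demo():
    """Two socket pairs stand in for two peers; only the first has data pending."""
    a, b = socket.socketpair()
    c, d = socket.socketpair()
    try:
        nodes = {b.fileno(): "peer-b", d.fileno(): "peer-d"}
        a.sendall(b"x")
        return wait_for_readable(nodes, 1000)
    finally:
        for s in (a, b, c, d):
            s.close()
```

With a callback API such as libevent's, the caller-supplied pointer plays the role of `nodes_by_fd`, which is the point made in the exchange above.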

05:36 < cfields> phantomcircuit: good luck :)

05:36 < cfields> nnite

06:56 < wumpus> kallewoof: if there is no locale-independent function, you're going to have to implement one yourself

06:56 < wumpus> kallewoof: usually this is trivial as the ASCII case of the string functions is trivial
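
The log does not show which string function kallewoof asked about, so the following is only a sketch of the general idea, using lowercasing as an assumed example. The hypothetical helper `to_lower_ascii` touches the ASCII letters A-Z and nothing else, so its result cannot depend on the process locale.

```python
def to_lower_ascii(text: str) -> str:
    """Lowercase only the ASCII letters A-Z; leave every other character untouched."""
    out = []
    for ch in text:
        if "A" <= ch <= "Z":
            # ASCII upper and lower case letters differ by exactly 32.
            out.append(chr(ord(ch) + 32))
        else:
            out.append(ch)
    return "".join(out)
```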

06:58 < kallewoof> wumpus: ok

07:01 < jonasschnelli> who own drahtbot?

07:01 < jonasschnelli> *owns

07:02 < kallewoof> jonasschnelli: MarcoFalke

07:02 < jonasschnelli> Nice work... I wasn't aware that it also does builds via gitian

07:03 < wumpus> yes it's a great bot

07:07 < fanquake> ^ It keeps getting better

07:08 < fanquake> I wonder if it'll be running/posting the results of some of the benchmarks from perfmonitor

07:10 < fanquake> https://github.com/chaincodelabs/bitcoin-perfmonitor

07:25 < fanquake> wumus 13835 & 13824 should be mergable

07:25 < fanquake> *wumpus

08:27 < fanquake> wumpus Also #13844 and 13796

08:27 < gribble> https://github.com/bitcoin/bitcoin/issues/13844 | doc: correct the help output for -prune by hebasto · Pull Request #13844 · bitcoin/bitcoin · GitHub

09:06 < fanquake> sjors I've added the gitian-build label, so I think that should trigger it.

09:18 < provoostenator> fanquake thanks

10:58 < wumpus> Aug 01, 12:07 - f030410e88f11c5ff1ce6c80b463a1c7f6d39830 functional-test-runner@ccl-bench-hdd-1 Average time up 9.5%

10:59 < wumpus> looking at how to interpret bitcoinperf.com - does this mean that merging #13697 made the tests 10% slower on gcc, but not clang?!?

10:59 < gribble> https://github.com/bitcoin/bitcoin/issues/13697 | Support output descriptors in scantxoutset by sipa · Pull Request #13697 · bitcoin/bitcoin · GitHub

11:01 < wumpus> it's not surprising that it makes running the functional tests fractionally slower, as test cases were added, but 10% is a lot

11:26 < provoostenator> xpub derivation is expensive

11:27 < provoostenator> You could mock the actual derivation with lookup tables.

11:27 < provoostenator> But that's pretty invasive for the functional test suite.

11:31 < wumpus> yes but what I mean is that the test framework consists of so many tests, this means that one test takes up ~1/10th of it

11:49 < provoostenator> That's not surprising because of how many derivations it's doing, trying to scan up to a thousand (?) keys for those xpub/* patterns. Using the range param for scantxoutset might help.

11:54 < wumpus> good point, yes true

11:55 < provoostenator> I left a note for sipa on the PR. The default is 1000 and several tests use 1499, I suggest using 1 and 2.
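
As a rough illustration of the suggestion, a `scantxoutset` scan object can carry a `range` alongside the descriptor, so a test can ask for a handful of child keys instead of the default 1000. The helper `scan_params` below is hypothetical and only builds the JSON parameters; the descriptor string is a placeholder, not a real key.

```python
import json


def scan_params(descriptor, key_range):
    """Build scantxoutset parameters with an explicit, small derivation range."""
    # "start" is the action; each scan object pairs a descriptor with its range.
    return json.dumps(["start", [{"desc": descriptor, "range": key_range}]])


# Placeholder descriptor: "<xpub>" stands for an extended public key.
params = scan_params("combo(<xpub>/*)", 2)
```

A smaller range means fewer key derivations per descriptor, which is where provoostenator expects the time to go.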

12:20 < wumpus> thanks

12:48 < provoostenator> fanquake: no rush, but how often does that bot build things? There's only one ticket with a "Needs gitian build" label atm.

12:49 < fanquake> provoostenator: Not 100% sure if it's completely automated or not. MarcoFalke should be able to help out.

12:50 < wumpus> looks like we have a causality issue here, that message from provoostenator is visible here *just before* fanquake joining

12:51 < * provoostenator> hides time machine

12:51 < wumpus> leaky wormholes again

12:51 < fanquake> wumpus :o

13:08 < fanquake> wumpus I've retested 13823 now, thanks for pointing out the quoting issue.

13:37 < ken2812221_> wumpus: I think #13426 can be postponed to 0.18, I don't think this can be merged in a week. I'll open an issue to track this and separate it into several PRs for easier review

13:38 < gribble> https://github.com/bitcoin/bitcoin/issues/13426 | [bugfix] Fix encoding issue for Windows by ken2812221 · Pull Request #13426 · bitcoin/bitcoin · GitHub

13:50 < MarcoFalke> provoostenator: It builds on a single cpu on gce, heh. So maybe 24hours to build master and the commit for all targets ;)

13:50 < ken2812221> Also, we'll have 6+ months to test if there is any side effect after these changes.

13:51 < MarcoFalke> ken2812221: Yeah, was going to ask about the state of this.

13:51 < MarcoFalke> Imo we should at least fix the crash

13:51 < MarcoFalke> the crash on Windows for non-ascii wallet file name

13:54 < wumpus> fanquake: cool, thanks for retesting

13:54 < wumpus> ken2812221: ok!

13:55 < fanquake> MarcoFalke which PR/changeset do you have in mind for that?

13:55 < MarcoFalke> Idk what is causing the issue on windows

13:55 < wumpus> ken2812221: changed milestone--thanks, I sort-of expected this, it was kind of a large and reasonably risky change to merge so last minute, better to do it as one of the first things for 0.18 probably

13:56 < fanquake> Ah right, we're talking about #13754 here?

13:56 < gribble> https://github.com/bitcoin/bitcoin/issues/13754 | Windows crashes for -wallet=你好 · Issue #13754 · bitcoin/bitcoin · GitHub

13:57 < MarcoFalke> jup

13:57 < MarcoFalke> It reports a fatal error and then hits an assertion

13:58 < ken2812221> MarcoFalke: That has happened with -datadir for a really long time.

13:58 < MarcoFalke> oh

13:58 < MarcoFalke> Not a regression then

13:58 < fanquake> Ok. I'm not entirely sure how we can solve that in a not very intrusive, right before 0.17.0 kind of way.

13:59 < fanquake> 13426 currently has changes to bdb and leveldb as well

14:00 < provoostenator> Afaik it used to throw a clear error and now it just crashes, but it's only a problem if you use characters incompatible with your system language.

14:00 < MarcoFalke> So not really a common issue to hit

14:00 < MarcoFalke> Still, would be nice to bisect that

14:00 < MarcoFalke> Will probably do that

14:01 < provoostenator> Have fun, only takes 24 hours per build, right? :-)

14:02 < MarcoFalke> heh, I hope I can steal them from jonasschnelli nightly server

14:10 < fanquake> If someone wants to do some quick review, there is currently 13852 & 13852 for 0.16, both fairly simple backports

14:17 < provoostenator> fanquake duplicate number, what's the second one? I'll take a look.

14:18 < fanquake> provoostenator Sorry, 13796

14:27 < fanquake> Another trivial one in 13853, not entirely sure how the versions got out of sync.

14:27 < provoostenator> MarcoFalke: maybe it's useful to have a "needs windows build" tag? That's the only OS where you can't avoid cross-compilation pain.

14:29 < provoostenator> fanquake: you're sure you want to bump QT to 5.9.6 in backports?

14:29 < wumpus> let's not get too specific with labels

14:29 < provoostenator> wumpus: it's just that the only reason I sometimes ask for Gitian builds is that I want to test Windows binaries.

14:30 < provoostenator> Maybe others have other reasons.

14:30 < fanquake> provoostenator bump qt in backports?

14:30 < provoostenator> https://github.com/bitcoin/bitcoin/pull/13853/files#diff-0c8311709d11060c5156268e58f5f209R26

14:30 < wumpus> provoostenator: I understand, it's just that I think labels are best for keeping track of rough categories, this seems oddly specific :)

14:31 < provoostenator> Oh wait, that one isn't a backport, nvm.

14:32 < provoostenator> Maybe a Github bot listening for comments "Windows plz" is better than tags?

14:43 < wumpus> I... think we should simply make sure enough CPU capacity is available for the build bot and build everything possible

14:56 < wumpus> if cloud/server costs are a problem I'm sure we can find some solution

14:59 < fanquake> Can run all the tests, fuzzers, linters, benchmarks, gitian builds, test sync times..

15:00 < wumpus> yes, including openbsd and freebsd *ducks*

15:01 < ken2812221> In my opinion, how about uploading travis build binaries to a server?

15:02 < ken2812221> It would cost less money than our own builds.

15:04 < wumpus> the problem, last time we looked at that option, had to do with credentials; as well as with potential malware, especially if PRs automatically upload binaries

15:04 < wumpus> this is manually triggered by maintainers, so less risky

15:14 < ken2812221> We can print the binary hash at the end and upload the binaries to a closed server. If someone wants the binary, they can ask a maintainer for it. The downside is that we would have to trust travis-ci.

15:34 < MarcoFalke> ken2812221: Could do that with https://transfer.sh/

16:03 < ken2812221> MarcoFalke: Thanks, I'll try it.

16:46 < wumpus> labeled the risc-v PRs 0.18, would be nice to have executables for that at that time

17:30 < sipa> wumpus: sgtm

18:29 < luke-jr> would be nice to have POWER9 executables ASAP; that hardware is already usable ;)

19:00 < wumpus> luke-jr: yes

19:00 < wumpus> luke-jr: let's add that for 0.18 too then.

19:01 < wumpus> #startmeeting

19:01 < lightningbot> Meeting started Thu Aug 2 19:01:07 2018 UTC. The chair is wumpus. Information about MeetBot at http://wiki.debian.org/MeetBot.

19:01 < lightningbot> Useful Commands: #action #agreed #help #info #idea #link #topic.

19:01 < jnewbery> hi!

19:01 < promag> hi

19:01 < cfields> hi

19:01 < provoostenator> hi

19:01 < jonasschnelli> hi

19:02 < wumpus> #bitcoin-core-dev Meeting: wumpus sipa gmaxwell jonasschnelli morcos luke-jr btcdrak sdaftuar jtimon cfields petertodd kanzure bluematt instagibbs phantomcircuit codeshark michagogo marcofalke paveljanik NicolasDorier jl2012 achow101 meshcollider jnewbery maaku fanquake promag provoostenator

19:02 < kanzure> hi.

19:02 < achow101> hi

19:02 < meshcollider> Hi

19:02 < instagibbs> ello

19:03 < gmaxwell> Hi.

19:03 < wumpus> topics?

19:04 < luke-jr> crickets

19:04 < wumpus> crickets are... good I guess

19:04 < wumpus> 0.17 PRs: https://github.com/bitcoin/bitcoin/pulls?q=is%3Aopen+is%3Apr+milestone%3A0.17.0

19:05 < wumpus> 0.17 issues: https://github.com/bitcoin/bitcoin/issues?q=is%3Aopen+is%3Aissue+milestone%3A0.17.0

19:05 < luke-jr> I guess we could discuss the CXXFLAGS stuff

19:05 < wumpus> 0.17 release schedule: #12624

19:05 < gribble> https://github.com/bitcoin/bitcoin/issues/12624 | Release schedule for 0.17.0 · Issue #12624 · bitcoin/bitcoin · GitHub

19:06 < luke-jr> I don't really have a good solution for it

19:06 < wumpus> #topic CXXFLAGS stuff

19:06 < gmaxwell> Whats the issue?

19:06 < luke-jr> gmaxwell: autotools forces user CXXFLAGS after our own; so when the user builds with -mno-avx2, the build simply fails

19:07 < provoostenator> #13789

19:07 < gribble> https://github.com/bitcoin/bitcoin/issues/13789 | crypto/sha256: Use pragmas to enforce necessary intrinsics for GCC and Clang by luke-jr · Pull Request #13789 · bitcoin/bitcoin · GitHub

19:07 < cfields> eh?

19:07 < cfields> it doesn't force it, we do that ourselves

19:07 < gmaxwell> can someone please drop the registered users +q for now? sdaftuar is muted.

19:07 < luke-jr> ie, autotools calls the compiler with our -mavx2 *followed by* -mno-avx2
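
The ordering problem luke-jr describes can be modelled in a few lines. This is a toy resolver, not how GCC or Clang parse options: the hypothetical `effective_machine_flags` only understands plain `-mX` / `-mno-X` pairs and applies a last-one-wins rule, which is the behaviour implied by the build failure reported here.

```python
def effective_machine_flags(*flag_groups):
    """Resolve -mX / -mno-X switches left to right; the last occurrence wins."""
    state = {}
    for group in flag_groups:
        for flag in group.split():
            if "=" in flag:
                # Options such as -march=... are outside this toy model.
                continue
            if flag.startswith("-mno-"):
                state[flag[len("-mno-"):]] = False
            elif flag.startswith("-m"):
                state[flag[len("-m"):]] = True
    return state


# Project flags first, user CXXFLAGS appended afterwards.
resolved = effective_machine_flags("-mavx2", "-mno-avx2")
```

With the user's flags placed last, `resolved` reports AVX2 as disabled, so a source file that uses AVX2 intrinsics no longer compiles.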

19:07 < wumpus> well apparently there's a problem where the build system passes the wrong flags, and luke-jr's PR works around it with pragmas

19:07 < gmaxwell> or at least voice sdaftuar

19:07 < wumpus> that seems wrong to me

19:07 < cfields> the intention is for any user-passed flags to be able to override the ones we add. If that's not the case, it's a bug.

19:07 < luke-jr> cfields: that's the problem in this case

19:08 < luke-jr> we can't let the user override -mavx2 here

19:08 < sdaftuar> hi

19:08 < cfields> luke-jr: you mean we shouldn't, or we currently don't?

19:08 < luke-jr> cfields: autotools makes it impossible to override user CXXFLAGS, but for these files we must or we fail to compile

19:08 < gmaxwell> luke-jr: why would the user ever pass -mno-avx2 in the first place? We shouldn't be using -mavx except on special files that need it to compile.

19:08 < cfields> oh, I see what you mean

19:08 < luke-jr> gmaxwell: to avoid AVX2 instructions

19:09 < cfields> luke-jr: eh?

19:09 < luke-jr> gmaxwell: it's those special files that fail to compile

19:09 < sipa> hi!

19:09 < cfields> luke-jr: fail how? do we just need to check for more intrinsics?

19:09 < provoostenator> (I guess it was crickets and the muffled voice of sdaftuar in the distance)

19:09 < gmaxwell> luke-jr: If the user wants to avoid executing avx2 instructions they need do nothing. They don't have to pass any special options.

19:09 < gmaxwell> cfields: he is saying that if he builds with CXXFLAGS=-mno-avx2 the compile fails.

19:10 < luke-jr> gmaxwell: AVX2 is enabled by default with some -march options

19:10 < cfields> gmaxwell: I assume the issue is some failures to compile because of a busted compiler, so there's a desire to be able to avoid them entirely.

19:10 < luke-jr> https://github.com/bitcoin/bitcoin/issues/13758

19:10 < gmaxwell> luke-jr: why would anyone -march=<thing with avx2> then -mno-avx2? that just seems busted.

19:10 < cfields> ooooooh

19:11 < wumpus> it looks like a really contrived scenario to me

19:11 < luke-jr> gmaxwell: I'm not sure why laurentb is doing it, but there's no reason they shouldn't be able to

19:11 < gmaxwell> in any case, why not detect the -mno-avx2 and then don't even compile the file?

19:11 < wumpus> not worth it polluting the code with all kinds of compiler specific pragmas at least

19:11 < cfields> ok, there are plenty of ways to solve that. I think we can just discuss on the github issue?

19:11 < luke-jr> gmaxwell: autotools says we're supposed to allow changing CXXFLAGS after configure

19:11 < wumpus> agree with cfields , there's no hurry with this

19:11 < luke-jr> make CXXFLAGS=…

19:11 < luke-jr> okay

19:12 < gmaxwell> in any case -march=<hardware that has avx2> -mno-avx2 sounds like a misguided thing that we shouldn't take a lot of complexity to support, unless someone knows otherwise.

19:12 < cfields> gmaxwell: we turn off all flags except the ones we're testing when doing the autoconf checks so that they don't cause unrelated errors.

19:12 < gmaxwell> Esp because there is -mtune

19:12 < luke-jr> cfields: no we don't

19:12 < MarcoFalke> Unassigned the issues from the 0.17 milestone for now

19:12 < cfields> luke-jr: ones that we add, we do

19:13 < luke-jr> MarcoFalke: it's a must-fix for 0.17

19:13 < wumpus> no, it's not a must-fix for 0.17

19:13 < gmaxwell> I don't see how this is a must fix.

19:13 < wumpus> agree fully with MarcoFalke

19:13 < luke-jr> broken build system..

19:13 < wumpus> any other topics?

19:13 < gmaxwell> "User sets weird options which seem to make no sense as we know, and then something that arguably should work but fails" is not really blocker material.

19:13 < meshcollider> Lol

19:14 < provoostenator> Windows

19:14 < provoostenator> (topic)

19:14 < MarcoFalke> I left the pulls for review in the 0.17 milestone, so if reviewers like them, they can still be merged

19:14 < luke-jr> provoostenator: what about Windows? :p

19:14 < provoostenator> As in: do we want to fix the Windows unicode stuff, given that there's still two weeks?

19:14 < wumpus> right, it's annoying for the specific user, but if you have really specific needs like that you can patch around it

19:14 < MarcoFalke> Let's do the unicode stuff for 0.18

19:14 < wumpus> #topic Windows (provoostenator)

19:14 < provoostenator> I think the opinion in the ticket was no.

19:14 < MarcoFalke> It would require a leveldb bump and major changes

19:14 < gmaxwell> wumpus: we should find out what the user is doing, might just be some greater confusion... like they want to benchmark each way or something.

19:15 < MarcoFalke> Not sure if we can review and test that in such a short time frame

19:15 < provoostenator> Ok, and we're not getting tons of reports about this either?

19:15 < jonasschnelli> Is the unicode stuff just about filenames (wallet)?

19:15 < wumpus> gmaxwell: right!

19:15 < MarcoFalke> I am looking into restoring the proper warning for -wallet=non-ascii, but that should be all for 0.17

19:15 < MarcoFalke> jonasschnelli: Also datadir

19:15 < sipa> only half here, but feel free to ping me

19:15 < provoostenator> I think it's mostly filename yes, but also labels.

19:16 < jonasschnelli> Yeah. We should fix that. But this seems to be open since forever and I don't see a pressing need for 0.17

19:16 < provoostenator> But afaik it works if your system locale is set "correctly".

19:16 < MarcoFalke> jonasschnelli: Indeed, anything that is not a regression should go into 0.18

19:17 < cfields> provoostenator: due to the nature of the bug, I think many of the people who would be reporting it may not speak english. So the significance may be a little under-represented.

19:17 < cfields> s/bug/issue/

19:17 < gmaxwell> I was trying to say what cfields just said.

19:17 < meshcollider> Good point

19:17 < jonasschnelli> Yes. Good point.

19:17 < luke-jr> provoostenator: wait, labels are broken?

19:17 < gmaxwell> We shouldn't take few reports to mean few issues... but if the change is invasive and not ready, and the issue isn't new...

19:17 < cfields> gmaxwell: agreed. -1 from me as well. Just wanted to throw that out there.

19:17 < provoostenator> luke-jr: not sure, try with an english system locale and then adding Chinese labels, I can't remember if that works.

19:18 < luke-jr> provoostenator: last I checked, we had functional tests for non-English labels :/

19:18 < provoostenator> But with a Chinese system locale it does work afaik, hence it doesn't seem super urgent.

19:18 < wumpus> I think it's a serious issue

19:18 < MarcoFalke> We can backport to 0.17.1, if it qualifies as bug fix

19:18 < provoostenator> Functional tests that run on linux.

19:18 < wumpus> but it's risky to do this last minute

19:18 < wumpus> and it's not a regression

19:18 < gmaxwell> MarcoFalke: +1

19:19 < wumpus> MarcoFalke: agree

19:19 < provoostenator> Maybe just merge it into master after the 0.17 split so it gets testing quickly. Then we could always backport it if there's strong demand?

19:19 < gmaxwell> it's a bug, maybe one not worth backporting depending on how tidy the fix is.

19:19 < wumpus> it's unfortunate that this requires such invasive changes for windows

19:19 < wumpus> specific changes not required for other OSes

19:19 < ken2812221> This is really hard to fix.

19:20 < wumpus> which means most of us cannot test it usefully

19:20 < luke-jr> can it be reproduced in WINE?

19:20 < provoostenator> luke-jr: I haven't tried, I just use a virtual box with Windows 10

19:20 < MarcoFalke> I don't think we should fall back to WINE for testing that issue

19:21 < provoostenator> Someone recently offered to run and maintain Windows integration builds

19:21 < meshcollider> It doesn't sound like WINE would have the same issue tbh

19:21 < gmaxwell> Health professionals recommend against trying to use wine to solve your problems.

19:21 < cfields> careful we don't spiral to homebrew...

19:21 < MarcoFalke> This really needs to be tested on native Windows (Not against testing it in wine additionally, though)

19:22 < wumpus> yes

19:22 < provoostenator> #12613

19:22 < gribble> https://github.com/bitcoin/bitcoin/issues/12613 | [CI] Adding MSVC build to CI check with Appveyor · Issue #12613 · bitcoin/bitcoin · GitHub

19:22 < ken2812221> If there is a way to run functional tests on Windows

19:22 < wumpus> MSVC is a orthogonal issue, I think, to solving this in the gitian builds

19:22 < MarcoFalke> I occasionally run them on appveyor

19:23 < luke-jr> MarcoFalke: it'd be nice to have Travis test it in the meantime

19:23 < wumpus> well not orthogonal but I wouldn't be surprised if there are differences between mingw and MSVC in unicode handling

19:23 < MarcoFalke> Sure, if you can get this to work on travis, why not

19:24 < MarcoFalke> ken2812221: I think I also got them to run on my windows vm once

19:24 < wumpus> gmaxwell: this is one of the cases where you'd recommend stronger liquor instead.

19:24 < ken2812221> MSVC adds a unicode version of fstream, but mingw does not have that.

19:24 < meshcollider> MarcoFalke: "once" sounds promising ;)

19:25 < MarcoFalke> I could try once more and then write down the steps I did, heh

19:25 < cfields> ken2812221: huh, we could potentially add it and upstream to mingw, then.

19:26 < wumpus> why does utf-8 need a special fstream

19:26 < wumpus> I'm confused

19:27 < wumpus> why does this have to be so messed up

19:27 < ken2812221> wumpus: just filename

19:27 < ken2812221> It's fine to read the stream

19:27 < wumpus> can't we just drop windows support

19:27 < * wumpus> ducks

19:27 < meshcollider> Lol

19:28 < provoostenator> Maybe run it inside a little Linux VM? Can't be worse than Electron :-)

19:28 < wumpus> yesss

19:28 < wumpus> windows 10 already includes ubuntu right

19:28 < midnightmagic> it's messed up because windows can still run old windows crap and they have to support old api forever, including all the crappy api they built when the company was on the rocks before they discovered scm

19:28 < wumpus> let's dump the win32 garbage and use that...

19:28 < gmaxwell> lol. really with the proposed builder stuff, would building a whole linux system really be worse? :P

19:28 < cfields> wumpus: Doesn't a brand new win10 update add a bunch of compat stuff that would avoid these issues? I don't think it'd be crazy to consider dropping support for anything less than that reasonably soon.

19:29 < wumpus> cfields: it does! that's true

19:29 < provoostenator> Not by default, but you can install it. I once did the inception thing: Gitian builder VM inside Ubuntu inside Windows inside Virtual Box on Mac. It crashed.

19:29 < gmaxwell> cfields: I dont know the details, but my understanding is that a lot of people don't want to run windows 10 due to integrated adware or something?

19:29 < provoostenator> gmaxwell: the latest update, at least for me, even forced me to allow analytics data

19:30 < cfields> whoa

19:30 < meshcollider> Yeah it's much more invasive

19:30 < provoostenator> They're definitely going for the "get hacked by random people on the internet or get spied on by us" sales pitch.

19:30 < cfields> gmaxwell: I suppose those people would be used to running outdated versions of things, then :\

19:30 < provoostenator> (or both)

19:31 < wumpus> ugh

19:31 < wumpus> then again, this is not a regression

19:31 < wumpus> what works on windows, works.

19:31 < gmaxwell> thats a possibility too, and better than dropping support for windows...

19:32 < wumpus> the question is whether we should change 10000 lines just to accommodate unicode in it

19:32 < gmaxwell> It seems really surprising that we'd need to change more than some wrapper functions.

19:32 < luke-jr> ^

19:32 < wumpus> yes it seems a design error in the first place if this cannot be wrapped somehow

19:33 < cfields> agreed. And that's a reasonable thing to fix, imo. IIRC it also means we have trouble adding filesystem sandboxing.

19:33 < wumpus> I do think ken2812221's PR is fairly ok in that regard

19:34 < wumpus> yes the reason I did the fs:: stuff is to allow for filesystem sandboxing

19:35 < cfields> wumpus: more is required though, right? I assume something along the lines of ken2812221's PR would help?

19:36 < wumpus> cfields: I had it down to only changes in fs.h and adding a fs.cpp, afaik

19:37 < cfields> ah, ok. nm then.

19:37 < wumpus> but maybe more is needed now that the code evolved further...

19:38 < gmaxwell> I hope we can find a set of changes that are reasonably independent of windows, so that the windows fix is just a couple files changed.

19:38 < wumpus> I think my main drawback in ken2812221 's PR is that it changes .string to stringu8 or something, which seems unnecessary, all strings should be utf-8

19:38 < wumpus> if you make sure the wrapper does that you don't need to change it everywhere in the code

19:39 < ken2812221> wumpus: That's how boost's path works. It needs to be passed a utf8 converter.

19:40 < wumpus> ken2812221: yes...

19:40 < wumpus> but our own wrapper could do that automatically

19:40 < ken2812221> Actually I have thought about that before, but it needs a patch to boost.

19:40 < luke-jr> didn't we want to get rid of boost anyway?

19:40 < wumpus> or wrap boost

19:40 < luke-jr> maybe this is a good opportunity

19:40 < wumpus> we'll do so, in time

19:41 < wumpus> yes

19:41 < wumpus> any other topics?

19:41 < gmaxwell> Topic suggestion: Leveldb FD usage on x86_64

19:42 < wumpus> #topic leveldb FD usage on x86_64 (gmaxwell)

19:43 < gmaxwell> There was a recent report from a user hitting the select limit on his x86_64 linux host. Inspection with lsof shows that leveldb is using a lot of FDs on nodes where we expected it to be mostly using mmap. Apparently leveldb has a number of mmaps limit, as far as I know there isn't any reason we shouldn't increase it.

19:43 < gmaxwell> (separately we should move to using poll ... but increasing the mmap limit should be a ~1 line change, unless someone knows a reason to not do so)

19:44 < wumpus> I think this limit was increased recently

19:44 < wumpus> specifically for 64-bit OSes, as it was deemed to be no problem

19:44 < wumpus> of course, I'm not surprised it turns out that it is... :/

19:45 < wumpus> the reason to increase it was something with performance

19:46 < wumpus> #12495 maybe?

19:46 < gribble> https://github.com/bitcoin/bitcoin/issues/12495 | Increase LevelDB max_open_files by eklitzke · Pull Request #12495 · bitcoin/bitcoin · GitHub

19:46 < wumpus> "utACK ccedbaf - with the comment that I think relying on undocumented behavior of leveldb is risky. "

19:46 < wumpus> yeah, of course this is going to bite us in the ass

19:47 < wumpus> so revert?

19:47 < cfields> upstream has updated a bunch of stuff lately (namely moving to c++11 by default and dropping the need for lots of platform-dependent code). Has this been addressed as well, by any chance?

19:48 < gmaxwell> wumpus: hm so reading this it sounds to me that there is a separate limit of 1000 mmaps. And running into that limit is what is causing more FDs to be used than we expected.

19:48 < gmaxwell> wumpus: so it might be possible to not revert that patch, but instead increase the maximum mmaps.

19:48 < cfields> (that bump is going to be painful, btw)

19:49 < wumpus> gmaxwell: I like switching to anything else than select() though!

19:49 < wumpus> we've been stuck with select with no good reason

19:49 < gmaxwell> wumpus: yes, I think we should do that too!

19:49 < sipa> gmaxwell: that would explain why i saw exactly 999 mmaps

19:49 < wumpus> my reason used to be, well, we're going to switch to using libevent for P2P any time now!

19:49 < gmaxwell> BlueMatt: and phantomcircuit: both previously had private patches to change to poll.

19:49 < wumpus> but that's been years

19:50 < wumpus> let's just do it

19:50 < sipa> wumpus: any time now!

19:50 < wumpus> change to poll() for unix-ish OSes

19:50 < gmaxwell> sipa: my comments above about the map limit are based on the work of ossifrage (the reporter), and some of the comments in that PR above.

19:50 < cfields> wumpus: sorry :(

19:51 < gmaxwell> In any case, I think there are two issues: (1) switch to poll (duh but can we do it for 0.17) and (2) leveldb should probably be allowed more maps unless we know some reason to not do that.

19:51 < wumpus> cfields: nooOO sorry I didn't mean it as an attack on you, I could have done it, many other people could have done it, but we didn't

19:51 < gmaxwell> and maybe (3) potentially revert that PR; but given my understanding, I don't think we need to.

19:51 < gmaxwell> Leveldb being map limited is probably going to end up bad for performance even if we no longer had the FD issues.
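
Issue (1) above comes down to how select() and poll() identify descriptors: select() uses a fixed-size bit set, so a descriptor numbered FD_SETSIZE or higher cannot be watched, while poll() takes an array of descriptors of any value. A minimal C++ sketch of that difference follows, assuming a POSIX system. The helper names FitsInSelectSet and WaitReadable are made up for illustration and are not taken from the Bitcoin Core source.

```cpp
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

// select() can only track descriptors numbered below FD_SETSIZE.
// (Hypothetical helper name, for illustration only.)
bool FitsInSelectSet(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}

// Hypothetical helper: wait up to timeout_ms for fd to become readable.
// poll() receives the descriptor as an array element, so its numeric
// value is not bounded by FD_SETSIZE.
bool WaitReadable(int fd, int timeout_ms) {
    struct pollfd pfd;
    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;
    int ready = poll(&pfd, 1, timeout_ms);
    return ready > 0 && (pfd.revents & POLLIN) != 0;
}
```

A process holding many open ldb files pushes newly accepted sockets to higher descriptor numbers, which is when the FitsInSelectSet check would start to fail while WaitReadable keeps working.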

19:51 < luke-jr> man mmap doesn't talk about a specific mmap limit

19:52 < wumpus> there is no mmap limit afaik

19:52 < gmaxwell> ^

19:52 < gmaxwell> That's where my concern about increasing leveldb's mmap usage comes from-- I don't understand why the limit is there.

19:52 < cfields> there's definitely something that limits it. Some unrelated sysctl ?

19:52 < cfields> I know i've seen it documented somewhere.

19:52 < gmaxwell> Unless they're trying to reduce TLB thrashing or something.

19:54 < wumpus> cfields: for 0.18 we should definitely update leveldb though

19:54 < ossifrage> I bumped my mmap file limit up to 4k, currently 1180 ldb files are mapped and it isn't using any fds for ldb files

19:54 < gmaxwell> https://github.com/bitcoin/bitcoin/pull/12495#issuecomment-367815005 < eklitzke suggests increasing leveldb's mmap limit here. ossifrage also apparently did that and reported it fixed his problems, though his specific problems are likely better fixed by switching to poll.

19:55 < wumpus> for 0.17 I'd propose to keep it like this, not do anything wild...

19:55 <@sdaftuar> cfields: if i read correctly, the internet says sysctl vm.max_map_count

19:55 < gmaxwell> I thought it was probably too late to switch 0.17 to poll, but increasing the leveldb map count seems like an easy mitigation that probably improves performance.

19:56 < gmaxwell> [gmaxwell@bean ~]$ sysctl vm.max_map_count

19:56 < gmaxwell> vm.max_map_count = 65530

19:56 < cfields> is there a per-process one?

19:56 < ossifrage> vm.max_map_count = 65530 (same on my box, but that number seems a bit low to be a global count)

19:56 < cfields> gmaxwell: I'd be really nervous about that. There are so many specific quirks related to select/poll/epoll/kqueue

19:57 < ossifrage> It is per process

19:57 < cfields> (not against the change at all, only hesitant for the 0.17 cycle)

19:57 < wumpus> vm.max_map_count = 65530

19:57 < luke-jr> vm.max_map_count = 65530

19:57 < wumpus> checked on 3 linux boxes, same output

19:57 < gmaxwell> likewise.
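
The sysctl value pasted above can also be read programmatically. On Linux it is exposed as the file /proc/sys/vm/max_map_count. The C++ sketch below is a rough illustration only: the function names are hypothetical, and returning -1 on failure is a convention chosen for this example.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Parse a non-negative integer such as "65530\n"; returns -1 on failure.
// (Hypothetical helper, not part of any real library.)
long ParseMapCount(const std::string& text) {
    std::istringstream in(text);
    long value = -1;
    if (in >> value && value >= 0) return value;
    return -1;
}

// Linux-specific: read vm.max_map_count from procfs.
// Returns -1 if the file is missing or unreadable.
long ReadMaxMapCount() {
    std::ifstream file("/proc/sys/vm/max_map_count");
    std::string line;
    if (!file || !std::getline(file, line)) return -1;
    return ParseMapCount(line);
}
```

On the boxes checked in the channel, ReadMaxMapCount() would be expected to return the same 65530 that sysctl printed.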

19:57 < wumpus> I... strongly doubt this is an issue

19:58 < gmaxwell> So it might be prudent for 0.17 to bump the leveldb max to 4k or something.

19:58 < wumpus> if you manage to map 65000+ files, well, I'm not surprised you run into trouble

19:58 < gmaxwell> I still just wish I better understood why they had that 1000 default.

19:59 < luke-jr> what happens if the limit is hit?

19:59 < phantomcircuit> gmaxwell, mine only worked on linux and was actually just broken on windows

20:00 < wumpus> *dong*

20:00 < wumpus> #endmeeting

20:00 < lightningbot> Meeting ended Thu Aug 2 20:00:19 2018 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)

20:00 < lightningbot> Minutes: http://www.erisian.com.au/meetbot/bitcoin-core-dev/2018/bitcoin-core-dev.2018-08-02-19.01.html

20:00 < lightningbot> Minutes (text): http://www.erisian.com.au/meetbot/bitcoin-core-dev/2018/bitcoin-core-dev.2018-08-02-19.01.txt

20:00 < lightningbot> Log: http://www.erisian.com.au/meetbot/bitcoin-core-dev/2018/bitcoin-core-dev.2018-08-02-19.01.log.html

20:00 < ossifrage> luke-jr, it falls back to using fds, does it age old read-only ldb files out or does it just grow until it hits the limit?

20:01 < midnightmagic> gah, that was a meeting?

20:01 < sipa> haha

20:02 < sipa> the best kind of meetings don't feel like meetings at all

20:02 <@sdaftuar> gmaxwell: i just saw this https://github.com/google/leveldb/issues/128

20:02 < midnightmagic> gah I'm so stupid.

20:02 <@sdaftuar> the comment there makes it sound like the 1000 mmap limit is some kind of memory limiter, but i don't know if i'm understanding correctly

20:03 < gmaxwell> yep, seems like it.

20:03 < luke-jr> ossifrage: I don't see that in the code. It looks like it just fails completely

20:04 < luke-jr> s = IOError(fname, errno);

20:04 < ossifrage> luke-jr, I was running into the 1000 map limit and then it started using a ton of FDs

20:04 < luke-jr> ossifrage: I mean hitting the OS limit

20:04 < luke-jr> ie, set your OS limit to 900

20:05 < gmaxwell> I wonder if in the issue sdaftuar links the person is hitting the OS limit.

20:05 < ossifrage> (ah, I thought you were talking about the limiter limit, duh)

20:05 < gmaxwell> So leveldb added a maximum and set it to some random value.

20:06 < gmaxwell> sdaftuar: thanks for finding that, though I wish it were enlightening. The user reports having enough memory. Google replies 'we fixed your running out of memory with an arbitrary limit'...

20:08 < luke-jr> seems like they should have at least made it so mmap failure falls back to using fds :/

20:08 < phantomcircuit> gmaxwell, that does seem to be explicitly saying it's to limit the mmap memory usage to 2000MB

20:08 < luke-jr> so was this a 32-bit problem only? I thought mmaps were only used on 64-bit?

20:09 < gmaxwell> So I think maybe the limit was put in for 32 bit hosts... and then it was replied to someone who was really reporting another problem.

20:09 < gmaxwell> luke-jr: we don't use mmaps on 32-bit hosts, but leveldb can...

20:10 < gmaxwell> (not our leveldb)

20:10 < wumpus> on 64 bit the limit is much higher

20:10 < gmaxwell> gah, that was unclear lemme try. We don't use mmap on 32bit with leveldb, but that is a result of our configuration, since we already manage to use all the address space ourselves.

20:11 < gmaxwell> wumpus: the leveldb mmap limit is also 1000 on 64-bit hosts.

20:11 < cfields> mmap_limit = sizeof(void*) >= 8 ? 1000 : 0

20:11 < luke-jr> so sounds like we should just set the mmap limit to infinite on 64-bit, and modify the mmap fail code to fallback?

20:12 < cfields> so it'd disable mmap for x32 (not x86) as well, right?

20:12 < cfields> (er, x86 too. but also x32 :)
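
The behaviour ossifrage describes (mmap until the budget runs out, then plain file descriptors) can be sketched as a counter that each newly opened table file consults. The C++ below is a simplified illustration under that assumption, not leveldb's actual code. MapBudget, ReadBackend, ChooseBackend and DefaultMapLimit are hypothetical names. Only the expression in DefaultMapLimit is taken from the line cfields quotes.

```cpp
#include <atomic>

// Hypothetical stand-in for a per-database mmap budget (not leveldb's class).
class MapBudget {
public:
    explicit MapBudget(int max_maps) : remaining_(max_maps) {}

    // Reserve one mapping slot. Returns false once the budget is spent.
    bool Acquire() {
        int old = remaining_.fetch_sub(1, std::memory_order_relaxed);
        if (old > 0) return true;
        // Undo the decrement so the counter does not drift below zero.
        remaining_.fetch_add(1, std::memory_order_relaxed);
        return false;
    }

    // Give a slot back when a mapped file is closed.
    void Release() { remaining_.fetch_add(1, std::memory_order_relaxed); }

private:
    std::atomic<int> remaining_;
};

enum class ReadBackend { kMmap, kFileDescriptor };

// Decide how the next read-only table file would be opened.
ReadBackend ChooseBackend(MapBudget& budget) {
    return budget.Acquire() ? ReadBackend::kMmap : ReadBackend::kFileDescriptor;
}

// The default quoted above: 1000 maps with 8-byte pointers, none otherwise.
constexpr int DefaultMapLimit() { return sizeof(void*) >= 8 ? 1000 : 0; }
```

With a budget of 1000, the 1001st concurrently open file takes the kFileDescriptor path, and each such file holds a descriptor open, which is how the process drifts toward the select() limit.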

20:12 < gmaxwell> luke-jr: well not infinite, but something reasonably under 65530.

20:13 < luke-jr> I guess we don't want to exhaust it for other apps

20:14 < luke-jr> 60k / number of bitcoinds launched by functional tests? :P

20:14 < gmaxwell> setting it to just 4096 would put it well above our current usage. ... and by the time we hit that, we'll hopefully be using poll and it'll just be a performance consideration.

20:14 < gmaxwell> ossifrage reports he's seeing about 1180 maps.

20:15 < ossifrage> gmaxwell, after 1.5 days of uptime

20:15 < ossifrage> I started to see the socket problem after 20-30 days of uptime

20:16 < gmaxwell> well how many ldb files are there in total?

20:17 < ossifrage> from my last lsof, 992 from chainstate, 146 from txindex (currently). I have 1763 txindex ldb files and 1354 chainstate ldb files

20:18 < gmaxwell> k, so 4096 would allow you to have all your ldb files mapped.

20:18 < wumpus> leveldb's mmap limit doesn't correspond to any os-level mmap limit

20:19 < ossifrage> Yeah, is that limit per database or process wide?

20:19 < ossifrage> (the leveldb limiter limit)

20:20 < wumpus> per database

20:20 < gmaxwell> wumpus: yea, I think the history is that leveldb gained it due to 32bit. 2mb * 1000 = 2GB sounds a lot like someone trying to avoid VM exhaustion on 32bit.

20:21 < gmaxwell> and they were only thinking about one database and ... yadda yadda.
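
The arithmetic behind that guess can be checked directly. The C++ sketch below takes the 2 MB per table file from the message above as an assumption and treats it as 2 MiB. The names are hypothetical, and the 4 GiB figure is simply the full range of a 32-bit pointer.

```cpp
#include <cstdint>

constexpr std::uint64_t kMiB = 1024ULL * 1024ULL;

// Address space consumed by `maps` mappings of `bytes_per_map` each.
constexpr std::uint64_t MappedBytes(std::uint64_t maps, std::uint64_t bytes_per_map) {
    return maps * bytes_per_map;
}

// Everything a 32-bit pointer can address (4 GiB); user space gets less.
constexpr std::uint64_t k32BitAddressSpace = 4096ULL * kMiB;

// 1000 maps of 2 MiB is 2000 MiB, roughly half of a 32-bit address space.
static_assert(MappedBytes(1000, 2 * kMiB) == 2000 * kMiB, "1000 x 2 MiB");
static_assert(MappedBytes(1000, 2 * kMiB) < k32BitAddressSpace, "fits in 4 GiB");

// 4096 such maps is 8 GiB, which only a 64-bit address space can hold.
static_assert(MappedBytes(4096, 2 * kMiB) > k32BitAddressSpace, "needs 64-bit");
```

Since the limit is per database, a process with both chainstate and txindex open would multiply these figures by the number of databases.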

21:00 < phantomcircuit> gmaxwell, that's so... wtf