< BlueMatt> gmaxwell: the fd_set 1024 limit stuff shouldn't cause that kind of problem, ldb should use almost no fds on 64 bit and a low-ish count on 32 bit, no?
< BlueMatt> I thought that's where we landed (as mmap doesn't need to use an fd)
< sipa> ossifrage: are you on a 32-bit system?
< BlueMatt> fwiw I think I have a branch with a rdwr lock implemented somewhere, though I never ended up using it
< sipa> can you implement a shared lock without support from underlying primitives?
< sipa> seems yes, after short googling
< BlueMatt> sipa: sure? with atomics and condvars, ofc
< sipa> yeah
< sipa> with just locks and condition variables you can implement any synchronization primitive, i believe
< * sipa> forgot
< BlueMatt> well, or s/atomics//
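(For reference, a minimal sketch of the shared lock being discussed, built only from a plain mutex and a condition variable. The SharedLock name is illustrative, not BlueMatt's actual branch, and no attempt is made at writer-fairness.)

```cpp
// Illustrative sketch only: a shared (read/write) lock built from a plain
// mutex and condition variable, roughly the construction described above.
#include <condition_variable>
#include <mutex>

class SharedLock
{
    std::mutex m;
    std::condition_variable cv;
    int readers = 0;      // number of active readers
    bool writer = false;  // whether a writer currently holds the lock

public:
    void lock_shared() {
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [this] { return !writer; });
        ++readers;
    }
    void unlock_shared() {
        std::unique_lock<std::mutex> l(m);
        if (--readers == 0) cv.notify_all();
    }
    void lock() {  // exclusive (writer) lock
        std::unique_lock<std::mutex> l(m);
        cv.wait(l, [this] { return !writer && readers == 0; });
        writer = true;
    }
    void unlock() {
        std::unique_lock<std::mutex> l(m);
        writer = false;
        cv.notify_all();
    }
};
```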
< ossifrage> sipa, no, x86_64
< sipa> ossifrage: what OS?
< ossifrage> linux 4.16.5-300
< gmaxwell> ossifrage: why did you say above that ldb was using lots of FDs?
< ossifrage> That was from the output of lsof and some counting
< BlueMatt> that seems... surprising, given it's supposed to do mmap and then close the fd
< sipa> i have a number of chainstate files open by bitcoind as well - most are mmaped, but not all
< BlueMatt> so it may still show up in lsof but not use an fd?
< ossifrage> I was counting file descriptors not maps
< sipa> i have 30 chainstate files open with an FD, and 999 with mmap
< ossifrage> The node has been up for >30 days if that makes any difference
< gmaxwell> come on, why can't we just take the not-many-line-change to use poll? I know libevent future ra ra ra... but we have held off this simple fix for years. :(
< sipa> gmaxwell: we totally should
< ossifrage> I'm willing to test :-)
< BlueMatt> it's super trivial to write
< BlueMatt> it would take me longer to dig mine up than for someone to rewrite it
< gmaxwell> I could dig up an old copy, but I know that phantomcircuit and BlueMatt run nodes with it.
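(For context, the "not-many-line change" amounts to something like the sketch below: replace the select() call over a fixed-size fd_set with poll() over an array of pollfd, which has no FD_SETSIZE ceiling. This is an illustrative sketch with a hypothetical WaitForReadable helper, not phantomcircuit's or BlueMatt's actual patch.)

```cpp
// Illustrative poll()-based wait, not the actual Bitcoin Core change.
// Unlike select(), poll() takes an array of pollfd and is not limited to
// descriptors below FD_SETSIZE (1024 on glibc).
#include <poll.h>
#include <vector>

// Wait up to timeout_ms for any of the given sockets to become readable,
// returning the descriptors that poll() reported as ready.
std::vector<int> WaitForReadable(const std::vector<int>& sockets, int timeout_ms)
{
    std::vector<pollfd> pfds;
    pfds.reserve(sockets.size());
    for (int fd : sockets) {
        pfds.push_back(pollfd{fd, POLLIN, 0});
    }

    std::vector<int> readable;
    int ret = poll(pfds.data(), pfds.size(), timeout_ms);
    if (ret > 0) {
        for (const pollfd& p : pfds) {
            if (p.revents & POLLIN) readable.push_back(p.fd);
        }
    }
    return readable;
}
```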
< sipa> but i've also basically never encountered anyone actually seeing the "FD above 1024" error resulting in actually closed connections
< sipa> so i wonder what is special about ossifrage's setup
< sipa> or maybe just nobody ever paid attention to it
< gmaxwell> more fragmentation of databases? You do see a bunch of files open.
< gmaxwell> It also can be caused by RPC connections using up FDs.
< BlueMatt> I don't bother running mine with it anymore, and regularly have 500+ connections
< BlueMatt> though that's usually when one asshat makes 100 connections, and usually that asshat is me
< sipa> having hundreds of files would certainly explain fd<1024 shortage
< BlueMatt> and I have a ton of scripts that poll rpc regularly, but not constant, sooo
< ossifrage> I have txindex and sadly my bitcoin data is on a btrfs filesystem (a mistake I won't make again)
< BlueMatt> both of those are also true on my seednodes
< gmaxwell> I think we have had other complaints about fd shortage... but I think we were chalking them up to rpc.
< BlueMatt> I *know* there are issues with rpc
< ossifrage> The only reason I noticed a problem was I was dropping a ton of connections due to select()
< gmaxwell> ossifrage: and the problem remained after restarting the node?
< ossifrage> gmaxwell, I haven't restarted the node
< gmaxwell> I wonder if this is a high uptime + txindex + only guy in that config who is watching the logs problem?
< gmaxwell> it would be nice to better understand why leveldb is leaving the files open... but ... switching to poll eliminates all these problems.
< ossifrage> both txindex and chainstate are gobbling up file descriptors (let me count the maps)
< BlueMatt> ossifrage: are you sure it's using an fd, or just mmap? cause mmap *shouldn't*, at least for ldb
< sipa> BlueMatt: i confirm that lsof shows both fd-ful opened files and mmapped ones
< BlueMatt> that...sucks
< ossifrage> 685 maps of chainstate/*.ldb and 268 maps of txindex (odd)
< sipa> example of an fd one:
< sipa> bitcoind 11155 pw mem REG 252,1 2173885 783231 /home/pw/.bitcoin/chainstate/864555.ldb
< sipa> and another:
< sipa> bitcoind 11155 pw 40r REG 252,1 2173957 779300 /home/pw/.bitcoin/chainstate/864998.ldb
< sipa> (sorry, swapped them; the first is mmap'ed, the other is with fd)
< ossifrage> the 40r is a fd map and the mem line is a mmapped one
< ossifrage> (fd open not fd map)
< BlueMatt> hmm, I wonder if ldb has tunables for that shit?
< sipa> though i only have 30 open files
< sipa> with fd
< ossifrage> There is a tunable to use 1024 fds, but is that per database?
< bitcoin-git> [bitcoin] MarcoFalke pushed 4 new commits to master: https://github.com/bitcoin/bitcoin/compare/0fb9c87815d1...e83d82a85c53
< bitcoin-git> bitcoin/master 9994d01 Jesse Cohen: Add Unit Test for SingleThreadedSchedulerClient...
< bitcoin-git> bitcoin/master b296b42 Jesse Cohen: Update documentation for SingleThreadedSchedulerClient() to specify the memory model
< bitcoin-git> bitcoin/master cbeaa91 Jesse Cohen: Update ValidationInterface() documentation to explicitly specify threading and memory model
< sipa> ossifrage: yes
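(The tunable in question is leveldb's per-database Options::max_open_files; a hedged sketch of setting it when opening a database follows. The OpenWithFdCap helper and the value shown are illustrative, not necessarily what Bitcoin Core picks.)

```cpp
// Sketch of the per-database tunable being discussed: leveldb's
// Options::max_open_files caps how many table files each DB keeps open;
// the rest are opened/closed (or mmapped) on demand. The value is
// illustrative only.
#include <leveldb/db.h>
#include <leveldb/options.h>
#include <string>

leveldb::DB* OpenWithFdCap(const std::string& path, int max_open_files)
{
    leveldb::Options options;
    options.create_if_missing = true;
    options.max_open_files = max_open_files; // e.g. 64 to stay well under RLIMIT_NOFILE

    leveldb::DB* db = nullptr;
    leveldb::Status status = leveldb::DB::Open(options, path, &db);
    if (!status.ok()) return nullptr;
    return db;
}
```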
< BlueMatt> MarcoFalke: wat
< bitcoin-git> [bitcoin] MarcoFalke closed pull request #13247: Add tests to SingleThreadedSchedulerClient() and document the memory model (master...scheduler-tests) https://github.com/bitcoin/bitcoin/pull/13247
< BlueMatt> oh, it did get rewritten to be r-a, nvm
< BlueMatt> no, it wasnt
< BlueMatt> MarcoFalke: plz2fix comment broken ^
< gmaxwell> sipa: also even if the issue didn't exist on x86_64 (though it seems it does), we'd still have it on 32 bit.
< sipa> how do i find the uptime of my bitcoind?
< gmaxwell> there is a starttime in node info or something?
< ken2812221> uptime?
< ossifrage> Oh, in my case it was bitcoin-qt not bitcoind
< ken2812221> rpc
< sipa> ah, i tried getnodeinfo and getuptime
< sipa> seems it's just 'uptime'
< sipa> 10 days, apparently
< ossifrage> 32 days, currently with ~160 connections (which seems to be the most I can get, and I think it has been shrinking as more fds are used)
< ossifrage> I was going to build a new version, is it useful to reduce the max fd open count for leveldb?
< sipa> it may impact performance
< sipa> gmaxwell: on 32-bit systems we limit the max open files to 64
< ossifrage> I've mmapped >10k pgm files on this box at one point, not sure why leveldb wouldn't just mmap all of the ldb files?
< gmaxwell> ossifrage: oh you've increased your max connections over default.
< gmaxwell> so that might be one reason you're seeing this and we are not getting other reports.
< ossifrage> gmaxwell, yeah I have a gbit connection, figured I might as well get the most out of the blood money I pay verizon
< ossifrage> But if leveldb were to use 1024 fds, there would be nothing left for sockets
< gmaxwell> interesting to me that you're actually ending up with that many peers.
< ossifrage> gmaxwell, it takes a while, but the connection count slowly increases over time
< ossifrage> Before I had a UPS on my ONT, I'd change IPs every power failure and then it would take a long time before the connection count would go back up (with the new address)
< sipa> i have 148 connections
< gmaxwell> cool.
< gmaxwell> some months back, on my long running nodes I was unable to break 125. must be more inbound right now.
< ossifrage> The txindex has been useful a few (rare) times, but just turning that off would delay the problem
< gmaxwell> (well, or spies, mine blocked spies really aggressively)
< ossifrage> that is ~160 connections (with your block list)
< gmaxwell> yea, though I haven't updated the blocklist for a while
< sipa> ossifrage: txindex on or off wouldn't affect the behaviour w.r.t the chainstate ldb files
< gmaxwell> (right now I have no nodes with public inbound)
< gmaxwell> sipa: yes but he is also losing a bunch of fds to txindex.
< sipa> and your number for those is on itself pretty high
< ossifrage> sipa, yeah but it was the chainstate + txindex that pushed the fd count >1024
< gmaxwell> Poll is also good because its faster.
< ossifrage> I'd happily test a patch :-)
< sipa> do we require Vista or later?
< sipa> because windows introduced a poll equivalent in Vista
< achow101> yes, xp support was removed a few versions ago
< gmaxwell> I think we don't formally support older versions but they still work.
< gmaxwell> ?
< sipa> xp certainly doesn't work anymore
< gmaxwell> Also at least in theory we might work on some other platform that doesn't have poll. We could try switching to only poll and see if someone complains.
< achow101> "Microsoft ended support for Windows XP on April 8th, 2014, No attempt is made to prevent installing or running the software on Windows XP, you can still do so at your own risk but be aware that there are known instabilities and issues. Please do not report issues about Windows XP to the issue tracker." <-- from 0.14 release notes
< ossifrage> I use XP to do my taxes, amazingly the tax software works on it
< gmaxwell> hm, I thought poll was faster than select, https://monkey.org/~provos/libevent/libevent-benchmark.jpg then again, maybe I don't understand that graph, because how did they manage 25000 FDs with select?
< gmaxwell> sipa: in windows select doesn't have the 1024 fd limit thing
< gmaxwell> it's implemented as a linked list or something.
< luke-jr> array of fd numbers IIRC
< luke-jr> and it does have a limit
< luke-jr> just not for the fd numbers themselves
< luke-jr> defaults to 64
< ossifrage> I thought on linux you could call select() with larger fdsets and it would work, but the libc fd_set is a fixed size?
< ossifrage> But it is not exactly efficient, especially with sparse sets
< luke-jr> you're supposed to be able to #define FD_SETSIZE before including stuff, to get more, but last I checked that was broken in glibc
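(A small sketch for checking the compiled-in limit. Per the BSD/POSIX convention luke-jr mentions, FD_SETSIZE can be raised by defining it before including the headers, but glibc's fd_set is a fixed 1024-bit bitmap, so on Linux the numbers printed below are effectively hard-wired.)

```cpp
// Print the compiled-in select() limits. Descriptors >= FD_SETSIZE cannot
// safely be placed in an fd_set at all, which is exactly the failure mode
// being discussed.
#include <sys/select.h>

#include <cstdio>

int main()
{
    std::printf("FD_SETSIZE        = %d\n", FD_SETSIZE);
    std::printf("sizeof(fd_set)    = %zu bytes\n", sizeof(fd_set));
    std::printf("fds representable = %zu\n", sizeof(fd_set) * 8);
    return 0;
}
```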
< ossifrage> I've used epoll() for so long, using select() just makes me sad
< gmaxwell> ossifrage: indeed, but unfortunately BSDs and linux solved "life beyond poll" differently.
< ossifrage> gmaxwell, yeah that was never a concern for the stuff I was writing
< ossifrage> Sure it's portable, you can port it to any linux you like
< gmaxwell> we manage few enough connections that poll is fine anyways.
< phantomcircuit> gmaxwell, think you're looking at that graph wrong
< phantomcircuit> smaller is better
< phantomcircuit> or did you mean that poll() is the same as select() ?
< phantomcircuit> select and poll do basically the same thing just with a much better api for poll
< phantomcircuit> both pass the entire list to the kernel in every call
< ossifrage> epoll() was a big win for high connection count, low traffic
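(Sketch of the epoll pattern being described, for contrast with select()/poll(): descriptors are registered once and epoll_wait() returns only the ready ones, so the per-call cost no longer scales with the total connection count. Linux-only; MakeEpoll and WaitReadable are illustrative helper names.)

```cpp
// Illustrative epoll usage: register sockets once, then wait for readiness.
#include <sys/epoll.h>
#include <vector>

int MakeEpoll(const std::vector<int>& sockets)
{
    int epfd = epoll_create1(0);
    if (epfd < 0) return -1;
    for (int fd : sockets) {
        epoll_event ev{};
        ev.events = EPOLLIN;   // interested in readability
        ev.data.fd = fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
    }
    return epfd;
}

std::vector<int> WaitReadable(int epfd, int timeout_ms)
{
    epoll_event events[64];
    std::vector<int> readable;
    int n = epoll_wait(epfd, events, 64, timeout_ms);
    for (int i = 0; i < n; ++i) readable.push_back(events[i].data.fd);
    return readable;
}
```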
< phantomcircuit> gmaxwell, epoll and kqueue are virtually identical
< phantomcircuit> it's really silly
< midnightmagic> I think there's something even more different in NetBSD too.. kevent()? kfilter_register()? I forget now.
< gmaxwell> phantomcircuit: yes, I was saying I thought poll was somewhat faster, but apparently not.
< gmaxwell> phantomcircuit: go PR that poll patch.
< midnightmagic> There are faster things than poll if you use whatever they provide natively.
< gmaxwell> yes, but faster is not generally our issue, max connections = 100, or at most a few hundred.
< midnightmagic> (also no limitations a la select()'s irritating problems) -- and with use of the native eventing mechanisms things get very scalable. But... not like anyone but someone like me is going to run a larger-scale system with NetBSD anyway.
< phantomcircuit> midnightmagic, the issue is that epoll and kqueue and whatever windows uses are all platform specific
< phantomcircuit> there's some work being done to move to libevent but that's not done
< phantomcircuit> the poll() thing is pretty trivial iirc
< phantomcircuit> 80% solution for 10% the effort
< gmaxwell> phantomcircuit: open the PR!
< gmaxwell> I know you have had a patch.
< phantomcircuit> it's like 3 years old now but should be trivial to rewrite
< midnightmagic> using things like kevent natively isn't hard, it just needs to be clean and people on those platforms will look after it.
< gmaxwell> true but not obviously of any real value.
< phantomcircuit> midnightmagic, under thousands of fds it doesn't much matter
< gmaxwell> phantomcircuit: PR PR PR
< phantomcircuit> gmaxwell, yeah yeah
< phantomcircuit> gmaxwell, i remember
< phantomcircuit> windows is WSAPoll not poll
< phantomcircuit> and all the types are insane
< phantomcircuit> like it's virtually identical
< phantomcircuit> but not
< phantomcircuit> but i guess that code's already full of hacks for that anyways
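(The kind of shim being described might look like the sketch below: WSAPoll() on Windows Vista and later, plain poll() elsewhere. Illustrative only; PollSockets is a hypothetical wrapper, and real code would also have to deal with WSAPoll's documented quirks and the existing socket abstractions.)

```cpp
// Illustrative portability shim: WSAPoll is "virtually identical, but not".
#ifdef WIN32
#include <winsock2.h>
typedef WSAPOLLFD pollfd_t;
static int PollSockets(pollfd_t* fds, unsigned long nfds, int timeout_ms)
{
    return WSAPoll(fds, nfds, timeout_ms);  // available on Vista and later
}
#else
#include <poll.h>
typedef struct pollfd pollfd_t;
static int PollSockets(pollfd_t* fds, unsigned long nfds, int timeout_ms)
{
    return poll(fds, nfds, timeout_ms);
}
#endif
```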
< gmaxwell> phantomcircuit: windows can keep using select, it doesn't have the same limit.
< sipa> yes, but its fdset implementation is a linked list with horrendous performance
< gmaxwell> so? I mean, we only need it to support max-connections. and it already does.
< gmaxwell> or is it so bad that its noticably slow even for 125 connections?
< sipa> probably not
< luke-jr> last time I checked, the only better alternative Windows had required significantly re-architecting most programs to use it
< luke-jr> something along the lines of async IO, rather than write-ready notification/checking
< ossifrage> My computer shat itself, but I found the leveldb mmap limit and bumped it from 1000 to 4096, hopefully that will address my problem
< wumpus> luke-jr: yes, let's definitely not do that; the last thing we want to maintain in the repository is complex network code rearchitected specifically for Windows
< kallewoof> Are there cases where the rev file for a previous blkXXX file is modified? Is this something that happens often? I assume it only happens at reorgs, in which case it should be very seldom except at the transition to XXX+1
< wumpus> kallewoof: no, that never happens
< wumpus> kallewoof: only the last blkXXXXX file is written to, other files are only potentially deleted (pruning)
< wumpus> in case of a reorg old blocks will not actually be overwritten
< kallewoof> wumpus: really? what if a 2-block reorg happens right after a new blk file was created containing a single block?
< kallewoof> the rev file was for reorgs, i thought
< wumpus> same for rev files, as far as I know, the data for rev-ing the old blocks will stay there
< wumpus> it just won't be referenced by the active chain anymore
< kallewoof> wumpus: Huh, okay. Well, that's good news for masterdatadir PR then
< wumpus> yes
< wumpus> that principle works, I've been using it for a long time with an external script
< kallewoof> wumpus: Wait, ldb files are readonly too? Right now I am copying the chainstate over (~4 gb)
< kallewoof> wumpus: Though I can't really use the same approach there... wonder if it would be useful to check for hard linking capabilities and using them if found...
< kallewoof> So, about lint-locale-dependence.sh (which, by the way, has a list of violations about as long as the linter itself): it complains about a bunch of functions because they are locale dependent, but there is no alternative (fix). If you need e.g. std::stoull() you have to add to the list of violations in the linter. Is this even useful at all, when there are no non-locale-dependent alternatives you can switch to?
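(One locale-independent alternative for the std::stoull case is std::from_chars from C++17's <charconv>, which parses integers without consulting the global locale; whether the linter whitelists it is a separate question. The ParseUInt64Strict helper below is a sketch, not Bitcoin Core's own parser.)

```cpp
// Locale-independent unsigned 64-bit parse using std::from_chars.
#include <charconv>
#include <cstdint>
#include <string>

bool ParseUInt64Strict(const std::string& s, uint64_t& out)
{
    const char* begin = s.data();
    const char* end = begin + s.size();
    auto [ptr, ec] = std::from_chars(begin, end, out);
    // Succeed only if parsing worked and the whole string was consumed.
    return ec == std::errc() && ptr == end;
}
```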
< skeees> BlueMatt: sorry missed that one, updated in https://github.com/bitcoin/bitcoin/pull/13835
< provoostenator> Potentially trivial to review RPC doc improvements: #13676, #13662
< gribble> https://github.com/bitcoin/bitcoin/issues/13676 | Explain that mempool memory is added to -dbcache during IBD by Sjors · Pull Request #13676 · bitcoin/bitcoin · GitHub
< gribble> https://github.com/bitcoin/bitcoin/issues/13662 | Explain when reindex-chainstate can be used instead of reindex by Sjors · Pull Request #13662 · bitcoin/bitcoin · GitHub
< sipa> do we want to make the bot join in order to message here?
< sipa> join/leave spam would be somewhat annoying, but not as bad as spam
< achow101> most clients can hide join and leave messages, so I think that's fine
< midnightmagic> Does the bot have a freenode account?
< midnightmagic> If so, then +q $~a allows people to still join and watch, and worst case we get join/parts from the spammer bots.
< booyah> sipa: bot must join/part because that is how github works?
< booyah> possible solution: create a #botx channel, have the bot join, say, and part there. Set up a message-relaying bot (tiny python script) to relay msgs from there to here, and the relay bot will always be joined
< sipa> booyah: we added +n to this channel (which requires joining in order to speak) to combat spam
< midnightmagic> the second bot would be present here as well and just speak it, I think is what he means.
< sipa> yeah, i understand the suggestion - i don't have much of an opinion on it :)
< sipa> i was just explaining it's not because how github works but because we have +n on
< midnightmagic> ah
< sipa> oh i see; i guess booyah understood that, but by "because that is how github works" booyah means as opposed to having it be continuously present
< sipa> right
< booyah> (yeah just afair github bot anyway was always joining and parting)
< midnightmagic> \o
< sipa> actually, #bitcoin-commits already exists
< sipa> we could have a bot mirror from there
< booyah> I hope it works between 2 chans on same server
< sipa> i've just turned on join/leave
< n00bington> so I'm looking at this page on the wiki
< n00bington> where in the source code are those parameters being implemented?
< n00bington> doing some security research for my compsec class
< achow101> n00bington: somewhere in src/secp256k1
< n00bington> achow101, thanks
< achow101> there's a library that implements all of that stuff: https://github.com/bitcoin-core/secp256k1
< achow101> that lib is put in src/secp256k1
< n00bington> cool lemme see if i can find it
< n00bington> thanks
< sipa> n00bington: it's spread out
< n00bington> sipa, what do you mean?
< sipa> the library implementing the elliptic curve things is at https://github.com/bitcoin-core/secp256k1
< n00bington> right
< sipa> there's also a separate IRC channel about it, #secp256k1
< n00bington> oh
< n00bington> thanks