< bitcoin-git>
[bitcoin] kallewoof opened pull request #10267: New -readconfig argument for including external configuration files (master...feature-config-readconfig) https://github.com/bitcoin/bitcoin/pull/10267
<@wumpus>
"using deep learning frameworks in bitcoin mining", that's hilarious, what bottom-of-the-barrel buzzword factory did that come out of
< gmaxwell>
lol
< jonasschnelli>
heh
< emucode>
what if... that was the deep learning that discovered asicboost? :o
< BlueMatt>
wumpus: the same bottom-of-the-barrel buzzword bingo that came up with "using normal computers to simulate quantum computers to do bitcoin mining"
<@wumpus>
BlueMatt: ah yes the "apply all the buzzwords to bitcoin mining, see what sticks" approach
< sipa>
BlueMatt: i'm almost done splitting up pertxoutcache into more reasonable pieces... prepare to review 25 commits :)
<@wumpus>
"bitcoin mining on mars"
< jtimon>
uff, none of those are preparation commits that could be merged beforehand?
< BlueMatt>
sipa: god damn it
< BlueMatt>
jtimon: several already have been split out into separate prs, I think :)
< BlueMatt>
sipa: anyway, you're gonna have to wait until i spend two days on morcos' fee estimation stuff first, I think
< jtimon>
BlueMatt: what about using deep learning to come up with software optimizations that beat sha256d asics using gpgpu?
< BlueMatt>
sipa: I keep getting distracted because your stuff is easier for me to review, and I need to spend a day figuring out wtf fee estimation does
< * jtimon>
goes to read open PRs by sipa
< BlueMatt>
jtimon: sounds good, lets raise 10 mill and then tell investors it didnt work out the next day?
< BlueMatt>
can split the 5 even?
< BlueMatt>
or is that kind of overt scam reserved for ICOs now? :/
< fanquake>
Not sure what an ICO is, but I heard you could raise >10 times that just by building an internet connected juicer
< BlueMatt>
fanquake: I figured they just got so used to squeezing investors for money all they knew how to build was something that squeezed the contents out of bags
< BlueMatt>
*rimshot*
<@wumpus>
fanquake BlueMatt lol
<@wumpus>
the bags weren't even blockchain connected smart property :')
< BlueMatt>
wumpus: clearly we can further optimize their buzzword-compliance
< fanquake>
There is std::thread::hardware_concurrency(), but that seems to count virtual cores, which I don't think we want.
< BlueMatt>
fanquake: I doubt we'll do boost removal for 0.15
< BlueMatt>
shit like BOOST_FOREACH, sure
< BlueMatt>
but all of boost? doubtful, there are still things we need
< fanquake>
Yea sorry, not the whole lot, but we can remove a decent chunk. Just looking into what else needs to be done to replace some of the less involved Boost usage.
< BlueMatt>
fair
<@wumpus>
yes, it makes sense to plan ahead a bit, without immediately doing it
<@wumpus>
right, don't count virtual cores, that used to be the case but it makes no sense for our usage
<@wumpus>
for script validation it'd create a swarm of threads overwhelming any machine with hyperthreading (plus the accompanying thread stack overhead), and there was no gain at all for that
< sipa>
BlueMatt: don't worry, there is no hurry
< morcos>
wumpus: i don't think that is correct
< morcos>
suppose you have 4 cores (8 virtual cores)
<@wumpus>
fanquake: indeed seems that std has no equivalent to physical_concurrency, on any standard. That's annoying as it is non-trivial to implement
< morcos>
i think running par=8 (if it let you) would be notably faster
< morcos>
jeremyrubin and i discussed this at length a while back... i think i commented about it on irc at the time
<@wumpus>
morcos: I think the conclusion at the time was that it made no difference, but sure would make sense to benchmark
< morcos>
perhaps historical testing on the virtual vs actual cores was polluted by concurrency issues that have now improved
<@wumpus>
I think there are not more ALUs, so there is not really a point in having more threads
<@wumpus>
hyperthreads are basically just a stored register state right?
< sipa>
wumpus: yes but it helps the scheduler
<@wumpus>
in which case the only speedup using "number of cores" threads would give you is, possibly, excluding other software from running on the cores at the same time
< morcos>
well this is where i get out of my depth
< sipa>
if one of the threads is waiting on a read from ram, the other can use the arithmetic unit for example
< morcos>
wumpus: i'm pretty sure though that the speed up is considerably more than what you might expect from that
<@wumpus>
sipa: ok, I back down, I didn't want to argue this at all
< morcos>
the reason i haven't tested it myself, is the machine i usually use has 16 cores... so due to remaining concurrency issues it's not easy to get much more speedup
<@wumpus>
I'm fine with restoring it to number of virtual threads if that's faster
< morcos>
we should have someone with 4 cores (and 8) actually test it though, i agree
< sipa>
i would expect (but we should benchmark...) that if 8 script validation threads instead of 4 on a quadcore with hyperthreading is not faster, it's due to lock contention
< morcos>
sipa: yeah thats my point, i think lock contention isn't that bad with 8 now
<@wumpus>
on 64-bit systems the additional thread overhead wouldn't be important at least
< gmaxwell>
I previously benchmarked, a long time ago, it was faster.
< gmaxwell>
(to use the HT core count)
<@wumpus>
why was this changed at all then?
<@wumpus>
I'm confused
< sipa>
good question!
< gmaxwell>
I had no idea we changed it.
<@wumpus>
sigh :(
< gmaxwell>
What PR changed it?
< gmaxwell>
In any case, on 32-bit it's probably a good tradeoff... the extra ram overhead is worth avoiding.
<@wumpus>
the complaint was that systems became unusably slow when using that many threads
<@wumpus>
so at least I got one thing right, woohoo
< sipa>
seems i even acked it!
< BlueMatt>
wumpus: there are more alus
< BlueMatt>
but we need to improve lock contention first
< morcos>
anyway, i think in the past the lock contention made 8 threads regardless of cores a bit dicey.. now that is much better (although more still to be done)
< BlueMatt>
or we can just merge #10192, thats free
< morcos>
no, we do not need to improve lock contention first. but we should probably do that before we increase the max beyond 16
< BlueMatt>
then we can toss concurrency issues out the window and get more speedup anyway
< gmaxwell>
wumpus: yea, well in QT I thought we also diminished the count by 1 or something? but yes, if the motivation was to reduce how heavily the machine was used, thats fair.
< sipa>
the benefit of using HT cores is certainly not a factor 2
<@wumpus>
gmaxwell: for the default I think this makes a lot of sense, yes
< gmaxwell>
morcos: right now on my 24/28 physical core hosts going beyond 16 still reduces performance.
<@wumpus>
gmaxwell: do we also restrict the maximum par using this? that'd make less sense
<@wumpus>
if someone *wants* to use the virtual cores they should be able to by setting -par=
< * sipa>
flies to US
< BlueMatt>
sipa: sure, but the shared cache helps us get more out of it than some others, as morcos points out
< BlueMatt>
(because it means our thread contention issues are less)
< morcos>
gmaxwell: yeah i've been bogged down in fee estimation as well (and the rest of life) for a while now.. otherwise i would have put more effort into jeremy's checkqueue
< BlueMatt>
morcos: heh, well now you can do other stuff while the rest of us get bogged down in understanding fee estimation enough to review it :p
<@wumpus>
[to answer my own question: no, the limit for par is MAX_SCRIPTCHECK_THREADS, or 16]
< morcos>
but to me optimizing for more than 16 cores is pretty valuable as miners could use beefy machines and be less concerned by block validation time
< BlueMatt>
morcos: i think you may be surprised by the number of mining pools that are on VPSes that do not have 16 cores :/
< gmaxwell>
I assume right now most of the time block validation is bogged in the parts that are not as concurrent, simply because caching makes the concurrent parts so fast. (and soon to hopefully increase with bluematt's patch)
< gmaxwell>
improving sha2 speed, or transaction malloc overhead are probably bigger wins now for connection at the tip than parallelism beyond 16 (though I'd like that too).
< BlueMatt>
sha2 speed is big
< morcos>
yeah lots of things to do actually...
< gmaxwell>
BlueMatt: might be a tiny bit less big if we didn't hash the block header 8 times for every block. :P
< BlueMatt>
ehh, probably, but I'm less rushed there
< BlueMatt>
my new cache thing is about to add a bunch of hashing