< * luke-jr> is not very surprised
< echeveria> 2018-06-11 02:03:03.384975 Verifying last 3 blocks at level 3
< echeveria> 2018-06-11 02:03:23.676793 No coin database inconsistencies in last 4 blocks (6564 transactions)
< echeveria> off by one?
< sipa> echeveria: possibly!
< kallewoof> Looks like it checks one more block than suggested. `if (pindex->nHeight < chainActive.Height()-nCheckDepth) break;` should probably be `<=`.
< sipa> kallewoof: agree
< bitcoin-git> [bitcoin] kallewoof opened pull request #13428: validation: check the specified number of blocks (off-by-one) (master...validation-off-by-one) https://github.com/bitcoin/bitcoin/pull/13428
< bitcoin-git> [bitcoin] Empact opened pull request #13429: Return the script type from Solver (master...solver-return) https://github.com/bitcoin/bitcoin/pull/13429
< gmaxwell> sipa: you created a 32-byte version of the specialized 1-way SSE4 asm?
< bitcoin-git> [bitcoin] kallewoof opened pull request #13430: use IsBlockPruned() where appropriate (master...use-isblockpruned) https://github.com/bitcoin/bitcoin/pull/13430
< bitcoin-git> [bitcoin] kallewoof opened pull request #13431: validation: update pindexState for check level < 3 (master...verifydb_pindexstate_lvl0-2) https://github.com/bitcoin/bitcoin/pull/13431
< ossifrage> FYI, bitcoin-qt from the head I built today won't start if you have "daemon=0" in the config file, so you can't use the same config for either bitcoind or bitcoin-qt
< ossifrage> Seems like bitcoin-qt should ignore this option?
< bitcoin-git> [bitcoin] ken2812221 opened pull request #13434: Set default CFLAGS, CXXFLAGS to empty (master...enable_debug) https://github.com/bitcoin/bitcoin/pull/13434
< provoostenator> ossifrage: probably caused by 13112. Another problem is disablewallet=1 will prevent a launch if you compile bitcoind without wallet. It probably needs to be relaxed slightly.
< provoostenator> #13112
< gribble> https://github.com/bitcoin/bitcoin/issues/13112 | Throw an error for unknown args by achow101 · Pull Request #13112 · bitcoin/bitcoin · GitHub
< bitcoin-git> [bitcoin] ken2812221 closed pull request #13434: Set default CFLAGS, CXXFLAGS to empty (master...enable_debug) https://github.com/bitcoin/bitcoin/pull/13434
< wumpus> rc2 executables up https://bitcoincore.org/bin/bitcoin-core-0.16.1/test.rc2/, sorry for the delay
< bitcoin-git> [bitcoin] Empact opened pull request #13435: When build fails due to lib missing, indicate which one (master...lib-missing) https://github.com/bitcoin/bitcoin/pull/13435
< kallewoof> Regarding #13434, shouldn't --enable-debug automatically do --disable-maintainer-mode? Currently --enable-debug is useless at least on macs, as it still generates optimized code so lldb barfs.
< gribble> https://github.com/bitcoin/bitcoin/issues/13434 | Set default CFLAGS, CXXFLAGS to empty by ken2812221 · Pull Request #13434 · bitcoin/bitcoin · GitHub
< wumpus> kallewoof: maintainer mode affects optimization?
< wumpus> provoostenator: I think that one makes sense, if you compile without wallet the program has no way to know about wallet options, so will reject them
< wumpus> provoostenator: the alternative would be to move knowledge of wallet options into the core, that'd be even worse
< rafalcpp> wumpus: I wonder how githubmerge could support git submodules. The submodules are linked by sha1 which is useless.
< rafalcpp> perhaps we should recursivly calculate our tree-sha512 of a submodule, and that checksum will be placed as data (next to blobs) of the tree of parent
< provoostenator> Not full knowledge, just any option that disables these features, like upnp=0, disablewallet=1. That way if you forget to leave them out of compilation, those features don't just turn themselves on.
< wumpus> hmm okay that sounds better at least
< wumpus> rafalcpp: sounds like a lot of hassle, indeed
< * rafalcpp> spanks Torvalds to have git move to sha512 nativly
< wumpus> agree
< wumpus> I think he keeps insisting that 'commit hashes are not a security feature', fair enough, but many people need a way to authenticate repositories securely
< rafalcpp> yeap
< wumpus> and the commit hash is the weakest link in that
< rafalcpp> someone should ask him does he again want CIA to if (uid = 0) ... him like they did old CVS, but this time with false sense of security
< wumpus> well it's open source so someone that cares should do it,instead of trying to convince Linus, that's how these things work. I wish I had the energy and time.
< wumpus> whoa is bitcoinacks correct here https://bitcoinacks.com/?flt1_closed_empty=1&flt0_merged_empty=1&sort=7 none of the open PRs that are non high priority for review have any ACK/NACKs? or is it glitching somehow
< wumpus> nm, got the sorting wrong, phew
< rafalcpp> that Tree-SHA512 format, is something that bitcoin invented, or is it strictly based on some agreed upon convention? (ordering and format of data that is the material to be hashed)?
< wumpus> it's something that was invented for our script AFAIK
< bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/56f69360dc98...6e249e46789f
< bitcoin-git> bitcoin/master cbede7d Sjors Provoost: [qt] OptionsDialog: add prune setting
< bitcoin-git> bitcoin/master 6e249e4 Wladimir J. van der Laan: Merge #13043: [qt] OptionsDialog: add prune setting...
< bitcoin-git> [bitcoin] laanwj closed pull request #13043: [qt] OptionsDialog: add prune setting (master...2018/04/qt-prune) https://github.com/bitcoin/bitcoin/pull/13043
< promag> wumpus: are you available for merges?
< bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/6e249e46789f...531a0337ca93
< bitcoin-git> bitcoin/master fa6edfe MarcoFalke: qa: Remove portseed_offset from test runner
< bitcoin-git> bitcoin/master 531a033 Wladimir J. van der Laan: Merge #13421: qa: Remove portseed_offset from test runner...
< bitcoin-git> [bitcoin] laanwj closed pull request #13421: qa: Remove portseed_offset from test runner (master...Mf1806-qaPortseedOffset) https://github.com/bitcoin/bitcoin/pull/13421
< bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/531a0337ca93...70a03c635b73
< bitcoin-git> bitcoin/master f68049d Cory Fields: crypto: cleanup sha256 build...
< bitcoin-git> bitcoin/master 70a03c6 Wladimir J. van der Laan: Merge #13408: crypto: cleanup sha256 build...
< bitcoin-git> [bitcoin] laanwj closed pull request #13408: crypto: cleanup sha256 build (master...sha2-cleanup) https://github.com/bitcoin/bitcoin/pull/13408
< bitcoin-git> [bitcoin] laanwj pushed 2 new commits to master: https://github.com/bitcoin/bitcoin/compare/70a03c635b73...26c93edf1de9
< bitcoin-git> bitcoin/master a426098 practicalswift: Fix compiler warnings emitted when compiling under stock OpenBSD 6.3
< bitcoin-git> bitcoin/master 26c93ed Wladimir J. van der Laan: Merge #13294: Fix compiler warnings emitted when compiling under stock OpenBSD 6.3...
< bitcoin-git> [bitcoin] laanwj closed pull request #13294: Fix compiler warnings emitted when compiling under stock OpenBSD 6.3 (master...openbsd-warnings) https://github.com/bitcoin/bitcoin/pull/13294
< bitcoin-git> [bitcoin] laanwj pushed 3 new commits to master: https://github.com/bitcoin/bitcoin/compare/26c93edf1de9...3f0f39415bd7
< bitcoin-git> bitcoin/master 8160817 John Newbery: [wallet] [rpc] Remove getlabeladdress RPC...
< bitcoin-git> bitcoin/master 67e0e04 John Newbery: [wallet] [docs] Update release notes for removing `getlabeladdress`
< bitcoin-git> bitcoin/master 3f0f394 Wladimir J. van der Laan: Merge #13060: [wallet] [rpc] Remove getlabeladdress RPC...
< bitcoin-git> [bitcoin] laanwj closed pull request #13060: [wallet] [rpc] Remove getlabeladdress RPC (master...remove_getlabeladdress) https://github.com/bitcoin/bitcoin/pull/13060
< wumpus> promag: yes
< promag> wumpus: #12151
< gribble> https://github.com/bitcoin/bitcoin/issues/12151 | rpc: Remove cs_main lock from blockToJSON and blockheaderToJSON by promag · Pull Request #12151 · bitcoin/bitcoin · GitHub
< promag> or #13160
< gribble> https://github.com/bitcoin/bitcoin/issues/13160 | wallet: Unlock spent outputs by promag · Pull Request #13160 · bitcoin/bitcoin · GitHub
< MarcoFalke> promag: #12151 was just pushed, imo not ready for merge
< gribble> https://github.com/bitcoin/bitcoin/issues/12151 | rpc: Remove cs_main lock from blockToJSON and blockheaderToJSON by promag · Pull Request #12151 · bitcoin/bitcoin · GitHub
< wumpus> thanks, I'll have a look at those
< promag> MarcoFalke: right, should be re-ack
< promag> MarcoFalke: if you can take a look at 13160 too
< bitcoin-git> [bitcoin] laanwj pushed 10 new commits to master: https://github.com/bitcoin/bitcoin/compare/3f0f39415bd7...43ae5ee9e4c2
< bitcoin-git> bitcoin/master b9ef21d Karl-Johan Alm: mempool: Add explicit max_descendants...
< bitcoin-git> bitcoin/master 46847d6 Karl-Johan Alm: mempool: Fix max descendants check...
< bitcoin-git> bitcoin/master 475a385 Karl-Johan Alm: Add GetTransactionAncestry to CTxMemPool for general purpose chain limit checking
< bitcoin-git> [bitcoin] laanwj closed pull request #12634: [refactor] Make TransactionWithinChainLimit more flexible (master...txmempool-chain-limit-value) https://github.com/bitcoin/bitcoin/pull/12634
< wumpus> #13160 has only my utACK
< gribble> https://github.com/bitcoin/bitcoin/issues/13160 | wallet: Unlock spent outputs by promag · Pull Request #13160 · bitcoin/bitcoin · GitHub
< wumpus> I don't think that's enough for a change in behavior like that
< wumpus> even though code-wise it's very simple to review
< sipa> MarcoFalke: you mentioned that fetching PR information from github was the bottleneck for drahtbot? are you using the git interface for PRs?
< MarcoFalke> I do. But I also need to get the open pull requests and metadata for them.
< sipa> hmm
< ken2812221> Hi all. In order to fix #13103, is it allowed to add additional functions and methods like #13426 for Windows, and add macros for other OS? (The first and second commits)
< gribble> https://github.com/bitcoin/bitcoin/issues/13103 | Invalid wallet path with Chinese characters in windows · Issue #13103 · bitcoin/bitcoin · GitHub
< gribble> https://github.com/bitcoin/bitcoin/issues/13426 | [WIP, bugfix] Add u8path and u8string to boost to fix #13103 by ken2812221 · Pull Request #13426 · bitcoin/bitcoin · GitHub
< ken2812221> additional function and method in boost header
< MarcoFalke> And the "Needs rebase" task of DrahtBot is purely based on the api
< MarcoFalke> Since GitHub hasn't yet disclosed which merge tool they use
< sipa> MarcoFalke: seems reasonablw that it's pretty much vanilla git
< sipa> as our merge tool compares the local merge with the github merge
< sipa> and afaik never failed for noticing a differencw
< MarcoFalke> They advertise as merge conflict when one of the files got moved
< MarcoFalke> vanilla git doen't afaict
< sipa> hmm
< echeveria> bitcoind seems to get very sad sometimes if you have a low maxconnections
< ryanofsky> i have a hacky script that checks for merge conflicts reasonably quickly with "git diff | git apply" https://github.com/ryanofsky/home/blob/master/src/pr.sh#L558
< promag> wumpus: I can't reproduce #13111 failures locally
< gribble> https://github.com/bitcoin/bitcoin/issues/13111 | Add unloadwallet RPC by promag · Pull Request #13111 · bitcoin/bitcoin · GitHub
< promag> should retry travis job?
< bitcoin-git> [bitcoin] ken2812221 closed pull request #13107: Fix Windows locale problem (master...win-enc) https://github.com/bitcoin/bitcoin/pull/13107
< Purple7> Hi everyone :)
< Purple7> can someone here help me create a bitcoin script. I just installed Bitcoin core node and have tried bitcoin-cli commands
< echeveria> I don't know what you imagine bitcoin script to be capable of.
< Purple7> I want to create a script but not sure how to create one and how to run one
< wumpus> promag: I had some segfaults locally with it, but seems hard to reproduce
< Purple7> to carry out transactions
< echeveria> Purple7: lets do this in #bitcoin.
< promag> wumpus: yes I think I can reproduce now
< Purple7> Thanks echeveria
< Purple7> :)
< promag> wumpus: delete regtest folder, launch, then rpc stop
< promag> looks like that way always segfaults
< Chris_Stewart_5> Is there any ordering required for the rpcport config option? We are seeing behavior where it is being read if passed in as a command line arg but not being recogonized inside of a bitcoin.conf file
< Chris_Stewart_5> we are reading other config options from the bitcoin.conf file
< Chris_Stewart_5> just not rpcport it seems
< sipa> Chris_Stewart_5: what version of the code?
< Chris_Stewart_5> 0.16.0
< wumpus> rpcport should work fine in the bitcoin.conf, I've had it in there for ages
< wumpus> note that master has per-network config sections, but 0.16 does not
< Chris_Stewart_5> yeah it is pretty bizzare. We are reading other options like daemon=1 in the conf file. Just not the rpcport for some reason
< wumpus> are you really sure? as said, I have a setup myself that uses that
< Chris_Stewart_5> just a sec, I'm double checking he didn't compile from master
< Chris_Stewart_5> nvm, he is on 0.16.99.
< sipa> how do you observe it's not using rpcport?
< wumpus> ok on master you need to put it in a network-specific section, this is described in the doc/release-notes.md
< sipa> wumpus: oh!
< wumpus> it should also warn in the log
< Chris_Stewart_5> basically it was always binding to 18433 if we set rpcport via the command line
< Chris_Stewart_5> unless we set it via the command line* sorry
< provoostenator> Anyone else noticed bitcoin-cli crashing on macOS 10.13.5? I had to recompile. It's quite possible I did something stupid unrelated, but seeing c-lightning and random Github projects struggling with macOS 10.13.5, thought I'd check...
< provoostenator> What's stranger - and why I initially assume I did something else wrong: it didn't happen immediately after the upgrade but a few days later, so it may have been a homebrew or xcode update.
< wumpus> sipa: release-notes-pr12823.md apparently
< sipa> oh, i forgot we added pr-specific files
< wumpus> it does make things harder to find
< sipa> wumpus: any opinion about converting the existing sse4 assembly code (which was in 0.16) into intrinsics based code?
< sipa> downside: possibly a bit slower (i've seen up to 4% slower) and compiler dependent
< sipa> upside: more easily reusable, readable, and works on 32-bit systems
< promag> wumpus: found the problem, will push a fix in a separate commit for now
< echeveria> yeah, that's a weird one. I have hardcoded `connect` peers but it doesn't sync.
< echeveria> random peers, syncs perfectly, hardcoded it doesn't sync, even with the same peers.
< wumpus> sipa: no strong opinion on that; 32 bit x86 is very rare so I wouldn't do it for that, given that it's also 4% slower, "do nothing" sounds like a good option to me
< gmaxwell> 22:50:10 < gmaxwell> sipa: you created a 32-byte version of the specialized 1-way SSE4 asm?
< sipa> gmaxwell: no, not specialized
< sipa> we just have a benchmark for SHA256 with 32-byte inputs, and that benchmark shows a slowdown when going from asm-based 1-way SSE4 to intrin-based 1-way SSE4
< sipa> gmaxwell: also, the speedup from the 1-way SSE4 is pretty neat: it does the expansion for rounds X+16...X+19 in SSE registers simultaneously, interleaved with an otherwise totally ordinary implementation for rounds X..X+3
< wumpus> I wonder, is anyone using 32-bit x86 for bitcoin nodes? the last 32-bit-only x86 CPU was sold in 2008 or so, 10 years ago now
< sipa> wumpus: ah, but those don't even have SSE4 :)
< sipa> the only use for SSE4 in 32-bit mode is people running a 32-bit binary on 64-bit hardware
< wumpus> too bad we don't have statistics how much those get downloaded
< wumpus> right
< sipa> my goal in converting to intrinsics was making it reusable for specialized 32-byte or 64-byte inputs; we can leave the original untouched of course, and only have intrinsics based versions for the specialized versions
< wumpus> for that it certainly makes sense, easier to specialize code w/ intrinsics than manual register allocation
< sipa> i also wonder what to do about CI for special hardware
< sipa> SHA-NI is unlikely to be available on any Travis system
< sipa> and POWER9 would surprise me even more :D
< luke-jr> test these things more carefully when they change?
< wumpus> it's the same case as platforms such as FreeBSD and OpenBSD, we'll have to rely on people regularly compilng and testing master with them
< luke-jr> well, at least the platform-specific stuff in this case is isolated
< sipa> i'm also going to extend the self-test code for these
< sipa> so at least it will fail at startup rather than arbitrarily
< wumpus> it's notrealistic to have CI for all OS/architecture combinations, not even all OSes and architectures separately
< wumpus> I still have a PR open to run travis on at least one bigendian platform, but even that seems untenable
< luke-jr> maybe if someone were to volunteer to maintain a CI system for us that does all that, but yeah, probably not with Travis
< wumpus> it's not realistic to do that for every PR, sure, some daily cron job would work
< MarcoFalke> Apparently we have some jenkins running somewhere, maybe jamesob can see if that would work with FreeBSD/OpenBSD
< MarcoFalke> SHA-NI and POWER9 should already be tested by devs regularly, no?
< gmaxwell> with jenkins it would be as simple as adding a ssh key on a host and telling jenkins to use it as a remote. I guess travis doesn't have a similar facility for non-travis build hosts?
< jamesob> gmaxwell: travis is limited to execution on their infra AFAICT
< MarcoFalke> I think so as well ^
< gmaxwell> in any case, I assume jenkins hosts will have sha-ni sometime, for now we'll just have to count on developers (lol, pieter is the only person I know with it right now.) catching it.
< wumpus> in theory it's possible to ssh out from travis, but that sounds horribly fragile
< wumpus> (also, key management is a problem)
< gmaxwell> for jenkins the integration is super nice, it's not just 'sshing out' but once you give jenkins access it does all the stuff to properly use it as a build host automagically. (wow, java actually useful for something)
< wumpus> right, that sounds better
< bitcoin-git> [bitcoin] MarcoFalke opened pull request #13437: wallet: Erase wtxOrderd wtx pointer on removeprunedfunds (master...Mf1806-walletPrunedFundsSegfault) https://github.com/bitcoin/bitcoin/pull/13437
< luke-jr> MarcoFalke: I should be moving my development to my POWER9 system in this next week
< MarcoFalke> luke-jr: Good to hear
< MarcoFalke> Also BlueMatt switched to POWER9
< cfields> sipa: fwiw, I now fully understand the 1way sha2 optims for sse4/avx. I started working on the intrinsics over the weekend
< sipa> cfields: too late :)
< cfields> sipa: just thought I'd mention in case you were also looking at it
< cfields> oh?
< cfields> heh, did I miss a PR?
< sipa> cfields: i have intrinsics based 1-way sse4 working
< sipa> not PRed yet
< sipa> sorry, didn't know you were planning to work on it so soon as well
< cfields> sipa: no worries. Now I can pick yours apart :p
< BlueMatt> yea, POWER9 should get pretty good testing, sdaftuar and I have already been using it for some time now
< provoostenator> cfields: or work on #13401
< gribble> https://github.com/bitcoin/bitcoin/issues/13401 | ARMv8 sha2 support · Issue #13401 · bitcoin/bitcoin · GitHub
< provoostenator> :-)
< BlueMatt> and I think we might be getting a shared test box on it soonish if people want to get an ssh key to have access to it
< gmaxwell> sipa: would the 1-way sha2 be more efficient for longer inputs if it processed more blocks at once?
< gmaxwell> (e.g. could it be expanding the next group)
< sipa> gmaxwell: hmm, yes!
< cfields> sipa: did you include the lane duplication optim for quicker rotates?
< sipa> cfields: i literally translated the asm code instruction per instruction to intrinsics
< sipa> and then restructured it into functions that make sense
< provoostenator> What does "1-way" mean in the context of a SHA256 hash?
< cfields> sipa: ah, I think some of that will pessimize avx though, no?
< cfields> I guess I'll wait to see it
< gmaxwell> provoostenator: computing one hash at a time, as opposted to concurrently computing many hashes.
< sipa> cfields: recompiling this code with -mavx gains virtually no advantage
< sipa> provoostenator: say you have 8 64-byte vectors, and want to independently compute the 8 32-byte hashes of those
< sipa> provoostenator: we have a specialized function that does that
< sipa> that's 8-way
< cfields> sipa: hmm, that's really surprising.
< provoostenator> Ah ok, so that's useful if you're able to do some of the validation stuff in parallel?
< sipa> provoostenator: it's currently used (in master) for merkle tree root computations
< bitcoin-git> [bitcoin] laanwj pushed 4 new commits to master: https://github.com/bitcoin/bitcoin/compare/43ae5ee9e4c2...7c32b414b632
< bitcoin-git> bitcoin/master 906bee8 practicalswift: Use bracket syntax includes ("#include <foo.h>")
< bitcoin-git> bitcoin/master 6d10f43 practicalswift: Enforce the use of bracket syntax includes ("#include <foo.h>")
< bitcoin-git> bitcoin/master 16e3cd3 practicalswift: Clarify include recommendation
< gmaxwell> in theory we could also use them for hashing a bunch of transactions at once, but handling the variable length stuff is slightly gnarly.
< provoostenator> sipa: why not in script validation? Not worth it or still on todo lists?
< cfields> sipa: no clue if it's still state-of-the-art, but this is what I was using as a basis for understanding/implementing multi-block speedup: https://eprint.iacr.org/2012/067.pdf
< bitcoin-git> [bitcoin] laanwj closed pull request #13230: Simplify include analysis by enforcing the developer guide's include syntax (master...bracket-syntax-includes) https://github.com/bitcoin/bitcoin/pull/13230
< cfields> gmaxwell: ^^
< sipa> provoostenator: todo list
< gmaxwell> provoostenator: yes, it could be done, but it requires having things aranged so there are multiple messages to hash all ready to hash at once.
< gmaxwell> So easiest was hashtrees, since that all gets computed at once.
< provoostenator> Right now script validation can is done in parallel on _multiple_ cpu's, rather than on a single one?
< cfields> (ofc that's separate from the per-block optimizations)
< cfields> provoostenator: yes
< provoostenator> But if you verify e.g. 8 scripts in parallel on the same core to take advantage of that 256 optimzation, I guess the risk is that these cores are just sitting there waiting for these 8 threads to be ready?
< provoostenator> If they're not otherwise identical.
< provoostenator> But 1-way doesn't change anything
< gmaxwell> provoostenator: using n-at-a-time hashing requires restructing code so that it actually has n things to hash available at a time. Script validation doesn't work that way today and would need significant overhalls to make use of it.
< cfields> sipa: would you be ok with a separate avx-optimized version of those intrinsics if the speedup was as significant as before?
< sipa> cfields: the avx speedup is for the 4-way sse4, not the 1-way sse4 code
< sipa> cfields: and yes
< cfields> sipa: there are definitely 1way speedups due to non-destructive xor's
< sipa> cfields: ah, the compiler may not be smart enough to use those?
< cfields> sipa: I really would've expected it to, since there's not a separate intrinsic for them
< cfields> it might be how the other registers are setup, though
< cfields> the sigma1 implementation for sse4, for example, duplicates data in 2 registers so that it can do quicker but destructive rotates. Avx doesn't need to do that, so the duplication is strictly a slowdown.
< sipa> cfields: i'll cleanup my code and PR
< cfields> ok
< bitcoin-git> [bitcoin] sipa opened pull request #13438: Improve coverage of SHA256 SelfTest code (master...201806_selftestsha) https://github.com/bitcoin/bitcoin/pull/13438
< cfields> sipa: thanks for that :)
< bitcoin-git> [bitcoin] MarcoFalke closed pull request #13395: rpc: Avoid "duplicate" return value for invalid submitblock (master...Mf1806-rpcMiningSubmitblock) https://github.com/bitcoin/bitcoin/pull/13395
< provoostenator> Any other live metrics I should collect on my slow pruned AWS node? https://github.com/bitcoin/bitcoin/pull/12404#issuecomment-396356384
< gmaxwell> cfields: so somewhere intel has published sse4/avx/avx2 code (which is where our 1-way sse4 comes from) when we previously benchmarked the avx code we found that it was a couple percent faster on intel but a lot slower on AMD.
< gmaxwell> cfields: I don't see any problem with having a seperate AVX version but we would need to deal with not choosing it on AMD somehow, if its slower there.
< cfields> gmaxwell: do you mean intel's modified Sigma0/Sigma1 ?
< gmaxwell> I don't know what optimizations it had inside it, IIRC it was the same code intel submitted for the linux kernel (the kernel can realistically benchmark to decide what code to use, not as reasonable for us)
< cfields> if so, yea, most of the code I've found in the wild opts out of those for that reason
< cfields> gmaxwell: I'm guessing it's what I PR'd, which we closed for that reason :(
< gmaxwell> for us we could decide to use some AVX version that was faster on intel by string matching the cpu version and only using it on intel chips. But if it's only a couple percent it might not be worth it.
< gmaxwell> ESP since anything with AVX is pretty quick regardless.
< bitcoin-git> [bitcoin] TheBlueMatt opened pull request #13439: rpc: Avoid "duplicate" return value for invalid submitblock (master...2018-06-marcos-submitblock-fix) https://github.com/bitcoin/bitcoin/pull/13439
< cfields> gmaxwell: agreed. I'd rather avoid per-vendor optims unless it's really really significant
< cfields> might be harder to avoid that if we get into arm, though
< gmaxwell> for arm it's more worth it...
< gmaxwell> even a small percantage change is a large absolute time there.
< cfields> fair point.
< cfields> gmaxwell: #13400 is what I was referencing, btw. Hopefully that's the same one as the kernel discussion you had in mind.
< gribble> https://github.com/bitcoin/bitcoin/issues/13400 | sha256: small speedup for sse4 path. by theuni · Pull Request #13400 · bitcoin/bitcoin · GitHub
< gmaxwell> cfields: I guess for AVX ... Intel x86_64 cpus with AVX but without AVX2 or SHA-NI are _very_ common. I'd guess they outnumber all other kinds in our deployment by a large margin.
< cfields> huh, I figured my Sandy Bridge was the outlier
< gmaxwell> AVX2 is only haswell and later on intel, and the xeons lag desktops in microarch by a couple years.
< bitcoin-git> [bitcoin] MarcoFalke opened pull request #13440: qa: Log as utf-8 (master...Mf1806-qaLogUtf8) https://github.com/bitcoin/bitcoin/pull/13440
< edgar_> hello
< bitcoin-git> [bitcoin] achow101 opened pull request #13441: Prevent shared conf files from failing with different available options in different binaries (master...gargs-disabled-options) https://github.com/bitcoin/bitcoin/pull/13441
< achow101> ossifrage: provoostenator: ^ that PR should fix the problem with the options
< sipa> gmaxwell: got it faster than the asm version now
< sipa> 3.68 ms vs 3.77 ms
< sipa> on i7
< bitcoin-git> [bitcoin] sipa opened pull request #13442: Convert the 1-way SSE4 SHA256 code from asm to intrinsics (master...201806_sse4intrin) https://github.com/bitcoin/bitcoin/pull/13442
< cfields> sipa: nice
< cfields> sipa: and it looks like it should be friendlier to avx now. Was that purposeful, or nice side-effect?
< sipa> cfields: i don't know why or how
< sipa> elaborate please :)
< sipa> cfields: it's crazy how much the compiler affects things, though
< sipa> in one place changing a constant from const to static const made it 10% slower
< sipa> in another place changing it from static const to const made it 3% slower
< cfields> sipa: yea, I was going to ask you if the ordering really matters, I assume the compiler will reorder as it sees fit
< cfields> huh
< sipa> generally reordering seems to have only a marginal affect, to none at all
< sipa> changing the types of constants and variables has a large affect
< cfields> interesting, I would've thought the interleaving timing was the crucial part
< sipa> (the trick with Ws, as opposed to just keeping it as a __m128i and extracting the elements from it, does have a different)
< sipa> oh, the interleaving is absolutely crucial
< cfields> ...I guess it probably is, but you have to fight the compiler to get there first
< sipa> but the compiler reorders things anyway
< sipa> regardless of how you write it... to an extent
< cfields> sipa: by avx-friendly, I mean that you pass variables in which may or may not be overwritten, based on what instructions are available
< cfields> for example: XTMP0 = Palignr(X3, X2, 4);
< sipa> cfields: heh, SSA transformation will destroy that
< sipa> don't assume the compiler will map every variable to a single register all of the time
< cfields> ofc, I just mean that it has the option with avx
< sipa> even if you write things in a A = f(A, B) form everywhere, the compiler may still store the resulting A in a different register than the source A
< sipa> yes, but the way the code is written shouldn't affect that at all
< sipa> the SSA transform will effectively turn every assignment to a variable into a new variable, from the compiler's perspective
< cfields> I see
< sipa> with -msse4.1: 3.68ms
< sipa> with -mavx: 3.54ms
< cfields> sipa: huh, odd. I get a massive speedup with avx
< cfields> SHA256D64_1024, 20, 7400, 123.736, 0.000835992, 0.000836151, 0.000836061
< cfields> SHA256D64_1024, 20, 7400, 73.563, 0.000496887, 0.000497227, 0.000497053
< sipa> cfields: oh, i was looking at the SHA256 benchmark