bitcoin-git has left #bitcoin-core-dev [#bitcoin-core-dev]
jonatack85 has quit [Ping timeout: 260 seconds]
lukedashjr has joined #bitcoin-core-dev
luke-jr has quit [Ping timeout: 260 seconds]
lukedashjr is now known as luke-jr
shesek__ has quit [Remote host closed the connection]
shesek__ has joined #bitcoin-core-dev
shesek__ has quit [Remote host closed the connection]
shesek__ has joined #bitcoin-core-dev
<laanwj>
kidding aside wine is a pretty cool platform, i'm always surprised how well it works
brunoerg has quit [Ping timeout: 260 seconds]
Kaizen_Kintsugi_ has quit [Ping timeout: 250 seconds]
<laanwj>
hebasto: yes
<laanwj>
but that shouldn't realistically happen on x86_64, sse4 is so common
<laanwj>
and, definitely everything that has AVX2 will have that
jonatack has joined #bitcoin-core-dev
<hebasto>
laanwj: this is our msvc build
Kaizen_Kintsugi_ has joined #bitcoin-core-dev
ghost43_ has joined #bitcoin-core-dev
ghost43 has quit [Remote host closed the connection]
<laanwj>
hebasto: oh, i would guess the special instructions code isn't compiled at all on that, it takes some build system magic to compile the right modules with the right instruction support
bitcoin-git has left #bitcoin-core-dev [#bitcoin-core-dev]
<fanquake>
heh. I just ran into the same issue
<fanquake>
I think rc3 might just become rc4
<laanwj>
still good we tried to go forward with rc3 to run into this issue
<fanquake>
yep. at least now that the tarball naming / normalizing is done, it's done
<laanwj>
but it will be after the weekend, i first need to rebuild 22.0 to get the symbols to know where to disassemble, this will take a while as i've guix-cleaned everything the 22.x dependency chain apparently
<laanwj>
hebasto: but fwiw 22.0 has no vmovdqa instructions involving the stack, at all
<hebasto>
so, it is a compiler, right?
<laanwj>
yes, it's a compiler change
<laanwj>
it just happened to work with last version (this might be accidental, e.g. some -mavx256-split-unaligned-store tuning cfields was talking about yesterday), or an actually introduced new bug
<laanwj>
pretty sure i found the function without hand-holding actually (there's only one function where ymmX is used)
shesek__ has quit [Remote host closed the connection]
shesek__ has joined #bitcoin-core-dev
shesek__ has quit [Remote host closed the connection]
<laanwj>
the 22.0 implementation is smaller and looks more streamlined, did we *disable* any kind of optimization?
<laanwj>
notably, _ZN14sha256d64_avx212_GLOBAL__N_16Write8EPhiDv4_x [sha256d64_avx2::(anonymous namespace)::Write8(unsigned char*, int, long long __vector(4))] is only called in the 23.0rc2 implementation
<laanwj>
in comparison the 22.0 implementation has only one call, at the end, i suppose the stack check failure
<jonatack>
laanwj: (looked at the lock logging undo, will push a pull soon)
<laanwj>
this relates to the problematic 0x260(%rsp) ... so it looks like an inlining failure!
<laanwj>
jonatack: awesome, thanks
<laanwj>
maybe it's not actually spilling as thought
kexkey has quit [Ping timeout: 246 seconds]
kexkey has joined #bitcoin-core-dev
<laanwj>
maybe always inlining __attribute__((always_inline)) at least ::Write would work around it
<laanwj>
it feels scary and could break any time, of course
shesek__ has quit [Remote host closed the connection]
shesek__ has joined #bitcoin-core-dev
<jonatack>
gg almost 11k lines in that gist, when it's finished would love to learn more how you are troubleshooting this
<jonatack>
thanks, i have it opened in my editor now
<fanquake>
It looks like atleast Debian / Ubuntu mingw-w64 gcc is patching around either this, or a very similar issue
<fanquake>
"Use unaligned VMOV instructions to avoid crashes caused by mis-aligned values; thanks to Claude Heiland-Allen for the patch! Closes: #939559."
<laanwj>
it looks like that would successfully prevent vmovdqa instructions from being generated
<fanquake>
I guess this also explains why we don't see it in non-guix builds (at least on Debian / Ubuntu). Need to see if Fedora etc patches this too
<laanwj>
right, good catch, i'm kind of shocked that something essential like this is part of distro patches
<laanwj>
that said i can see why the gcc people deem it too extreme, you wouldn't want to do it on platforms where the alignment assumptions are correct
sudoforge has quit [Ping timeout: 246 seconds]
<laanwj>
i'm also a littlbe bit surprised by the different inlining decision between 22.0 and 23.0rc2, we didn't change any optimization flags did we?
<laanwj>
it deosn't need to i mean the 'when to inline' heuristics get tweaked all the time
<laanwj>
but the resulting code for 22.0 just looks better
<fanquake>
No, but we did change from using GCC 7 iirc to GCC 10.3.0
<fanquake>
sorry, 8 to 10.3
<laanwj>
right
<laanwj>
ok it's not as bad as i thought, looking at actual code size instead of .asm text size, (0x14051ce2f-0x1405108d0)/(0x88a15d-0x87da00)=~0.99 the 23.0rc2 implementation is ever so slightly smaller (which makes sense if you stopped inlining something)
<laanwj>
so, yeah, i guess applying that patch (for mingw gcc build only) will simply make this problem go away
brunoerg has quit [Ping timeout: 240 seconds]
<fanquake>
I'll look at combining that into a guix build we can test
bitcoin-git has joined #bitcoin-core-dev
<bitcoin-git>
bitcoin/master 7762c56 Hennadii Stepanov: build: Fix "ERR: Unsigned tarballs do not exist"
<fanquake>
cfields: that is all happening inside guix, so I'm not quite sure what you mean?
<laanwj>
also mind it's a really old problem, that has existed ever since AVX2 support was added to the build, no one ever reported problems with it before so either everyone is using patched cross-compilers or no one is actually using mingw-w64 to build windows binaries for themselves
<laanwj>
most windows devs seem to use the MSVC build (which doesn't even support instruction sets, apparently :-)
<cfields>
fanquake: sometimes builders or distros hack in their own defines to compilers so that downstreams can test to see if they're using that distro's compiler. Thought that may be the case here.
<cfields>
laanwj: yeah, thinking about it more, I guess you're right...
<laanwj>
i mean you could cook up some script that does gcc then objdump to see what instructioss are generated to see if the comiler is patched, but i dunno
<cfields>
getting a working windows compiler is kinda a dark art, and I guess we kinda have to put the onus on the builder to get themselves a working toolchain.
<laanwj>
i guess the safest option is to disable AVX2 for mingw builds, then provide an option to force it to enable
<laanwj>
that said, i'm not sure it's worth the extra complexity for something no one encounters in practice
Talkless has joined #bitcoin-core-dev
<fanquake>
my assumption would be that most regular users / devs would end up with a working mingw-w64 compiler, from their distro. Anyone who’s doing something a bit more non-mainstream / supported, is also probably more likely to notice if something isn’t working
<laanwj>
yeah...
<laanwj>
we should encourage using guix to build anyway
<cfields>
fanquake: ack
<fanquake>
we do have at least one compiler bug test, so we could even add another unit test, where if you’re building for windows, using gcc, and your compiler is emitting aligned vmov instructions, we fail, or similar
<cfields>
wait, yeah, right...
<laanwj>
a test cuold work, though adding AVX2 code to the test suite does comlicate the build system
<cfields>
I thought fanquake meant trying to isolate the crasher and actually make it part of the sha2 sanity check.
<laanwj>
e.g. support has to be detected both run and compile time
<laanwj>
you'd have to control the stack pointer
<laanwj>
that's pretty nasty
Kaizen_Kintsugi_ has quit [Ping timeout: 260 seconds]
<sipa>
but also don't know if it's worth the effort
<laanwj>
yeah, i'd say not
<sipa>
call it from a recursive function that has a 1 byte stack variable, called at various depths
<sipa>
that should be pretty likely to trigger it, i think
<laanwj>
it's also a very apparent crash, not like a silent corruption
mikehu44 has quit [Ping timeout: 260 seconds]
<sipa>
right
<laanwj>
you would also have to explicitly call the AVX2 variant in that sanity check so it doesn't depend on the CPU being in a fairly narrow range
<laanwj>
but only if the CPU supports AVX2, of course
<laanwj>
tbh for rc4 we just need to get this working for our build
<sipa>
i think the current open PR is fine
shesek__ has quit [Remote host closed the connection]
shesek__ has joined #bitcoin-core-dev
<sipa>
wahaha, i just look at what the #ChangeTheCode PR is actually doing. That's hilarious.
<cfields>
agreed. I didn't mean to complicate. the status quo is clearly the patched compiler. it's just annoying that the status quo is a patched compiler :)
<laanwj>
it's sad that the issue sits unpatched upstream for so long wonder if clang does better
<laanwj>
windows support is and will be always a second thought in gcc
<cfields>
laanwj: yeah. I have a gcc bugzilla account, will drop a "we're affected too" note on the bug, unless you were planning on it.
<laanwj>
cfields: feel free to :)
Kaizen_Kintsugi_ has joined #bitcoin-core-dev
<sipa>
I also believe that on nearly all CPUs (except the very first that had AVX2), movdqu and movdqa have the same performance (in situations where they both work).
<laanwj>
that's good to know
ashishkr has quit [Quit: Connection closed for inactivity]
Kaizen_Kintsugi_ has quit [Ping timeout: 256 seconds]
<earnestly>
That cleanupbitcoin website is unbelievable, the amount of js and css anims needed will probably outpace the power usage of bitcoin in a few weeks
Guest547 has joined #bitcoin-core-dev
<earnestly>
sipa: Where is the PR?
<laanwj>
although i suspect the impact would be minimal anyway even if there was a difference; the only memory-facing (so, not register to register) 256-bit movdqX instructions are to fetch some constant, and to write the result for Write, it's not some memory heavy inner loop
<earnestly>
(They don't link to it)
<sipa>
Earnestly: we're talking about the response to it, #24731
<sipa>
why does this only affect mingw, does anyone know?
<cfields>
"Clang does indeed realign the stack a la GCC, which is OK in simple cases like this with SEH, i.e. when no DRAP is required (MSVC does things totally backwards, since it realigns the frame instead of the stack but we just cannot do that in GCC)."
<cfields>
I don't really understand several parts of that answer, but it's an answer :)
<cfields>
I guess that translates to: "clang _successfully_ realigns the stack" ?
<laanwj>
achow101: added macos build, it matches with you
shesek__ has quit [Remote host closed the connection]
shesek__ has joined #bitcoin-core-dev
Kaizen_Kintsugi_ has joined #bitcoin-core-dev
brunoerg has quit [Remote host closed the connection]
sudoforge has quit [Ping timeout: 260 seconds]
bomb-on has joined #bitcoin-core-dev
bomb-on has quit [Read error: Connection reset by peer]
bomb-on has joined #bitcoin-core-dev
brunoerg has joined #bitcoin-core-dev
Kaizen_Kintsugi_ has quit [Remote host closed the connection]
Kaizen_Kintsugi_ has joined #bitcoin-core-dev
brunoerg has quit [Ping timeout: 256 seconds]
AaronvanW has joined #bitcoin-core-dev
NorrinRadd has joined #bitcoin-core-dev
bfsfhkacjzgcytf9 has joined #bitcoin-core-dev
brunoerg has joined #bitcoin-core-dev
khannasid has joined #bitcoin-core-dev
khannasid has left #bitcoin-core-dev [#bitcoin-core-dev]
bomb-on has quit [Read error: Connection reset by peer]
bomb-on has joined #bitcoin-core-dev
realies has quit [Quit: Ping timeout (120 seconds)]
realies has joined #bitcoin-core-dev
sudoforge has joined #bitcoin-core-dev
mudsip has joined #bitcoin-core-dev
mudsip has quit [Client Quit]
mudsip has joined #bitcoin-core-dev
jamesob has quit [Quit: Ping timeout (120 seconds)]
jamesob has joined #bitcoin-core-dev
AaronvanW has quit [Remote host closed the connection]
jamesob has quit [Quit: Ping timeout (120 seconds)]
Kaizen_Kintsugi_ has quit [Remote host closed the connection]
jamesob has joined #bitcoin-core-dev
Kaizen_Kintsugi_ has joined #bitcoin-core-dev
mudsip has quit []
Kaizen_Kintsugi_ has quit [Ping timeout: 260 seconds]
Kaizen_Kintsugi_ has joined #bitcoin-core-dev
AaronvanW has joined #bitcoin-core-dev
Kaizen_Kintsugi_ has quit [Ping timeout: 256 seconds]
AaronvanW has quit [Ping timeout: 246 seconds]
AaronvanW has joined #bitcoin-core-dev
Kaizen_Kintsugi_ has joined #bitcoin-core-dev
jamesob has quit [Quit: Ping timeout (120 seconds)]
brunoerg has quit [Remote host closed the connection]
jamesob has joined #bitcoin-core-dev
AaronvanW has quit [Ping timeout: 240 seconds]
Kaizen_Kintsugi_ has quit [Remote host closed the connection]
Kaizen_Kintsugi_ has joined #bitcoin-core-dev
brunoerg has joined #bitcoin-core-dev
brunoerg has quit [Ping timeout: 260 seconds]
nanotube has quit [Ping timeout: 260 seconds]
willcl_ark has quit [Quit: Reconnecting]
willcl_ark has joined #bitcoin-core-dev
Talkless has joined #bitcoin-core-dev
brunoerg has joined #bitcoin-core-dev
brunoerg has quit [Ping timeout: 260 seconds]
<laanwj>
sipa: on linux, at least with gcc 11.2 i'm using locally, sp-relative accesses appear to use vmovdqu, it doesn't align the stack but uses the correct alignment assumptions
<sipa>
dqu or dqa?
<laanwj>
wait, that was actually a clang build
<laanwj>
dqu
<laanwj>
it doesn't assume the stack is aligned to 32 bytes, which makes sense
<sipa>
oh, i misread what you said; i see
AaronvanW has joined #bitcoin-core-dev
<laanwj>
it uses the aligned variant only to move between registers (doubt it makes a difference which one is used there?) and accessing constant data it knows to be aligned
<laanwj>
i don't get it really, it could do exactly the same on windows and it'd be correct?
<sipa>
right
<sipa>
i could understand if it uses the same codegen, but the alignment assumptions are different between the teo
<sipa>
but here it seems it's actually different code being generated?
<laanwj>
i think it's just the alignment assumptions that are different
<laanwj>
i'd assume so, i'm not comparing against the same version of gcc right now
AaronvanW has quit [Ping timeout: 260 seconds]
<laanwj>
what i don't get what the alignment assumptions made by gcc on windows even are; why would it *ever* think the stack is 32-byte aligned? as i see it could just err on the safe side, but it doesn't
<laanwj>
that's what #24736 does but it removes all reasoning about alignment at all