< luke-jr>
FWIW, I just had a 0.13 wallet go corrupt (so sipa's instance probably wasn't a regression)
< luke-jr>
anyone know off-hand what GCC we use in gitian builds? :x
< sipa>
linux builds use bionic, so GCC 7 i believe
< luke-jr>
sipa: do you happen to know this bug in detail enough, to be able to reasonably tell if we would be affected at all?
< luke-jr>
I suppose the only way to be _sure_ would be to do a before/after build using GCC master and master minus 1
< sipa>
luke-jr: affected, i think yes - in particular the cnetaddr code may incorrectly classify IP addresses
< sipa>
i'll look deeper into whether that is actually an issue
< luke-jr>
it would be funny if compilers turned memcmp(…) > -2 and memcmp(…) < 2 into a constant 1 <.<
< sipa>
actually, looking at the code in 0.20, and my understanding of the bug, i don't see why it wouldn't make IsIPv4() return true for literally every address
< sipa>
so i'm probably misunderstanding the bug
< luke-jr>
h mm
< roconnor>
Based on my limited experimation it does seem limited to cases where the static array begins with 0.
< roconnor>
Though I don't really understand why.
< sipa>
that happens to be the case for the IsIPv4 code
< luke-jr>
is it constexpr?
< sipa>
it should be
< luke-jr>
FWIW it looks like the fix is a clean apply to 10.2, so I'm doing a before/after build
< roconnor>
Though that test file does seem to be run.
< kallewoof>
sipa: technically it was thursday when you said 'hi' 12 hours ago. it just wasn't thursday in the right place (it was thursday in japan tho)
< bitcoin-git>
[bitcoin] naumenkogs closed pull request #19958: p2p: Better document features of feelers (master...2020-09-rename-feeler-to-probe) https://github.com/bitcoin/bitcoin/pull/19958
< bitcoin-git>
[bitcoin] naumenkogs reopened pull request #19958: p2p: Better document features of feelers (master...2020-09-rename-feeler-to-probe) https://github.com/bitcoin/bitcoin/pull/19958
< sipa>
kallewoof: timezones are evil
< bitcoin-git>
[bitcoin] jnewbery opened pull request #20008: [Backport] #19839 and #19842 to v0.20 (0.20...2020-09-appveyor-backport) https://github.com/bitcoin/bitcoin/pull/20008
< jnewbery>
wumpus: those appveyor commits fixed my issue. I've opened a PR to just backport those two commits. Thanks for your help!
< bitcoin-git>
[bitcoin] naumenkogs closed pull request #19843: Refactoring and minor improvement for self-advertisements (master...2020-08-refactor-advertiselocal) https://github.com/bitcoin/bitcoin/pull/19843
< bitcoin-git>
[bitcoin] naumenkogs reopened pull request #19843: Refactoring and minor improvement for self-advertisements (master...2020-08-refactor-advertiselocal) https://github.com/bitcoin/bitcoin/pull/19843
< wumpus>
jnewbery: great! yes, seems good to backport them separately and get them in first
< sipa>
yeah i just wanted to draw attention to this potentially very scary bug
< wumpus>
how did they manage to break such a basic library function
< kanzure>
hi
< luke-jr>
wumpus: they turned it into strcmp sometimes it looks like
< sipa>
wumpus: it's not a libc bug, it's bug in gcc's implementation of __builtin_memcmp
< luke-jr>
at least the consensus code seems to have dodged it
< sipa>
but otherwise yes, this is really concerning
< sipa>
i believe none of our gitian builds are affected, as we use older GCCs
< sipa>
can someone verify that?
< wumpus>
that's at least good to know, let me see
< luke-jr>
macOS build should use LLVM IIRC
< sipa>
luke-jr: indeed
< jeremyrubin>
sipa: is this a generaly class of bugs do you reckon or pretty specific to memcmp? should we be auditing all builtins?
< MarcoFalke>
sipa: Gitian should use 7 and 8
< luke-jr>
I *assume* LLVM didn't reimplement this bug
< wumpus>
gcc 8 it seems
< sipa>
jeremyrubin: i don't think we have the manpower to do actual auditing of all of gcc's builtins (and other potential bugs), nor is that our job
< jonasschnelli>
bionic uses gcc 7, right?
< luke-jr>
jonasschnelli: yes
< wumpus>
jonasschnelli: but we explicitly install 8 for some platforms
< luke-jr>
^
< sipa>
jeremyrubin: i think it's important to note that while the bug was known, we stumbled upon it in the libsecp256k1 repo, in a test
< jonasschnelli>
Ah. Yes. Arm / risc uses gcc-8
< luke-jr>
hmm
< sipa>
so that's kind of a sign that our processes are at least capable of catching some issues like this
< luke-jr>
sipa: should I be concerned I *didn't* get any libsecp symbols in my comparison?
< wumpus>
I honestly don't now how to prevent us being affected by problems like this
< luke-jr>
or is the test only in a newer version?
< sipa>
luke-jr: no, i think this is expected (it's in code added in a new test)
< wumpus>
computers are broken
< jonasschnelli>
indeed
< sipa>
reality is often disappointing
< wumpus>
I can understand libc bugs in some cases but this is something so low-level
< sipa>
anyway, i think we should see what performance impact building with -fno-builtin-memcmp has
< wumpus>
sipa: yes but only for affected compilers
< sipa>
if that is sufficiently low, i think that's a sufficient solution - perhaps restricted to GCC 9 and 10
< luke-jr>
sipa: that won't fix C++ afaik
< sipa>
luke-jr: it will
< sipa>
luke-jr: std::lexicographical_compare (in the .h file) just calls __builting_memcmp
< MarcoFalke>
Should we also add a test that fails with gcc9/10?
< wumpus>
I don't even care about peformance there, just add the option, it needs to be correct, if people want performance don't use a broken compiler
< wumpus>
that's the important thing thoug hthat it fixes the issue, a test like MarcoFalke says makes a lot of sense
< luke-jr>
at least GCC's docs suggest to whitelist builtins, one should use -fno-builtin and call __builtin_* directly ;)
< wumpus>
or just fail compiling on affected compilers, they're going to backport a fix right?
< sipa>
wumpus: yes
< luke-jr>
wumpus: supposedly, but it looks non-trivial and they haven't yet
< wumpus>
I'd prefer things to build at all on compilers with a broken memcmp
< wumpus>
+not
< sipa>
it's fixed in master, but not in any release
< jonatack>
gcc 9.3 and 10.1 seem afficted
< wumpus>
(whatever we end up doing, we need a test for this in configure)
< luke-jr>
personally, I neutered my builtin memcmp to always fail (fallback to libc) and am rebuilding my system
< sipa>
wumpus: my first thought was adding a configure test that exploits the bug... but that won't work when crosscompiling
< sipa>
i think?
< luke-jr>
well, crosscompiling, we can't run it
< wumpus>
sipa: ... ugh
< luke-jr>
why not add a sanity check? we have others
< sipa>
unless we assume that building for the host platform is always similarly affected as the target platform
< wumpus>
yes, you'd need to run it to figure it out right
< jeremyrubin>
Is memcmp constexpr?
< jeremyrubin>
Maybe it can static_assert...
< sipa>
jeremyrubin: it's a C function constexpr is a C++ concept
< wumpus>
holy shit this is *really* bad
< luke-jr>
sipa: bad assumption
< sipa>
luke-jr: yes, i think so
< wumpus>
maybe it could expect the symbols generated or something, if it generates a call to strcpy instead?
< sipa>
wumpus: there are no calls involved
< luke-jr>
wumpus: it doesn't
< sipa>
it's the autogenerated builtin code that is wrong
< luke-jr>
wumpus: it's unrolling the memcmp
< wumpus>
ok...
< sipa>
which makes it implementation something more or less equal to strcmp rather than memcmp
< luke-jr>
basically turns it into if (a[0] == "x" && a[1] == "x" && …)
< luke-jr>
aiui
< wumpus>
that's impossible to check in a platform independent way
< jonatack>
it's unclear to me if the patch is in the gcc 10.2 release?
< luke-jr>
jonatack: no
< sipa>
jonatack: it is not in any release
< jonatack>
thank you
< real_or_random>
fwiw, this is (part of) the GCC patch that fixes this: https://gcc.gnu.org/git/?p=gcc.git;a=blobdiff;f=gcc/builtins.c;h=228db78f32bfdcce31e23850e878843d7a45adcc;hp=eb662112b32e5ad377d2a865d9977c64dc12cc93;hb=d5803b9876b3d11c93d1a10fabb3fbb1c4a14bd6;hpb=3e99ed65cbedf7a6c0abb9cd63c191326995fd34
< jeremyrubin>
Could we wrap memcmp with a compile time wrapper that static asserts we aren't using a const array with a 0 byte? Not sure if you can enable-if-fu around constexpr args.
< real_or_random>
it's hard to read but I think it's possible to deduce a few condition under which this bug can't happen
< sipa>
jeremyrubin: that's not a solution
< jeremyrubin>
The bug only happens when it's a const array right?
< sipa>
jeremyrubin: if we can identify all potentially affected code paths, we could just replace them with my_memcmp that doesn't have the bug too
< jeremyrubin>
ah right you're worried about other stdlib calls and stuff.
< wumpus>
but C++ also generates memcmps it's not just our own code
< wumpus>
it's a very low-level function
< luke-jr>
sipa: I just did that? :P
< sipa>
right - "all code paths" also contains std::lexicographical_compare and a few other things
< sipa>
luke-jr: right, but not in a guaranteed automated way
< sipa>
luke-jr: it's a really useful way of finding potential issues now, though
< sipa>
thanks for that
< sipa>
i've tried making configure detect the version of GCC, and so far been unsuccesful
< sipa>
my google-fu led me to 9-year old stackoverflow posts with comments "ah yes, that doesn't work anymore, let's just delete that functionality" from the m4 gcc scripts
< sipa>
(though i'm sure people with more autoconf skills can do better)
< luke-jr>
>_<
< sipa>
the idea is that in general you should test for compiler features, rather than exact versions
< sipa>
which is absolutely true, except for bug workarounds :p
< wumpus>
and as soon as the patch is backported to gcc 9 and 10 it will become more difficult too
< luke-jr>
sipa: ideally, yes
< luke-jr>
sipa: even for bug workarounds, you can't use the version number to see that my GCC is patched ;)
< sipa>
that's true, but it's a safe superset
< wumpus>
worse, some distributions might backport the single patch without changing the version number
< wumpus>
sure, it dpeends on what it does in that case
< sipa>
anyway, what's the plan?
< wumpus>
if it only passes an extra command line flag it's ok
< sipa>
i think the first thing to do is add a unit test that catches the bug
< luke-jr>
sipa: why not sanity test?
< sipa>
though we also need to make sure it doesn't become an "expected failure" thing that people ignore
< wumpus>
but it shouldn't fail the build for versions that have been fixed
< sipa>
luke-jr: that's a good point
< wumpus>
a sanity check makes sense
< luke-jr>
the GCC bug tracker has a reduced test (note the first test *didn't* fail on my unpatched GCC)
< sipa>
luke-jr: yeah, it seems the cases you need to catch it differ slightly between gcc 9 and 10
< luke-jr>
hmm
< luke-jr>
I wonder if that means different code could be affected by GCC 9?
< luke-jr>
or is GCC 9's affected a strict subset of GCC 10's?
< sipa>
we also should figure out if -fno-builtin-memcmp fixes std::lexicographical_compare
< luke-jr>
I very doubt it does
< sipa>
luke-jr: i don't know
< sipa>
i don't want to monopolize the meeting with this; we can also continue discussion on #20005
< real_or_random>
i'm somewhat curious to add a patch to GCC that makes it emit a message if it applies this optimization wrongly. that may be possible with some effort given that we have the GCC patch that fixes the issue. but not sure if I'll find time
< sipa>
real_or_random: luke modified his compiler with the patch, and then tried to see which symbols differ before and after
< sipa>
(we can continue after the high priority for review topic)
< real_or_random>
sipa: oh I missed that, cool
< real_or_random>
yes
< wumpus>
patching gcc is not a solution that we can rely on in general as most people don't build their own gcc
< luke-jr>
right, I was just doing it to quickly audit what was affected
< wumpus>
sure, for that it's a great way
< luke-jr>
sipa: btw, I think it's using __builtin_memcmp for std::equal or == in PSBT stuff?
< wumpus>
nothing to discuss about high priority for review PRs?
< jonatack>
was good to see signet get in this week
< sipa>
indeed
< luke-jr>
wumpus: oh, we could build with -O1 ?
< wumpus>
yes
< * luke-jr>
wonders how -O1 works but -fno-builtin-memcmp doesn't
< MarcoFalke>
Doesn't that make IBD take twice as long?
< wumpus>
probably
< wumpus>
still better than broken builtins though
< MarcoFalke>
Might be better to refuse to start with the sanity check
< luke-jr>
maybe there's some more granular thing we can disable
< luke-jr>
MarcoFalke: both ☺
< sipa>
luke-jr: could you build with -fno-builtin-memcmp without the gcc patch, and then (for the affected symbols from your previous list) check if which ones become equal to the fixed gcc output?
< luke-jr>
sipa: I don't have the unpatched compiler anymore :x
< luke-jr>
on an x86 server, however, -fno-builtin-memcmp did not fix the bug
< luke-jr>
for int x= __builtin_memcmp (a, "\0\0\0\0", 4);
< sipa>
sure, but we never call __builtin_memcmp directly from our code
< wumpus>
speaking of that, what about other dependency libraries that can be assumed to have been built with the broken compiler
< wumpus>
boost, libupnp, libevent ...
< luke-jr>
we can't fix everything
< elichai2>
wumpus: is it even remotely possible to check what compiler compiled an already compiled library?
< wumpus>
no, we can't
< sipa>
doesn't mean we can't work around it
< luke-jr>
I suppose that's an argument for just the sanity check, and not trying to workaround it
< wumpus>
elichai2: not in gneral
< sipa>
like just not permitting building with known-to-be-affected compilers, which at least in some cases will avoid the situation where dependencies are built with the same thing
< wumpus>
sipa: right
< sipa>
but... if your system itself has bugs introduced by this, i think there is very little we can do about it
< wumpus>
for many distributions at least, the dependencies will have been built with the same gcc/g++ that is available
< sipa>
i believe my python3 binary is built with GCC 9.3
< sipa>
$ strings python3.8 | fgrep GCC
< sipa>
[GCC 9.3.0]
< jonatack>
luke-jr: did the tests in the final patch fail on your unpatched gcc?
< luke-jr>
didn't try
< wumpus>
I'm going to close the meeting, can continue this discussion afterwards and we're out of topics
< wumpus>
#endmeeting
< lightningbot>
Meeting ended Thu Sep 24 19:46:51 2020 UTC. Information about MeetBot at http://wiki.debian.org/MeetBot . (v 0.1.4)
< elichai2>
even 2 bytes, where in compile time it can see it's 2 bytes, it still produces a memcmp
< elichai2>
only when the array is 1 byte it doesn't call to memcmp, and instead it generates a `cmp BYTE PTR [rdi], 0` instruction (compare with zero(because my other array is zero))
< sipa>
and if you take the same code but with memcmp instead of std::lexicographical_compare, it does avoid the memcmp call
< roconnor>
luke-jr: my understanding is that it isn't quite that builtin_memcmp is broken but an optimization surrounding it that is invoked at level -O2 is broken.
< sipa>
i changed the 0.20.1 source code to replace IsIPv4() with "return ip[0] == 0;" (which is what I understand the bug would turn it into now)... the tests fail
< sipa>
so whatever it's doing, it's not doing that, or we'd have caught it
< elichai2>
sipa: yes. if I just replace with `memcmp` it produces bad code
< elichai2>
I think it ***might*** be because `lexicographical_compare` gets both start and end iterators of both sides, then it sees they're the same size, so it replaces with memcmp but it might be the same round where they replace memcmp so it doesn't do this deduction again, ***maybe***
< meshcollider>
Tomorrow is the wallet meeting but I won't be able to make it. So either someone else might want to host, or since it has been quiet the last few meetings we might just cancel it tomorrow
< sipa>
if i change IsIPv4() to memcmp(...) > 0 (rather than == 0), gcc 9.3 emits broken code
< sipa>
i'm beginning to suspect that (in)equality checks use a different optimization path that isn't affected
< sipa>
which if true would be good news, as it means automatically generated operator== code isn't affected either (and there exists no automatically generated < <= >= >)
< sipa>
!= also doesn't seem affected
< gribble>
Error: "=" is not a valid command.
< sipa>
stupid bot
< elichai2>
sipa: I also can't manage to trigger this in `std::equal`
< jonatack>
meshcollider: thanks for the heads up. i guess we'll see who is around.
< elichai2>
(although unlike lexicographical_compare std::equal does optimize nicely)
< elichai2>
damn it
< elichai2>
it also happens with lexicographical_compare
< elichai2>
if you replace the reference with a copy it happens, you can play with the optimization level and see the output changes: https://godbolt.org/z/vxEo8d
< roconnor>
uint256.h doesn't compare with a static array, so it should be fine right?
< sipa>
roconnor: it could be a compile-time known value in some cases
< elichai2>
isn't max PoW static?
< roconnor>
fair.
< sipa>
i don't know if being a compile-time known constant it enough to trigger it; maybe it actually has to be a comparison with a constant
< sipa>
*compile-time-known value
< elichai2>
I hate this
< sipa>
elichai2: great find
< roconnor>
It has to be a value that GCC's c_getstr is able to figure out object byte representation for from the source code, however c_getstr works.
< sipa>
elichai2: so, we learn that -fno-builtin-memcmp is not enough
< elichai2>
yes :/
< sipa>
it should be possible to find the subset of functions in STL that are affected by this
< roconnor>
rather than -fno-buildin-memcmp, we need to find a flag that disables this suite of optimizations :/
< elichai2>
we could decide to remove *all* usages of functions comparing stuff, but even `>` gets misoptimized on some containers :(
< sipa>
roconnor: that's a possibility too
< roconnor>
is -O2 stand in for a series of -f flags?
< elichai2>
although some flags might imply others so it might be possible to get more granular but this sounds like an NP problem lol
< elichai2>
ignore what I just said. with `-O2` and all the O2 flags actually disabled, if any of these 5 flags aren't disabled I get the wrong result, they all need to be disabled to get the right results
< promag>
who has admin role in travis?
< roconnor>
`-O2 -fno-tree-dce -fno-tree-dominator-opts -fno-tree-fre -fno-code-hoisting` seems to do something different.
< elichai2>
roconnor: I still get the wrong result there, through a different path it seems but still the wrong one hehe
< roconnor>
right. sorry, yes it still appears wrong, but with different wrong asm.
< elichai2>
it scares me that my current default compiler in my machine is so much broken right now.
< roconnor>
... I've been scrambling to recompile my entire system with gcc 8...
< roconnor>
I don't know if I'm being too paranoid...
< real_or_random>
idk, this bug seems scary at first glance but maybe if the scope is indeed limited, it's just one of hundreds of bugs in GCC.
< roconnor>
real_or_random: If I figure out that you are right that GCC's miscompiling its own string_length function, I am definitely going to recompile my system.
< roconnor>
I am not waiting for the outcome of such a mess.
< elichai2>
roconnor: I'm thinking of recompiling with clang
< luke-jr>
roconnor: I've been rebuilding my system as well, but with GCC 9 patched to never use the builtin memcmp
< luke-jr>
real_or_random: it's a bug that's higher-than-average risk to create security vulnerabilities
< luke-jr>
roconnor: real_or_random: what's this about string_length func?