< sipa>
jb55: we came up with a bunch of algorithms to opimize searching for parameters, which we probably should make a writeup about
< jb55>
sipa: why you chose this kinda of error correcting code in particular would be good to know as well. would be a great read if you get around to writing it up!
< meshcollider>
aj should be in the org so he can do it himself, I thought he was :)
< gmaxwell>
jb55: why 'this kind' is a pretty easy answer: we use a cyclic linear code because after extensive research, it's the only class of code we know of that can achieve good guarentees for the "user makes random single character errors" channel.
< aj>
meshcollider: curses, foiled again
< gmaxwell>
jb55: other kinds of codes, like a simple CRC don't operate at the 5-bit-character level, and so can't guarentee e.g. that a bad input with 3 character errors will be detected.
< gmaxwell>
jb55: and other kinds of codes (like ones constructed from arbritary non-linearities) are computationally intractable-- as far as anyone yet knows-- for detecting as many errors as bech32 does.
< gmaxwell>
(er, that was unclear, I mean -- for examples-- no one knows how to determine what the minimum distance of an arbritary quasigroup code is, except for the single error case other than by performing an test of all possible error patterns)
< jb55>
gmaxwell: ah, interesting... thx.
< jb55>
was wondering why I was dealing with 5bit words when working on a bech32 lib...
< jb55>
gmaxwell: is there a particular reason it's 5 bits?
< jb55>
oh I see the extra 3 are for error correction
< gmaxwell>
jb55: it's 5 bits because log2(32) is 5. -- the number of bits per character is a product of the character set size. The character set size is chosen to be relatively small because it makes entering it easier. Mixed case in base58, for example, makes it really hard to read addresses out loud (e.g. over the phone).
< * jb55>
slowly getting it
< gmaxwell>
The exact character set size could be another value, but (1) if it's a power of two its much easier to convert to and from bytes, since each character represents an integer number of bits. and (2) for it to be feasable to design a linear error detection code over it, the size must be a prime or a prime raised to some power. (so base58 is out)
< sipa>
jb55: there are no bytes involved anywhere
< sipa>
everything is 5-bit symbols; both the 'data' and the 'checksum' are 5-bit units
< jb55>
ahh
< sipa>
the reason for that is because we expect errors that are affecting groups of 5 bits simultaneously
< gmaxwell>
Of the character set sizes that allow for easy conversion two and from bytes... 16 characters is overly restrictive (bigger addresses), and 64 is too many (requires using characters that are hard to distinguish/hard to type).
< sipa>
jb55: because 1 character in base32 = 5 bits
< gmaxwell>
A pretty good case could have been made for base-31, base-29, or base-27-- which would allow excluding some more visually similar characters without being much longer, but the code to convert from binary data into those bases is somewhat irritating.
< jb55>
and 7 bits wouldn't work because the character set size would be too large... 3 would be too verbose. and the log of other bases are fractional so conversion is annoying, if I'm getting this right
< sipa>
indeed
< sipa>
6 bits would work too; base64 is pretty common, but it means mixing upper and lowercase
< sipa>
there are only 94 printable ascii characters, so 7 bits is kinda out
< sipa>
unless you want to mix in things like poop emoji
< jb55>
gotta capture that millenial market, should have went with basemoji
< gmaxwell>
mixed case is hard to read, also irritating to type on mobile devices, also results in more similar character mistakes.
< gmaxwell>
if your metric is "how long does it take to successfully transfer 200 bits of data between two humans, where one reads and speaks the other hears and types" a base somewhere between 25 and 32 is likely optimum.
< gmaxwell>
more than 32 and you're forced to use easily confused characters (and/or prefixing every letter with "upper" "lower" and causing a 2x slowdown)
< jb55>
base32 is my go-to, although I use a modified variant that replaces o,i with 8,9. I always thought that choice was kind of odd in the standard base32 encoding.
< gmaxwell>
If your metric is copy and paste, the length/characters hardly matter except non-alphanum characters break the automatic copy boundary behavior in browsers.
< gmaxwell>
jb55: there are like a half dozen different base32s that replace different characters.
< jb55>
although it's not optimized phonetically, would be interesting to see a character set that is ideal for voice transcription instead of visual.
< gmaxwell>
Our ommited characters are basied on minimizing visual confusion according to some NIST data. Though later a research paper came out and had confusion data generated from millions of password entry attempts and it gave essentially the same results.
< gmaxwell>
At least my thinking was the the visual component is always there (for human based transcription), while speaking is not always used. Also, speaking errors depend a lot on your accient and/or if you're using something like the NATO phonetic alphabet (alpha bravo charley delta echo foxtrot..)
< gmaxwell>
I think if that password entry data were available we would have used it instead of the NIST data, but I reran the analysis using it and got essentially the same result.
< * jb55>
thinking about an address format encoded as poetry
< gmaxwell>
(I think also if we had the password data I would have spent more time exploring e.g. base-27, since we actually would have had a good criteria to use to exclude more characters)
< gmaxwell>
One thing we were able to do in base-32 was make it so that if you only made the most likely mistakes then we are guarenteed to detect one more error. (we did this by choosing a character permutation that mapped the most likely errors to single bit errors, and choosing the code to also support up to 6 one bit errors)... this would have likely worked even better in base-27
< gmaxwell>
(both because we could exclude more characters outright, and because the analog would be mapping them to a single-trit error out of three trits per symbol.)
< jb55>
gmaxwell: sounds kind of like compression wrt. most common symbols assigned to shorter prefix bit sequences? in this case instead of most common symbols you're describing most common human error symbols?
< gmaxwell>
(the table is dimensioned so that it's relatively easy to see the 5 dimensional 1-bit-difference hypercube...)
< gmaxwell>
r/t r/y 8/0 c/e 8/h almost all of the easier mistakes map to one bit differences.
< jb55>
gmaxwell: so the smaller bit difference the less ambiguity in detecting which symbol has the error?
< sipa>
jb55: the checksum itself was optimized to be slightly better for low-bit-error
< sipa>
this is not an inherent property; it was selected for
< gmaxwell>
the bech32 checksum will always detect if there are no more than 5 characters wrong, ... or if there are no more than 6 bits wrong.
< gmaxwell>
we specifically designed it for this behavior.
< jb55>
and by designed you're referring to which symbol is assigned to which 5 bit number?
< sipa>
no, the BCH code
< jb55>
ah
< sipa>
the character set was designed so that common errors are 1 bit only
< gmaxwell>
Picking the symbol ordering lets us map common mistakes to single bit errors.
< sipa>
the BCH code was designed to that 1 bit errors are more frequently detected
< gmaxwell>
Picking the error correcting code let us pick one that would detect 6-one bit errors.
< sipa>
the two are designed for each other
< gmaxwell>
there were many possible codes which have exactly equal performance for detecting whole character errors, but one of them also did better for bit errors.
< jb55>
picking a BCH code involves what, picking constants?
< sipa>
jb55: effectively, picking the generator
< sipa>
every cyclic code is uniquely characterized by its field representation and generator
< gmaxwell>
this line in the BIP:
< gmaxwell>
GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
< gmaxwell>
mostly
< * jb55>
goes to learn about galois fields
< jb55>
thx guys this makes a lot more sense now
< Mischa010>
hi there, does anyone here know to how many peers bitcoin sends new blocks in general?
< sipa>
all.
< Mischa010>
i know 8 peers are connected to but how many can actually request and download a block at the same time?
< Mischa010>
i forgot the 'at the same time' qualifier!
< gmaxwell>
Mischa010: 8 peers are the number of outbound connections a node normally has, but they can have many more inbound. 70 inbound isn't uncommon.
< gmaxwell>
thats too narrow thinking, e.g. assuming a homogeneous network (equal speeds, equal delays on all links), fibre shows that the optimal distribution pattern (shortest time to reach all nodes) is achieved by round robining packets one at a time equally to each peer.
< gmaxwell>
because they'll each send the recieved packets to each of their peers. spreading more slowly than that fails to achieve the maximum bisection bandwidth of the network.
< gmaxwell>
(on an inhomogeneous network it's more complicated)
< Mischa010>
interesting, is there any literature on this?
< gmaxwell>
bitcoin specific, not that I'm aware of. There is a lot of lit on 'network coding' though, which covers many of these things.
< Mischa010>
great, im reading up on fibre now
< Mischa010>
thanks
< gmaxwell>
the writeup I made on stack exchange is probably one of the better ones.
< gmaxwell>
fibre gets pretty close to the best possible performance (consistently within a couple ms of the one way link delays between nodes)... there are a long list of potential improvements, but most of them end up making it work worse because they take more cpu time than they save. :(
< Mischa010>
ah yes i wasnt thinking at the packet level
< Mischa010>
an interface only sends one packet at a time right?
< Mischa010>
then my point is moot :p
< bitcoin-git>
[bitcoin] yunchiri closed pull request #15396: docs: make it clearer that posix mode must be used in WSL (master...patch-1) https://github.com/bitcoin/bitcoin/pull/15396
< bitcoin-git>
[bitcoin] practicalswift opened pull request #15413: tests: Add missing cs_main locks required when accessing pcoinsdbview, pcoinsTip or pblocktree (master...validation-cs_main-tests) https://github.com/bitcoin/bitcoin/pull/15413
< bitcoin-git>
[bitcoin] Sjors opened pull request #15414: [wallet] allow adding pubkeys from imported private keys to keypool (master...2019/02/keypool_import_private_keys) https://github.com/bitcoin/bitcoin/pull/15414
< fanquake>
jonasschnelli FYI, if the intention of the red & green squares in 15412 is to indicated working or not, you could also use :white_check_mark: & :negative_squared_cross_mark:.
< fanquake>
The names are a bit verbose.., but GH should also autocomplete for you
< bitcoin-git>
[bitcoin] MarcoFalke opened pull request #15419: qa: Always refresh rotten cache to be out of ibd (master...Mf1902-mocktime) https://github.com/bitcoin/bitcoin/pull/15419
< achow101>
provoostenator: write your 1's like sipa does. they look like upside down v's
< bitcoin-git>
bitcoin/master 7cee858 practicalswift: Add compile time verification of assumptions we're currently making implic...
< bitcoin-git>
bitcoin/master 9580190 Wladimir J. van der Laan: Merge #15391: Add compile time verification of assumptions we're currently...
< bitcoin-git>
[bitcoin] laanwj merged pull request #15391: Add compile time verification of assumptions we're currently making implicitly/tacitly (master...assumptions) https://github.com/bitcoin/bitcoin/pull/15391
< wumpus>
PSA: dev.visucore.com will be offline for a while (transferring service to a new domai)
< midnightmagic>
luke-jr: How so again? I think the HiFive board after the firmware deterministic build contest is all open isn't it?
< midnightmagic>
cjd: :-) hah funny
< luke-jr>
midnightmagic: doesn't it require non-free memory controller stuff?
< cjd>
:D
< midnightmagic>
luke-jr: the dram controller. I read an update on that somewhere, I think as part of the firmware contest update/winner announcement. hang on I'm having trouble finding it.
< midnightmagic>
luke-jr: I think that's all of it?
< midnightmagic>
luke-jr: so I think atm without open-source firmware for the SAS chip (for those that have it onboard) and.. mm.. the ethernet controller (which I think is being worked on) and of course the default chassis they ship with, I think the risc-v people might have pulled slightly ahead in the FOSS race. I could be wrong.
< sipa>
my dns seed is also back (was down for a few days)
< wumpus>
sipa: great!
< wumpus>
also made it update the version, apparently it was stuck at showing 0.15.99 (even though it was regenerating the docs from the correct branch)
< bitcoin-git>
[bitcoin] luke-jr closed pull request #11803: Bugfix: RPC/Wallet: Include HD key metadata in dumpwallet (master...bugfix_dumpwallet_hdkeypath) https://github.com/bitcoin/bitcoin/pull/11803
< gmaxwell>
oh, should I (or someone) make a PR moving forward assumevalid / chainwork now? It seems like we forget to do that, because it really needs to be done at release-candidate minus two weeks or so.
< meshcollider>
I.e. the key metadata hasn't been moved out of the "comment" section
< bitcoin-git>
[bitcoin] luke-jr opened pull request #15421: torcontrol: Launch a private Tor instance when not already running (master...tor_subprocess) https://github.com/bitcoin/bitcoin/pull/15421
< luke-jr>
provoostenator: ^ are you already working on boost::process detection?