< sipa> gmaxwell: if we use an encoding for the checksum that maps all hex characters and punctuation into the same "high 5 bits" of the 2-symbol encoding, we can essentially ignore the blowup of 1-char errors into 2-symbol errors
< sipa> as everything else (uppercase characters, lowercase above f) only occurs inside base58 things, which already have additional protection
< gmaxwell> sipa: nice!
< sipa> gmaxwell: in theory a GF(25) code would suffice for this
< sipa> we have exactly 25 characters that occur "unprotected" i think
< sipa> ()[]*/,'0123456789abcdefh
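The two-symbol idea can be sketched in Python. This is a minimal illustration, not the eventual design: the ordering of the non-basic characters (every other non-whitespace printable ASCII character, codes 33 to 126, in code-point order) and the names `ORDER`, `expand` and `symbol_errors` are hypothetical. Each character becomes a (high, low) pair of 5-bit symbols, and the 25 listed characters all share high symbol 0, so a substitution among them disturbs only one symbol.

```python
# The 25 characters sipa lists as appearing outside base58-protected parts.
UNPROTECTED = "()[]*/,'0123456789abcdefh"

# Hypothetical ordering: the basic characters first, then every other
# non-whitespace printable ASCII character (codes 33..126) in code-point order.
OTHERS = "".join(chr(c) for c in range(33, 127) if chr(c) not in UNPROTECTED)
ORDER = UNPROTECTED + OTHERS  # 94 characters in total


def expand(s: str) -> list[int]:
    """Map each character to two 5-bit symbols: (high, low) = divmod(index, 32)."""
    out: list[int] = []
    for ch in s:
        high, low = divmod(ORDER.index(ch), 32)
        out += [high, low]
    return out


def symbol_errors(a: str, b: str) -> int:
    """Count differing symbols between the expansions of two equal-length strings."""
    return sum(x != y for x, y in zip(expand(a), expand(b)))


print(len(UNPROTECTED), 32 - len(UNPROTECTED))  # 25 7: basic characters, spare slots
print(symbol_errors("0/1h", "0/2h"))  # 1: substitution inside the basic set
print(symbol_errors("a", "A"))        # 2: uppercase sits in a different group
```

With this particular ordering the first seven other characters (`!"#$%&+`) happen to land in the seven spare slots of group 0; which characters deserve those slots is exactly what gets bikeshedded later in the log.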
< sipa> though base32 is a bit easier to implement :)
< gmaxwell> sipa: and just alias the other characters near uniformly down to the unprotected ones?
< gmaxwell> You don't want ()[]*/,' in the checksum so you'd want to have alternative ones for those.
< gmaxwell> e.g. the checksum's charset would be different from the rest.
< sipa> gmaxwell: you can expand all data characters into two symbols
< sipa> you just don't care about cases where the second one differs
< sipa> and indeed, for the checksum we can just use the bech32 charset
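To show how the expanded symbols could feed a GF(32) checksum rendered in the bech32 charset, here is a sketch that borrows bech32's BIP173 generator as a stand-in. That generator produces 6 checksum characters and is tuned for short strings, whereas the discussion goes on to look for a longer code with different parameters, so this only illustrates the mechanics. The character ordering and the function names (`expand`, `add_checksum`, `verify`) are hypothetical.

```python
# Hypothetical ordering: the 25 basic characters first, then all other
# non-whitespace printable ASCII, so each character maps to (high, low) symbols.
UNPROTECTED = "()[]*/,'0123456789abcdefh"
ORDER = UNPROTECTED + "".join(chr(c) for c in range(33, 127) if chr(c) not in UNPROTECTED)

BECH32_CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
# bech32's generator (BIP173), used here only as a stand-in BCH code over GF(32).
GEN = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]


def expand(s):
    """Two 5-bit symbols per character: (index // 32, index % 32)."""
    out = []
    for ch in s:
        out += list(divmod(ORDER.index(ch), 32))
    return out


def polymod(values):
    """bech32's checksum remainder computation (BIP173)."""
    chk = 1
    for v in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ v
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GEN[i]
    return chk


def add_checksum(desc):
    """Append '#' plus 6 checksum characters so that verify() returns True."""
    mod = polymod(expand(desc) + [0] * 6) ^ 1
    return desc + "#" + "".join(BECH32_CHARSET[(mod >> 5 * (5 - i)) & 31] for i in range(6))


def verify(text):
    desc, sep, cs = text.rpartition("#")
    if not sep or len(cs) != 6 or any(c not in BECH32_CHARSET for c in cs):
        return False
    return polymod(expand(desc) + [BECH32_CHARSET.index(c) for c in cs]) == 1


s = add_checksum("wsh(multi(2,xpubA/0/*,xpubB/0/*))")  # placeholder keys, not real xpubs
print(verify(s))                           # True
print(verify(s.replace("/0/", "/1/", 1)))  # False: single-symbol error detected
```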
< provoostenator> sipa: sounds like you have 7 characters to spare then, let's bikeshed! ("-", ";", "$", "%", "&", etc. would be good for future extensions)
< provoostenator> For example, ranges normally don't make sense in descriptors, but one might have a setup with hardened derivations and only a limited range of hot private keys.
< provoostenator> A future extension could support ranges for that reason, so it's nice to have room for "-" in the checksum mechanism.
< gmaxwell> those sound okay, though % and & are less likely to survive being passed around on the web, and get mangled in html documents...
< gmaxwell> # is another candidate.
< gmaxwell> or ! (not very shell friendly, though # isn't perfect in that respect either)
< gmaxwell> | is a fine character too.
< booyah> gmaxwell: maybe not a big concern, but "!" is an absolute bitch to use in cli/bash
< provoostenator> "$" is also not ideal in shell, so yeah, being html and bash friendly adds some constraints.
< sipa> gmaxwell: for codes with length >24000 (about what we'd need for something containing 100 xpubs), 7 characters for distance 4, 10 characters for distance 5
< sipa> (this is algebraic distance; i can't analyze things exhaustively for this length)
< sipa> (and 1 character for distance 2, 4 characters for distance 3)
< sipa> i think 7 characters is fine; it will detect any 3 errors within the "basic 32 characters" or 1 error in and 1 error out, and has a random fail chance of less than 1 in 34 billion
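The "less than 1 in 34 billion" figure is the chance that a random corruption happens to match a 7-character checksum, 32^-7 = 2^-35 (treating errors beyond the guaranteed-detection range as random, which is an idealization). A quick exact check using Python's `fractions` module; `random_failure_chance` is a hypothetical helper name:

```python
from fractions import Fraction


def random_failure_chance(checksum_chars: int) -> Fraction:
    """Probability that an arbitrary corrupted string still matches: 32**-n."""
    return Fraction(1, 32 ** checksum_chars)


p7 = random_failure_chance(7)
print(p7)                         # 1/34359738368
print(1 / p7 == 2 ** 35)          # True: 7 characters carry 35 bits
print(random_failure_chance(10))  # 1/1125899906842624
```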
< sipa> actually, we can have a conversion that maps 2 characters to 3 symbols, increasing the maximum length
< sipa> oh, or even 3 characters to 4 symbols
< sipa> you can partition all non-whitespace ascii characters into 3 groups of up to 32 each, and then encode the 3 group numbers of each 3-character block into 5 bits
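The 3-characters-to-4-symbols conversion can be sketched as below. The grouping (basic characters first, so hex digits and punctuation stay in group 0) and the choice to emit the group symbol after its three position symbols are assumptions for illustration; `pack` and `unpack` are hypothetical names. The 94 non-whitespace printable ASCII characters fit in three groups of 32, and three group numbers take at most 3^3 = 27 values, so they fit in one 5-bit symbol.

```python
# Hypothetical ordering: basic characters first, then the rest of
# non-whitespace printable ASCII. index // 32 is the group (0, 1 or 2),
# index % 32 the position inside the group.
UNPROTECTED = "()[]*/,'0123456789abcdefh"
ORDER = UNPROTECTED + "".join(chr(c) for c in range(33, 127) if chr(c) not in UNPROTECTED)


def pack(s: str) -> list[int]:
    """One 5-bit position symbol per character, plus one symbol holding the
    base-3 group numbers after every 3 characters (and after a partial tail)."""
    syms: list[int] = []
    cls, count = 0, 0
    for ch in s:
        group, pos = divmod(ORDER.index(ch), 32)
        syms.append(pos)
        cls, count = cls * 3 + group, count + 1
        if count == 3:
            syms.append(cls)  # at most 3**3 - 1 = 26, fits in 5 bits
            cls, count = 0, 0
    if count:
        syms.append(cls)
    return syms


def unpack(syms: list[int], n: int) -> str:
    """Invert pack() for a string of known length n."""
    out: list[str] = []
    i = 0
    while len(out) < n:
        k = min(3, n - len(out))
        positions, cls = syms[i:i + k], syms[i + k]
        groups = []
        for _ in range(k):
            groups.append(cls % 3)
            cls //= 3
        groups.reverse()
        out += [ORDER[g * 32 + p] for g, p in zip(groups, positions)]
        i += k + 1
    return "".join(out)


s = "wsh(multi(2,xpubA/0/*,xpubB/1/*))"  # placeholder keys, not real xpubs
print(unpack(pack(s), len(s)) == s)  # True
print(len(pack(s)), 2 * len(s))      # 44 66
```

At 4/3 symbols per character instead of 2, the same symbol budget covers 1.5 times as many characters. For example, since an xpub is 111 base58 characters, 100 xpubs alone come to 11,100 characters: 22,200 symbols with the two-symbol expansion versus 14,800 with this packing.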
< gmaxwell> you can exhaustively analyze codes at shorter lengths to pick between them, so you should do that once you've found parameters that are okay for the longer lengths.
< sipa> right
< sipa> though up to what length?
< gmaxwell> (and at least pick a code that doesn't have a threshold-effect hump, which is less of an issue for longer lengths anyway)
< gmaxwell> I dunno, you've got a bunch of descriptor examples.
< gmaxwell> 2 of 3 multisigs are probably interesting.