< phantomcircuit>
gmaxwell, i did, the only place that was ok was digital ocean and only once i had gotten it explained to higher level support
< echeveria>
phantomcircuit: I had the instance limit on my DO account raised to an absurd level with a one sentence ticket "I WANT TO SCRAPE THE INTERNET". not a very high bar.
< shesek>
gmaxwell: thanks for the answer. in your opinion, is UIH silly to the point its not worth mentioning as part of a privacy analysis for transactions? and is the preference to spend more inputs when fees are low something that's implemented today?
< gmaxwell>
shesek: yes, bitcoin core will spend inputs a bit more when fees are low, though right now that behavior is kinda weak
< gmaxwell>
as in it only does it as part of the branch and bound analysis for changeless spends.
< gmaxwell>
(that isn't the only case where it can spend extra inputs)
< gmaxwell>
I just don't really understand the motivation of it as a privacy heuristic. extra inputs could either help privacy (e.g. spend all inputs connected to a given script pubkey, make change amount and payment amount harder to tell apart) or harm it (link otherwise unlinked scriptpubkeys, make change more distinguishable)
< gmaxwell>
depending on the specific case.
< gxd>
hello everyone!
< gxd>
Nice to meet you!
< shesek>
gmaxwell: the motivation is to give users and developers some better indication of potential privacy gotchas they might need to pay attention to. I understand, for example, that making p2ep/payjoin not trigger UIH-2 to make it blend in better with transactions produced by consumer wallets is something they're explicitly aiming for
< shesek>
if more wallets implemented coin selection algorithms that sometimes trigger UIH rather than aiming to minimize short-term fees, this heuristic would eventually become useless. but it seems that as things stand today, triggering UIH-2 does reduce your anonymity set, making it a useful analysis technique
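For readers who have not met the unnecessary input heuristic (UIH) discussed here, the sketch below shows one possible reading of it for two-output transactions. The exact definitions are an assumption (formulations of UIH-1 and UIH-2 differ between sources), and the function name `classify_uih` is made up for illustration. The underlying idea is that a wallet minimizing immediate fees would not add an input that the payment does not need.

```python
def classify_uih(input_values, output_values):
    """Classify a two-output transaction under an assumed reading of UIH.

    Returns "UIH-2" if the smallest input is unnecessary even when the
    largest output is the payment (not what a fee-minimizing wallet does).
    Returns "UIH-1" if it is unnecessary only when the smallest output is
    the payment, so for a fee-minimizing wallet the smallest output is
    probably the change. Returns None otherwise. Values are in satoshis.
    """
    if len(input_values) < 2 or len(output_values) != 2:
        return None
    fee = sum(input_values) - sum(output_values)
    # What the transaction could spend if the smallest input were dropped.
    without_smallest = sum(input_values) - min(input_values)
    if without_smallest >= max(output_values) + fee:
        return "UIH-2"
    if without_smallest >= min(output_values) + fee:
        return "UIH-1"
    return None


print(classify_uih([5000, 3000], [6000, 1000]))
print(classify_uih([5000, 3000], [4000, 3000]))
```

As the rest of the discussion points out, a match says something about the coin selection algorithm that produced the transaction, not necessarily about the user's behavior.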
< gmaxwell>
shesek: IIRC bitcoin core has always violated it from day one. just not all that often.
< sipa>
i believe that's correct
< shesek>
the preference to spend all utxos belonging to the same scriptpubkey in one go could be accounted for by opting transactions with multiple inputs of the same scriptpubkey out of UIH detection
< shesek>
this leaves UIH due to MIN_CHANGE and low fees period? are there more potential causes?
< gmaxwell>
shesek: yes, coin selection can just pick extra inputs.
< gmaxwell>
because the algorithm is probabilistic.
< shesek>
it would be interesting to test a bunch of transactions produced by bitcoin core and see what percent of them matches UIH-2. but I'm not using core to produce transactions, any thoughts on where one might be able to find a list of txids that are known to be core-originated?
< shesek>
what does your intuition tell you? would you expect them to be common? say, more common than 2%?
< shesek>
(assuming spending all utxos of the same address is accounted for and not considered as uih)
< gmaxwell>
I expect it depends a lot on the wallet's composition. if the wallet has a lot of really tiny inputs that are smaller than the payment amount then it's much more likely.
< gmaxwell>
I'd be surprised if it happened more than a few percent of the time except w/ pathological wallets
< gmaxwell>
oh well minchange makes it happen pretty often too
< shesek>
so not so common on a typical wallet that receives payments about the same size as he sends
< shesek>
minchange would also mostly kick into action with lots of tinyish inputs though, right?
< shesek>
s/sends$/sends?/ (was meant as a question)
< gmaxwell>
lots of small inputs is much more likely to find an exact (changeless) solution via branch and bound.
< shesek>
hmm, this brings up another heuristic I've been wondering about: change-less transactions as an indicator that the bitcoins possibly didn't change hands (by a user using "send max" to move to a new wallet, for depositing to an exchange, opening a lightning channel, etc). I'm aware that sophisticated coin selection algorithms attempt to avoid change if possible, but it still seems like it would require some non-negligible amount of luck and
< shesek>
wouldn't be all that common, except perhaps for places like casinos that have *lots* of very small inputs. my intuition is telling me that using "send max" or doing manual coin selection for exact self-transfers (like from cold to hot, where an advanced user might pick some specific utxos by hand and send them in full) would be far more common than the coin-selection algorithm being smart/lucky enough to be able to avoid change
< gmaxwell>
I'm working on updating my blocklists, if anyone would like to send me connection info from your ipv4 publicly reachable nodes, it would be helpful:
< shesek>
but my intuition might be broken :) what do you think? is this a reasonable heuristic to point out?
< gmaxwell>
shesek: bitcoin core manages to produce changeless payments quite often, so long as the wallet has many inputs.
< shesek>
what would you consider as many? thousands? hundreds? or even just a few dozens?
< gmaxwell>
shesek: it's much more likely than you think... it doesn't have to be exact because it can overpay by the amount of future fees it would use to spend the change, also can overpay by the fees it would spend to create a change output. if the number of inputs much smaller than the payment amount is larger than the number of bits in the amount, minus the number of bits in the amount it can overpay
< gmaxwell>
fees by, then there is probably a changeless solution, and bitcoin core will find it if there is one.
< gmaxwell>
dozens works.
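The changeless search described above can be sketched as a small depth-first branch and bound over the wallet's inputs. This is a simplified illustration, not Bitcoin Core's implementation: the names `find_changeless` and `probably_has_changeless_solution` are hypothetical, `tolerance` stands in for the amount the wallet is willing to overpay (the cost of creating and later spending a change output), and fees per input are ignored.

```python
def find_changeless(values, target, tolerance):
    """Return a subset of values summing to [target, target + tolerance], or None."""
    values = sorted(values, reverse=True)
    # suffix[i] is the total of values[i:], used to prune hopeless branches.
    suffix = [0] * (len(values) + 1)
    for i in range(len(values) - 1, -1, -1):
        suffix[i] = suffix[i + 1] + values[i]

    def search(i, total, chosen):
        if total > target + tolerance:
            return None  # bound: overshot the acceptable window
        if total >= target:
            return list(chosen)  # inside the window: no change needed
        if i == len(values) or total + suffix[i] < target:
            return None  # bound: remaining inputs cannot reach the target
        chosen.append(values[i])  # branch 1: include values[i]
        found = search(i + 1, total + values[i], chosen)
        if found is not None:
            return found
        chosen.pop()  # branch 2: exclude values[i]
        return search(i + 1, total, chosen)

    return search(0, 0, [])


def probably_has_changeless_solution(n_small_inputs, amount, tolerance):
    """Rule of thumb from the discussion: more small inputs than
    (bits in the amount) minus (bits in the allowed overpayment)."""
    return n_small_inputs > amount.bit_length() - tolerance.bit_length()


print(find_changeless([10, 7, 5, 3], 15, 0))
print(find_changeless([10, 7, 5, 3], 16, 1))
```

With a 20-bit amount and a 10-bit overpayment window, the rule of thumb asks for more than 10 suitable inputs, which is consistent with "dozens works".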
< sipa>
it's much easier to find a changeless input selection when the feerate is high
< gmaxwell>
I've seen it find changeless solutions in my own wallet, which doesn't have that many inputs.
< shesek>
I looked through my electrum history, couldn't spot a single changeless transaction except for manual coin selection ones
< sipa>
i doubt electrum has bab
< gmaxwell>
Weakling wallet software. :P
< gmaxwell>
Lacks advanced science power.
< gmaxwell>
:P
< shesek>
bitcoin core seems pretty smart about this. but, unfortunately, I don't think its actually being used to produce that many transactions, or does it? are there any estimates on that?
< gmaxwell>
shesek: pretty significant fraction of all, at least a year ago it was many times more than electrum.
< gmaxwell>
electrum gets used by a lot of small individual users that don't transact frequently...
< shesek>
oh really. interesting. I would've expected it to be dominated by custom software made for commercial usage by exchanges/mining pools/etc. even just 4-5 of the big exchanges using some custom software and it should easily be a majority of txs
< shesek>
so I'll do some more thinking and reconsider the privacy analysis regarding UIH and changeless transactions. maybe remove, maybe change the wording/colors. thank you greg and sipa for your time and feedback. if you have any other thoughts on what esplora should display regarding privacy (some other interesting heuristics I missed?), please do let me know :)
< shesek>
the ones I currently have are address reuse, round payment amounts, sending change to a different script type than the payment, UIH1/UIH2, and changeless transactions
< shesek>
and a (very naive) heuristic to find transactions that look like equal-output coinjoin and display a positive badge next to them :)
< gmaxwell>
Address reuse, 'round payments amounts', mixed output types have obvious privacy implications to me, the rest I still don't see how they relate to privacy.
< gmaxwell>
how does the output type thing handle > 2 outputs?
< shesek>
it currently doesn't attempt to, its only applied to transactions with exactly two outputs
< gmaxwell>
does it pay attention to input types at all for that?
< gmaxwell>
also does it treat all p2sh as equal typed?
< shesek>
it does. it checks that one output is of a script type that doesn't match any of the input's previous output's script type, while the other output does match at least one
< gmaxwell>
eventually p2sh will be spent and you'll know the type... so what would have looked the same before often will look different later.
< shesek>
p2sh is considered equal. I thought about looking for spends to find the preimage script, but didn't get into that for now
< gmaxwell>
that heuristic will pretty reliably misidentify change for many bitcoin core users right now. :)
< shesek>
oh really. how so?
< gmaxwell>
because many users have non-sw inputs, but then make a payment to a 1x address, and end up with a native sw change (in that case), ... the 1xx output is not their change. :P
< gmaxwell>
core will match change type for p2sh vs native, but not legacy (unless overridden).
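The script-type heuristic and the failure case just described can be sketched as follows. The function name and the string labels for script types are hypothetical, chosen only for illustration; the logic follows shesek's description (exactly two outputs, one matching an input type and one not).

```python
def guess_change_by_script_type(input_types, output_types):
    """Return the index of the output guessed to be change, or None.

    The guess is the output whose script type matches some input's type,
    when the other output matches none. Only two-output transactions
    are considered.
    """
    if len(output_types) != 2:
        return None
    seen = set(input_types)
    matches = [t in seen for t in output_types]
    if matches[0] == matches[1]:
        return None  # both or neither match: the heuristic says nothing
    return 0 if matches[0] else 1


# The case from the discussion: legacy inputs, a payment to a legacy ("1...")
# address at index 0, and native segwit change at index 1.
guess = guess_change_by_script_type(["p2pkh"], ["p2pkh", "p2wpkh"])
print(guess)  # the heuristic points at index 0, which is the payment
```

In this example the real change is the `p2wpkh` output, so the heuristic labels the payment as change.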
< shesek>
ah, I see, interesting
< shesek>
I didn't know core can be configured to match its change for payments to legacy scripts, very cool!
< gmaxwell>
electrum made some odd decisions with segwit deployment, e.g. forcing wallets to be native segwit or not segwit
< kallewoof>
gmaxwell: I can send IP info from public node. How would you like it sent tho?
< gmaxwell>
kallewoof: 0bin and an irc private message would be preferred. or an email.
< kallewoof>
sent
< gmaxwell>
kallewoof: thanks!
< shesek>
gmaxwell, change-less transactions due to finding a suitable set of inputs would normally have quite a few inputs, right? do you think it would be more useful if I only applied this heuristic to transactions of less than, say, 5 inputs?
< shesek>
1-2 inputs, 1 output transactions are quite common, and are very unlikely to be a suitable set of inputs for the intended payment amount
< shesek>
I can't see any way to parse a 1 in, 1 out transaction other than a self-transfer that didn't change ownership, can you?
< gmaxwell>
shesek: what matters more is how many inputs it has to choose from, not how many get used.
< shesek>
how many got used is also important, no? more inputs allows you more combinations
< shesek>
finding a combination of two utxos that matches the payment amount is harder than finding a combination of 4
< gmaxwell>
Yes, it's easier to find exact matches when there are more inputs. But e.g. 60 choose 3 already gives you more than 34 thousand possible combinations.
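The combination count is easy to check with the standard library. The wallet size of 60 inputs is the figure from the message above; the loop simply shows how quickly the number of candidate input sets grows with the number of inputs used.

```python
from math import comb

# Number of distinct input sets of size k that a 60-input wallet can form.
for k in (2, 3, 4):
    print(k, comb(60, k))
```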
< gmaxwell>
on change ownership, 1-to-1 probably isn't a payment, but would often be paying into a users account at an exchange with a shared wallet.
< gmaxwell>
which for taint analysis sorts of purposes is just a payment.
< shesek>
right, I'm lumping this as "didn't change ownership" (it technically did, from the user to the exchange under the user account, but its still "his"). the idea being that you normally care about the exact amount you're sending, unless you're sending this to yourself (or to someplace you believe/expect to be under your control in some way)
< shesek>
I would still consider this leakage of private information, even if it moved to a different wallet and not so useful for taint analysis
< gmaxwell>
How? if you rewrite an address on a single txout from one address to another what information did you leak? maybe that your computer was online at the time? -- but thats inherent to making any kind of transaction.
< gmaxwell>
Again, it sounds like you're basically writing detect bitcoin core and print nasty warnings about literally the only widely used wallet software that provides users with a decent degree of privacy at all.
< shesek>
well, I'm open to listening and trying to find ways to improve it :)
< gmaxwell>
Well I keep saying that I don't see how these things are related to privacy and I'm still not hearing a response.
< shesek>
gmaxwell, if it was to an address that was later known to be associated with an exchange, it would leak that you likely sold your bitcoins, rather than sent funds to someone else's exchange deposit address
< shesek>
another common case for no change is moving to a new wallet or selling off fork coins using "send max", which has obvious privacy implications
< gmaxwell>
There is no 'send max' in many wallets.
< sipa>
shesek: in the 1-input 1-output case it seems indeed more likely than not that no change of ownership was involved
< shesek>
UIH-1 is pretty effective at detecting the change output of transactions produced by wallets that minimize immediate fees without much long-term thought or smart coin selection
< sipa>
but if we're talking about multiple inputs... not so much
< shesek>
UIH-2 is pretty effective at detecting that a transaction was not produced by a short-term-fee-minimizing wallet software
< gmaxwell>
It's the absence of alternative explanations that makes an action identifiable, not the presence of a single articulable cause.
< hamma>
hello
< gmaxwell>
shesek: FWIW 20% of the donations to me that I see are 1-in-1-out.
< hamma>
anyone here
< gmaxwell>
(maybe people emptying out wallets? dunno.)
< gmaxwell>
so that page at the top recommends to improve privacy users should "Try to avoid creating change addresses" ... but then you want to display a warning for privacy loss when they do? :P
< shesek>
well, I display a warning that it seems like it might be a self-transfer. if they know that this was a payment and not a self-transfer, they'll know there's nothing to worry about. but if they just emptied their old wallet into the new one by using "send max", it'll give them a warning that could help them learn for the next time they do this
< shesek>
sipa, I think that you still have some pretty high likelihood even with 2-3 inputs, especially if you consider wallet software that's not as sophisticated as core
< echeveria>
I really caution against doing things like that. people will treat the absence of warnings as a sign of safety, which we know not to be true.
< shesek>
I would really love to run an analysis and get some numbers >_<
< gmaxwell>
shesek: so what happens if there are other outputs to the same spks that are spent from? clearly that wasn't a "send max".
< shesek>
echeveria, the message shown when there's nothing to show is "This transaction doesn't violate any of the privacy gotchas we cover. Read on other potential ways it might leak privacy.", with a link to https://en.bitcoin.it/wiki/Privacy#Blockchain_attacks_on_privacy
< gmaxwell>
shesek: actually thats something you could note? incomplete spend. E.g. if there are more unspent outputs to the same scripts being spent which weren't spent.
< shesek>
yes, I definitely could, and turn off the "possibly self-transfer" message when that happens.
< gmaxwell>
any incomplete spend ultimately hurts privacy, because its one of the main causes of taint snowballing.
< shesek>
ah, I see, so an explicit message about not spending all available utxos in one go... yes, definitely :)
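The incomplete-spend check suggested here could look roughly like the sketch below. The data model is an assumption made for illustration: `utxos` maps `(txid, vout)` outpoints to the scriptpubkey they pay, for everything unspent just before the transaction, and `spent` is the set of outpoints the transaction consumes. The function name is hypothetical.

```python
def unspent_siblings(spent, utxos):
    """Return outpoints left unspent that pay to a script the transaction spent from."""
    spent_scripts = {utxos[outpoint] for outpoint in spent}
    return sorted(
        outpoint
        for outpoint, spk in utxos.items()
        if spk in spent_scripts and outpoint not in spent
    )


utxos = {("aa", 0): "spkA", ("bb", 1): "spkA", ("cc", 0): "spkB"}
print(unspent_siblings({("aa", 0)}, utxos))
```

A non-empty result means the spend was incomplete, which per the discussion both rules out a "send max" and is worth flagging on its own.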
< shesek>
noted!
< echeveria>
shesek: my comment stands, really. just by seeing the message they leaked their own privacy.
< shesek>
by visiting a public block explorer you mean?
< gmaxwell>
I agree that electrum's "send max" is privacy hurting, but changeless isn't an especially strong heuristic to detect it unfortunately.
< echeveria>
shesek: the best block explorer is a search box that tells you you're a moron as soon as you click in a box labeled "enter TXID".
< sipa>
of course, the most important thing to put on a blockexplorer is "Warning: looking up your own addresses on a blockexplorer leaks your privacy to the site operator"
< shesek>
echeveria, well, esplora is open-source, you could self-host it :)
< gmaxwell>
yea, use of the explorer (or electrum in the first place) is a bigger privacy loss than "send max".
< shesek>
gmaxwell, its not just "send max" though, its also users doing manual coin selection for transfers between their own wallets
< gmaxwell>
just always display "privacy note: this address has been looked up on a public explorer" :P
< shesek>
I've done this countless times
< sipa>
shesek: sure, just don't include the warning in the open source version :)
< gmaxwell>
just have a config flag, "public_explorer=1"
< gmaxwell>
:P
< gmaxwell>
shesek: sure but my point above remains: its not the existence of a pattern that makes something a privacy loss, its the absence of alternative explanations.
< shesek>
gmaxwell, no way, "send max" for a wallet with hundreds or even dozens of transactions is probably the worst thing one could do, much worse than using an explorer
< sipa>
shesek: you can't compare those things
< shesek>
this is especially common around forkcoins selloffs
< gmaxwell>
shesek: if you run electrum at all (e.g. have that send max button) then EVERY time you start it, the software is connecting to random hosts on the internet and sending your complete address list to them.
< gmaxwell>
it's more or less a total privacy loss.
< shesek>
I'm running electrum with EPS and oneserver :)
< gmaxwell>
okay you're weird. but you know that isn't true for more than a tiny percent of users. :)
< shesek>
and "send max" is available on nearly all consumer wallets that I know of, its a super useful feature that users are asking for
< shesek>
its not just an electrum thing
< gmaxwell>
Similarly going and looking up your transactions on most explorers is linking your IP to it, browser tracking cookies etc. And at least some sell the data to third parties.
< echeveria>
more than that. just the simple act "someone is interested in this transaction" is insane information.
< sipa>
shesek: they're both bad... but you can't compare them; linkage with an IP address is terrible in some cases and more or less harmless in others
< shesek>
right, but you'll be linking the few addresses you're looking up at the moment, and the link is just that you looked at them, not that they're yours
< gmaxwell>
shesek: and every one of those wallets sends all their addresses to a third party server on start, though some of them its just one server instead of anonymously run internet ones.
< shesek>
"send max" links all your wallet addresses together, on a public blockchain, for all eternity
< sipa>
shesek: but maybe they were already clearly linked together for other reasons
< echeveria>
doesn't matter. the fact that someone cares at all about a transaction makes it hugely interesting.
< sipa>
i am super happy that there is a decent explorer now for debugging stuff out there
< sipa>
but i'm concerned about making it sound like it's an actual production tool
< sipa>
i know people will use explorers, and one that gives good information is better than one that confuses everything
< shesek>
echeveria, it is interesting, but the link is not as strong as when linking addresses together as inputs of a tx. I'm not defending using public block explorers, just saying that there might be some worse things :)
< sipa>
but really... we shouldn't encourage using the
< gmaxwell>
shesek: an actual sendmax would also spend all inputs that were cojoined siblings from prior spends... and that would essentially never happen from a BNB changeless spend.
< gmaxwell>
except when there just wasn't any linked history at all.
< sipa>
like... if this privacy detection feature causes people to go look up all their transactions because of a gamification like feeling "oooh let's see how my transaction did here?!", it's probably a net negative...
< sipa>
(only talking about widely used public instances here)
< shesek>
gmaxwell, sorry, BNB?
< gmaxwell>
I'm concerned with it being like the "bitcoin privacy project" which had a bunch of spurious privacy unrelated ratings that caused it to derate the only options that had remotely good privacy (bitcoin core and armory) and rate over them a dozen wallets that sent all the users addresses to third parties.
< sipa>
shesek: branch and bound, the algorithm we use for changeless input selection
< shesek>
ah, okay. and what does cojoined siblings mean?
< shesek>
like, outputs of the same transaction?
< shesek>
probably not, because a user creating multiple outputs to himself in the same tx is quite unlikely
< gmaxwell>
shesek: like if tx1 spends spk A, B, C, D. and then tx2 spends C, D, E, F then spks a-f probably belong to one user (absent coinjoins)
< gmaxwell>
(and there is a related heuristic that you can do if you identify change, but its far less reliable)
< echeveria>
sipa: it's definitely being promoted as "use this tool to evaluate your privacy", which is sort of in line with a "enter your credit card to see if it has been stolen" sort of thing. perhaps not in this case, but for sure something not to be promoted.
< gmaxwell>
A send-max will spend any spendable outputs to A-F.
< gmaxwell>
If something spends coins to E, F, G, H but there are also coins paid to B.. it's almost certainly not a send-max.
< gmaxwell>
it might be some kind of manual selection payment or an exact match payment.
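The A-F example above can be sketched with a small union-find over scriptpubkeys that are spent together, followed by the send-max check. This is an illustration under stated assumptions: the function names are hypothetical, each transaction is reduced to the list of scriptpubkeys it spends from, coinjoins are ignored, and `funded_scripts` is assumed to be the set of scripts still holding unspent coins after the transaction.

```python
def cluster_scripts(transactions):
    """Group scriptpubkeys that were ever spent together in one transaction."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for spent_scripts in transactions:
        if not spent_scripts:
            continue
        root = find(spent_scripts[0])
        for spk in spent_scripts[1:]:
            parent[find(spk)] = root  # merge into the first script's cluster

    clusters = {}
    for spk in parent:
        clusters.setdefault(find(spk), set()).add(spk)
    return list(clusters.values())


def could_be_send_max(spent_scripts, clusters, funded_scripts):
    """False if a script linked to the spent ones still holds coins."""
    spent = set(spent_scripts)
    linked = set(spent)
    for cluster in clusters:
        if cluster & spent:
            linked |= cluster
    return not ((linked & set(funded_scripts)) - spent)


clusters = cluster_scripts([["A", "B", "C", "D"], ["C", "D", "E", "F"]])
print(clusters)
print(could_be_send_max(["E", "F", "G", "H"], clusters, {"B"}))
```

Here A through F end up in one cluster, and the spend from E, F, G, H is ruled out as a send-max because B still holds coins.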
< echeveria>
if anyone wants data, looking at the thefts from Electrum are good examples. it involves "send max" from around 500 different user wallets from $100 up to $100,000 in various formulations of outputs.
< shesek>
echeveria, being promoted by whom? I see this more as a way to reach the attention of people already using the block explorer regardless
< sipa>
i do like that the detected patterns are just links to the wiki explaining it, and not just a good/bad rating
< shesek>
but also, I'm not seeing him promoting users to go and check their own transactions proactively to get the ratings, just that it helps improve their privacy (in my view, by giving them this information when they go to the block explorer either way)
< shesek>
gmaxwell, but I think he meant specifically the text by samson? my text says even less
< shesek>
gmaxwell, do you think that displaying privacy analysis information is a good idea but the heuristics need some adjustments, or is it a misguided effort in your view?
< shesek>
I'm getting the feeling you're somewhat negative about all this, genuinely interested in understanding your position
< gmaxwell>
I think it's probably useful, but care has to be taken to not give misleading results... and having people going to a public explorer and pumping in their addresses is a really bad pattern, and I'm not sure how to discourage that.
< shesek>
the reason I came here asking for questions is exactly that care :)
< echeveria>
first and permanent item on the list: you just messed up by ending up on this page.
< midnightmagic>
:-/
< shesek>
I do want to do this right and am very open to feedback
< gmaxwell>
Well I still can't see how the UIH is essentially anything other than bitcoin core (derived code) detection. And I think that systematically putting up privacy warnings on the by far most private option (in spite of its other costs) is really harmful.
< gmaxwell>
shesek: as far as the going and typing in your txn to check your privacy, it could work by only showing the privacy flags on the full block view... but I somewhat doubt that would help because people will still first try to lookup by address/txid.
< gmaxwell>
(so my advice there, submit patches to electrum/core/ga to provide the privacy notes :P )
< shesek>
UIH 1 or 2? UIH 1 is quite effective with most typical consumer wallet software. and you can check for some fingerprints first to try and rule out bitcoin core transactions (say, fee sniping nlocktime)
< gmaxwell>
(and if you want to do a wallet type detection, that seems worthwhile, but it should be that, and not in triggering spurious privacy warnings)
< gmaxwell>
I don't understand how UIH is privacy related at all.
< gmaxwell>
I get that its a heuristic that electrum or whatever won't violate.
< shesek>
UIH-2 is effective as an heuristic for "produced by core or by some other non typical consumer wallet software"
< shesek>
I can understand how it might not be effective, but how come they're not even related?
< sipa>
as a neutral message "This transaction matches a pattern that is not common in all wallet software; software that is known to produce this type includes X, Y, Z, ..."
< gmaxwell>
if your only interest in it is detecting the wallet software thats fine, but then just show a wallet software estimate.
< gmaxwell>
There are much stronger identifiers of the wallet software.
< sipa>
tx version, anti fee sniping locktimes, low r grinding, ...
< gmaxwell>
support for mixed sw and non-sw inputs. ability to pay to various output types, multisig...
< shesek>
its not necessarily about which wallet it was, just the fact that it was a non-typical-consumer-wallet is useful too
< echeveria>
as a historical reminder, this is the sort of thing that blockchain.info had on their site, "taint analysis" which gave a series of important sounding but utterly meaningless pieces of information about a transaction. it was used by themselves and others to sell snake oil, because of course the underlying heuristic was easily disrupted without meaningful change in privacy, or potentially even a decrease.
< gmaxwell>
yea, that taint analysis thing was pretty much an rng. :P
< sipa>
shesek: i agree that leaking what software you're using is a leak
< gmaxwell>
shesek: the end effect is you get some weird privacy warning on a small but non-trivial percentage of bitcoin core txn, ... but how does this help anyone? it just spreads fud. The txn are usually identifiable through other characteristics.
< sipa>
but if there is software X which never does A, and software Y which does, but randomly based on unobservable properties of the wallet... you can't say this leaks information about your behavior... just about the software you're using
< gmaxwell>
indeed, it's a leak but it's a pretty well defined one, and a really hard to avoid one (esp if everyone isn't suicide packed into never improving)
< sipa>
but saying "this transaction is identifiable as being produced by software X" is exactly right
< echeveria>
there's a lot more of those than you've mentioned. UTXO selection is a privacy leak, just by the differences in the way wallets do it. there are some BIP proposals which try to define canonical forms for transactions, but they manage to be inept in their description to the point they can't be implemented. it means that any wallets that did implement them have yet another bit of definition about what created them, rather than less.
< gmaxwell>
So I wouldn't see any issue with giving an estimate of the originating software, --- thats a well defined thing which users should know is usually detectable.
< gmaxwell>
But giving little warning dings instead is fuddy.
< gmaxwell>
also adding extra inputs itself is good for the network in general, and can be good for privacy if done right. But if this thing is printing a warning on it it'll be harder to get other wallets to do it.
< gmaxwell>
which then prolongs it being an identifier, which is bad to whatever extent being identifiable is bad.
< gmaxwell>
it also doesn't give a baseline.
< shesek>
re "(esp if everyone isn't suicide packed into never improving)" - for a wallet that wants to maximize its anonymity set, it makes sense to use characteristics that are as common as possible, even if its less ideal for other reasons. for example, payjoin are intentionally trying to avoid uih-2 to enjoy a bigger anonymity set. and some of the arguments against bip69 lexicographical ordering were on a similar basis, that wallets that do
< shesek>
implement it will stand out in the transition period
< gmaxwell>
Imagine that there are three wallets -- A, B, C -- which are utterly indistinguishable but combined add to just 3% of users. Then there is another wallet with 25% of the users, D. You can't distinguish the software for A/B/C but you can identify D. Yet D has a much larger anonymity set.
< shesek>
it could perhaps be displayed differently, just as a note rather than a "warning" sort of thing
< shesek>
I already did split this up into red and orange messages, where the red ones can be improved by changing user habits or wallet software, and orange are the "we kinda have to leave with them"
< shesek>
ugh, live
< gmaxwell>
bip69 also just didn't add anything in and of itself, it's not like there was a "this is much better but its inconsistent so don't do it"
< shesek>
I could actually just analyze the whole history for UIH-2 and see what % of transactions is matched by it
< shesek>
is there a % under which you would consider this a valid heuristic?
< echeveria>
shesek: wallets aren't even slightly homogeneous even today. there's so many indicators of what wallet software is being used. did you know that you can fast poll fee-rate APIs and correlate wallet transactions that way? the value of a feerate is typically pretty unique.
< gmaxwell>
A valid heuristic for what though? I believe the _only_ information it provides is that the transaction was not produced by one of the pieces of software that will never include additional inputs. But for most transactions you can already tell that from other factors, so for those transactions it tells you precisely nothing additional.
< shesek>
gmaxwell, and if the "pieces of software that will never include additional inputs" account for, say, 85% of transactions on the network, and including additional inputs puts you in a 15% minority, wouldn't that be quite harmful to privacy?
< gmaxwell>
no! because other properties of the transaction already identify the source software.
< shesek>
echeveria, its not about which wallet it is, more of a boolean "is it a typical immediate-fee-minimizing consumer wallet software" thing. this also helps with analyzing change outputs, as you can follow up on chains and see that some outputs are later spent by non-consumer-wallet software, which gives away this wasn't the change
< gmaxwell>
also virtually all consumer wallets have no privacy at all because they phone home their address lists, so it would be really odd to say "this txn has poor privacy because we can detect it coming from one of the only widely used solutions that has any privacy at all"
< shesek>
gmaxwell, do you think payjoin should not make an explicit effort to avoid triggering uih-2?
< gmaxwell>
shesek: you can already often identify the wallet from other criteria regardless, nlocktime, rbf, fee levels, the size of signature r values, use of script types, mixture of script types, supported output types.
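A few of the criteria listed above can be read straight off a transaction. The sketch below only extracts observable features and does not attribute them to any particular wallet. The `tx` dict layout and the function name are hypothetical; fee levels and signature R sizes are left out because they need more context than a single parsed transaction.

```python
def wallet_fingerprints(tx):
    """Extract a few wallet-identifying features from a parsed transaction.

    tx is assumed to be a dict with keys: version, locktime,
    sequences (one int per input), input_types and output_types.
    """
    return {
        "version": tx["version"],
        # Locktimes below 500,000,000 are interpreted as block heights.
        "locktime_is_block_height": 0 < tx["locktime"] < 500_000_000,
        # BIP125 opt-in: any input sequence below 0xfffffffe signals RBF.
        "signals_rbf": any(seq < 0xFFFFFFFE for seq in tx["sequences"]),
        "mixed_input_types": len(set(tx["input_types"])) > 1,
        "output_types": sorted(set(tx["output_types"])),
    }


example = {
    "version": 2,
    "locktime": 560000,
    "sequences": [0xFFFFFFFD, 0xFFFFFFFD],
    "input_types": ["p2pkh", "p2wpkh"],
    "output_types": ["p2wpkh", "p2pkh"],
}
print(wallet_fingerprints(example))
```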
< shesek>
right, and this is one more of these heuristics... for every specific heuristic you look at, one could say "but look at all these other heuristics, why use this one?"
< gmaxwell>
shesek: I think never breaking UIH-2 probably gives it a smaller anonymity set, though its hard to tell. To make that determination you need to estimate a wallet distribution on the network, and then determine which of them will violate UIH-2 sometimes (but rarely)
< gmaxwell>
(and thats ignoring that enforcing UIH-2 would mean that you could just payjoin less often, which is sure to be a privacy loss)
< gmaxwell>
And also avoiding UIH-2 normatively would almost certainly be a mistake, since as wallets improve more will violate it.
< gmaxwell>
shesek: as far as "why use this one" -- I don't see anything wrong with displaying an estimate of what software is in use. What I'm complaining about is picking a single criteria and sticking a notice on it as if it were a privacy problem when it's really just a weak signal of the source software among many.
< gmaxwell>
it would be like sticking a warning on BIP69 txn. They're a minority of transactions so in that sense they hurt the user's privacy.
< gmaxwell>
But I don't think it should have a warning because it's not a privacy problem except leaking information about the source software that is probably leaked in a dozen other ways too.
< shesek>
well, its not just a single criteria though, there are 7 now and I'm looking to add more
< shesek>
how about if it looked less like a warning and more like informational text?
< gmaxwell>
or closer to home, you could put a nice little warning on blockstream GA transactions, they're super identifiable due to their scripts, and only a very small percentage.
< shesek>
oh yes, can definitely identify weird script patterns and notify about them
< shesek>
like, displaying a message if the script pattern is used by <X% of transactions sounds very reasonable to me
< gmaxwell>
I think this is likely to hurt users by randomizing preferences.
< shesek>
I mean, until we have taproot, using something like GA does have a very real privacy cost
< gmaxwell>
the problem is that singling out practices makes them seem bad. even if in fact they are massive privacy improvements, long run, and we're just waiting for everyone to catch up.
< shesek>
but you agreed earlier that mixed output type is a real concern, and this is also caused because we're waiting for everyone to catch up
< gmaxwell>
The underlying leak is that the software is identifiable, but that is (hopefully) well known, but it wouldn't hurt to make it more well known. But it also isn't going away any time soon.
< gmaxwell>
mixed output types are not just catchup though they're partially that. I mean they'll continue to exist for the foreseeable future, the only thing that will reduce that at all is really taproot, and then only if Musig style multisig becomes ubiquitous (which is doubtful, because of accountability and additional rounds).
< shesek>
I mean, I did know that I was going to be flagging a large % of transactions made by segwit wallets with this "sending to a different script type" message. it felt bad knowing that I'll do that, but being an early adopter of new technology does have a real negative effect on privacy, and I think users should be made more aware of this
< gmaxwell>
So you think it's better to pressure users to use wallets that send all their addresses to third parties in order to avoid a privacy warning that effectively only means that the user is using software that better preserves privacy?
< gmaxwell>
I'm exhausted by this conversation.
< gmaxwell>
I think estimating the software and giving some kind of probability on that would be interesting and helpful. I think singling out by-themselves-privacy-irrelevant aspects of particular software as privacy problems is not helpful, because it will actually discourage improving privacy... and if in doing so it manages to warn more about software that is actually more private (perhaps for reasons that aren't even visible from the transaction) then that would do users and the ecosystem a terrible disservice.
< gmaxwell>
(though I think the best advice to users is that their software is always identifiable. I think in practice it pretty much always is.)
< gmaxwell>
(either through the transactions or from network analytics)
< shesek>
I think its better to provide tools and information to help users make educated decisions. hiding the fact that early segwit adopters leak more information about their change outputs because users might misunderstand what this means and use worse non-segwit wallets is not a solution, imo. better education and better tools that make more information accessible to the user are
< shesek>
I do agree that the information could be presented better - maybe some more metrics, make some of them look less like warnings and more like informational text, etc. maybe re-organize it and display some of the metrics outside of the "privacy analysis" area.
< gmaxwell>
that argument would be stronger if your coverage were actually even.
< shesek>
even?
< gmaxwell>
consistent. unbiased. e.g. all through our earlier discussion, you singled out behaviors of bitcoin core because they aren't done by electrum, yet bitcoin core is responsible for a much larger share of transactions, so you're literally singling out the larger anonymity set.
< shesek>
not specifically electrum, quite a lot of other wallet software that I've had experience with. but yes, agreed, I didn't appreciate how good bitcoin core was at avoiding change, and didn't realize that it might break UIH-2 quite often
< shesek>
I will run some analysis on historical block chain data, there's a lot that can be learned from it. one thing that I'm interested in is the percent of two outputs transactions that match UIH-2. will come back with numbers :)
< gmaxwell>
You might want to look over time, you may see the release when bitcoin core wouldn't violate UIH-2.
< gmaxwell>
(you can also see that period on graphs of the utxo set size. :( )
< shesek>
re "I think estimating the software and giving some kind of probability on that would be interesting and helpful", you don't feel this would be "helping the other side" too much? doing some basic heuristics that are obviously done by every blockchain spying company in existence is one thing, getting smart about this and making some not so easily obtainable private data public is getting into murky territory
< shesek>
I'm not here to develop sophisticated spying tools :)
< shesek>
but, I don't know, its hard to tell where to draw the line. one could definitely argue that its better for everyone to have these tools if some people have them
< gmaxwell>
well you could apply this to any of this but I think the reasoning is that esp with 100% public data, stuff like these determinations are things that anyone remotely competent could do.
< gmaxwell>
you could do something like take all the properties that identify the wallet, stick them into a generic clustering tool. and then just return a probability for cluster1/cluster2/cluster3/cluster...n.
< gmaxwell>
and a metric of how common that pattern is like the panopticlick metrics.
< gmaxwell>
which would be more useful for users in seeing privacy levels but less useful for tracking particular people.
< gmaxwell>
and also easier to maintain, as you don't need to go looking at how wallets work and how they change over time.
< gmaxwell>
just take a bunch of features (inputs/outputs ordered?, nlocktime, oorbf, uih, script types, signature sizes,... ) and feed it to a k-medians or whatnot.
< shesek>
what is the oo in oorbf?
< gmaxwell>
opt in rbf... it's a flag set in the sequence numbers. right now I think electrum is the only widely used wallet that sets it by default (ga does too, IIRC)
< gmaxwell>
bitcoin core can but its a setting.
< gmaxwell>
though also some services that run core set it, so it might be a non-trivial percentage of core txn even though its not a default.
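For reference, the opt-in RBF flag discussed here lives in the input sequence numbers: under BIP125 a transaction signals replaceability when at least one of its inputs has an nSequence below 0xfffffffe. A minimal sketch of that check follows; the function name and the plain-integer input list are illustrative assumptions, not any wallet's actual API.

```python
# BIP125: a transaction opts in to replacement when any input's
# nSequence is less than 0xfffffffe (i.e. at most 0xfffffffd).
MAX_BIP125_RBF_SEQUENCE = 0xFFFFFFFD


def signals_opt_in_rbf(input_sequences):
    """Return True if any input sequence number signals BIP125 replaceability.

    `input_sequences` is a hypothetical list holding the nSequence value of
    each input of one transaction, as plain integers.
    """
    return any(seq <= MAX_BIP125_RBF_SEQUENCE for seq in input_sequences)
```

A feature extractor for the clustering idea could use this as one boolean dimension per transaction.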
< gmaxwell>
The idea with clustering is that you can basically give every transaction a position in n-dimensional space (all the flags), and then estimate what percentage of transactions are similar to it vs far from it.
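A rough sketch of the "position in n-dimensional space" idea, assuming each transaction has already been reduced to a tuple of 0/1 flags. The feature order, the sample data, and the Hamming radius are all illustrative assumptions, and this computes a simple neighbourhood share rather than running k-medians or any other real clustering algorithm.

```python
def hamming(a, b):
    """Number of feature positions where two equal-length flag tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def similar_share(tx, population, radius=1):
    """Fraction of `population` that lies within `radius` flag flips of `tx`."""
    if not population:
        return 0.0
    near = sum(1 for other in population if hamming(tx, other) <= radius)
    return near / len(population)


# Hypothetical feature order: (bip69_ordered, nlocktime_set, opt_in_rbf, uih2)
population = [
    (0, 1, 0, 0),
    (0, 1, 0, 0),
    (0, 1, 0, 1),
    (1, 0, 1, 0),
]
share = similar_share((0, 1, 0, 0), population)
```

A low share means the transaction sits in a sparse region of feature space, which is the panopticlick-style "how common is this pattern" metric rather than a statement about which wallet produced it.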
< shesek>
I know what is opt in rbg, but why two oo? never seen it written as "oorbf", google didn't bring up anything either
< shesek>
* rbf
< gmaxwell>
oo was intended to be oi -- the keys are next to each other on qwerty. Sorry. :P
< gmaxwell>
(or maybe I was thinking opt-out who knows)
< shesek>
oh okay :)
< shesek>
yeah, this definitely makes sense, I've done some work with similar clustering before. but I'm still not quite sure if I feel this falls under something "anyone remotely competent could do", this could make some information that's not so trivial much more trivial
< shesek>
even if its based just on public data, I still feel the line can be crossed
< gmaxwell>
What I don't think would help the public though is if you were going and reading the code for a dozen wallets to extract their behavior, that's a lot of custom work... don't give that to the bad guys for free.
< gmaxwell>
stats on well known data is something I can go on freelancer and pay someone to do though.
< shesek>
do you think that the data being public is a sufficient criteria for this being moral? I think the amount of effort, experience and creativity that was devoted to it should count too
< shesek>
like, if someone devoted a year of a team of developers to building tools that extract private information from public blockchains, and setup a website making all of this information publicly available, I would consider him to be on the bad side
< shesek>
maybe easier to think about this as "overall negative or positive effect on bitcoin users' privacy" than in morality terms
< gmaxwell>
no, I don't think the data being public is the line.
< wumpus>
i think i disagree, sometimes it take a public statement like that to show people how information is available, the "bad guys" will be doing the same but without letting anyone know, of course
< gmaxwell>
but for example, if you already know that commercial anti-privacy services are doing it, I think it's likely pretty beneficial to make it public.
< shesek>
right, so no state-of-the-art stuff, only things you believe are likely already being done by competent companies
< gmaxwell>
Right.
< shesek>
I guess this could be a good line
< wumpus>
and you never know that
< wumpus>
if you can think of it, someone at the NSA probably thought of it too
< gmaxwell>
Also, if you can think of the idea and get someone on fiverr to implement it, that's less harmful than something where a 5 year bitcoin expert has to spend three weeks extracting behavior patterns from codebases.
< gmaxwell>
Also depends on what attacker we're thinking about. Like, the state level attackers if they don't know essentially anything we do here, it's only because they don't care.
< gmaxwell>
the commercial anti-privacy services seem to have kind of mixed competence.
< gmaxwell>
like I can imagine that some have never realized you could use UIH to distinguish some wallet software.
< wumpus>
that's definitely true
< gmaxwell>
and I've seen ones make mistakes like "output type matches input scripts, that's certainly change" and as a result get all kinds of messed up conclusions. :)
< gmaxwell>
I've typed a couple other things here and erased them to not give people ideas. :P
< wumpus>
indeed, provoostenator wrote about those kind of things, it's kind of worrying, false positives are the most scary especially when they're used to decide who to prosecute or rob or kill etc
< gmaxwell>
in any case, you can always ask the question "how does doing this thing make a better world, how does it make a worse one"
< gmaxwell>
shesek: so another criteria is that you might have some good indicator but if its not actionable by users, it's not as interesting to show... as another one where the user could just change what they're doing.
< shesek>
but I think the orange stuff should mostly be changed to non-warnings, just informational text, some of it maybe outside the privacy section
< shesek>
and some of the texts could use some corrections/clarification
< shesek>
I'm off, going to get some food. thanks for the interesting chat, I learned quite a bit. I'll definitely look for ways to improve based on your feedback. will also run an historical analysis and get back with some numbers. cheers :)
< gmaxwell>
When should we start thinking about making bech32 the default -addresstype ? We've been able to send to bc1 addresses since 0.16.0 which was released feb 2018.
< gmaxwell>
Should it be a goal for 0.19? 0.20?
< gmaxwell>
Bitcoin core makes it pretty easy to get other addresses when you need them for compatibility, fortunately.
< gmaxwell>
when satoshi completely broke compatibility with the p2p protocol, he gave two years time for people to upgrade. On one hand, that was a much harder break, on the other hand, it was a much smaller ecosystem at that time.
< gmaxwell>
I'm somewhat disappointed to see that some parties that I've identified in the past as tech leaders still don't have sending support... which is an argument against rushing.
< gmaxwell>
But maybe announcing a plan to default by version X might help some parties prioritize.
< instagibbs>
a hotter take would be to make segwit v1 unenforced inside p2sh
< instagibbs>
bad idea for money-loss reasons :)
< gmaxwell>
I dunno if it would be likely to result in money loss.
< gmaxwell>
There is a fine line between encouragement and arm bending though.
< gmaxwell>
if it's taking e.g. ledger >1 year to add support, maybe more encouragement is required.
< sdaftuar>
at what point do you think we can nuke getblocks support, and reasonably assume that peers should be using getheaders instead?
< gmaxwell>
Uh, instrument a node and see if anyone is using it now?
< gmaxwell>
(I'm not saying now, but even knowing when would require knowing what use is now)
< gmaxwell>
I hadn't been thinking we would deprecate responding to it, but if its not being used anymore then maybe.
< sdaftuar>
i found a bcoin:1.0.2 peer that seems to be using it :( somehow that peer also supports compact blocks(?!)
< gmaxwell>
okay well there are like three of those on the network, and AFAIK that software is only maintained for bcash anymore, so that wouldn't be a hard blocker, I think.
< sdaftuar>
i was looking at some net stuff that i'd like to refactor a bit to make things more efficient for blocksonly peers (i think i mentioned to you my idea of adding blocksonly edges to the network). would be nice to ditch some of this old unused code.
< gmaxwell>
also careful with believing subver, there is a bunch of stuff that lies. (though in this case I'd believe it)
< pinheadmz>
gmaxwell: I'm a dev on bcoin team, i can tell you bcash is not being supported. bcoin is our main focus
< sipa>
a surprising error: i can't run the bitcoind unit tests simultaneously by two different users on my system, as they both try to create a /tmp/test_bitcoin owned by themselves
< pinheadmz>
sdaftuar: can you explain a bit more the behavior you're seeing from that peer? In my own research I notice bcoin does send getblocks instead of getheaders, unless checkpoints are enabled
< sdaftuar>
pinheadmz: i'm just looking at the stats for data received by p2p message type, and i noticed that for a peer claiming to be bcoin 1.0.2 there are only getblocks messages, and no getheaders messages, being received by my node
< sdaftuar>
i also notice plenty of other normal-looking traffic from that peer (inv, cmpctblocks, headers, etc)
< pinheadmz>
when I test against Bitcoin Core, I noticed Core sent getheaders first, then getdata for the actual blocks
< sdaftuar>
yes, that makes sense to me
< pinheadmz>
bcoin does support compact blocks, why is that a (?!) :-)
< sdaftuar>
getblocks is just a deprecated message (or so i thought) -- compact blocks build on top of the headers p2p protocol messages that were added much later
< sdaftuar>
so i would have assumed anyone who implemented compact blocks would have switched to getheaders
< sdaftuar>
pinheadmz: any idea why bcoin still uses getblocks? sounds like you are saying that sometimes it uses getheaders instead?
< pinheadmz>
looking into it now... is the deprecation of getblocks documented? I was about to start work on BIP159 (NETWORK_LIMITED) but maybe I should check out the existing network protocol behavior first. bcoin does send `sendcmpct` and then `getblocks` which will retrieve compact blocks from the peer.
< pinheadmz>
Actually I'm curious why Core sends `getheaders` first, and then requests the full blocks?
< sipa>
pinheadmz: how would it know what blocks to fetch?
< sdaftuar>
pinheadmz: no, it being deprecated is just in my head, that was how i thought about. i think bitcoin core hasn't sent getblocks messages in 5+ years or so.
< sipa>
you can find out the block hashes using getblocks or getheaders, but the latter lets you verify PoW before fetching the actual blocks
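To illustrate why headers allow a PoW check before any block data is downloaded: the 80-byte header carries its own compact target (nBits), so a node can hash the header and compare. The sketch below is a simplified illustration, not Bitcoin Core's code; the function names are made up for this example, it ignores the sign bit of the compact encoding, and it does not check that nBits is the difficulty the chain actually requires at that height.

```python
import hashlib


def bits_to_target(bits):
    """Expand the compact nBits encoding into the full 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_hash(header):
    """Double SHA-256 of the serialized header, read as a little-endian integer."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "little")


def header_has_valid_pow(header):
    """True if the header hash is at or below the target in its own nBits field."""
    if len(header) != 80:
        return False
    bits = int.from_bytes(header[72:76], "little")  # nBits sits at bytes 72..75
    return header_hash(header) <= bits_to_target(bits)


# Serialized mainnet genesis block header:
# version | prev hash | merkle root | time | nBits | nonce
GENESIS_HEADER = bytes.fromhex(
    "01000000"
    + "00" * 32
    + "3ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa4b1e5e4a"
    + "29ab5f49"
    + "ffff001d"
    + "1dac2b7c"
)
```

With only `inv` hashes from getblocks there is nothing to hash, so the same check cannot be made until the full block arrives.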
< pinheadmz>
from `inv`
< pinheadmz>
makes sense
< sipa>
pinheadmz: that's only for newly mined blocks
< sipa>
(or in response to getblocks...)
< sipa>
and with BIP130, new blocks are also announced using headers instead of invs
< sipa>
but that's optionally negotiated between peers
< gmaxwell>
sdaftuar: I considered it deprecated too, fwiw, (though didn't expect it to get disabled anytime soon-- it's just useless/redundant now)
< gmaxwell>
pinheadmz: a header is not much larger than an INV but radically more useful.
< gmaxwell>
pinheadmz: so essentially for blocks we'd replaced INVs with headers.
< sipa>
i suspect there are a number of tools (perhaps block fetching analysis stuff) that use getblocks/inv/getdata/blocks still, which you probably won't see on the public network but are still pretty useful to support
< pinheadmz>
gmaxwell: ok thanks I see this now. so `getheaders` asks the peer to just send headers without an `inv`
< pinheadmz>
same behavior with SPV wallets
< sipa>
pinheadmz: it sends headers instead of inv
< gmaxwell>
pinheadmz: yes, its pretty much exactly like getblocks but sends headers.
< pinheadmz>
then headers are verified, then `getdata` with block hashes
< sdaftuar>
right
< sipa>
right
< gmaxwell>
right
< sipa>
bitcoin core since 0.10 uses this as synchronization mechanism; it's called headers-first sync
< tryphe>
how were blocks gotten before getheaders again? just block #? the new way has much better performance
< gmaxwell>
pinheadmz: also they are only fetched with getdata if they are possibly the best chain, if they're guaranteed to be worse, they're not fetched. This avoids some block flooding denial of service attacks.
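One way to picture the "only fetch if it could be the best chain" rule is to sum the expected work implied by each announced header and compare it with the work of the current best chain. This is a hedged sketch under simplifying assumptions: the function names and the list-of-nBits input are hypothetical, the compact-target decoding ignores the sign bit, and real nodes track far more state than this.

```python
def bits_to_target(bits):
    """Expand the compact nBits encoding into the full 256-bit target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


def header_work(bits):
    """Expected number of hashes needed to meet the target: 2**256 // (target + 1)."""
    return (1 << 256) // (bits_to_target(bits) + 1)


def worth_fetching(fork_point_work, announced_bits, best_chain_work):
    """True if the announced header chain has at least as much work as our best chain.

    `fork_point_work` is the cumulative work up to the block where the announced
    headers branch off; `announced_bits` lists the nBits of each new header.
    """
    total = fork_point_work + sum(header_work(b) for b in announced_bits)
    return total >= best_chain_work
```

Because the decision uses headers alone, a peer cannot make the node download large blocks for a branch that is certain to lose.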
< sipa>
getblocks takes a locator and a count, and responds with an inv announcing the next count block hashes
< sipa>
iirc
< sipa>
the protocol doesn't expose anything by height
< sipa>
as height is ambiguous in the presence of branches
< sdaftuar>
that's basically right. there's also this hacky thing we do where we send redundant inv's to trigger additional getblocks responses to help peers download the whole chain (since inv's are capped at 50k responses). that's what got me to wanting to remove this :)
< pinheadmz>
thanks guys, going to get the team on BIP130
< sipa>
bip130 is a step being headers-first sync
< sipa>
*beyond
< pinheadmz>
is there a spec for headers first sync? (besides, this chat :-) which i think is pretty clear)
< gmaxwell>
sdaftuar: yea, when you said you wanted to remove it that was what I assumed you wanted to remove.
< sdaftuar>
(in particular it sounds like bcoin already implements bip 152, which is a step beyond bip 130.)
< sipa>
kthxbye
< sipa>
pinheadmz: no, not a spec - it's just a way of doing things, not really a protocol
< sipa>
(it didn't need any changes to the p2p protocol)
< pinheadmz>
will `getblocks` ever be unsupported?
< sipa>
as getheaders/headers existed for years before we started using it for block syncing
< gmaxwell>
pinheadmz: that was the subject of discussion.
< sdaftuar>
pinheadmz: i hope so, but i dunno. i might send an email to the -dev list asking if anyone would object?
< sipa>
sdaftuar: seems premature at this point
< sdaftuar>
sipa: yeah
< gmaxwell>
sdaftuar: maybe wait a bit so we don't waste time with 'bcoin' objecting, at least.
< sdaftuar>
well i might want to at least suggest that people start thinking of it as deprecated so we can get rid of it eventually
< gmaxwell>
unfortunate to have to keep around the feeding hack.
< gmaxwell>
sdaftuar: yes, thats fair, and something we should do.
< sdaftuar>
pinheadmz: thanks for jumping on here to discuss
< pinheadmz>
thanks for bringing it up!
< gmaxwell>
getblocks has been broken since day one, the design didn't consider that a maximum size would be needed, and one got bolted on after the fact (iirc at the beginning of 2010).
< sipa>
gmaxwell: i'm gathering statistics on the prevalence of lowr transactions... and the numbers are surprisingly inconsistent when broken down by #sigs/tx
< sipa>
for 1 sig/tx: 3.4% above 1/2
< sipa>
for 2 sig/tx: 4.9% above 1/4
< sipa>
for 3 sig/tx: 8.6% above 1/8
< sipa>
for 4 sig/tx: 7.2% above 1/16
< sipa>
for 5 sig/tx: 11.6% above 1/32
< sipa>
for 6 sig/tx: 5% above 1/64
< sipa>
for 7 sig/tx: 10.9% above 1/128
< gmaxwell>
why are you changing the threshold and number of sigs?
< sipa>
gmaxwell: because in tx with 5 sigs, there is a 1/32 chance that non-lowr wallets will produce a lowr solution anyway
< gmaxwell>
oh I thought that was the definition of lowness. :)
< sipa>
gmaxwell: oh, i see my statement may be misinterpreted
< sipa>
i mean for 5 sig/tx, (11.6% + 1/32) of transactions are fully lowr
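The arithmetic behind these figures can be sketched as follows, assuming each signature from a wallet that does not grind for low R is independently low-R with probability 1/2. The first two functions restate the numbers quoted above; `grinding_share` goes one step further with a simple two-population mixture model, which is an added assumption for illustration and not something stated in the discussion.

```python
def chance_all_low_r(num_sigs):
    """Chance that a non-grinding wallet produces num_sigs low-R signatures by luck."""
    return 0.5 ** num_sigs


def fully_low_r_share(excess, num_sigs):
    """Observed share of fully low-R transactions: reported excess plus the baseline."""
    return excess + chance_all_low_r(num_sigs)


def grinding_share(excess, num_sigs):
    """Estimated share p of transactions from wallets that always grind for low R.

    Mixture model (an assumption): observed f = p + (1 - p) * 2**-n,
    so p = (f - 2**-n) / (1 - 2**-n) = excess / (1 - 2**-n).
    """
    return excess / (1.0 - chance_all_low_r(num_sigs))


observed_5_sigs = fully_low_r_share(0.116, 5)  # 11.6% above 1/32
estimated_5_sigs = grinding_share(0.116, 5)
```

Under this model the correction matters most for small signature counts, where the chance baseline is large: for 1 sig/tx an excess of 3.4% corresponds to an estimated grinding share of about 6.8%.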
< gmaxwell>
I'd expect bitcoin core users and ones that update more actively to consume more inputs... due to being larger more active wallets, and due to BNB.
< echeveria>
sipa: I'd expect that really.
< echeveria>
sipa: some forms of multisig are basically unique to specific software. almost all 11 of 15 transactions are going to be Blockstream's Liquid, for example. a large portion of 2 of 2 are going to be GreenAddress. people aren't really going out and making custom multisig sets outside of outliers.
< echeveria>
when I was looking into this a month or two ago I was surprised to see a small number of multisig transactions that combined uncompressed and compressed keys. that bucks the trend and goes against my previous statement.
< echeveria>
there's a very small number of 1 of 1 P2SH transactions too.
< gmaxwell>
the coinjoin bounty fund is a mixed compressed multisig.
< echeveria>
looking at the spend sets for multisig was interesting too. a number use a fixed third key and two other keys that vary. what signatures are made for multiple spends of the same script, too. is it always a fixed combination that satisfies, or a dynamic say, 2 keys of 3.