salvatoshi has quit [Read error: Connection reset by peer]
TallTim has quit [Ping timeout: 255 seconds]
<bitcoin-git>
[bitcoin] ajtowns opened pull request #28686: refactor: Split per-peer parts of net module into new node/node module (master...202310-nodenode) https://github.com/bitcoin/bitcoin/pull/28686
realies has quit [Ping timeout: 260 seconds]
brunoerg has joined #bitcoin-core-dev
someone235 has quit [Quit: Connection closed for inactivity]
brunoerg has quit [Ping timeout: 245 seconds]
salvatoshi_ has quit [Ping timeout: 240 seconds]
flooded has joined #bitcoin-core-dev
vasild_ has quit [Quit: leaving]
test_ has quit [Ping timeout: 258 seconds]
brunoerg has joined #bitcoin-core-dev
mudsip has joined #bitcoin-core-dev
mudsip has quit [Client Quit]
salvatoshi_ has joined #bitcoin-core-dev
salvatoshi_ has quit [Remote host closed the connection]
TallTim_ is now known as TallTim
<fjahr>
#proposedmeetingtopic Fix for hash_serialized_2 calculation and implications (for context see #28675 and #28685)
<sipa>
has there been any thought about distributing UTXO sets over P2P? presumably this will require at least some merkle-structuring to permit validation of individual chunks, or even FEC coding to allow sharding
<sipa>
because if so, that may inform the kind of UTXO hash that's used?
<jamesob>
I wonder if there is any wisdom in committing to a sha256sum of the snapshot file itself in the source code, as a belt-and-suspenders measure to avoid the issue that fjahr discovered
<_aj_>
it's always good to authenticate data before parsing it, imo
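(For illustration, a minimal sketch of the whole-file check jamesob describes: hash the raw snapshot before parsing anything from it, and compare against a digest committed in the source. It assumes Bitcoin Core's CSHA256 and HexStr helpers; the function name and placeholder digest are hypothetical, not anything in the tree.)

```cpp
// Hypothetical belt-and-suspenders check: hash the raw snapshot file before
// parsing it, and compare against a digest committed to in the source.
#include <crypto/sha256.h>
#include <util/strencodings.h> // HexStr

#include <fstream>
#include <string>
#include <vector>

// Placeholder; a real commit would carry the actual digest.
const std::string EXPECTED_SNAPSHOT_SHA256(64, '0');

bool SnapshotFileMatchesKnownHash(const std::string& path)
{
    std::ifstream f(path, std::ios::binary);
    if (!f) return false;

    CSHA256 hasher;
    std::vector<unsigned char> buf(1 << 16);
    while (f) {
        f.read(reinterpret_cast<char*>(buf.data()), buf.size());
        hasher.Write(buf.data(), static_cast<size_t>(f.gcount()));
    }
    unsigned char out[CSHA256::OUTPUT_SIZE];
    hasher.Finalize(out);
    return HexStr(out) == EXPECTED_SNAPSHOT_SHA256;
}
```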
<sipa>
_aj_: what's the point of using muhash?
<_aj_>
sipa: it's readily available already
ghost43 has joined #bitcoin-core-dev
<_aj_>
sipa: ie, you can verify it against any node that's running coinstatsindex
<sipa>
_aj_: well, yes, but for the whole set only, not for 50 MB chunks
pablomartin4btc has quit [Ping timeout: 255 seconds]
<_aj_>
sipa: mostly it seemed like a possible way to have the set of hashes available for a height soon after the height was mined, without requiring the node to have any downtime while the utxo set gets rehashed
<sipa>
_aj_: ah, hmm
<sipa>
let's talk in the meeting, i guess?
<_aj_>
sipa: i think you may have already exhausted how much i thought about it :)
<achow101>
There is 1 pre-proposed meeting topic this week. any last minute ones to add to the list?
<gleb>
Hi. Poor internet here
<fjahr>
hi
<darosior>
Good day
<achow101>
#topic priority projects
<core-meetingbot>
topic: priority projects
bitdex has quit [Quit: = ""]
<TheCharlatan>
hi
<abubakarsadiq>
hi
<achow101>
The final count for the voting is Package relay - 19, Silent payments - 11, Multiprocess - 9, Legacy wallet removal - 9, cmake - 8, erlay - 7, stratum v2 - 4
<kanzure>
hi
<achow101>
We decided at CoreDev to go with the top 3 projects instead of 4, so that would be package relay, silent payments, and one of multiprocess or legacy wallet removal.
<pinheadmz>
ooh a runoff
<achow101>
i would be happy to have multiprocess as the priority project since legacy wallet removal has a plan to move forward this release anyways
<_aj_>
i count 9 for cmake?
SebastianvStaa has quit [Quit: Client closed]
<instagibbs>
cmake is happening when it's happening regardless iiuc
<_aj_>
true :)
<instagibbs>
Two Weeks(TM) whenever ready
<fanquake>
Sometime after c++20
SebastianvStaa has joined #bitcoin-core-dev
<TheCharlatan>
^^
<_aj_>
fanquake: woah</keanu>
<achow101>
_aj_: i've not included w0xlt as they aren't in the org, and haven't seemed to be active recently?
<_aj_>
achow101: ok
<achow101>
any other thoughts on multiprocess vs. legacy wallet removal?
guest-127 has quit [Quit: Client closed]
guest-127 has joined #bitcoin-core-dev
<vasild>
which one of mine did you count?
<achow101>
vasild: the signaling one
<instagibbs>
legacy wallet removal is your baby, if you think it's fine not being priority it's probably fine?
<fjahr>
achow101: if you are fine with multiprocess taking the 3rd spot, that sounds good to me. I think multiprocess certainly needs more attention to make progress
<_aj_>
removing code seems easier to rebase than adding code, maybe?
<b10c>
hi
<sipa>
hi
<achow101>
_aj_: you'd think so. but I started rebasing my 2 year old removal branch and it's not a fun time
VisitingPeer has joined #bitcoin-core-dev
<fjahr>
_aj_: I think it's the same, it just depends on how intermingled the change is with the rest of the code.
<achow101>
is there a tracking issue for multiprocess?
<glozow>
is ryan here?
<instagibbs>
ryanofsky ping
<sipa>
just pinged him IRL
<ryanofsky>
no, can easily create one
<instagibbs>
In RimWorld
<furszy>
I think that people working on the wallet will continue reviewing PRs (or at least I will), whereas multiprocess needs some momentum
<fjahr>
This has implications particularly for assumeutxo: the hashes in the chainparams will need to change. But we are also discussing a bunch of additional changes to improve things further while we are changing the resulting hash anyway.
SebastianvStaa has joined #bitcoin-core-dev
<fjahr>
In particular, dergoegge found another issue with his fuzzer on negative values and proposed getting rid of VARINT completely. I would really like to hear if people would like to do that or if we should keep the change more minimal. I have started preparing the change with the VARINTs removed, but it would be good to know which one we want now so we only have to change the chainparams once, since that causes some significant review effort.
<fjahr>
To be precise, I think we have the choice between getting rid of all VARINTs in kernel/coinstats, getting rid of them just where dergoegge found the issue, or leaving them and checking for negative values in deserialization (I guess we will add this check either way).
<fjahr>
So generally it would be good to have a few more eyes on this, but I am particularly looking for feedback on the VARINT question.
<sipa>
For performance reasons, my guess would be that removing all VARINTs from UTXO hashes is better.
<theStack>
as commented in the PR, I think removing VARINTs in `ApplyHash` would make sense, but it's only an option if that doesn't come with a noticeable loss in performance
<sipa>
I suspect that VARINT coding costs per-byte more than SHA256 per-byte.
<sipa>
But I haven't benchmarked it.
<dergoegge>
something that also seems weird to me is that the serialization format is different for hashing with muhash
<theStack>
sipa: oh, interesting, i would have expected it's the other way round. but i don't know too much about sha256 internals TBH
<fjahr>
I think at the time we already knew the muhash one made more sense, just kept the hash_serialized one around for consistency
<sipa>
SHA256 (without hardware acceleration) is maybe a dozen CPU cycles per byte; varint coding involves lots of branches that can be mispredicted... less work overall, but probably a lot lower ILP
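(A rough sketch of the benchmark under discussion, written against Core's nanobench harness; the benchmark name is invented, and the DataStream/VARINT/CSHA256 usage is assumed from current master.)

```cpp
// Hypothetical benchmark: VARINT-encode a buffer of values, then SHA256 the
// encoded bytes, so both the (branchy) coding and the (high-ILP) hashing
// show up in the same measurement.
#include <bench/bench.h>
#include <crypto/sha256.h>
#include <serialize.h>
#include <streams.h>

#include <cstdint>
#include <vector>

static void VarintThenSha256(benchmark::Bench& bench)
{
    // Pseudo-random 64-bit values standing in for heights/amounts.
    std::vector<uint64_t> values(100'000);
    uint64_t x = 0x123456789abcdef0ULL; // xorshift64 state
    for (auto& v : values) {
        x ^= x << 13; x ^= x >> 7; x ^= x << 17;
        v = x;
    }

    bench.run([&] {
        DataStream ss{};
        for (uint64_t v : values) ss << VARINT(v);
        // Hash the encoded bytes so the encoding cannot be optimized away.
        unsigned char out[CSHA256::OUTPUT_SIZE];
        CSHA256().Write(UCharCast(ss.data()), ss.size()).Finalize(out);
        ankerl::nanobench::doNotOptimizeAway(out[0]);
    });
}
BENCHMARK(VarintThenSha256, benchmark::PriorityLevel::HIGH);
```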
<sipa>
Does the fact that it may impact chainparams really matter? I can imagine it's likely we want to revisit the actual hashing scheme for assumeutxo before mainnet deployment anyway, depending on how P2P serving would happen.
<achow101>
was hash_serialized_2 always incorrect?
VisitingPeer has quit [Ping timeout: 248 seconds]
<fjahr>
That would be another option, to apply the MuHash serialization so we are consistent
<fjahr>
achow101: since 2018
<theStack>
achow101: i think it's incorrect since v0.17 (see linked commit in the issue)
<theStack>
at least that's the earliest tag that is shown by `git tag --contains 34ca75032012562d226b06ef9e86a2948d3a8d16`
<fjahr>
sipa: at least we need to fix the testnet and signet params before the release
<sipa>
fjahr: that doesn't sound like a big deal
<fjahr>
not sure when we will even get into p2p distribution, or whether that's even a focus for jamesob for 27
<fjahr>
sipa: I would just like it if we can come to an agreement now so we don't need another follow-up to 28685
<achow101>
if we change the serialization, will the coinstatsindex need to be reindexed?
<fjahr>
no, the hash_serialized is not part of it
Guest2177 has quit [Quit: WeeChat 4.1.0]
<sipa>
i'm happy with either just using the muhash serialization for hash_serialized_3, or keeping VARINT
<sipa>
but a benchmark would be useful
<achow101>
why are the serializations different?
<sipa>
i suppose because they were designed at different times
<fjahr>
as I wrote above, I think pieter came up with the one for muhash but we kept the old one for consistency of hash_serialized
<sipa>
well i think i also came up with the one for hash_serialized
<fjahr>
:)
<sipa>
the muhash one is simpler, the hash_serialized one is more compact
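(Schematically, the trade-off looks something like this; both functions are illustrative sketches, not the exact formats in kernel/coinstats.cpp.)

```cpp
// Illustrative only: two styles of serializing one coin for hashing.
#include <coins.h>
#include <primitives/transaction.h>
#include <serialize.h>
#include <streams.h>

// Fixed-width fields (muhash-style): simple and branch-free, more bytes.
void SerializeSimple(DataStream& ss, const COutPoint& outpoint, const Coin& coin)
{
    ss << outpoint;                                                 // 36 bytes
    ss << static_cast<uint32_t>(coin.nHeight * 2 + coin.fCoinBase); // 4 bytes
    ss << coin.out;                                                 // amount + script
}

// VARINT fields (hash_serialized-style): more compact, but every field goes
// through a branchy variable-length coder.
void SerializeCompact(DataStream& ss, const COutPoint& outpoint, const Coin& coin)
{
    ss << outpoint.hash;
    ss << VARINT(outpoint.n);
    ss << VARINT(coin.nHeight * 2 + coin.fCoinBase);
    ss << VARINT_MODE(coin.out.nValue, VarIntMode::NONNEGATIVE_SIGNED);
    ss << coin.out.scriptPubKey;
}
```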
<achow101>
it'd be nice if they were consistent
<achow101>
but it seems like people don't particularly care?
<sipa>
for muhash i think performance matters less also, because the serialization cost is probably dwarfed by the muhash math overhead
<sipa>
can someone benchmark if it matters at all, and if the more compact one is not significantly better, pick the muhash serialization for hash_serialized_3 ?
<fjahr>
I can run the benchmarks and post them in the PR
SebastianvStaa has quit [Quit: Client closed]
<sipa>
great
<achow101>
sounds like a plan
<achow101>
any other topics to discuss?
pablomartin4btc has joined #bitcoin-core-dev
<MacroFake>
Is the content hash needed, if the full file hash will be checked in the future?
<MacroFake>
I guess to protect against the file changing while reading it?
<sipa>
i could imagine introducing some kind of merkle-structured content hash in the future, that's used as a full file hash too
<sipa>
but it sounds like there is a lot of design space for that
<MacroFake>
Are you saying with your proposal the full file hash would be equal to the content hash?
pablomartin has joined #bitcoin-core-dev
<MacroFake>
(With full file hash I mean the dumb hash of the byte file, without parsing the content or looking at it)
<fjahr>
I think a flat file hash might be limiting as a hard requirement when we think about p2p distribution; not sure if it makes sense as a temporary belt and suspenders
<sipa>
no; i'm saying you'd verify the full file by computing its contents hash... which is designed in such a way that it's easy to validate the serialized file (and chunks of it) against
BlueMatt[m] has quit [Ping timeout: 240 seconds]
<sipa>
and not having a dumb hash of the file at all
pablomartin4btc has quit [Ping timeout: 240 seconds]
<sipa>
i guess there could also be a dumb hash of the whole file as a final last-resort check, but for P2P distribution you really need a way to check for incorrect chunks very early anyway
<sipa>
i guess it could be a tree-structured hash over the serialized file (and not over individual utxo entries in it)?
<achow101>
seems like something that requires more thought than we can do for a fix for this release
<sipa>
but this doesn't need to be part of this meeting, i think
<MacroFake>
[16:51] <sipa> i guess it could be a tree-structured hash over the serialized file (and not over individual utxo entries in it)?
guest-127 has quit [Quit: Client closed]
<MacroFake>
Yes, that is what I thought, but agree, no need to be part of the meeting
<sipa>
Well, it's not anymore!
<_aj_>
if you're generating something, i don't quite see why you wouldn't just make a .torrent file of the serialized utxo set?
<sipa>
_aj_: and then store a hash of the torrent file as assumeutxo hash?
<_aj_>
sipa: yeah
<sipa>
i guess that is a tree-structured hash (with just 2 levels of tree)
<_aj_>
sipa: along with the hash_serialized_3 or muhash, to verify it
<sipa>
i was more thinking along the lines of SHA256'ing say 4 KiB chunks of the serialized file, and then build a Merkle tree over those hashes, and make that the file hash
<sipa>
and then whenever you download a (range of) chunks from someone, they'd give a Merkle path to connect it to the file hash
<sipa>
this gives a lot of freedom later in how to schedule the downloading
<sipa>
i picked 4 KiB because it's as small as you can go while retaining the property that the Merkle overhead is negligible in bandwidth/cpu compared to the data itself
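(A sketch of that construction under the stated assumptions: SHA256 over fixed 4 KiB leaves, an odd node paired with itself so every node has a sibling, and per-chunk verification against a Merkle path; every name here is hypothetical.)

```cpp
// Hypothetical construction: SHA256 each fixed 4 KiB chunk of the serialized
// snapshot, then build a Merkle tree over the leaf hashes.
#include <crypto/sha256.h>

#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

using Hash256 = std::array<unsigned char, CSHA256::OUTPUT_SIZE>;

static Hash256 HashPair(const Hash256& l, const Hash256& r)
{
    Hash256 out;
    CSHA256().Write(l.data(), l.size()).Write(r.data(), r.size()).Finalize(out.data());
    return out;
}

Hash256 ChunkedMerkleRoot(const std::vector<unsigned char>& file)
{
    constexpr size_t CHUNK = 4096;
    std::vector<Hash256> level;
    for (size_t off = 0; off < file.size(); off += CHUNK) {
        const size_t len = std::min(CHUNK, file.size() - off);
        Hash256 leaf;
        CSHA256().Write(file.data() + off, len).Finalize(leaf.data());
        level.push_back(leaf);
    }
    while (level.size() > 1) {
        if (level.size() % 2) level.push_back(level.back()); // pair last with itself
        std::vector<Hash256> next;
        for (size_t i = 0; i < level.size(); i += 2) {
            next.push_back(HashPair(level[i], level[i + 1]));
        }
        level = std::move(next);
    }
    return level.empty() ? Hash256{} : level[0];
}

// Verify one chunk's hash against the root, given its index and the sibling
// hashes along the path (one per level), as a serving peer would provide.
bool VerifyChunk(Hash256 leaf, size_t index, const std::vector<Hash256>& path, const Hash256& root)
{
    for (const Hash256& sibling : path) {
        leaf = (index & 1) ? HashPair(sibling, leaf) : HashPair(leaf, sibling);
        index >>= 1;
    }
    return leaf == root;
}
```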
<_aj_>
sipa: i just wonder if there's much overlap between p2p distribution of old chainstate vs keeping a node up to date, and if we'd effectively just be reimplementing bittorrent?
pablomartin has quit [Ping timeout: 255 seconds]
<_aj_>
sipa: like where are we storing the old serialized utxo set? block data somehow? ldb snapshot? somewhere else? is that better than just having it dumped to disk as a file?
<sipa>
that's a fair point... the distribution mechanism (if it's done entirely using file-level hashes) is effectively bittorrent
BlueMatt[m] has joined #bitcoin-core-dev
<_aj_>
sipa: and especially for "version 0.1", might be worth just getting something that's quick and works?
<sipa>
how the whole snapshotting happens isn't something i've thought too hard about
<sipa>
because that's the other side of this... we don't just need easy distribution, but also easy creation
<_aj_>
it doesn't need to be that easy to create? one person creates -- everyone else downloads, and verifies the muhash against coinstatsindex?
<sipa>
_aj_: well if we want P2P fetching then you need to be able to fetch from nodes that have just been running, and didn't themselves bootstrap from the snapshot
<sipa>
so for that the snapshotting needs to be efficient and deterministic (at least deterministic w.r.t. whatever hash mechanism is used for validating fetching)
<_aj_>
sipa: depends if the p2p is over the bitcoin network or over a bittorrent network?
<sipa>
_aj_: oh you mean *actually* bittorrent
<_aj_>
sipa: as a strawman at least, yeah
<sipa>
i think it'd be nice if fetching utxo snapshots can happen from random network nodes, but admittedly it is a very different problem than what's served now
<_aj_>
sipa: fanquake updates chainparams. stops his bitcoind node. generates the serialized snapshot at height X. publishes the torrent file. others vet it. ACK the PR. torrent file is published on bitcoincore.org, and seeded by random folks?
<sipa>
do we ship a bittorrent client inside bitcoin core?
<sipa>
or do you need to manually run a bittorrent client to get the snapshot file?
<_aj_>
sipa: i think it'd be nice too; but i think making your node do that would be the exact same resource usage as being a bittorrent seeder of the same data? like, you'd need an extra copy of the utxo set, and have an extra bunch of p2p messages equivalent to the bittorrent protocol?
mraugust has joined #bitcoin-core-dev
<_aj_>
sipa: presumably it wouldn't be a default config option due to the extra resources? and that selecting the snapshot height is probably a manual thing that code can't automatically adopt?
<_aj_>
sipa: strawman: manually run a bittorrent client?
<sipa>
i was imagining that nodes would automatically make snapshots at predetermined heights, and be able to serve the last few - through some sharding mechanism so that this doesn't result in storing 3-4 full chainstate copies
<_aj_>
hmm, is there any recent data about how quickly things cycle through the utxo set?
<sipa>
_aj_: if users have to run a bittorrent client manually, i worry that someone will just offer chainstates instead, which are just as easy to distribute and faster to load
<sipa>
so i think if we want to think about any kind of distribution mechanism, the goal is actually to be that it becomes *the* easiest way to bootstrap a node
<_aj_>
could we structure the snapshot so that it's super fast to import into a chainstate?
<_aj_>
presuming we have a hardcoded merkle root, we presumably want many peers like bittorrent, rather than only outgoing connections like IBD, and we'd also relay the chunks we've downloaded and checked to other peers
brunoerg has joined #bitcoin-core-dev
DarrylTheFiiish has quit [Remote host closed the connection]
brunoerg_ has joined #bitcoin-core-dev
brunoerg has quit [Read error: Connection reset by peer]
<_aj_>
sipa: i find myself fairly convinced by the "why not just bittorrent chainstate/ directly". new strawman: just dumptxoutset every 25000 blocks, and serve that; but only keep the most recent 3 of them? it's an extra 30GB but on a full node that's got 600GB anyway, that's not that terrible?
<sipa>
_aj_: that's not terrible; i think it'd be nicer if we could let pruned nodes also participate in the serving
<sipa>
and with FEC, that's not impossible; say you use an 8-bit 255/16 RS code... so now UTXO-serving nodes can choose to store between 18.75% (3/16) of a UTXO set and 300% of a UTXO set (if they keep 3 snapshots)
<sipa>
and to bootstrap, you need to just find a set of peers that together have 16 distinct shards
<sipa>
sadly, unless the merkle tree structure commits to all FEC shards, you can't validate individual pieces of coding until you have all 16 for a given chunk
<_aj_>
i'm not following the shard vs chunk split?
<_aj_>
if you're 3/16, are you keeping 3 shards of every chunk?
<sipa>
so the idea is the file is split up in chunks, say 4 KiB each
<sipa>
these chunks get FEC coded in a way that 16 shards of a chunk (each 512 bytes) are sufficient to reconstruct it
<sipa>
but each chunk has 255 possible shards, and every node would pick between 1 and 16 distinct integers in the range 0..254, and keep those shards for each chunk
<sipa>
so whenever you have 16 distinct shards for a chunk, you can reconstruct the chunk
<sipa>
but by offering 255 possibilities rather than just 16, you now don't need to find a set of peers that together offer all 16... any 16 out of 255 suffices
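(The RS math aside, the bookkeeping this implies is small; a sketch under the 255/16 assumption, with the actual decode elided and all names hypothetical.)

```cpp
// Bookkeeping only (the RS encode/decode itself is elided): track which of
// the 255 possible shards we hold for each chunk; any 16 distinct ones are
// enough to reconstruct that chunk.
#include <bitset>
#include <cstddef>
#include <vector>

constexpr int TOTAL_SHARDS = 255;  // 8-bit RS code: 255 possible shards/chunk
constexpr int NEEDED_SHARDS = 16;  // any 16 reconstruct one 4 KiB chunk

struct ChunkShards {
    std::bitset<TOTAL_SHARDS> have; // bit i set = we hold shard index i
    bool Reconstructible() const { return have.count() >= NEEDED_SHARDS; }
};

// True once every chunk of the snapshot can be reconstructed.
bool SnapshotComplete(const std::vector<ChunkShards>& chunks)
{
    for (const ChunkShards& c : chunks) {
        if (!c.Reconstructible()) return false;
    }
    return true;
}
```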
<_aj_>
512B*16=8KiB not 4KiB? is that a lot of error detection or typo?
<_aj_>
filling in missing data as a 2d puzzle seems complex, i guess; do i get shards 1-16 for chunk 5 from peer 1, or shard 1 for chunks 10-26? i guess you just throw enough peers at the problem and it's fine though
<sipa>
brb, lunch
<sipa>
_aj_: err yes 256B, not 512B
brunoerg_ has quit [Read error: Connection reset by peer]
brunoerg has joined #bitcoin-core-dev
<_aj_>
seems like you should rarely need more than 6 peers, even with only 3 shards each
brunoerg_ has joined #bitcoin-core-dev
brunoerg has quit [Read error: Connection reset by peer]
<sipa>
_aj_: so one way of looking at it, is it expands every 4 KiB chunk into 255/16 * 4 KiB = 63.75 KiB of data... but you can reconstruct the whole thing with *any* 4 KiB out of those 63.75 KiB; so each node can choose to store some subset of those 63.75 KiB only
jQrgen has joined #bitcoin-core-dev
<sipa>
(but aligned with 256B boundaries)
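(Restating the arithmetic as compile-time checks, using the 256-byte shard size from sipa's correction above.)

```cpp
// The shard arithmetic from the discussion, as compile-time checks.
constexpr int CHUNK_BYTES = 4096;                 // one 4 KiB chunk
constexpr int SHARD_BYTES = CHUNK_BYTES / 16;     // 256 B per shard
constexpr int EXPANDED_BYTES = 255 * SHARD_BYTES; // all 255 possible shards

static_assert(SHARD_BYTES == 256);
static_assert(EXPANDED_BYTES == 65'280);          // 63.75 KiB per chunk

// A node keeping 3 of the 16 required shards stores 3/16 of the set:
constexpr double STORED_FRACTION = 3.0 / 16.0;
static_assert(STORED_FRACTION == 0.1875);         // 18.75%, as sipa notes
```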
preimage has joined #bitcoin-core-dev
<_aj_>
sipa: so this would kind of be `dumptxoutshards 65 115 252` (which does what dumptxoutset does but also does the FEC calculations, and is potentially only 3/16ths of the size) every 20k blocks, then you serve those for 20k/40k/60k blocks, then you delete them.
test_ has joined #bitcoin-core-dev
<_aj_>
i guess if we can get pruned nodes to serve the data, there's less need for nodes that are still downloading the data to serve it out to others; that would let you download the data in order, rather than randomly. that might then let you import it into leveldb as you download it?
ibiko1 has joined #bitcoin-core-dev
flooded has quit [Ping timeout: 240 seconds]
ibiko1 has quit [Ping timeout: 252 seconds]
brunoerg_ has quit [Remote host closed the connection]
<bitcoin-git>
bitcoin/master 762404a Vasil Dimov: i2p: also sleep after errors in Accept()
<bitcoin-git>
bitcoin/master 5c8e15c Vasil Dimov: i2p: destroy the session if we get an unexpected error from the I2P router
<bitcoin-git>
bitcoin/master 77f0ceb Andrew Chow: Merge bitcoin/bitcoin#28077: I2P: also sleep after errors in Accept() & de...
<bitcoin-git>
[bitcoin] achow101 merged pull request #28077: I2P: also sleep after errors in Accept() & destroy the session if we get an unexpected error (master...i2p_accept_issue22759) https://github.com/bitcoin/bitcoin/pull/28077
realies has joined #bitcoin-core-dev
<theStack>
wouldn't it be nice if utxo dumps had magic bytes (probably with a version) at the beginning so they could be easily identified as such? for better error handling ("this is not a UTXO dump"), but also long-term for utilities like file(1)
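(Something along these lines, perhaps; the magic value, version field, and function names are invented, assuming the current dump format has no such header.)

```cpp
// Hypothetical dump header: fixed magic bytes plus a little-endian version,
// so tools (and file(1)) can identify a dump before parsing it.
#include <array>
#include <cstdint>
#include <cstring>
#include <fstream>

constexpr std::array<unsigned char, 5> UTXO_DUMP_MAGIC{'u', 't', 'x', 'o', 0xff};
constexpr uint16_t UTXO_DUMP_VERSION = 1;

void WriteDumpHeader(std::ofstream& f)
{
    f.write(reinterpret_cast<const char*>(UTXO_DUMP_MAGIC.data()), UTXO_DUMP_MAGIC.size());
    const unsigned char ver[2] = {UTXO_DUMP_VERSION & 0xff, UTXO_DUMP_VERSION >> 8};
    f.write(reinterpret_cast<const char*>(ver), sizeof(ver));
}

bool CheckDumpHeader(std::ifstream& f)
{
    unsigned char hdr[7];
    f.read(reinterpret_cast<char*>(hdr), sizeof(hdr));
    if (!f) return false;
    if (std::memcmp(hdr, UTXO_DUMP_MAGIC.data(), UTXO_DUMP_MAGIC.size()) != 0) return false;
    const uint16_t version = static_cast<uint16_t>(hdr[5] | (hdr[6] << 8));
    return version == UTXO_DUMP_VERSION;
}
```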
VisitingPeer has joined #bitcoin-core-dev
VisitingPeer has quit [Client Quit]
realies has quit [Ping timeout: 255 seconds]
Guest7 has joined #bitcoin-core-dev
vysn has joined #bitcoin-core-dev
<jamesob>
Apologies I missed the meeting today, and great job to all involved diagnosing and fixing. I have no intention of doing p2p distribution of snapshots, so someone else will have to tackle that if it is desired; I'm honestly not sure the juice is worth the squeeze.
<jamesob>
In any case, it would be a shame to see a very useful feature held up by deciding on what the perfect hash structure is; that can always be changed later afaict
brunoerg has quit [Remote host closed the connection]
brunoerg has joined #bitcoin-core-dev
p has joined #bitcoin-core-dev
p is now known as Guest7362
Robotico has joined #bitcoin-core-dev
Guest7362 has quit [Client Quit]
paddingtonbear has joined #bitcoin-core-dev
Robotiko has joined #bitcoin-core-dev
Robotico has quit [Quit: Leaving]
Robotiko has quit [Remote host closed the connection]
Robotico has joined #bitcoin-core-dev
Robotico has quit [Remote host closed the connection]
realies has joined #bitcoin-core-dev
<paddingtonbear>
hey - i can learn more than i can teach here - but was sent by someone to discuss the bitcoind upstream issue
<paddingtonbear>
this is x.com/123456 (pad)
<paddingtonbear>
happy to talk in dms or in broad daylight haha
brunoerg has quit [Remote host closed the connection]