< jonasschnelli>
What is the process for merging the state from the GUI repository "back"? Is there a planned timeframe? Will there just be one PR to the main repository that includes all changes (like a backport PR)? Is that process documented already?
< bitcoin-git>
[bitcoin] MarcoFalke opened pull request #19922: test: Run rpc_txoutproof.py even with wallet disabled (master...2009-testMoreMiniWallet) https://github.com/bitcoin/bitcoin/pull/19922
< promag>
jnewbery: +1
< elichai2>
MarcoFalke: I can't manage to reproduce the error in #19920, instead I'm getting a really weird bug without the full details, any idea why? (I'm getting this: https://pastebin.com/raw/MBzJ0ixB)
< gribble>
https://github.com/bitcoin/bitcoin/issues/19920 | test: Fuzzing siphash against reference implementation [Request for feedback] by elichai · Pull Request #19920 · bitcoin/bitcoin · GitHub
< elichai2>
found the bug :) you're not allowed to do `&*it` on null 😅
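(For illustration, a minimal standalone sketch of that class of bug, not the actual #19920 code: `&*p` looks like a syntactic no-op, but `*p` is still evaluated, and dereferencing a null pointer or an invalid iterator is undefined behaviour that UBSan-style sanitizers will flag.)

```cpp
// Illustrative only, not the code from #19920. The UB lines are left
// commented out so the program itself stays well-defined.
#include <vector>

int main()
{
    int* p = nullptr;
    // int* q = &*p;            // UB: *p dereferences a null pointer
    std::vector<int> v;
    (void)v;
    // int* r = &*v.begin();    // UB on an empty vector: begin() == end()
    int x = 7;
    p = &x;
    int* q = &*p;               // fine once p points at a real object
    return *q - x;              // 0
}
```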
< bitcoin-git>
[bitcoin] dongcarl opened pull request #19927: validation: Reduce direct g_chainman usage (master...2020-09-reduce-g_chainman-usage) https://github.com/bitcoin/bitcoin/pull/19927
< ariard>
jonasschnelli: just replied on bip324, AFAICT real-or-random and I favor MACing the length a la Noise, even if, as Lloyd points out, we don't have a concrete exploitation of it
< achow101>
jonasschnelli: IIRC PRs merged to the GUI repo are also pushed to the main repo simultaneously
< achow101>
allet.dat does but maybe he specified wallet.dat and not just ""
< achow101>
so that would mean it's a wallet name handling thing
< luke-jr>
"" is never on disk :p
< luke-jr>
you saw that apparently the actual on-disk filenames are being renamed, right?
< achow101>
yes
< luke-jr>
I'm not 100% sure they know what they're talking about in that regard, but it's weird
< gwillen>
off the top of my head, one way to eat the first character of a path component would be issues with quoting and backslash as a path separator on Windows
< luke-jr>
hmm
< luke-jr>
"It would have been Gentoo Linux with the wallet files on an NTFS partition." lol totally unexpected
< luke-jr>
I doubt the other guy has the same setup tho
< gwillen>
iiiiiinteresting
< gwillen>
could be an issue in the NTFS driver, that thing was always marked 'experimental'
< luke-jr>
what's the chance the other guy had Linux+NTFS tho
< achow101>
luke-jr: yeah, other guy is Win 10
< achow101>
could be an NTFS issue
< luke-jr>
seems unlikely
< luke-jr>
it's not like Windows and Linux share the same NTFS code
< achow101>
if he only sees the problem on knots, then we could probably find the problem by looking at the diff?
< luke-jr>
achow101: the second guy had it on Core
< luke-jr>
I think
< gwillen>
any chance the win 10 guy is using WSL or something weird like that?
< luke-jr>
maybe if the Linux guy was using Captive NTFS.. he did say it was a long time ago
< gwillen>
in that case why doesn't every windows user see it, though
< luke-jr>
even the same user couldn't reproduce :/
< phantomcircuit>
anybody know how many transaction outputs are in the chain? (not utxo, txo)
< sipa>
years ago it was half a billion iirc
< andytoshi>
i have a simple script i used for my mimblewimble presentation, i can get this number in a couple hours
< andytoshi>
it seems to be taking 2-3 seconds per 100 blocks to scan, i don't remember it being so slow
< phantomcircuit>
andytoshi, i've rigged up rescanblockchain to tell me
< andytoshi>
ok cool. i had rigged the `getblock` rpc to dump the number of txouts per block and was using bash from there, but this is pretty brutal ... in the 40 minutes since i last spoke i'm up to block 200k. so it'll finish tonight :P
< sipa>
andytoshi: it may not... there are barely any transactions before 200k i think
< aj>
maybe update the coin stats index from #19521 and use that?
< andytoshi>
hmm, so, i definitely did this in fall 2016 for scaling bitcoin milan and it only took a few hours
< aj>
andytoshi: (maybe rusty's bitcoin-iterate is faster?)
< sipa>
7316308 transactions up to block 200000
< sipa>
out of 566745810 in total
< sipa>
phantomcircuit: given that there are now more transactions in total than the historical txout count I claimed earlier, you can safely disregard that number
< yanmaani>
luke-jr: Maybe worth adding a check for it?
< yanmaani>
"if wallet.dat doesn't exist and allet.dat does, show a message box"
< yanmaani>
"Hi, a very rare bug has occured. We would be happy if you could email us at asd@asd.com and tell us what filesystem drivers you're using. To fix it, open that folder and rename allet.dat again."
< yanmaani>
bit hacky though
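(A rough sketch of that check, using std::filesystem; the function name and message wording here are invented for illustration.)

```cpp
// Hypothetical sketch of the check proposed above.
#include <filesystem>
#include <iostream>

namespace fs = std::filesystem;

void WarnIfWalletRenamed(const fs::path& walletdir)
{
    // The reported symptom: wallet.dat missing, allet.dat present.
    if (!fs::exists(walletdir / "wallet.dat") && fs::exists(walletdir / "allet.dat")) {
        std::cerr << "allet.dat found where wallet.dat was expected; this looks\n"
                     "like the rare first-character-truncation bug. Please report\n"
                     "your filesystem setup, then rename allet.dat back.\n";
    }
}

int main()
{
    WarnIfWalletRenamed(fs::current_path());
}
```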
< phantomcircuit>
i think we've regressed on IBD somewhere, i have a server that's comically overpowered and there doesn't seem to be anything that's bottlenecking
< phantomcircuit>
running steady at about 100mbps, cpu and disk basically idle, on a 1gbps connection
< sipa>
phantomcircuit: do you have good peers to sync from?
< sipa>
the stalling detection logic can kick out the worst peers, but whether you get actually good ones can be hit or miss
< yanmaani>
what's usually the bottleneck for IBD? DB sync?
< aj>
phantomcircuit: i often find i'm stuck on block X from a slow peer, while the other peers are on block X+500 or so
< sipa>
yanmaani: depends... with lots of cache it's either network or (in-memory) utxo datastructure maintenance; with low cache it can be disk I/O
< phantomcircuit>
sipa, it must be the eviction logic cause im sure it would otherwise be network limited
< yanmaani>
how much disk I/O do you need? It's just 300gb or so right?
< aj>
yanmaani: disk io is mostly updating the utxo set, which is mitigated by cache
< yanmaani>
Can't you disable disk IO during IBD for utxo set?
< sipa>
yanmaani: yes, by making your cache big enough for the entire utxo set :)
< sipa>
which is 8 GB or so
< yanmaani>
No I mean can't you turn off DB sync and so?
< yanmaani>
Or will it just need to spill to disk regardless?
< sipa>
well the UTXOs need to be stored somewhere!
< sipa>
how will you validate transactions otherwise?
< yanmaani>
yeah, but there's no need to sync the database
< yanmaani>
You can have MongoDB tier safety
< yanmaani>
(during IBD)
< aj>
you have to have a database, it can be in memory or on disk; if it's in memory, it's in cache
< sipa>
if you set the cache big enough to keep the entire utxo set in memory, there will be no database I/O whatsoever during IBD
< sipa>
and it'll be flushed once at the end
< phantomcircuit>
yanmaani, if you set the dbcache high enough you will only write to disk once when you shutdown the node
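(For reference, the knob being discussed is -dbcache, sized in MiB; a bitcoin.conf along these lines, with the 8 GB figure taken from sipa's estimate above:)

```
# Cache sized to hold roughly the whole UTXO set, so during IBD the
# coins database is only written to disk when the node flushes.
dbcache=8000
```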
< yanmaani>
If I set it say 99% of the way, will I notice a sharp slowdown, or is it smart enough to cache as much as possible?
< sipa>
it's a sawtooth function; our cache is kind of a weird mix between a buffer and a cache
< sipa>
once it fills up, it's written entirely to disk, and cleared
< sipa>
(the reason for this is an unusual design that lets us remove entries from the cache if they're created and deleted between flushes, without them ever hitting disk)
< sipa>
and years ago, at least, we tried several alternative designs that kept some part in memory when flushing, but they always turned out to be worse
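(A toy model of that sawtooth; every name below is invented, and the real implementation is CCoinsViewCache in src/coins.h, pointed to later in this log.)

```cpp
// Toy sawtooth cache: fill up, write everything out in one batch, clear.
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <unordered_map>

struct ToyCache {
    std::unordered_map<uint64_t, uint64_t> entries; // outpoint -> coin (stubbed)
    size_t limit;

    explicit ToyCache(size_t l) : limit(l) {}

    void Add(uint64_t outpoint, uint64_t coin)
    {
        entries.emplace(outpoint, coin);
        if (entries.size() >= limit) Flush(); // cache full: write and clear
    }

    void Flush()
    {
        // Stand-in for one batched database write of the whole cache.
        std::printf("flushing %zu entries\n", entries.size());
        entries.clear(); // memory drops back to zero: the sawtooth
    }
};

int main()
{
    ToyCache cache(1000);
    for (uint64_t i = 0; i < 5000; ++i) cache.Add(i, i);
    cache.Flush();
}
```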
< yanmaani>
And I'm guessing the database is being interacted with by fwrite() rather than mmap
< sipa>
it's leveldb
< sipa>
so whatever leveldb uses, which is a mix (iirc it's all fwrite on 32-bit platforms, and a combination of fwrite and mmap on 64-bit ones)
< yanmaani>
So it's leveldb but with a custom cache?
< sipa>
it's probably better to call it an in-memory database, backed by an on-disk leveldb database
< sipa>
leveldb has its own caching too
< yanmaani>
wouldn't a file-backed mmap be better?
< yanmaani>
it's like malloc but with explicit swap handled by the OS
< sipa>
you're welcome to try, but we're really talking about different layers
< sipa>
the on-disk caching layer is a byte array
< yanmaani>
No, I mean instead of the RAM blob being used for in-RAM caching
< sipa>
there is no RAM blob
< sipa>
there is an in-memory database, with expanded, efficient, data structures
< sipa>
not serialized bytes
< yanmaani>
Isn't the in-memory database in RAM????
< sipa>
yes, but it's not a blob
< sipa>
no need to yell
< yanmaani>
It's several mallocs?
< sipa>
yes
< yanmaani>
so, wouldn't it make more sense to replace them with backed mmaps that you never flush? Then the OS would have a lot more liberty to optimize
< yanmaani>
than if you force it into RAM
< sipa>
seriously, you're welcome to try
< sipa>
i've spent months on optimizing that stuff
< sipa>
it's a highly unusual design, but yes, based on the experiments we did back then, it works very well
< yanmaani>
huh
< sipa>
the unusual part is that the UTXOs really have a create-lookuponce-deleteimmediate cycle
< sipa>
which is very strange for databases
< sipa>
usual things aren't designed to take advantage of the degree to which looked up entries are immediately deleted
< sipa>
(and they'll instead create some sort of log that contains the creation and deletion, which still get written to disk at flush time)
< sipa>
by having an allocation per entry, you can just throw it away instantly when spent, and forget about its existence entirely
< sipa>
if you have a few hundred MB or more of cache, it means most UTXOs never hit disk at all
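(A sketch of that lifecycle: a coin created since the last flush can be flagged as unknown-to-disk, so spending it just erases the entry. The real DIRTY/FRESH flags live in CCoinsViewCache in src/coins.h; everything else here is invented.)

```cpp
#include <cstdint>
#include <unordered_map>

enum Flags : uint8_t { DIRTY = 1, FRESH = 2 };

struct Entry { uint64_t coin; uint8_t flags; };

struct ToyCoinsView {
    std::unordered_map<uint64_t, Entry> cache;

    void AddCoin(uint64_t outpoint, uint64_t coin)
    {
        // New coin: DIRTY (differs from disk), FRESH (disk never saw it).
        cache[outpoint] = Entry{coin, DIRTY | FRESH};
    }

    void SpendCoin(uint64_t outpoint)
    {
        auto it = cache.find(outpoint);
        if (it == cache.end()) return; // (a real cache would query the parent view)
        if (it->second.flags & FRESH) {
            cache.erase(it);           // never existed on disk: just forget it
        } else {
            it->second.coin = 0;       // tombstone, written out at the next flush
            it->second.flags |= DIRTY;
        }
    }
};

int main()
{
    ToyCoinsView view;
    view.AddCoin(1, 42);
    view.SpendCoin(1); // erased without ever touching disk
}
```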
< yanmaani>
Wouldn't mmaps do this nearly as efficiently? Or is the OS too eager to flush changes?
< sipa>
sigh
< sipa>
you're talking about a different layer
< yanmaani>
No, I mean that the malloc is replaced by a mmap
< yanmaani>
And the mmap'd file is then treated like a RAM buffer of 8 GB
< sipa>
then you'd get inconsistent state on disk in case of a crash
< yanmaani>
Yeah, is that a problem?
< sipa>
yes
< yanmaani>
Can't you just remove the UTXO state in case of a crash?
< yanmaani>
at least during IBD
< sipa>
and sync from scratch? :o
< yanmaani>
if you make it fast enough it should be a gain on net
< yanmaani>
and dropping ACID guarantees and giving it the MongoDB treatment seems like it would make things faster
< sipa>
in pruned mode, you'd need to start over redownloading even
< yanmaani>
yeah, that's true. For pruned mode, you'd need to make sure it was synced properly.
< yanmaani>
Although if you're substituting malloc() for mmap() of a temporary file, isn't the persistence as good? "The synced stuff stays, the stuff in RAM doesn't"
< sipa>
there is no guarantee that mmap flushing happens in the same order as writes
< sipa>
is there?
< yanmaani>
no
< yanmaani>
if it crashes, your mmap will be garbage
< yanmaani>
but if you're using it to substitute malloc it should be fine
< yanmaani>
since there's no expectation malloc persists on crash
< sipa>
ah, i see
< sipa>
what advantages would this have? if used with the same cache size as you'd use now, it wouldn't be any faster or have other advantages i think
< sipa>
it'd permit you to make a cache larger than your ram, which may or may not be better
< sipa>
depending on how fast disk is etc
< yanmaani>
If used with the same cache size as you have now, it's roughly identical, but uses less RAM/is more fair
< yanmaani>
(OS can swap it out as it needs if there's a deficit of RAM)
< yanmaani>
if used with the max cache size*
< sipa>
that assumes the OS can predict better what's useful to have cached
< yanmaani>
it has some caching algorithm on a block level, yes
< yanmaani>
and users who are using zram/zswap will benefit from compression
< yanmaani>
it'll avoid sync disk writes in the cases where cache is too small
< sipa>
sure, but it doesn't know for example that after deleting some UTXO entry it's no longer useful to keep it around
< yanmaani>
after deleting the utxo entry, the ram is filled with something else surely?
< yanmaani>
(or it's never touched again, in which case the OS won't give it a very high priority)
< sipa>
at some point, sure
< sipa>
anyway, you're welcome to try and benchmark :)
< yanmaani>
yeah. Where is the cache?
< yanmaani>
i.e. what file
< yanmaani>
is it src/index/*/
< sipa>
CCoinsViewCache in src/coins.h
< yanmaani>
right, thanks!
< luke-jr>
mallocs can get swapped out too..
< yanmaani>
Only if you have swap enabled
< yanmaani>
Otherwise it'll go straight to thrashing
< yanmaani>
With a file-backed mmap, it can flush the pages to disk without consuming your swap
< sipa>
yanmaani: it would add I/O though, because the OS will start writing dirty pages from the mmap to disk, and then you'll read them again and write them again when flushing to the "real" database on disk
< sipa>
though that wouldn't be I/O on the critical latency path
< yanmaani>
yes, but so does normal thrashing
< sipa>
if the cache is so large that it gets swapped out to disk, you're better off picking a smaller cache
< luke-jr>
yanmaani: did you see my recent PR?
< yanmaani>
there's two options with malloc: either swap (if that's enabled), or thrashing (swap out libc). With mmap, you can also flush it
< yanmaani>
sipa: not necessarily - it might figure out which bits aren't so useful, and swap out those, for a net gain
< yanmaani>
That might also work. I don't know which approach is better.
< sipa>
yanmaani: given that every piece of data in the cache is accessed exactly once - when it's spent, i don't see how the OS could predict what is useful and what isn't
< yanmaani>
I suppose I'll have to benchmark it
< sipa>
yeah, it'd be interesting to know
< sipa>
you'll need some mmap-backed allocator i guess
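(A minimal POSIX sketch of that idea: a file-backed arena standing in for malloc'd memory, so the kernel can page it out without using swap. No growth or error-recovery logic, and, per the discussion above, nothing in the file is expected to be consistent after a crash.)

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

#include <cstddef>

int main()
{
    const std::size_t size = std::size_t{1} << 30; // 1 GiB arena
    int fd = ::open("utxo-cache.tmp", O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd < 0 || ::ftruncate(fd, static_cast<off_t>(size)) != 0) return 1;
    void* arena = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (arena == MAP_FAILED) return 1;
    // An allocator would hand out chunks of [arena, arena + size). Dirty
    // pages can be written back by the kernel at its leisure; the file is
    // simply discarded on shutdown or after a crash, like malloc'd memory.
    static_cast<char*>(arena)[0] = 1; // touch a page
    ::munmap(arena, size);
    ::close(fd);
    ::unlink("utxo-cache.tmp");
}
```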
< luke-jr>
I think a more likely improvement would be to flag cache entries rather than delete them, when writing to db
< sipa>
luke-jr: ?
< yanmaani>
I wonder if it'd make more sense to write everything to DB and have it in some extremely lax sync mode
< yanmaani>
so, take out the cache, and set the DB to MongoDB mode
< luke-jr>
sipa: after flushing changes to db, keep them in memory in case they're read soon
< yanmaani>
write during IBD, then flush when synced
< sipa>
luke-jr: i tried that
< luke-jr>
flag them so you know they don't need to be written anymore
< luke-jr>
sipa: why didn't it work?
< sipa>
luke-jr: at least when i tried it a few years ago, it was never a win; the reason is that there is less memory available to exploit the "newly created entries that get deleted before ever hitting disk" case
< luke-jr>
sipa: you'd delete the flagged entries when you need more space?
< sipa>
luke-jr: yes, i believe i tried something like that
< luke-jr>
I don't see how this can be a loss :/
< sipa>
where the flushing is done in two tiers; in one, you'd flush everything, but keep the most recently created half around
< sipa>
and in the second tier, when the memory is full, delete all non-dirty entries
< sipa>
luke-jr: because of extra CPU to walk the cache and find things to delete
< luke-jr>
std::move it to a second cache? :x
< sipa>
there are definitely more combinations that could be tried, not claiming it's a certain loss
< sipa>
but after trying half a dozen things, i think it was time to give up :)
< sipa>
it may also depend on relative speeds of RAM/CPU/disk
< sipa>
this was also pre-per-txout, which was added in 0.15; that may have changed things
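(A toy version of that two-tier experiment, all names invented: tier one periodically writes every dirty entry but keeps entries cached, now clean; tier two evicts clean entries, with no disk write, when memory fills up. The full-map walk in the eviction step is the extra CPU cost sipa mentions.)

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>

struct Entry { uint64_t coin; bool dirty; };

struct TwoTierCache {
    std::unordered_map<uint64_t, Entry> entries;
    size_t limit;

    explicit TwoTierCache(size_t l) : limit(l) {}

    void PeriodicFlush() // tier one
    {
        for (auto& kv : entries) {
            if (kv.second.dirty) {
                // write (kv.first, kv.second.coin) to disk here
                kv.second.dirty = false; // kept in memory in case it's read soon
            }
        }
    }

    void EvictIfFull() // tier two
    {
        if (entries.size() < limit) return;
        for (auto it = entries.begin(); it != entries.end();) {
            if (it->second.dirty) ++it;
            else it = entries.erase(it); // already on disk: drop without I/O
        }
    }
};

int main()
{
    TwoTierCache cache(1000);
    cache.entries[1] = Entry{42, true};
    cache.PeriodicFlush();
    cache.EvictIfFull();
}
```

(sipa's variant also kept only the most recently created half around at flush time; recency tracking is omitted here.)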
< luke-jr>
hmm
< sipa>
i did have a design a few years ago that i'd like to get back to at some point, which would permit flushing in the background without invalidating on-disk cache
< phantomcircuit>
sipa, nvm for some reason my gateway<->modem link was at 100, not 1000
< phantomcircuit>
<.<
< sipa>
eh, on-disk storage
< phantomcircuit>
>.>
< luke-jr>
hmmmm
< sipa>
so you could be continuously writing the oldest entries, outside of the latency critical path
< sipa>
though it'd need extra memory to keep things ordered
< yanmaani>
Is std::unordered_map really the fastest in-memory kv store around?
< sipa>
it's nontrivial to make it work correctly with reorgs etc though, but not impossible, if i remember
< sipa>
yanmaani: it's not
< yanmaani>
But it's advantageous for some other reason? Or is it just being used for some small part?
< sipa>
i think people have tried some variations
< sipa>
the biggest difference would come from using different allocation strategies, i think
< gribble>
https://github.com/bitcoin/bitcoin/issues/16801 | faster & less memory for sync: bulk pool allocator for node based containers by martinus · Pull Request #16801 · bitcoin/bitcoin · GitHub
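(The PR gribble links is one version of that idea; below is a heavily simplified sketch of the same technique: a monotonic bump/pool allocator feeding a node-based std::unordered_map. All names are invented, erased nodes are not reused, and memory is only returned when the arena dies.)

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <unordered_map>
#include <vector>

// Monotonic arena: carve allocations out of large malloc'd blocks.
struct Arena {
    std::vector<void*> blocks;
    char* cur = nullptr;
    std::size_t left = 0;

    void* Alloc(std::size_t n, std::size_t align)
    {
        std::size_t pad = (align - reinterpret_cast<std::uintptr_t>(cur) % align) % align;
        if (pad + n > left) { // current block exhausted: grab a new one
            std::size_t blk = std::max<std::size_t>(std::size_t{1} << 20, n + align);
            cur = static_cast<char*>(std::malloc(blk));
            blocks.push_back(cur);
            left = blk;
            pad = (align - reinterpret_cast<std::uintptr_t>(cur) % align) % align;
        }
        cur += pad; left -= pad;
        void* p = cur;
        cur += n; left -= n;
        return p;
    }
    ~Arena() { for (void* b : blocks) std::free(b); }
};

// Minimal C++11 allocator over the arena; deallocate is a no-op.
template <typename T>
struct ArenaAlloc {
    using value_type = T;
    Arena* arena;
    explicit ArenaAlloc(Arena* a) : arena(a) {}
    template <typename U> ArenaAlloc(const ArenaAlloc<U>& o) : arena(o.arena) {}
    T* allocate(std::size_t n) { return static_cast<T*>(arena->Alloc(n * sizeof(T), alignof(T))); }
    void deallocate(T*, std::size_t) {} // bulk-freed by ~Arena
    template <typename U> bool operator==(const ArenaAlloc<U>& o) const { return arena == o.arena; }
    template <typename U> bool operator!=(const ArenaAlloc<U>& o) const { return arena != o.arena; }
};

int main()
{
    Arena arena;
    using Alloc = ArenaAlloc<std::pair<const int, int>>;
    std::unordered_map<int, int, std::hash<int>, std::equal_to<int>, Alloc> m(
        16, std::hash<int>{}, std::equal_to<int>{}, Alloc{&arena});
    for (int i = 0; i < 100000; ++i) m.emplace(i, i * i);
    return m.at(7) == 49 ? 0 : 1;
}
```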
< achow101>
how do I make a const unsigned char* into a span?
< sipa>
achow101: do you have its length?
< achow101>
yes
< sipa>
Span<const unsigned char>(ptr, len) should work
< achow101>
ah, thanks
< sipa>
post c++17 we can add type inference, and you can use Span(ptr, len)
< luke-jr>
doesn't C++17 include std::span anyway? :P
< sipa>
luke-jr: no, that's only in c++20
< sipa>
achow101: if you're passing to a function that takes a Span<const unsigned char> already, you can use fn({ptr, len})
< luke-jr>
oh :x
< achow101>
sipa: even better
< sipa>
or {beginptr, endptr}
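(Putting those answers together in one sketch; CountZeroes/Example are invented names, and the snippet assumes Bitcoin Core's Span from src/span.h.)

```cpp
#include <span.h> // Bitcoin Core's Span

#include <cstddef>

size_t CountZeroes(Span<const unsigned char> data)
{
    size_t n = 0;
    for (unsigned char b : data) n += (b == 0);
    return n;
}

void Example(const unsigned char* ptr, size_t len)
{
    Span<const unsigned char> s(ptr, len); // explicit construction
    CountZeroes(s);
    CountZeroes({ptr, len});               // braced form: target type is known
    CountZeroes({ptr, ptr + len});         // begin/end pointer pair
}
```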
< jb55>
is there something in 0.20.0 -> 0.20.1 that would cause it to redownload the blockchain? trying to figure out why it's doing that after I upgraded my kernel+bitcoin. no configs changed ...