< PRab>
no surprise, but running 0.11.2rc1 uneventfully.
< gmaxwell>
Likewise.
< PRab>
If you guys want, I can crash my computer. I have been hit by the DB corruption bug several times and it looks like this release might fix that.
< btcdrak>
PRab, great.
< PRab>
btcdrak: Great that it's running, or great that I can test crashing?
< btcdrak>
that it is running great
< gmaxwell>
PRab: yes, it should fix most of the corruption reports on windows. All except the anti-virus related ones, as far as we know.
< jonasschnelli>
PRab: would be nice if you could test v0.11.2rc1 on real Windows (non-VM).
< jonasschnelli>
I did some VMWare power-off simulations... it did mess up the db even with the fix. But this particular fix is better tested in a non-vm environment.
< wumpus>
I tested it on a real windows laptop (with NotMyFault to inject kernel faults) and wasn't able to get any corruption with the new syncing behavior, while the old behavior was to corrupt every single time
< wumpus>
so even if not perfect it's a lot better
< wumpus>
gitian is broken for me :( "Could not download some packages, please run gbuild --upgrade" when trying to sign the mac package
< dcousens>
jonasschnelli: I've had similar nodes get 'stuck' before, so unless it's repeatable, I'm not sure that is related to that PR
< wumpus>
nothing wrong in install.log
< jonasschnelli>
dcousens: right. I wrote that in a comment. Probably unrelated to the secp256k1 switch PR
< jonasschnelli>
But a serious bug,...
< wumpus>
faketime is already the newest version. libc6:i386 is already the newest version. 0 upgraded, 0 newly installed, 0 to remove and 3 not upgraded.
< dcousens>
jonasschnelli: I know, just providing an anecdote that might further that
< dcousens>
further support*
< wumpus>
jonasschnelli: I had a similar issue a while back of a node that just stopped catching up with the chain - never been able to repeat it
< dcousens>
also, you wrote F.I.Y, for interested yours? :P
< jonasschnelli>
yeah... it seems to be hard to create clear steps to reproduce...
< wumpus>
(and I didn't build with debug symbols at the time, so was unable to do anything useful to troubleshoot internal state)
< jonasschnelli>
s/F.I.Y/F.Y.I... :)
< wumpus>
if it happens again I'm prepared. But it never happened again
< wumpus>
well it didn't happen during IBD in my case, just with a node that was synced but suddenly stopped without rejecting the chain or anything like that
< wumpus>
(and after four days it magically started again)
< dcousens>
That said, in restarting the node, I found the most prominent factor was the peers it was connected to
< wumpus>
that wasn't the problem in my case. it was a node with many incoming connections, and I did try dropping the network to get new connections
< wumpus>
it is likely some race condition, where it forgets about being insistent about requesting the next block it needs
< wumpus>
likely caused by one or more uncooperative peers
< wumpus>
re: the gitian issue, adding --upgrade to my gbuild line seems to have solved it. I don't understand why this is suddenly needed, but ok, great :)
< GitHub175>
[bitcoin] jonasschnelli opened pull request #6979: [Qt] simple mempool info in debug window (master...2015/11/qt_mempool_easyinfo) https://github.com/bitcoin/bitcoin/pull/6979
< dcousens>
jonasschnelli: what block did you pause on OOI?
< GitHub120>
[bitcoin] laanwj closed pull request #6693: Set Windows TCP buffers to 64KB to match OSX and Unix (master...issue_6554) https://github.com/bitcoin/bitcoin/pull/6693
< GitHub4>
bitcoin/master b56953e Wladimir J. van der Laan: qt: Periodic translations update
< wumpus>
jonasschnelli: any luck building the image? maybe try updating vmbuilder, if you're building it from source
< sipa>
wumpus, jonasschnelli: where in the codebase are we using powers-of-1024 based units?
< sipa>
bitcoin traditionally uses 1000-based units (feerate is per 1000 bytes, block limit is 1000000 bytes), but I can't say I've paid much attention to it myself
< jgarzik>
sipa, it's all over Qt
< sipa>
if anything, be consistent (don't use MB for 1048576 bytes, and certainly don't use KB - that would mean kelvin bytes)
< jgarzik>
several core uses in default values, usually related to file size
< wumpus>
sipa: I'm not sure. But in this case jonasschnelli tried to introduce a use of 1024*1024
< sipa>
I know
< wumpus>
which was sensibly commented on
< sipa>
I think using MB = 1024^2 is a no-go. My question is whether or not we should aim for only MB = 1000000 or only MiB = 1024^2
< wumpus>
there's probably a few other cases but in general the idea is to use 1000000/MB
< wumpus>
only 1000 *unless* there is a convincing reason to use 1048576/MiB, which I'm not sure exists
< sipa>
We clearly need a --si commandline flag
< sipa>
*ducks*
< wumpus>
e.g. I can understand using MiB when you're selling memory chips, which for hardware reasons only come in powers of two
< wumpus>
but for bitcoin core, which is a high-level application, there should be no reason to not just use SI units
< sipa>
agree
< wumpus>
and it's always been that way, there's not really a reason to discuss or revise this :)
< * wumpus>
likes Kelvin*Bytes though
< sipa>
I was just asking whether or not we currently have cases where 1024-based units are exposed to users. Consistency with already existing uses would be a reason.
< sipa>
"Our cache is very hot. Many KB."
< wumpus>
hehe
< * wumpus>
going to add a note about using SI units to the developer notes
< jgarzik>
as noted, we have many cases exposed to users
< sipa>
i think -dbcache may be 1024-based
< wumpus>
should be changed
< jgarzik>
-dblogsize, several Qt UI attributes
< jgarzik>
several defaults are pow2
< sipa>
i'm in favor of making user-exposed amounts consistently 1000-based, if there aren't too many
< wumpus>
yes
< jgarzik>
+1
< wangchun>
Dear core devs, how do you plan to respond to the latest actions regarding BIP101?
< wumpus>
not at all
< sipa>
Bitcoin Core will implement any uncontroversial hard fork.
< jgarzik>
wangchun, Scaling Bitcoin Pt 2 is the next step in figuring out uncontroversial hard fork
< GitHub159>
[bitcoin] laanwj opened pull request #6981: doc: Add note about SI units to developer notes (master...2015_11_si_units) https://github.com/bitcoin/bitcoin/pull/6981
< jgarzik>
IMO the consensus was already "move to MiB" before today...
< * jonasschnelli>
is confused... :/
< * Luke-Jr>
dislikes "_iB" notation and would avoid other software enforcing it.
< Luke-Jr>
I don't suppose Qt has a nice localisation method for these?
< sipa>
jgarzik: now i am confused too
< jgarzik>
ok, I meant SI units
< jgarzik>
whatever the abbreviation is
< Luke-Jr>
SI units are 1000-based kB/MB/GB
< jgarzik>
Luke-Jr, agree!
< jgarzik>
AFAIK the consensus is SI units </correction>
< Luke-Jr>
I'd prefer configurable at some level (DE level would be nice), but not worth the time if Qt doesn't make it easy.
< sipa>
jgarzik: there are 2 questions: "should we use 1000 or 1024-based units" and "should we use Mi-style prefixes for 1024-based units?"
< Luke-Jr>
Right now, the traffic thing uses 1024-based KB/MB/GB
< Luke-Jr>
(which is what I prefer)
< sipa>
that's the only unacceptable thing for me :)
< sipa>
either 1000000/MB, or 1049576/MiB
< sipa>
with a preference for the first
< sipa>
1048576 i mean
< gmaxwell>
sipa: really, you have a preference for the first?
< sipa>
yes, though i have a tendency for the second
< jgarzik>
1) 1000-based units, 2) I hate Mi style prefixes -- grew up thinking 1024==KB, 1024*1024==MB, etc. However I think that's the direction of the world...
< jgarzik>
easy enough to ensure we eliminate as many 1024-based units as possible, to minimize Mi prefixes
< Luke-Jr>
jgarzik: the direction of the world is defined by people in the world, such as us to a small degree ;)
< gmaxwell>
Normally _bandwidth_ is measured in megabits, which is a 1000-based unit. Transfer regrettably is often in 1024-based units. "MB" should never be used for 1024-based because it's unnecessarily confusing; MiB is unambiguous. The purpose of the messages is to communicate, not to be pretty.
< sipa>
gmaxwell: i think the world would be a (marginally) better place if everything used 1000-based prefixes, it would make reasoning so much simpler
< jgarzik>
problem - so few know WTF MiB is, out in the world
< * Luke-Jr>
did not realise megabits were typically 1000-based
< gmaxwell>
Luke-Jr: not just typically but always except when handled by drooling idiots. :)
< gmaxwell>
jgarzik: far more common than you think, I think. But even still, if so then their ignorance is apparent to them and a 10 second search will resolve it.
< sipa>
Luke-Jr: a 100 Mb/s link can do 11.9 MiB/s :)
< Luke-Jr>
Well, in that case, /me suggests 1000-based with an unchecked-by-default checkbox that uses 1024 in GUI without the "i" silliness. <.<
< gmaxwell>
Luke-Jr: please, no freeing options.
< gmaxwell>
er freeking.
< gmaxwell>
It'll pepper the code with conditional logic that will never get tested.
< jgarzik>
nod
< sipa>
agree
< Luke-Jr>
gmaxwell: this is an option that different people want to use. so obviously it would get tested.
< sipa>
nobody will bother
< sipa>
no actual user will bother, rather
< Luke-Jr>
at least I will
< gmaxwell>
Do you even actually use the GUI except for testing? :)
< Luke-Jr>
yes
< Luke-Jr>
the only time I use the RPC is for testing.
< jgarzik>
Similar to -logtimemicros option -- it's basically an option 1-2 people will use, and the rest of the world will be unaware.
< jgarzik>
Should just pick a useful behavior and not make it an option.
< Luke-Jr>
let's just drop all non-English languages too :P
< sipa>
it doesn't hurt to add more digits
< sipa>
and the cases when they're useful is not when you know it in advance
< jgarzik>
I would rather remove -logtimemicros and make it unconditionally default-on.
< sipa>
agree
< sipa>
me too
< wumpus>
I disagree
< jgarzik>
From OS perspective there is no added cost
< wumpus>
seconds are precise enough for logging
< jgarzik>
1) other userland servers log microseconds, 2) it's a pointless option that will be used by no one - yet we must maintain
< sipa>
i would love to be able to get bug reports "when did X take 10ms?! it should just be changing a variable!"
< Luke-Jr>
let's go with systemd binary logs! /s
< sipa>
nobody will even notice now
< Luke-Jr>
sipa: lol
< jgarzik>
default-off options should be terminated with extreme prejudice.
< zooko>
jgarzik: +1
< kanzure>
replaced with what?
< jgarzik>
(translation: there should be a good argument for keeping and maintaining them, above "it's nice")
< wumpus>
bla, this is useless
< sipa>
wumpus: what is your reason against microsecond logs?
< kanzure>
ah okay. "it's nice" as insufficient. ok.
< kanzure>
you could send all logs through zeromq inside the process, then people can subscribe to logs in different ways using zeromq.
< wumpus>
sipa: too precise timestamps are a possible security issue
< wumpus>
e.g. timing attacks
< sipa>
hmm
< wumpus>
also it makes correlation easier, to breach privacy
< sipa>
i thought we dropped that concern when switching to -logtimestamps default on
< jgarzik>
sipa, yep
< wumpus>
yes, for seconds
< gmaxwell>
I would rather stop logging things that breach privacy.
< Luke-Jr>
as long as sipa hates 1024 KB, and I prefer 1024 KB, there's no way to satisfy both without an option. If you all decide to make it 1000-based only, I can just add an option to Bitcoin LJR if I ever care enough (unlikely)
< wumpus>
microseconds just goes too far, I see no point in that kind of precision
< sipa>
Luke-Jr: kB
< sipa>
:p
< Luke-Jr>
sipa: kB is 1000-based always
< gmaxwell>
We generate a LOT of excess IO via logging. Let's reduce the chattiness, that is a much more robust privacy protection than decreasing timestamp precision.
< jgarzik>
There is no useful privacy argument for seconds, which does not also apply to microseconds. The matter of degrees is tiny.
< jgarzik>
+1 gmaxwell
< wumpus>
god damnit
< kanzure>
timing attacks tho
< kanzure>
oh
< wumpus>
I agree with reducing chattiness as well, but that's a different concern
< kanzure>
right, i think you mean to say "microseconds does not have any useful privacy advantage over seconds".
< Luke-Jr>
it does have a screen-space advantage, but I can't believe we're spending time discussing this.
< jgarzik>
kanzure, no. the argument being made is "microseconds goes too far" and "it makes correlation easier, to breach privacy"
< wumpus>
Luke-Jr: and it's already possible to enable microseconds if you want to use them
< sipa>
Luke-Jr: KB is kelvin byte. I don't understand where your capital K even comes from; IEC uses Ki for 1024 because ki would make it look like a unit rather than a prefix
< jgarzik>
kanzure, no privacy advantage in microseconds is being claimed - just the opposite - though the matter of degree is tiny
< wumpus>
I don't understand this insistence on enabling it by default
< jgarzik>
versus the diagnostic utility and common practice elsewhere
< kanzure>
jgarzik: understood
< Luke-Jr>
KB is the traditional standard notation for 1024 bytes, defined in at least JEDEC standards
< Luke-Jr>
sipa: ^
< kanzure>
i believe the BOFH approach was "KiBi" instead of KB or kB or KiB. (not serious here) (but i did see this proposed once by a BOFH).
< sipa>
Luke-Jr: interesting :)
< jgarzik>
wumpus, Because merging options that nearly-nobody will use is dumb
< sipa>
wumpus: being able to benchmark things after the fact
< wumpus>
jgarzik: people that want to benchmark things with precision will enable it
< wumpus>
if no one wanted it it wouldn't have been merged at all
< kanzure>
was this cli option or was this build option?
< sipa>
cli
< jgarzik>
c.f. "1-2 users" Most people will not even know about it.
< jgarzik>
cli
< kanzure>
was build option considered?
< wumpus>
why?
< kanzure>
because cli options are disagreeable for good reasons
< wumpus>
what?
< wumpus>
disagreeable for what reason?
< kanzure>
11:24 < jgarzik> default-off options should be terminated with extreme prejudice.
< wumpus>
and that doesn't apply to build options?
< jgarzik>
build option is even worse - conditional compilation creates code not even built always, but still must be maintained.
< gmaxwell>
wumpus: it's more conditional code which will be inadequately tested in one form or another, especially all the combinations will not be tested.
< kanzure>
hmm okay.
< wumpus>
yeah this is absolutely going the wrong way
< wumpus>
gmaxwell: I agree if it's a complex conditional, but come on, a logging option
< gmaxwell>
E.g. all of us will turn microseconds on; and then no-microseconds + disable wallet will end up crashing; and we won't notice until after a release. ... but yes; in this particular case this argument is not strong. I agree that it's mostly safe.
< wumpus>
but anyhow feel free to remove the option, I'm done discussing this
< kanzure>
what about zeromq as logging transport, then just use whatever temporal resolution at receive time of each message?
< kanzure>
ok nevermind then
< Luke-Jr>
kanzure: ZeroMQ is stupidly unreliable.
< wumpus>
debug logging is *not* meant as application interface
< kanzure>
Luke-Jr: fedpeg.py is just misconfigured :P
< wumpus>
it's just for debugging and troubleshooting
< wumpus>
not for processing by other applications
< Luke-Jr>
kanzure: fedpeg.py is not my only experience with ZeroMQ.
< wumpus>
if you insist you can already use -logtoconsole and pipe it to something
< gmaxwell>
kanzure: I have other expirence with ZeroMQ too. It does not correctly handle non-lossless networks.
< jgarzik>
Makes sense. zmq was built for LANs, with similar use cases to RDMA.
< Luke-Jr>
LANs aren't always lossless (hi wifi)
< morcos>
just to be clear, i love and will always use logtimemicros, but i have another annoying question
< kanzure>
so complaint is lack of application-level retries? or tcp delivery guarantees too strongly stated in zeromq docs?
< sipa>
zmq is explicitly lossy
< sipa>
like udp
< Luke-Jr>
also, ZeroMQ 4.0 is incompatible with 4.1
< sipa>
i thought?
< morcos>
wumpus: i have created two new functions EstimateApproxFee and EstimateApproxPriority in 6134
< morcos>
my plan was not to expose them via RPC
< morcos>
because i think over time the fee estimator still has some more significant evolution
< morcos>
and then in the end it might be nice to expose what we think the final interface should be
< morcos>
however sdaftuar suggested i might be able to expose them now and comment they might change or keep them hidden or something
< morcos>
they would primarily be useful in developing tests for the new functionality i think?
< morcos>
in any case, would you like me to keep them unexposed for now?
< jgarzik>
morcos, Respectfully submitted, wumpus is the release manager not The DecisionMaker - ask the crowd not the one
< jgarzik>
There's a reason why I pushed Gavin hard for multiple committers when Satoshi disappeared - Gavin was release manager, not Bitcoin Leader
< morcos>
jgarzik: sure, s/wumpus/everyone/ . although i thought i recalled him expressing an opinion about rpc api churn before
< jgarzik>
"release early, release often" :) IMO Expose them. Maybe name the methods "beta.XXXX" to emphasize they are not final. The goal is to communicate to users that churn will be forthcoming, but also expose them for testing and further field development.
< wumpus>
morcos: I don't mind, if it's useful to have them then add them
< wumpus>
morcos: if it is an unstable interface mention it in the help
< morcos>
ok thanks
< wumpus>
morcos: there's less an issue with adding new commands than changing old ones, especially adding arguments to existing calls is messy
< jgarzik>
+1
< morcos>
wumpus: yes that was my concern with the fact that this one might change or disappear later, but i'll clearly mention it
< wumpus>
morcos: you could decide to not list them in the overview, like invalidateblock/reconsiderblock
< wumpus>
(the "hidden" category)
< GitHub60>
[bitcoin] laanwj closed pull request #6981: doc: Add note about SI units to developer notes (master...2015_11_si_units) https://github.com/bitcoin/bitcoin/pull/6981
< morcos>
sipa: question re: in memory sizes.. i was thinking of making a quick pull that defaults the dbcache to 2 * maxmempool and the maxsigcachesize to 1/5 * maxmempool. or do you think there is a better way to control total mem usage?
< morcos>
i'd like to test out a couple of different configurations just to make sure there aren't any regressions with a particularly large or particularly small mempool, and i was hoping to have an idea of what a typical large configuration and typical small configuration might look like
< morcos>
if those are the default ratios, then i'd just try out 50M mempool, 300M mempool and 2G mempool (hmm maybe 4G is too much for dbcache in that case?) seems like it would be nice to just control 1 number
< jgarzik>
nod - in general users should not need to fiddle N settings just to get a usable configuration
< sipa>
morcos: where do your 2G and 4G numbers come from?
< morcos>
i didn't get it from anywhere, i was just going to try out something that's pretty big. i think 2G is about as big as mempools got in the recent spam attack, maybe between 2-3... also easily enough for a week's worth of backlog in txs which we had previously discussed aiming for
< morcos>
mostly i want to see if any of the code that traverses the whole mempool gets a bit slow then, or if the re-sorts from the multi_index changing get slow.
< sipa>
I think 2G is too much..
< morcos>
maybe i explained wrong
< morcos>
i was going to leave maxmempool default at 300
< morcos>
and change dbcache default to 2x maxmempool and maxsigcachesize to 1/5
< morcos>
and then i was going to test what happens if someone runs a particularly small or big node.
< morcos>
but i think it makes sense as jgarzik says for their other defaults to scale with setting one value (unless they explicitly set them otherwise)
< morcos>
so yeah 2G is huge, but that's sure what i'd do if i was a miner
< jgarzik>
We're well away from miners caring about maximizing fees to an Nth degree... Reliability, orphaning and other factors drive miner conservatism in settings like this...
< gmaxwell>
The benchmarks w/ the libsecp256k1 pull really highlight the impact of increasing the size of the cache on IBD time.
< morcos>
gmaxwell, if you're in IBD, we should steal maxmempool size for dbcache. :)
< sdaftuar>
+1
< morcos>
so the question is what should the defaults for each of these 3 things be. and if you want to turn 1 knob to change to big mem footprint or small, how should that work?
< jgarzik>
Related tech note - if one option sets another option, it becomes order-dependent
< jgarzik>
i.e. set dbcache before mempoolsize, and dbcache gets stomped.
< jgarzik>
Still, I think the base argument holds - users do not want to set N settings to achieve a useful config.
< morcos>
jgarzik, sure if you change the order of lines of code things get messed up
< jgarzik>
morcos, no, order of configuration file defines
< morcos>
no i don't think so
< sipa>
ideally there is just a single setting for "amount of cache memory" which is the sum of dbcache and mempool, and the flush criterion becomes "too large a percentage of the memory is dirty cache entries"
< sipa>
jgarzik: no, the logic for options settings other options is earlier in init.cpp
< jgarzik>
ok, good
< wumpus>
jgarzik: in case of bitcoin core that's not the case, only still-unset settings get overridden
< morcos>
sipa: so you still need a limit for the mempool right, which makes sense to be a fraction of your total
< morcos>
you can't just go and trim your mempool a bunch when a whole new set of txins gets cached
< wumpus>
I don't think utxo cache and mempool are very related
< morcos>
yeah i mostly agree with wumpus there
< sipa>
wumpus: well they're both essentially caches of information that we'll except to need for validating the next block
< wumpus>
enough reasons to increase one and not the other. I'm fine with setting some sane defaults, but they shouldn't be linked in general
< sipa>
*expect
< sipa>
except the utxo cache is *also* a buffer of unwritten changes
< wumpus>
well a smaller mempool doesn't make your node slower, a smaller dbcache does
< morcos>
sipa: the mempool doesn't serve as a cache does it
< morcos>
right
< wumpus>
so even though they're 'caches' they have a widely different reason
< wumpus>
mempool is necessary for correct functioning, dbcache is just for performance
< morcos>
wumpus, the idea of linking though is that why make the user figure out how to divide up his available N gigs of memory
< jgarzik>
User experience. What does the user need to do to achieve a useful config on both large & small boxes?
< morcos>
we should just do something vaguely smart by default
< jgarzik>
nod
< wumpus>
morcos: yes, as said, by default you could do that, make a -memoryquota parameter or such that automatically allocates them, like automatic partitioning in an OS install
< wumpus>
morcos: but it should certainly be possible to set them separately
< dhill>
getrlimit and base initial memory sizes on that?
< morcos>
ok agreed
< wumpus>
dhill: that assumes everyone wants to give all their memory to bitcoind
< dhill>
naw
< wumpus>
and no, just because I have 32GB of memory now doesn't mean I want to allocate it all to my node, and have compiles crash again...
< dhill>
i didn't suggest that
< morcos>
so -maxmempool default is some fraction of memoryquota, which also has a default. but then if you individually set your maxmempool, sigcache, or dbcache, you are not necessarily going to respect your memoryquota
< sipa>
i think it makes sense for bitcoin-qt to do something smartish to guess how much memory to use for caches
< morcos>
well if we do this memory quota idea, then we can always have smart code that sets that as a later addition if we want
< wumpus>
agreed, doing something smart is good, but basing it (trivially) on the available memory doesn't make too much sense imo
< wumpus>
people generally want to run their node in the background
< morcos>
so back to my original question: if i set memoryquota=N, what should the 3 components each be
< wumpus>
having applications use more memory just because it is available, and no other reason, is...weird
< morcos>
.4N for maxmempool, .5N for dbcache, .1N for maxsigcachesize?
< wumpus>
I would be angry if firefox used more memory per tab just because I have more, that's so self defeating
< morcos>
sdaftuar was trying to convince me 2 * is too much for dbcache
< morcos>
and default memory quota to 800M which should be around the 300M mempool we had previously been planning for, and keep total usage to under a gig (ugh, i hope, what else uses memory)
< wumpus>
dbcache is nice but apart from during IBD ,not too important
< morcos>
wumpus: i like the idea that block relay over p2p is very fast. i think that depending on the relay network for efficient block relay is a bad idea, even if p2p will never be quite as fast
< wumpus>
I sort of like the idea to use the mempool quota for UTXO cache during IBD, after all the mempool will be almost empty at that time
< wumpus>
though making them linked is less nice for separation of concerns
< morcos>
for block relay to be fast, you need decent dbcache size
< wumpus>
a better eviction policy would help too there :)
< morcos>
i have a PR for that. :) but it still requires a decent size dbcache. a 1MB block might need over 50MB of dbcache just for itself, if you knew exactly what was going to be confirmed in that block
< wumpus>
ok cool
< jgarzik>
"having applications use more memory just because it is available, and no other reason, is...weird" <-- That's pretty much exactly how OS page cache works
< jgarzik>
And I'm looking at how memory usage changes once mempool is on disk, and nearly everything goes through OS page cache
< wumpus>
yes at the OS level you can get away with it
< wumpus>
e.g. page cache isn't considered 'used' memory in most computations, it can be evicted when an application really needs it
< jgarzik>
quite true if you s/application/OS/ -- though "working set" is part of the app. If working set size gets below a threshold, performance falls due to increased I/O. As such, it is part of the app, and one that scales when more memory is available.
< wumpus>
you really can't do that in a well-behaving application - yes I know it's possible to use some madvise trick to map application pages as volatile, but supporting that cross platform will be a nightmare, and it will probably still look to the user as if it uses a lot of memory unconditionally
< wumpus>
yes I wasn't talking about the page cache, but a hypothetical application that would ask the OS how much memory is available and claim a fixed part of that. It would make the egotistical assumption that a user buys more memory just to give your application more space :)
< jgarzik>
hehe
< wumpus>
it would be very nice if we could rely on the page cache though for caching the db, this whole utxo cache is mostly a workaround because there is a lot of retrieve overhead even if the data is cached in RAM (do we even know why? deserialization/allocation overhead?)
< gmaxwell>
has nothing to do with storage vs ram, you get basically the same speedups from big utxo cache even if the datadir is in tmpfs.
< morcos>
gmaxwell: oh really?? why is that
< wumpus>
that's what I mean with overhead even if the data is already in RAM
< wumpus>
storing the datadir in tmpfs is an extreme example of that, but normally the page cache would make sure at least a part of the files are still cached
< wumpus>
morcos: that's what I wonder too, probably deserialization/allocation overhead, going from storage representation to internal data structure
< gmaxwell>
I had a prior belief but pieter tells me my understanding of leveldb's behavior was not correct. (I thought leveldb did a sequential scan of the log to find any records there)
< wumpus>
another possibility is that leveldb queries are really slow even if there is no seek/io overhead - maybe because of the all-pervasive checksum verification
< sipa>
gmaxwell: the log is sequential, but is never read
< wumpus>
anecdotally when I broke off block verification a few times in gdb it always ended up in leveldb checksum verification. That's no serious way to do profiling though :-)
< sipa>
gmaxwell: it's compacted into the database at startup (which is why startup after shutdown with high dbcache is slow, it has large log to process)
< sipa>
the database however has several levels, and several (non-overlapping) files for every level
< sipa>
which means it may need to scan through multiple levels to find data
< sipa>
every file has a bloom filter, so it will quickly skip the ones that don't have the requested key
< wumpus>
right
< sipa>
within every file, it uses a bisection search to find the data, i think
< wumpus>
yes IIRC too
< sipa>
or a bisection search in its index
< sipa>
which points to the data
< gmaxwell>
Would having a protocol message that says "please don't INV loose transactions to me, I'm mostly only interested in transactions in blocks" be a bad idea? I want to have bandwidth limited nodes that don't care about unconfirmed txn not receive anything but confirmed transactions.