< bitcoin-git>
[bitcoin] gmaxwell opened pull request #10608: Add a comment explaining the use of MAX_BLOCK_BASE_SIZE. (master...size_comment) https://github.com/bitcoin/bitcoin/pull/10608
< NicolasDorier>
wumpus sipa gmaxwell jonasschnelli morcos luke-jr btcdrak sdaftuar jtimon cfields petertodd kanzure bluematt instagibbs phantomcircuit codeshark michagogo marcofalke paveljanik NicolasDorier jl2012 instagibbs achow101 : Sorry to ping you all, but since some dev are in Tokyo around end of july for some conferences, I thought it can be a good idea to have such occasion to have 1 or 2 days of pure bitcoin core
< NicolasDorier>
coding/review/idea sharing together. If you are interested, come in the channel ##tokyocore . My company Digital Garage can give us a nice place to go for coding. Note: I certaintly forgot some name, if you are contributor and wish to come, you are defintively welcome. The 25 June we freeze the list.
< cfields>
NicolasDorier: thanks for arranging!
< cfields>
gitian builders: 0.14.2 detached sigs are up
< cfields>
NicolasDorier: I just rsvpd as I was going to be there already. Thanks again.
< NicolasDorier>
yay awesome
< luke-jr>
you know you're up too late when you don't remember what the bug you were trying to fix was. ._.
< wumpus>
cfields: thanks, pushed 0.14.2 signed sigs
< jonasschnelli>
cfields: with be49a294a240ec81a901af1aaabbba2172d38dc1 I get only a single lock report... haven't looked at your code thourgh
< jonasschnelli>
*though
< jonasschnelli>
wumpus: the pollBalanceThread is mainly responsible for UI freezes...
< jonasschnelli>
If I disable that poll thread,.. stuff runs much better
< jonasschnelli>
ryanofsky: I'll give it a test (polling but keeping the cache)
< jonasschnelli>
ryanofsky: But what I don't understand is why we would/should do a TRY_LOCK on cs_main every 250ms
< jonasschnelli>
sipa: you said CCoinsViewDB::Upgrade() can be cancelled (and continued) any time. Is that correct?
< jonasschnelli>
Adding a check for ShutdownRequested() and break the while (and write the possible batch) would make sense then?
< michagogo>
Will have 0.14.2 signed up shortly
< michagogo>
Doing a bit of VM maintenance
< michagogo>
Speaking of which: does anyone know what the status is of Gitian in Xenial?
< michagogo>
ISTR that for some reason we can't actually compile on Xenial and so the guest needs to remain Trusty, but can the Gitian (LXC) *host* be Xenial?
< jonasschnelli>
michagogo: Xenial should work fine IMO
< sipa>
jonasschnelli: i guess!
< jonasschnelli>
sipa: Seems to work here...
< jonasschnelli>
sipa: Can you tell me again how I can calculate the progress in that pcursor while loop?
< sipa>
i'll make a commit later
< sipa>
thanks for doing thid
< jonasschnelli>
okay. thanks
< wumpus>
jonasschnelli: interesting! I hadn't expected that, it should only update the balance if the TRY_ succeeds not freeze on the lock update
< jonasschnelli>
I don't know why currently but I know that its much faster with that PR
< jonasschnelli>
At least on OSX
< jonasschnelli>
Would be nice if someone could profile it on Linux / Win.
< wumpus>
but the try_lock on cs_main should not *itself* cause freezes
< wumpus>
if it runs the poll, and the lock is not available, the whole point would be that it doesn't spend time
< wumpus>
if that's not how it works it seems the whole TRY_ concept is broken
< jonasschnelli>
but what if the TRY_LOCK can acquire the lock every 250 ms and do the calculation...
< wumpus>
it should only do the computation if the balance is dirty, right?
< wumpus>
so in by far most cases if it gets the lock, it immediately is supposed to notice the balance is not dirty, so doesn't recompute it
< jonasschnelli>
I tested three options, 1) master, 2) only atomic caches with polling, 3) like 2 but polling replaced with signal
< wumpus>
if only needs to be recomputed if something changed
< jonasschnelli>
wumpus: so, yes. It should only lock when the balances are dirty
< jonasschnelli>
But in my test, I sent 20 txes, so the balance was always dirty afterwards
< jonasschnelli>
I don't know why its much slower with the TRY_LOCK
< wumpus>
right
< jonasschnelli>
I guess it must be the QTimer / TryLock overhead?!
< wumpus>
well yes if it gets a transaction every 250ms, sure
< ryanofsky>
i'd think TRY_LOCK only when balances were dirty would give best of both worlds
< jonasschnelli>
I manually sent the tx... ~every 0.5s
< wumpus>
but GUI freezes happen anso to people that don't have anythign in their wallet
< wumpus>
on the first sync
< jonasschnelli>
ryanofsky: I though as well,.. but look at my profile results,.. they tell a different story
< wumpus>
so it can't be just something with transactions, though there's clearly something with transactions too...
< ryanofsky>
oh ok, i didn't see that listed as one your three tests
< jonasschnelli>
Master with only the atomic caches result in still 33% of the execution time in that poll function
< wumpus>
ok
< jonasschnelli>
While I can't find any call in my profiler running pure (all commits) of 10251
< wumpus>
33% of the time for something that gets called 4 times per second?
< wumpus>
is that a very large wallet?
< jonasschnelli>
*any call that related to the balance update
< jonasschnelli>
wumpus: not really large,.. I only did generate 1100
< jonasschnelli>
depends what "large" is
< wumpus>
it's interesting that the balance computation is so slow
< wumpus>
well if so, that's already large, it seems
< jonasschnelli>
I guess we loop 6 times over the complete mapWallet
< wumpus>
oh wow
< jonasschnelli>
for all balance types...
< wumpus>
ok, yes then I understand why things are so slow, though I still don't understand why it recomputes also the times the balance was not updated
< wumpus>
it's as if the wallet is always dirty, even though only once in 0.5s a transaction arrives
< jonasschnelli>
I'm not sure if it recomputes then...
< wumpus>
because in principle, this should result in the same load: with the 0.25s poll, it should compute every time after a transaction comes it
< jonasschnelli>
But when it does,.. it seems to take much longer.. don't know why. Maybe because of the QTimer internals
< wumpus>
with the per-transaction notification, it also does
< wumpus>
what if the transaction rate is higher though, e.g. with 10 transactions per second the polling shoudl be faster
< jonasschnelli>
Yes. Indeed.
< jonasschnelli>
The signal should filter that out... min 250ms delta or something
< wumpus>
the fixed polling is also to reduce the maximum amount of work done
< jonasschnelli>
yes.. indeed.
< wumpus>
yes, it could, though if done not carefully it means the data is always one update behind
< wumpus>
(e.g. if you would naively do "process this update only if the last was >250ms ago")
< jonasschnelli>
Could it be the QTimer overhead?
< wumpus>
no, I don't believe that
< wumpus>
not at a 4Hz frequency
< wumpus>
if you remove the code from the handler but keep the timer, it'd probably be the same
< jonasschnelli>
I guess we should then try to continue to measure the atomic cache commit on top of master (with the TRY_LOCK polling).
< jonasschnelli>
This is already much faster (I don't have numbers though)
< jonasschnelli>
s/is/feels/
< wumpus>
I'm really surprised by the result at least...
< jonasschnelli>
What could be that we falsely always set fForceCheckBalanceChanged
< jonasschnelli>
Could also be OSX only... who knows
< jonasschnelli>
But since we also have a windows issue...
< jonasschnelli>
I doubt it's OSX only
< wumpus>
btw: what is the advantage of having a dirty flag per kind of balance - won't every update to the wallet invalidate *all* of them?
< wumpus>
or are there changes that, say, only invalidate the immutable balance
< * jonasschnelli>
looking at the code again
< ProfMac>
I feel like this is the secret password to join a club: a60d7c8dde9b77e7ff547976ce37db1fe98c71833003465befe650d6bc102b6b bitcoin-0.14.1-aarch64-linux-gnu.tar.gz
< wumpus>
ProfMac: congrats!
< wumpus>
so you got it to work, awesome
< ProfMac>
I said bad things & stomped my foot a time or two.
< jonasschnelli>
ProfMac: nice one!
< ryanofsky>
i think dirty flag doesn't really make sense for qt, but it does make sense for rpc
< wumpus>
so the balance computations loop over all transactions?
< wumpus>
which is always done 6 times
< jonasschnelli>
wumpus: because each balance type has its own routine to calculate, we probably should have caches/dirty-flags per routine
< jonasschnelli>
wumpus: that's the root source!
< wumpus>
I wonder if it'd make sense to roll it into one routine, that computes all 6 balalnces in one pass
< jonasschnelli>
wumpus: +100
< wumpus>
it would be more cache friendly at least
< jonasschnelli>
I guess that was a "organic growing" issue... we added one type after another over tim,e
< wumpus>
the added overhead of, in that loop over every transaction, adding up one more number is probably neglible
< jonasschnelli>
indeed
< wumpus>
while LOCKing the wallet six times and iterating over its contents six times is probably bad
< wumpus>
in any case it's good that your PR already brings improvements
< wumpus>
then we should probably merge it
< jonasschnelli>
But now it comes...
< jonasschnelli>
We call GetDepthInMainChain for each mapWallet tx
< jonasschnelli>
Which does mapBlockIndex.find(hashBlock);
< jonasschnelli>
I tried multiple times to cache the height... but seems to be relatively difficult to do it right
< wumpus>
that would also be reduced with factor 6 if it were to be done in one go
< jonasschnelli>
Yes.
< jonasschnelli>
That would be related to 10251 but with a bigger ramification
< wumpus>
I mean *ideally* the wallet would compute balance incremementally
< jonasschnelli>
I think the pure atomic caches is a first step... even if we then can throw away all down to a sigle cache
< jonasschnelli>
wumpus: Yes.
< jonasschnelli>
It should keep basepoints at certain height/states
< wumpus>
but if we had to do a pass over the entire wallet it's by far best to do it only once
< wumpus>
right, it's extremely complex to get right, which is why it was never done
< jonasschnelli>
Indeed, ... and we should probably do in – as most things – in a background thread (GUI wise)
< jonasschnelli>
If the node coms where in a background thread, nobody would tackle this
< wumpus>
e.g. there are so many small things that can change how a transaction is counted
< wumpus>
right, if the computation was not in the GUI thread, it'd be much less bad
< wumpus>
there would be some unnoticable lag
< ryanofsky>
if you get rid of the dirty flag and compute balance with each transactions that effectively would move computation to a background thread. still might lock up gui because of holding cs_main though
< wumpus>
instead of the thing hanging and complaining
< wumpus>
ryanofsky: but that'd be crazy, it'd make receiving transactions quadratic
< wumpus>
ryanofsky: if every transactions received causes a scan over the entire wallet
< wumpus>
ryanofsky: that's why the dirty flag exists
< jonasschnelli>
Balances should only be calculated if the user wants them
< ryanofsky>
yes, what i meant above by dirty flag makes sense for rpc but not gui
< jonasschnelli>
In the GUI, thats a bit different
< wumpus>
ryanofsky: right now it makes sense for the GUI too, as it's only polled 4 times per second
< wumpus>
ryanofsky: so it's naturally rate limited
< wumpus>
ryanofsky: if you get 400 transactions per second, it still only computes 4 times per second
< ryanofsky>
for gui users who are creating more than 4 transactions per second i guess
< jonasschnelli>
Modern GUI frameworks often poll/call updates when the according element is visible
< wumpus>
or "receiving"
< wumpus>
this happens during initial sync
< wumpus>
if you have a full wallet
< ryanofsky>
oh, that's true. dirty flag makes sense there too
< wumpus>
people always ignore initial sync for some reason, while that is the most common source of hangs, after initial sync the GUI is pretty ok
< ryanofsky>
anyway to be clear i'm not suggesting getting rid of dirty flag
< wumpus>
(initial sync or: more often, catching up after not running the node for a few days)
< ryanofsky>
i'm just saying one way to move balance computation to "background thread" instead of gui thread is to compute it on tx notifications in cases where that makes sense
< wumpus>
sure
< cfields>
jonasschnelli: the lock report is printed when threads destruct. So, mostly at shutdown
< jonasschnelli>
ah.. okay.. then my fault
< jonasschnelli>
cfields: have you seen the crash on OSX in current master over gitian
< jonasschnelli>
It must be one of the last 10 commits
< bitcoin-git>
[bitcoin] jnewbery opened pull request #10612: The young person's guide to the test_framework (master...templatefunctionaltest) https://github.com/bitcoin/bitcoin/pull/10612
< Victorsueca>
anybody else having issues when cross-compiling 0.14.2 for windows in ubuntu 14?
< Victorsueca>
sipa: yeah, more specifically with miniupnpc apparently
< Victorsueca>
see the 0bin
< sipa>
oh, sorry i missed the link
< spudowiar>
o/ jnewbery
< cfields>
jonasschnelli: ok, i think i have the crash worked out. I've taken the opportunity to learn more about asm
< sipa>
cfields: my condolences
< cfields>
sipa: heh. my mistake for not diving in decades ago
< cfields>
sipa: clang is nice enough to use ebx for its stack canary, which cpuid clobbers. I'm trying to understand the PIC/ebx interaction fully before PRing something
< cfields>
just tell me how deep i'm going to have to dive? :)
< sipa>
hint: we'll need different asm code for 32bit and 64bit
< cfields>
full canary address doesn't get restored via mov?
< sipa>
indeed
< sipa>
see why?
< cfields>
vaguely. I'm so green I'm having trouble distinguishing values from addresses. Investigating.
< wumpus>
oh, so the compiler isn't being told correctly what registers are being clobbered?
< sipa>
wumpus: if you tell the compiler yoi clobber ebx, it will complain saying that it needs ebx for PIC
< sipa>
and fail to compile
< sipa>
so the code there manually saves and restores ebx
< sipa>
i believe that is fixed in a later gcc
< cfields>
gcc5
< cfields>
i suppose it needs to be movq/rbx for 64bit instead?
< sipa>
indeed
< wumpus>
that seems kind of stupid, now you need to write the code in mind with the knowledge what every compiler might use for special things
< sipa>
wumpus: indeed :(
< sipa>
cfields: tmp needs to be a 64-bit int on 64-bit systems
< cfields>
sipa: i've seen lots of rantings about gcc's x86 pic handling in the past, i'm beginning to understand the hatred now
< wumpus>
(I suspect the cookie in ebx is not part of any official ABI convention, at least)
< cfields>
well if you list ebx as a clobber, it sticks the cookie somewhere else. So my "fix" was totally accidental and wrong (and would ofc break on x86 pic).
< sipa>
cfields: oh you're able to list ebx as a clobber?
< sipa>
and it compiles?
< wumpus>
in its defense, handling PIC on x86 32 bit is a nightmare
< sipa>
in that case, we should probably do that on x86_64
< wumpus>
modern architectures, and even less modern ones, have been designed with PIC in mind, but x86 never was
< cfields>
sipa: yea
< sipa>
you want to write a pr?
< cfields>
sipa: sure, will be a while though. Still reading/experimenting/learning
< sipa>
cfields: the issue is that the 32-bit ebx register and the 64-bit rbx register occupy the same space
< sipa>
ebx is just a "view" of the lower 32 bits of rbx
< cfields>
that makes sense
< sipa>
and the "b" constraint can either mean ebx or rbx, depending on the size of the variable
< sipa>
there are 8-bit and 16-bit views too
< cfields>
so the compiler adapts the mov to the register size?
< ProfMac>
lol. At this point I'm so fatigued, I don't even know my own name. It does seem pretty slow. Have a look at the scripts and see if you spot any trouble.
< sipa>
ProfMac: machine without hardware virtualization?
< achow101>
ProfMac: perhaps try using kvm instead of lxc? it will need to run on hardware and not in a vm though
< Victorsueca>
achow101: that seems to work \o/
< Victorsueca>
thanks
< ProfMac>
Probably just not configured correctly. One of the scripts creates the virtualbox, so that can be corrected. 5.7 GiB RAM, Intel Core 2 Duo CPU E8400 @ 3.00 GHz x 2 Ubuntu 14.04 LTS 2.0 TB online.
< ProfMac>
what kind of build times to others have? My impression was that it build the entire Trusty system from source during the process.
< ProfMac>
I don't know how to peek inside gitian-builder/target-trusty-amd64
< jonasschnelli>
ProfMac: my build times are at bitcoin.jonasschnelli.ch
< ProfMac>
Could I just make a new Trusty VM and do the build there? If I restore a "bare" snapshot before each build, what do I give up?
< ProfMac>
jonasschnelli, thanks, but I get a 403 forbidden when I drop that into an address bar.
< sipa>
cfields: also, for reusable code (if we want to use hw sha instruction or something, you can use "xchg %1, %%ebx" instead of mov, and get the real ebx out
< cfields>
sipa: heh, i meant to ask about that. that's what gcc does internally
< sipa>
not suggesting to do that here, but it's hardly harder to write a generic "cpuid number -> give me a tuple of 4 uint32_t" function
< cfields>
sipa: you mean just drop the special case and always stash b?
< sipa>
cfields: i think your PR is fine
< sipa>
but maybe later if we ever have multiple things to query cpuid for
< cfields>
sipa: well i'm pretty sure I whined about it not being generic in your original PR :p
< sipa>
cfields: we're using gcc 4.8 for win and linux builds, which i think has a sufficiently mature LTO implementation; any opinion on using it in release builds?
< sipa>
i don't know the status of LTO in clang
< cfields>
sipa: my main concern is that we could see a discrepancy between release binaries and what devs run
< sipa>
cfields: why?
< cfields>
as even with gcc7, linking still takes a long time. I don't think we could enable it by default, it'd have to be in gitian
< sipa>
it does take less memory too
< sipa>
maybe a generic "optimized build" available from configure and one that isn't, and gitian uses optimized
< cfields>
(not opposed, just considering the downsides)
< sipa>
it's a good point to bring up
< sipa>
i like that it categorically removes the concern about whether code needs to go in a header or not :)
< cfields>
sipa: yea, that sounds reasonable
< cfields>
heh
< cfields>
speaking of which, i worked on pre-compiled headers a few days ago. Shaved roughly ~30% off of build time
< cfields>
sipa: another one of the big benefits is that our deps can be lto'd as well. I suspect static lto'd qt would be a big win for filesize
< cfields>
but again, that makes linking take forever
< * cfields>
kicks off an lto'd qt build out of curiosity
< bitcoin-git>
[bitcoin] luke-jr opened pull request #10615: RPC: Allow rpcauth configs to specify a 4th parameter naming a specific wallet (master...multiwallet_rpc) https://github.com/bitcoin/bitcoin/pull/10615