<cfbolz>
mattip: ugh, there be dragons. Only arigato can fix those. But maybe we can decide to kill asmgcc instead?
lritter has joined #pypy
_whitelogger has joined #pypy
<cfbolz>
If there is consensus to remove it I volunteer to do that
<fijal>
right, but "not using it" means "let it die" I think
<mattip>
do we know there are no downstream users?
<fijal>
since we stopped fixing problems
forgottenone has quit [Quit: Konversation terminated!]
<cfbolz>
mattip: we don't but it doesn't matter much. The alternative we use now has exactly the same features
<cfbolz>
But I'd still like arigato's opinion
jcea has joined #pypy
<tos9>
vstinner: ping? for a thing possibly tangentially interesting to this channel
antocuni has joined #pypy
marky1991 has joined #pypy
nunatak has joined #pypy
ambv has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
jacob22 has quit [Read error: Connection reset by peer]
antocuni has quit [Ping timeout: 245 seconds]
jacob22 has joined #pypy
<vstinner>
tos9: hi. i'm trying riot.im/app as a new "IRC client" but it doesn't notify me of pings :-/
<vstinner>
tos9: sorry, i missed your message and I have to go, sorry. i will try to read the backlog later
Rhy0lite has joined #pypy
<tos9>
vstinner: No worries! The ping was to ask "haven't you considered making tons of $$$ by creating a webapp?", or more seriously, "is there a thing that can be used to track results of `perf` benchmarks over time"?
<tos9>
(The latter being the thing that possibly other people care about or have thought about recently :P, though if not can take it elsewhere)
<mattip>
tos9: are you thinking of codespeed, or maybe asv? Both allow creating web sites to track benchmarks
<tos9>
mattip: Well, I know you were working on codespeed, which was why I asked it here, but what I specifically care about is vstinner's perf tool output
<mattip>
codespeed drives speed.python.org and speed.pypy.org, asv is used in numpy
<tos9>
mattip: So if either of those can take it as input I probably do care about them, but also you have to self host I assume
<mattip>
asv is simpler for that; it has its own web server
<tos9>
mattip: (The tl;dr of what I was hoping might exist is "coveralls.io for `perf` output")
<tos9>
Or codecov.io whatever, who can even keep track at this point
<tos9>
mattip: Will have a look -- the point though isn't that I care about self hosting, it's that if you use Travis for CI, you have no persisted state across runs
<mattip>
even if there was a service to publish perf runs, you would have to push the data to them
<tos9>
:) yeah, guess could go with that -- I'm not saying it's impossible (so thanks for the ideas)
<tos9>
mattip: right, but that's the case for e.g. coverage results
<tos9>
you add a 3 line config file, click a button, and now all your future test runs track coverage over time
<tos9>
so if it doesn't exist will definitely go with that kind of architecture, just wish running benchmarks was as "sexy" as coverage then since there are about 15 different websites that will do that for the latter
<tos9>
mattip: (Thanks for the pointers by the way, never even heard of asv)
<mattip>
ahh I see. You want someone to provide the hooks for the upload service
<tos9>
mattip: Right.
<gsnedders>
my problem historically with perf benchmarking is getting sufficiently consistent results, which is hard if you don't control the hardware, which makes it hard to do as a service :/
<mattip>
right, you would need to run your own travis instance
<tos9>
so from what I've seen his thing at least warns you if it gets inconsistent results
<tos9>
whether that's still no good because everything will just show as inconsistent who knows
<mattip>
inconsistent in which way? perf does many runs to try to reach a steady state. But what happens the next time you run? How do you compare runA to runB?
<mattip>
even on cpython variances of 5% are normal
<mattip>
between runA and runB
* mattip
waiting for a link to cfbolz's paper to magically appear
PileOfDirt has joined #pypy
<cfbolz>
mattip: let's not scare people :-P
<mattip>
:)
<mattip>
antocuni (for the logs), ronan: what do you think of newmemoryview-app-level? It seems to be Good Enough for numpy's needs
<mattip>
still needs some tweaking for BigEndianStructures
ambv has joined #pypy
<tos9>
mattip: yeah, just within runs (which is what I thought "inconsistent hardware" would be referring to, bursts somewhere else on the host)
<tos9>
mattip: for across runs -- is there something particularly worse about perf than anything else there?
<tos9>
is there some other answer than "you have enough runA and runBs to make some sort of statistical deduction"?
<mattip>
tos9: be careful, if you ask too much that link will appear and you will be drawn down the rabbithole that is benchmarking
<mattip>
but seriously, the general idea is you generate some results (runA, runB, ...) and then need to decide if they are the same or different
<mattip>
one thing that can help is the variability of the little runs inside runA (I think perf calls them loops), which can be estimated
<mattip>
if you know the distribution
<mattip>
for the sake of argument, call it gaussian (it isn't), and calculate a std dev
<mattip>
and use that to compare runA to runB
<mattip>
or you can use change detection: do runs over time as the code base changes, which gives a noisy time series, and decide if there was a significant change
<mattip>
</spam>
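A rough sketch of the comparison mattip describes: treat each run's per-loop timings as samples, pretend the distribution is Gaussian, and only call a difference real if it exceeds the combined noise. The timings below are made up for illustration.

```python
from statistics import mean, stdev

run_a = [1.02, 1.05, 0.98, 1.01, 1.03]  # seconds per loop, run A (made up)
run_b = [1.10, 1.12, 1.08, 1.11, 1.09]  # seconds per loop, run B (made up)

mean_a, sd_a = mean(run_a), stdev(run_a)
mean_b, sd_b = mean(run_b), stdev(run_b)

# crude test: is the difference of means bigger than the combined std dev?
diff = abs(mean_a - mean_b)
noise = (sd_a ** 2 + sd_b ** 2) ** 0.5

if diff > 2 * noise:
    print("probably a real change: %.3f s vs %.3f s" % (mean_a, mean_b))
else:
    print("within the noise (diff %.3f s, noise %.3f s)" % (diff, noise))
```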
<gsnedders>
one obvious example in many VMs is if the GC is triggered on a timer then a very small performance difference can cause GC to trigger when it otherwise wouldn't, and that can be a vastly larger perf change.
forgottenone has joined #pypy
marky1991 has quit [Remote host closed the connection]
marky1991 has joined #pypy
marky1991 has quit [Remote host closed the connection]
danieljabailey has quit [Read error: Connection reset by peer]
<tos9>
mattip: :D -- right ok that makes sense-ish to me I think?
<tos9>
mattip: I am definitely interested in all these questions, but I was first probably just interested in "are you saying it's any harder with perf than anything else"
<tos9>
mattip: E.g., if we, for argument's sake, assume there's some predictable noise in the hardware (don't laugh at me), then in theory what I'm imagining is no more or less possible with e.g. Travis than your own hosted hardware, and no more or less possible with perf than anything else, yes?
<tos9>
like however you define "performance is the same across runs" (which I hear you is a hard thing :P) is an independently hard thing, yes?
zmt00 has joined #pypy
<mattip>
on travis you have no idea what real hardware your vm or docker is running on
<mattip>
how new/old it is, how many other VMs are running, what the temperature is inside the data center, ...
<mattip>
if you control the hardware, you can at least control some of the variables
<mattip>
and perf has tools to help you with that: setting CPU governor policy for speed throttling for instance
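For reference, a minimal sketch of what a benchmark script using vstinner's perf module looks like, assuming the Runner/bench_func API; the system tuning mattip mentions is done separately with the module's CLI.

```python
# Minimal sketch of a perf benchmark script (the Runner/bench_func API).
# Run "python -m perf system tune" first to pin the CPU governor, disable
# turbo, etc.; perf itself handles warmups, loop calibration and spawning
# multiple worker processes.
import perf

def workload():
    # hypothetical function under test
    return sum(i * i for i in range(10000))

runner = perf.Runner()
# pass -o results.json on the command line to store the run as JSON
runner.bench_func('sum_of_squares', workload)
```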
* tos9
nods
<tos9>
OK fine you're ignoring my assumption :P, which probably you know to be false
<tos9>
mattip: So fine imagine I gave up on Travis and provide my own hardware for running the actual benchmark
<gsnedders>
even if you have multiple "identical" boxes, you can quite often observe performance differences between them.
<tos9>
I still could imagine some webapp that just ingests that over time via POST yeah?
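A purely hypothetical sketch of the service tos9 is imagining here; the endpoint, token and payload layout are invented, only the results.json written by perf is real.

```python
# Hypothetical: push a perf JSON result to an imaginary hosted service.
# The URL, token and payload fields below are invented for illustration.
import json
import requests

with open("results.json") as f:      # file written by perf with -o
    payload = json.load(f)

resp = requests.post(
    "https://benchmarks.example.org/api/v1/results",  # made-up endpoint
    json={"project": "myproject", "commit": "abc123", "data": payload},
    headers={"Authorization": "Bearer <token>"},
)
resp.raise_for_status()
```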
<tos9>
mattip: I guess I'm well off the farm of my original question (though now maybe I understand just using codespeed or asv, and why there's no thing that looks the way I "expected") -- and also I know I'm way outside my area of expertise :P so I'm sure I'm talking either basics or nonsense -- but in theory there's 2 things you could care about when benchmarking right? One is "I change my code, is the overall codebase faster or slower than before", and then you use the benchmark as a proxy to answer that question for one section of the codebase (or path through it) -- and then the other is (I assume impossible?) that you care about cases where people are *not* going to tune for your specific benchmark, they just run some similar code within other larger programs, and you want to learn what to expect about that behavior
<tos9>
cfbolz: I can probably have a look in a few hours if that helps?
<tos9>
But now cfbolz's question is probably more important :P
<cfbolz>
mattip: would be fantastic, thanks (no particular rush, doesn't have to be today)
<vstinner>
mattip, tos9: yeah, a friend told me that asv is great, especially to automatically bisect a perf change (speedup / slowdown)
<arigato>
cfbolz: I'm for removing asmgcc entirely
<vstinner>
mattip: using the same CPython binary, I managed to get a very small std dev. but one issue with tracking performance over time is PGO, which is not deterministic :-(
<cfbolz>
arigato: OK, I'll do that then
<cfbolz>
It served its purpose ;-)
<tos9>
vstinner: (to summarize the original ask, which I think mattip is telling me is a bad thing to want, which I trust his opinion on :P, I want "upload your benchmark results as a service for OSS and we'll graph them for you over time")
<vstinner>
tos9: CodeSpeed has that
<vstinner>
tos9: pyperformance has an "upload" command, but not perf, which tries to be more general than pyperformance
<mattip>
tos9: it's fine to have a service like that, maybe someone will set one up
<vstinner>
tos9: perf is more to run benchmark and store results, with a few tools to analyze and compare results
<vstinner>
tos9: there are some tools to track performance, but most are very basic, like they don't support std dev. i'm now convinced that a raw number with no std dev is *useless*
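A small sketch of reading a stored perf result back together with its std dev, assuming the Benchmark.load()/mean()/stdev() API of the perf module.

```python
# Sketch: load a stored perf result and print mean +/- std dev
# (assumes perf's Benchmark.load(), get_name(), mean() and stdev()).
import perf

bench = perf.Benchmark.load("results.json")  # file produced with -o
print("%s: mean %.3f s, std dev %.3f s"
      % (bench.get_name(), bench.mean(), bench.stdev()))
```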
<tos9>
vstinner: Right, the reason I asked though is that I can't imagine what a workflow for perf looks like, without additional tooling -- like, I have "regular" OSS projects with perf suites, now what? (And if the answer is "go stand up that additional tooling" obviously that's OK :P)
<tos9>
vstinner: Which sounds like "a tuned benchmark box, and an instance of codespeed (or asv)"
<tos9>
Also probably now I have cluttered this channel enough with unrelated things :P so happy to take it elsewhere.
<gsnedders>
LLVM's perf tool (LNT?) is also pretty good, IIRC. But it's been a long while since I've looked.