<antocuni>
basically, I wanted to see what happens if you do "pypyjit.set_param('off')" after some iterations
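[A minimal sketch of that experiment; bench() is a hypothetical stand-in for one iteration of the workload, and pypyjit is the PyPy-only control module being quoted:]

    import time
    import pypyjit  # PyPy-only module

    def bench():
        pass  # stand-in for one iteration of the real workload

    times = []
    for i in range(250):
        t0 = time.time()
        bench()
        times.append(time.time() - t0)
        if i == 100:
            # stop compiling new traces from here on; already-compiled
            # code is expected to keep running
            pypyjit.set_param('off')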
<haypo>
antocuni: by default, performance tries to be nice and chooses parameters for you, but in my case, i wanted to always use: 10 processes, 0 warmup, 250 values (per process)
<antocuni>
so, in the plot you pasted above, there are 10 different lines, one per process?
<haypo>
antocuni: yes, one per process
<haypo>
antocuni: i expected to see different results per process, and that's indeed the case, even though i didn't reboot between runs
<antocuni>
right
<haypo>
antocuni: using --skip, you can ignore the first N values per run. on the go benchmark, one run was faster. it seems that we reached the steady state after 58 values: http://www.haypocalc.com/tmp/go.png
<haypo>
haypo@selma$ python3 doc/examples/plot.py ~/pypy_p10_w0_n250.json.gz -b go --split-runs --skip=58
<haypo>
at least, it confirms that it's a good idea in perf to use multiple processes :-D
<antocuni>
yeah, indeed. Do you see such a high variation also for CPython, or only pypy?
<haypo>
antocuni: variation between two processes? some performance microbenchmarks have medium variation between runs on CPython, but I removed the microbenchmarks yesterday :-D
<antocuni>
wow, you are doing a very nice job, congrats
<haypo>
to be honest, i didn't look at individual CPython runs in depth
<haypo>
antocuni: well, the PyPy benchmark suite already produced a JSON file, but it only stored the final result
<haypo>
antocuni: for me, it's important to store *all* data to allow deep analysis later
<haypo>
antocuni: for example, i modified the "perf stats" command to count the number of outliers. it's now possible to compute that on *old* JSON files, without having to recompute the data
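[A sketch of this kind of offline analysis; count_outliers() and its 1.5*IQR rule are illustrative guesses rather than what "perf stats" actually does, and the commented-out loader assumes the perf API of the time:]

    def count_outliers(values):
        # one common definition: values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
        values = sorted(values)
        n = len(values)
        q1, q3 = values[n // 4], values[(3 * n) // 4]
        iqr = q3 - q1
        return sum(1 for v in values if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr)

    # import perf
    # suite = perf.BenchmarkSuite.load('pypy_p10_w0_n250.json.gz')
    # for bench in suite.get_benchmarks():
    #     print(bench.get_name(), count_outliers(list(bench.get_values())))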
<haypo>
which is nice since it takes 1 hour to compute a JSON file on CPython :)
<haypo>
performance_results/2017-03-31-cpython/ contains 44 files, so it took something like 44 hours to compute all the data :-p
<antocuni>
I ran the telco benchmark three times: 1) normally; 2) with the jit disabled after 100 iterations; 3) calling gc.collect *before* each iteration
<antocuni>
(I wrote my own hackish runner because I could not find a way to modify perf and/or pyperformance to do what I wanted)
<antocuni>
by looking at the graph I think we can see that:
<antocuni>
1) the spikes are caused by gc collections, NOT jit compilations
<antocuni>
2) if we disable the JIT, the performance drops. This is a bit unexpected: maybe it means that there is some guard which constantly fails and thus causes the JIT to compile the same code paths again and again?
* antocuni
tries an additional run with both gc.collect and jit-off-after-100
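[A sketch of what such a hackish runner could look like; run() and bench() are hypothetical, not antocuni's actual code:]

    import gc
    import time
    import pypyjit

    def run(bench, n=250, jit_off_after=None, collect_before_each=False):
        times = []
        for i in range(n):
            if collect_before_each:
                gc.collect()  # force the major collection outside the timed region
            if i == jit_off_after:
                pypyjit.set_param('off')
            t0 = time.time()
            bench()
            times.append(time.time() - t0)
        return times

    # 1) run(bench)                           # normally
    # 2) run(bench, jit_off_after=100)        # jit disabled after 100 iterations
    # 3) run(bench, collect_before_each=True) # gc.collect before each iteration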
<haypo>
antocuni: ah, i forgot to mention that you don't need performance to run bm_telco.py. it's a standalone script. the script directly accepts the -w0 -p10 -n250 options
<haypo>
antocuni: i'm not sure that calling gc.collect() is "correct"
<antocuni>
haypo: sure, it is not correct at all
<haypo>
i should describe somewhere what i want from performance
<antocuni>
but it's a very different thing from JIT warmup: in the case of the JIT, we can assume that after a (maybe arbitrarily long) warmup phase the performance stabilizes
<haypo>
in short, benchmarks should be representative of "real" applications and be run as users run real code
<antocuni>
if it's the GC, it means that the GC cost should be spread all over the iterations
<haypo>
antocuni: the question is more why a GC collection is needed. GC is only supposed to be required to break cycles, no?
<antocuni>
haypo: in pypy not at all
<antocuni>
the gc runs constantly
<haypo>
so it looks more like a bug in the decimal module
<haypo>
antocuni: i mean, if you don't have cycles, the GC is supposed to do nothing, no?
<antocuni>
no
<antocuni>
pypy doesn't have refcount
<haypo>
hum, ok
<antocuni>
the only way to reclaim memory is by running the GC
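[A small demo of the difference; on CPython the first print shows True thanks to refcounting, while on PyPy it is typically False until the GC actually runs:]

    import gc
    import weakref

    class X(object):
        pass

    x = X()
    r = weakref.ref(x)
    del x
    print(r() is None)  # True on CPython, usually False on PyPy
    gc.collect()
    print(r() is None)  # True on both after a forced collection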
<antocuni>
and we have two phases: minor collections (which run often, probably multiple times during the execution of a benchmark)
<antocuni>
and major collections, which are slower and run less often
<antocuni>
I bet that the spikes we see are because a GC major collection happens to run every N iterations
<haypo>
antocuni: yeah, i wouldn't be surprised to see a correlation between GC major collections and spikes
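[Given the per-iteration timings, one cheap way to look for that correlation; the factor-of-2 threshold is an arbitrary choice:]

    import statistics

    def find_spikes(times, factor=2.0):
        # indices of iterations much slower than the median; if the spikes
        # come from major collections they should be roughly evenly spaced
        med = statistics.median(times)
        return [i for i, t in enumerate(times) if t > factor * med]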
<antocuni>
yeah
<antocuni>
basically, what I wanted to say is that in this particular case, the benchmark probably DOES warm up, and so it is fine to take the average after N iterations
<arigato>
("the only way to reclaim memory is by running the GC" => note that 80% or 90% of the objects are dead already at a minor collection, and the minor collection algorithm needs some steps per *alive* object, making reclaiming free objects exactly zero cost)
<antocuni>
arigato: sure, I was talking about the GC in general, not only major collections
<arigato>
(i.e. it's more efficient than calling malloc() and free() even without counting the overhead of the reference counter in CPython)
<arigato>
just want to make sure haypo doesn't get the wrong impression :-)
<antocuni>
ok
<arigato>
"the GC" inside CPython is a very particular beast from the general GCs elsewhere
<antocuni>
anyway, all of this is another hint that we cannot use a single number to represent "pypy speed for a given benchmark"
* arigato
didn't fully read the conversation
<haypo>
antocuni: "it is fine to take the average after N iterations" hum, it's more a requirement than just being fine
<haypo>
antocuni: i want to include spikes in the result
<haypo>
antocuni: we had long and painful discussions about mean vs median, for example, and in the end i decided to choose the mean
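[The choice matters exactly because of the spikes; a toy example with made-up numbers:]

    >>> import statistics
    >>> values = [1.00, 1.01, 0.99, 1.02, 1.00, 3.50]  # one spike
    >>> statistics.median(values)  # barely notices the spike
    1.005
    >>> statistics.mean(values)    # includes it in the result
    1.42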
<antocuni>
I know, I'm not saying that it's wrong
<antocuni>
but for example, it means that you have to distinguish "JIT spikes" vs "GC spikes" to compute the warmup
<haypo>
antocuni: ah, it should be easy to distinguish them: just disable the GC as you did, no?
<antocuni>
yes, although you cannot "disable" the GC (else you run out of memory pretty quickly). What I did was to force a major collection before the run, to make sure that it didn't happen by chance inside the benchmark
<haypo>
antocuni: ah yes, that's different and more reliable :)
<haypo>
antocuni: gc.disable() can also behave differently :-)
<antocuni>
this works because this particular benchmark does not allocate much memory. If a benchmark allocates a lot of memory, it might cause a full major collection on its own. But running gc.collect() before ensures that it always starts from a "clean state"
<antocuni>
another way to see this is: in PyPy, in general, allocating memory costs a bit of time, but the cost is delayed and you see it only when the major collection occurs
<antocuni>
(the allocation itself is very quick; the "cost" is given by the fact that the more you allocate, the more often the GC runs)
<antocuni>
although all of this is very imprecise, of course. In particular, if an object dies quickly enough, it is collected in a minor collection and thus does not affect the speed of the next major collection
<antocuni>
but in general, it is correct enough to say that "the more you allocate, the more you spend later in the GC"
<arigato>
I think it's wrong to call gc.collect() and not put it in any time
<antocuni>
arigato: I agree. This was just to show that the spikes were caused by the GC, not by the JIT
<haypo>
arigato: sorry, what do you mean by "not put it in any time"
<kenaan_>
arigo default b13b7c8e4fe7 /rpython/translator/c/src/debug_print.c: Call fflush() after writing an end-of-section to the log file. Hopefully, this should remove the constant problem t...
<haypo>
so: gc enabled, no gc.collect(), ASLR enabled, ignore warmups, etc.
<haypo>
i'm also writing it for ronan, who wants to include warmups :)
<mattip>
arigato: ping (buildbot own test failures). There are some own test failures on default, linux 32/64 - stress tests
<arigato>
mattip: ouch
<arigato>
looks likely to be branch-prediction
<mattip>
likely, but painful to find
<mattip>
I just wanted to point it out before the last good version goes off buildbot reports
<mattip>
s/reports/summary/
<arigato>
ah, no
<arigato>
found it
<arigato>
yes, thanks
<mattip>
maybe progress - in gdb I found a live PyObject with refcount == REFCNT_FROM_PYPY
<mattip>
which AFAICT should not happen
<kenaan_>
arigo default f0ba81de1e4f /rpython/jit/backend/x86/codebuf.py: Fix for untranslated tests
* mattip
bye
<arigato>
mattip (logs): it happens if there is no ref from CPython, only from PyPy
<John>
hi all
<arigato>
hi
<John>
it seems that when reading from sys.stdin, even if i use os.read(sys.stdin.fileno(), 4) i cannot guarantee buffering is turned off
<John>
and that only 4 bytes will actually be read
<danchr>
odd; I get an exception saying "can't use a named cursor outside of transactions" when running my django app with psycopg2cffi
<danchr>
if I just delete the check that raises the exception, everything appears to work
<arigato>
John: os.read(_, 4) should never return more than 4 bytes, AFAIK
<John>
arigato: it only returns 4 bytes, but it seems to read more than 4 from the fd, and presumably stores them in a buffer somewhere
<John>
using "python -u" when running the program doesn't seem to turn buffering off either
<arigato>
os.read() is directly calling the OS read() function. if that fails, then that's strange
<John>
It's not failing, it's just buffering
<John>
i will make a demo :)
<arigato>
please do, I don't believe you :-) i.e. I think there is a different issue
<John>
hahah, most likely :)
<John>
So far, my demo has been unable to reproduce :P
<nimaje>
John: is stdin the terminal or something else?
<John>
The script is being run via "cat myfile | python myscript.py"
<arigato>
how do you know there is buffering or not, in this situation?
<arigato>
the pipe alone contains an OS-internal buffer, and cat also reads and writes in chunks
<John>
right, right - and that's probably OK
<John>
myscript will read the first 4 bytes of whatever it's getting via stdin, and then pipe the rest to a subprocess
<arigato>
ah
<John>
and what "the rest" is sees to be somewhat random. usually it's the 5th byte and on, but occasionally it's something else
<arigato>
do you know it's something else from later in the pipe, or could it also be from earlier, i.e. os.read(_, 4) in the parent returned less than 4 bytes?
<nimaje>
John: why don't you use 'tail -c +4'?
<nimaje>
s/+4/+5/
<John>
arigato: Good idea, but I don't think that's it as i'm checking that the four bytes are what they are supposed to be
<arigato>
ok
<John>
nimaje: i'm reading the first 4 bytes from stdin, and if it's a GZIP file, subprocess gzip, if it's a XYZ file, subprocess xyz, etc
<John>
So it's not about avoiding those four bytes, it's about reading them, then deciding what to subprocess :)
<nimaje>
ok, doesn't the subprocess need those bytes?
<John>
it does! :D That was the hardest bit about this code
<John>
the subprocess is: " { printf "abcd"; cat; } | gzip "
<John>
Which reliably gives gzip the four bytes 'abcd' before the rest of the stdin
<John>
But the stdin is not reliably the 5th byte onward
<John>
It's really really ugly, but without a way to read from stdin without consuming it, or even to see what's in python's buffers, it's really tricky to find a better way
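[A sketch of the sniff-then-dispatch pattern without the shell trick, pumping stdin into the child through a pipe; it glosses over short reads from os.read() and the cost of copying every byte through Python:]

    import os
    import subprocess
    import sys

    header = os.read(sys.stdin.fileno(), 4)  # at most the first 4 bytes

    if header.startswith(b'\x1f\x8b'):       # gzip magic number
        cmd = ['gzip', '-d']
    else:
        cmd = ['cat']

    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
    proc.stdin.write(header)                 # hand the sniffed bytes back
    while True:
        chunk = os.read(sys.stdin.fileno(), 65536)
        if not chunk:
            break
        proc.stdin.write(chunk)
    proc.stdin.close()
    proc.wait()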
<John>
I found the cause! And it's super weird xD
<John>
Somehow, making a call to subprocess is the problem
<nanonyme>
arigato, I'm personally a bit annoyed that the subprocess interface didn't use stdin=None, stdout=None, stderr=None to mean that they are all redirected to /dev/null
<nanonyme>
(but instead None means the default: inherit from the parent)
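[What nanonyme is asking for has to be spelled out explicitly; 'noisy-tool' is a placeholder command:]

    import subprocess

    # None inherits the parent's stdio; discarding output needs DEVNULL explicitly
    subprocess.check_call(['noisy-tool'],
                          stdin=subprocess.DEVNULL,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.DEVNULL)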
<nanonyme>
But yeah, no one said the subprocess module was exactly good
<kenaan_>
antocuni extradoc c812f32e4682 /talk/ep2017/the-joy-of-pypy-jit.txt: my ep2017 proposal
<antocuni>
arigato: ^^^ I submitted this proposal before I forget the deadline. If you have feedback or suggestions, please tell me :) (maybe tomorrow or by email because now I'm off)