#pypy on 2020-10-13 — irc logs at freenode.irclog.whitequark.org

2019-08-29 19:33 cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin

02:40 Orimendix has joined #pypy

03:03 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-win-x86-32/builds/5587 [py3.6]

03:13 jacob22 has quit [Read error: Connection reset by peer]

03:17 jacob22 has joined #pypy

03:43 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-macosx-x86-64/builds/5263 [py3.6]

04:00 <bbot2> Failure: http://buildbot.pypy.org/builders/own-linux-aarch64/builds/649 [py3.6]

05:00 <bbot2> Started: http://buildbot.pypy.org/builders/jit-benchmark-linux-x86-64/builds/3051 [default]

05:44 redj has quit [Ping timeout: 272 seconds]

06:36 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-linux-aarch64/builds/697 [py3.6]

07:08 <mattip> any tk/cffi experts want to take a look why idle is failing tests?

07:08 <mattip> http://buildbot.pypy.org/summary/longrepr?testname=unmodified&builder=pypy-c-jit-linux-x86-64&build=7262&mod=lib-python.3.test.test_idle

07:12 <bbot2> Success: http://buildbot.pypy.org/builders/jit-benchmark-linux-x86-64/builds/3051 [default]

07:14 Orimendix has quit [Quit: Leaving]

07:27 oberstet has joined #pypy

08:31 dmalcolm_ has joined #pypy

08:33 dmalcolm has quit [Ping timeout: 260 seconds]

09:23 vstinner has joined #pypy

09:24 <vstinner> hello. i'm working on adding a "geometric mean" value to the compare_to command of the pyperf project: https://github.com/psf/pyperf/pull/79

09:24 <vstinner> i'm not sure that i'm computing the geometric mean of the right values

09:25 <vstinner> i'm computing the means of "speeds": ratios (benchmark mean) / (reference benchmark mean)

09:25 <vstinner> a benchmark suite is made of multiple benchmarks, and i would like to get a single number to easily compare two suites

09:26 <vstinner> in my PR, i consider that a geometric mean > 1.0 means "faster" whereas speed.pypy.org says that a geometric mean of 0.24 means "faster"

09:42 <fijal> vstinner: what's the distribution you assume?

09:49 <vstinner> fijal: i don't understand your question, sorry. distribution of what?

09:59 <cfbolz> vstinner: small numbers are better, no?

09:59 <cfbolz> 0.5 means you are 2x faster

10:04 <pmp-p> if timing a finite number of instructions i'd go +1 wiht cfbolz

10:05 <pmp-p> if 0.5 means "half the time" not twice computational amount in a given time

10:08 <fijal> vstinner: distribution of consecutive runs

10:08 <fijal> if you are doing a geometric mean, you are assuming something about the probability distribution, no?

10:16 <vstinner> cfbolz: maybe i'm doing it backwards :)

10:17 <vstinner> i'm computing: (reference benchmark mean) / (benchmark mean)

10:17 <vstinner> oh, in the https://www.cse.unsw.edu.au/~cs9242/11/papers/Fleming_Wallace_86.pdf paper, it's the invert: (benchmark) / (reference benchmark)

10:20 <cfbolz> vstinner: above you wrote " ratios (benchmark mean) / (reference benchmark mean)"

10:20 rfgpfeiffer has joined #pypy

10:23 oberstet_ has joined #pypy

10:26 oberstet has quit [Ping timeout: 265 seconds]

10:26 <vstinner> cfbolz: it will save your time if you consider that i have no idea of what i am doing :-D

10:27 <cfbolz> :-)

10:34 rfgpfeiffer has quit [Ping timeout: 240 seconds]

10:44 <vstinner> ok ok, i fixed my PR so now geo mean < 1.0 means faster and geo mean > 1.0 means slower, as on speed.pypyp.org

11:02 <Hodgestar> Using the geometric mean to aggregate benchmark results was deprecated in 1988, but it is still popular. Possibly because no one could agree on what weights to give the individual components. ;)

11:03 <vstinner> Hodgestar: depreacated ok, but replaced with that?

11:04 <vstinner> Hodgestar: i'm trying to give a single value to summarize N benchmarks of a benchmark suite, when comparing two benchmark suites results

11:10 <Hodgestar> vstinner: The geometric mean is an odd way to combine run times together, right? It multiplies them (X1 * X2 * ...) where a real program would add the run times of the different things it does (X1 + X2 + ...). But even if the individual benchmarks are somehow representative, other programs will do different amounts of those sorts of work, so one would ideally want to add weights (w1 X1 + w2 X2 + ...) but the weights would be

11:10 <Hodgestar> different for each use case.

11:10 <Hodgestar> vstinner: Sorry that isn't a suggestion or a criticism -- I am just thinking about the problem out loud.

11:11 <vstinner> Hodgestar: it's not absolute timings in seconds, but normalized values

11:16 <Hodgestar> vstinner: I'm aware. The normalizing is part of the issue. E.g. If we set PY36 to X1 = 1 and PY37 to X1 = 2, that sweeps under the rug the issue of what fraction of time do programs actually spend doing X1.

11:17 <vstinner> Hodgestar: currently, people throw 60 lines of benchmark results: some are faster, some are slower. honestly, even if i'm used to benchmarking, i have no idea if overall if it means that the change makes Python faster or slower

11:18 <vstinner> Hodgestar: i expect that the geometric mean will help me to take a decision

11:18 <vstinner> i don't know what is the geometric mean when 10 benchmarks are 1.01x slower but 1 benchmark is 2.0x faster. overall, is it a good thing or not? :)

11:19 <mattip> weights would take into account how common the faster action actually is in real life

11:19 <mattip> but there is no "real life" for python

11:19 <mattip> so just weighting everything equally is as good as any other metric

11:20 <mattip> unless you have some heuristic to say benchmark A is ten times as important as benchmark B

11:20 <vstinner> mattip: i put a weight of 0 in pyperformance microbenchmarks that I consider as non relevant/useless: i simply removed them :-D

11:20 <Hodgestar> vstinner: Lol. Nice. :)

11:22 <mattip> if you look at speed.pypy.org and what you do all day is readthedocs building sphinx documentation, then PyPy is not your tool

11:22 <mattip> but if you do templating then definitely, PyPy is fantastic

11:22 <Hodgestar> vstinner, mattip: Maybe people could be allowed to specify their own weights, or there could be a few different weightings that are meant to represent common scenarios (but that sounds like a lot of work and complication for uncertain gains).

11:23 <vstinner> mattip: i don't want to have to both with weights

11:23 <vstinner> Hodgestar: i wrote pyperf for people running a benchmark in 5 min and then tweet the result. for people who have no idea of what they are doing

11:24 <vstinner> that's why pyperf writes explicitly "faster" and "slower". previously, people (including me) read a benchmark result backwards :)

11:24 <vstinner> ah, about the case 10 benchmarks slower (1.01x slower) and 1 benchmark faster (2.0x faster), I got my reply: geometric mean that overall, it's faster :)

11:24 <vstinner> >>> statistics.geometric_mean([1.01]*10+[1/2.0])

11:24 <vstinner> 0.9474627803687371

11:25 <vstinner> i don't expect people to only rely on the geometric mean, it would be an _additional_ information

11:26 <vstinner> i'm writing on the geometric mean when i saw this file: https://bugs.python.org/file49511/bench_results.txt long list of "349 ns +- 710 ns" values, it's hard to read

11:26 <vstinner> by the way, the std dev is very large! 710 ns for a mean of 349 ns! i asked the author if there is something wrong with Python or the benchmark

11:27 <vstinner> context: https://bugs.python.org/issue41972 bytes.find() is inefficient for a specific pattern, it's about fine tuning the Bloom Filter

11:29 <mattip> so in most cases all you want is some relative measure of "did this make things better or worse"

11:29 <mattip> and if the answer is "both" then

11:30 <vstinner> mattip: lol

11:30 <vstinner> "did this make things better or worse" => "yes" :-D

11:30 <mattip> the change probably has some heuristic that is tuned, so provide a lever for people to tune it

11:31 <mattip> e.g. gcc flags for all kinds of optimizations and projects that explore the optimization space and choose the best ones

11:33 jacob22 has quit [Read error: Connection reset by peer]

11:34 <mattip> measuring something in ns sounds fishy to me, the whole benchmark is probably testing things like cpu caches and opcode pipelining

11:36 <mattip> "measureing something high-level like bytes.find()"

11:36 <vstinner> for me the right part is that in the same process, the benchmark produces very different values:

11:36 <vstinner> - value 1: 9.72 us (+138%)

11:36 <vstinner> - value 2: 364 ns (-91%)

11:36 <vstinner> - value 3: 2.16 us (-47%)

11:36 <vstinner> sorry, the *strange* part

11:37 jacob22 has joined #pypy

11:37 <vstinner> mattip: i tried but failed to suggest to people to stop bothering about nanoseconds

11:37 <vstinner> mattip: but at least, i tried to make such benchmark a little bit more reliable :-p

11:40 <vstinner> not everybody on earth is connected to #pypy, most people run nonsense benchmarks :-D

11:40 <mattip> benchmarks and reliable in the same sentence! cfbolz has a paper for you

11:40 <mattip> https://arxiv.org/abs/1602.00602v1

11:42 <mattip> numpy uses asv and has a ~20 minute benchmark suite.

11:42 <mattip> Every time I try to run it, I get wildly different results

11:43 <vstinner> mattip: haha, i read it

11:43 <vstinner> i hate this paper

11:43 <mattip> the paper or the idea that benchmarking is unreliable?

12:03 ctismer_ has joined #pypy

12:04 ctismer has quit [Ping timeout: 256 seconds]

12:04 ctismer_ is now known as ctismer

12:19 <Dejan> so instead of benchmarking we just say "benchmarking is unreliable" and we give up :)

12:20 <Dejan> i agree with the statement ofc

13:27 <mattip> I see two uses for benchmarking

13:27 <mattip> short term a/b testing for a comparing two algorithms in a systems test

13:27 <mattip> and

13:28 <mattip> long-term stability testing on a set of benchmarks on a fixed machine (like speed.pypy.org) where you can collect statistics over time and try to find regressions/improvements

13:33 <vstinner> mattip: i hate the truth that it's not possible to benchmark anything :-D it's not possible to get reliable and reproducible benchmark results

13:34 <simpson> Worse (and ironically), benchmarking is possible on older hardware designs, but we long since have stopped using that hardware because it's relatively slow.

13:38 <vstinner> simpson: are you thinking at HyperThreading, Turbo Boost and things like that? both can be disabled (more or less easily)

13:52 lritter has joined #pypy

13:53 <simpson> vstinner: I'm thinking further back than that, to the switch from in-order to out-of-order execution, and the switch from constant-access RAM to caches.

13:55 rfgpfeiffer has joined #pypy

14:57 <gsnedders> (though the value of disabling things like SMT and CPU freq scaling is debatable, given then you're testing in a configuration nobody is actually running in prod)

15:00 <simpson> Right. People shop for software like they shop for plumbing or screws in a home-improvement store; they expect hard qualitative numbers which aren't just internally/relatively correct, but which give them some objective hint as to whether it'll perform well enough for their needs.

15:06 <gsnedders> CPU freq scaling especially is significant, though, given so much performance nowadays is gated behind it, especially in multi-threaded/process situations

15:06 <gsnedders> but yeah, it's useful to give _some_ indication, but it can be totally misleading

15:52 <fijal> simpson: we always struggled with "how many cores are actually running the program"

15:52 <fijal> because depending on that, your settings should likely be quite different

15:53 <simpson> fijal: Yeah! And before that, it was "how much L2 do you have?" and etc.

15:54 <fijal> that we read

16:17 Orimendix has joined #pypy

16:26 dustinm- has quit [Ping timeout: 264 seconds]

16:30 dustinm has joined #pypy

16:38 jcea has joined #pypy

17:16 rfgpfeiffer has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

17:28 camelCaser has joined #pypy

17:50 jacob22 has quit [Read error: Connection reset by peer]

17:53 jacob22 has joined #pypy

18:19 oberstet_ has quit [Remote host closed the connection]

19:34 jcea has quit [Ping timeout: 260 seconds]

20:39 lritter has quit [Quit: Leaving]

21:30 <mattip> somehow calling pypy -mpytest extra_tests/cffi/embedding/test_basic.py in a virtualenv on win32 is broken

21:30 <mattip> all the tests print "function _add1_cffi.add1() called, but initialization code failed. Returning 0.

21:31 <mattip> (where _add1_cffi.add1 is changed in each test)

21:50 jcea has joined #pypy

21:53 <nulano> I'm having trouble getting it to find mt.exe

21:53 <nulano> but I can reproduce it by hardcoding in the path

21:53 <nulano> before that it prints 'import site' failed

21:53 <nulano> and pypy_setup_home() failed

21:54 <nulano> (note that I did not download the latest build so it might be slightly different)

21:57 commandoline has quit [Quit: Bye!]

21:58 <nulano> on win64 the behaviour is slightly different

21:58 <nulano> outside a virtualenv it passes

21:58 <nulano> in a virtualenv, there is a compile error

21:58 <nulano> looking at the code, I think there is a missing #ifdef WITH_THREAD

21:59 <nulano> suggesting that WITH_THREAD is not set if running in a virtualenv, but is set otherwise

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/own-win-x86-32/builds/2485 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-32/builds/6058 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/own-linux-s390x/builds/1596 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-s390x/builds/1417 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-macosx-x86-64/builds/5264 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/7263 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-win-x86-32/builds/5588 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/8385 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/own-linux-aarch64/builds/650 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-32/builds/7270 [default]

22:00 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-aarch64/builds/698 [default]

22:00 jacob22 has quit [Read error: Connection reset by peer]

22:04 jacob22 has joined #pypy

22:08 commandoline has joined #pypy

22:22 <mattip> nulano: that sounds like before the changes that fixed virualenv for windows

22:22 * mattip zzz

22:27 <bbot2> Success: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/8385 [default]

22:28 <bbot2> Success: http://buildbot.pypy.org/builders/own-linux-x86-32/builds/7270 [default]

22:33 Orimendix has quit [Ping timeout: 240 seconds]

22:33 <nulano> mattip, no, I get the same output with latest nightly

22:36 <nulano> mattip, see stderr output at https://bpa.st/YAYQ

22:36 <bbot2> Failure: http://buildbot.pypy.org/builders/own-win-x86-32/builds/2485 [default]

22:38 Orimendix has joined #pypy

22:53 <bbot2> Failure: http://buildbot.pypy.org/builders/own-linux-aarch64/builds/650 [default]

23:06 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/7263 [default]

23:07 <bbot2> Success: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-32/builds/6058 [default]

23:13 <nulano> win64 is missing some includes in Python.h, adding those lets the tests pass

23:13 <nulano> not sure what's going on with win32

23:22 holdsworth has quit [Quit: No Ping reply in 180 seconds.]

23:22 holdsworth_ has joined #pypy

23:23 dstufft has quit [Ping timeout: 260 seconds]

23:24 pjenvey has quit [Ping timeout: 260 seconds]

23:25 pjenvey has joined #pypy

23:27 dstufft has joined #pypy

23:35 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-macosx-x86-64/builds/5264 [default]

23:52 cjmcdonald has quit [Ping timeout: 260 seconds]

23:55 <bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-linux-s390x/builds/1417 [default]

23:56 <bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-win-x86-64/builds/25 [nulano: test branch, win64]