cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
lritter has quit [Ping timeout: 240 seconds]
lritter has joined #pypy
rubdos has quit [Ping timeout: 272 seconds]
jcea has quit [Quit: jcea]
rubdos has joined #pypy
lritter has quit [Ping timeout: 260 seconds]
TheNewbie has joined #pypy
oberstet has joined #pypy
TheNewbie has quit [Quit: Leaving]
bender has joined #pypy
b has joined #pypy
b is now known as Guest52262
bender has quit [Remote host closed the connection]
Guest52262 has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
bender has joined #pypy
bender has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
bender has joined #pypy
bender has quit [Client Quit]
monkeyontheloose has joined #pypy
<monkeyontheloose> nobody here, srsly?
<mattip> monkeyontheloose: if you have a question, ask it
<monkeyontheloose> Q1: where can I get good benchmarks for nodejs vs pypy
<monkeyontheloose> Q2: how much more optimization do you guys think is left for PyPy, in terms of percent, a rough number
<monkeyontheloose> Hey @Mattip it's me Bender from the emails
<LarstiQ> A1: that's a rather broad question, and benchmarking is a fraught subject. Do you have a particular algorithm/use case in mind?
<monkeyontheloose> a general backend server use case
<monkeyontheloose> also different types of calculations would be interesting to see
dddddd has quit [Ping timeout: 256 seconds]
<LarstiQ> that's still quite broad; see e.g. this (archived) attempt at that: https://en.wikipedia.org/wiki/The_Computer_Language_Benchmarks_Game
<monkeyontheloose> I'll take whatever is available
<monkeyontheloose> hh
<mattip> if I recall correctly that benchmark discriminated against PyPy since the microbenchmarks did not run sufficiently long to allow the JIT to warm up
<mgorny> monkeyontheloose: 'general usecase' sounds like you should choose a language that you like (syntax, features, library)
<mgorny> Otherwise, you may end up getting a high f-word factor for negligible perf gain
<monkeyontheloose> mattip: what benchmark?
<mattip> the one linked to by LarstiQ
dddddd has joined #pypy
<monkeyontheloose> i don't see any results
<monkeyontheloose> it looks like just a bunch of test types without results
<monkeyontheloose> mgorny: yeah i know, but i'm interested in the speed
<mattip> if your interest is PyPy, you will not find it there. It was excluded from the results, which seem to have disappeared anyway
<monkeyontheloose> are there any other sources on the inet?
<mattip> comparing Node.js to PyPy? Not that I know of
<mattip> as for Q2: I'm not sure I understand. Can you link to an answer for any other JIT or optimization technology that answers that?
<monkeyontheloose> I don't have a post for reference
<monkeyontheloose> just wondering how much more optimization is there left
<LarstiQ> monkeyontheloose: the problem is, how do you quantify that?
<monkeyontheloose> run time
<monkeyontheloose> seconds
<monkeyontheloose> no?
<LarstiQ> you can maybe calculate a theoretical maximum of what you can achieve with current methods, but where the actual implementation will land is a matter of predicting the future
<LarstiQ> maybe look at comparable technology to get an idea of what could be done, but still
<LarstiQ> monkeyontheloose: it's easier to answer a more concrete question like "why does this code take X times as long as cpython"
<LarstiQ> then the answer could be "ah, there is room for improvement here if we do Y" or "unfortunately due to Z this is as good as it gets"
<monkeyontheloose> just wondering what would be a rough estimate
<monkeyontheloose> of how much more give there is
<monkeyontheloose> for pypy
<monkeyontheloose> something very rough
<LarstiQ> I don't know if anyone can guess at that, maybe Armin
<mattip> if you find a comparable statement from another compiler group, we could use that as a starting point
<mattip> how do you go about estimating what "give" even exists? STM? SIMD? using GPU?
<monkeyontheloose> i don't have a comparable statement
<monkeyontheloose> how can I find this Armin dude?
<monkeyontheloose> I'm not asking for like a scientific accurate answer
<monkeyontheloose> just generally wondering how much more value can be created
<monkeyontheloose> Thanks for the answers up till now BTW
<cfbolz> no compiler that I know of ever got "finished"
<cfbolz> gcc, llvm, v8, graal, they all have huge teams "still" working on them
jacob22 has quit [Read error: Connection reset by peer]
jacob22 has joined #pypy
tos9 has quit [Ping timeout: 260 seconds]
tos9_ has joined #pypy
<arigato> monkeyontheloose: I don't think there is much "general-purpose" improvement left in pypy. There are many specific cases where doing exactly this or that could be done better, sometimes dramatically, but there is no general "I know how to make it 2x faster for any benchmark" left
<arigato> or, maybe, I can think about how to get it 2x faster, but with a huge startup cost, for example, so that programs would only be 2x faster if they run for a long time. Given that drawback, it's unlikely to get implemented
<simpson> There *is* an interesting metric, but I don't know how to measure it on PyPy; it's the estimate of how much native work is required to emulate a single emulated unit of computation.
<fijal> what's a single emulated unit of computation?
<simpson> Folklore says that Dolphin is 3x slower, V8 is 2x slower, etc. and those are roughly because of the number of instructions used underneath.
<fijal> I think it's very hard to come up with a reasonable metric
<simpson> Right, that's part of the problem; it's not very well-defined.
<fijal> it's as good of a question as what's a "standard benchmark"
<fijal> or more precisely "A is X times faster than B" is a provably false statement
<arigato> given that in a few extreme examples pypy is faster than C++, it's unlikely that an "estimated unit of computation" really means anything
<simpson> Sure. There are other measures, too; for emulators which can host themselves, there's the ratio of how much slowdown each additional layer of emulation causes.
<fijal> so you can't say how much it causes because it depends on the operation
<fijal> in the edge cases it's 0
<fijal> but then what do you say about things like checking array bounds?
<fijal> sure C is faster because you don't do it, is it the same operation?
<simpson> Sure. The reason that it's interesting is because of the performance targets. Dolphin needs to spend no more than around 3 host clocks per emulated clock, or else gameplay suffers; this motivates their speed goals.
<fijal> I do think that the whole notion is a bit of bullshit
<fijal> especially with the modern CPUs where a lot of effects only take place when you run a lot of code
<fijal> *also* a lot of "speed of language" is culture
<fijal> you *can* write very fast programs in PyPy, but most things are not very fast because they are "pythonic"
<fijal> and that adjective is as vague as it gets
<simpson> "simple", "readable", etc. I think people are trading off complexity, using easier algorithms which take longer to run.
<fijal> sure but all of that makes your desired goal even harder to achieve
<simpson> Yes. There's no free lunch in optimization.
<fijal> that's such a copout
<fijal> of course there are no proper solutions and we are only left with heuristics
<fijal> but you are chasing something that can't even be properly defined and every single heuristic I saw is largely meaningless
<simpson> I'm not trying to be glib; I just mean that if somebody writes some code without caring about performance, then a JIT won't necessarily be able to transform it into something fast. Compilers aren't magic.
<fijal> in my experience trying to profile large programs, it's something like 5 out of 10 where there even *is* a microbenchmark that can be devised to make things better
<fijal> the other 50% is "large scale effects that don't really show up on small benchmarks"
<fijal> or "culture of not caring about performance" even
<simpson> Yeah. What I wonder about is how languages can offer better builtins, which inspire a performance-oriented culture.
<monkeyontheloose> so no big optimization left in pypy?
<simpson> monkeyontheloose: I could imagine an infinite series of languages which a compiler could use to compile Python into ever-more-optimized forms. It would work by "superoptimization", which is a fancy word for finding *the best* way to do some small action. There are infinitely many superoptimizations possible, and each one should make PyPy faster.
<arigato> a more practical answer is "no"
<simpson> But, like, the ahead-of-time costs start to grow quite a bit! So suddenly there's a tradeoff: How much time can be spent on startup/compilation compared to time that needs to be spent running?
<simpson> (What arigato said.)
<simpson> monkeyontheloose: Would you be interested in contributing to PyPy?
<monkeyontheloose> i mean some optimization that you switch to and that in most cases gives you a 50% boost
<monkeyontheloose> or something
<simpson> What are you working on, specifically?
<simpson> There may be things that can be done for your current code base which speed it up. Like fijal says, a lot of performance is about the culture of your code and maintainers.
<monkeyontheloose> no no
<monkeyontheloose> i thought about creating a better version of pypy
<monkeyontheloose> i was checking out why people used node js vs python
<monkeyontheloose> most sites say speed
<monkeyontheloose> then some smarter people told me v8 is faster cuz it has better funding
<monkeyontheloose> 1] i'm not sure v8 is faster than pypy
<monkeyontheloose> 2] you're saying there isn't a hell of a lot more to squeeze out of it
<fijal> simpson: I have a few ideas, yes
<simpson> Aha. Okay, this is understandable. On [0], I think you're right; it *is* to do with funding. I wish that there were more money put into PyPy, because I'm confident that it's one of the limiting things.
<simpson> On [2], there's always more to squeeze. But it gets harder and harder; each tiny percent of savings costs more and more effort to gain. On [1], yeah, it just doesn't make sense to say "X is faster than Y" unless you're saying "X is faster than Y at specific task Z".
<monkeyontheloose> so i have ideas about how to get funding
<monkeyontheloose> but im not sure there is enough to squeeze out of pypy
<monkeyontheloose> as you said
<fijal> what's "enough"? what do you want?
<simpson> Oh, I can easily think of things to work on. fijal can, too, and I bet their ideas are actually good ones, unlike mine.
<fijal> simpson: I mean more "what in a language inspires a culture of being slow"
<simpson> fijal: Oh, yes, I would love to understand more of that. The only inkling I have so far is static vs dynamic; static features might help compilers, but dynamic features almost always are slowdowns.
<fijal> I don't think dynamic features are a massive slowdown in case of pypy
<fijal> so one way is that doing expensive things should be marked as such
<fijal> like, the java reflections
<fijal> vs first-class features like in python
<monkeyontheloose> "enough" is a speedup that would easly convince dev to beg their boss to pay for the speedy licensed version
<monkeyontheloose> for example if it runs 2x faster
<monkeyontheloose> easy win
<tos9_> I want a mode of pypy that disallows slow things
<tos9_> (no one should care what I want. But yeah I think that'd maybe be nice)
<fijal> monkeyontheloose: that to me seems like a marketing problem not a technical one
<fijal> pypy does run 2x faster than cpython, faster than that usually
<monkeyontheloose> yeah i know
<monkeyontheloose> i'm guessing big production systems run pypy already, no?
<monkeyontheloose> instead of cpython i mean
<monkeyontheloose> why do you think this is a marketing problem?
<fijal> yeah
<fijal> because you are trying to convince people?
<fijal> technically pypy is faster than cpython
<fijal> a lot of people don't use it, while at the same time complaining about how slow python is
<simpson> Agreed. Every time I've gone to use PyPy in production, the obstacles have been people. PyPy itself runs great.
<fijal> monkeyontheloose: software engineering is far more of a human interaction study than actually working with computers
<fijal> turns out
<fijal> simpson: so one thing that stands out for example is how easy it is to use dictionaries
<fijal> it's easy to have a dictionary that has three keys "foo" "bar" and "baz". There is always a faster way to do it like [FOO, BAR, BAZ] = range(3) etc.
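(A minimal sketch of the pattern fijal describes, in plain Python; the names and values are illustrative, not taken from any real code base:)

    # "pythonic" version: a small dict with a handful of fixed string keys
    record = {"foo": 1.0, "bar": 2.0, "baz": 3.0}
    total = record["foo"] + record["bar"] + record["baz"]

    # faster alternative: fixed integer indices into a plain list
    FOO, BAR, BAZ = range(3)
    record = [1.0, 2.0, 3.0]
    total = record[FOO] + record[BAR] + record[BAZ]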
<fijal> I think another very prevalent problem is that cpython has rather insane performance characteristics
<simpson> Yes.
<fijal> and people either do know it or repeat unproven folklore. Neither of those applies to pypy
<fijal> so usually if you try to optimize for cpython, you end up making things worse for pypy (and sometimes for cpython too)
<monkeyontheloose> Q: most big production systems, do they run pypy or cpython?
<monkeyontheloose> (the ones that run Python)
<simpson> Depends on how you measure "big" and "most". The biggest one I know of is Youtube, which runs Google's own patched CPython; that system is not really portable Python code and is not the kind of thing that you're probably going to build.
<monkeyontheloose> i mean big as in hundreds of thousands of requests
<simpson> Most Python installations are CPython, period. I've personally seen PyPy Web servers in production, but it's not at all common.
<monkeyontheloose> interesting
<monkeyontheloose> do you think devs don't even bother switching?
<simpson> The typical Python user doesn't know that PyPy exists. Often they might only know IPython or Jupyter or vPython.
<monkeyontheloose> interesting
<monkeyontheloose> could it be that it's hard to switch because of lack of support?
tos9_ has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
tos9 has joined #pypy
<simpson> It's network effects and inertia. There's also a frustrating cultural gap; in particular the meme that it's okay to write Python modules in C or other non-Python languages for speed is a barrier to getting pure-Python packages.
<fijal> monkeyontheloose: C integration is a large thorny issue
<fijal> but mostly you don't need performance until you do
<fijal> and then it's usually too late to change your platform because you hacked a lot
<fijal> remember we are talking silicon valley, engineering time is super super expensive
<fijal> so in order for something to matter it has to be really massive
<fijal> so inertia, lack of knowledge, lack of "this is the way to do it"
<fijal> also, lack of appeal of changing platform and tuning knobs - rewriting in rust seems more appealing
<monkeyontheloose> but with pypy you don't need to rewrite
<monkeyontheloose> unless you have cmodules
<monkeyontheloose> it should be a very easy switch
<simpson> Yep. Unfortunately, that "unless" is extremely common. Numpy was a problem for a long time. PIL is still a problem.
<monkeyontheloose> PIL?
<antocuni> any non toy system which is experiencing performance problems has some module written in C and/or Cython. Moreover, they are likely pip-installing a number of packages for which there are binary wheels: as soon as you switch to pypy pip needs to compile them from source, which means having gcc + all the required libraries + their -dev version: depending on how your build system is done, this might not be trivial, and for sure
<antocuni> it considerably increases compilation time
<antocuni> so, most of the time even just trying pypy is not an "easy switch"
<antocuni> on top of that, your badly-engineered system might use e.g. gc.disable() here and there, which is going to cause problems on pypy
<simpson> Python Imaging Library. You might know it as Pillow.
<antocuni> or maybe it's not closing a file when it should, so you are running short of file descriptors
<antocuni> unfortunately, even if the theory says that it's easy to switch to pypy, in practice it requires time. And then you discover that your system is 20% slower than on CPython
<antocuni> (maybe it's 20% slower because you are using a cython "speedup" somewhere, and if you switch back to a pure-python version of the code you are much faster on pypy; but again, it takes time to understand/try it)
<simpson> Right, more work is often required. Memorably, I had one time where I switched from Numpy on CPython to array.array on PyPy; the *combination* of changes resulted in like a 60x speedup on my numerical core, but it had to be both changes.
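(A rough illustration of the kind of change simpson mentions; the buffer size and the loop body are assumptions, not the actual numerical core he rewrote:)

    import array

    N = 1000000

    # NumPy-on-CPython style: each element access crosses into numpy's C code,
    # which is cheap per call but adds up in a tight Python loop
    # import numpy as np
    # data = np.zeros(N)
    # for i in range(N):
    #     data[i] = i * 0.5

    # array.array-on-PyPy style: a flat typed buffer driven by a plain Python
    # loop, which PyPy's JIT can compile down to tight machine code
    data = array.array('d', [0.0]) * N
    for i in range(N):
        data[i] = i * 0.5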
<antocuni> yes, exactly
<antocuni> hopefully, hpy will help in this regard: the goal is that if you are using a C/HPy extension, it will at least not be slower than on CPython
<antocuni> so you can try pypy, see e.g. a 10% speedup, and incrementally improve things
<mgorny> mattip: it seems that test_thread.py hangs now. any suggestion how to deal with it?
<mattip> mgorny: can you run the test as pypy lib-python/2.7/test/test_thread.py and figure out which test it is?
<mattip> then look at the diff to the default branch, maybe something will jump out at you
<mgorny> nevermind, figured out how to fix it ;-)
<mgorny> i've missed the tiny difference in pypy version
<mgorny> going to add comment for it now
monkeyontheloose has quit [Ping timeout: 258 seconds]
<mgorny> mattip: ok, i've pushed a few fixes to my branch and now i'm going to fire buildbot
<mgorny> maybe just one for a start
<mgorny> there's a dozen new test failures
<bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/7228 [mgorny: try new branch again, stdlib-2.7.18-3]
<bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/7228 [mgorny: try new branch again, stdlib-2.7.18-3]
<mgorny> mattip: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/7228/steps/shell_7/logs/stdio do you happen to know if that is my fault somehow or existing issue?
<ronan> i.e. it's not your fault
<mgorny> ok
<mgorny> ronan: do you happen to know if i can test my changes to pypy's modules against the stdlib tests without retranslating everything?
<ronan> mgorny: if you're only touching stdlib or lib_pypy modules, you don't need to retranslate. Just update the nightly distrib with your changes.
<mgorny> pypy/module/time/interp_time.py is what i'm touching, that's the problem
<ronan> ah, then no. You need to translate to see the changes
<ronan> though if you're working at interp-level, you should try to use untranslated tests
idnar has quit [Ping timeout: 240 seconds]
fijal has quit [Ping timeout: 256 seconds]
stillinbeta has quit [Ping timeout: 260 seconds]
avakdh has quit [Ping timeout: 260 seconds]
pulkit25 has quit [Ping timeout: 260 seconds]
ctismer has quit [Ping timeout: 246 seconds]
krono has quit [Ping timeout: 260 seconds]
string has quit [Ping timeout: 260 seconds]
samth has quit [Ping timeout: 240 seconds]
Olorin_ has quit [Ping timeout: 246 seconds]
altendky has quit [Ping timeout: 260 seconds]
Alex_Gaynor has quit [Ping timeout: 240 seconds]
phlebas has quit [Ping timeout: 246 seconds]
fangerer___ has quit [Ping timeout: 272 seconds]
wallet42__ has quit [Ping timeout: 272 seconds]
jeroud has quit [Ping timeout: 272 seconds]
agronholm has quit [Ping timeout: 272 seconds]
stillinbeta has joined #pypy
idnar has joined #pypy
fijal has joined #pypy
agronholm has joined #pypy
altendky has joined #pypy
avakdh has joined #pypy
string has joined #pypy
wallet42__ has joined #pypy
ctismer has joined #pypy
Alex_Gaynor has joined #pypy
krono has joined #pypy
Olorin_ has joined #pypy
phlebas has joined #pypy
fangerer___ has joined #pypy
pulkit25 has joined #pypy
jeroud has joined #pypy
samth has joined #pypy
<mattip> mgorny: so it seems the "failure" in test_time is that we check earlier and raise a ValueError,
<mattip> we do not claim 1:1 exception compatibility with CPython, so in this case I think it would be OK to change the test
<mattip> to catch both OverflowError and ValueError
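(A sketch of what such a test tweak could look like; the test name and the time.localtime call with 1e200 are illustrative placeholders, not the actual code in lib-python/2.7/test/test_time.py:)

    import time
    import unittest

    class TimeExceptionCompatTest(unittest.TestCase):
        def test_out_of_range_timestamp(self):
            # The stdlib test expects OverflowError here; PyPy checks the value
            # earlier and raises ValueError, and 1:1 exception compatibility is
            # not promised, so the test accepts either exception.
            self.assertRaises((OverflowError, ValueError),
                              time.localtime, 1e200)

    if __name__ == '__main__':
        unittest.main()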
jacob22 has quit [Read error: Connection reset by peer]
jacob22 has joined #pypy
<mgorny> mattip: i've attempted to change that
<mgorny> i'm at C build already, so let's see if my patch helps
<mattip> cool
monkeyontheloose has joined #pypy
<mgorny> hmm, it didn't work, so let's change the test
Smigwell has joined #pypy
jcea has joined #pypy
jcea has quit [Ping timeout: 260 seconds]
jcea has joined #pypy
ctismer has quit [Ping timeout: 240 seconds]
agronholm has quit [Ping timeout: 260 seconds]
pulkit25 has quit [Ping timeout: 244 seconds]
jeroud has quit [Ping timeout: 244 seconds]
Alex_Gaynor has quit [Ping timeout: 240 seconds]
altendky has quit [Ping timeout: 240 seconds]
fijal has quit [Ping timeout: 240 seconds]
EWDurbin has quit [Ping timeout: 240 seconds]
DRMacIver has quit [Ping timeout: 260 seconds]
idnar has quit [Ping timeout: 260 seconds]
fangerer___ has quit [Ping timeout: 246 seconds]
krono has quit [Ping timeout: 272 seconds]
graingert has quit [Ping timeout: 272 seconds]
stillinbeta has quit [Ping timeout: 240 seconds]
wallet42__ has quit [Ping timeout: 260 seconds]
ctismer has joined #pypy
string has quit [Ping timeout: 272 seconds]
avakdh has quit [Ping timeout: 260 seconds]
michelp has quit [Ping timeout: 272 seconds]
fangerer___ has joined #pypy
Alex_Gaynor has joined #pypy
krono has joined #pypy
michelp has joined #pypy
wallet42__ has joined #pypy
altendky has joined #pypy
EWDurbin has joined #pypy
idnar has joined #pypy
stillinbeta has joined #pypy
string has joined #pypy
fijal has joined #pypy
avakdh has joined #pypy
pulkit25 has joined #pypy
DRMacIver has joined #pypy
agronholm has joined #pypy
graingert has joined #pypy
jeroud has joined #pypy
monkeyontheloose has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
Taggnostr2 has joined #pypy
Taggnostr2 has quit [Client Quit]
oberstet has quit [Remote host closed the connection]