PyPy, the flexible snake (IRC logs: ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
<fijal> I mean more specifically "cython" the program is slow on pypy
<fijal> like cpython cython foo.pyx is faster than pypy cython foo.pyx
<fijal> I don't think there are any surprises tbh
<cfbolz> fijal: that's the usual "tree walking/interpreting/etc is slow"
<simpson> I wonder if that meme needs updating. The tree walker in Typhon (Monte in RPython) is not a very fast interpreter, but it's overall faster than bytecode which was originally designed for Smalltalk and then copied to Self to Java to E to Monte. Maybe "tree walking can't be fast" is better.
* simpson should write new bytecode for Typhon
<cfbolz> no no
<cfbolz> I mean if you write interpreters in app level python code
<cfbolz> the jit is not so good
<simpson> Aha! I see.
<mattip> cfbolz: if you could start over with cython/sphinx, what would a better design be?
<cfbolz> mattip: a bit unclear, I think somehow it's a limitation of pypy
<cfbolz> and I don't know how to remove it either
<cfbolz> (or a limit of tracing, really)
<cfbolz> open research question
<mattip> what is that research called? I would like to learn a bit
<cfbolz> mattip: nobody is working on this
<mattip> is it fair to say that in other fields they would use flex/bison rather than writing a parser in python?
<cfbolz> mattip: it's not really about the parser per se
<cfbolz> it's more that the highly variable control flow that these kinds of systems exhibit is badly supported by tracing
<cfbolz> (a related problem pops up in various contexts, eg CPUs have a hard time with branch prediction for these systems)
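A toy illustration of the "highly variable control flow" point, as a minimal sketch: this is not code from Cython or Sphinx, and the node classes and the `evaluate` function are hypothetical names invented for the example. In a tree walker written in app-level Python, the branch taken inside `evaluate` depends on the shape of the input tree rather than on the loop itself, so a tracing JIT rarely sees the same path through it twice.

```python
# Hypothetical mini-AST; none of these names come from Cython or Sphinx.
class Num:
    def __init__(self, value):
        self.value = value


class Neg:
    def __init__(self, operand):
        self.operand = operand


class Add:
    def __init__(self, left, right):
        self.left, self.right = left, right


class Mul:
    def __init__(self, left, right):
        self.left, self.right = left, right


def evaluate(node):
    # Which branch runs is decided by the data (the node type), and the
    # recursion depth is decided by the tree, so the control flow differs
    # from one call to the next.
    if isinstance(node, Num):
        return node.value
    if isinstance(node, Neg):
        return -evaluate(node.operand)
    if isinstance(node, Add):
        return evaluate(node.left) + evaluate(node.right)
    if isinstance(node, Mul):
        return evaluate(node.left) * evaluate(node.right)
    raise TypeError("unknown node type: %r" % (node,))


# 2 + 3 * (-4)
tree = Add(Num(2), Mul(Num(3), Neg(Num(4))))
result = evaluate(tree)
print(result)
```

Real compilers and document processors have many more node types and visitor passes than this, which is where the number of distinct paths grows.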
<mattip> is there a different strategy that would yield better results than cpython?
<cfbolz> mattip: as I said, I don't think anybody knows. method-based JIT compilers don't do as terribly
<cfbolz> but they also have problems with heavy polymorphism
<simpson> mattip: is a good paper on the topic, IIRC.
<cfbolz> weeeeeell
<cfbolz> that's rather obscure
<cfbolz> it's not really a practical way to teach the JIT how to remove the overhead of tree-y code
<antocuni> another thing to consider is that people are used to writing code which works well on method-based compilers (either JIT or non JIT), but less used to writing code optimized for tracing compilers
<cfbolz> yes, sure
<antocuni> e.g. there are constructs and abstractions which are basically free on tracing compilers but add a cost on classical compilers
<cfbolz> but it's not like there are ways to write a parser to solve this problem
<antocuni> what has parsing to do with it?
<cfbolz> antocuni: show me a parser that's tracing friendly ;-)
<antocuni> ah ok, in this sense
<antocuni> yes I agree
<cfbolz> anyway, it's not a solution
<cfbolz> we can't rewrite the world to be parsing friendly
<antocuni> sure
<cfbolz> s/parsing/tracing
* cfbolz needs more coffee
<mattip> maybe we should do the opposite of numba: instead of having a @jitme decorator, have a @do_not_bother decorator
<cfbolz> heh
<antocuni> from some point of view, it's impressive that pypy gets good results at all on existing code 😅. Maybe it's just because python is soooo slow
<antocuni> s/python/cpython
<simpson> Doesn't Truffle have a way of annotating "megamorphic" sites where the polymorphism is known to explode?
<cfbolz> simpson: on the interpreter level? or app level?
<simpson> Hm, might be interpreter level.
<simpson> I don't really remember what they did specially, just that they had the nonce word in their presentation.
<mattip> for the specific case of cython, they have already identified the "hot" sections of code: these are the ones they compile to c-extensions
<cfbolz> mattip: that's not what fijal said though. He said that running cython as an executable is slow, not the generated c code
<cfbolz> That might be slow too
<mattip> the executable that converts pyx -> c is not pure python. There are parts of it that are c-extensions
<cfbolz> Ah, they rewrote part of the compiler with cython code?
<cfbolz> Cool, sorry, I didn't know
<cfbolz> Then maybe it becomes faster if we don't use the pyx accelerators on PyPy? 😅
<cfbolz> antocuni: we still don't do anything about promotions that just produce more and more cases, right?
<mattip> that would be the first step, but then we would be back to the sphinx story, where we are now "only" 2x slower than pure-python
<cfbolz> mattip: yes
<mattip> they do not expend a lot of effort to do the acceleration, see the Visitors files at the end of this list
<mattip> they just add a pxd file, and compile it
<fijal> antocuni: I did not make any progress with sphinx for example
<fijal> I'm not sure how to write such code in a friendly manner, other than trying to change interpreter to a compiler
<cfbolz> mattip: does the JIT help at all? Or is PyPy faster if we turn it off on sphinx
* mattip comparing for with/without the JIT
<mattip> cpython: ~45 secs, pypy JIT: ~75 secs, pypy w/out JIT: ~175 secs
<cfbolz> right
<cfbolz> so it helps, but not enough :-(
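The "helps, but not enough" reading follows from the three approximate timings mattip quotes for sphinx. A small sketch of the arithmetic, using only those rounded figures (the variable names are made up for the example):

```python
# Approximate wall-clock times quoted in the log, in seconds.
cpython = 45.0
pypy_jit = 75.0
pypy_no_jit = 175.0

# How much the JIT speeds PyPy up on this workload.
jit_speedup = pypy_no_jit / pypy_jit
# How much slower PyPy still is than CPython, with and without the JIT.
jit_vs_cpython = pypy_jit / cpython
no_jit_vs_cpython = pypy_no_jit / cpython

print("JIT speedup over no JIT: %.2fx" % jit_speedup)
print("PyPy with JIT vs CPython: %.2fx slower" % jit_vs_cpython)
print("PyPy without JIT vs CPython: %.2fx slower" % no_jit_vs_cpython)
```

So on these numbers the JIT makes PyPy a bit more than twice as fast as its own interpreter, yet PyPy still takes roughly two thirds longer than CPython.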
<mattip> could we figure out in which parts of the code flow we are better than CPython and in which parts we are worse?
<mattip> or is it just too many functions to analyze anything from the profile(s)
<cfbolz> mattip: yes, that's the right question
<cfbolz> I am not sure the full profile is where to start
<cfbolz> maybe we can add some time() calls to sphinx for the different phases
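A minimal sketch of what "add some time() calls for the different phases" could look like. Everything here is an assumption for illustration: the `timed_phase` helper and the `read_sources` / `parse` / `write_output` stand-ins are hypothetical and do not correspond to Sphinx's real internals. The idea is only to accumulate wall-clock time per coarse phase so the same run can be compared on CPython and PyPy.

```python
import time
from contextlib import contextmanager

# Accumulated seconds per phase name.
phase_times = {}


@contextmanager
def timed_phase(name):
    # Hypothetical helper: adds the elapsed wall-clock time of the
    # enclosed block to phase_times[name], even if the block raises.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        phase_times[name] = phase_times.get(name, 0.0) + elapsed


# Stand-ins for the real phases of a documentation build (assumed names).
def read_sources():
    return ["alpha beta gamma"] * 1000


def parse(sources):
    return [line.split() for line in sources]


def write_output(trees):
    return sum(len(tree) for tree in trees)


with timed_phase("read"):
    sources = read_sources()
with timed_phase("parse"):
    trees = parse(sources)
with timed_phase("write"):
    total_words = write_output(trees)

for name, seconds in phase_times.items():
    print("%-6s %.6f s" % (name, seconds))
```

Running the same instrumented build under both interpreters and comparing the per-phase totals would show which phases PyPy wins and which it loses, without needing a full function-level profile.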
<antocuni> cfbolz: we don't do anything special about promotions AFAIK
<antocuni> sometimes I wonder whether we should add a quick method-based JIT to use instead of the bytecode interpreter
<cfbolz> antocuni: many ideas are possible. but they need a bit of serious trying, and it's not so easy to find the people/funding for that
<antocuni> I know :(
