cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
<kenaan>
mattip default e159b221303b /lib-python/2.7/ensurepip/__init__.py: back out most of 17694ed47a34, leave only what is needed for pip 19.2 to work
<mattip>
and the lack of _aheapsort_bool in pre-10.14 macOS
<mattip>
if you have a pre-10.14 machine, could you confirm you can modify the value in lib-python/2.7/distutils/sysconfig_pypy.py and still get a functional pypy?
<danchr>
I'll try when I get home — or perhaps tomorrow
<danchr>
there are two issues here: 1) MACOSX_DEPLOYMENT_TARGET isn't set consistently, as noted in the build logs from MacPorts and 2) it is conceptually incorrect to set it to a release newer than the host platform
<danchr>
the first one is the issue that breaks the build, but I'd suspect fixing that would lead to breakage due to the second one
<mattip>
sounds reasonable. let's move this to #macports
xorAxAx has quit [Ping timeout: 260 seconds]
tsaka__ has joined #pypy
<kenaan>
mattip default 158784f4a75c /lib-python/2.7/distutils/sysconfig_pypy.py: set minimal MACOSX_DEPLOYMENT_TARGET to 10.7 on macOS; cpython uses 10.5
<kenaan>
mattip py3.6 73d1f8c0a863 /lib-python/3/_osx_support.py: set minimal MACOSX_DEPLOYMENT_TARGET to 10.7 on macOS; cpython does not have this
<mattip>
danchr: changed in 158784f4a75c; not really sure how to test that it doesn't break things
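One way to sanity-check a build is to ask the interpreter which deployment target it actually baked in; a minimal sketch using the stdlib sysconfig module:
```python
import sysconfig

# Prints the MACOSX_DEPLOYMENT_TARGET the interpreter was built with,
# or None if the build did not record one
print(sysconfig.get_config_var('MACOSX_DEPLOYMENT_TARGET'))
```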
<mattip>
tos9: do you have an opinion? ^^^
<danchr>
I think setting it at all is probably wrong, but I'm not entirely sure
<danchr>
but it's probably sensible for the pypy.org binaries
<danchr>
ideally, pypy would have a proper configure stage where it figured out this sort of thing :)
<mattip>
I think it has to do with the minimum macOS platform features we are using, things like monotonic time and sockets
tsaka__ has quit [Ping timeout: 245 seconds]
<mattip>
then there is rpython/rlib/rtime.py, which specifically defines HAS_CLOCK_GETTIME=False for pre-10.12
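At the application level the same gap shows up as time.clock_gettime simply being absent, so a feature check is the safe pattern; a minimal sketch (Python 3 API):
```python
import time

# On builds targeting platforms without clock_gettime (e.g. pre-10.12 macOS),
# time.clock_gettime may be missing entirely, so feature-check before use
if hasattr(time, 'clock_gettime'):
    now = time.clock_gettime(time.CLOCK_MONOTONIC)
else:
    now = time.monotonic()  # backed by a different OS facility there
print(now)
```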
marky1991_2 has quit [Remote host closed the connection]
marky1991_2 has joined #pypy
<tos9>
mattip: maybe you've kept discussing elsewhere, but from what I recall... it's a minimum version, and whatever value it was set at was actually too low before
<tos9>
mattip: so 10.14 doesn't seem like the right value, but neither is whatever it was originally set to, because numpy fails with that value, so clearly we need a newer one
<tos9>
I don't remember if we bisected exactly which version gave a working numpy install
<tos9>
but probably that's easy to check with a built nightly
<danchr>
what version of macOS do you have on the builders?
xtarget has joined #pypy
jvesely has joined #pypy
<arigato>
mattip: found the bug
<arigato>
the aarch64 jit overwrites a register in one rare case
Rhy0lite has joined #pypy
<kenaan>
arigo default 81c30ab04ab4 /rpython/jit/backend/aarch64/opassembler.py: Fix in the aarch64 backend for a rare case where a register would be overridden
<kenaan>
danchr allow-forcing-no-embed bf745505226d /pypy/tool/release/package.py: package: allow suppressing embedded dependencies with an envvar
<danchr>
arigato: thanks :)
<arigato>
:-)
<danchr>
that change also applies to py3k btw — do you have regular merges, or should I graft it there?
<arigato>
we have regular merges
forgottenone has joined #pypy
<arigato>
trying to merge now; maybe you can tell me if I can ignore the change to MACOSX_DEPLOYMENT_TARGET (from 10.14 to 10.7) that occurred on default with no clear place where it should go on py3.6?
<arigato>
...ah, I see it was done already, sorry
<arigato>
(the change goes to lib-python/3/_osx_support.py)
<danchr>
just make it easy to override; as an example, packagers will likely want to make it the current OS
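A minimal sketch of the kind of override danchr means, with the environment taking precedence over a baked-in floor (the fallback value here is illustrative):
```python
import os

# Let packagers set the deployment target via the conventional env var;
# fall back to a hard-coded minimum only when it is unset
MACOSX_DEPLOYMENT_TARGET = os.environ.get('MACOSX_DEPLOYMENT_TARGET', '10.7')
```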
<mattip>
pypy3 does not use the default; it is probably only relevant for pypy2
jcea has joined #pypy
YannickJadoul has joined #pypy
<ronan>
arigato: I don't really remember, I think it was a mix of PyUnicode_New not being very relevant at the time and it being potentially tricky to implement
windie has joined #pypy
marky1991 has quit [Quit: Saliendo]
windie has left #pypy [#pypy]
windie has joined #pypy
windie has quit [Remote host closed the connection]
shunning has joined #pypy
<shunning>
Hi all, in the process of moving our project from python2 to python3, I found that the speedup from pypy has diminished. The first cause we found was the custom dictionary passed to exec() (yes, we compile a lot of functions and execute them billions of times). After fixing this, I found that, if we execute 7 functions in a loop,
<shunning>
the CALL_FUNCTION token is not constant folded anymore. Instead, there are 100B of code left in the trace before every function call. If I remove the last function, the 100B of code is gone. Any comments or suggestions?
<shunning>
* 100B of trace before force_token and enter_portal_frame
<arigato>
and do you mean "inlined" instead of "constant folded"?
<shunning>
Hmm I'm not sure if I can call it inlined because the 100B trace is between the merge point for CALL_FUNCTION and force_token
<shunning>
In the case of calling 6 functions in a loop, I can still see the 100B trace before the targettoken label in the middle of jit-log-opt-loop, but after the targettoken label, it's optimized away -- the merge point for CALL_FUNCTION is directly followed by force_token.
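The traces being discussed come from PyPy's logging facility; a hedged sketch of how such a log is typically captured (binary and script names are illustrative):
```python
import os
import subprocess

# 'jit-log-opt' is the PYPYLOG category that produces jit-log-opt-loop sections
env = dict(os.environ, PYPYLOG="jit-log-opt:trace.log")
subprocess.check_call(["pypy3", "our_program.py"], env=env)
```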
<shunning>
We suspect there is some parameter that we are not aware of, or there is a bug in the JIT
<arigato>
it's likely that python3 needs a few more operations than python2, which effectively lowers the somewhat-artificial limit for inlining a bit
<arigato>
anyway, let's start again from what you said. there are many words that I'm not sure I follow, starting with "100B"
<shunning>
Sure thanks
<arigato>
so what do you mean by "100B of code"
<shunning>
100B is, I think, pushing whatever arguments, closure, func name, qualname, onto the stack. If you don't mind, I can paste that 100B
<shunning>
100B in the trace
<shunning>
from +10064 to +10184
<arigato>
100 bytes of assembly code?
<arigato>
OK
<shunning>
let me create a gist to store the whole trace?
<shunning>
the trace is too big to put in a gist, but here is the 100B I'm talking about. After reading up on python3 dis, I think it's because the function-calling scheme is defined differently in python3
<arigato>
it can also come from the dictionary you use in exec(), indirectly
<shunning>
We tried to get rid of dictionaries by using exec() without globals and locals
<arigato>
that's not a good solution either
<shunning>
With custom globals, we observed a lot of ll_call_lookup_function_trampoline__v1434___simple_call__function_
<arigato>
maybe try to use ``import __pypy__; d = __pypy__.newdict("xxx") ``
<arigato>
with "xxx" being another string:
<arigato>
I think you want "module" here
<shunning>
hmmmm interesting
<arigato>
equivalently, in standard python, it would be "m = <make a new module>; d = m.__dict__"
<arigato>
you get in both cases a dictionary that is internally optimized to work like a module-dictionary
<arigato>
``m = types.ModuleType('mymod'); d = m.__dict__``
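Putting arigato's two suggestions side by side, a minimal sketch (the executed source is illustrative; __pypy__ exists only on PyPy):
```python
import types

# Portable way: a module's __dict__ is internally a module-strategy dict
m = types.ModuleType('mymod')
d = m.__dict__

# PyPy-only shortcut that skips creating the module object
try:
    import __pypy__
    d = __pypy__.newdict("module")
except ImportError:
    pass  # on CPython, keep the module.__dict__ from above

exec("def f(): return 42", d)
print(d['f']())
```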
<shunning>
could you elaborate a little bit more on "it can also come from the dictionary you use in exec(), indirectly"?
<shunning>
right now we removed all globals() and locals() in exec by compiling a function A that returns the desired function B, capturing all the variables in A
<shunning>
oh sorry, not removing all globals() and locals()
<shunning>
we still have to supply a local dict to compile function A, but the desired function B is returned by calling A, so B won't use the variables in A's locals()
<arigato>
I meant that if you're using exec() on source code that contains several functions that call each other, those functions will need a lot of globals accesses, and if the "globals" is a plain dictionary obtained with "{}", then it's slightly less optimal
<arigato>
the global accesses are when calling the other functions
<shunning>
does it matter if we compile them in different places?
<shunning>
I think what we do is first compile individual functions A, B, C, D, and then compile a function X that calls A(); B(); C(); D();
<arigato>
and where are A, B, C, D stored?
<shunning>
the way we compile function X is to create a compile_X function that takes a list of functions L,
<shunning>
inside compile_X's body, we do f0=L[0];f1=L[1];f2=L[2] ... and X is doing f0();f1();f2();f3();
<arigato>
ah, OK I see why you get what you pasted then
<arigato>
...maybe. Not sure yet
<arigato>
where are f0=L[0] etc.?
<arigato>
inside or outside the exec() that is inside compile_X()?
<shunning>
inside compile_X
<shunning>
let me paste it
<shunning>
```
src = """
def compile_X( L ):
    f0 = L[0]; f1 = L[1]; ...
    def X():
        f0(); f1(); f2(); ...
    return X
"""
l = {}
exec( compile(src, '<string>', 'exec'), l )
ret = l['compile_X']( L )
```
<arigato>
OK I see, then how does the caller look like?
<shunning>
for i in range(1000):
<shunning>
ret()
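For reference, a self-contained version of the pattern under discussion; this is a sketch with illustrative names and trivial stand-in functions, not shunning's actual code:
```python
def make_X(L):
    # Generate a factory that binds each list element to a local name,
    # then returns a closure X calling them in sequence
    n = len(L)
    lines = ["def compile_X(L):"]
    lines += ["    f%d = L[%d]" % (i, i) for i in range(n)]
    lines += ["    def X():"]
    lines += ["        f%d()" % i for i in range(n)]
    lines += ["    return X"]
    src = "\n".join(lines)
    d = {}
    exec(compile(src, "<generated>", "exec"), d)
    return d["compile_X"](L)

fns = [(lambda: None) for _ in range(7)]  # 7 trivial stand-in functions
X = make_X(fns)
for _ in range(1000):
    X()
```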
<arigato>
I don't know what changed between pypy2 and pypy3...
<shunning>
Yes, on pypy2 it works fine
<arigato>
it's not about exec() at all I think, because you end up with a function ret() which never accesses any globals
oberstet has quit [Remote host closed the connection]
<shunning>
Anyway, thanks arigato .. we are trying to reproduce it with a small python program, no success yet
<arigato>
yes, works fine on a small program here too
<arigato>
so I'll stand by my first impression: I think that the end result is optimized to the same thing, but maybe in large and messy examples you're hitting some trace-length limit
<arigato>
and the trace-length limit will be reached a bit faster on pypy3 because it makes more operations than pypy2 before optimizing them away
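If the trace-length limit is indeed what is being hit, it can also be adjusted at runtime through the pypyjit module (PyPy-only; the value below is illustrative):
```python
import pypyjit

# trace_limit caps how many operations a single trace may contain before
# the JIT aborts tracing; raising it trades warmup cost for more inlining
pypyjit.set_param("trace_limit=200000")
```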
<shunning>
:) you made a good point
<shunning>
we usually compile pypy with bigmodel and supply a 100000000 trace limit
<shunning>
We removed the assertion
<shunning>
In pypy2 we used to get a 200k-sized trace and it still provided great speedup versus cpython
<arigato>
I didn't mean the trace would be 500 times bigger on pypy3, more like 10% bigger
<arigato>
so something looks wrong
<shunning>
yeah, for the next point, we did an experiment of running 6 functions and 7 functions
<shunning>
6 is good, but 7 is not
<shunning>
so <=6 is good but >6 is not. We also tried permuting the functions to see what happens; the dichotomy persists
antocuni has quit [Ping timeout: 246 seconds]
jvesely has quit [Remote host closed the connection]
YannickJadoul has quit [Quit: Leaving]
xtarget has quit [Read error: Connection reset by peer]
jvesely has joined #pypy
<kenaan>
mattip allow-forcing-no-embed aad09c46bd5c /pypy/doc/whatsnew-head.rst: close and document merged branch
<kenaan>
mattip default d6217bf98b7c /pypy/doc/whatsnew-head.rst: merge allow-forcing-no-embed which added PYPY_NO_EMBED_DEPENDENCIES to packaging
<mattip>
if we change the so name for the py3.6 HEAD from dummy.pypy3-72-x86_64-linux-gnu.so to dummy.pypy36-pp72-x86_64-linux-gnu.so
<mattip>
does that require a SOABI change, i.e., 7.2.0 -> 8.0.0 or 7.3.0?
<tumbleweed>
I don't think so
<tumbleweed>
unless the ABI is changing as part of it
<tumbleweed>
I guess you'll want to be able to import both, for compatibility
<mattip>
it is only a semantic name change, the same ABI
<mattip>
so for the 7.2 ABI, we would need to support both names, but if we ever do 7.3 we could get rid of the old name
<tumbleweed>
yes
<mattip>
and pypy's python3.7 would only allow dummy.pypy37-pp72-x86_64-linux-gnu.so
<mattip>
not pypy3-72-...so
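For anyone following along, the suffixes an interpreter will accept can be inspected directly; the example output is hypothetical for the proposed scheme:
```python
import sysconfig
from importlib.machinery import EXTENSION_SUFFIXES

# SOABI is the config var embedded in extension-module file names
print(sysconfig.get_config_var('SOABI'))  # e.g. 'pypy36-pp72-x86_64-linux-gnu'
# All suffixes the import system will try when looking for C extensions
print(EXTENSION_SUFFIXES)
```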
<tumbleweed>
what is the motivation for this change? to support abi3-pp72?
<tumbleweed>
err abi3-pp surely...
<mattip>
it's more about the prospect of pypy releasing both python3.6 and python3.7
<mattip>
today they would both produce pypy3-72-...so
<tumbleweed>
right
<mattip>
at some future point we may have support for limited api pypy, and try to push through an abi3-pp tag, but I don't think we are ready for that yet
<mattip>
python itself is moving from two different tag processors (pep425tags.py in pip and whatever setuptools/distutils does) to a single standard pypa/packaging/tags.py
<mattip>
so let's let them finish that transition first, get some pypy packages on PyPI, and then talk about limited api
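Once that transition lands, enumerating an interpreter's supported tags goes through the pypa/packaging module; a minimal sketch (assumes a recent `packaging` release is installed):
```python
from packaging import tags

# sys_tags() yields tags from most to least specific, e.g. an
# interpreter/ABI/platform triple like pypy36-pp72-<platform> first
for tag in list(tags.sys_tags())[:5]:
    print(tag)
```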
forgottenone has quit [Quit: Konversation terminated!]
tsaka__ has quit [Ping timeout: 245 seconds]
<arigato>
+1
shunning has quit [Remote host closed the connection]
Rhy0lite has quit [Quit: Leaving]
jvesely has quit [Quit: jvesely]
jvesely has joined #pypy
lesshaste has quit [Ping timeout: 268 seconds]
tsaka__ has joined #pypy
asmeurer__ has joined #pypy
Dejan has joined #pypy
asmeurer__ has quit [Quit: asmeurer__]
Dejan has quit [Quit: Leaving]
shunning has joined #pypy
<shunning>
Hi guys I'm back with an example to reproduce the JIT bug I found
<shunning>
It also contains the instructions to run the program. Basically, what happens is: commenting out this f17() leads to a much shorter trace with a bunch of useless code optimized away, and commenting out more functions yields the same result.
<shunning>
I suspect there is some limit that the trace is hitting with f17(). Running f1-f15() yields +7500 code size in jit-log-opt-loop; adding f16 yields +8000, while adding f17 bumps it up to +10000 with a bunch of extra code being generated