<arigato>
unrelated note about Cython: I'm thinking more about the "PyHandle" idea, which would be a new C API slightly different from the existing PyObject-based one
<arigato>
transforming Cython to use the PyHandle-based API in most places instead of the PyObject-based API
<arigato>
might give already a lot of the speed boost we're looking for
<arigato>
for pypy
<mattip>
ronan: thanks, the offending line was correct.
<LarstiQ>
`correct but offending?
<arigato>
"correctly identified which line was offending" I bet
<LarstiQ>
I suspect so too, but the alternative is more amusing
<antocuni>
arigato: "might give already a lot of the speed boost we're looking for": apart that the W_Root->PyObject conversion cost never showed up in benchmarks so far
<antocuni>
but I agree that eventually it will show up :)
<mattip>
hopefully we will not be the only consumer of a better API
<arigato>
are you sure? I'm talking about the dict lookup needed for this conversion
<antocuni>
yes, IIRC the dict lookup was never the bottleneck in all the benchmarks we looked at, but I might be wrong of course
<arigato>
I thought we had numbers showing that calls to a C-extension-module builtin function are very fast as long as we don't need to do such a conversion, and very slow otherwise
<arigato>
I mean, not only the dict lookup: it's also the fact that we create many PyObjects and that's a lot of work for the GC later
<arigato>
such GC work would vanish too
<antocuni>
ah ok, this is more in line with what I remember
<antocuni>
although I seem to remember that the problem was when we create many PyObject* in C
<antocuni>
also, I don't see why using PyHandle would create fewer PyObjects?
<arigato>
in a world where pypy supports both, as long as you use PyHandles then there are no PyObjects at all
<antocuni>
ah ok, indeed
<arigato>
a PyHandle is an index in a (global, RPython) array of W_Roots, and the point is that we can forget about them deterministically (as opposed to PyObjects, which need to stay around as long as the W_Root does)
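(A rough sketch of that idea in plain Python; the names and the app-level modeling are hypothetical, roughly the kind of CFFI-level POC mentioned later in the log:)

    # Hypothetical model of a PyHandle table: a handle is just an index into a
    # global array of wrapped objects, plus a free list so closed slots get reused.
    _handles = []        # slot i holds the wrapped object for handle i
    _free_slots = []     # indices of slots that were closed

    def handle_new(w_obj):
        """Wrap w_obj and return a handle (an integer index)."""
        if _free_slots:
            i = _free_slots.pop()
            _handles[i] = w_obj
        else:
            i = len(_handles)
            _handles.append(w_obj)
        return i

    def handle_deref(h):
        """Get the wrapped object back from a handle."""
        return _handles[h]

    def handle_close(h):
        """Forget the handle deterministically: the slot is freed right away,
        independently of how long the wrapped object itself stays alive."""
        _handles[h] = None
        _free_slots.append(h)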
<antocuni>
right. So the GC problem will still be relevant for projects which use the old cpyext API and create PyObjects directly (such as numpy)
<arigato>
yes
<antocuni>
ok
<arigato>
and ideally we can port old extensions incrementally
<arigato>
it should continue to work if it sometimes uses the old API and sometimes the new one, with conversion functions between them
<mattip>
+1
<arigato>
(e.g. PyObject -> PyHandle is cheap, but the other way around requires a dict lookup)
<arigato>
(and associated GC cost for later)
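(A toy Python model of that asymmetry; FakePyObject and the function names are made up for illustration:)

    # Why the two directions cost differently, in the same hypothetical terms:
    _handles = []      # handle table: handle -> object
    _pyobj_for = {}    # id(object) -> its long-lived PyObject-style wrapper

    class FakePyObject:
        # stand-in for a cpyext PyObject: once created, it must stay paired
        # with its object for as long as that object is alive
        def __init__(self, obj):
            self.obj = obj

    def pyobject_as_handle(pyobj):
        # cheap direction: just take a fresh slot in the handle table
        _handles.append(pyobj.obj)
        return len(_handles) - 1

    def handle_as_pyobject(h):
        # expensive direction: a dict lookup to find (or create, and then keep
        # alive and track) the PyObject paired with the underlying object
        obj = _handles[h]
        wrapper = _pyobj_for.get(id(obj))
        if wrapper is None:
            wrapper = _pyobj_for[id(obj)] = FakePyObject(obj)
        return wrapper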
<antocuni>
and I suppose that on CPython, a PyHandle would just be an alias for a PyObject?
<arigato>
likely
<antocuni>
mattip: how likely is it that numpy would accept patches to use PyHandles?
<arigato>
numpy is a kind of hard target, though
<arigato>
because it's likely to end up in a bit of a mess, where numpy uses all the internal stuff that we don't necessarily want to replicate with PyHandles, so it needs to use both
<mattip>
numpy creates many PyTypeObjects. What would the parallel be in the new API? Create those in Python, not in C?
<arigato>
yeah, I don't know
<arigato>
I'm thinking that it would be cool to create types in Python in the PyHandles way, indeed
<arigato>
but of course the "static PyTypeObject xxx" way won't stop working, in the mixed-APIs world
<antocuni>
if we can't move numpy to PyHandles, then I fear we need to ALSO optimize the normal PyObject* usage :(
* mattip
has evil thoughts about micronumpy, and hides
<antocuni>
mattip: tell us
<mattip>
well, there are CuPy and Dask, which have their own ndarray; why can't we have one too
<mattip>
they use the recent __array_ufunc__ and __array_function__ protocols to avoid np.asarray(arr) turning their ndarray into numpy's
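(For reference, a minimal toy example of the __array_function__ protocol, available in NumPy 1.17+; MyArray is made up for the sketch, not CuPy/Dask code:)

    import numpy as np

    class MyArray:
        """Toy duck array: numpy functions dispatch to it instead of coercing it."""
        def __init__(self, data):
            self.data = list(data)

        def __array_function__(self, func, types, args, kwargs):
            # numpy calls this for e.g. np.concatenate instead of converting
            # MyArray instances into numpy ndarrays first
            if func is np.concatenate:
                out = []
                for arr in args[0]:
                    out.extend(arr.data if isinstance(arr, MyArray) else list(arr))
                return MyArray(out)
            return NotImplemented

    a = MyArray([1, 2])
    b = MyArray([3, 4])
    c = np.concatenate([a, b])   # dispatches to MyArray.__array_function__
    print(type(c), c.data)       # <class '__main__.MyArray'> [1, 2, 3, 4]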
<antocuni>
so you propose to just write ndarray in RPython, and integrate it with the rest of numpy?
<mattip>
+1
<mattip>
of course the details, the details
<antocuni>
yeah, starting from the fact that we want "numpy.ndarray" to be aliased to "_numpypy.ndarray", but probably not everywhere
<mattip>
(that does not solve the general problem of, say, interop with opencv or tensorflow)
<mattip>
I am waiting for CuPy and Dask to solve the problems, then we can copy what they do
<antocuni>
I still think that the best way forward is to make the standard numpy fast enough; then if we can make it super-fast by rewriting parts of it in RPython, even better
<mattip>
antocuni: since it will be a few years until numpy converts to any new api, I tend to agree
<mattip>
plus numpy is not the only one: lxml, pandas, ...
<ambv>
Who do we know from MicroPython and/or CircuitPython?
<ambv>
I'd like to invite them, too
<rguillebert>
glad to see the idea of PyHandles coming back :), I think I had a similar idea back in 2015 but didn't get to write anything because of work
<rguillebert>
I think it'd be fairly easy to write a POC on top of CFFI to verify the soundness of it
<simpson>
JSON doesn't specify integers, it's true. Check the properties of your JSON decoder and host language first.
<rindolf>
simpson: ok. let me try
<fijal>
small ints are fine
<fijal>
(that is, smaller than 32 bits, I think)
<fijal>
also if you do Python-to-Python, it's probably fine for 64-bit ints
<simpson>
fijal: IME even bigints are fine if using Python's `json`. I've had interoperability with bigints using JSON decoders in Haskell and Monte, too. JS gets to be the odd one out as usual.
<fijal>
ok
<rindolf>
simpson: thanks - json seems faster
<dmalcolm>
(you could potentially write the decimal representation of large ints as a JSON string, though that's kind of ugly)
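(A quick illustration of both points with the stdlib json module:)

    import json

    # Python's json round-trips arbitrarily large ints without loss...
    n = 2 ** 100
    assert json.loads(json.dumps(n)) == n

    # ...but for consumers limited to 64-bit (or JS double-precision) integers,
    # one workaround is to ship big ints as decimal strings:
    payload = json.dumps({"big": str(n)})
    assert int(json.loads(payload)["big"]) == n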
<rindolf>
hi all! running 4 processes with json seems to max out my ram
<rindolf>
I originally tried using "import array" and it consumed more ram and made matters slower
<simpson>
Hm. Your benchmark looks memory-hard to me; it looks like it needs to keep all of the ints in memory while it works on them. I'd reach for mmap and struct, but maybe there's something better.
<simpson>
(You'd have a single file with all of the ints packed in fixed-width binary, and then you'd mmap that file from each process and use struct to decode the ints.)
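(Roughly what that could look like; the file name and the int64 layout are made up for the sketch:)

    import mmap
    import struct

    # pack the ints fixed-width into a file once...
    ints = list(range(1000))
    with open("ints.bin", "wb") as f:
        f.write(struct.pack("<%dq" % len(ints), *ints))   # little-endian int64

    # ...then each worker process maps the file and decodes ints on demand,
    # instead of every process holding its own full copy in RAM
    with open("ints.bin", "rb") as f:
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def get_int(i):
        return struct.unpack_from("<q", buf, i * 8)[0]

    assert get_int(42) == 42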
<rindolf>
simpson: using array helped and it seems that array('l') is faster than array('L')
<simpson>
rindolf: Curious. I'm not sure why that would be.
<rindolf>
simpson: thanks for all your help
<rindolf>
tos9: hi
<rindolf>
hi all
<rindolf>
tos9: do you still want a bug report?
<kenaan>
mattip newmemoryview-app-level 6118585acf9c /: fix translation, extend class for python2 and use it in _ctypes
<kenaan>
mattip default 275fd99e1c23 /pypy/module/cpyext/test/test_methodobject.py: add test that passes -A (python2.7), fails on pypy