arigato changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | mac OS and Fedora are not Windows
senyai has quit [Ping timeout: 244 seconds]
<ronan> mattip: that triggers some interesting nonsense, I'm tempted to fix it but it'd probably take a month
dddddd has quit [Remote host closed the connection]
jcea has quit [Quit: jcea]
senyai has joined #pypy
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
[Arfrever] has joined #pypy
forgottenone has joined #pypy
Arfrever has quit [Quit: 御出で]
Garen has quit [Read error: Connection reset by peer]
Garen has joined #pypy
Garen has quit [Read error: Connection reset by peer]
Garen has joined #pypy
<arigato> unrelated note about Cython: I'm thinking more about the "PyHandle" idea, which would be a new C API slightly different from the existing PyObject-based one
<arigato> transforming Cython to use the PyHandle-based API at most places instead of the PyObject-based API
<arigato> might already give a lot of the speed boost we're looking for
<arigato> for pypy
<mattip> ronan: thanks, the offending line was correct.
<LarstiQ> correct but offending?
<arigato> "correctly identified which line was offending" I bet
<LarstiQ> I suspect so too, but the alternative is more amusing
<arigato> :-)
Ai9zO5AP has joined #pypy
jiorno has joined #pypy
jiorno has quit [Client Quit]
<mattip> arigato: how would you like to see a PyHandle project evolve?
<mattip> would it make sense to take a simple package that doesn't use cython and try a rewrite?
<mattip> simplejson c speedups is a single file with ~3000 lines https://github.com/simplejson/simplejson/blob/master/simplejson/_speedups.c
<mattip> the one drawback I see to the idea is that using cffi makes more sense to me
<arigato> let's pick a b-tree or similar
antocuni has joined #pypy
<arigato> yes, looks good
<antocuni> arigato: "might already give a lot of the speed boost we're looking for": except that the W_Root->PyObject conversion cost never showed up in benchmarks so far
<antocuni> but I agree that eventually it will show up :)
<mattip> hopefully we will not be the only consumer of a better API
<arigato> are you sure? I'm talking about the dict lookup needed for this conversion
<antocuni> yes, IIRC the dict lookup was never the bottleneck in all the benchmarks we looked at, but I might be wrong of course
<arigato> I thought we had numbers that show that calls to a C-extension-module-builtin function are very fast as long as we don't need to do such conversion, and very slow otherwise
<arigato> I mean, not only the dict lookup: it's also the fact that we create many PyObjects and that's a lot of work for the GC later
<arigato> such GC work would vanish too
<antocuni> ah ok, this is more in line with what I remember
<antocuni> although I seem to remember that the problem was when we create many PyObject* in C
<antocuni> also, I don't see why using PyHandle creates fewer PyObjects?
<arigato> in a world where pypy supports both, as long as you use PyHandles then there are no PyObjects at all
<antocuni> ah ok, indeed
<arigato> a PyHandle is an index in a (global, RPython) array of W_Roots, and the point is that we can forget about them deterministically (as opposed to PyObjects, which need to stay around as long as the W_Root does)
<antocuni> right. So the GC problem will still be relevant for projects which use the old cpyext API and create PyObjects directly (such as numpy)
<arigato> yes
<antocuni> ok
<arigato> and ideally we can port old extensions incrementally
<arigato> it should continue to work if it sometimes uses the old API and sometimes the new one, and you have conversion functions between them
<mattip> +1
<arigato> (e.g. PyObject -> PyHandle is cheap, but the other way around requires a dict lookup)
<arigato> (and associated GC cost for later)
<antocuni> and I suppose that on CPython, a PyHandle would just be an alias for a PyObject?
<arigato> likely
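[Sketch: a minimal Python model of the handle table arigato describes above, just to make the lifetime difference concrete. All names here are illustrative only; this is not an actual PyPy or cpyext API.]

    class HandleTable:
        """Toy model: a handle is an index into a global table of objects."""
        def __init__(self):
            self._objs = []        # stands in for the global RPython array of W_Roots
            self._free = []        # recycled slots

        def new_handle(self, obj):
            if self._free:
                i = self._free.pop()
                self._objs[i] = obj
            else:
                self._objs.append(obj)
                i = len(self._objs) - 1
            return i               # the "PyHandle" is just this index

        def deref(self, h):
            return self._objs[h]

        def close(self, h):
            # deterministic release: the slot is reusable as soon as the
            # extension says it is done, unlike a PyObject which has to stay
            # alive as long as the underlying W_Root does
            self._objs[h] = None
            self._free.append(h)

    table = HandleTable()
    h = table.new_handle("some wrapped object")
    assert table.deref(h) == "some wrapped object"
    table.close(h)                 # no GC bookkeeping left behind for this handle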
<antocuni> mattip: how likely is it for numpy to accept patches to use PyHandles?
<arigato> numpy is a kind of hard target, though
<arigato> because it's likely to end up in a bit of a mess, where numpy uses all the internal stuff that we don't necessarily want to replicate with PyHandles, so it needs to use both
<mattip> numpy creates many PyTypeObjects. What would the parallel be in the new API? Create those in python not in C?
<arigato> yeah, I don't know
<arigato> I'm thinking that it would be cool to create types in Python in the PyHandles way, indeed
<arigato> but of course the "static PyTypeObject xxx" way won't stop working, in the mixed-APIs world
<antocuni> if we can't move numpy to PyHandles, then I fear we need to ALSO optimize the normal PyObject* usage :(
* mattip has evil thoughts about micronumpy, and hides
<antocuni> mattip: tell us
<mattip> well, there are CuPy and Dask that have their own ndarray, why can't we too
<mattip> they use the recent __array_ufunc__ and __array_function__ protocols to avoid np.asarray(arr) turning their ndarray into numpy's
<antocuni> so you propose to just write ndarray in RPython, and integrate it with the rest of numpy?
<mattip> +1
<mattip> of course the details, the details
<antocuni> yeah, starting from the fact that we want "numpy.ndarray" to be aliased to "_numpypy.ndarray", but probably not everywhere
<mattip> (that does not solve the general problem of, say, interop with opencv or tensorflow)
<mattip> I am waiting for CuPy and Dask to solve the problems, then we can copy what they do
<antocuni> I still think that the best way forward is to make the standard numpy fast enough; then if we can make it super-fast by rewriting parts of it in RPython, even better
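[Sketch: a minimal example of the __array_function__ protocol mattip mentions, which is how CuPy and Dask keep np.asarray from swallowing their array types. Needs NumPy >= 1.17; the MyArray class is invented for illustration.]

    import numpy as np

    class MyArray:
        """Toy duck-array that answers NumPy functions itself."""
        def __init__(self, data):
            self.data = list(data)

        def __array_function__(self, func, types, args, kwargs):
            if func is np.concatenate:
                merged = []
                for a in args[0]:
                    merged.extend(a.data)
                return MyArray(merged)   # result stays a MyArray
            return NotImplemented        # let NumPy raise for anything else

    a, b = MyArray([1, 2]), MyArray([3, 4])
    print(np.concatenate([a, b]).data)   # [1, 2, 3, 4]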
ambv has joined #pypy
lritter has joined #pypy
lesshaste has joined #pypy
zmt01 has joined #pypy
zmt00 has quit [Ping timeout: 264 seconds]
dddddd has joined #pypy
<mattip> antocuni: since it will be a few years until numpy converts to any new api, I tend to agree
<mattip> plus numpy is not the only one: lxml, pandas, ...
antocuni has quit [Read error: Connection reset by peer]
antocuni has joined #pypy
ambv has quit [Quit: Textual IRC Client: www.textualapp.com]
ambv has joined #pypy
<ambv> mattip: ^
<mattip> ambv: thanks
<ambv> Who do we know from MicroPython and/or CircuitPython?
<ambv> I'd like to invite them, too
themsay has joined #pypy
forgottenone has quit [Quit: Konversation terminated!]
forgottenone has joined #pypy
forgottenone has quit [Client Quit]
forgottenone has joined #pypy
<rguillebert> glad to see the idea of PyHandles coming back :), I think I had a similar idea back in 2015 but didn't get to write anything because of work
<rguillebert> I think it'd be fairly easy to write a POC on top of CFFI to verify the soundness of it
lesshaste has quit [Read error: Connection reset by peer]
curiouz_k0d3r has joined #pypy
antocuni has quit [Ping timeout: 250 seconds]
ambv has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
curiouz_k0d3r has quit [Quit: WeeChat 1.6]
jacob22_ has quit [Read error: Connection reset by peer]
jacob22_ has joined #pypy
jcea has joined #pypy
rindolf has joined #pypy
Masklinn has joined #pypy
Rhy0lite has joined #pypy
antocuni has joined #pypy
<rindolf> what can i use instead of pickle to save/load a list of ints?
<simpson> How about JSON? I don't know if it's faster but it's less insecure.
<rindolf> simpson: can it handle bigints well?
<fijal> rotfl
<fijal> rindolf: hell no, it can't handle normal ints either
<rindolf> though my ints are from 0 up to 1e9+7
<rindolf> fijal: oh
<fijal> rindolf: it's JS. The type is "number" - which means things, but those are not ints
<simpson> rindolf: In Python, sure; here's a dict that I recently packed into JSON: https://bpaste.net/show/2272b2205517
<simpson> JSON doesn't specify integers, it's true. Check the properties of your JSON decoder and host language first.
<rindolf> simpson: ok. let me try
<fijal> small ints are fine
<fijal> (that is smaller than 32bits I think)
<fijal> also if you do python-python, it's probably good for 64bit ints
<simpson> fijal: IME even bigints are fine if using Python's `json`. I've had interoperability with bigints using JSON decoders in Haskell and Monte, too. JS gets to be the odd one out as usual.
<fijal> ok
<rindolf> simpson: thanks - json seems faster
rindolf has quit [Read error: Connection reset by peer]
senyai has quit [Read error: Connection reset by peer]
rindolf2 has joined #pypy
<dmalcolm> (you could potentially write the decimal representation of large ints as a json string, though that's kind of ugly)
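[Sketch: Python's json module round-trips arbitrarily large ints exactly, which is simpson's point above; it is JavaScript-side decoders that flatten numbers to doubles. Values are illustrative.]

    import json

    nums = [0, 10**9 + 7, 2**100]            # the last one is far past 64 bits
    blob = json.dumps(nums)
    assert json.loads(blob) == nums           # exact round-trip in Python
    # dmalcolm's workaround for JS consumers: ship the big ones as strings
    as_strings = json.dumps([str(n) for n in nums])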
rindolf has joined #pypy
<rindolf> hi all! running 4 processes with json seems to max out my ram
themsay has quit [Ping timeout: 240 seconds]
themsay has joined #pypy
<rindolf> i originally tried using "import array" and it consumed more ram and made matters slower
<rindolf> i can try a different approach
<rindolf> pypy3 consumes 14.7% of ram
<rindolf> out of 8 gb
<rindolf> there is fromfile here - https://docs.python.org/3/library/array.html
<simpson> Hm. Your benchmark looks memory-hard to me; it looks like it needs to keep all of the ints in memory while it works on them. I'd reach for mmap and struct, but maybe there's something better.
<simpson> (You'd have a single file with all of the ints packed in fixed-width binary, and then you'd mmap that file from each process and use struct to decode the ints.)
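[Sketch of the layout simpson describes: write all the ints once as fixed-width binary, then mmap the file read-only from each process and decode values with struct. The file name and the uint32 width are assumptions; values up to 1e9+7 fit in 32 bits.]

    import mmap
    import struct

    nums = list(range(1000))
    with open("ints.bin", "wb") as f:          # writer: pack little-endian uint32
        f.write(struct.pack("<%dI" % len(nums), *nums))

    with open("ints.bin", "rb") as f:          # each worker maps the same file
        buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    count = len(buf) // 4
    # decode on demand; unpack_from avoids copying the whole file into a list
    third = struct.unpack_from("<I", buf, 3 * 4)[0]
    assert count == len(nums) and third == nums[3]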
danchr has quit [Ping timeout: 250 seconds]
danchr has joined #pypy
<rindolf> simpson: using array helped and it seems that array('l') is faster than array('L')
<simpson> rindolf: Curious. I'm not sure why that would be.
TsundereChen has quit [Quit: leaving]
<rindolf> simpson: thanks for all your help
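[Sketch of the array-based route rindolf settled on: array.tofile/fromfile write and read the raw machine representation, so there is no JSON or pickle layer at all. The typecode and count are illustrative.]

    from array import array

    nums = array('l', range(10**6))
    with open("ints.raw", "wb") as f:
        nums.tofile(f)

    loaded = array('l')
    with open("ints.raw", "rb") as f:
        loaded.fromfile(f, len(nums))    # item count must be known up front
    assert loaded == nums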
rindolf has quit [Remote host closed the connection]
rindolf has joined #pypy
rindolf2 has quit [Quit: leaving]
antocuni has quit [Ping timeout: 255 seconds]
danchr has quit [Ping timeout: 268 seconds]
Masklinn has quit []
rindolf has quit [Remote host closed the connection]
rindolf has joined #pypy
<rindolf> tos9: hi
<rindolf> hi all
<rindolf> tos9: do you still want a bug report?
ambv has joined #pypy
danchr has joined #pypy
Masklinn has joined #pypy
ambv has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<kenaan> mattip newmemoryview-app-level 6118585acf9c /: fix translation, extend class for python2 and use it in _ctypes
<kenaan> mattip default 275fd99e1c23 /pypy/module/cpyext/test/test_methodobject.py: add test that passes -A (python2.7), fails on pypy
<mattip> def func(*args, **kwargs): print args, kwargs
<mattip> it should print "() {}" when called with func(*(), **{}), but in cpyext the empty dict is swallowed,
<mattip> which of course causes a test failure on numpy
<mattip> on a happier note, the nonsense in newmemoryview-app-level seems to allow np.frombuffer() for ctypes arrays
rindolf has quit [Ping timeout: 250 seconds]
rindolf has joined #pypy
themsay has quit [Ping timeout: 255 seconds]
Masklinn has quit []
lritter has quit [Ping timeout: 240 seconds]
xcm has quit [Ping timeout: 250 seconds]
dddddd has quit [Ping timeout: 258 seconds]
xcm has joined #pypy
forgottenone has quit [Quit: Konversation terminated!]
rindolf has quit [Ping timeout: 245 seconds]
Rhy0lite has quit [Quit: Leaving]
rindolf has joined #pypy
kipras has joined #pypy
jacob22_ has quit [Read error: Connection reset by peer]
jacob22_ has joined #pypy
dddddd has joined #pypy
Garen has quit [Read error: Connection reset by peer]
Garen has joined #pypy
Ai9zO5AP has quit [Ping timeout: 255 seconds]
antocuni has joined #pypy
moei has quit [Quit: Leaving...]
<rindolf> simpson: hi! an update - seems that for my case array('i') is a bit faster than 'l'
<simpson> rindolf: Curious. I wonder why.
<rindolf> and consumes less RAM
<simpson> Oh, I misread. I see. Nice.
<simpson> Probably a similar effect. Better memory usage, less cache pressure, etc.
<rindolf> also cache line size
* rindolf needs to do a little refactoring
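[Sketch: the RAM difference comes down to itemsize. On a typical 64-bit Linux build of CPython or PyPy, 'i' is 4 bytes per item while 'l' and 'L' are 8, so an 'i' array halves the footprint and fits twice as many values per cache line.]

    from array import array

    for code in ('i', 'l', 'L'):
        a = array(code, range(1000))
        print(code, a.itemsize, "bytes/item =>", a.itemsize * len(a), "bytes total")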
rindolf has quit [Ping timeout: 255 seconds]
kipras has quit [Ping timeout: 255 seconds]