antocuni changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://botbot.me/freenode/pypy/ ) | use cffi for calling C | "PyPy: the Gradual Reduction of Magic (tm)"
<njs>
fyi -- I was just talking to Matt Rocklin about dask, which is a popular library for doing distributed (multi-core, multi-machine) data processing, supported by Anaconda (formerly known as Continuum). It has a central scheduler process that's becoming a bottleneck, and they're trying to figure out how to speed it up. The interesting thing is that it's basically just a pure-Python tornado app using regular Python objects and such, so I asked if he'd tried
<njs>
PyPy, and he said that he had, and actually it was slower than CPython and he didn't know why.
<njs>
so I suggested that he get in touch because it seems like something that you might want to look into :-)
<antocuni>
njs: I quickly looked at the issue, but there is not much info there; I suppose one should really try to run the benchmark on pypy and look at the code
<njs>
antocuni: yeah, I don't have any more detail currently
<njs>
antocuni: I guess you could post there and ask for his benchmarks :-)
<antocuni>
njs: yes, except that I'm not sure I want yet another task to do :)
<fijal>
I can have another task to do
tbodt has joined #pypy
yuyichao_ has quit [Ping timeout: 248 seconds]
<njs>
antocuni: I don't think that's representative of what he's doing; he said that he has some whole-program scheduling benchmarks he tried to run until pypy performance seemed to flatten out, and the place where it flattened out was slightly slower than CPython
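For anyone reproducing this kind of comparison, here is a minimal sketch of the "run until performance flattens out" approach. It times a workload repeatedly and stops once recent rounds stop changing much, which is roughly how JIT warm-up shows up on PyPy. The name `time_until_plateau` and its `window`/`tolerance` thresholds are hypothetical choices for illustration. This is not Matt's actual benchmark harness.

```python
import time

def time_until_plateau(workload, max_rounds=50, window=5, tolerance=0.05):
    """Run `workload` repeatedly and return per-round timings (seconds).

    Hypothetical heuristic: stop early once the last `window` rounds differ
    by at most `tolerance` relative to the slowest of them, i.e. timings have
    roughly flattened out (as they do after JIT warm-up).
    """
    timings = []
    for _ in range(max_rounds):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
        if len(timings) >= window:
            recent = timings[-window:]
            lo, hi = min(recent), max(recent)
            if hi - lo <= tolerance * hi:  # relative spread is small enough
                break
    return timings
```

Comparing the final plateau value under CPython and under PyPy is closer to what njs describes than comparing first-run times.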
<njs>
fijal: the scheduler process does not use numpy
<njs>
fijal: it's scheduler/workers architecture; the workers use numpy or whatever, but the scheduler doesn't, and they're separate processes so you can pick cpython/pypy separately for each
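Because the scheduler and workers are separate OS processes, each role can be launched with whatever interpreter binary you like. Below is a minimal sketch of that idea. `ROLE_INTERPRETERS` and `run_role` are hypothetical names, not dask's API, and both entries default to the current interpreter so the sketch runs anywhere. In practice, you would point one of them at a PyPy binary.

```python
import subprocess
import sys

# Hypothetical role -> interpreter mapping. Swap in e.g. a PyPy binary path
# for "scheduler" while keeping CPython (with numpy) for "worker".
ROLE_INTERPRETERS = {
    "scheduler": sys.executable,
    "worker": sys.executable,
}

def run_role(role, code):
    """Run `code` in a fresh process using the interpreter chosen for `role`
    and return its stripped stdout."""
    interpreter = ROLE_INTERPRETERS[role]
    result = subprocess.run(
        [interpreter, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # raise if the child process fails
    )
    return result.stdout.strip()
```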
<njs>
fijal: and berkeley
<fijal>
there is an off chance we might swing through berkeley tomorrow
<njs>
oh huh, I didn't know you were in this hemisphere :-)
<fijal>
a bit on the busy side of things, but we have tomorrow off
<njs>
it's a holiday here, and I'm free all afternoon
<fijal>
Original error was: unable to load extension module '/Users/dev/.virtualenvs/pypy-utf8/site-packages/numpy/core/multiarray.pypy-41.so': dlopen(/Users/dev/.virtualenvs/pypy-utf8/site-packages/numpy/core/multiarray.pypy-41.so, 6): Symbol not found: _PyPyBool_Type
<fijal>
wtf is that?
<antocuni>
maybe you used a wheel compiled for a previous version of numpy?
<njs>
BTW, since Matt is here and works on a product for parallelizing data analysis in Python, I also just asked him what he thought of the GIL removal stuff, and he said no-one cares, the current situation is actually totally fine for all his users/customers. So... FWIW.
<njs>
(basically because numpy/pandas/etc. drop the GIL in their core loops, so up to like ~4-way multi-threaded parallelism works fine, and beyond that they just run more processes, since they already have to handle that to distribute over machines)
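A minimal sketch of the threading pattern njs describes: split an array into blocks and hand each block to a thread. Real parallelism here relies on numpy releasing the GIL inside the heavy call. Whether that happens depends on the specific function and dtype. `np.dot` on float64 arrays goes to BLAS and typically does release it. The function name `blockwise_sum_of_squares` is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def blockwise_sum_of_squares(x, n_threads=4):
    """Sum of squares of `x`, computed block-by-block on a thread pool.

    Each thread spends its time inside np.dot, which for float64 input
    usually runs in BLAS with the GIL released, so threads can overlap.
    """
    blocks = np.array_split(x, n_threads)  # handles lengths not divisible by n_threads
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(lambda b: float(np.dot(b, b)), blocks))
    return sum(partials)
```

Beyond a handful of cores, the usual move (as noted above) is more processes rather than more threads.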
lritter__ has quit [Ping timeout: 248 seconds]
raynold has quit [Quit: Connection closed for inactivity]
<mattip>
njs: thanks, that is another data point in the apparent dissonance between what developers think is an interesting, solvable, and worthwhile problem
<mattip>
and how the users actually view the world
<mattip>
maybe worth a blog post or lightning talk
<mattip>
"The GIL-less ship has sailed, and the GIL is still on it"
songww has quit [Remote host closed the connection]
songww has joined #pypy
songww has quit [Remote host closed the connection]
<Rotonen>
also, one can do a lot by architecting the processing pipeline so you don't have to muck with data in place, which lets you use shared memory arrays from multiprocessing
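A minimal sketch of that pattern using only the standard library: the input lives in a shared array that workers treat as read-only, and each worker writes to its own disjoint slice of a separate shared output array, so nothing is mutated in place and no locking is needed. It assumes a POSIX platform where the `"fork"` start method is available. `parallel_square` and `square_slice` are hypothetical names.

```python
import multiprocessing as mp

def square_slice(src, dst, start, stop):
    # Read from the shared input, write only this worker's slice of the output.
    for i in range(start, stop):
        dst[i] = src[i] * src[i]

def parallel_square(values, n_workers=4):
    ctx = mp.get_context("fork")  # assumption: POSIX with fork available
    n = len(values)
    src = ctx.Array("d", values, lock=False)  # shared input, treated as read-only
    dst = ctx.Array("d", n, lock=False)       # shared output, zero-initialized
    chunk = (n + n_workers - 1) // n_workers
    procs = []
    for w in range(n_workers):
        start, stop = w * chunk, min((w + 1) * chunk, n)
        if start >= stop:
            break
        p = ctx.Process(target=square_slice, args=(src, dst, start, stop))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()
        if p.exitcode != 0:
            raise RuntimeError(f"worker exited with code {p.exitcode}")
    return list(dst)
```

Because the slices are disjoint, `lock=False` is safe here. If workers had to update shared elements in place, you would need the locked variant or a different pipeline layout.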
songww has joined #pypy
<njs>
I suspect there are other cases where GIL removal would be very helpful. Especially in cases where currently PyPy is prohibitive due to the JIT's memory overhead * the number of worker processes, like single-threaded web apps
asmeurer_ has quit [Quit: asmeurer_]
<mattip>
njs: I have a pull request for nditer as a context manager almost ready, I think it also addresses the c-level issues you brought up
squeaky_pl has quit [Read error: Connection reset by peer]
<njs>
mattip: did you see my mailing list post from like... 5 minutes ago?
<mattip>
yes, but it seems we are pretty much alone there, so I would prefer to have a lasting exchange on the github issue rather than in mail
<njs>
not sure what you mean about lasting? that mailing list has lasted longer than github so far :-)
<njs>
in any case I am asleep on my feet and falling over now. g'night