cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://botbot.me/freenode/pypy/ ) | use cffi for calling C | "the modern world where network packets and compiler optimizations are effectively hostile"
<Lightsword>
anyone know a good way to trace memory leaks in a tornado application running on pypy(the leak does not reproduce on cpython) I’m using PyPy 5.8.0/Python 2.7.13
marr has quit [Ping timeout: 252 seconds]
<Lightsword>
I’ve tried using vmprof but I get “Memory profiling is currently unsupported for PyPy. Running without memory statistics.”
tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
<kenaan>
cfbolz default 7153657512df /rpython/jit/metainterp/test/test_bridgeopt.py: generate tuples more efficiently to stop the occasional FailedHealthCheck
<kenaan>
cfbolz default 3868025e1ee1 /rpython/jit/metainterp/resoperation.py: typo (the method is unused)
<cfbolz>
That would be my main suspicion, honestly
<Lightsword>
cfbolz, hmm, what should I be looking for there? I’ve tried it with cpython and it doesn’t seem to leak there
<cfbolz>
just saying that cpython extension emulation is one of the more complex parts of pypy, and one where I would believe it if there are bugs/leaks there
<fijal>
I'm going to be working on memory stuff soon I think
<fijal>
Lightsword: there are simple ways to find out if pypy feels responsible for its own memory for example
<fijal>
async minimal redis client for tornado ioloop designed for performances (use C hiredis parser)
<fijal>
^^^ Lightsword that likely makes everything slower on pypy, I would not use it
<Lightsword>
oh, so we should disable the hiredis parser?
<fijal>
as a general rule "CPython C extensions are slow on pypy"
<fijal>
so if there is something that's "done for performance in C" it likely slows pypy down quite a lot
<Lightsword>
fijal, oh, why is that?
<fijal>
because C extensions in CPython require well, CPython
<fijal>
so we implemented a complex emulation layer but it comes with a cost
jamesaxl has quit [Read error: Connection reset by peer]
<fijal>
it's incredibly surprising it works at all, ever
<kaizoku>
Is this still a good place for cffi support?
<Lightsword>
fijal, don’t think that handles async frameworks all that well
<kaizoku>
i'm having some issues with my typedefs, but the error message is pretty opaque
<kaizoku>
All I get is: cffi.error.CDefError: cannot parse "Elf32_Ehdr *ehdr32;"
<exarkun>
fijal: I'm trying to track down the code used to render tracebacks in sys.__excepthook__. It looks a lot like it's just the traceback module. Do you know if that's right? (I'm uncertain because I can't get a pdb set_trace in __excepthook__ to do anything)
<Lightsword>
exarkun, I think those are more for twisted right not tornado?
<fijal>
exarkun: I think it's just traceback module, yes
kolko has quit [Read error: Connection reset by peer]
<arigato>
kaizoku: can we see the whole string you pass in the cdef(), on a paste website?
<arigato>
(likely, the error is just saying that "Elf32_Ehdr" is not declared)
<mattip>
anyone around to rubber duck a cpython incompatibility wrt str.__radd__, and cpyext?
<ronan>
mattip: hi!
raynold has joined #pypy
<mattip>
ronan, got a few minutes to help me think something through?
<ronan>
yes
<mattip>
so np.string_ has both str and np.generic as base classes, np.generic has a __radd__, str does not
rokujyouhitoma has joined #pypy
cstratak has joined #pypy
<mattip>
in pure python, that would mean that np.string_ would get np.generic's __radd__, and indeed that is what I get when I print np.string_.__radd__
<mattip>
but
jamesaxl has quit [Read error: Connection reset by peer]
<mattip>
'abc' + np.string_('abc') does not call generic.__radd__ on cpython
<mattip>
because of ... a bug? a feature?
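The pure-Python expectation mattip describes can be checked with stand-in classes. This is only a sketch: `Generic` and `StringLike` are hypothetical names used in place of np.generic and np.string_, and because both are plain Python classes the snippet shows what the class dict and MRO say, not the C-level slot behaviour seen with numpy on CPython.

```python
class Generic(object):
    """Hypothetical stand-in for np.generic: defines only __radd__."""

    def __radd__(self, other):
        return "Generic.__radd__ called"


class StringLike(str, Generic):
    """Hypothetical stand-in for np.string_: inherits from str and Generic."""


# Find the first class in the MRO whose own dict defines __radd__.
# str itself does not define one, so the lookup lands on Generic.
radd_owner = [klass for klass in StringLike.__mro__
              if "__radd__" in vars(klass)][0]

# With pure-Python classes the reflected method is consulted, because the
# right operand is an instance of a str subclass that provides __radd__.
result = "abc" + StringLike("abc")
print(radd_owner.__name__, result)
```

The surprise in the discussion is that the C-defined np.string_ reports the same `__radd__` in its class dict, yet CPython's `+` never reaches it.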
jamesaxl has joined #pypy
<mattip>
so I'm wondering what to do on PyPy, since we currently call generic.__radd__ which fails
<mattip>
choices I have thought of so far -
<mattip>
- get numpy to define a __radd__ on np.string_ to return NotImplemented
<mattip>
- get pandas to call 'abc' + str(np.string_('abc')) in about 5 places,
<mattip>
- file an issue with CPython (HAH)
rokujyouhitoma has quit [Ping timeout: 248 seconds]
<mattip>
- on PyPy, in PyType_Ready, if isinstance(w_type, basestring) and w_type.lookup('__radd__'), add a dummy implementation
<mattip>
that returns NotImplemented
<mattip>
what do you think?
cstratak has quit [Quit: Leaving]
<ronan>
hmm, isn't this an sq_concat vs nb_add issue?
<mattip>
(you can prove that generic.__radd__ is there but ignored by trying np.string_('abc').__radd__('def'))
<mattip>
maybe, in python interpretation of + PyPy uses the __radd__ slot on the w_obj, the pyobj does not come into play
kolko_ has quit [Ping timeout: 240 seconds]
<mattip>
I wrote a failing test and backed it out yesterday
<ronan>
hrm, I'm failing to understand what CPython does exactly
lritter has joined #pypy
tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
jamesaxl has quit [Read error: Connection reset by peer]
jamesaxl has joined #pypy
<mattip>
since cpython resolves all the capi slots at PyType_Ready, it seems to "lie", the w_dict has a __radd__ but it is not used
<ronan>
yes, the C code uses the slots directly
<mattip>
in binary_op1
<ronan>
hmm, does np.string_ have Py_TPFLAGS_CHECKTYPES?
<mattip>
yes, unfortunately, so does str
<ronan>
then I don't understand why its nb_add isn't called
marr has joined #pypy
<mattip>
exactly, nb_add is NULL on CPython
<ronan>
er, np.string_->...->nb_add is NULL and yet it has a __radd__??
<mattip>
yup, since the class dict is filled differently than the slots
<mattip>
you can see it if you stop in gdb in the test I added/removed yesterday
<mattip>
on pytest -A
<mattip>
(sorry you have to dive so deep, I guess I didn't explain it well)
<ronan>
is there a way to remove the __radd__?
tbodt has joined #pypy
<mattip>
we could add a lookup() to W_PyCTypeObject, that would override the one in W_TypeObject (objspace/std/typeobject.py) line 374
<mattip>
that would fail for __radd__ and isinstance(self, basestring)
<ronan>
wait, I still don't understand why np.string_ has a null nb_add
<mattip>
all basestring subclasses have all nb* set to NULL, see e45fdeb7813a which passes on CPython and (now) PyPy
<ionelmc>
why do i get this `/usr/lib/pypy/lib-python/2.7/contextlib.py:154: RefCountingWarning: 'NoneType' object has no _reuse/_drop methods` ?
<ionelmc>
it's just contextlib.closing context manager apparently ...
<ionelmc>
(pypy 5.8.0)
<mattip>
ionelmc: maybe by the time the finalizer/destructor is called, the object is already dead and gone, so it has become "None"
<ionelmc>
so basically someone else closes it
<mattip>
ronan: rather than slow down lookup(), I think it makes more sense to detect when a subclass of basestring has another base class with __radd__,
<mattip>
and in that case add a function that returns NotImplemented to the subclass
<mattip>
somewhere near the end of PyType_Ready
<ronan>
mattip: it's not just about basestring though
rokujyouhitoma has joined #pypy
<ronan>
AFAICT, the slots are copied only if the type has a tp_as_number in the first place
irclogs_io_bot has quit [Ping timeout: 240 seconds]
<mattip>
yes, there is a comment to that effect in inherit_slots
<ionelmc>
mattip: aaah ... turns out it's my fault
irclogs_io_bot has joined #pypy
<mattip>
ionelmc: glad you found it
rokujyouhitoma has quit [Ping timeout: 255 seconds]
<ronan>
ooh, with class mystr(str, np.generic): pass, 'x' + mystr('a') segfaults!
<mattip>
on cpython or pypy?
<ronan>
cpython
<mattip>
this whole mess does seem to stem from a cpython issue, but
<mattip>
what is the most productive way to fix it?
<mattip>
If we do somehow get a fix into cpython, so that slots more correctly reflect the class dict,
<mattip>
lots of downstream packages will break
<mattip>
in the case at hand, numpy will need to implement a __radd__ somewhere in the string_.mro()
<mattip>
or suddenly cpython basestring will grow a __radd__ , which will probably break someone else's code
nimaje1 has joined #pypy
nimaje is now known as Guest22539
Guest22539 has quit [Killed (tolkien.freenode.net (Nickname regained by services))]
nimaje1 is now known as nimaje
<ronan>
I doubt cpython can be fixed in any way without breaking existing code
<exarkun>
well, that hardly seems like a barrier, does it?
<ronan>
it does tend to be a barrier for changing 2.7
Rhy0lite has quit [Quit: Leaving]
<ronan>
I would say that numpy is abusing undefined behaviour in Python, but fundamentally the issue is dual inheritance from str and np.generic
<ronan>
and I doubt that can be changed either
<mattip>
it could theoretically be changed in numpy, if they are willing to accept it
marky1991_2 has joined #pypy
marky1991_2 has quit [Remote host closed the connection]
<mattip>
by adding an explicit __radd__ to np.string_ (or even np.character) in pure python AFAICT
marky1991_2 has joined #pypy
<mattip>
which would return NotImplemented
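A rough pure-Python sketch of that idea, assuming hypothetical stand-ins `Generic` and `StringLike` for np.generic and np.string_ (this is not numpy code, and the real change would have to live in numpy itself): the subclass overrides the inherited `__radd__` and declines the operation, so the interpreter falls back to the left operand's own string concatenation.

```python
class Generic(object):
    """Hypothetical stand-in for np.generic, whose __radd__ fails on strings."""

    def __radd__(self, other):
        raise TypeError("Generic.__radd__ cannot handle %r" % (other,))


class StringLike(str, Generic):
    """Hypothetical stand-in for np.string_ with the proposed override."""

    def __radd__(self, other):
        # Decline the reflected add; the interpreter then falls back to
        # the left-hand str's own concatenation.
        return NotImplemented


result = "abc" + StringLike("def")
print(repr(result))
```

Without the override, the same expression would reach `Generic.__radd__` and raise, which mirrors the failure described for PyPy above.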
<mattip>
but should we aim for a more generic solution in cpyext?
marky1991 has quit [Ping timeout: 240 seconds]
<ronan>
the generic solution would be to emulate the whole mess, but I don't think it's possible without modifying the interpreter
gclawes has quit [Ping timeout: 246 seconds]
<mattip>
I see two places we could modify pypy, one by adding a lookup() in W_PyCTypeObject, or in PyType_Ready by identifying a problematic
<mattip>
lookup and adding a dummy function to the w_obj.w_dict[slot]
<ronan>
but that's not enough, CPython has an additional fallback to sq_concat after the equivalent of the __add__/__radd__ dance
<mattip>
is there code that would reach that?
<ronan>
that's the only way sq_concat ever gets called for 'a + b'
<ronan>
(well, except for the str + str special case)
<mattip>
or str + str_subclass if str_subclass.__radd__ returns NotImplemented
<mattip>
s/or/or except for/
jamesaxl has quit [Read error: Connection reset by peer]
<mattip>
sorry, I was thinking PyPy, not CPython
jamesaxl has joined #pypy
<mattip>
does CPython ever look up the value in class.__dict__['__radd__'] or class.__dict__['__add__'] when evaluating + ?
<ronan>
I don't think so
<mattip>
so the __add__/__radd__ dance is actually only
<mattip>
if obj2 has nb_add: obj2...nb_add(obj2, obj1)
<mattip>
no, sorry
<mattip>
if obj2 is subclass of obj1 and has nb_add: obj2...nb_add(obj1, obj2)
<mattip>
elif obj1 has nb_add: obj1...nb_add(obj1, obj2)
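The slot-based dispatch sketched in these messages, together with the sq_concat fallback ronan mentions, can be modelled roughly as follows. This is a simplified illustration, not CPython source: `CType`, `CObj`, `binary_add` and `NOT_IMPLEMENTED` are hypothetical names, Python 2's coercion steps are left out, and the model only looks at the two slots under discussion.

```python
NOT_IMPLEMENTED = object()  # stand-in for the C-level NotImplemented result


class CType(object):
    """Hypothetical model of a C type holding only nb_add and sq_concat."""

    def __init__(self, name, base=None, nb_add=None, sq_concat=None):
        self.name, self.base = name, base
        self.nb_add, self.sq_concat = nb_add, sq_concat

    def is_subtype_of(self, other):
        t = self
        while t is not None:
            if t is other:
                return True
            t = t.base
        return False


class CObj(object):
    """Hypothetical model of an instance: a type plus a payload."""

    def __init__(self, ctype, value):
        self.ctype, self.value = ctype, value


def binary_add(v, w):
    """Model of 'v + w' driven by slots only; class dicts are never read."""
    slotv = v.ctype.nb_add
    slotw = w.ctype.nb_add if w.ctype is not v.ctype else None
    if slotw is slotv:
        slotw = None
    if slotv is not None:
        # A subclass on the right gets the first try.
        if slotw is not None and w.ctype.is_subtype_of(v.ctype):
            result = slotw(v, w)
            if result is not NOT_IMPLEMENTED:
                return result
            slotw = None
        result = slotv(v, w)
        if result is not NOT_IMPLEMENTED:
            return result
    if slotw is not None:
        result = slotw(v, w)
        if result is not NOT_IMPLEMENTED:
            return result
    # Last resort: sequence concatenation of the left operand.
    if v.ctype.sq_concat is not None:
        return v.ctype.sq_concat(v, w)
    raise TypeError("unsupported operand types for +")


def str_concat(v, w):
    return CObj(v.ctype, v.value + w.value)


# str has sq_concat but no nb_add; the subclass also ends up with no nb_add,
# matching the "nb_add is NULL" observation in the discussion.
str_type = CType("str", sq_concat=str_concat)
string_type = CType("string_", base=str_type, sq_concat=str_concat)
total = binary_add(CObj(str_type, "abc"), CObj(string_type, "abc"))
print(total.value)
```

In this model a `__radd__` sitting in the subclass's dict is simply invisible to `+`, which is the mismatch with PyPy's dict-based lookup that mattip goes on to describe.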
<mattip>
PyPy is very different, we use the __dict__, which for cextensions is filled at PyType_Ready from nb_add or sq_concat, whichever is checked first and exists
<mattip>
add_operators()
<mattip>
and allows user space to override the __dict__ after type instantiation
rokujyouhitoma has joined #pypy
<ronan>
I don't think we can modify the __dict__ for C extensions
rokujyouhitoma has quit [Ping timeout: 240 seconds]
<mattip>
seems you are correct. It could be done in C by adding a nb_add to PyStringArr_Type
<mattip>
?
<mattip>
in numpy
<mattip>
or were you looking at modifying the w_obj.w_dict after calling W_PyCTypeObject.__init__ in rpython?
<ronan>
that would make more sense
<ronan>
or we could stick a '__radd__' in tp_methods
<ronan>
actually, the bogus __radd__ is arguably a bug on CPython as well
* ronan
off to buy food
tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
tbodt has joined #pypy
tbodt has quit [Client Quit]
jamesaxl has quit [Quit: WeeChat 1.8]
<mattip>
ok, thanks. It seems we have settled on doing something in pypy, not numpy, correct?
gclawes has joined #pypy
rokujyouhitoma has joined #pypy
tbodt has joined #pypy
ronan has quit [Ping timeout: 255 seconds]
ronan has joined #pypy
rokujyouhitoma has quit [Ping timeout: 260 seconds]
oberstet has quit [Ping timeout: 248 seconds]
jacob22_ has quit [Ping timeout: 240 seconds]
<ronan>
mattip: no, I would rather fix it in numpy, and not change anything much in pypy
<mattip>
ok, so I will file an issue there, let's see how it goes.
<mattip>
tp_methods sounds good, although I do not think it will fix the class mystr(str, np.generic): pass, 'x' + mystr('a') segfault
<ronan>
this one works on pypy
<ionelmc>
what's the recommended way to test if vm is pypy?
<simpson>
I think `import __pypy__` is the most reliable way, but I'm not sure.
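Two common ways to make that check, as a small sketch (the helper names `is_pypy_by_import` and `is_pypy_by_platform` are made up for illustration): one tries the `__pypy__` import simpson suggests, the other asks the standard library's `platform` module.

```python
import platform


def is_pypy_by_import():
    """Return True if the PyPy-only __pypy__ module can be imported."""
    try:
        import __pypy__  # noqa: F401  (only importable on PyPy)
    except ImportError:
        return False
    return True


def is_pypy_by_platform():
    """Return True if the interpreter reports itself as PyPy."""
    return platform.python_implementation() == "PyPy"


print(is_pypy_by_import(), is_pypy_by_platform())
```

Both checks are expected to agree on a stock interpreter; the `platform` variant avoids a try/except at the cost of importing an extra module.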