cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
<simpson>
I scrolled through the blogpost backlog but didn't see a post specifically about it. What did you want to know? FWIW I gather that the main effort really was in changing the storage to UTF-8 (from UCS-4?)
<Daetalus>
sorry, @simpson . I would like to know how unicode is implemented under the hood: the principles and the related data structures, like how CPython uses PyASCIIObject and PyCompactUnicodeObject to implement its unicode object.
<Daetalus1>
cfbolz: I'm writing an article describing how CPython's unicode is implemented, and I'd also like to introduce PyPy's unicode implementation to the audience at the same time, if possible.
<arigato>
Daetalus1: I can describe it to you in a few lines, it's conceptually simpler than in CPython
<Daetalus1>
thanks in advance! I need to go now and will come back later.
<arigato>
a unicode object contains the array of bytes in UTF-8, the length (in the Python sense), and an "index storage" that is initially null and lazily computed
<arigato>
various operations have a fast path for "is ascii", which is "length in unicodeobject == length of the array of bytes"
<arigato>
the "index storage" is a smaller array that is computed in a single pass over the whole UTF-8 array, which lets us know in constant time the byte offset of the n'th character
<arigato>
an "index storage" is a list of integers that are the byte offsets of every 4th character, and the ones in between are recomputed quickly. Then the list of integers is compressed in chunks of 16 integers, based on the fact that the integers follow each other closely, by storing the first integer in a chunk as usual, but the next 15 as single bytes relative to that first one
<arigato>
i.e. the "index storage" is of type "array of struct { ssize_t baseindex; unsigned char ofs[16]; }"
<arigato>
if I'm computing it right, that's 24 bytes per chunk, and a chunk covers 64 characters in the utf8 string
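
A minimal Python sketch of the scheme described above, for illustration only. It assumes that offsets are sampled at characters 0, 4, 8, ... of each 64-character chunk; the actual code in rpython/rlib/rutf8.py may sample different positions and is laid out as a C-like struct rather than Python objects. The names `Chunk`, `build_index` and `byte_offset` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

CHARS_PER_CHUNK = 64  # characters covered by one chunk
STRIDE = 4            # one byte offset is recorded every 4th character


@dataclass
class Chunk:
    baseindex: int    # byte offset of the chunk's first character (ssize_t)
    ofs: List[int]    # up to 16 offsets relative to baseindex, each < 256


def build_index(utf8: bytes) -> List[Chunk]:
    """Build the index in a single pass over the UTF-8 bytes."""
    # Character starts are all bytes that are not continuation bytes (10xxxxxx).
    # Assumption: a real implementation would not materialise this whole list.
    starts = [i for i, b in enumerate(utf8) if (b & 0xC0) != 0x80]
    chunks = []
    for first in range(0, len(starts), CHARS_PER_CHUNK):
        base = starts[first]
        sampled = starts[first:first + CHARS_PER_CHUNK:STRIDE]
        # At most 60 characters * 4 bytes = 240, so each offset fits in a byte.
        chunks.append(Chunk(base, [s - base for s in sampled]))
    return chunks


def byte_offset(utf8: bytes, length: int, index: List[Chunk], n: int) -> int:
    """Byte offset of the n'th character, given the length in characters."""
    if not 0 <= n < length:
        raise IndexError(n)
    if length == len(utf8):
        return n  # "is ascii" fast path: one byte per character
    chunk = index[n // CHARS_PER_CHUNK]
    pos = chunk.baseindex + chunk.ofs[(n % CHARS_PER_CHUNK) // STRIDE]
    # Walk forward over at most 3 characters to reach the requested one.
    for _ in range(n % STRIDE):
        pos += 1
        while (utf8[pos] & 0xC0) == 0x80:
            pos += 1
    return pos
```

With an 8-byte `baseindex` and 16 one-byte offsets, a chunk in this sketch corresponds to the 24 bytes per 64 characters mentioned above, and a lookup touches one chunk plus at most three characters of the string.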
<ctismer>
LarstiQ: After hours, the build was working. Shiboken seems ok, QtCore gives a crash. Need to load the sources and debug PyPy. There are a number of omissions in PyPy and one incompatibility. Will give a detailed report when I open an issue.
<LarstiQ>
ctismer: cool, sounds like it's much closer to potentially working than some years ago
<ctismer>
LarstiQ: yes, I did not expect that either. Funny that PyPy does not define Py_LIMITED_API. That would have been easier to use. A real incompatibility.
<cfbolz>
ctismer: yeah, it's on our wishlist, will get there eventually
<LarstiQ>
don't suppose hpyproject.org is mature enough for PySide to use?
<cfbolz>
unlikely
<ctismer>
LarstiQ: Sure, a nice abstraction. But not needed, the API is quite ok.
<LarstiQ>
ctismer: was thinking from the Py_LIMITED_API angle, HPy takes that further
<LarstiQ>
well, maybe depends on which bit you care about
<LarstiQ>
anyway, looking forward to the detailed report :)
<ctismer>
cfbolz: I was even wondering: With the L-API in place, would PyPy 3.7 be exchangeable with Python 3.6-3.9?
<cfbolz>
heh, good question
<cfbolz>
we should ask mattip when he's back from vacation
<ctismer>
yep
<ctismer>
cfbolz: One thing was weird: __builtins__ compatibility. Then I read that it is intentional. Why that?
<cfbolz>
ctismer: is that still a thing in cpy3?
<ctismer>
it is a known quirk that __builtins__ is a module __dict__ in Python.
<ctismer>
cfbolz: Yes :) I thought that thing would be easy to mimic, just wondered.
<cfbolz>
I don't remember the rules for cpython
<cfbolz>
ctismer: when is it a dict?
<arigato>
I think that __builtins__ is sometimes one and sometimes the other in CPython, which is a mess
<LarstiQ>
"There is no point if it's not also done with an in-depth review of all other related issues---for example, the RESTRICTED flag that CPython puts on some attributes of built-in objects."
<ctismer>
LarstiQ: Yes, I was hit by something similar because PySide wants to redirect __import__ at startup. Took a while to understand the crash.
<ctismer>
arigato: Moin! Yes, true (and bad). But PyPy never corrected quirks, to my memory?
<arigato>
I think in this case the real problem was that it's very hard to copy exactly in which case CPython does what
<arigato>
if it was easy we'd have indeed copied it
todda7 has quit [Ping timeout: 240 seconds]
<ctismer>
arigato: Ok, so this gets an official "won't fix" workaround in PySide.
<ctismer>
Also funny and quite unexpected was the location of the PyPy library in `bin/pypy3-c.dylib` instead of `lib/python3xxxx.dylib` for Python on macOS.
<ctismer>
mattip_: on the macros issue concerning L-API: yes, as you say, sure, totally bad, and we would need to do an extra build as long as those are macros.
<ctismer>
mattip_: yes, but why is Py_Initialize not in PyPy?
<ctismer>
mattip_: Really missing: `PyDescr_NewGetSet` and `PyDescr_NewMethod` which I need.
jcea has joined #pypy
<mattip_>
ctismer: hmm. Do you know of a short test that I could use to make sure the implementation is correct?
<mattip_>
I don't see those functions used in cython or pybind11
<mattip_>
so it is a bit surprising that shiboken6 needs them
<ctismer>
mattip_: I have an extension built into PySide that enables `__signature__` on PySide functions. None of the other packages has that feature, yes. It is hard for me to disable that.
<ctismer>
but ATM I see some allocation problems. Need a debug build, soon.
<mattip_>
can you disable the __signature__ extension for "#if defined(PYPY_VERSION) && PYPY_VERSION_NUM < 0x07030600"
<mattip_>
it seems you only call them in signature_extend.cpp::PySide_PatchTypes under certain conditions anyway
<mattip_>
or will disabling them break other things later on?
gsnedders has quit [Quit: leaving]
<ctismer>
mattip_: Unfortunately, the signature module is quite integrated and used in a few places, in error handler and docs.
gsnedders has joined #pypy
<ctismer>
Disabling that as an option needs a different check-in, but I can try that.
<mattip_>
hmm. In any case, adding the missing functions will only be available on the next release of PyPy or in nightlies
<ctismer>
mattip_: Maybe I can build the missing ones myself. Right now I must see what crashes.
<ctismer>
mattip_: So I will create an option to disable the signature module, to make sure that _that_ does not cause the crash.
<mattip_>
cool
<ctismer>
mattip_: Well, as you might have seen already: I made some fixes where I add missing functions in `pep386imp.cpp`, which probably become wrong if I use them with PyPy.
<ctismer>
mattip_: `_PyLong_AsInt` is missing from the PyPy interface. For CPython and the PEP, I used my own implementation that I took from CPython. This is most probably a very wrong assumption for PyPy, right? :)
<cfbolz>
ctismer: if you implement it in terms of PyLong_AsLong you're probably good?
<ctismer>
cfbolz: yes. These are left-overs from when I hacked on the limited API, would not do this again :) I am just trying to figure out what causes the crashes before I dive into PyPy.
<ctismer>
cfbolz: Checked it, it was implemented correctly as you said. The problems are elsewhere.
<mattip_>
in NumPy _PyLong_AsInt is implemented via PyLong_AsLongAndOverflow
<arigato>
this commit only adds the proper exception in case you misuse greenlets, to bring it in line with cpython's greenlet, AFAICT
<arigato>
"pip install gevent" fails for me on pypy2.7-v7.3.1
<arigato>
RequirementParseError: Expected ',' or end-of-list in cffi >= 1.12.2 ; platform_python_implementation == 'CPython' and sys_platform == 'win32' at ; platform_python_implementation == 'CPython' and sys_platform == 'win32'
<arigato>
running "pip install --upgrade pip" ends up with a broken pip that uses some Python 3.6 syntax
<arigato>
you may tell me "stop using pypy2.7", to which I'll reply "I don't have a convenient pypy3.x around on this machine"
<arigato>
found pypy3 somewhere else, now debugging
<arigato>
there's a "try: self.switch() except:" which, as you might expect, will swallow any exception
<arigato>
aaaaaaaa
<arigato>
except: pass
<arigato>
do I really need to debug gevent that contains these lines??
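
A small Python illustration of why the pattern quoted above makes debugging hard. `Switcher` is a hypothetical stand-in, not gevent's or greenlet's API; the point is only that a bare `except: pass` hides the failure entirely, while a narrower handler can record it and re-raise.

```python
class Switcher:
    """Hypothetical object whose switch() always fails."""

    def switch(self):
        raise RuntimeError("switch failed")


def swallow(obj):
    try:
        obj.switch()
    except:  # bare except: also catches KeyboardInterrupt and SystemExit
        pass
    return "continued as if nothing happened"


def record_and_reraise(obj, log):
    try:
        obj.switch()
    except Exception as exc:  # narrower: leaves KeyboardInterrupt alone
        log.append(repr(exc))  # keep a trace of what went wrong
        raise                  # and let the caller see the error
```

Calling `swallow(Switcher())` returns normally with no trace of the `RuntimeError`, whereas `record_and_reraise` both logs the error and propagates it.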
<cfbolz>
:-/
<arigato>
ah
<mattip_>
arigato: about "pip install --upgrade pip", can you do "pypy -m pip install --upgrade pip==20.3.4"
<mattip_>
which should update to the last pip to support python2.7
<arigato>
OK, thanks
<ctismer>
arigato: What is the recommended way to debug PyPy when you import an extension that segfaults? I guess I should use my own debug build?
<arigato>
I guess the first step is to use the debugger with a debug build on your extension, without needing a debug build of pypy
<ctismer>
It is a debug build. But the error message is from PyPy: `python3(29480,0x11285adc0) malloc: Incorrect checksum for freed object 0x7fe091577be8: probably modified after being freed.`
<arigato>
still, you'd likely only find yourself in obscure generic code
<cfbolz>
That's probably not even PyPy, but the malloc implementation?
<arigato>
and it's not telling a lot more than "oops somebody modified freed memory at some point in the past"
<mattip_>
ctismer: is there a Python C-API function somewhere in the stack trace?
<mattip_>
you may have to use valgrind to find out who is still accessing an object after calling Py_DECREF on it
<ctismer>
That is enough to justify to spend more time on this :)
Daetalus1 has quit [Ping timeout: 260 seconds]
<cfbolz>
ctismer: exciting
Taggnostr has quit [Remote host closed the connection]
<mattip_>
whohoo!
<mattip_>
of course the C-API will be quite slow
Taggnostr has joined #pypy
<ctismer>
mattip: I hacked the QTread function away, and a bit worked. Will find the bugs, systematically. Of course there might be wrong macro calls, because I now don't use the L-API. Maybe I should introduce a special mode that uses all the replacements, anyway.
<Daetalus>
arigato: Thank you! I suppose the implementation code is the `class W_UnicodeObject` in pypy/objspace/std/unicodeobject.py and `create_utf8_index_storage` in rpython/rlib/rutf8.py.
<Daetalus>
and regarding the index storage, if I understand it correctly, it records an offset every 4 characters but starts from the 2nd character, e.g. for `Python`, it computes the offsets of `y` and `n`, which are 1 and 5, respectively. Am I right?