<kenaan>
cfbolz py3.6 f2a689373046 /pypy/objspace/std/unicodeobject.py: make performance of lower/upper/title/swapcase not terrible for strings containing Σ
<cfbolz>
finally managed to make test_unicodedata.py pass
<mattip>
yay
<Hodgestar>
@cfbolz & anyone else thinking about the Cython-IR project (or similar things): Some miscellaneous thoughts / questions --
<Hodgestar>
So I'm trying to clarify the goal for myself, which led to some questions.
<Hodgestar>
So the first question is why do people really use Cython over, say, just C extensions or CFFI?
<fijal>
because they are numeric people, that's one thing
<fijal>
two, it's hard to do numpy in cffi
<Hodgestar>
Conceptually it's a complex interface from a Python-esque language to building C extensions with a bunch of calls back into the Python C API.
<fijal>
three, performance on cpython matters
<Hodgestar>
Which is a bit messy.
<Hodgestar>
Buy-in from the community is certainly one issue.
<Hodgestar>
And one does need C extensions of some sort to make CPython fast in cases it isn't (at least there aren't currently other options).
<Hodgestar>
I would agree that Cython is arguably less crazy than heavily calling the C API directly (at least one doesn't have to learn a billion new functions).
<Hodgestar>
I am also wondering a bit about the interface though. The learning curve for CFFI can be a bit steep (relatively) compared to being able to cut and paste a small Cython example into a Jupyter notebook.
<kenaan>
cfbolz py3.6 2eeaa559be67 /pypy/interpreter/unicodehelper.py: more places that give the name of the encoding as 'utf8'
<fijal>
cffi requires you to know C
<cfbolz>
Hodgestar: cython is much much easier than getting refcounting right in C
<Hodgestar>
cfbolz: Agreed.
<fijal>
cfbolz: right, but why over cffi
<fijal>
I don't think anyone here ponders why writing C API directly is a bad idea
<Hodgestar>
So I think the other important part is the calls back into Python land.
<cfbolz>
in cython you can create new data structures that need complicated fast algorithms, which is not really possible in cffi
<Hodgestar>
For this to ever be fast in PyPy I imagine the IR would have to explain what those calls are supposed to do in Python itself (rather than what should happen at the CPython C API level).
<Hodgestar>
cfbolz: I disagree a bit about that if the data structures are implemented purely in C?
<cfbolz>
yes, but then they need to reference back to python objects, which is a mess
<cfbolz>
not that much fun in cffi
<Hodgestar>
Agreed. And not going to be fast on PyPy or other VMs anyway.
<fijal>
it can be fast on pypy, I think
<fijal>
but not on cpython
<fijal>
(anyway, it's irrelevant whether it can or cannot be fast on pypy)
<Hodgestar>
What this sort of implies for the IR is some sort of low-level language interleaved with Python code?
<fijal>
yeah
<Hodgestar>
With the idea that maybe something like a JIT could understand both and JIT across the boundary (so like a much more crazy version of what happens with regexes).
<fijal>
I can imagine a mix of bytecodes that operate on python objects, C-level objects and wrapping/unwrapping
<fijal>
so say
<fijal>
o_1 = newint(13)
<Hodgestar>
And a CPython version would need to compile both into a C extension with no real Python? Maybe it could do some sort of eval on things it didn't understand.
<fijal>
o_2 = py_add(o_1, o_1)
<fijal>
i_1 = unwrap_int(o_2)
<fijal>
int_add(i_1, 13)
<fijal>
I think the CPython version would compile with cython or something similar
<fijal>
so really a mix of a+b and PyNumber_Add(o1, o2)
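The four-operation sketch above can be made concrete with a toy interpreter. This is only a minimal illustration under stated assumptions: the operation names come from the chat, but the `W_Int` box class, the `OPS` table, the tuple encoding of instructions and the `run` function are all hypothetical and are not PyPy's or Cython's actual IR.

```python
# Toy interpreter for a mixed IR: "o_" values are boxed Python-level objects,
# "i_" values are raw machine-level integers. All names are illustrative.

class W_Int:
    """A boxed, Python-level integer (hypothetical stand-in for a PyObject)."""
    def __init__(self, value):
        self.value = value

def newint(raw):
    # Wrap: raw integer -> boxed object.
    return W_Int(raw)

def py_add(w_a, w_b):
    # Python-level addition: takes boxed objects, returns a boxed object.
    return W_Int(w_a.value + w_b.value)

def unwrap_int(w_obj):
    # Unwrap: boxed object -> raw integer.
    return w_obj.value

def int_add(a, b):
    # C-level addition on raw integers, no boxing involved.
    return a + b

OPS = {
    "newint": newint,
    "py_add": py_add,
    "unwrap_int": unwrap_int,
    "int_add": int_add,
}

def run(program):
    """Execute (target, opname, args) triples; string args name earlier results."""
    env = {}
    result = None
    for target, opname, args in program:
        values = [env[a] if isinstance(a, str) else a for a in args]
        result = OPS[opname](*values)
        if target is not None:
            env[target] = result
    return result, env

# The sequence from the chat, encoded as data.
PROGRAM = [
    ("o_1", "newint", [13]),
    ("o_2", "py_add", ["o_1", "o_1"]),
    ("i_1", "unwrap_int", ["o_2"]),
    (None, "int_add", ["i_1", 13]),
]

result, env = run(PROGRAM)
```

Because the program is plain data, a JIT-style consumer could in principle see that `py_add` on two freshly wrapped integers never needs the boxes at all, while a CPython backend could lower the same operations to C API calls.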
<Hodgestar>
Would it make any sense to put the Python part of the IR at more of the Python bytecode level? (or was that already what you were thinking)
<Hodgestar>
cfbolz: I think I missed before that when you talk about complex data structures, you're thinking specifically of things like btrees or hashes that *link back to Python objects*.
<cfbolz>
yes
<Hodgestar>
So there one has the additional problem that PyPy would like to be able to know there is a reference to those & be able to move them around ideally.
<cfbolz>
yep
<Hodgestar>
Which maybe means the IR needs to contain the PyHandle idea in some form?
<Hodgestar>
Or I guess the bit that uses the IR could do that if the IR is sufficiently clear on what is happening.
<cfbolz>
Hodgestar: as long as PyPy really executes the IR itself, it's all fine
<Hodgestar>
Aside: Hmm. I'm wondering if one can handle the "complex data structures over Python objects" case nicely in CFFI by passing handles / IDs to the C side and then looking them up on the way back.
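The handle idea can be sketched in pure Python. This is a hedged illustration, not a CFFI program: `HandleTable`, `release` and `c_side_sort_by_key` are hypothetical names, and the "C side" is simulated by a function that only ever sees integers. CFFI itself offers `ffi.new_handle()` and `ffi.from_handle()` for a similar purpose, but they are not used here so the example runs with the standard library alone.

```python
import itertools

class HandleTable:
    """Maps opaque integer handles to Python objects (illustrative only)."""

    def __init__(self):
        self._objects = {}
        self._next_id = itertools.count(1)  # 0 is left free as a "null" handle

    def new_handle(self, obj):
        # Register the object and return an integer the C side can store.
        handle = next(self._next_id)
        self._objects[handle] = obj
        return handle

    def from_handle(self, handle):
        # Look the object up again when the handle comes back from C.
        return self._objects[handle]

    def release(self, handle):
        # Drop the table's reference so the object can be collected.
        del self._objects[handle]

def c_side_sort_by_key(pairs):
    """Stand-in for C code: it sees only (int key, int handle) pairs."""
    return [handle for key, handle in sorted(pairs)]

table = HandleTable()
records = [
    {"name": "pear", "weight": 3},
    {"name": "apple", "weight": 1},
    {"name": "fig", "weight": 2},
]

# Only integers cross the boundary; the dicts stay on the Python side.
pairs = [(r["weight"], table.new_handle(r)) for r in records]
ordered = [table.from_handle(h) for h in c_side_sort_by_key(pairs)]
names = [r["name"] for r in ordered]
```

Since the low-level side never holds a real object pointer, the VM stays free to move or collect objects; the cost is one table lookup per boundary crossing and explicit `release` calls.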
<Hodgestar>
cfbolz: Yep, as long as the IR is really at the Python level and not the CPython C API level.
<cfbolz>
it has to be, anyway
* Hodgestar
nods.
<Hodgestar>
cfbolz: How are you feeling about the WebASM as IR suggestion?
<cfbolz>
-1
<cfbolz>
that adds yet another layer of external constraints
<mattip>
one of the things cython has going for it is "ctypedef"
<mattip>
to convert python object attributes to a struct field
<mattip>
users don't even know that array.shape becomes pure C code in cython, no python C-api involved
<nanonyme>
Regarding the hypothetical CFFI backend for Cython mentioned earlier, I guess it would then generate the kind of code that the CFFI API docs describe as "purely for performance"? :)
<Hodgestar>
nanonyme: Yes, one could add chunks of C in set_source. Not sure how much of the problem it would solve. Certainly dumping an entire C extension in there would not be that useful. One could imagine putting in code that didn't call the C API much, but that brings us back to the problem of crossing forwards and backwards between Python and C.
<Hodgestar>
nanonyme: I do wonder whether people would find it useful if there were, e.g., an easy way to create small functions like that with CFFI.
<Hodgestar>
cfbolz: Back to WebASM -- it's a big hammer, but it does at least appear to attempt to solve some similar problems and to have succeeded.
<cfbolz>
no, it doesn't solve the hard parts at all
<cfbolz>
the integration with javascript is very rudimentary
<Hodgestar>
Ah. So WebASM -> Javascript is also slow? :/
<cfbolz>
yes
<cfbolz>
anyway, we need a good bridge <lowlevel code> <-> python
<cfbolz>
not <-> js
<Hodgestar>
They did get a lot faster at the end of last year.
<cfbolz>
Hodgestar
<cfbolz>
are you sure about the js integration getting faster?
<Hodgestar>
(I'm not actually pushing the WebASM option here -- it looks like yet another big and complicated piece to add to the picture -- but I'm exploring for the moment).
<cfbolz>
Hodgestar: I talked to the V8 people in july, so maybe my knowledge is out of date, and v8 specific
<cfbolz>
Hodgestar: but eg there is no GC integration between WebASM memory (which is all a big array) and js objects
<Hodgestar>
I think WebASM can ask for more memory, but yes, the model does seem to be "give me another page of memory".
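The "one big array that grows a page at a time" model can be imitated in a few lines. This is a rough Python sketch, not a WebAssembly runtime: the `LinearMemory` class and its method names are hypothetical, chosen to echo the memory model described above. The 64 KiB page size and `grow` returning the previous size in pages match WebAssembly's own conventions.

```python
import struct

PAGE_SIZE = 64 * 1024  # WebAssembly memory is sized in 64 KiB pages

class LinearMemory:
    """A flat, byte-addressed memory that can only grow by whole pages."""

    def __init__(self, initial_pages=1):
        self.data = bytearray(initial_pages * PAGE_SIZE)

    @property
    def pages(self):
        return len(self.data) // PAGE_SIZE

    def grow(self, delta_pages):
        # Append zeroed pages and return the previous size in pages.
        previous = self.pages
        self.data.extend(bytes(delta_pages * PAGE_SIZE))
        return previous

    def store_i32(self, addr, value):
        # Only raw bytes fit in here: a little-endian 32-bit signed integer.
        struct.pack_into("<i", self.data, addr, value)

    def load_i32(self, addr):
        return struct.unpack_from("<i", self.data, addr)[0]

mem = LinearMemory(initial_pages=1)
old_pages = mem.grow(1)                  # ask for one more page
mem.store_i32(PAGE_SIZE + 4, 1234)       # write into the newly added page
value = mem.load_i32(PAGE_SIZE + 4)
```

Nothing in such a memory can hold a reference to a host object, only numbers, which is why there is no GC integration to speak of: any link to a host object has to go through an index or handle managed outside the array.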
<Hodgestar>
The simplified shared memory does actually look attractive, although likely that's a ship that's completely sailed for Python?
<Hodgestar>
(maybe this was similar to mattip's "make everything a buffer" suggestion?)
<Hodgestar>
I am off to sleep for now (still recovering from a nasty bug that flattened me yesterday and most of today).