cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
jcea has joined #pypy
dansan has quit [Remote host closed the connection]
todda7 has quit [Ping timeout: 246 seconds]
tsaka__ has joined #pypy
tsaka__ has quit [Remote host closed the connection]
rubdos has quit [Ping timeout: 244 seconds]
rubdos has joined #pypy
dstufft has quit [*.net *.split]
avakdh has quit [*.net *.split]
krono has quit [*.net *.split]
oberstet has quit [*.net *.split]
mgedmin has quit [*.net *.split]
[Arfrever] has quit [*.net *.split]
ebarrett has quit [*.net *.split]
_whitelogger has joined #pypy
cfbolz has joined #pypy
dnshane has joined #pypy
ronan has joined #pypy
EWDurbin has joined #pypy
Guest68750 has joined #pypy
jaraco has joined #pypy
jeroud has joined #pypy
phlebas has joined #pypy
string has joined #pypy
mwhudson has joined #pypy
simpson has joined #pypy
JStoker has joined #pypy
arigo has joined #pypy
tbodt has joined #pypy
antocuni has joined #pypy
Civil has joined #pypy
gsnedders has joined #pypy
dmalcolm_ has joined #pypy
epsilonKNOT has joined #pypy
energizer has joined #pypy
trfl has joined #pypy
marvin_ has joined #pypy
gsnedders has quit [*.net *.split]
dmalcolm_ has quit [*.net *.split]
energizer has quit [*.net *.split]
marvin_ has quit [*.net *.split]
epsilonKNOT has quit [*.net *.split]
trfl has quit [*.net *.split]
ronan has quit [*.net *.split]
dnshane has quit [*.net *.split]
jaraco has quit [*.net *.split]
EWDurbin has quit [*.net *.split]
cfbolz has quit [*.net *.split]
Guest68750 has quit [*.net *.split]
string has quit [*.net *.split]
jeroud has quit [*.net *.split]
phlebas has quit [*.net *.split]
mwhudson has quit [*.net *.split]
simpson has quit [*.net *.split]
JStoker has quit [*.net *.split]
Civil has quit [*.net *.split]
antocuni has quit [*.net *.split]
arigo has quit [*.net *.split]
tbodt has quit [*.net *.split]
eregon has quit [*.net *.split]
Alex_Gaynor has quit [*.net *.split]
whitewolf has quit [*.net *.split]
lastmikoi has quit [*.net *.split]
danilonc has quit [*.net *.split]
pjenvey has quit [*.net *.split]
ulope has quit [*.net *.split]
runciter has quit [*.net *.split]
bogner has quit [*.net *.split]
_habnabit has quit [*.net *.split]
jerith has quit [*.net *.split]
glyph has quit [*.net *.split]
tazle has quit [*.net *.split]
tumbleweed has quit [*.net *.split]
raekye has quit [*.net *.split]
bbot2 has quit [*.net *.split]
marmoute has quit [*.net *.split]
the_rat has quit [*.net *.split]
Lightsword has quit [*.net *.split]
shodan45 has quit [*.net *.split]
Hodgestar has quit [*.net *.split]
oberstet has quit [*.net *.split]
mgedmin has quit [*.net *.split]
[Arfrever] has quit [*.net *.split]
ebarrett has quit [*.net *.split]
pmp-p has quit [*.net *.split]
epony has quit [*.net *.split]
Ninpo has quit [*.net *.split]
commandoline has quit [*.net *.split]
nopf_ has quit [*.net *.split]
jiffe has quit [*.net *.split]
atomizer has quit [*.net *.split]
Alex_Gaynor has joined #pypy
whitewolf has joined #pypy
eregon has joined #pypy
pulkit25 has quit [Ping timeout: 244 seconds]
oberstet has joined #pypy
mgedmin has joined #pypy
[Arfrever] has joined #pypy
ebarrett has joined #pypy
altendky has quit [Ping timeout: 260 seconds]
toad_polo has joined #pypy
idnar has quit [Ping timeout: 260 seconds]
ulope has joined #pypy
danilonc has joined #pypy
pjenvey has joined #pypy
lastmikoi has joined #pypy
jerith has joined #pypy
bogner has joined #pypy
runciter has joined #pypy
_habnabit has joined #pypy
energizer has joined #pypy
trfl has joined #pypy
dmalcolm_ has joined #pypy
marvin_ has joined #pypy
gsnedders has joined #pypy
epsilonKNOT has joined #pypy
tazle has joined #pypy
bbot2 has joined #pypy
raekye has joined #pypy
glyph has joined #pypy
Hodgestar has joined #pypy
the_rat has joined #pypy
marmoute has joined #pypy
Lightsword has joined #pypy
shodan45 has joined #pypy
tumbleweed has joined #pypy
nopf has joined #pypy
dnshane has joined #pypy
cfbolz has joined #pypy
ronan has joined #pypy
JStoker has joined #pypy
antocuni has joined #pypy
phlebas has joined #pypy
Civil has joined #pypy
jeroud has joined #pypy
simpson has joined #pypy
arigo has joined #pypy
mwhudson has joined #pypy
jaraco has joined #pypy
tbodt has joined #pypy
idnar has joined #pypy
idnar has quit [Changing host]
idnar has joined #pypy
idnar has joined #pypy
idnar has quit [Changing host]
graingert has quit [Ping timeout: 256 seconds]
idnar has quit [Changing host]
idnar has joined #pypy
Ninpo has joined #pypy
pmp-p has joined #pypy
commandoline has joined #pypy
epony has joined #pypy
jiffe has joined #pypy
atomizer has joined #pypy
EWDurbin has joined #pypy
pulkit25 has joined #pypy
Guest68750 has joined #pypy
altendky has joined #pypy
string has joined #pypy
graingert has joined #pypy
infernix has joined #pypy
the_drow[m] has joined #pypy
astrojl_matrix has joined #pypy
andi- has quit [Remote host closed the connection]
andi- has joined #pypy
lritter_ has joined #pypy
lritter has quit [Ping timeout: 244 seconds]
jcea has quit [Quit: jcea]
forgottenone has joined #pypy
<fijal> arigo: that's a bit bizarre
lritter_ has quit [Ping timeout: 240 seconds]
_whitelogger has joined #pypy
dddddd has joined #pypy
forgottenone has quit [Read error: Connection reset by peer]
forgottenone has joined #pypy
agronholm has joined #pypy
<agronholm> hello, could somebody shed a little light on this? why does this script produce different output on pypy than on cpython? https://bpa.st/FVWA
<agronholm> the buffer still contains two bytes on pypy at exit, nothing on cpython
BPL has joined #pypy
forgottenone has quit [Quit: Konversation terminated!]
<mattip> agronholm: what version of PyPy, what platform?
<agronholm> mattip: pypy3 7.3.1, Linux (Fedora 32)
<agronholm> I simplified the test: https://bpa.st/M7HA
<agronholm> for some reason it stops on the first encoding error, unlike its cpython counterpart
<agronholm> *decoding error
<mattip> can you try with latest HEAD?
<agronholm> I don't suppose there's a precompiled version lying around anywhere?
<agronholm> if not, I'll get to building it
forgottenone has joined #pypy
<agronholm> mattip: that gives me the same result
<agronholm> I am having trouble finding the implementation in the sources
<agronholm> maybe I could track down the problem then
<mattip> add a test to pypy/module/_codecs/test, run it with python2 pytest.py pypy/module/_codecs ...
<mattip> and look in pypy/interpreter/unicodehelper
<mattip> you don’t need the decode part if the encode is different
<agronholm> thanks
<agronholm> I'm not sure where to look, and the utf_8_decode() function is a builtin so I can't step into it with a debugger either
<agronholm> are you sure it invokes functions from the unicodehelper module?
tos9_ is now known as tos9
<mattip> agronholm: this is 32-bit fedora?
<mattip> I get identical results on 64-bit linux.
<mattip> what I was trying to explain before is that the way to debug this is via a test with untranslated pypy
<mattip> the tests live in pypy/module/_codecs/test/test_codecs.py
<mattip> and are run via
<mattip> python2 pytest.py pypy/module/_codecs/test/test_codecs.py
<mattip> for instance you can run the test_decoder_state function via
<mattip> python2 pytest.py pypy/module/_codecs/test/test_codecs.py -k test_decoder_state
speeder39_ has joined #pypy
<agronholm> 64-bit Fedora
<agronholm> mattip: I get the same results also when I run the script against the latest pypy:3 docker image
<agronholm> I will try to run the test against the untranslated pypy, once I understand how
<agronholm> mattip: when I run the test (python3 pytest.py -D pypy/module/_codecs/test/test_decoder.py) it passes
<agronholm> I'm not entirely sure what the point of that is, since it loads the function from the host python, doesn't it?
<mattip> note my command line specifies python2 with no -D
<mattip> and you need to write/change a test to use your string
<mattip> that will allow you to run untranslated, and add a pdb.set_trace() and poke around (not in the test, in the RPython code inside interp_codecs or unicodehelper)
<agronholm> yes, I did add a test, I just didn't understand that running it with python3 runs it against the host and on python2 it does something entirely different
<mattip> so when I run decoder = codecs.getincrementaldecoder('utf-8')(errors='replace'); [ord(x) for x in decoder.decode('åäö'.encode('iso-8859-1'))]
<mattip> I get [65533, 65533, 65533]
<mattip> on both CPython3.7 and PyPy3.6-v7.3.1
<agronholm> I'm thoroughly confused
<agronholm> I ran that test on python 2 (why do I have to make it python 2 compatible when it's supposed to run on pypy3?)
<agronholm> it still gives me the wrong result though
<mattip> so you run with python2 because RPython is written in python2
<mattip> and the command python2 pytest.py pypy/module ... takes the test, notes that it imports _codecs, runs the test on top of the untranslated pypy while building enough of pypy to use the _codecs module
<mattip> what do you call "the wrong result"?
<mattip> are you using a locale?
<agronholm> yes
<agronholm> https://bpa.st/LQUA <- my test
<agronholm> I call it the wrong result because it differs from the cpython result
<agronholm> (and the result you get)
<agronholm> locale should have no bearing on these functions
<mattip> what is the value of the wrong result?
<agronholm> [65533]
<mattip> ahh, a single value?
<agronholm> yes
<mattip> btw, you can write this as [ord(x) for x in 'åäö'.encode('iso-8859-1').decode('utf8', 'replace')]
<agronholm> no, that gives me the correct result
<agronholm> only using codecs.utf_8_decode() (or using the incremental decoder) gives the wrong result
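The three decoding paths compared in the discussion so far can be put side by side in a short script. This is a minimal sketch, not the script from the paste links (those are not reproduced here); the variable names are chosen for illustration only. On CPython all three paths are expected to yield three replacement characters, while the affected PyPy versions reportedly stop after the first one on the `codecs.utf_8_decode()` and incremental-decoder paths.

```python
import codecs

# 'åäö' encoded as ISO-8859-1 is b'\xe5\xe4\xf6', which is not valid UTF-8.
raw = 'åäö'.encode('iso-8859-1')

# Path 1: bytes.decode(), which always treats the input as complete.
via_bytes = raw.decode('utf-8', 'replace')

# Path 2: the low-level codec function; its third argument `final`
# defaults to False. It returns (decoded_text, bytes_consumed).
via_codecs, consumed = codecs.utf_8_decode(raw, 'replace')

# Path 3: the incremental decoder, which also decodes with final=False
# unless told otherwise.
decoder = codecs.getincrementaldecoder('utf-8')(errors='replace')
via_incremental = decoder.decode(raw)

for label, text in [('bytes.decode', via_bytes),
                    ('utf_8_decode', via_codecs),
                    ('incremental', via_incremental)]:
    print(label, [ord(ch) for ch in text])
print('consumed', consumed)
```

Comparing the printed lists across interpreters shows directly which path diverges.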
<mattip> ahh, ok, so we are getting somewhere
<mattip> maybe connected to sys.getdefaultencoding() or sys.getfilesystemencoding() ? For me both those are utf-8
<agronholm> same here
<agronholm> although that should also not affect anything since we're being explicit about the encodings
rubdos has quit [Ping timeout: 260 seconds]
rubdos has joined #pypy
<mattip> ok, I can reproduce. Here is the test
<mattip> for some reason it is returning 1 for the length, not 3
<agronholm> so unicodehelper.str_decode_utf8() is the first place to look for trouble
<mattip> I don't know what I was doing before, but now pypy3 translated is showing that error as well, sorry I must have been doing something wrong
<agronholm> np, glad we're on the same page now :)
<mattip> it is a problem with the handling of final, which by default is False
<mattip> to support incremental decoding
<mattip> and by problem I mean incompatibility with cpython. I am not sure I understand why CPython finalizes the buffer
<agronholm> your patch did not trigger the debugger for me
tos9 has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
<mattip> did you run the test: python2 pytest.py pypy/module/_cpyext/test/test_codecs.py -k test_utf_8_decode
<agronholm> no, and your patch doesn't touch that file either
tos9 has joined #pypy
<agronholm> it creates pypy/module/_codecs/test/test_codecs.py
<agronholm> other than that difference, I did use that command line
<agronholm> it already fails at the "assert utf8 == b'\\xe5\\xe4\\xf6'" line
Rhy0lite has joined #pypy
<agronholm> isn't the code wrong though? u'\xc3\xa5\xc3\xa4\xc3\xb6'.encode('iso-8859-1') does give b'\xc3\xa5\xc3\xa4\xc3\xb6'
<agronholm> as it should
<mattip> copy paste mess. replace the double backslash with single
<mattip> and the utf8 string should be your original unicode string (in the line above)
<agronholm> those code points are the utf-8 representation of my original string
<agronholm> yup – translated to escape codes, it would be '\xe5\xe4\xf6'
<agronholm> ok, now it triggers the debugger
<mattip> ok, so in CPython the equivalent to unicodehelper.str_decode_utf8 is PyUnicode_DecodeUTF8Stateful
<mattip> but with one big difference: it passes in a `consumed` rather than a boolean `final`
<mattip> the only place `consumed` is used is in _codecs.utf_8_decode, where if final is False it is the length in bytes of the utf8 string
<mattip> so, in short, a bug
jcea has joined #pypy
<agronholm> ok, so the cpython behavior is wrong?
<mattip> no, PyPy's interface is not sophisticated enough to mimic CPython's final=True/False handling
<mattip> we should be using "consumed" in the unicodehelper functions, not "final"
<mattip> and consumed==-1 will be the default value (like when consumed==NULL in CPython)
<mattip> sorry. ... will be the final=True value ...
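The role of the consumed count can be illustrated from the Python side. This is a sketch under the assumption of CPython semantics for `codecs.utf_8_decode(data, errors, final)`; the helper `decode_chunks` is hypothetical and is not PyPy or CPython code. With `final=False`, a truncated trailing sequence is left unconsumed so the caller can retry it with more data, whereas `final=True` treats it as an error.

```python
import codecs

def decode_chunks(chunks, errors='replace'):
    """Decode an iterable of byte chunks as UTF-8 (hypothetical helper).

    Bytes that utf_8_decode reports as not yet consumed are carried
    over and prepended to the next chunk.
    """
    pending = b''
    pieces = []
    for chunk in chunks:
        data = pending + chunk
        # final=False: an incomplete trailing sequence is not an error yet.
        text, consumed = codecs.utf_8_decode(data, errors, False)
        pieces.append(text)
        pending = data[consumed:]
    # final=True: whatever is still pending can no longer be completed.
    text, _ = codecs.utf_8_decode(pending, errors, True)
    pieces.append(text)
    return ''.join(pieces)

encoded = 'åäö'.encode('utf-8')  # b'\xc3\xa5\xc3\xa4\xc3\xb6'
# Split in the middle of multi-byte sequences on purpose.
chunks = [encoded[:1], encoded[1:3], encoded[3:]]
result = decode_chunks(chunks)
print(result)

# A lone lead byte: held back when final=False, replaced when final=True.
held_back = codecs.utf_8_decode(b'\xc3', 'replace', False)
flushed = codecs.utf_8_decode(b'\xc3', 'replace', True)
print(held_back, flushed)
```

The distinction that matters for the bug is that only truncated input should be held back when `final` is False; bytes that are already known to be invalid are reported immediately on CPython.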
<agronholm> I'll create an issue for this and copy the IRC logs there
speeder39_ has quit [Quit: Connection closed for inactivity]
<mattip> thanks for pursuing this
forgottenone has quit [Quit: Konversation terminated!]
lritter_ has joined #pypy
mattip has quit [Ping timeout: 264 seconds]
mattip has joined #pypy
Rhy0lite has quit [Ping timeout: 240 seconds]
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
Rhy0lite has joined #pypy
_whitelogger has joined #pypy
BPL has quit [Quit: Leaving]