cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
<exarkun>
Can I change PyPy's idea of the "filesystem encoding" after startup?
<exarkun>
I just want it to be UTF-8
<exarkun>
CPython has no public interface for this as far as I can tell but it has a Py_FileSystemDefaultEncoding symbol that I can manipulate with cffi
<exarkun>
There are constraints I can't do anything about in the foreseeable future which make this so.
<tos9>
because if so you can seemingly mutate that attribute on it
<exarkun>
Darn. I think object space is pretty darn hard to manipulate from Python.
<mattip>
I think you are supposed to do something like LC_CTYPE='en_US.UTF-8'
<exarkun>
Yes, probably so
<mattip>
in the environment before running python/pypy
<exarkun>
Unfortunately you can't always count on having such an environment at process startup.
jcea has joined #pypy
<exarkun>
On CPython, changing the environment after Py_Initialize runs has no effect. Looking at the linked code, I guess I doubt changing the environment would work on PyPy either ... unless you can do it before the first call to getfilesystemencoding, maybe.
<exarkun>
but it looks like by the time any application code can run it is too late
<mattip>
so write a wrapper that sets it then calls python
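A minimal sketch of the wrapper idea, assuming the `en_US.UTF-8` locale named above is actually installed on the machine. The names `utf8_env` and `launch` are hypothetical helpers for illustration, not part of any PyPy or CPython API:

```python
import os


def utf8_env(environ, locale_name="en_US.UTF-8"):
    """Return a copy of environ with LC_CTYPE forced to a UTF-8 locale."""
    env = dict(environ)
    env["LC_CTYPE"] = locale_name
    # LC_ALL, if set, takes precedence over LC_CTYPE, so it is dropped here.
    # Assumption: losing the other LC_ALL categories is acceptable.
    env.pop("LC_ALL", None)
    return env


def launch(interpreter, args):
    """Replace this process with the interpreter, started under the UTF-8 environment."""
    # os.execvpe searches PATH for the interpreter and never returns on success.
    os.execvpe(interpreter, [interpreter] + list(args), utf8_env(os.environ))
```

Something like `launch("pypy", ["myprogram.py"])` (a hypothetical invocation) would then start the real program with the locale already in place before the interpreter initializes.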
<tos9>
exarkun: I'm assuming you're saying you're handed an already running Python process and it's been misconfigured?
<tos9>
inb4 you say this has to do with some CI provider or something
<exarkun>
eh, I have a CI job for this but it's intentional
<exarkun>
This is actually a program for users to run
<exarkun>
not much I can do if they run it without LANG set
<exarkun>
mattip: yea, sure, it'd just be waaaay easier if I could `sys.setfilesystemencoding("UTF-8")`
<simpson>
rjarry: I appreciate you posting that; it's a good lesson for other language designers: Force UTF-8 and write adapters for each OS. Completely take away the ability to choose to get filesystem encodings wrong.
<mattip>
by the time you can import sys it is too late
<rjarry>
simpson: I'm not even sure filesystem paths should be "encoded", to me they should remain in bytes
<rjarry>
in fact, nothing prevents you (on linux, AFAIK) from using non printable characters in filenames
<rjarry>
well, "most of the time" it works
<simpson>
rjarry: That's a Linux-only view. On other OSs, they *are* Unicode and encoded. I'm suggesting that language runtimes should paper over this difference completely. (As a corollary, perhaps we should stop encouraging people to have filenames full of trash bytes.)
<rjarry>
hehe
<rjarry>
that would break backward compatibility for people who rely on having '\b' in their folder names, lol
<rjarry>
btw, the same problem exists for network device names on linux
<rjarry>
any non '\0' ascii character is considered valid
<simpson>
One horrible mistake of history at a time.
jvesely has joined #pypy
wilbowma has quit [Ping timeout: 246 seconds]
wilbowma has joined #pypy
<cfbolz>
exarkun: sounds reasonable to me to add an API for that
<exarkun>
Actually, considering the latest Python 3.x behavior is kinda sorta "always UTF-8", a new API would only be for 2.x.
<exarkun>
So is it worthwhile?
<arigato>
exarkun: yes
RemoteFox has joined #pypy
RemoteFox has left #pypy [#pypy]
<exarkun>
Okay, cool, filing
<arigato>
I don't know why there is sys.setdefaultencoding() that you need reload(sys) to access (which is very obscure), but there is no sys.setfilesystemencoding() at all
<exarkun>
I guess there was for a couple releases of 3.x and then it was deleted
<exarkun>
"mojibake by construction" or something
<exarkun>
in general I agree it's a dangerous behavior so I can sort of understand the argument for not having it
<exarkun>
but when the detected value is wrong and broken it sure sucks not to have it
<arigato>
you could also do "if sys.getfilesystemencoding() != 'utf-8': os.environ['X'] = 'Y'; os.execv(sys.executable, [sys.executable] + sys.argv)"...
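A rough sketch of that re-exec check, under stated assumptions: `EXAMPLE_REEXECED` is a hypothetical guard variable, `en_US.UTF-8` is assumed to be installed, and the helper names are made up for illustration. It is not a complete solution, since rebuilding the command line from `sys.argv` loses interpreter options.

```python
import codecs
import os
import sys

GUARD = "EXAMPLE_REEXECED"  # hypothetical marker so we re-exec at most once


def is_utf8(encoding):
    """True if the encoding name is an alias of UTF-8 (e.g. 'UTF-8', 'utf8')."""
    if encoding is None:  # Python 2 may report None
        return False
    return codecs.lookup(encoding).name == "utf-8"


def should_reexec(encoding, environ):
    """Re-exec only when the encoding is wrong and we have not tried already."""
    return not is_utf8(encoding) and GUARD not in environ


def reexec_with_utf8():
    """Restart the interpreter with a UTF-8 LC_CTYPE; does not return on success."""
    env = dict(os.environ, LC_CTYPE="en_US.UTF-8")
    # The guard prevents an endless loop if the locale turns out to be missing.
    env[GUARD] = "1"
    # Limitation: interpreter flags and `-m` invocations are not preserved.
    os.execve(sys.executable, [sys.executable] + sys.argv, env)
```

A caller would run `if should_reexec(sys.getfilesystemencoding(), os.environ): reexec_with_utf8()` as early as possible in the program's entry point.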
<exarkun>
yea, but re-executing a process is fraught :/
<exarkun>
plus 2x python startup time for a short-lived cli sucks
<exarkun>
(at least it only applies to linux so there's no wonky windows process code to think about ... still)
<exarkun>
I guess doing that for only-linux/only-pypy/only-non-utf8 might limit the impact enough ...
<exarkun>
So functionality missing on PyPy + different schemes for building/packaging/installing CPython make the os.execv approach tempting... But of course there's a zillion edge cases
todda7 has joined #pypy
<exarkun>
What if the code is imported as a library, what if it is run from the interactive interpreter, what if it is being run as part of a test suite...
todda7 has quit [Ping timeout: 256 seconds]
<exarkun>
Haha. Also if the platform doesn't have the locale you pick for LANG then Python still goes with ASCII.
fling has quit [*.net *.split]
jerith has quit [*.net *.split]
kanaka has quit [*.net *.split]
LarstiQ has quit [*.net *.split]
Dejan has quit [Quit: Leaving]
<exarkun>
Ah, and then there's `python -m ...` which randomly shuffles everything around some more.
kanaka has joined #pypy
fling has joined #pypy
jerith has joined #pypy
LarstiQ has joined #pypy
fling has quit [Max SendQ exceeded]
<exarkun>
Okay, it's not clear to me that enough information about how the process was started is actually preserved any more to be able to re-execute it with a different environment.
<exarkun>
Probably time to just say "LANG!=*.UTF-8 is unsupported" :/
fling has joined #pypy
todda7 has joined #pypy
proteusguy has quit [Ping timeout: 258 seconds]
i9zO5AP has joined #pypy
Ai9zO5AP has quit [Ping timeout: 240 seconds]
proteusguy has joined #pypy
jacob22 has quit [Read error: Connection reset by peer]
jacob22 has joined #pypy
<mattip>
rain, rain
dansan has joined #pypy
dansan has quit [Excess Flood]
<arigato>
got some snow at the top of the mountain sunday (which I reached by cable car)
dansan has joined #pypy
dansan has quit [Excess Flood]
dansan has joined #pypy
jvesely has quit [Quit: jvesely]
<Hodgestar>
mattip, arigato: There was snow on the top of Table Mountain this weekend (!!).
_whitelogger has joined #pypy
jvesely has joined #pypy
todda7 has quit [Ping timeout: 265 seconds]
<tos9>
Someone dropped some ice cream in front of my apt in NYC
<tos9>
(sorry I was feeling left out of the snow discussion)
lritter has joined #pypy
Smigwell has left #pypy [#pypy]
<lazka>
exarkun, cpython will never use ascii since 3.7 I think, it will force utf-8 in that case