antocuni changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://botbot.me/freenode/pypy/ ) | use cffi for calling C | "PyPy: the Gradual Reduction of Magic (tm)"
<njs>
I've got an intermittent deadlock happening on pypy, that I'm now able to reproduce once in, I dunno, hundreds of tries or something. But it actually happened on my laptop, not just CI!
<njs>
I'm now trying to figure out how to debug it more, e.g. by getting it to trigger under rr
<njs>
any tips for doing this? I could run rr record hundreds of times and wait for it to happen, but I guess I'll end up wasting lots of disk space for failed recordings, and then even after I catch it I'm not sure how to debug it further.
<njs>
(I managed to get a gdb attached to the stuck process, and the backtrace was just dozens of lines of "0x00007fd8d54410ef in ?? () from /home/njs/pypy/pypy3.5-5.8-beta-linux_x86_64-portable/bin/libpypy3-c.so")
<njs>
also, here's a fun log: with today's pypy nightly, it hit the deadlock, then faulthandler (the stdlib module) kicked in to try to print a backtrace, and faulthandler caused pypy to segfault
<njs>
AFAICT
<nanonyme>
arigato, it's just a bit frustrating because apparently ctypes gets this right on Python3 whereas CFFI doesn't and I've been trying to promote CFFI for a while :)
<nanonyme>
It's basically just about keeping librarypath unicode as long as possible and never encoding on Windows but instead just calling a wide char API
<kenaan_>
arigo cffi/cffi 93e213825746 /: Trying to fix ffi.dlopen() for unicode filenames on Windows
inad922 has joined #pypy
<arigato>
njs: if we can't reproduce ourselves, then you need to get a debug build of pypy first
<arigato>
and then dig inside the RPython-generated objects, which is very involved
<arigato>
nanonyme: might be fixed by 93e213825746, if you want to give it a try
<njs>
arigato: I actually got lucky and figured this one out via print debugging, though I am still interested in any general advice :-)
<arigato>
njs: maybe we should investigate why faulthandler crashes in your case
<njs>
arigato: that does seem like a bug, but I'm not sure how to repro
<njs>
arigato: I can tell you that at the time the faulthandler timeout expired, the state of the process was almost certainly that it had two threads, both of which were blocked in queue.get
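A minimal sketch of the situation described above, assuming only the stdlib: one thread blocked in queue.get while faulthandler dumps every thread's stack. The names `worker` and `dump_all_threads` are illustrative and not from the original report, and this does not reproduce the PyPy crash itself.

```python
import faulthandler
import queue
import tempfile
import threading
import time


def worker(q):
    # Blocks until an item arrives, like the stuck threads in the report.
    q.get()


def dump_all_threads():
    """Return faulthandler's traceback of all threads as text."""
    # faulthandler writes directly to a file descriptor, so use a real file.
    with tempfile.TemporaryFile() as f:
        faulthandler.dump_traceback(file=f, all_threads=True)
        f.seek(0)
        return f.read().decode("utf-8", "replace")


q = queue.Queue()
t = threading.Thread(target=worker, args=(q,), daemon=True)
t.start()
time.sleep(0.2)  # give the worker time to block inside q.get()

report = dump_all_threads()

q.put(None)  # unblock the worker so the example exits cleanly
t.join()
```

For a watchdog like the one in the report, faulthandler.dump_traceback_later(timeout) arms the same dump to fire after a timeout.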
fryguybob has quit [Remote host closed the connection]
<nanonyme>
arigato, I'd also request you to try to make sure that if library is not found, e.errno == errno.ENOENT
<nanonyme>
Unrelated. I'll test that commit Real Soon
<arigato>
right, right now it's an OSError with a message but no .errno
<arigato>
but it's hard to come up with an errno
<arigato>
it doesn't make sense to always attach ENOENT
<arigato>
on non-Windows we only have info from dlerror(), which is a string
<arigato>
for this detail, cffi works like ctypes
<arigato>
ah, no, ctypes has completely different code on windows
<arigato>
so on windows, you get a WindowsError with the errno
<arigato>
it doesn't make sense to me that you get .errno on half the platforms only
<arigato>
if you really feel like writing Windows-specific code checking the errno, you could ask ffi.getwinerror(), I think
<arigato>
...ah no, it's not saved at all right now
<nanonyme>
Anyway, I guess I now need to figure out how to check that thing out locally. That's mercurial, right?
<nanonyme>
Ah, never mind, I can just download HEAD as zip
<nanonyme>
Right. It was a nice-to-have item anyway. errno.ENOENT is kinda platform-independent way to say "file not found"
<nanonyme>
But if it's not possible to reliably say whether that was the problem on all platforms, maybe there's no point doing that
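A hedged sketch of what caller-side checking could look like given the constraints discussed above: the exception may carry .errno, may carry a Windows error code, or may carry neither. The helper name `looks_like_missing_library` is hypothetical and not part of cffi or ctypes. The Windows codes are ERROR_FILE_NOT_FOUND (2) and ERROR_MOD_NOT_FOUND (126, i.e. 0x7e).

```python
import errno

# Windows system error codes that mean "the file/module was not found".
ERROR_FILE_NOT_FOUND = 2
ERROR_MOD_NOT_FOUND = 0x7E  # 126


def looks_like_missing_library(exc):
    """Best-effort check; False means "cannot tell", not "file exists"."""
    # Platform-independent case: the exception carries a POSIX errno.
    if getattr(exc, "errno", None) == errno.ENOENT:
        return True
    # Windows case: the exception carries a Windows error code.
    if getattr(exc, "winerror", None) in (ERROR_FILE_NOT_FOUND,
                                          ERROR_MOD_NOT_FOUND):
        return True
    # dlerror()-style failures only provide a message string.
    return False
```

When only a message string is available, the helper deliberately returns False rather than parsing the text.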
<nanonyme>
Got a wheel locally built, setting up test environment now
<arigato>
ok, on default we only call madvise(MADV_FREE) for whole arenas
<arigato>
and that's when we're freeing the arena
<arigato>
so, sorry, still confused. in that sentence, "we have large fragmentation issue", but a more common case is simply that at one point all the peak memory was used, and then memory usage shrank, but in such a way that there are still a few objects alive in most pages
<arigato>
do we call this a fragmentation issue too?
<fijal>
yes, I would say?
adamholmberg has joined #pypy
<fijal>
I mean we can talk about arena fragmentation vs outside fragmentation
<fijal>
I wouldn't mind if you expand that in the document
jcea has quit [Quit: jcea]
<fijal>
options for x y and y are:
<arigato>
I'd talk about fragmentation in two senses: from the OS point of view we're using more memory than needed, but either (a) in such a way that future growth of memory will happily fill in the blanks, or (b) in a way that is not reusable
<arigato>
if there's only "(a)" fragmentation then memory usage for the OS should never be more than the peak for the program
<fijal>
right
<nanonyme>
arigato, I think I'm getting more useful errors now but still can't load the library. Error 0x7e which is apparently Windows jargon for file not found
<nanonyme>
Exact same path with ctypes works fine
<arigato>
it works for me in the example I tried
<nanonyme>
Hm
<nanonyme>
Maybe something is wrong with my test machine
<nanonyme>
Were you able to repro the problem before though?
<arigato>
yes, definitely
<arigato>
it was buggy
<arigato>
now there is a test
<arigato>
which you can try to run if you have py.test installed
jcea has joined #pypy
<nanonyme>
I wonder if I just messed something up
jcea has quit [Remote host closed the connection]
d0x0b has joined #pypy
jcea has joined #pypy
Rhy0lite has joined #pypy
<nanonyme>
As in, during build time
<nanonyme>
I just did path\to\python setup.py bdist_wheel and installed it on another machine
<arigato>
I still think the output of these two tables "Total memory consumed" is confusing
<arigato>
GC used: A (peak B)
<arigato>
GC allocated: A (peak B)
<arigato>
that's fine, but then the subcategories list rawmalloced: C
<arigato>
but in reality the rawmalloced in the "GC used" section is the current number, and the rawmalloced in the "GC allocated" section is the peak
<arigato>
and the peak might or might not be equal to the currently allocated memory
<arigato>
particularly if you use jemalloc
<nanonyme>
Any chance getting a release soon if this fix helps? (or even if not)
<fijal>
arigato: good, but you removed the reference to jemalloc
<fijal>
completely
<arigato>
did I ?
<arigato>
what?
<arigato>
I'm not used to mercurial on windows, but I'm confused
<nanonyme>
arigato, I seriously can't get it to work with my local build. I can't get my original error anymore (which was about decoding) but I now get this OSError: cannot load library 'c:\test™ (ID)\mydll.dll': error 0x7e
<arigato>
nanonyme: ok, but I can't do anything because for me it works, when testing a path that contains a greek letter
<fijal>
arigato: maybe change it to "some more modern malloc implementation, like jemalloc" with a link, but that was missing from the original one too
<nanonyme>
arigato, I think you specifically need the trademark symbol
<arigato>
fijal: I can't push right now, sorry
<nanonyme>
We chose it specifically to test that our product will not break with wide char APIs
<fijal>
arigato: no worries, should I just push your diff?
<arigato>
nanonyme: I can try, but what does it change?
<arigato>
fijal: go ahead
<nanonyme>
arigato, iirc completely breaks with Western variations of MBCS at least
<fijal>
arigato: FYI vrsketch.eu is on https now
<arigato>
nanonyme: it works for me
<nanonyme>
Then I must be compiling it somehow wrong
<nanonyme>
You're testing with PyPy?
<arigato>
no
<arigato>
cpython 2.7 and 3.5
<fijal>
arigato: pushed, thanks
<kenaan_>
fijal default bb02514372a2 /pypy/doc/gc_info.rst: (arigo, fijal) improve the doc
<arigato>
you can't easily upgrade cffi inside pypy without changing pypy itself
<nanonyme>
Oh, right
<nanonyme>
And now I realized I had been testing with 3.6.2 all today and when I finally tried to create a wheel with 3.5.2, I got a linker error
marky1991 has joined #pypy
<nanonyme>
Totally not my day
marky1991 has quit [Ping timeout: 240 seconds]
<nanonyme>
Ah
<nanonyme>
Now I reverted back to 3.5.2 and I *finally* managed to reproduce the original issue. Meaning I probably don't have your changes
marky1991 has joined #pypy
mattip has joined #pypy
<nanonyme>
I'll just run the tests and see what happens
[Arfrever] has joined #pypy
<mattip>
cfbolz: ff6a031587c2 broke translation on py3.5
<mattip>
which is why I backed it out from d46f72070fa8
<mattip>
it also seems that datetime C-API is still not good enough for pandas HEAD to work
<mattip>
but first I want to get win32 py3.5 to a point where pypy3 -mensurepip works, which means more winapi work
<cfbolz>
mattip: no, it's different than your change
<mattip>
right, you added the encoding, but it seems that is not enough
<mattip>
maybe another assert is needed somewhere?
<cfbolz>
mattip: I'll investigate a bit later (or back it out again)
<mattip>
thanx
<cfbolz>
But there is no reason to remove it from default, is there?
<mattip>
guess not. A mapdict always has str keys?
<cfbolz>
Yes
<cfbolz>
mattip: can you still add some tests for the datetime changes to cpyext please?
<mattip>
ok
<nanonyme>
arigato, you couldn't by chance do a scratch build just to see if I'm doing something wrong somewhere?
<nanonyme>
Funky. Looks like you can't even take a repr out of this thing on Python side without blowing up
<nanonyme>
UnicodeEncodeError: 'charmap' codec can't encode character '\u2122' in position 9: character maps to <undefined>
<nanonyme>
Never mind, it's my printing that does that. It's strange though, I thought repr should always be printable
<nanonyme>
Apparently not
<nanonyme>
Oh, this is actually a design flaw in early Python3. repr gives you unicode but you need to convert it to bytes for printing but your terminal might be using an encoding incapable of representing the unicode object
inad922 has quit [Ping timeout: 240 seconds]
<nanonyme>
I see why they changed console encoding in 3.6 to UTF-8 :D
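A small sketch of the printing problem described above, assuming a console that uses code page 437 (the codec name is only an example; the path is the one from the error message earlier). It shows why printing the repr fails and two stdlib ways to get something printable anyway.

```python
# The path from the log; "\u2122" is the trademark symbol.
path = "c:\\test\u2122 (ID)\\mydll.dll"


def encodable(text, encoding):
    """Return True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# repr() keeps printable non-ASCII characters, so it can still fail on a
# console whose code page lacks them.
cp437_ok = encodable(repr(path), "cp437")

# ascii() escapes every non-ASCII character, so the result is pure ASCII.
safe_repr = ascii(path)

# Alternative: keep what the codec can represent and escape the rest.
console_bytes = repr(path).encode("cp437", errors="backslashreplace")
```

Either result can be written to a legacy console without raising UnicodeEncodeError.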
<arigato>
it can only be downloaded one, and I tried this link myself
<nanonyme>
Schrödinger's link, obviously
<arigato>
s/one/once
<nanonyme>
Aand I still get UnicodeDecodeError: 'utf-8' codec can't decode byte 0x99 in position 7: invalid start byte with Python 3.5.2 with the trademark symbol in path with your build :(
<arigato>
sorry, no clue
<arigato>
provide a complete step-by-step example for me to reproduce
<nanonyme>
I'll do that, also minimal reproducer would be really nice because then I could test this with my home machine
inad922 has joined #pypy
tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
d0x0b has quit [Quit: d0x0b]
adamholmberg has quit [Remote host closed the connection]
<arigato>
mattip: nanonyme is fighting with CPython, not PyPy
raynold has joined #pypy
adamholmberg has joined #pypy
<nanonyme>
mattip, well, specifically CFFI on CPython but that repr problem most likely happens on PyPy too on Windows
<nanonyme>
Print is stupid in the way that it tries to implicitly encode and the encoding chosen in Python3 before 3.6 is completely useless on Windows for certain codepoints
<nanonyme>
(the DLL loading issues I had with CFFI had that exact same codepoint which belongs to the category of Problems (tm))
<nanonyme>
Anyhow, I'm going to try to see if I can repro the problem using _cffi_backend directly using some simple safe DLL
<Rotonen>
nanonyme: pyinstaller ran into similarities with dll loading and 3.6, check that out if in the mood for "i'm not alone" :-)
<nanonyme>
Rotonen, this does not technically belong in the category of hard problems, I know in principle how this should work in pure-C code that doesn't touch Python at all. Unfortunately CFFI does not really fit that category :)
marky1991 has quit [Ping timeout: 255 seconds]
<nanonyme>
Aand I can't reproduce this at home
<nanonyme>
Not even with old CFFI
<nanonyme>
Tested by printing that the exact same error is gotten in both cases
<nanonyme>
Will continue tomorrow at work to see if this calling _cffi_backend.load_library directly works or not there
<nanonyme>
mattip, but still, I'm stressing that handling for non-ascii changes in 3.6 so I'm not sure how much point it makes to try to fix 3.5 semantics
jacob22__ has joined #pypy
jamesaxl has quit [Read error: Connection reset by peer]
<mattip>
nanonyme: right, hopefully we will take that into account when updating py3.5 -> py3.6, and the issue will disappear
oberstet has quit [Ping timeout: 248 seconds]
jamesaxl has joined #pypy
<mattip>
nanonyme: are you using the same codepage, OS version on the two machines?
<nanonyme>
Both have Windows 10, my home one is newer
marky1991 has joined #pypy
jacob22__ has quit [Ping timeout: 260 seconds]
oberstet has joined #pypy
<nanonyme>
However, I'm trying to get this to work from Vista up :p I just want it first working on Win10, will then start walking backwards
<nanonyme>
However, I really should have started with this minimal test-case to begin with
<nanonyme>
I'm also highly concerned that on my home machine mbcs works with the trademark symbol
<mattip>
cpython version/compiler?
adamholmberg has quit [Remote host closed the connection]
adamholmberg has joined #pypy
adamholmberg has quit [Ping timeout: 248 seconds]
<nanonyme>
Both were CPython 3.5, precompiled wheels of CFFI used in both cases
tbodt has joined #pypy
adamholmberg has joined #pypy
marr has quit [Ping timeout: 256 seconds]
<kenaan_>
cfbolz py3.5 1e8c1d693fcf /pypy/: gah, nonsense. I want an decode, not encode. add assert to catch this at test-time in the future
<cfbolz>
mattip: gee, that was stupid
<nanonyme>
:)
dmalcolm has quit [Ping timeout: 255 seconds]
dmalcolm has joined #pypy
<nanonyme>
mattip, anyhow, I did try to change my codepage from 850 (English UK), to 437 (English US) latter of which I use at work but it didn't seem to have any impact
energizer has joined #pypy
<nanonyme>
Bingo
<nanonyme>
I got it reproduced with out-of-line ABI mode FFI object calling dlopen directly
<nanonyme>
As in, the original bug
<nanonyme>
Now if only I could figure out some example dll, I could roll out a reproducer sample out of this and ZIP it up
adamholmberg has quit [Remote host closed the connection]
<nanonyme>
Will still try the version with the fix on this machine
<nanonyme>
Or rather, _cffi_backend.load_library worked fine, _cffi_backend.FFI().dlopen did not
adamholmberg has joined #pypy
<nanonyme>
And I just verified: it does not reproduce with inline ABI-mode, only out-of-line ABI mode
<nanonyme>
(probably because inline doesn't actually create an _cffi_backend.FFI instance but calls load_library which works)
marr has joined #pypy
<mattip>
cfbolz: :(
forgottenone has quit [Ping timeout: 268 seconds]
AndrewBC has quit [Read error: Connection reset by peer]
AndrewBC has joined #pypy
tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
tormoz has quit [Read error: Connection reset by peer]
<svanheulen>
nanonyme: AttributeError: 'FFI' object has no attribute 'dlclose'
<svanheulen>
using abi mode
<nanonyme>
Which one?
<nanonyme>
Inline or out-of-line?
<svanheulen>
out
<svanheulen>
wait no
<svanheulen>
in line
<nanonyme>
Right, I'm not sure if it's available with inline
<svanheulen>
ah ok
<svanheulen>
doesn't mention that in the docs, but that makes sense
<nanonyme>
I specifically moved from inline to out-of-line a year or so back because arigato explained to me I could get dlclose there
Rhy0lite has quit [Quit: Leaving]
marky1991 has quit [Ping timeout: 240 seconds]
marky1991 has joined #pypy
<svanheulen>
nanonyme: thanks for the info, out-of-line works great :)
<nanonyme>
svanheulen, out of curiosity, which platform are you using?
<svanheulen>
linux
<nanonyme>
Ok, cool
inad922 has joined #pypy
drolando has quit [Remote host closed the connection]
drolando has joined #pypy
<nanonyme>
Hm, crappy vcvarsall.bat refuses to be found :(
<nanonyme>
Got it! There's a magic DISTUTILS_USE_SDK environment variable which allows you to call that script yourself
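A rough sketch of the workaround mentioned above, assuming the build is launched from a shell where vcvarsall.bat has already been run by hand. The helper name `sdk_build_env` is hypothetical, and the value "1" is an assumption (any non-empty value is presumed to work).

```python
import os


def sdk_build_env(base_env=None):
    """Return a copy of the environment with DISTUTILS_USE_SDK set."""
    env = dict(os.environ if base_env is None else base_env)
    # Asks distutils to trust the compiler environment that is already set
    # up instead of trying to locate vcvarsall.bat itself.
    env["DISTUTILS_USE_SDK"] = "1"
    return env


# Possible usage (not executed here):
#   import subprocess, sys
#   subprocess.run([sys.executable, "setup.py", "bdist_wheel"],
#                  env=sdk_build_env(), check=True)
```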
inad922 has quit [Quit: Leaving]
<nanonyme>
So lessee
<nanonyme>
arigato, fixed!
<nanonyme>
arigato, simply porting that code you had in the other codepath made the DLL loadable. Too tired to make a PR or anything reasonable and no idea what the tests should look like
tbodt has quit [Quit: My Mac has gone to sleep. ZZZzzz…]
<nanonyme>
Oh, heh, my attempt was maybe a bit too blunt in that error looked wrong :(
<nanonyme>
Rotonen, you know how much more helpful it would have been to tell that in the exception text when it says it can't find vcvarsall.bat
<Rotonen>
the people who do those corners are so deep into it they're blind to entry barriers
<nanonyme>
I'd assume the people who do those corners are also the people who assume no one reads docs anyway ;)
<Rotonen>
one of these years i'll need to catch up with what it is you do these days and why do you hit similar enough things than i'm hitting :P
<Rotonen>
looking at the distutils documentation, what really gets my attention is emxccompiler - when and what in was the last time that was relevant... :D
<nanonyme>
:D
<nanonyme>
Well, I actually did read the distutils code and my impression is that Microsoft broke the detection mechanism they use with Visual Studio 2017
<nanonyme>
It would be better to just advertise the workaround more widely
<Rotonen>
i have all of the 2005+ installed in parallel as there is always something which needs one of them
<nanonyme>
Microsoft dropped writing to system registry for figuring out the VC SDK, now the supported method is calling their executable which will dig up path to vcvarsall.bat
<nanonyme>
So it's now effectively tons less effort to just call it yourself if you aren't doing build automation
<Rotonen>
roughly on the topic - how do you cope with the winsdk dll ~split?
<nanonyme>
How so?
<Rotonen>
unified pre-10, split afterwards, they tried to do shared libs, but ended up with an even more horrible 'just bundle these in' scheme
tbodt has joined #pypy
<nanonyme>
I'm not sure what you mean. Are you talking about CRT?
<Rotonen>
yeah
<Rotonen>
and UWP
<nanonyme>
Well, in actual products we have in MSI a custom action that checks for existence of ucrtbase.dll and then we just install right C runtime as a merge module
<nanonyme>
Existence of ucrtbase.dll indicates the machine has the right Windows update which gives you universal C runtime
<nanonyme>
It is of course highly tempting these days to just statically link everything :p