arigato changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | mac OS and Fedora are not Windows
themsay has joined #pypy
adamholmberg has quit [Remote host closed the connection]
adamholmberg has joined #pypy
adamholmberg has quit [Remote host closed the connection]
adamholmberg has joined #pypy
speeder39_ has joined #pypy
adamholmberg has quit [Remote host closed the connection]
adamholmberg has joined #pypy
adamholmberg has quit [Ping timeout: 240 seconds]
adamholmberg has joined #pypy
adamholmberg has quit [Remote host closed the connection]
moei has quit [Quit: Leaving...]
themsay has quit [Ping timeout: 246 seconds]
adamholmberg has joined #pypy
adamholmberg has quit [Ping timeout: 240 seconds]
adamholmberg has joined #pypy
adamholmberg has quit [Remote host closed the connection]
themsay has joined #pypy
jcea has quit [Remote host closed the connection]
jcea has joined #pypy
Kipras_ has quit [Read error: Connection reset by peer]
speeder39_ has quit [Quit: Connection closed for inactivity]
dddddd has quit [Remote host closed the connection]
jcea1 has joined #pypy
jcea has quit [Read error: Connection reset by peer]
jcea1 is now known as jcea
jcea has quit [Quit: jcea]
tos9 has quit [Ping timeout: 264 seconds]
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
<mattip> njs: for an example of why the IR approach is problematic, see numpy PR #12317, which reimplements pure python in c
<mattip> for speed
tos9 has joined #pypy
<njs> mattip: why is that a problem for the IR approach?
<mattip> how would you rewrite that PR using the IR approach? It is a re-implementation of python code in the C-API
<mattip> and requires intimate knowledge of python structures, kwarg handling
<mattip> arg parsing error paths and attribute lookup
<njs> mattip: cython is pretty good at handling python structures, kwargs, error cleanup, etc. at C speeds?
<njs> mattip: I haven't read the patch
<mattip> not really, that is where cython marks the code as yellow (lots of C-API use)
<mattip> but it might be interesting to rewrite that PR as a pyx file ..
<njs> the color Cython uses for the code doesn't actually make it slower though does it :-)
<mattip> it indicates where cython is non-optimal
<mattip> and probably no faster than pure python
<njs> but if Cython colors any C API usage yellow, and the handwritten C code is also using the C API...
<njs> what is the handwritten code doing that Cython can't/doesn't?
jacob22__ has quit [Ping timeout: 252 seconds]
jacob22__ has joined #pypy
lritter has joined #pypy
themsay has quit [Ping timeout: 246 seconds]
Arfrever has joined #pypy
<lesshaste> njs, you can look at the C code that cython is producing to see
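For reference, the annotated report being discussed can be generated with Cython's annotate option; yellow lines in the resulting HTML mark heavy Python C-API interaction ("module.pyx" is a placeholder name):

    from Cython.Build import cythonize

    # writes module.html next to the .pyx; equivalent to `cython -a module.pyx`
    cythonize("module.pyx", annotate=True)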
antocuni has joined #pypy
jamesaxl has joined #pypy
dddddd has joined #pypy
<mattip> here are the failing tests on numpy's CI
commandoline_ has quit [Ping timeout: 246 seconds]
commandoline has joined #pypy
<antocuni> mattip: am I missing something, or looking at that page it seems that numpy does NOT run tests on python2.7?
<bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-32/builds/6192 [antocuni: force build, gc-disable]
Arfrever has quit [Ping timeout: 272 seconds]
<bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/7151 [antocuni: force build, gc-disable]
<bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-32/builds/4960 [antocuni: force build, gc-disable]
<bbot2> Started: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/5876 [antocuni: force build, gc-disable]
<mattip> correct, numpy is dropping active support for python2.7 on master, bugfixes will be backported to 1.16
<mattip> antocuni ^^^
<mattip> until jan 2020, when even that will stop
<antocuni> poor py2.7
il_ratto has left #pypy [#pypy]
antocuni has quit [Ping timeout: 244 seconds]
Arfrever has joined #pypy
<bbot2> Failure: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/7151 [antocuni: force build, gc-disable]
<bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-64/builds/5876 [antocuni: force build, gc-disable]
<bbot2> Failure: http://buildbot.pypy.org/builders/pypy-c-jit-linux-x86-32/builds/4960 [antocuni: force build, gc-disable]
ionelmc has quit [Ping timeout: 260 seconds]
wallet42 has quit [Ping timeout: 252 seconds]
wallet42 has joined #pypy
ionelmc has joined #pypy
antocuni has joined #pypy
<bbot2> Failure: http://buildbot.pypy.org/builders/own-linux-x86-32/builds/6192 [antocuni: force build, gc-disable]
jcea has joined #pypy
jcea has quit [Remote host closed the connection]
jacob22__ has quit [Remote host closed the connection]
Rhy0lite has joined #pypy
<arigato> snow, snow
adamholmberg has joined #pypy
* antocuni wonders what's the result of "grep 'snow snow' | wc" on the full log of #pypy
xcm has quit [Killed (livingstone.freenode.net (Nickname regained by services))]
<antocuni> arigato: I'd like to merge the gc-disable branch. The tests are passing and it has been used in production for quite a while now, it seems to work very well
xcm has joined #pypy
<antocuni> are you interested in reviewing it, or should I just proceed?
<arigato> antocuni: I can review it, at least check if there is no problem from my point of view
<antocuni> yes please
<antocuni> I am writing the documentation right now, but the TL;DR version is: gc.disable() disables major collections, then you call gc.collect_step() to run a single step manually
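A minimal sketch of the usage antocuni describes (gc.collect_step() is the new API from the gc-disable branch; handle_request() is a hypothetical placeholder for latency-sensitive work):

    import gc

    gc.disable()                # on this branch: stops automatic major collections only
    while True:
        handle_request()        # no major-collection pauses inside the critical work
        gc.collect_step()       # manually run one incremental GC step between requests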
<vstinner> antocuni: i proposed to modify the multiprocessing module to no longer rely on the GC to clean up stuff. Antoine Pitrou replied that it goes against the language philosophy :-(
<vstinner> https://mail.python.org/pipermail/python-dev/2018-December/155947.html "anti-pattern" "Python _is_ a garbage-collected language"
<Alex_Gaynor> Which philosophy "Implicit is better than explicit"?
<vstinner> i also proposed to avoid creating a reference cycle in xml.etree rather than fixing the GC https://bugs.python.org/issue35502#msg332051 but Serhiy only fixed treebuilder_gc_traverse(), he ignored my remark
rjarry has left #pypy ["Leaving"]
<vstinner> my small contribution: i proposed a change to start emitting ResourceWarning when multiprocessing.Pool isn't closed/terminated explicitly: https://github.com/python/cpython/pull/10974
<vstinner> no idea if antoine pitrou is going to accept that one :)
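The explicit-cleanup style that warning would nudge users towards looks roughly like this (tearing the pool down deterministically instead of leaving it to the GC):

    from multiprocessing import Pool

    def square(x):
        return x * x

    if __name__ == "__main__":
        pool = Pool(4)
        try:
            print(pool.map(square, range(10)))
        finally:
            pool.close()   # no more tasks
            pool.join()    # wait for the workers instead of relying on the GC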
<antocuni> vstinner: well, from my POV it's obvious that Antoine Pitrou is wrong, because gvr was very explicit in saying that Python-the-language doesn't guarantee destructors to be called immediately
<antocuni> (I can't find the reference to that though)
<vstinner> antocuni: i was surprised how deeply multiprocessing relies on the GC
<antocuni> THAT is the antipattern :)
<vstinner> and I'm concerned by Pablo's proposed fix which adds a new reference cycle :-(
<vstinner> antocuni: i guess that on this channel, it's easy to convince you
<vstinner> antocuni: in the CPython world, things are different :)
<antocuni> CPython devs need to decide whether or not they want to evolve the language in a way that makes alternative implementations possible
<vstinner> antocuni: i'm pushing hard towards that, but it's hard to change people :)
<antocuni> all the rest is pure rhetoric
<vstinner> antocuni: i'm not sure that the problem is technical :)
<antocuni> vstinner: I know and I am glad you are doing that
<vstinner> the root of my https://pythoncapi.readthedocs.io/ project is "hey, Python can run on something different than CPython!"
<antocuni> it really depends on whether people care about Python or about CPython
<vstinner> but people reply "your changes are going against CPython"
<vstinner> antocuni: i do care a lot about CPython, it's my little baby :)
<vstinner> antocuni: i love and hate CPython
<antocuni> CPython does a very good job; but Python-the-ecosystem can only benefit from having multiple implementations
<vstinner> i hate CPython because people rely on every single subtle behavior
<vstinner> people love to see their file/socket magically closed
<antocuni> then ask them why we introduced "with"
<vstinner> ... and recently i found a 5 years old reference cycle bug in socket.create_connection() which kept all variables alive for hours
<vstinner> because Python 3 exceptions magically keep all frames alive forever
<vstinner> i hate this stupid "feature"
<vstinner> because of that, asyncio code is full of "finally: self = None"
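A rough illustration of that workaround; the names here are made up, but the mechanism is the one vstinner describes: a stored traceback references the frame, the frame references its locals, so the code drops the local by hand:

    import sys

    _last_error = []

    def log_error(exc_info):
        _last_error.append(exc_info)    # keeps the traceback (and its frames) alive

    def process(resource):
        try:
            resource.use()              # hypothetical object whose method may raise
        except Exception:
            log_error(sys.exc_info())
        finally:
            # traceback -> frame -> locals would otherwise pin 'resource' for as
            # long as the error record exists; dropping the local breaks the chain
            resource = None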
<antocuni> let's stick to py2.7 :)
nunatak has joined #pypy
<vstinner> antocuni: ok, i'm now depressed
<antocuni> vstinner: sorry :)
<vstinner> antocuni: my new big project is to add Py_SetType() to replace Py_TYPE(obj) = type :-)
<Alex_Gaynor> vstinner: will it be a safe API? e.g. `Py_SetType(<int>, PyListType)` will fail?
<antocuni> vstinner: btw, https://github.com/python/cpython/commit/acb9fa79fa6453c2bbe3ccfc9cad2837feb90093 is a VERY good example to convince people that __del__s should just die and not be used for anything
<antocuni> (and/or that py3k contains bad design choices)
<vstinner> Alex_Gaynor: currently, "Py_TYPE(obj) = type" has no way to return an error. i would like to fix Py_TYPE(), but if I add a new function, i would prefer to be able to report a failure
<vstinner> no idea what happens if you do Py_TYPE(None) = &PyUnicodeType ?
<Alex_Gaynor> I assume at random later points, things crash
<vstinner> antocuni: my socket.create_connection() fix is *one* example. it's easy to find way more of them. like the xml.etree bug
<vstinner> antocuni: it's super complex to detect ref cycles during development, and worse in production
<antocuni> yes, sure. But a language which forces you to do something like that is, at least, obscure
<vstinner> Alex_Gaynor: maybe it's ok if the crash is deterministic :-D
<Alex_Gaynor> vstinner: it's probably exploitable
<vstinner> antocuni: come on, it's the Python philosophy!
<vstinner> antocuni: one obvious way to do something
<antocuni> :)
lritter has quit [Ping timeout: 244 seconds]
jcea has joined #pypy
xcm has quit [Killed (barjavel.freenode.net (Nickname regained by services))]
xcm has joined #pypy
nunatak has quit [Quit: Leaving]
jacob22__ has joined #pypy
<arigato> antocuni: I'm not sure it's a good idea to use gc.disable() like you do
<arigato> there are a few people on CPython that write their apps using gc.disable(), after making sure somehow that they don't have cycles
<arigato> though admittedly, these apps would not run any __del__ on pypy right now, but that sounds less bad than "leaking at a fast rhythm"
<antocuni> well, from some point of view you should not be surprised if you get a leak after you disable the GC
<antocuni> but yes, I see your point
<arigato> I agree, but "gc.disable" doesn't mean "disable the GC", for CPython it means something much more specific
<arigato> I'm maybe worried about cases where some code does "gc.disable(); run some stuff; gc.enable()"
<arigato> you'd hope that there is not too much stuff in the middle, but it might be large in some cases, and then we unexpectedly get a "leak"
<antocuni> well, this is a good example of why it's a good idea to use my semantics: you want to make sure that the GC doesn't run in a specific section of the code, and you are fine with some objects not being collected in the meantime
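The pattern in question looks roughly like this (run_latency_critical_section() is a placeholder; the point is to keep the section short so the uncollected garbage stays bounded):

    import gc

    gc.disable()
    try:
        run_latency_critical_section()   # on PyPy with this branch: no major collections here
    finally:
        gc.enable()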
<arigato> right now, gc.disable() disables calling __del__ for a very practical reason
<antocuni> although in pypy "some objects" is a much larger set of course
<antocuni> which is?
<arigato> on cpython, the point of gc.disable() and gc.enable() is that you're guaranteed to not have __del__ being called "randomly" in the middle by newly broken reference cycles
<arigato> well, at least that's (I think) the main reason for calling gc.disable()/enable() around a piece of code
<antocuni> ah. I thought it was to avoid unexpected pauses (or at least, that's what I always found reading code around)
<arigato> ah, I see where you're coming from
<antocuni> also, if the section between the disable() and the enable() is short (as it should be), the memory doesn't grow too much
<antocuni> but yes, thinking about that it's likely that there is code around which will be hit by this change
<antocuni> basically, we should decide whether we want to consider this case like we do for e.g. leaking file descriptors (you need to fix your code), or if we want to be super-conservative and try not to break existing code, at the expense of having a worse API
<antocuni> (I don't like too much the idea of introducing yet another obscure API like gc.disable_majors() or so)
<arigato> I suppose I'm fine with gc.disable() actually
<arigato> because if you really leave the gc disabled for a long while, no destructors are called, and you get strange effects too
<antocuni> I'll write a note in whatsnew to make sure to write something appropriate when we do the next release
<arigato> I'm trying to think about multithreaded examples where one thread does gc.disable()/long-blocking-call/gc.enable(),
<arigato> but that's a dummy thing to do because gc.disable/enable are not reentrant anyway
<antocuni> why would you want to do that anyway? If you have a long-blocking call, the GC doesn't run anyway
<arigato> no, it does in multithreaded examples
<arigato> random other python code would run during the long blocking call
<antocuni> ok, I thought you were proposing a pattern in which you disable the GC around the call *on purpose*; but I realize that you are probably talking about a case in which a thread disables the GC and then by chance it blocks
<arigato> but that's probably not a good thing to do because you have no sane way to do gc.disable/enable that really works in multithreaded cases
<arigato> yes
<arigato> just saying, if the original goal was to disable __del__s then in multithreaded programs you can't be sure another thread won't re-enable the GC
<arigato> which makes the 'subprocess' module basically buggy
<arigato> (there is gc.disable() in the subprocess module, for that reason)
<antocuni> ah, does subprocess use this pattern?
* antocuni looks
<arigato> meh, nothing we can really do
<arigato> ah, maybe the goal is to prevent __del__s from running in the child process, after os.fork(), and in the child there is only one thread..?
<antocuni> arigato: I'm reading subprocess.py:943
<antocuni> to me, it looks like they solved the bug using a workaround instead of a proper fix, which would have been to keep the file descriptor alive explicitly
<vstinner> arigato: /* We need to call gc.disable() when we'll be calling preexec_fn */ says _posixsubprocess
<arigato> it's hard to fix the bug properly, because the goal here is to mangle the file descriptors, precisely
<arigato> yes, on 3.x they moved that code to C but that doesn't fix all problems because preexec_fn() is still a python callable
<antocuni> anyway, this should still work in my branch, since the GC is disabled for a short period of time
<antocuni> arigato: I also remember that we often had to explain to people "no, gc.disable() doesn't actually disable the GC" in the past. That's one of the main reasons why I used that, because I thought it was really a missing feature
marky1991 has joined #pypy
<arigato> yes, it's probably a good idea
<antocuni> good :)
<arigato> I must say I wouldn't mind answering "gc.disable() disables the gc, so you can't complain about running out of memory"
<antocuni> yes, exactly
<antocuni> arigato: do you want to review it further or I can merge?
<arigato> rpython/memory/gc/incminimark.py:767
<arigato> you changed "if gen < 0" to "if gen <= 0"
<arigato> now the logic doesn't make sense, because there is "if gen == 1 and ...", 10 lines later
<arigato> I think the idea was that gc.collect(0) would run self.minor_collection_with_major_progress() too,
<arigato> and gc.collect(1) would run that plus start a major collection if none was running now
<arigato> and the undocumented gc.collect(-1) would run only self._minor_collection() which is dangerous
<antocuni> arigato: note that this is only for the rpython API, not the app-level one
<arigato> ah right, didn't notice that the "generation" argument was not passed down
<antocuni> I don't remember exactly what I thought when doing that commit, but I think I assumed that <0 was a typo because nobody ever calls rgc.collect(-1)
<arigato> right, but no
<arigato> the logic for the "gen" doesn't make sense now
<antocuni> why?
<arigato> I mean, I don't know if that was useful, but also, no point in changing this overcomplicated API to a different overcomplicated one
<arigato> because the "if gen == 1", 10 lines down
<arigato> now it's always true
<arigato> before, it wasn't always true
<antocuni> right
<arigato> you may remove the case "gen < 0" completely and argue that it is not tested and not used
<antocuni> I am also not very happy to abuse gc.collect to control the rgc, I just followed the existing style. But indeed, maybe it makes more sense to kill support for gc.collect and use more explicit rgc.* APIs?
<antocuni> I am not even sure that this logic is called by anyone
<arigato> in rpython? doesn't seem like such a big deal
<arigato> precisely
<arigato> ah, I see that collect_step() would be some "collect(1.5)" or something
<antocuni> so, a quick grep seems to reveal that in rpython gc.collect(something) is called only in tests
<arigato> ok
<antocuni> in particular, test_newgc.py:1486 says "nursery-only collection, if possible"
<arigato> ok I think go ahead and merge, maybe with the "if gen <= 0" reverted to "if gen < 0"
<arigato> I'm sure test_newgc passes either way, because it's hard to write very specific gc tests that will catch future unexpected tweaks
<arigato> that's why I tend to be wary of changes in the gc
<antocuni> I am proposing to just kill the argument to rgc.collect()
<arigato> well, then you're making test_newgc.py:1486 no longer test what it is supposed to, I guess
<arigato> maybe just add the comment "don't use an argument here, it's only for tests?"
<antocuni> now I am no longer sure what test_newgc wants to do; does it want to do a minor collection, or a minor_collection_with_major_progress?
<arigato> I think the test was written before we had *incremental* minimark
<arigato> I think the difference is not important: the test is expecting that most of the calls will be nursery-only
<antocuni> ok, so it's basically the same; but then I don't see why you propose to revert to the old logic "if gen < 0"; it seems this is the only place where it actually makes a difference
<antocuni> and the old logic didn't make much sense to me (was it written with gc.collect(-1) in mind?)
<arigato> I'm suggesting to either revert or kill that logic; just don't keep it as it is in the "gc-disable" branch, because it makes even less sense
<antocuni> because of the "if gen == 1"?
<arigato> yes
<antocuni> what about killing just that?
<arigato> ok wait, I can rewrite that function to make it explicit what the cases it's trying to implement are
<antocuni> maybe it's just better/simpler to use == everywhere instead of <=
<arigato> yes, and comments
<antocuni> not sure what the comments should say that isn't already in the docstring
Zaab1t has joined #pypy
* arigato writes
<antocuni> thanks :)
<arigato> pushed
<antocuni> our hg hooks no longer work it seems :(
<antocuni> arigato: indeed, it is much clearer now
<antocuni> thanks
<antocuni> arigato: are you fine with merging?
<arigato> yes
<antocuni> cool, thanks
xcm has quit [Remote host closed the connection]
<arigato> I think the real motivation to keep these cases around is to explain what really occurs if you try to call this or that function
<arigato> in that order
<cfbolz> arigato: I have a random GC related question: can you think of a hack to tell the GC to allocate something (eg a string or so) in the old generation immediately?
xcm has joined #pypy
<arigato> it's not enough to allocate it non-movable?
<cfbolz> no
<cfbolz> the reason I want it: json parsing of huge files breaks the generational hypothesis, because the whole loaded json data stays alive till after the parser is done
<antocuni> merged & pushed
* antocuni off, see you
<arigato> antocuni: cool
<arigato> see you
<cfbolz> arigato: that means everything is allocated in the nursery and then moved again
<cfbolz> antocuni: nice!
<arigato> cfbolz: so, no, there's nothing like that
<cfbolz> arigato: it would be a mess to get it, too, I suppose?
<arigato> not necessarily
<arigato> see "def malloc_fixed_or_varsize_nonmovable" in incminimark.py
<cfbolz> I see
<arigato> we need the same thing, but with "alloc_young=False"
<cfbolz> right
<cfbolz> but a bit of a mess to get there from high level code
antocuni has quit [Ping timeout: 250 seconds]
<arigato> yes, you'd need to follow malloc_fixed_or_varsize_nonmovable via gctransform etc.
<cfbolz> right
<cfbolz> arigato: and what the high level API would look like is very unclear
<arigato> probably then adding something in lltypesystem/rstr.py
<arigato> yes
<cfbolz> yes, ideally I would like it for all kinds of objects. but strings would be a start
<arigato> I guess you'd have some obscure rlib function that allocates a rstr.STR using lltype.malloc(..., nonmovable=True, old=True)
<arigato> and then uses hlstr() to return it as a regular rpython string
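A very rough sketch of the hypothetical helper being discussed; note that lltype.malloc() has no old=True flag today, that is exactly the piece that would have to be threaded through the GC and gctransform:

    from rpython.rtyper.lltypesystem import lltype, rstr
    from rpython.rtyper.annlowlevel import hlstr

    def alloc_old_string(length):
        # 'old=True' is hypothetical: allocate directly in the old generation
        raw = lltype.malloc(rstr.STR, length, nonmovable=True, old=True)
        return hlstr(raw)   # expose the low-level STR as a regular RPython string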
<cfbolz> Right
<cfbolz> Actually we do such obscurity in the json parser already
<arigato> of course if that string comes from, say, a slice of a larger string, it's even more mess
<arigato> or you make the small string, call the function, and forget the small string, to be collected at the next minor collection
<arigato> that sounds like it's removing half the benefits though
<cfbolz> arigato: we already do the slicing in interp_decoder.py manually at the low level, because we have a fast path for ASCII to utf-8
marky1991 has quit [Ping timeout: 272 seconds]
<cfbolz> arigato: strslice2unicode_latin1
<cfbolz> so we could integrate it there
<arigato> "ah"
<cfbolz> :-)
<cfbolz> it's all a bit terrible
<arigato> strslice2unicode_latin1_careful_Im_returning_an_OLD_string
<cfbolz> yes
<cfbolz> arigato: but eg in one example json file I have, it takes 11s to parse, of which 7s are in the GC(!)
<arigato> meh
<cfbolz> yes :-(
<arigato> do the numbers change a lot if you change the nursery size?
<cfbolz> arigato: no, why would they? the problem is that basically almost everything that is allocated is copied
<cfbolz> arigato: (boehm on the same example takes 6s in total)
<cfbolz> arigato: the resulting datastructure is a couple of GB, so I can't make the nursery big enough ;-)
<arigato> wondering what the cache effects are
<cfbolz> arigato: PYPY_GC_NURSERY is in kb?
<cfbolz> arigato: seems I can get a +-10% effect by playing with the values, yes
<arigato> no, PYPY_GC_NURSERY is in bytes unless you give a number followed by K/KB/M/MB
<arigato> but ok, "only" 10% is kind of what I was expecting
jacob22__ has quit [Ping timeout: 245 seconds]
<mjacob> xorAxAx: invited where?
jacob22__ has joined #pypy
themsay has joined #pypy
themsay has quit [Ping timeout: 250 seconds]
infinite has quit [Ping timeout: 268 seconds]
infinite has joined #pypy
holdenjd has joined #pypy
AMIGrAve has joined #pypy
<AMIGrAve> does anyone know of a postgresql driver that could be used with asyncio?
AMIGrAve has quit [Quit: leaving]
<xorAxAx> mjacob, at my place, for staying during the sprint
<xorAxAx> i have a guest room and a separate guest bed in the living room
Rhy0lite has quit [Quit: Leaving]
themsay has joined #pypy
Zaab1t has quit [Quit: bye bye friends]
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
xcm has quit [Read error: Connection timed out]
xcm has joined #pypy
Arfrever has quit [Quit: 御出で]
inhahe has quit [Ping timeout: 250 seconds]
inhahe has joined #pypy
adamholmberg has quit [Remote host closed the connection]
adamholmberg has joined #pypy
adamholmberg has quit [Ping timeout: 246 seconds]
jamesaxl has quit [Quit: WeeChat 2.2]
adamholmberg has joined #pypy
adamholmberg has quit [Remote host closed the connection]
adamholmberg has joined #pypy