jacob22__ has quit [Remote host closed the connection]
Rhy0lite has joined #pypy
<arigato>
snow, snow
adamholmberg has joined #pypy
* antocuni
wonders what's the result of "grep 'snow now' | wc" on the full log of #pypy
<antocuni>
*snow snow
xcm has quit [Killed (livingstone.freenode.net (Nickname regained by services))]
<antocuni>
arigato: I'd like to merge the gc-disable branch. The tests are passing and it has been used in production for quite a while now, it seems to work very well
xcm has joined #pypy
<antocuni>
are you interested in reviewing it, or should I just proceed?
<arigato>
antocuni: I can review it, at least check if there is no problem from my point of view
<antocuni>
yes please
<antocuni>
I am writing the documentation right now, but the TL;DR version is: gc.disable() disables major collections, then you call gc.collect_step() to run a single step manually
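A minimal sketch of how the branch's API is meant to be used, going only by the description above (the request loop and handle_request are hypothetical, and the exact return value of collect_step() isn't shown in this log):

    import gc

    gc.disable()               # on the gc-disable branch: stop major collections
    while still_serving():     # hypothetical latency-sensitive loop
        handle_request()       # no major-GC pause can happen in here
        gc.collect_step()      # run one step of the major GC at a convenient point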
<vstinner>
antocuni: i proposed to modify the multiprocessing module to no longer rely on the GC to clean up stuff. Antoine Pitrou replied that it goes against the language philosophy :-(
<Alex_Gaynor>
Which philosophy "Implicit is better than explicit"?
<vstinner>
i also proposed to avoid creating a reference cycle in xml.etree rather than fixing the GC https://bugs.python.org/issue35502#msg332051 but Serhiy only fixed treebuilder_gc_traverse(), he ignored my remark
rjarry has left #pypy ["Leaving"]
<vstinner>
my small contribution: i proposed a change to start emitting ResourceWarning when multiprocessing.Pool isn't closed/terminated explicitly: https://github.com/python/cpython/pull/10974
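The warning would target code like this, which today leans on the GC to reclaim the pool; closing and joining it explicitly avoids that (the worker function and pool size are just for illustration):

    import multiprocessing

    def square(x):
        return x * x

    if __name__ == '__main__':
        pool = multiprocessing.Pool(4)
        print(pool.map(square, range(10)))
        pool.close()   # release the worker processes explicitly
        pool.join()    # instead of relying on the GC / __del__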
<vstinner>
no idea if antoine pitrou is going to accept that one :)
<antocuni>
vstinner: well, from my POV it's obvious that Antoine Pitrou is wrong, because gvr was very explicit in saying that Python-the-language doesn't guarantee that destructors are called immediately
<antocuni>
(I can't find the reference to that though)
<vstinner>
antocuni: i was surprised how deeply multiprocessing relies on the GC
<antocuni>
(and/or that py3k contains bad design choices)
<vstinner>
Alex_Gaynor: currently, "Py_TYPE(obj) = type" has no way to return an error. i would like to fix Py_TYPE(), but if I add a new function, i would prefer to be able to report a failure
<vstinner>
no idea what happens if you do Py_TYPE(None) = &PyUnicode_Type ?
<Alex_Gaynor>
I assume at random later points, things crash
<vstinner>
antocuni: my socket.create_connection() fix is *one* example. it's easy to find way more of them. like the xml.etree bug
<vstinner>
antocuni: it's super complex to detect ref cycles during development, and worse in production
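One of the few tools that helps during development is gc's debug flags; a small illustration (the Node class is made up):

    import gc

    class Node(object):
        pass

    a, b = Node(), Node()
    a.other, b.other = b, a         # create a reference cycle
    del a, b

    gc.set_debug(gc.DEBUG_SAVEALL)  # keep everything the collector would have freed
    gc.collect()
    print(gc.garbage)               # the cycle's objects show up here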
<antocuni>
yes, sure. But a language which forces you to do something like that is, at least, obscure
<vstinner>
Alex_Gaynor: maybe it's ok if the crash is deterministic :-D
<Alex_Gaynor>
vstinner: it's probably exploitable
<vstinner>
antocuni: come on, it's the Python philosophy!
<vstinner>
antocuni: one obvious way to do something
<antocuni>
:)
lritter has quit [Ping timeout: 244 seconds]
jcea has joined #pypy
xcm has quit [Killed (barjavel.freenode.net (Nickname regained by services))]
<arigato>
antocuni: I'm not sure it's a good idea to use gc.disable() like you do
<arigato>
there are a few people on CPython that write their apps using gc.disable(), after making sure somehow that they don't have cycles
<arigato>
though admittedly, these apps would not run any __del__ on pypy right now, but that sounds less bad than "leaking at a fast rhythm"
<antocuni>
well, from some point of view you should not be surprised if you get a leak after you disable the GC
<antocuni>
but yes, I see your point
<arigato>
I agree, but "gc.disable" doesn't mean "disable the GC", for CPython it means something much more specific
<arigato>
I'm maybe worried about cases where some code does "gc.disable(); run some stuff; gc.enable()"
<arigato>
you'd hope that there is not too much stuff in the middle, but it might be large in some cases, then we get unexpectedly a "leak"
<antocuni>
well, this is a good example of why it's a good idea to use my semantics: you want to make sure that the GC doesn't run in a specific section of the code, and you are fine with some objects not being collected in the meantime
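The bracket being discussed looks roughly like this (the body of the critical section is hypothetical):

    import gc

    gc.disable()                  # CPython: suspend cyclic collection (and surprise __del__s)
    try:
        run_critical_section()    # keep this short; cyclic garbage accumulates meanwhile
    finally:
        gc.enable()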
<arigato>
right now, gc.disable() disables calling __del__ for a very practical reason
<antocuni>
although in pypy "some objects" is a much larger set of course
<antocuni>
which is?
<arigato>
on cpython, the point of gc.disable() and gc.enable() is that you're guaranteed to not have __del__ being called "randomly" in the middle by newly broken reference cycles
<arigato>
well, at least that's (I think) the main reason for calling gc.disable()/enable() around a piece of code
<antocuni>
ah. I thought it was to avoid unexpected pauses (or at least, that's what I always found reading code around)
<arigato>
ah, I see where you're coming from
<antocuni>
also, if the section between the disable() and the enable() is short (as it should be), the memory doesn't grow too much
<antocuni>
but yes, thinking about that it's likely that there is code around which will be hit by this change
<antocuni>
basically, we should decide whether we want to consider this case like we do for e.g. leaking file descriptors (you need to fix your code), or if we want to be super-conservative and try not to break existing code, at the expense of having a worse API
<antocuni>
(I don't like too much the idea of introducing yet another obscure API like gc.disable_majors() or so)
<arigato>
I suppose I'm fine with gc.disable() actually
<arigato>
because if you really leave the gc disabled for a long while, no destructors are called, and you get strange effects too
<antocuni>
I'll write a note in whatsnew to make sure to write something appropriate when we do the next release
<arigato>
I'm trying to think about multithreaded examples where one thread does gc.disable()/long-blocking-call/gc.enable(),
<arigato>
but that's a dummy thing to do because gc.disable/enable are not reentrant anyway
<antocuni>
why would you want to do that anyway? If you have a long-blocking call, the GC doesn't run anyway
<arigato>
no, it does in multithreaded examples
<arigato>
random other python code would run during the long blocking call
<antocuni>
ok, I thought you were proposing a pattern in which you disable the GC around the call *on purpose*; but I realize that you are probably talking about a case in which a thread disables the GC and then by chance it blocks
<arigato>
but that's probably not a good thing to do because you have no sane way to do gc.disable/enable that really works in multithreaded cases
<arigato>
yes
<arigato>
just saying, if the original goal was to disable __del__s then in multithreaded programs you can't be sure another thread won't re-enable the GC
<arigato>
which makes the 'subprocess' module basically buggy
<arigato>
(there is gc.disable() in the subprocess module, for that reason)
<antocuni>
ah, does subprocess use this pattern?
* antocuni
looks
<arigato>
meh, nothing we can really do
<arigato>
ah, maybe the goal is to prevent __del__s from running in the child process, after os.fork(), and in the child there is only thread..?
<arigato>
...only one thread
<antocuni>
arigato: I'm reading subprocess.py:943
<antocuni>
to me, it looks like they solved the bug using a workaround instead of a proper fix, which would have been to keep the file descriptor alive explicitly
<vstinner>
arigato: /* We need to call gc.disable() when we'll be calling preexec_fn */ says _posixsubprocess
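From memory, the Python-level pattern being referred to is roughly this shape (simplified, not a verbatim quote of subprocess.py):

    import gc, os

    gc_was_enabled = gc.isenabled()
    gc.disable()                  # keep __del__s from closing fds in the child after fork()
    try:
        pid = os.fork()
        # child goes on to mangle fds and exec; parent falls through
    finally:
        if gc_was_enabled:
            gc.enable()           # parent restores the previous collector state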
<arigato>
it's hard to fix the bug properly, because the goal here is to mangle the file descriptors, precisely
<arigato>
yes, on 3.x they moved that code to C but that doesn't fix all problems because preexec_fn() is still a python callable
<antocuni>
anyway, this should still work in my branch, since the GC is disabled for a short period of time
<antocuni>
arigato: I also remember that we often had to explain to people "no, gc.disable() doesn't actually disable the GC" in the past. That's one of the main reasons I used it: I thought it was really a missing feature
marky1991 has joined #pypy
<arigato>
yes, it's probably a good idea
<antocuni>
good :)
<arigato>
I must say I wouldn't mind answering "gc.disable() disables the gc, so you can't complain about running out of memory"
<antocuni>
yes, exactly
<antocuni>
arigato: do you want to review it further or I can merge?
<arigato>
rpython/memory/gc/incminimark.py:767
<arigato>
you changed "if gen < 0" to "if gen <= 0"
<arigato>
now the logic doesn't make sense, because there is "if gen == 1 and ...", 10 lines later
<arigato>
I think the idea was that gc.collect(0) would run self.minor_collection_with_major_progress() too,
<arigato>
and gc.collect(1) would run that plus start a major collection if none was running now
<arigato>
and the undocumented gc.collect(-1) would run only self._minor_collection() which is dangerous
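Restating that reading as a sketch (this paraphrases the discussion, not the actual incminimark source; the attribute and helper dealing with the major collection are assumed names):

    def collect(self, gen):
        if gen < 0:
            self._minor_collection()                        # undocumented, dangerous
            return
        self.minor_collection_with_major_progress()         # what gc.collect(0) does
        if gen == 1 and not self.major_collection_running:  # attribute name assumed
            self.start_major_collection()                   # helper name assumed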
<antocuni>
arigato: note that this is only for the rpython API, not the app-level one
<arigato>
ah right, didn't notice that the "generation" argument was not passed down
<antocuni>
I don't remember exactly what I thought when doing that commit, but I think I assumed that <0 was a typo because nobody ever calls rgc.collect(-1)
<arigato>
right, but no
<arigato>
the logic for the "gen" doesn't make sense now
<antocuni>
why?
<arigato>
I mean, I don't know if that was useful, but also, no point in changing this overcomplicated API to a different overcomplicated one
<arigato>
because the "if gen == 1", 10 lines down
<arigato>
now it's always true
<arigato>
before, it wasn't always true
<antocuni>
right
<arigato>
you may remove the case "gen < 0" completely and argue that it is not tested and not used
<antocuni>
I am also not very happy to abuse gc.collect to control the rgc, I just followed the existing style. But indeed, maybe it makes more sense to kill support for gc.collect and use more explicit rgc.* APIs?
<antocuni>
I am not even sure that this logic is called by anyone
<arigato>
in rpython? doesn't seem like such a big deal
<arigato>
precisely
<arigato>
ah, I see that collect_step() would be some "collect(1.5)" or something
<antocuni>
so, a quick grep seems to reveal that in rpython gc.collect(something) is called only in tests
<arigato>
ok
<antocuni>
in particular, test_newgc.py:1486 says "nursery-only collection, if possible"
<arigato>
ok I think go ahead and merge, maybe with the "if gen <= 0" reverted to "if gen < 0"
<arigato>
I'm sure test_newgc passes either way, because it's hard to write very specific gc tests that will catch future unexpected tweaks
<arigato>
that's why I tend to be wary of changes in the gc
<antocuni>
I am proposing to just kill the argument to rgc.collect()
<arigato>
well, then you're making test_newgc.py:1486 no longer test what it is supposed to, I guess
<arigato>
maybe just add the comment "don't use an argument here, it's only for tests"?
<antocuni>
now I am no longer sure what test_newgc wants to do; does it want to do a minor collection, or to do a minor_collection_with_major_progress?
<arigato>
I think the test was written because we have *incremental* minimark
<arigato>
I think the test was written before we had *incremental* minimark
<arigato>
I think the difference is not important: the test is expecting that most of the calls will be nursery-only
<antocuni>
ok, so it's basically the same; but then I don't see why you propose to revert to the old logic "if gen < 0"; it seems this is the only place where it actually makes a difference
<antocuni>
and the old logic didn't make much sense to me (was it written with gc.collect(-1) in mind?)
<arigato>
I'm suggesting to their revert or kill that logic; just don't keep it as it is in the "gc-disable" branch because it makes even less sense
<arigato>
gah
<arigato>
I'm suggesting to either revert or kill that logic; just don't keep it as it is in the "gc-disable" branch because it makes even less sense
<antocuni>
because of the "if gen == 1"?
<arigato>
yes
<antocuni>
what about killing just that?
<arigato>
ok wait, I can rewrite that function to make it explicit what the cases it's trying to implement are
<antocuni>
maybe it's just better/simpler to use == everywhere instead of <=
<arigato>
yes, and comments
<antocuni>
not sure what the comments should say that isn't already in the docstring
Zaab1t has joined #pypy
* arigato
writes
<antocuni>
thanks :)
<arigato>
pushed
<antocuni>
our hg hooks no longer work it seems :(
<antocuni>
arigato: indeed, it is much clearer now
<antocuni>
thanks
<antocuni>
arigato: are you fine with merging?
<arigato>
yes
<antocuni>
cool, thanks
xcm has quit [Remote host closed the connection]
<arigato>
I think the real motivation to keep these cases around is to explain what really occurs if you try to call this or that function
<arigato>
in that order
<cfbolz>
arigato: I have a random GC related question: can you think of a hack to tell the GC to allocate something (eg a string or so) in the old generation immediately?
xcm has joined #pypy
<arigato>
it's not enough to allocate it non-movable?
<cfbolz>
no
<cfbolz>
the reason I want it: json parsing of huge files breaks the generational hypothesis, because the whole loaded json data stays alive till after the parser is done
<antocuni>
merged & pushed
* antocuni
off, see you
<arigato>
antocuni: cool
<arigato>
see you
<cfbolz>
arigato: that means everything is allocated in the nursery and then moved again
<cfbolz>
antocuni: nice!
<arigato>
cfbolz: so, no, there's nothing like that
<cfbolz>
arigato: it would be a mess to get it, too, I suppose?
<arigato>
not necessarily
<arigato>
see "def malloc_fixed_or_varsize_nonmovable" in incminimark.py
<cfbolz>
I see
<arigato>
we need the same thing, but with "alloc_young=False"
<cfbolz>
right
<cfbolz>
but a bit of a mess to get there from high level code
antocuni has quit [Ping timeout: 250 seconds]
<arigato>
yes, you'd need to follow malloc_fixed_or_varsize_nonmovable via gctransform etc.
<cfbolz>
right
<cfbolz>
arigato: and what the high level API would look like is very unclear
<arigato>
probably then adding something in lltypesystem/rstr.py
<arigato>
yes
<cfbolz>
yes, ideally I would like it for all kinds of objects. but strings would be a start
<arigato>
I guess you'd have some obscure rlib function that allocates a rstr.STR using lltype.malloc(..., nonmovable=True, old=True)
<arigato>
and then uses hlstr() to return it as a regular rpython string
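Spelling out that guess (purely hypothetical: the old= flag does not exist today, and the malloc keywords are assumed):

    from rpython.rtyper.lltypesystem import lltype, rstr
    from rpython.rtyper.annlowlevel import hlstr

    def alloc_old_str(length):
        # hypothetical: ask the GC for a non-moving string allocated outside the nursery
        lls = lltype.malloc(rstr.STR, length, nonmovable=True, old=True)
        return hlstr(lls)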
<cfbolz>
Right
<cfbolz>
Actually we do such obscurity in the json parser already
<arigato>
of course if that string comes from, say, a slice of a larger string, it's even more of a mess
<arigato>
or you make the small string, call the function, and forget the small string, to be collected at the next minor collection
<arigato>
that sounds like it's removing half the benefits though
<cfbolz>
arigato: we already do the slicing in interp_decoder.py manually at the low level, because we have a fast path for ASCII to utf-8