<cfbolz>
buhman, njs: well, pypy3 is not that speed-tuned yet. please file a bug
amaury has quit [Ping timeout: 255 seconds]
<njs>
buhman: well, it's true that trio's subprocess support will hopefully end up with simpler code paths than asyncio's, but if bytearray is sluggish then that's important and needs fixing :-)
<njs>
bytearray.extend is on the critical path of pretty much any reasonable py3 network server
<njs>
yeah, it's a bit mysterious. the bytearray thing is just a guess - can vmprof give line-by-line profiles, or does that even mean anything once the JIT gets involved?
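A quick, illustrative way to check the bytearray.extend guess on both interpreters (the chunk and total sizes here are arbitrary, not taken from any real server):
    # Rough microbenchmark for bytearray.extend; run under CPython and pypy3 and compare.
    import timeit

    def grow(chunk=b"x" * 16384, total=1 << 24):
        buf = bytearray()
        while len(buf) < total:
            buf.extend(chunk)   # the call suspected to be on the critical path
        return buf

    print(timeit.timeit(grow, number=20))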
<gabrielm>
Hi! I studied the json library a bit and tried to reduce its memory consumption using memory_pressure. From my tests, the speed is the same but memory consumption can drop by 2x. This is related to this issue: https://bitbucket.org/pypy/pypy/issues/1124
<gabrielm>
Is it OK to create a pull request with these modifications?
<arigato>
yes
<arigato>
where did you add memory_pressure?
<gabrielm>
in the file _pypyjson/interp_decoder.py, in the loads() method, after decoder = JSONDecoder(space, s)
<gabrielm>
for loads
<arigato>
uh, and by how much?
<gabrielm>
down to 50%
<arigato>
no, I mean
<gabrielm>
aaa
<arigato>
if you call add_memory_pressure() you need to give it an argument
<gabrielm>
rgc.add_memory_pressure(len(s))
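For context, a sketch of the change being described; everything except the rgc.add_memory_pressure() line is paraphrased from pypy/module/_pypyjson/interp_decoder.py, not quoted:
    from rpython.rlib import rgc

    def loads(space, w_s):
        s = space.str_w(w_s)                 # argument unwrapping paraphrased
        decoder = JSONDecoder(space, s)      # the existing line mentioned above
        rgc.add_memory_pressure(len(s))      # the proposed addition
        try:
            return decoder.decode_any(0)     # existing decoding call, paraphrased
        finally:
            decoder.close()                  # releases the string deterministically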
<arigato>
that doesn't seem to make sense, because there is a try: finally: that will release the string deterministically
marr has joined #pypy
<gabrielm>
in the finally there is a call to the close method that does free some things, but perhaps there are more intermediate objects that could be freed
<arigato>
no, in RPython normally objects are managed by the GC, which knows when to free things
<arigato>
it's possible that there is another place where memory pressure would be useful, and adding add_memory_pressure() here happens to help with that other place in this case
<arigato>
or, it's also possible that RPython's GC is not correctly handling this case
<gabrielm>
in my workload I do only json operations
jamescampbell has quit [Remote host closed the connection]
<gabrielm>
I'm talking about this workload:
<gabrielm>
for l in open('data.txt'):
<gabrielm>
    j = json.loads(l)
<arigato>
it probably doesn't change much, but try first to close the file
<LarstiQ>
gabrielm: there is no dispute that your change doesn't lower the memory
<arigato>
ah, it's read one line at a time, anyway
<gabrielm>
3.9 GB
<gabrielm>
yes
cwillu_at_work has quit [Ping timeout: 260 seconds]
<arigato>
note that if you replace "j = json.loads(l)" with "pass"
<arigato>
you get very similar results
<antocuni>
so, basically, the only effect of add_memory_pressure is to cause major collections more often?
<arigato>
yes, but in this precise case, it's unrelated to json
<gabrielm>
hmm, so there is a problem with iterating a file line by line?
squeaky_pl has joined #pypy
<arigato>
at least for me, if I comment out the whole "for" loop, then I get 70MB instead of 80MB
<arigato>
unsure where the extra 10MB go, but I wouldn't be surprised at all if it's this:
<arigato>
run the very short-running program, and it won't even fill the heap enough to trigger a major collection
<arigato>
run the longer-running program, and garbage accumulates until major collections occur
<antocuni>
gabrielm: no, it's an artifact of how the GC works; from the point of view of the OS, the memory continuously grows until you do a major collection, which shrinks it again. If you run major collections more often, you see lower peaks
<antocuni>
gabrielm: but the total amount of memory actually used by your program does not change
<gabrielm>
I see
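To see the effect antocuni describes, one can force extra major collections by hand and compare peak RSS across two runs of the same workload (a sketch; data.txt and the COLLECT_EVERY knob are made up for illustration):
    # Run once with COLLECT_EVERY unset and once with e.g. COLLECT_EVERY=10000;
    # the live data is identical, but the peak RSS of the second run is lower.
    import gc, json, os, resource

    COLLECT_EVERY = int(os.environ.get("COLLECT_EVERY", "0"))

    with open("data.txt") as f:
        for n, line in enumerate(f):
            j = json.loads(line)
            if COLLECT_EVERY and n % COLLECT_EVERY == 0:
                gc.collect()            # force a major collection
    print("maxrss (KB):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)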
<arigato>
it's unlikely that reading a file line-by-line creates garbage in the form that add_memory_pressure() helps with
<arigato>
can you give us an example where there really is a 2x drop?
<arigato>
not just 10%
<antocuni>
arigato: I think it's a 2x in the sense that the increment is 12.3 MiB vs 28.7 MiB
<arigato>
uh, ok
<arigato>
then no, this way of measuring doesn't make sense on pypy
<arigato>
if your program happens to consume 1GB of memory before, then you might see memory grow by 500MB before any major collection occurs
<arigato>
which is normal
<arigato>
and indeed, the add_memory_pressure() that you added has the effect of triggering the major collection after 250MB of extra memory instead of 500MB
<arigato>
because it says to the GC, "there are N extra bytes of memory that you don't see", in addition to the N bytes of the string that it does see (the largest object in this case)
* antocuni
--> lunch
<arigato>
an equivalent way to achieve the same result is to tweak the environment variable PYPY_GC_MAJOR_COLLECT
<gabrielm>
I understand
<arigato>
see rpython/memory/gc/incminimark.py
<gabrielm>
so that issue can be solved?
<arigato>
the default of 1.82 was chosen because it's a good balance between memory and speed
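A toy model of that knob, based only on what is said above (the real logic lives in rpython/memory/gc/incminimark.py):
    # A major collection is triggered roughly when the memory the GC accounts for
    # reaches (memory alive after the last major collection) * PYPY_GC_MAJOR_COLLECT.
    def should_major_collect(gc_seen_bytes, reported_pressure,
                             live_after_last_major, factor=1.82):
        # reported_pressure models what rgc.add_memory_pressure() accumulates:
        # bytes the GC cannot see by itself
        threshold = live_after_last_major * factor
        return gc_seen_bytes + reported_pressure >= threshold

    # with 1 GB live after the last major collection, the heap can grow to ~1.82 GB
    print(should_major_collect(1.5e9, 0, 1e9))      # False
    print(should_major_collect(1.5e9, 0.5e9, 1e9))  # True: pressure triggers it sooner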
cwillu_at_work has joined #pypy
<arigato>
well, my point is that I wonder: the issue says 300MB, but the small example you gave today is lower than 100MB
antocuni has quit [Ping timeout: 246 seconds]
<arigato>
can you reproduce the 300MB of catalinif?
<arigato>
but indeed, /bin/time is probably giving more precise results
<gabrielm>
thank you!
<arigato>
yes, "maxresident" seems reasonable
<gabrielm>
should I try to continue with this?
nimaje1 has joined #pypy
nimaje1 is now known as nimaje
nimaje has quit [Killed (rajaniemi.freenode.net (Nickname regained by services))]
<arigato>
yes, I think an interesting remaining question is:
<arigato>
open('data.txt').readlines()
<arigato>
not on the whole 3.7 GB file
<arigato>
but on the smaller, 34 MB example from the original report of issue #1124
<arigato>
this line seems to consume 10x more memory in PyPy than in CPython
<arigato>
(not to mention, it's very much slower on pypy)
<arigato>
(but that may be only a side-effect of it consuming so much more memory)
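A minimal script for that comparison, to be run under both CPython and PyPy (the file name is a placeholder; ru_maxrss is in KB on Linux):
    import resource, sys

    path = sys.argv[1] if len(sys.argv) > 1 else "small.json"   # the ~34 MB example
    lines = open(path).readlines()
    print(len(lines), "lines")
    print("maxresident (KB):", resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)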
<gabrielm>
ok, so this means that this issue is not related to json
<arigato>
exactly, or at least not any more
<arigato>
(maybe it was related to json too in 2012)
<gabrielm>
OK, I will have a look at this and I will post an update on the issue
<arigato>
thanks!
<arigato>
it doesn't really make sense that memory usage is so much higher, given that it should only build a list of strings
Ubuntu-BR has quit [Ping timeout: 255 seconds]
Guest1244 has joined #pypy
<arigato>
any extra memory that it uses should be garbage that goes away while the list is being built, not garbage that stays alive until the list is fully built...
<arigato>
RPython is too clever and thinks, oh, a loop which adds one item to the list
<cfbolz>
yes
<cfbolz>
even though the if is super rare
<cfbolz>
arigato: so what do we do?
<cfbolz>
kill this opt if there is an if?
<arigato>
I don't know :-/
<arigato>
maybe
<arigato>
yes, if there is a path through the loop which doesn't contain an append() then we shouldn't do it
<arigato>
we need to be careful about exceptions though
<arigato>
but there are usually paths that exit the loop
<cfbolz>
:-(
<cfbolz>
arigato: maybe it's worth printing a few more of the graphs where the optimization applies and checking them by hand
antocuni has joined #pypy
<arigato>
yes
marr has quit [Ping timeout: 240 seconds]
gabrielm has joined #pypy
cstratak has quit [Quit: Leaving]
cstratak has joined #pypy
Tiberium has quit [Remote host closed the connection]
<arigato>
annoying stuff
<arigato>
new_shape = [s for s in cur_shape if s != 1]
<arigato>
in micronumpy
<arigato>
now crashes translation
<arigato>
and I fear in many other places it will be less efficient, unless we manage somehow to return a non-resizable list
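For readers following along, a pure-Python paraphrase of what the optimization effectively does to new_shape = [s for s in cur_shape if s != 1], and why a filtering "if" leaves the list over-allocated (an illustration, not the actual RPython transformation):
    def filtered_copy(cur_shape):
        items = [None] * len(cur_shape)   # preallocated to the upper bound
        length = 0
        for s in cur_shape:
            if s != 1:                    # the filtering path
                items[length] = s
                length += 1
        return items, length              # the over-allocated storage is kept
    # In readlines() the same shape appears with one loop iteration per character
    # but one append per line, so the preallocated storage is enormous.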
<gabrielm>
@arigato: I saw what you discovered earlier about file.readlines
<gabrielm>
how could that be fixed?
<arigato>
yes, I meant to tell you
<arigato>
we need to somehow tweak the optimization
<arigato>
it's not clear how, yet
Rhy0lite has joined #pypy
<arigato>
it turned out to be a real issue with a detail of the RPython optimizer
<gabrielm>
one of my silly ideas was to rewrite the readlines code to fix this specific issue here, but it would be nice to fix this on a global scale
<arigato>
maybe using a "while" loop instead of the "for" loop would be enough to side-step the optimization
jacob22_ has quit [Quit: Konversation terminated!]
<arigato>
but yes, of course it would be nice to fix this more generally
<gabrielm>
I was thinking of counting the number of '\n', writing a for loop over that count, and inside it another for loop to iterate over the chars
<LarstiQ>
gabrielm: that sounds slow?
<arigato>
that's RPython, so it's not slow
<gabrielm>
the append would be after the inner for
<arigato>
it requires two passes over the data instead of one, but well
<arigato>
so yes, that's also a solution. again, a general fix is much better :-)
<gabrielm>
doesn't len() pass over all the data?
<arigato>
len(string)? no, that's constant-time
<gabrielm>
oh, I see
<arigato>
(rpython strings are not C "char *")
<LarstiQ>
collect the indexes of \n in one loop, then append those slices?
<LarstiQ>
or is that what gabrielm meant
<arigato>
no, that would be the same problem
<arigato>
the list of indexes that you collect would be pre-allocated
<LarstiQ>
meh
<arigato>
that's why we should find a general fix :-)
* LarstiQ
nods
<gabrielm>
I didn't mean that
<LarstiQ>
gabrielm: I see now how the count gets around the preallocation
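gabrielm's two-pass idea, spelled out as a sketch (plain Python, using str.index where he described an inner character loop):
    def split_lines(data):
        nlines = data.count('\n')         # first pass: the exact count is known
        lines = [''] * nlines             # so no over-allocation is possible
        start = 0
        for i in range(nlines):
            end = data.index('\n', start) + 1
            lines[i] = data[start:end]
            start = end
        if start < len(data):             # trailing text without a final newline
            lines.append(data[start:])
        return lines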
<gabrielm>
where is the place to start for this problem?
<arigato>
I won't say "don't touch", but it's in rpython/translator/simplify.py, a large class ListComprehensionDetector that does a lot of careful checks and that works in conjunction with the flowgraph (producing unoptimized input) and the rtyper (consuming optimized output)
<arigato>
so it's not really a place I would recommend to start looking inside rpython internals
Guest1244 has quit [Ping timeout: 245 seconds]
<gabrielm>
I will take a look for my curiosity :D
<LarstiQ>
arigato: how general is the problem, preallocating lists when doing element comparisons but storing ranges?
<arigato>
:-)
<LarstiQ>
or handling them
<LarstiQ>
right, now the micronumpy crash makes sense
* LarstiQ
talks more to himself and gets back to dayjob
<arigato>
the numpy crash makes sense: it crashes because the list new_shape = [s for s in cur_shape if s != 1] is supposed to be non-resizable
<arigato>
that's a property that is used a bit later, and also, a property that we'd like to keep
<arigato>
it crashes if I just completely disable the optimization
<antocuni>
plan_rich: I think that vmprof-0.4.8 will NOT go into the 5.8 release, thus probably the "if pypy_version_info > (5, 8, 0)" in vmprof needs to be updated accordingly
<plan_rich>
antocuni, ok
<antocuni>
thanks
<plan_rich>
pypy_version_info > (5, 8, 0) should work because the release is 5.8.0 right?
* plan_rich
checks the release branch of pypy
<antocuni>
ah indeed
yuyichao has joined #pypy
cstratak has joined #pypy
amaury has joined #pypy
amaury has quit [Ping timeout: 240 seconds]
cstratak has quit [Remote host closed the connection]
<mattip>
it doesn't have any "stand out" features, just lots of bug fixes and minor performance enhancements
cstratak has quit [Read error: Connection reset by peer]
<antocuni>
mattip: I think we should still find a couple of features to highlight in the blog post, else it looks like there is no difference from 5.7
<antocuni>
I see in the whatsnew that we had a lot of new cpyext features: did they allow us to run some new C package?
<antocuni>
this feature is maybe also worth mentioning in the blog post, IMHO: "Add native PyPy support to profile frames in vmprof"
<antocuni>
and, if we want something else to highlight, the improvements from faster-rstruct-2 could be stressed more (although I'm biased :)). Currently they are listed as "Speed up struct.pack, struct.pack_into"
<mattip>
antocuni: ok, thanks. not sure what the status is with pybind11, or other modules that previously failed to run
<mattip>
antocuni: can we put a number on that speedup somehow?
amaury has joined #pypy
<antocuni>
but actually, it did more: now struct.unpack{,_from} is super-fast also on raw buffers and bytearray (before it was only for strings)
<antocuni>
mattip: sure, let me write some quick microbench
realitix has quit [Quit: Leaving]
Guest1244 has quit [Ping timeout: 260 seconds]
<kenaan>
mattip default 68056ea67737 /pypy/doc/release-v5.8.0.rst: add more highlights to the top-level of the release notice (hopefully accurate?)
<mattip>
is the statement "We can now profile native frames in vmprof, even in JITted code" accurate?
<antocuni>
so, struct.unpack on numpypy arrays is 10x faster
<mattip>
wow
<antocuni>
on array.array and bytearray it's ~2x faster
<antocuni>
and pack_into on bytearray is 6x faster (but I measured also 8x-10x if you increase the N)
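For reference, a microbenchmark of the same shape (a guess at the setup; N and the format string are arbitrary, and this is not antocuni's actual script):
    import struct, timeit

    N = 100000
    buf = bytearray(8 * N)

    def pack_all():
        for i in range(N):
            struct.pack_into('d', buf, 8 * i, float(i))

    def unpack_all():
        total = 0.0
        for i in range(N):
            total += struct.unpack_from('d', buf, 8 * i)[0]
        return total

    print('pack_into  :', timeit.timeit(pack_all, number=20))
    print('unpack_from:', timeit.timeit(unpack_all, number=20))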
<arigato>
plan_rich: no, sys.pypy_version_info > (5,8,0) is true on 5.8.0 because it's actually a longer tuple
<arigato>
that's why you should never compare versions like that, but only using < or >=
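Concretely (values illustrative; pypy_version_info has the same shape as sys.version_info):
    version = (5, 8, 0, 'final', 0)     # what a 5.8.0 release reports
    print(version > (5, 8, 0))          # True -- not what the vmprof check intended
    print(version >= (5, 8, 0))         # True, and correct for "5.8.0 or newer"
    print(version >= (5, 9, 0))         # False -- compare with >= / < boundaries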
<antocuni>
mattip: I think that "Add native PyPy support to profile frames in vmprof" is more accurate, because native frames work only outside the JIT
<mattip>
antocuni: thanks, editing
<antocuni>
also, something I just thought of: the work I did for faster-rstruct-2 was paid for by Gambit (the company I do consultancy for). Would it be acceptable to mention them as a sponsor in the blog post?
<antocuni>
mattip: they said they are ok with putting a link, please proceed
amaury has joined #pypy
<kenaan>
arigo default 6cbfa075de61 /rpython/: Don't preallocate RPython lists if we only know an upper bound on the final size. See comments.
pilne has joined #pypy
amaury has quit [Ping timeout: 255 seconds]
vkirilichev has joined #pypy
<kenaan>
mattip default ad3681ed2701 /pypy/doc/release-v5.8.0.rst: mention Gambit and actual struct speed improvement
<mattip>
antocuni: better?
<cfbolz>
arigato: fix looks good. We maybe should still check what is affected
<kenaan>
antocuni default 3873f54fed26 /pypy/doc/release-v5.8.0.rst: fix the name of Gambit Research
<antocuni>
mattip: yes, thanks; I fixed the name of Gambit to Gambit Research
<mattip>
antocuni: +1
<kenaan>
mattip default f3d63bdc598d /pypy/doc/release-v5.8.0.rst: fix link
marky1991_2 has joined #pypy
<arigato>
I don't know if the release document mentions it, but one of the main reasons for pushing this release was that we fixed several instabilities
marky1991 has quit [Ping timeout: 240 seconds]
<arigato>
cfbolz: right, I'm producing now a list of places where it changes something...
<cfbolz>
arigato: I assume we will laugh and laugh
<arigato>
yes, I started already
<cfbolz>
:-)
<arigato>
look at (rpython.rlib.rbigint:291)frombytes
<arigato>
it makes a copy of 'digits' after the loop, but if it didn't, then this would build 'long' objects with a wildly over-allocated list in them
DragonSA has joined #pypy
DragonSA has quit [Changing host]
DragonSA has joined #pypy
<arigato>
(pypy.module._locale.interp_locale:22)_fixup_ulcase creates 3 small lists that are also over-allocated to 256
<cfbolz>
:-(
<kenaan>
mattip default 325ce0ce6e7d /pypy/doc/release-v5.8.0.rst: mention the shadowstack issue that motivated the early release, other tweaks
<arigato>
antocuni: clearly, it needs to know if it must raise an RPython-level OverflowError
<arigato>
in which case it must build the result in a much more complicated way, and *then* ignore it
<antocuni>
ouch, indeed
<arigato>
a possible strength reduction would be to detect int_mul_ovf(_, CONST) where the result is not used, and replace it with a range check...
<antocuni>
I didn't think of the OverflowError
<antocuni>
so in theory, if I use "i*100" instead of x*100 it should be able to detect that it cannot overflow
<antocuni>
but apparently we don't have this optimization because I still see the int_mul_ovf
<antocuni>
uh? Even if I do "i+42", I get an int_add_ovf
<arigato>
uh, I'd guess it's missing a detail, like it doesn't have a lower bound for i (so it could be very negative)
<arigato>
ah
<antocuni>
so it means that "for i in range(2000)" doesn't produce the intbounds? :(
<arigato>
that's strange
<antocuni>
indeed, if I add assert i<100, i*100 produces an int_mul
<arigato>
ah, no, it cannot easily do that
<cfbolz>
antocuni: the range size cannot propagate into the loop, because it's allocated before
<arigato>
yes, the loop only knows it's a range(x) for an unknown x
<antocuni>
I see
<arigato>
(but not a three-argument range, which builds a different class, or something like that; the details changed in PyPy3 too)
<antocuni>
so maybe we should add an "assert return_value < self.length()" inside W_RangeObject.next?
<arigato>
no, that's already information that the jit has got
<arigato>
but self.length() is not a constant
<antocuni>
I see
<arigato>
try to replace "i+42" with "i+1"
<arigato>
it should get rid of the int_add_ovf
<arigato>
because the jit knows that i is smaller than x, which is also a regular integer
<antocuni>
indeed, I don't see an easy solution to that, although it's a bit disappointing that "for i in range" makes worse code than "while i < N"
<arigato>
"while i < 2000" precisely, yes
<antocuni>
arigato: nope, even i+1 produces an int_add_ovf
<arigato>
uh
<arigato>
note also that "while i < 2000" won't get rid of int_mul_ovf, because i can be a large negative number
<antocuni>
arigato: then we have a bug
<antocuni>
because if I insert an assert i < 2000 I get an int_mul
<antocuni>
ah, maybe because it already knows it is >0?
<arigato>
ah no, subtle
<arigato>
do you get also int_mul in the preamble?
<antocuni>
yes
<arigato>
ok, no clue then
<antocuni>
anyway, I'm off for today
<antocuni>
thanks :)
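A reconstruction of the loops being compared above (the actual benchmark may differ):
    # With plain range(2000) the JIT only knows "i < x" for a non-constant x, so
    # i*100 keeps its overflow check (int_mul_ovf).  Adding "assert i < 2000" gives
    # it a constant upper bound (range() already implies i >= 0), and antocuni
    # reports the trace then shows a plain int_mul.
    def with_range():
        total = 0
        for i in range(2000):
            total += i * 100
        return total

    def with_assert():
        total = 0
        for i in range(2000):
            assert i < 2000
            total += i * 100
        return total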
vkirilichev has quit [Remote host closed the connection]
antocuni has quit [Ping timeout: 268 seconds]
Guest1244 has quit [Ping timeout: 255 seconds]
vkirilichev has joined #pypy
amerel_h has joined #pypy
amerel_h has left #pypy ["Leaving"]
Guest1244 has joined #pypy
nimaje has joined #pypy
jamesaxl has quit [Read error: Connection reset by peer]
jamesaxl has joined #pypy
vkirilichev has quit [Remote host closed the connection]
vkirilichev has joined #pypy
arigato has quit [Quit: Leaving]
amaury has joined #pypy
oberstet has quit [Ping timeout: 255 seconds]
jamesaxl has quit [Read error: Connection reset by peer]
ssbr has quit [Ping timeout: 260 seconds]
jamesaxl has joined #pypy
amaury has quit [Ping timeout: 255 seconds]
oberstet has joined #pypy
DragonSA has quit [Remote host closed the connection]
Rhy0lite has quit [Quit: Leaving]
vkirilichev has quit [Remote host closed the connection]
vkirilichev has joined #pypy
jamesaxl has quit [Quit: WeeChat 1.7.1]
ESphynx has joined #pypy
<ESphynx>
hi guys, so if we're doing this: from build_module1 import FFI, ffi_module1
<ESphynx>
does that automatically imply linking with that other module ?
<ESphynx>
i.e. with ffi_module2.include(ffi_module1)
Guest1244 has quit [Ping timeout: 268 seconds]
tbodt has joined #pypy
Guest1244 has joined #pypy
tbodt has quit [Max SendQ exceeded]
<ESphynx>
Hmm, now this: NotImplementedError: 'struct Size' is opaque in the ffi.include(), but no longer in the ffi doing the include (workaround: don't use ffi.include() but duplicate the declarations of everything using struct Size)
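A guess at the kind of setup that produces that error, as a minimal sketch (the struct fields are invented, and in a real project these would be two separate build scripts):
    from cffi import FFI

    ffi_module1 = FFI()
    ffi_module1.cdef("struct Size;")                  # opaque in module1
    ffi_module1.set_source("_module1", "struct Size { int w, h; };")

    ffi_module2 = FFI()
    ffi_module2.include(ffi_module1)
    # completing the struct here, after the include, is what triggers the
    # NotImplementedError quoted above when this ffi is processed
    ffi_module2.cdef("struct Size { int w, h; };")
    ffi_module2.set_source("_module2", "struct Size { int w, h; };")
    ffi_module2.compile()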
tbodt has joined #pypy
tbodt has quit [Client Quit]
vkirilichev has quit [Remote host closed the connection]
tbodt has joined #pypy
vkirilichev has joined #pypy
marky1991_2 is now known as marky1991
marky1991 has quit [Changing host]
marky1991 has joined #pypy
vkirilichev has quit [Remote host closed the connection]