cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
ccamel has joined #pypy
camelCaser has quit [Ping timeout: 240 seconds]
shunning has quit [Remote host closed the connection]
gutworth has quit [Changing host]
gutworth has joined #pypy
xcm has quit [Ping timeout: 240 seconds]
jcea has quit [Remote host closed the connection]
jcea has joined #pypy
xcm has joined #pypy
jcea has quit [Quit: jcea]
andi- has quit [Remote host closed the connection]
andi- has joined #pypy
dddddd has quit [Remote host closed the connection]
jvesely has quit [Quit: jvesely]
forgottenone has joined #pypy
_whitelogger has joined #pypy
<arigato> shunning (for the logs): it's typically a trace limit issue
<arigato> but it's more indirect: the limit is not the length of produced machine code, but the number of intermediate-language instructions before optimization takes place
<arigato> intermediate-language instructions are called ResOps
<arigato> so every call seems to make a lot of ResOps
<arigato> moreover, the JIT needs to decide early on (before any optimization) if each call is going to be inlined or not
<arigato> so if there are too many calls, trying to inline them all creates too many ResOps, and the JIT doesn't realize at this point that they are cheap or completely optimizable ResOps
<arigato> as a result, the JIT gives up inlining, and instead writes a sequence of residual calls (calls that will end up in the assembly as real calls)
<arigato> every residual call appears much cheaper to the JIT before optimization, because it's essentially just a few preparation ResOps followed by one ResOp doing the call
<arigato> however, that ResOp doing the call will not be optimized later, and it will end up in the assembly---as quite a number of actual assembler instructions
<arigato> but the early JIT-before-optimizations only sees it as "one"
<arigato> maybe we could tweak our counting heuristic
<arigato> e.g. we could count as "0" the "getfields", because they are cheap and can be removed in many places, including the backend, if the result is not used;
<arigato> and on the other hand we could count as more than "1" a few instructions like the residual calls
<arigato> it won't really fix the problem you mention, which is that a function calling too many other functions (even empty) will end up not inlining them all, but it would help
<arigato> in your case, if you're anyway playing tricks with generating code, you could as well directly generate the final function ab() with actual operations instead of calls to many small functions
<arigato> a function call, relatively speaking, is very expensive on CPython and on PyPy-before-JIT, and also on the JIT compiler itself (and plays badly with some heuristics like the trace limit); it's only after JITing that it becomes cheap
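A minimal sketch of the code-generation approach arigato suggests above, with hypothetical placeholder statements and names (update_blocks, s): emit one flat ab() body instead of many tiny functions plus calls, so the tracer never has to decide about inlining.

    # placeholder operations; a real generator would emit whatever the small
    # functions f1()..fN() would have contained
    update_blocks = [
        "s.out = s.in_ + 1",
        "s.acc = s.acc + s.out",
    ]
    src = "def ab(s):\n" + "".join("    %s\n" % stmt for stmt in update_blocks)
    namespace = {}
    exec(src, namespace)
    ab = namespace["ab"]   # one flat function: one trace, no residual calls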
tsaka__ has joined #pypy
Ai9zO5AP has quit [Ping timeout: 240 seconds]
Ai9zO5AP has joined #pypy
<cfbolz> arigato: they are doing hardware simulation, btw
<cfbolz> It's the same group that did pydgin
<arigato> cool!
<arigato> bah I hate Ruby
<arigato> d = {"a": 5}
<arigato> {:a=>5}
<arigato> d["a"] == nil
<arigato> true
<arigato> d[:a]
<arigato> 5
<cfbolz> 🤔
<cfbolz> Eh, issue 3105
<mattip> ?
<cfbolz> Let's interpret marshal bytes without looking at the implementation!
tsaka__ has quit [Ping timeout: 264 seconds]
<arigato> uh
<cfbolz> The tool is cool though
<arigato> any idea why he can't use marshal.load() or loads()?
<arigato> want me to answer?
<cfbolz> arigato: because the base python might be different
<arigato> right
<cfbolz> It's really a project that can understand random pyc files
<arigato> I don't understand the bug report though
<arigato> is it just "the pyc files happen to be different in some detail but they load fine anyway"?
<arigato> or is he, like the title says, seeking help in understanding a detail, and it's not a bug report at all?
* phlebas wonders why arigato doesn't like the ruby code
<phlebas> seems perfectly clear to me, symbols and strings are different things?
<arigato> phlebas: yes, but the syntax {"a": 5} seems to use a string key, whereas it's actually a symbol
<phlebas> but that is a syntax for symbols
<phlebas> ah, you're confused about dict syntax, because python :)
<arigato> so my complaint is that "a" is a syntax for symbols when used inside {} and strings when used inside []
<phlebas> no, not true
<phlebas> {"a" => 1} is syntax for ruby dictionaries
<phlebas> what you're using is syntactic sugar for symbol keys
<phlebas> the quotes around `a` are useless in your code, but allowed
<arigato> ah! thanks. but then my complaint is that "a" is syntax for symbols when used in this syntactic sugar
<arigato> i.e. why are quotes allowed here
<phlebas> but you could have a key {"a b c": 1}
<phlebas> it's to deal with the spaces
<arigato> ah, and then you read it via d[:"a b c"] ?
<phlebas> yes
<arigato> OK, thanks
<arigato> still obscure but less so than before, thanks :-)
<phlebas> yes, all the syntactic sugar they put into ruby ends up being just confusing
<cfbolz> phlebas: hah
tsaka__ has joined #pypy
<cfbolz> arigato: yes, I must say that 'here I am in the bytes' is a very confusing perspective
antocuni has joined #pypy
lesshaste has joined #pypy
lesshaste is now known as Guest90013
<mattip> arigato: the latest mail in the SOABI saga suggests using pypy36-pp#-x86_64-linux-gnu.so
<mattip> where # is a C-API-specific number
<mattip> I am not sure how that would play with wheel naming, which is supposed to be {package name}-{version}-{python tag}-{abi tag}-{platform tag}.{extension}
Guest90013 has quit [Ping timeout: 245 seconds]
Guest90013 has joined #pypy
<mattip> where the abi tag on pypy is taken from sys.pypy_version_info[:2]
<arigato> we could argue that it's bogus
<arigato> it would certainly be annoying if we need to call our pypy versions 7.2.1, 7.2.2, 7.2.3, 7.2.4 forever until we change the abi of cpyext
<arigato> on the other hand, it's wrong in the right direction, so any new version of pypy will make a new abi tag and too bad
<arigato> I don't really care to be honest, but people uploading wheels to pypi will end up being annoyed in one or two years
<mattip> searching for pypy packages on pypi turns up ~6,000 packages
<mattip> not sure how to find the ones using c-extension modules and shipping as wheels
<arigato> I'm fine if we still ignore all this, and keep the names as they are now. in a way it's safer if the abi version changes too often, better than not changing the abi version and forgetting about some rarely-used case
<arigato> in any case we'd like to use a different scheme for cffi modules
<arigato> which should be much more stable
<mattip> +1
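For reference, a quick way to inspect the values discussed above on a given interpreter (purely introspective; the exact wheel tag formats are decided by the packaging tools, and the example SOABI value assumes the rename mattip lands later in this log):

    import sys, sysconfig

    print(sysconfig.get_config_var("SOABI"))       # e.g. 'pypy36-pp73' after the rename
    print(sysconfig.get_config_var("EXT_SUFFIX"))  # filename suffix used for extension modules
    if hasattr(sys, "pypy_version_info"):
        print(sys.pypy_version_info[:2])           # what the pypy abi tag is derived from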
antocuni has quit [Ping timeout: 276 seconds]
lritter has joined #pypy
<kenaan> mattip default 037744e9abc7 /pypy/: move test that used to use _ssl to _demo
<kenaan> mattip py3.6 50cb48b9e6e5 /pypy/: merge default into branch
<mattip> it would be nice if someone could check that 037744e9abc7 is testing what it says it wants to test
shunning has joined #pypy
<shunning> arigato: thanks for the reply. I think there must be some changes in pypy between python2 and python3. In python2.7 we were able to inline everything. For example, ab() can contain 500 function calls. We compile pypy with the big model and remove the trace-limit assertion. All of these 500 functions get inlined without function call overhead.
<shunning> Right now I suspect there is some other code that still uses the original limit, like 16k or something
<shunning> or 10k in this case
<shunning> If the trace is too long then I will see the "trace_abort" count going up, but in this f17() case, it seems like the JIT engine under the hood decides to give up inlining everything, which is weird.
<shunning> It should be able to do the same thing as without f17
<shunning> the actual performance on our workload is 10x-20x faster on python2 than on python3
shunning has quit [Ping timeout: 260 seconds]
fryguybob has joined #pypy
fryguybob has quit [Client Quit]
fryguybob has joined #pypy
antocuni has joined #pypy
forgottenone has quit [Quit: Konversation terminated!]
jcea has joined #pypy
jcea has quit [Remote host closed the connection]
Dejan has joined #pypy
shunning has joined #pypy
jcea has joined #pypy
dddddd has joined #pypy
<kenaan> mattip py3.6 c4659943384c /pypy/module/imp/: use an extension name like pypy36-pp73 not pypy3-73 to prevent conflicts between py3.6, py3.7
<kenaan> mattip py3.6 117749e404fd /rpython/translator/platform/darwin.py: port some of the changes from macports pypy https://github.com/macports/macports-ports/blob/master/lang/pypy/files/...
<Dejan> float->str conversion is superexpensive
shunning has quit [Remote host closed the connection]
<Dejan> it takes 90s to write 100M float numbers to a file (with buffering)
<Alex_Gaynor> There's a reason binary encodings are popular :-)
shunning has joined #pypy
<shunning> hey guys, first of all sorry for creating confusion for python2 vs python3
<shunning> I downloaded pypy2-v5.7.1 binary and the problem is still there
<shunning> I simplified the code a little bit more. Now I have two code snippets that seem very similar, but they result in different traces
<shunning> If I wrap all of the code into a function to create a local scope, the problem is the same
shunning has quit [Remote host closed the connection]
<Dejan> I have a feeling that pypy does float->str faster than C
<cfbolz> Dejan: than libc, yes
<Dejan> cfbolz, first i thought i am dreaming
<cfbolz> Dejan: pypy's float to str is written in C though
<Dejan> it seems like 2x faster than libc
<Dejan> and ~ 3-4x faster than CPython
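A rough sketch for reproducing Dejan's measurement; the path and N are arbitrary, and this times Python-level float-to-string conversion plus buffered writes, nothing more:

    import time

    N = 10 ** 7                        # Dejan's figure was 100M; scale as needed
    t0 = time.time()
    with open("/tmp/floats.txt", "w") as f:
        x = 0.0
        for _ in range(N):
            f.write(repr(x))
            f.write("\n")
            x += 0.1
    print("wrote %d floats in %.1fs" % (N, time.time() - t0))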
jcea has quit [Remote host closed the connection]
jcea has joined #pypy
<antocuni> mattip: ping
oberstet has joined #pypy
jacob22 has quit [Quit: Konversation terminated!]
<gsnedders> Dejan: there are quicker algorithms that don't always produce the shortest form (which is the "correct" result), most obviously Grisu2; some of them can detect whether their result is incorrect (like Grisu3, where you can then fall back to libc or whatever if it's wrong)
<Dejan> gsnedders, thanks - i will look for those
<gsnedders> Dejan: I don't actually know about PyPy's implementation, though
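A small illustration of the "shortest form" property gsnedders refers to; on any modern CPython or PyPy, repr() already produces the shortest string that round-trips:

    x = 0.1
    print(repr(x))               # '0.1' -- shortest round-tripping string
    print("%.17g" % x)           # '0.10000000000000001' -- full precision, not shortest
    print(float(repr(x)) == x)   # True: the short form converts back exactly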
<mattip> antocuni: ping
<mattip> antocuni: my long term plan for numpy, scipy at least is to get the wheels built by the projects.
<mattip> but they would need to use manylinux2010
<mattip> so it would be good to make sure pypy-wheels can do that
<antocuni> mattip: yes, that's on my todo list
<antocuni> but I didn't find time to work on it lately
<mattip> what can I do?
<antocuni> so, the current pypy-wheels scripts support using different docker images
<arigato> shunning: I'm not seeing any difference between pypy2 and pypy3 on these two snippets of code
<antocuni> for example, look at docker/build_image.sh: the last line builds an ubuntu-based image (there were also centos6-based images but they are commented out)
<antocuni> so, a first step is to use the pypy-manylinux2010 image as a base
<mattip> +1
<mattip> anything else I can help with?
<arigato> shunning: I think I can explain exactly why the 2nd gistfile becomes much slower after enabling more than 17 function calls, whereas the 1st does not (both on pypy2 and pypy3)
<antocuni> mattip: step 0 is to update this to include pypy 7.2: https://github.com/pypy/manylinux
<antocuni> step 1 is to tweak pypy-wheels to use pypy/manylinux as a base image
<antocuni> which needs some tweaks: for example, pypy/manylinux already includes pypy installations, while pypy-wheels installs them
<mattip> right
<antocuni> step 2 is to build wheels using the new image: e.g., look at scripts/run.sh: it runs "build_wheel.sh" inside the ubuntu-based image
<antocuni> we also need to decide whether we want to keep ubuntu-only wheels, or to drop them in favor of manylinux2010
<mattip> +1 for manylinux2010
<mattip> and pushing people to install pip 19.2+
<antocuni> one drawback is that at the moment there are people who rely on https://antocuni.github.io/pypy-wheels/ubuntu
<antocuni> so their build would suddenly stop working
<antocuni> mattip: one option is to build ubuntu wheels for 7.2, and start building only manylinux2010 for 7.3, which will include an updated version of pip
<mattip> why would it break, does the CI run remove all the files and start over or just upload new ones?
<antocuni> it just uploads new ones. But e.g. as soon as numpy releases a new version, pip won't find the wheel and will try to compile from source
<antocuni> but indeed, it's not much different than now, since pypy 7.2 has been around for a while WITHOUT wheels
tsaka__ has quit [Ping timeout: 264 seconds]
<mattip> ok, so for the next few months we can do both builds
<antocuni> I am fine doing ubuntu-only for <=7.1 and manylinux-only for >=7.2
<arigato> mattip: I think 037744e9abc7 is fine and checking the original thing that it was meant to test
<kenaan> arigo default b50c4326c73f /pypy/module/_demo/: Import posix directly instead of os; import nt on win32
tsaka__ has joined #pypy
<mattip> arigato: thanks
<mattip> fwiw, if projects start producing pypy binary wheels, they will most likely be manylinux2010
<mattip> so user should just update their pip anyway
<mattip> IMO
<Dejan> gsnedders, this looks awesome
<antocuni> mattip: true, but I am sure that someone won't read the instructions and will file bugs, complain, or stop using pypy because it can't install numpy. So if we offer a pypy which "just works", that is better, IMHO
<mattip> we could release 7.2.1 or 7.3 with the SOABI and ensurepip changes soon if we want
<antocuni> sounds good to me
waldhar has quit [Quit: ZNC 1.7.4 - https://znc.in]
<gsnedders> Dejan: there's also Ryu, first published last year, which is quicker still, that I had forgotten about
<mattip> the wheel renaming anyway is done by auditwheel, I think we can run "auditwheel repair --plat manylinux1_x86_64"
<mattip> even on manylinux2010 images
<antocuni> that would be a lie, though
<mattip> why? doesn't it RPATH and package what it needs?
<antocuni> only certain libs. Other libs (such as libc) are guaranteed to be present, and to be at least a certain version: that's why they needed to introduce a new manylinux2010 instead of continuing to use manylinux1
<mattip> ok, but in any case we can leave that as a possible "step 3"
<antocuni> OTOH, pypy is known not to compile on manylinux1 systems, so it won't be a problem in practice
<antocuni> but I think it's much better to do the right thing and just ship an updated pip
<mattip> +1
<mattip> step 4 would be to add an azure run for windows, and add macos to either travis or azure
<antocuni> or, even better, to convince projects to build their own wheels
<mattip> do you have analytics in the github pages? If we could tell projects "we get 100 downloads a week of your package" it might be more convincing
<mattip> step 5 :)
<antocuni> I never did anything to actively keep analytics. Maybe github does, but I don't know if we can have it
<arigato> does this mean it's fine if I break ABI compatibility now? :-)
<arigato> there is a horrible inefficiency in cpyext_unicodeobject.h
<antocuni> there is a "traffic" page, but it doesn't show counter for individual gh-pages urls. The README of the repo had 57 unique visitors, FWIW
<arigato> cpython uses bitfield flags, and instead we use a 32(!)-bit integer for each flag
<mattip> arigato: fine by me. Anyway the current HEAD(s) are marked 7.3, so the next version will be "rebuild the world"
<arigato> OK
<mattip> especially if it makes things faster
<arigato> not that huge a difference, it will replace 3 words (8 bytes each) with 1 in each unicode cpyext object
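To illustrate the size difference being discussed, here is a ctypes sketch; the field names and widths follow CPython's PyASCIIObject state bitfield, not the exact cpyext layout:

    import ctypes

    class FlagsAsInts(ctypes.Structure):
        # one full 32-bit integer per flag, as in the old cpyext headers
        _fields_ = [(name, ctypes.c_uint)
                    for name in ("interned", "kind", "compact", "ascii", "ready")]

    class FlagsAsBitfield(ctypes.Structure):
        # CPython packs the same flags into a handful of bits
        _fields_ = [("interned", ctypes.c_uint, 2), ("kind", ctypes.c_uint, 3),
                    ("compact", ctypes.c_uint, 1), ("ascii", ctypes.c_uint, 1),
                    ("ready", ctypes.c_uint, 1)]

    print(ctypes.sizeof(FlagsAsInts))      # 20 bytes
    print(ctypes.sizeof(FlagsAsBitfield))  # 4 bytes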
oberstet has quit [Ping timeout: 240 seconds]
YannickJadoul has joined #pypy
<YannickJadoul> mattip, antocuni: It'd be great to include PyPy in cibuildwheel. It's been on my TODO/wish list for a while now, but once there are manylinux2010 images with pypy binaries, I think it should be fairly straightforward
<antocuni> YannickJadoul: do you know this? https://github.com/pypy/manylinux
<arigato> YannickJadoul: great, thanks
<YannickJadoul> Yes, but I haven't taken the time to figure it out, yet
<antocuni> basically it's a manylinux2010 image plus pypy binaries
<YannickJadoul> So in principle, we should even be able to just use that already existing pypy manylinux image?
<YannickJadoul> Except that it's currently a slightly older version
<Alex_Gaynor> antocuni: does pypy have ABI stability across releases? I can't remember, but I think not.
<YannickJadoul> And for Windows/macOS, we can just download binaries, I suppose? There's no real catch there?
<YannickJadoul> (I actually need to go soon, but this is the cibuildwheel issue: https://github.com/joerick/cibuildwheel/issues/143)
<antocuni> YannickJadoul: yes, we need to update the image to include pypy 7.2. Moreover, there is no automatic rebuild, so if the manylinux2010 image is updated, pypy/manylinux is not
<mattip> maybe I can try to issue a PR against pypa/manylinux
<mattip> using the portable pypy
shunning has joined #pypy
Rhy0lite has joined #pypy
<antocuni> Alex_Gaynor: it's a mess. AFAIK we never changed our SOABI, but at the same time we never tried to be actively compatible between releases, so pypy happily loaded .so files from previous versions but you risked segfaults. However in practice it was not a big problem because wheels contained the pypy version in their tag
<YannickJadoul> mattip: That would be even easier, indeed
<antocuni> but I think that mattip recently fixed the SOABI issue, didn't he?
oberstet has joined #pypy
<Alex_Gaynor> antocuni: having something like "abi3" for pypy would be good -- that's the blocker for pyca stuff to do pypy wheels, since right now we'd need to build wheels for many pypy versions to get useful coverage
<antocuni> mattip: so the idea is to make a new PR to pypa/manylinux every time we do a new release? And what about older releases?
<mattip> just like cpython, a PR for every release
<mattip> it just puts another binary into the image, not very heavy
<antocuni> mattip: ok
<antocuni> Alex_Gaynor: yes but the problem is that whenever you do cpyext changes/optimizations, you risk breaking the ABI
<Alex_Gaynor> antocuni: yes, indeed. hpy would support a stable ABI, right?
<antocuni> Alex_Gaynor: possibly yes, although it's not one of the first immediate goals
<mattip> numpy, scipy, ... do not build with the stable API
YannickJadoul has quit [Quit: Leaving]
<antocuni> Alex_Gaynor: I suppose that in theory with hpy it would be possible to build wheels which work with both pypy and cpython, but they would probably be slower
<Alex_Gaynor> antocuni: yes, we don't care about that :-) We just want "some number of ABIs to support which is less than the total number of PyPy releases"
<antocuni> Alex_Gaynor: what about just supporting the latest N releases of PyPy?
<shunning> arigato: back. How would you explain the difference between the two snippets? I was confused at the beginning and attributed the difference to python2 vs python3, which turned out to be incorrect.
<Alex_Gaynor> antocuni: maybe that's sufficient
<arigato> shunning: like I said already, what occurs is that when recording traces we reach the limit, apparently after 18 calls. So if there are fewer than 18 calls, the trace goes through the optimizer, and it removes mostly everything. If there are more, then the tracing is aborted and the next time it traces it will not inline the functions any more
<mattip> ahh, I remember where I stopped with pypa/manylinux and pypy
<shunning> but why would the other trace work?
<antocuni> mattip: yes?
<mattip> we need portable builds, and they are only available on x64
<shunning> arigato: it seems to me that these two code snippets should work identically
<mattip> not aarch64
<mattip> so I was investigating moving our buildbots to a centos6 chroot/docker
shunning has quit [Remote host closed the connection]
<mattip> manylinux2014 will support aarch64
<antocuni> +1 for having a docker based build infrastructure
shunning has joined #pypy
<mattip> antocuni: you must be talking to my friends. They all like docker more than chroot
<mattip> :)
<antocuni> well, docker has the big advantage that your build environment is version-controlled like everything else. Currently, our buildbots work because someone manually installed the required dependencies
<antocuni> and nobody knows precisely what they are, I suppose :)
<mattip> it is a matter of turning the README-chroot into a Dockerfile
<arigato> shunning: I'm confused, by "work" you mean "gives good performance"?
<antocuni> shunning: try to run your 18-calls example with "pypy --jit trace_limit=16384"
<antocuni> mattip: yes, exactly. Automatic scripts are better than documentation which people need to apply manually
<mattip> +1
<arigato> shunning: ah, you mean, why does the limit appear to be lower for one of the gist than the other
<shunning> arigato: yes mean gives good performance
<arigato> shunning: the reason is that we have an optimization for global lookups, which lets 'f1' to 'f20' produce constant function objects (with a special kind of out-of-line guard). This means all the information we get on the function objects to call is constant-folded, during tracing already
<arigato> but we don't have a similar operation for closure variables
<shunning> yeah I tried your newdict hack and it works
<arigato> cool
<shunning> i used newdict for the global/local I passed to exec and it's gone
<arigato> OK
<shunning> so basically what happened was, passing user-level dictionaries to exec results in those trampolines
<arigato> so even though closure variables can only be modified by the "nonlocal" keyword or when the outer function is still running, we don't detect these cases
<arigato> so closure variables => trampolines because the tracer doesn't know they are constant functions,
<arigato> and similarly using a regular dict as globals => trampolines for the same reason
<shunning> yeah
oberstet has quit [Ping timeout: 276 seconds]
<shunning> then I tried very hard to remove global/local dictionaries that got passed to exec by capturing them in the closure
<arigato> yes, that makes sense now
<arigato> but indeed the closure trick is not helping here
<shunning> yeah, thanks for the constant folding information
<shunning> since we always operate at 100k-1M optimized trace size, we always use the opencoder big model and remove the "trace limit too high" assertion
<arigato> (or I guess the closure tricks helps only a little bit---passing a regular dictionary as globals is still slower because it needs to really do a dictionary lookup every time, as well)
<arigato> OK, so in this case it was generating traces that are much longer than previously, which can explain a major slow-down
<shunning> yeah!
<cfbolz> arigato: we should have a pypy3 optimization that recognizes if no nonlocal inner function exists?
<shunning> so i guess i will live with the newdict approach for now
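For reference, a minimal sketch of that newdict approach, with placeholder generated source; on PyPy, __pypy__.newdict("module") gives exec() a module-like globals dict, so the global-lookup optimization arigato described can apply to the generated functions:

    try:
        from __pypy__ import newdict
        glob = newdict("module")   # module-like dict: global lookups become constant-foldable
    except ImportError:
        glob = {}                  # plain dict fallback (e.g. on CPython)

    src = "def f1(): return 1\ndef ab(): return f1()\n"   # placeholder generated code
    exec(src, glob)
    ab = glob["ab"]
    print(ab())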
<arigato> cfbolz: and what? mark the closure objects as "dead" when the outer function exits?
<arigato> or "frozen"
<cfbolz> arigato: something like that, yes
<cfbolz> Bit obscure
<arigato> with a new bytecode FREEZE_CELL emitted for all cells except the ones with a "nonlocal" somewhere...
<shunning> by the way, is it possible that we have a pypy module called "pymtl" in the upstream pypy repo?
<arigato> no, what if the exit is by exception...
<shunning> we have a whole bunch of mixed modules that we developed for "simulation-jit" co-optimization
<arigato> shunning: we could consider it, but you need anyway to rebuild a custom pypy to get the big model jit, right?
<shunning> hmmm
<shunning> good point ...............
<shunning> i guess we are the only guys who stress pypy this hard
<arigato> or at least in that particular direction...
<arigato> maybe the big model should be dynamically enabled, but it's a bit of a mess to express that in RPython
<antocuni> shunning: what do you use pypy for?
<shunning> arigato: well i would guess the big model instantiates some variables as 4 bytes but the default one allocates 2 bytes?
<arigato> yes
<shunning> antocuni: we were thinking about submitting a pypy blog on this but I was too busy with life last year
<arigato> so we'd need to use RPython tricks to make a (hopefully small) part of the JIT present twice
<shunning> this was a paper we published last year but I was too busy to follow up since last fall
<antocuni> arigato: yes, the trace limit is something which we need to think about, eventually. I have seen many real-world cases in which the heuristic simply doesn't work. E.g. for capnpy-generated code, all the field lookups are designed to be optimized away by the JIT: but if you have too many field lookups in a loop, the trace becomes too long and you start emitting tons of code which could be happily optimized away
<antocuni> shunning: interesting, thanks
* antocuni off, will read the logs later
<shunning> the idea is that we have a list of functions we call for billions of iterations
<shunning> to optimize the performance, we basically create hardware datatypes in rpython and expose them to python level. On the other hand, we create loops to make the JIT happy
<shunning> for example, the first observation we had was an O(N^2) scaling bottleneck when you do ``for x in funcs: x()``
<shunning> antocuni: maybe figure 3 and chapter 4 are more relevant to you guys. other chapters are for hardware people
<Alex_Gaynor> if the list of functions in `funcs` is mostly constant, you can probably get a big boost by invoking it recursively
<shunning> the list of functions will always be the same
antocuni has quit [Ping timeout: 252 seconds]
<shunning> Alex_Gaynor: Alex, what we did was compile a function to unroll it
<Alex_Gaynor> that also works
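Not an existing PyPy API, just a sketch of one way to read Alex_Gaynor's suggestion: if the list never changes, fold it once into a fixed chain of closures, so each call site is a constant for the JIT instead of an iteration over a list (for thousands of functions you would want to chunk this to keep the call depth reasonable):

    def make_chain(funcs):
        # returns a zero-argument callable that calls every function in 'funcs'
        if not funcs:
            return lambda: None
        head, tail = funcs[0], make_chain(funcs[1:])
        def call_all():
            head()
            tail()
        return call_all

    # build once, call many times ('update_functions' and 'n_cycles' are hypothetical):
    # run_tick = make_chain(update_functions)
    # for _ in range(n_cycles): run_tick()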
<shunning> then we found that the trace after bridge was too long
<Alex_Gaynor> I had an idea many years ago about how to expose a utility to app level an "unroll loop" annotation, but I never wrote it
<shunning> so we did dont_trace_here in the middle
njs has left #pypy ["Leaving"]
<shunning> dont_trace_here provides extremely scalable performance
<arigato> maybe we could export a helper pypyjit.invoke_unroll(list-of-functions)
<arigato> equivalent to "for f in list: f()"
<arigato> but unrolling just that loop
<arigato> or call it "pypyjit.map_unroll()"
<shunning> arigato: hmm i mean only if it's widely used would you consider exposing that ...
<shunning> arigato: also do you think there is any interesting work to do when working with huge traces?
<arigato> we're fine with adding small special-case helpers in the pypyjit module... but I'm unsure it would really help here
<shunning> One example we had was that we had to hack it
jvesely has joined #pypy
<shunning> to allocate huge pages
<shunning> because sometimes we have megabytes of traces
<arigato> ...or maybe doing in advance "g = pypyjit.serialize(list-of-functions)", and when you invoke g() it actually invokes all functions...
<Alex_Gaynor> arigato: I suppose `g()` could even take arguments and pass them to each function
<arigato> yes
<arigato> shunning: you'd really need gigabytes of traces for MAP_HUGETLB to make a difference, no?
<shunning> i don't remember how much we had, but the memory usage was usually around 4GB when we execute it
<shunning> we used linux perf and the iTLB miss rate was 40%
<arigato> ooouch
<shunning> :) it's a LOT of traces
<shunning> imagine the code snippet I sent out has f1 to f5000, and each of them has 50 bytecodes
<arigato> Intel cpus are not optimized for running extremely long, linear pieces of assembly... who knew :-)
<shunning> each of the function might contain if statements too
tsaka__ has quit [Ping timeout: 264 seconds]
oberstet has joined #pypy
<arigato> ronan: is there an official way to clear the cache at rpython/_cache/, after we make a change in module/cpyext/parse/cpyext_*.h ?
<ronan> arigato: no, AFAIK
<arigato> it's removed the next time we run rpython/bin/rpython, I think
<arigato> and it's removed in the nightly buildbots
<arigato> so it's probably fine
<kenaan> arigo py3.6 1c74a232abde /pypy/module/cpyext/: Turn these flags, containing each 1 to 3 bits, from "unsigned int" to "unsigned char"
<kenaan> arigo py3.6 98fbd0ef00cc /pypy/module/cpyext/unicodeobject.py: I *think* that's all we need to fix these XXX for sizeof(wchar_t)==2
shunning has quit [Remote host closed the connection]
i9zO5AP has joined #pypy
Ai9zO5AP has quit [Ping timeout: 250 seconds]
tsaka__ has joined #pypy
<kenaan> rlamy py3.6 800619cae177 /lib-python/3/test/test_bdb.py: fix test
Rhy0lite has quit [Quit: Leaving]
<arigato> I'm 90% convinced that there was a memory leak in cpyext/unicodeobject
<arigato> whenever we do 'set_data(py_obj, data)' it's with data = rffi.str2charp(..)
<arigato> but we never free that
<mattip> and now we do free it?
<arigato> well, I'm trying to fix it
<mattip> ahh, cool
<arigato> I'm also very confused by _PyUnicode_Ready() which seems to work differently than in CPython---e.g. in CPython it computes the max unichar and assigns the correct .kind, but in PyPy it just assumes that .kind is already set
<arigato> ...no, confusion
<arigato> the logic is a bit different but in the end it does the same thing as CPython in some cases, and crashes cleanly in other cases
<mattip> there is probably a lot that can be done to clean up the code, I am not sure it got a good review after the utf8 refactor
<mattip> but I think the unicode_dealloc was meant to free the set_data data
<mattip> but of course it doesn't do that
<arigato> trying to copy CPython's logic...
<arigato> (which is a total mess)
<arigato> I am quite confused because we have a leak detector that's supposed to work
<mattip> isn't it disabled on cpyext tests?
<arigato> ...yes, indeed
<arigato> someone still calls start_tracking_allocations() unexpectedly in the middle
<arigato> so now I'm fighting the reverse bug whereby trying to free from dealloc() makes the leakfinder complain that it was never allocated
<arigato> OK what
<arigato> it's rpython/conftest.py, when running pypy/module/cpyext/test/...
<arigato> and it fires really at an unexpected time
jacob22 has joined #pypy
jacob22 has quit [Remote host closed the connection]
jacob22 has joined #pypy
<mattip> :(
<arigato> no, it seems it was really a bug in my code
<kenaan> arigo py3.6 19641c3cf073 /pypy/module/cpyext/unicodeobject.py: memory leak: c_data is assigned a str2charp() that was never freed
antocuni has joined #pypy
<antocuni> Alex_Gaynor: about app-level unrolling, look at this; it's not a proper solution and it's more like a hack, but it works: https://bitbucket.org/antocuni/pypytools/src/default/pypytools/unroll.py
lritter has quit [Ping timeout: 252 seconds]
i9zO5AP has quit [Quit: WeeChat 2.5]
lritter has joined #pypy
lritter has quit [Ping timeout: 240 seconds]
shunning has joined #pypy
<shunning> arigato: i took a step back and thought about it. It's the exec semantics change that led to this situation. I mean, for cpython, a global/local dict is still a dictionary, but for pypy, using a user-level dictionary vs __pypy__.newdict can have different performance implications. Do you think there are cases where we cannot lower the user-level
<shunning> dictionary that got passed to exec and compile it down to an internal dict?
shunning has quit [Remote host closed the connection]