cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
CrazyPython has joined #pypy
CrazyPython has quit [Read error: Connection reset by peer]
dansan has joined #pypy
dansan has quit [Excess Flood]
dansan has joined #pypy
krono has quit [Quit: Connection closed for inactivity]
CrazyPython has joined #pypy
CrazyPython has quit [Read error: Connection reset by peer]
CrazyPython has joined #pypy
CrazyPython has quit [Read error: Connection reset by peer]
lritter has quit [Ping timeout: 268 seconds]
lritter has joined #pypy
adamholmberg has quit [Remote host closed the connection]
lritter has quit [Ping timeout: 240 seconds]
rubdos has quit [Ping timeout: 276 seconds]
rubdos has joined #pypy
jcea has quit [Quit: jcea]
dddddd has quit [Ping timeout: 250 seconds]
mattip has joined #pypy
<mattip> profiling a random slow py3.6 test (zipimport/test/test_zipimport.py -k test___spec which runs for 400 secs)
<mattip> it seems we call _pytest.runner.call_runtest_hook 3 times, each time takes ~136 secs
<mattip> ahh, I see, once with 'setup', once with 'call' and once with 'teardown'
jvesely has quit [Quit: jvesely]
<mattip> maybe there is a way to cache something there
<arigato> energizer: also, measure, even on CPython. They improved the baseline memory usage, so slots are less useful nowadays there too
<wleslie> there's an ACM SIGPLAN/OOPSLA 2018 talk by a Remigius Meier I just stumbled upon and I feel really silly for not knowing about this work
<wleslie> it's about using virtual address mapping as read barriers to maintain assumptions in the jit
<wleslie> is there a thread somewhere I can pull and find out more? and are there any runtimes that make use of this?
<mattip> another hot spot: unicodehelper.fsdecode is called 5700 times, which is taking ~140 secs
<arigato> wleslie: it's about using special address mappings for STM, really
<wleslie> I figured it would be dual purpose. Azul did something similar with their concurrent GC when they first ported to x86.
<arigato> maybe, but that's mostly not relevant for Pythons with a GIL
<antocuni> mattip: I'm also investigating slow tests; I think one of the biggest culprits is _frozen_importlib, which is implemented at applevel now
<wleslie> that's why I was wondering if any of the other runtimes ended up using it (my gut guesses pixie, but who knows). is it still a feature available from rpython, or has it bitrotten?
<antocuni> so for example, in test_cpyext:LeakCheckingTest:preload_builtins, you preload/import mmap and types, so it takes forever
ronan has quit [Ping timeout: 246 seconds]
<arigato> wleslie: it's still in a branch
danchr_ has joined #pypy
<wleslie> neat. I can't seem to find this specific work anywhere, do you happen to have a guess at the branch name?
<arigato> anything with "stm"
<wleslie> stmgc-8 looks good
<wleslie> thanks
<mattip> antocuni: it seems at least for zipimport, changing fsdecode to have a fast path for ascii is a big win
<antocuni> nice!
<antocuni> in parallel, I am playing with writing a custom __import__ to be used only by tests, so that we don't need to bring in/execute the whole _frozen_importlib every time
<antocuni> hopefully, with both our changes we will speed up tests considerably :)
<mattip> time says 1m55 vs 1m17
<mattip> the deeper problem is the else clause in the middle of fsdecode,
<mattip> it is calling out to a c implementation of pypy_char2wchar
<mattip> which wraps mbstowcs
<antocuni> where is fsdecode implemented?
<mattip> pypy/interpreter/unicodehelper.py
<mattip> I think it is only meant to be called at bootstrap, but the test calls it ~5700 times
<antocuni> I think it is perfectly fine to have a space option which says "dont_care_about_this_stuff" which will be set to True by default
<antocuni> and then only the tests which actually need this can set it appropriately
<mattip> makes sense, but in this case I think a fast path for ascii is valid, the other code paths are always going to be expensive
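The ascii fast path mattip describes can be sketched in plain Python (hypothetical illustration; the real code lives in RPython in pypy/interpreter/unicodehelper.py, and `fsdecode_fast` is a made-up name):

```python
import os

def fsdecode_fast(b):
    """Decode a filesystem bytestring, taking a cheap path for pure ASCII."""
    if all(c < 0x80 for c in b):   # every byte is plain ASCII
        return b.decode('ascii')   # trivial decode; skips the locale machinery
    return os.fsdecode(b)          # slow path: locale-aware (mbstowcs-like)
```

The win comes from the fact that file names are overwhelmingly ASCII in practice, so the expensive locale round-trip is almost never taken.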
<antocuni> true
<mattip> on the other hand, your work might make this less of an issue, since I think most of the calls are for importing
<mattip> and the same file name is getting fsdecoded over and over
<antocuni> maybe
<antocuni> another probable source of slowness is the fact that on py3.6 you have to initialize more builtin modules than on default. I see that by default we bring in _locale, _frozen_importlib, struct, atexit and _string
<mattip> right. I see a lot of caching fails for pypy.module.sys.state.State in build/rpython/rlib/cache.py
<mattip> as it slowly imports those modules one at a time
<mattip> and they have lots of lib-python/3 dependencies
<antocuni> I think that the correct strategy would be to mock most of them and use the mocks by default in tests, possibly with a nice error message which says things like "please add XXX to spaceconfig['usemodules']" in case you need a feature which is not implemented by the mock
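A minimal sketch of antocuni's mocking idea (hypothetical; `MockModule` and the message wording are invented here, not actual PyPy test infrastructure): any attribute access on the mock fails with a message telling the test author which module to enable.

```python
class MockModule:
    """Stand-in for a builtin module that was not actually built for the test."""

    def __init__(self, name):
        self._name = name

    def __getattr__(self, attr):
        # only reached when normal attribute lookup fails, i.e. for any
        # feature the mock does not implement
        raise NotImplementedError(
            "mock module %r has no %r; please add %r to "
            "spaceconfig['usemodules'] if the test needs the real module"
            % (self._name, attr, self._name))
```

Tests that never touch the module pay nothing; tests that do get a pointer to the fix instead of an obscure failure.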
<mattip> err, forget my previous comment about cache
<mattip> it is in unmarshalling code from stdlib modules, which I think is exactly what antocuni is saying
<mattip> committed a fast-path optimization, perhaps could offset the cost by only doing it for len(utf8) < TBD
<mattip> this might also impact 3126 about slowness in open()
<antocuni> uh, it seems that there were many failures tonight in the py3.6 branch. AttributeError: module 'threading' has no attribute 'RLock'
<antocuni> I suppose this is related to the work which arigato did recently?
<mattip> not sure. I am checking it now. I changed the buildslave to run from the docker image (on another machine, and paused the bencher4 buildslave)
<arigato> I didn't commit anything related to threads, apart from a comment
<mattip> I think there is something fishy, since when I rerun those tests on the same docker image they pass
<mattip> Somehow the image built without boehm gc.h, so maybe that is a factor
<mattip> since now I rebuilt the image
<mattip> another far-out theory is that I ran the tests with parallel_runs=4, which may have swamped the buildslave
<mattip> (so "same docker image" is only approximate, since now the image has boehm gc)
<bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/7854 [mattip: force build, py3.6]
<antocuni> FWIW, my theory about cpyext slowness was wrong. Even by commenting out "preload_builtins", it takes ~2 minutes :(
* mattip trying the run again, to see if boehm gc + d6f87c4ee798 makes a difference
<mattip> on bencher4 they were 1hr39min
krono has joined #pypy
<mattip> the good news is that the last nightly builds are now "portable"
<mattip> maybe I should change the upload name somehow to indicate that
<antocuni> that's awesome
<mattip> trying them out would be welcome
<antocuni> mattip: something is wrong http://paste.openstack.org/show/787226/
<antocuni> I think it happens because I am using my own version of pyrepl/fancycompleter/etc via PYTHONPATH and PYTHONSTARTUP
<antocuni> but pypy 7.2.0 works fine
<antocuni> what is the latest nightly which was built on bencher4?
<antocuni> no, the problem is not related to my pyrepl; a simple "curses.setupterm(None)" fails on the nightly build
<antocuni> maybe this means that some curses-related library was not installed in the docker image and thus the curses module was not built correctly?
_whitelogger has joined #pypy
marvin_ has quit [Remote host closed the connection]
marvin has joined #pypy
oberstet has joined #pypy
<bbot2> Failure: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/7854 [mattip: force build, py3.6]
<mattip> antocuni: thanks, looking
<bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/7855 [mattip: test with parallel_runs=2, py3.6]
<mattip> antocuni: it is working for me on Ubuntu 18.04. What OS do you have?
<mattip> ahh, I see, the linking is wrong
<mattip> no, it seems OK, the RPATH magic is working for ldd bin/libpypy-c.so
<mattip> the only system libraries are libutil, libdl, libbz2, libm, librt, libz, libgcc_s, libpthread, libc
<mattip> these are being packaged in: libexpat, libffi, libncurses, libtinfo
<mattip> maybe the tinfo database is not where it should be?
<mattip> I see squeakypl builds ncurses, I just used the centos6 one
<mattip> ahh, got it. I had "TERM=xterm-256color". If I remove that I get the error you see, and pyrepl is messed up
Rhy0lite has joined #pypy
<mattip> ncurses says that should be fixed as of 20120922 https://invisible-island.net/ncurses/NEWS.html#index-t20120922
<mattip> I wonder what version of ncurses that corresponds to. It may be easier to just install a new one into the image
jcea has joined #pypy
lritter has joined #pypy
<antocuni> mattip: I'd vote for doing the same thing as squeaky
<antocuni> since it has been working well for years
adamholmberg has joined #pypy
<mattip> +1, working on it
* antocuni is fighting with vmprof, trying to display the profile of running one cpyext test
<mattip> fwiw, I used python2 -mcProfile -o test.profile pytest.py ...
<mattip> and then hacked runrabbitrun to work with a more modern wx
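The same `-mcProfile` dump can also be inspected with the stdlib's own pstats, without any GUI tool (generic sketch; the profiled expression here is just a placeholder for the real pytest run):

```python
import cProfile
import pstats

# profile something and write the dump, like `python2 -mcProfile -o test.profile ...`
cProfile.run("sum(i * i for i in range(1000))", "test.profile")

# load the dump and show the 5 most expensive entries by cumulative time
stats = pstats.Stats("test.profile")
stats.sort_stats("cumulative").print_stats(5)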
<mattip> antocuni: did d6f87c4ee798 make any difference to you?
<antocuni> I tried to use cProfile and display the results in kcachegrind (using pyprof2calltree) and snakeviz. In both cases, I couldn't understand much of the result
<antocuni> mattip: ah, I didn't try yet
<antocuni> good idea, let me try
<mattip> some googling also gave me this, didn't try it
<antocuni> I tried that as well, and got nonsense again
<mattip> :(
<antocuni> I don't know if it's pytest doing some magic which confuses the stacktraces or what. Most of these tools report functions whose total time is more than 100% and things like that
<antocuni> and flameprof complains with this: Warning: flameprof can't find proper roots, root cumtime is 1e-06 but sum tottime is 169.117844
<mattip> maybe it is a python2 formatted profile output? I used runrabbitrun with python2
<antocuni> ah, maybe
<antocuni> d6f87c4ee798 does help, indeed: the wall-clock time went from 2m:20s to 1m:27s
<mattip> yay
<kenaan> mattip default 9c171d039841 /pypy/module/thread/test/test_thread.py: move slow test to its own class and skip it
<kenaan> mattip py3.6 aa3b8c5bd232 /pypy/module/thread/test/test_thread.py: merge default into branch
<kenaan> mattip default 890c142fd3b8 /pypy/module/thread/test/test_thread.py: add missing import
<kenaan> mattip py3.6 d91c0d495118 /pypy/module/thread/test/test_thread.py: merge default into py3.6
<mattip> in 9c171d039841, skipping a test that took ~10 minutes if it ran (only when it could *not* open 10000 threads)
<antocuni> oh, finally managed to display the vmprof data. Unfortunately I can't share because I need to run a custom version of vmprof-server which has a higher recursion limit
<antocuni> this is the screenshot: http://antocuni.eu/misc/img/nGYbVPcd.png
<antocuni> (for running a single empty cpyext test)
<bbot2> Retry: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/7855 [mattip: test with parallel_runs=2, py3.6]
<antocuni> it seems we are spending a lot of time in build_bridge/attach_all/finish_type_2; in particular, the biggest culprit seems to be "unicode_attach"
<bbot2> Started: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/7856 [mattip: test with parallel_runs=2, py3.6]
<mattip> I wonder if we can cheat there too, and detect ascii, and then
<mattip> allocate a "fake" ucs2 or ucs4 buffer, set it to 0, and fill every second or fourth byte with the string char
<antocuni> I still wonder why it is so much slower than on default, though. It might be simply that since we import more modules at startup, it has more work to do
<mattip> instead of calling out to mbstowcs
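mattip's buffer trick can be sketched with a zero-filled buffer plus a strided slice assignment (a plain-Python illustration of the idea; `ascii_to_ucs` is a hypothetical name, and this assumes a little-endian layout):

```python
def ascii_to_ucs(s, width):
    """Widen an ASCII bytestring into a little-endian UCS2/UCS4 buffer.

    width is 2 (UCS2) or 4 (UCS4). The buffer starts zero-filled, and every
    width-th byte gets one character; the padding bytes stay zero.
    """
    buf = bytearray(len(s) * width)  # allocate the "fake" buffer, set to 0
    buf[::width] = s                 # fill every second/fourth byte
    return bytes(buf)
```

For ASCII input this is pure byte shuffling, with no call out to mbstowcs at all.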
<mattip> gotta go soon, but it seemed suspicious to me that we call _pytest.runner.call_runtest_hook 3 times
<mattip> and they all take about the same time
<mattip> We call once with 'setup', once with 'call' and once with 'teardown', and they seem to be doing the same amount of work,
<antocuni> I don't see it in the flamegraph, where is it?
<antocuni> ah no, you mean pytest_runtest_setup
<mattip> yup
<antocuni> from the screenshot I posted above, it's clear that they are doing very different things
<antocuni> and they don't take the same time
<antocuni> that's why I prefer flamegraphs over other visualizations which mix things together :)
<mattip> ok, so it was an artifact from the test I chose to profile
<mattip> or from the tool taking total_time / ncalls and me misreading the result
ronan has joined #pypy
<antocuni> yes, likely
<mattip> maybe compare that flamegraph to a pypy2 one, but it probably is very different
* mattip off, seeya
<antocuni> yes, I'm doing that now
jvesely has joined #pypy
adamholmberg has quit [Remote host closed the connection]
adamholmberg has joined #pypy
<antocuni> uh, this is impressive and unexpected
<antocuni> I managed to halve the time needed to run a single cpyext test with this 3-line diff: http://paste.openstack.org/show/787247/
<antocuni> the problem is that functions like cpyext.unicodeobject.set_utf8 do a cts.cast, which requires parsing the cdecl again and again
<antocuni> I suppose I should just apply this patch to default and be happy
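The shape of the fix described here (and landed as 317104f1b067) is a plain memoization of the cdecl-to-type step. A minimal sketch, with `parse_cdecl` as a stand-in for the real expensive parser:

```python
_parsed_cache = {}

def parse_cdecl(decl):
    """Placeholder for the expensive cdecl parse done by the real cparser."""
    return ("parsed", decl)

def cached_parse(decl):
    """Parse each distinct cdecl string only once; later calls hit the dict."""
    try:
        return _parsed_cache[decl]
    except KeyError:
        result = _parsed_cache[decl] = parse_cdecl(decl)
        return result
```

Since cts.cast is called with the same handful of cdecl strings over and over, the cache turns almost every parse into a single dict lookup.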
marky1991 has quit [Ping timeout: 268 seconds]
<arigato> bah
<antocuni> is the "bah" referring to my sentence?
<kenaan> antocuni default 317104f1b067 /pypy/module/cpyext/cparser.py: Use a cache to avoid parsing the same cdecl again and again, which is done e.g. for all the various cts.cast(......
dddddd has joined #pypy
<antocuni> I am confused: mattip merged default into py3.6 1 hours ago, and the only new commit in default is 317104f1b067
<antocuni> ah no, never mind
<antocuni> I was trying to merge default into an OLD commit of py3.6, that's why I got nonsense
krono has quit [Quit: Connection closed for inactivity]
<ronan> antocuni: oops!
<ronan> (for cts.cast())
<antocuni> yeah, I would never have guessed it without looking at the profile data
<antocuni> btw, I concluded the analysis of running a single cpyext test on my machine on default vs py3.6
<antocuni> it takes ~25s on default and ~75s on py3.6
<antocuni> most of the 50s of difference are attributable directly or indirectly to _frozen_importlib
<antocuni> e.g test_cpyext.preload takes ~25s on py3.6 and 0.11s on default
<antocuni> and the vast majority of those 25s are spent running the code in _frozen_importlib to import the various modules to preload
oberstet has quit [Quit: Leaving]
<antocuni> another "funny"/crazy finding is this line inside test_cpyext:setup_method
<antocuni> self.space.call_method(self.space.sys.get("stdout"), "flush")
<antocuni> this alone takes ~4.83s on py3.6 😱 (and no noticeable time on default)
<antocuni> I suspect that's because it goes through _io
<ronan> I've noticed the issue with calling methods of sys.stdout before
<ronan> but I couldn't find what exactly was slow
<ronan> it looked more complicated than just using _io
<antocuni> I admit I didn't investigate that particular issue deeply. But in the vmprof profile I see lots of interpreted code
<antocuni> so there is probably something which is implemented at applevel
<ronan> yes, CPython relies on quite a lot of app-level code at interpreter startup
<antocuni> I am thinking of writing a "_dummy_importlib" module to be used instead of _frozen_importlib, implementing at interp-level the bare minimum which is necessary to run the tests
<antocuni> also, do you know why e.g. "struct" and "_locale" are in the essential_modules? Are they needed by _frozen_importlib or by something else?
<ronan> I think struct is a dependency of some essential_modules
<ronan> _locale is used in app_main
<antocuni> I am a bit confused now: are the modules listed in essential_modules ALWAYS included when we create an objspace for testing, or this option is used only at translation time?
craigdillabaugh has joined #pypy
<ronan> actually, it looks like it uses default_modules
<antocuni> where is the relevant code?
<antocuni> oh I see: pypy.tool.pytest.objspace.gettestobjspace
<ronan> yes
<ronan> and down the rabbit holes to pypy.config.pypyoption
<antocuni> I wonder whether we should try to use only the bare minimum modules which are actually needed for most tests, and explicitly add additional modules when they are actually needed
<antocuni> OTOH, the spaces are cached, so if you use too many different combinations you might end up with slower tests overall
<ronan> yes, and I don't think builtin modules are that expensive
<antocuni> but e.g. in py3.6, calling maketestobjspace() takes ~5 seconds :(
<antocuni> ronan: I tried to comment out everything in essential_modules and default_modules, apart "sys", "builtins" and "__pypy__"
<ronan> hmm, that's not good
<antocuni> creating the space takes 2s instead of 5s
<antocuni> and e.g. test_boolobject still passes
jvesely has quit [Quit: jvesely]
<antocuni> also, the distinction between essential/default/allworking no longer makes much sense nowadays
<antocuni> e.g., I don't think anyone ever does a translation with only the default modules
<ronan> +1
<ronan> well, maybe for the interpreter variants, like revdb or sandboxed?
<bbot2> Failure: http://buildbot.pypy.org/builders/own-linux-x86-64/builds/7856 [mattip: test with parralel_runs=2, py3.6]
<antocuni> ah, maybe
<antocuni> note that maketestobjspace takes ~4s even on default, so at least this means that it's not a py3k regression
<antocuni> but I wouldn't mind finding a way to run tests in 1 second instead of 5 :)
riddle has quit [Ping timeout: 244 seconds]
<ronan> when you're used to pypy3, test timings on pypy2 seem perfectly fine!
<antocuni> :)
asmeurer_ has joined #pypy
riddle has joined #pypy
dansan has quit [Excess Flood]
dansan has joined #pypy
dansan has quit [Excess Flood]
dansan has joined #pypy
asmeurer_ has quit [Quit: asmeurer_]
asmeurer_ has joined #pypy
dansan has quit [Ping timeout: 245 seconds]
dansan has joined #pypy
Arfrever has joined #pypy
asmeurer_ has quit [Quit: asmeurer_]
<energizer> what does w stand for in pypy code?
Rhy0lite has quit [Quit: Leaving]
<energizer> mjacob: thanks
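For context on energizer's question: in PyPy source the `w_` prefix conventionally marks a *wrapped* object, i.e. an app-level Python object as seen from interp-level code, manipulated only through object-space methods. A toy illustration (the `ToySpace`/`W_Int` classes here are invented stand-ins, not real PyPy code):

```python
class W_Int:
    """Toy wrapped integer; real PyPy has W_IntObject and friends."""
    def __init__(self, value):
        self.value = value

class ToySpace:
    """Toy stand-in for PyPy's object space, just to show the convention."""
    def int_w(self, w_obj):      # the "_w" suffix: unwrap to an interp-level int
        return w_obj.value
    def newint(self, val):       # wrap an interp-level int back into a w_ object
        return W_Int(val)

def add_numbers(space, w_a, w_b):
    a = space.int_w(w_a)
    b = space.int_w(w_b)
    return space.newint(a + b)   # the result is again a wrapped ("w_") object
```

Interp-level code never touches `w_obj.value` directly; it always goes through the space, which is what lets PyPy swap object implementations under the JIT.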
asmeurer has joined #pypy
CrazyPython has joined #pypy
CrazyPython has quit [Read error: Connection reset by peer]
CrazyPython has joined #pypy
asmeurer has quit [Quit: asmeurer]
CrazyPython has quit [Ping timeout: 245 seconds]
CrazyPython has joined #pypy
CrazyPython has quit [Read error: Connection reset by peer]
asmeurer has joined #pypy
adamholmberg has quit [Remote host closed the connection]