cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
tos9 has quit [Quit: ZNC 1.7.2+deb3 - https://znc.in]
tos9 has joined #pypy
altoid has joined #pypy
altoid has quit [Quit: leaving]
Rhy0lite has quit [Quit: This computer has gone to sleep]
dddddd has quit [Ping timeout: 260 seconds]
jcea has quit [Remote host closed the connection]
jcea has joined #pypy
xcm has quit [Read error: Connection reset by peer]
xcm has joined #pypy
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
jcea has quit [Quit: jcea]
thrnciar has quit [Remote host closed the connection]
otisolsen70 has joined #pypy
infernix has quit [Ping timeout: 240 seconds]
xcm has quit [Read error: Connection reset by peer]
xcm has joined #pypy
rjarry has quit [Ping timeout: 256 seconds]
infernix has joined #pypy
rjarry has joined #pypy
* arigo fixes issue3255
<arigo> Alex_Gaynor: this issue is more messy than that
<arigo> I mean ffi_prep_closure_loc
<arigo> libffi would like people to use ffi_prep_closure_loc instead of ffi_prep_closure, but doing so exposes users to a crash after fork()
<arigo> but looking at the implementation of ffi_prep_closure(), yes, I can trivially replace it with ffi_prep_closure_loc() anyway, which works fine if we assume that no-one uses a libffi so old that it lacks ffi_prep_closure_loc()
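For context, libffi's closure API is what cffi uses under the hood whenever a Python function is exposed as a C callback: each callback allocates a libffi closure, which on modern libffi goes through ffi_prep_closure_loc(). A minimal illustration, using only cffi's documented callback decorator:

    # cffi turns the decorated Python function into a C function
    # pointer; the underlying closure is allocated via libffi
    # (ffi_prep_closure_loc on modern versions).
    import cffi

    ffi = cffi.FFI()

    @ffi.callback("int(int, int)")
    def add(x, y):
        return x + y

    # `add` can now be passed wherever C expects an int (*)(int, int).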
tsaka__ has quit [Ping timeout: 246 seconds]
tsaka__ has joined #pypy
tsaka__ has quit [Ping timeout: 246 seconds]
tsaka__ has joined #pypy
_whitelogger has joined #pypy
YannickJadoul has joined #pypy
rguillebert has joined #pypy
<Alex_Gaynor> arigo: looks like ffi_prep_closure_loc has been in the header for 11 years, so I suppose it's fine :-)
Rhy0lite has joined #pypy
tsaka__ has quit [Ping timeout: 265 seconds]
xcm has quit [Remote host closed the connection]
xcm has joined #pypy
dmalcolm_ has joined #pypy
dmalcolm has quit [Ping timeout: 240 seconds]
dstufft has quit [Ping timeout: 240 seconds]
nopf has quit [Ping timeout: 260 seconds]
nopf has joined #pypy
dstufft has joined #pypy
dstufft has quit [Excess Flood]
dstufft has joined #pypy
sknebel has quit [Remote host closed the connection]
sknebel has joined #pypy
mvantellingen has quit [Ping timeout: 246 seconds]
commandoline has quit [Ping timeout: 256 seconds]
mvantellingen has joined #pypy
oberstet has joined #pypy
commandoline has joined #pypy
jacob22 has quit [Quit: Konversation terminated!]
jacob22 has joined #pypy
tsaka__ has joined #pypy
otisolsen70 has quit [Quit: Leaving]
rguillebert has quit [Quit: Connection closed for inactivity]
tsaka__ has quit [Ping timeout: 264 seconds]
jcea has joined #pypy
lritter has joined #pypy
tsaka__ has joined #pypy
xcm has quit [*.net *.split]
pjenvey1 has quit [*.net *.split]
kipras`away has quit [*.net *.split]
marvin has quit [*.net *.split]
simpson has quit [*.net *.split]
ebarrett has quit [*.net *.split]
danilonc has quit [*.net *.split]
luizirber has quit [*.net *.split]
mgedmin has quit [Quit: ZNC - https://wiki.znc.in/ZNC]
mgedmin has joined #pypy
luizirber has joined #pypy
pjenvey1 has joined #pypy
marvin has joined #pypy
simpson has joined #pypy
kipras`away has joined #pypy
ebarrett has joined #pypy
danilonc has joined #pypy
xcm has joined #pypy
mgedmin has quit [Client Quit]
dmalcolm_ has quit [*.net *.split]
Cheery has quit [*.net *.split]
wallet42 has quit [*.net *.split]
wallet42 has joined #pypy
dmalcolm_ has joined #pypy
mgedmin_ has joined #pypy
Cheery has joined #pypy
BPL has joined #pypy
BPL has quit [Remote host closed the connection]
YannickJadoul has quit [Quit: Leaving]
otisolsen70 has joined #pypy
otisolsen70_ has joined #pypy
demonimin has joined #pypy
otisolsen70 has quit [Ping timeout: 246 seconds]
_aegis_ has quit [*.net *.split]
_aegis_ has joined #pypy
otisolsen70_ has quit [Ping timeout: 240 seconds]
demonimin has quit [Quit: bye]
demonimin has joined #pypy
otisolsen70_ has joined #pypy
demonimin has quit [Client Quit]
demonimin has joined #pypy
otisolsen70_ has quit [Ping timeout: 246 seconds]
whitequark has joined #pypy
demonimin has quit [Quit: bye]
<whitequark> hey folks, i have an optimization question
<whitequark> nmigen has an RTL simulator that converts HDL code into similarly structured Python code that simulates its behavior. this code is pure bignum arithmetic (in the sense that it sometimes needs numbers beyond 64 bits) operating on a chunk of state
<whitequark> for every HDL signal (think "variable"), this state consists of a pair of numbers (curr, next)
<whitequark> the question is: what would be the best way for pypy performance to represent this state?
<whitequark> I have three contenders. AoS (the current implementation): an array of class instances with __slots__ = ('curr', 'next')
<whitequark> SoA: one object with equally sized curr and next arrays
<whitequark> both of these are indexed by numbers, so the generated code looks something like `slots[0].next = ~slots[1].curr`, with `slots` injected via `exec`
<whitequark> and the third one is to make it work like a closure, i.e. have a dict with keys like `slot_0`, `slot_1`, where the values are instances of a class that has `curr` and `next` fields, and provide that dict as locals to `exec`
<whitequark> thoughts?
<whitequark> i don't really use pypy myself, but our downstream projects say very good things about it, so i thought i'd ask to make sure i'm not going to paint myself into a corner architecturally
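A minimal sketch of the three layouts whitequark describes, using the generated statement quoted above; SignalState, SoAState, and env are illustrative names, not nmigen's actual API:

    # Three candidate layouts for per-signal (curr, next) state.
    # All class/variable names are illustrative stand-ins.

    class SignalState:
        __slots__ = ('curr', 'next')
        def __init__(self):
            self.curr = 0
            self.next = 0

    # 1. AoS (current implementation): one instance per signal,
    #    indexed by number in the generated code.
    slots = [SignalState() for _ in range(2)]
    exec("slots[0].next = ~slots[1].curr", {'slots': slots})

    # 2. SoA: one object holding parallel curr/next arrays.
    class SoAState:
        def __init__(self, n):
            self.curr = [0] * n
            self.next = [0] * n

    state = SoAState(2)
    exec("state.next[0] = ~state.curr[1]", {'state': state})

    # 3. Locals injection: one name per signal, passed as the
    #    locals dict of exec, avoiding the array lookup.
    env = {'slot_%d' % i: SignalState() for i in range(2)}
    exec("slot_0.next = ~slot_1.curr", {}, env)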
<tos9> whitequark: (folks from EU time are possibly asleepish already)
<tos9> the broad answer is always probably "measure"
<tos9> but the broader answer to the best of my understanding is ... __slots__ mostly doesn't do anything on PyPy (in a good way)
<tos9> a class will be better than dicts generally
<tos9> given you have fixed fields
<tos9> PyPy may (I think probably should) figure out when bignums aren't needed based on the operations, and compile down to machine ints
<tos9> and same for compiling the list down to a plain array of machine ints in memory
<tos9> so yeah basically the rule in PyPy is "do the simple thing, usually PyPy can make that fast"
<tos9> and if it's not fast enough at that point we can help look at the generated code to see why
<Alex_Gaynor> SoA would be my guess at something likely to perform best, but you'd have to measure to be sure.
<whitequark> tos9: regarding dicts: the actual state object for an individual signal would always be a class (if present at all)
<whitequark> what i'm wondering about is whether it makes sense to inject locals
<whitequark> to avoid an array lookup
<whitequark> i suspect (haven't measured yet, needs some refactoring) that this would be good for cpython, which is what most people use and which is the most important target
<whitequark> but i'd like to not do something that pessimizes pypy
<whitequark> as for "fast enough": it's never fast enough :) the python simulator is far too slow for complex designs, so i had to implement a translator to C++, which can be built and loaded with ctypes
<whitequark> but that can't be the only option because of windows and other environments that have compiler issues
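The "built and loaded with ctypes" step might look roughly like this; the compiler invocation and file names are assumptions for illustration, not taken from nmigen:

    # Hedged sketch: compile the generated C++ into a shared library
    # and load it with ctypes. Compiler, flags, and file names are
    # all assumed.
    import ctypes
    import subprocess

    subprocess.check_call(['c++', '-O2', '-fPIC', '-shared',
                           'design.cc', '-o', 'design.so'])
    lib = ctypes.CDLL('./design.so')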
<pmp-p> then use wasm
<whitequark> ah so the reason i use C++ is to use the C++ compiler as a macro expansion engine
<whitequark> so first i'd have to compile *clang* targeting wasm to wasm and ship *that*
<tos9> whitequark: injecting locals doesn't do anything on PyPy speedwise
<tos9> personally I'm biased, but my general strategy is usually "make Python code that's as fast as possible on PyPy *first*"
<whitequark> tos9: as in, `slots[0]` and `slot_0` referring to the same object in the generated code would work roughly as fast?
<tos9> reason being people who care about performance often use it :D -- but second reason being, if you want to make it faster on CPython, you can then take the whole chunk and have a CFFI extension
<pmp-p> whitequark: you can compile python to wasm
<tos9> whitequark: correct.
<pmp-p> no need for clang
<pmp-p> you just need to fully annotate your python code
<whitequark> tos9: thanks, that's helpful! so the problem here is that the people who care about speed will use the C++ backend anyway
<pmp-p> pypy + https://github.com/windelbouwman/ppci could get you a lot ahead of cpython
<tos9> whitequark: right, exactly, that's normal (that the CPython folks are using some non-Python-based backend of whatever thing)
<whitequark> the C++ code i'm generating runs at about 1/2 to 1/4 the speed of single-threaded Verilator, which is likely unbeatable
<tos9> oh sorry, you mean a C++ backend not connected to Python at all? I mean you could connect it (again via CFFI)
<whitequark> it's connected to python
<tos9> ah, k
<tos9> then yeah I mean for CPython the fastest path is "make the slow stuff run in not-Python" anyhow
<whitequark> what i mean is that i'm completely certain that pypy can't beat the c++ backend, because i'm not feeding pypy the same thing i'm feeding the c++ compiler
<whitequark> there's an intermediate HDL-specific optimization stage that i cannot do in python without a massive amount of duplicated effort
<whitequark> the reason i care about CPython speed is that it's going to be the baseline for folks new to nmigen, especially on windows
<whitequark> it doesn't need to be very fast, but it can't be too slow
<tos9> yeah sure, obviously caring about CPython speed is important (probably as you say more important given that's what most folks will use, for better or worse)
<whitequark> the reason the c++ backend exists is that the people who use pypy don't find it sufficient
<whitequark> i think they currently have 40 minute CI times or something, which is better than multiple hour CI times
<whitequark> but... not sufficient
<tos9> that's the reason the *C++* backend exists?
<whitequark> *a* reason
<whitequark> the C++ backend goes through Yosys, so you can also include Verilog code in the simulation
<whitequark> the C++ backend also exists for people who don't use nMigen at all; it is a contribution back to the broader community
<whitequark> but the immediate impetus for writing it was to improve the speed on all Python implementations, yes
<tos9> what's the other not-C++ backend?
<tos9> Pure python?
<tos9> Normally for PyPy the pure python one would be faster than a ctypes one, AFAIK.
<tos9> But it's very hard to generalize these things, hence "measure"
<whitequark> can that be right? the overhead of ctypes calls is fixed, but the size of a design is unlimited
<tos9> Oh, was that what you were saying about the C++ backend doing different things than the pure python one?
<whitequark> no
<whitequark> two different things
<whitequark> what the C++ backend does differently internally from the Python one is that it does some netlist optimizations, giving the C++ compiler a *lot* more visibility into e.g. variable lifetimes
<whitequark> what i'm talking about now is the general way people use the simulation
<tos9> Basically -- PyPy can't JIT anything that isn't Python code, and worse, PyPy will in many cases be quite slow at anything that uses the CPython C API, because all it has for that is an emulation layer
<tos9> So in general with PyPy you want as much code to be Python as you can, so that when it's used, PyPy can look inside, bridge lots of stuff, make everything fast
<whitequark> they compile a (typically very large) amount of HDL using some backend, python or c++ or whatever, and then interact with it using a small API surface and a small number of calls
<whitequark> there is no CPython C API involved anywhere, of course
<whitequark> here, take a look https://paste.debian.net/1154684/
<whitequark> this is a typical way in which a design would be driven
<whitequark> so you have an extremely large amount of generated code that is all stuffed into `cxxrtl_step`, and then a small amount of python code driving it that does almost no useful work
<tos9> ok yeah sorry, so not the CPython C API, but just using ctypes is still slower on PyPy than on CPython, IIRC.
<whitequark> in this case it toggles the clock, it would also probably read something in most simulations
<tos9> (As opposed to CFFI)
<whitequark> but the idea is that you execute literally a few lines of python code and then you go run hundreds of kilobytes of machine code generated from c++
<whitequark> oh, i can use CFFI if that's better, that was just a proof of concept so i went for ctypes
<whitequark> whatever works best, the generated library exports a conservative C API
<whitequark> there's no point in letting pypy look into cxxrtl_step because that function is completely self-contained. it can (and should) be compiled separately
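The driving pattern whitequark describes, sketched with ctypes: only cxxrtl_step is taken from the discussion; design_create and design_set_clk are hypothetical stand-ins for the generated library's C API, and all signatures are assumed.

    # A few lines of Python toggling the clock; all the real work
    # happens inside cxxrtl_step. design_create/design_set_clk are
    # hypothetical names, and the signatures are assumptions.
    import ctypes

    lib = ctypes.CDLL('./design.so')
    lib.design_create.restype = ctypes.c_void_p
    lib.design_set_clk.argtypes = [ctypes.c_void_p, ctypes.c_int]
    lib.cxxrtl_step.argtypes = [ctypes.c_void_p]

    handle = lib.design_create()
    for _ in range(1000):
        lib.design_set_clk(handle, 1)
        lib.cxxrtl_step(handle)
        lib.design_set_clk(handle, 0)
        lib.cxxrtl_step(handle)

The CFFI ABI-mode equivalent, which per tos9's point is usually the lower-overhead option on PyPy (same hypothetical names, assumed signatures):

    import cffi

    ffi = cffi.FFI()
    ffi.cdef("""
        void *design_create(void);
        void design_set_clk(void *handle, int value);
        int cxxrtl_step(void *handle);
    """)
    lib = ffi.dlopen('./design.so')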
<tos9> then yeah if I follow sounds like what you have should probably be fine for both then
<whitequark> both of them as in, pypy and cpython?
<tos9> yeah
* whitequark nods
<whitequark> alright, so my conclusion is that i should benchmark both SoA and local injection approaches (and of course AoS is the current solution), and probably go ahead with the one faster on CPython
<whitequark> since they're likely going to be the same on pypy. very nice.
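A micro-benchmark sketch for that comparison, to be run under both CPython and PyPy; the statements mirror the generated-code shapes quoted earlier, and the locals-injection variant (which needs its own exec setup) is left out:

    # Hedged micro-benchmark: AoS vs SoA step statements. Run under
    # both CPython and PyPy before committing to a layout.
    import timeit

    class S:
        __slots__ = ('curr', 'next')
        def __init__(self):
            self.curr = 0
            self.next = 0

    slots = [S() for _ in range(2)]   # AoS
    curr = [0, 0]                     # SoA halves
    nxt = [0, 0]

    print('AoS:', timeit.timeit('slots[0].next = ~slots[1].curr',
                                globals=globals(), number=10**7))
    print('SoA:', timeit.timeit('nxt[0] = ~curr[1]',
                                globals=globals(), number=10**7))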
<whitequark> pmp-p: i'm confused, ppci doesn't seem to have a wasm backend?
<whitequark> ah, sorry, it does
<whitequark> okay, i see, so it wouldn't really work for me because i need first-class bignums
<whitequark> this is a recurring theme in HDL simulation. Xilinx has their thing as a C++ library, clang has _ExtInt(n) now, i wrote one from scratch https://github.com/YosysHQ/yosys/blob/master/backends/cxxrtl/cxxrtl.h
<whitequark> what they have in common is that they rely heavily on inlining and local optimizations to describe bignums ("arbitrary-size integers" is more correct but also unwieldy) using fairly abstract algorithms while still generating nice code
<whitequark> i'm basically replacing the gnarliest parts of verilator with yosys and clang, implementing only the actually interesting netlist manipulation parts
<whitequark> it's a lot easier to get things correct with this approach, at the cost of somewhat high but not extreme compile times
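On the pure-Python side, the "first-class bignums" requirement falls out of Python ints directly: arbitrary-width signal values are ordinary ints masked to their width. A toy illustration (the helper name is made up):

    # Toy illustration: Python ints model arbitrary-width signals by
    # masking every result to the signal's width.
    def mask(value, width):
        return value & ((1 << width) - 1)

    a = mask(0x1_ffff_ffff_ffff_ffff, 65)   # 65-bit value, beyond 64 bits
    b = mask(~a, 65)                         # bitwise NOT within 65 bits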
xcm has quit [Read error: Connection reset by peer]
xcm has joined #pypy