cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
oberstet has quit [Quit: Leaving]
jcea has quit [Ping timeout: 258 seconds]
wleslie has joined #pypy
todda7 has quit [Ping timeout: 268 seconds]
proteusguy has quit [Remote host closed the connection]
todda7 has joined #pypy
proteusguy has joined #pypy
todda7 has quit [Ping timeout: 272 seconds]
todda7 has joined #pypy
lesshaste has joined #pypy
lesshaste is now known as Guest38903
leshaste has quit [Read error: Connection reset by peer]
oberstet has joined #pypy
Guest38903 has quit [Quit: Leaving]
todda7 has quit [Ping timeout: 265 seconds]
wleslie has quit [Quit: ~~~ Crash in JIT!]
todda7 has joined #pypy
oml has joined #pypy
<oml>
Hi! I'm wondering if pypy together with os.fork() would provide easily shareable read-only(!) memory
<oml>
my problem is as follows: I want to do some analysis on a 122GB chunk of data. And yes, I can't really split it up further without fundamentally rethinking the algorithms involved in the analysis.
<oml>
As the analysis can be done in a map-reduce-fashion I want to make use of the 32 cores the server provides.
<oml>
But right now pypy + os.fork() don't seem to make use of the copy-on-write mechanism my Linux kernel should provide
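For context, here is a minimal, self-contained sketch of the pattern oml is describing (with toy data standing in for the real 122GB structure): build the structure once in the parent, fork workers that only read it, and reduce in the parent. The helper names and the pipe-based result collection are just illustration; the point of the discussion that follows is that even read-only children can end up dirtying pages, because the runtime writes to object headers.

```python
import os

def load_graph():
    # stand-in for loading the real 122GB structure
    return [list(range(1000)) for _ in range(1000)]

def analyze_part(graph, worker_id, n_workers):
    # read-only "map" step: each worker walks its slice of the outer list
    return sum(len(row) for row in graph[worker_id::n_workers])

def run_workers(graph, n_workers=4):
    read_fds = []
    for worker_id in range(n_workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                        # child
            os.close(r)
            result = analyze_part(graph, worker_id, n_workers)
            os.write(w, str(result).encode())
            os._exit(0)
        os.close(w)
        read_fds.append(r)
    totals = []
    for r in read_fds:
        totals.append(int(os.read(r, 64) or 0))
        os.close(r)
        os.wait()
    return sum(totals)                      # "reduce" step in the parent

if __name__ == "__main__":
    graph = load_graph()
    print(run_workers(graph))
```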
<LarstiQ>
which memory would be shared? an mmap? Does that need particular flags for that to work?
<oml>
LarstiQ: right now the structure is a list of lists, each storing nodes of an undirected graph
<cfbolz>
oml: yeah, that's a bit tough, that needs support from the GC
<oml>
i was considering just disabling the garbage collector after loading was done
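oml's idea of turning the collector off after loading would look roughly like the sketch below. Two hedges: gc.freeze() exists on CPython 3.7+ specifically for this fork-and-copy-on-write scenario (hence the hasattr guard), and whether disabling the GC actually stops PyPy from writing to object headers is exactly the open question cfbolz addresses next.

```python
import gc
import os

def load_data():
    # stand-in for the expensive load of the big structure
    return [list(range(1000)) for _ in range(1000)]

data = load_data()

gc.collect()            # clean up garbage left over from the loading phase
gc.disable()            # stop automatic collections (on PyPy, minor
                        # nursery collections still run regardless)
if hasattr(gc, "freeze"):
    gc.freeze()         # CPython >= 3.7: move survivors into a permanent
                        # generation so the collector stops rewriting
                        # their GC headers after fork()

pid = os.fork()
if pid == 0:
    # child: read-only work on `data`
    print("child:", sum(len(row) for row in data))
    os._exit(0)
os.waitpid(pid, 0)
```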
<cfbolz>
We always had plans to have a more fork friendly GC
<cfbolz>
I don't know what happened to these
<oml>
because the analysis itself needs about O(1) memory per process, so it would fit in the remaining 6GB of the machine
<oml>
I mean, I understand that with CPython the refcount updates would trigger the copy-on-write mechanism
<oml>
But I'm not at all sure how the whole refcount-stuff works in PyPy
<oml>
(it's my first pypy project, so please bear with me)
<oml>
LarstiQ: as far as I can see my stuff is currently not mmap-able, but it might be
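This is the kind of thing LarstiQ is hinting at with the question about flags: if the graph can be serialized into a flat binary file, a MAP_SHARED read-only mapping is shared between forked processes with no copy-on-write traffic at all. The file name and record layout below are made up for illustration.

```python
import mmap
import os
import struct

PATH = "edges.bin"   # hypothetical flat file of little-endian uint64 pairs

# write a tiny example file so the snippet is self-contained
with open(PATH, "wb") as out:
    for src, dst in [(0, 1), (1, 2), (2, 0)]:
        out.write(struct.pack("<QQ", src, dst))

f = open(PATH, "rb")
# MAP_SHARED + PROT_READ: pages are backed by the file and shared
# read-only between the parent and any forked children
buf = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)

pid = os.fork()
if pid == 0:
    n_edges = len(buf) // 16
    edges = [struct.unpack_from("<QQ", buf, i * 16) for i in range(n_edges)]
    print("child sees", n_edges, "edges:", edges)
    os._exit(0)
os.waitpid(pid, 0)
buf.close()
f.close()
```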
<oml>
on the other hand: if I rewrite the stuff anyway I can just do it in C/C++/Rust, that's the end goal anyway
<oml>
unfortunately I can't justify getting a server with a few terabytes of RAM to my boss. He would get suspicious :D
jcea has joined #pypy
dddddd has quit [Ping timeout: 256 seconds]
dddddd has joined #pypy
<cfbolz>
oml: we don't use refcounting
<cfbolz>
But there is still an object for every object and we write to that
<cfbolz>
What is the data type in your inner lists? Integer indexes?
<oml>
its `Simplex = namedtuple('Simplex', ['ver', 'faces', 'cofaces'])`
<oml>
where ver is a tuple of integers, faces is a list of references (to simplices one lvl down) and cofaces is a list of references to simplices (one lvl up)
<oml>
the data structure / graph / flag complex is then a list of lvl 0 simplices, lvl 1 simplices, lvl 2 simplices, etc.
<cfbolz>
Right. So it's all complicated objects, basically
<cfbolz>
As I said, I am not sure what happened to the fork friendly GC idea :-(
<cfbolz>
arigato: do you remember?
<simpson>
oml: A fellow explorer of sheaves!? FWIW last time I did this (not in Python, and not with as much data!) I omitted the cofaces, and required each algorithm to be properly recursive in one direction.
<ronan>
oml: BTW, note that simple classes are a bit more pypy-friendly than namedtuples
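ronan's suggestion, spelled out as a direct translation of the Simplex namedtuple above. Whether it helps for a given workload is something to measure; the __slots__ declaration mainly matters on CPython, since PyPy stores instance attributes compactly either way.

```python
class Simplex(object):
    # plain-class translation of
    # namedtuple('Simplex', ['ver', 'faces', 'cofaces'])
    __slots__ = ('ver', 'faces', 'cofaces')

    def __init__(self, ver, faces=None, cofaces=None):
        self.ver = ver                    # tuple of vertex integers
        self.faces = faces or []          # simplices one level down
        self.cofaces = cofaces or []      # simplices one level up

# usage mirroring the description above: a list of levels,
# each level being a list of simplices
levels = [[Simplex((0,)), Simplex((1,))],   # level 0: vertices
          [Simplex((0, 1))]]               # level 1: edges
levels[1][0].faces = [levels[0][0], levels[0][1]]
levels[0][0].cofaces = [levels[1][0]]
levels[0][1].cofaces = [levels[1][0]]
```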
commandoline has quit [Quit: Bye!]
commandoline has joined #pypy
<LarstiQ>
simplices \o/
<oml>
everything is more simple with simplices ... right? ;)
<LarstiQ>
they sure helped me graduate
<oml>
update for those who are interested: I'm gonna offload the data structure to an existing project, flagser/pyflagser. If I can figure out how to bind these read-only into Python then everything will fit into memory again :)
<antocuni>
basically, it uses cffi to provide "python like" objects inside a shared memory region
<antocuni>
so you can start multiple processes, each accessing the shared memory, but you need explicit locks of course
<antocuni>
I haven't used it for years though, so I don't even know if it still works
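Without vouching for whichever project antocuni is referring to, the general idea can be sketched in a few lines: put plain C structs into a MAP_SHARED mapping and view them through cffi, so forked processes see the same bytes and there are no per-object headers for the GC to write to. The struct layout and sizes below are invented for illustration.

```python
import mmap
import os
import cffi

ffi = cffi.FFI()
ffi.cdef("""
    typedef struct { uint64_t src; uint64_t dst; } edge_t;
""")

N_EDGES = 3
buf = mmap.mmap(-1, N_EDGES * ffi.sizeof("edge_t"), flags=mmap.MAP_SHARED)
raw = ffi.from_buffer(buf)              # keeps the mmap buffer exported
edges = ffi.cast("edge_t *", raw)

# parent fills the shared region once, before forking, so the children
# can treat it as read-only and no locks are needed
for i, (s, d) in enumerate([(0, 1), (1, 2), (2, 0)]):
    edges[i].src = s
    edges[i].dst = d

pid = os.fork()
if pid == 0:
    # child reads the same physical pages; no copy is made
    print("child:", [(edges[i].src, edges[i].dst) for i in range(N_EDGES)])
    os._exit(0)
os.waitpid(pid, 0)
buf.close()
```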
ronan has quit [Quit: Leaving]
<LarstiQ>
oml: cool, some new software to look into (flagser). Thanks! :)
<oberstet>
oml: fwiw, I've been using FlatBuffers in LMDB and mmap from multiple PyPy processes for such things. great performance / memory usage + transactional safety
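A rough sketch of the LMDB side of what oberstet describes, using the py-lmdb bindings. In his setup the values would be serialized FlatBuffers rather than the raw bytes used here, but the zero-copy, many-readers pattern looks like this:

```python
import lmdb

PATH = "graph.lmdb"   # hypothetical database directory

# one-off writer: store some records (in oberstet's setup these would
# be FlatBuffers, decoded in place on the reader side)
env = lmdb.open(PATH, map_size=1 << 30)
with env.begin(write=True) as txn:
    txn.put(b"node:0", b"\x01\x02\x03")
    txn.put(b"node:1", b"\x04\x05\x06")
env.close()

# multiple reader processes can do this concurrently; with buffers=True
# the values are zero-copy views into LMDB's read-only mmap
env = lmdb.open(PATH, readonly=True, lock=False)
with env.begin(buffers=True) as txn:
    for key, value in txn.cursor():
        print(bytes(key), bytes(value))
env.close()
```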