cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
oberstet has quit [Quit: Leaving]
jcea has quit [Ping timeout: 258 seconds]
wleslie has joined #pypy
todda7 has quit [Ping timeout: 268 seconds]
proteusguy has quit [Remote host closed the connection]
todda7 has joined #pypy
proteusguy has joined #pypy
todda7 has quit [Ping timeout: 272 seconds]
todda7 has joined #pypy
lesshaste has joined #pypy
lesshaste is now known as Guest38903
leshaste has quit [Read error: Connection reset by peer]
oberstet has joined #pypy
Guest38903 has quit [Quit: Leaving]
todda7 has quit [Ping timeout: 265 seconds]
wleslie has quit [Quit: ~~~ Crash in JIT!]
todda7 has joined #pypy
oml has joined #pypy
<oml> Hi! I'm wondering if pypy together with os.fork() would provide easily shareable read-only(!) memory
<oml> my problem is as follows: I want to do some analysis on a 122GB chunk of data. Yes, I can't really split it up further without fundamentally rethinking the algorithms involved in the analysis.
<oml> As the analysis can be done in a map-reduce fashion, I want to make use of the 32 cores the server provides.
<oml> But right now pypy + os.fork() seem not to make use of the copy-on-write mechanism Linux provides
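oml's plan can be sketched as follows: load the data once, fork workers, and collect partial results over pipes. This is a minimal sketch using `os.fork` and `os.pipe` directly; the chunking and `partial_sum` are placeholders, not oml's actual analysis. After `fork` the child's pages start out copy-on-write, but as the discussion below explains, a GC that writes to object headers can still force them to be copied.

```python
import os

def partial_sum(chunk):
    # Placeholder for the real per-worker analysis.
    return sum(chunk)

def map_reduce(data, n_workers=4):
    # Fork the workers *after* the data is loaded, so the children
    # inherit it via copy-on-write pages rather than re-loading it.
    size = (len(data) + n_workers - 1) // n_workers
    children = []
    for i in range(n_workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                       # child process
            os.close(r)
            result = partial_sum(data[i * size:(i + 1) * size])
            os.write(w, str(result).encode())
            os.close(w)
            os._exit(0)
        os.close(w)                        # parent keeps only the read end
        children.append((pid, r))
    total = 0
    for pid, r in children:
        total += int(os.read(r, 64))       # each child sends one small number
        os.close(r)
        os.waitpid(pid, 0)
    return total
```

This only works on POSIX systems, and the caveat raised in the channel applies: if the runtime's GC writes into the inherited object graph, the kernel duplicates those pages and the memory saving evaporates.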
<LarstiQ> which memory would be shared? an mmap? Does that need particular flags for that to work?
<oml> LarstiQ: right now the structure is a list of lists; each stores the nodes of an undirected graph
<cfbolz> oml: yeah, that's a bit tough, that needs support from the GC
<oml> i was considering just disabling the garbage collector after loading was done
<cfbolz> We always had plans to have a more fork friendly GC
<cfbolz> I don't know what happened to these
<oml> because the analysis itself needs about O(1) memory per process, so it would be ok in the remaining 6GB of the machine
<oml> I mean I understand that with CPython the refcounts would trigger the copy-on-write mechanism
<oml> But I'm not at all sure how the whole refcount-stuff works in PyPy
<oml> (it's my first pypy project, so please bear with me)
<oml> LarstiQ: as far as I can see my stuff is currently not mmap-able, but it might be
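On LarstiQ's question about flags: a read-only mapping that stays shared across `fork` needs `MAP_SHARED` together with `PROT_READ`. A minimal sketch, viewing a file of 64-bit integers through such a mapping (the temporary file and its layout are illustrative, not oml's data):

```python
import mmap
import os
import struct
import tempfile

# Serialize some integers into a file, then map it MAP_SHARED +
# PROT_READ: forked processes sharing the mapping read the same
# physical pages, and nothing in the mapping can be dirtied.
values = list(range(10))
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(struct.pack('%dq' % len(values), *values))
    path = f.name

fd = os.open(path, os.O_RDONLY)
buf = mmap.mmap(fd, 0, mmap.MAP_SHARED, mmap.PROT_READ)
ints = memoryview(buf).cast('q')   # view the raw bytes as 64-bit ints
first, last = ints[0], ints[-1]

ints.release()                     # release the view before closing
buf.close()
os.close(fd)
os.unlink(path)
```

The catch, as oml notes, is that a pointer-rich structure like nested Python lists has no such flat byte layout; it would have to be re-encoded (indices instead of references) before it could live in a mapping like this.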
<oml> on the other hand: If I rewrite the stuff anyhow I can just do it in C/C++/Rust, that's the end goal anyway
<oml> unfortunately I can't justify getting a server with a few terabytes of RAM to my boss. He would be getting suspicious :D
jcea has joined #pypy
dddddd has quit [Ping timeout: 256 seconds]
dddddd has joined #pypy
<cfbolz> oml: we don't use refcounting
<cfbolz> But there is still an object for every object and we write to that
<cfbolz> What is the data type in your inner lists? Integer indexes?
<oml> it's `Simplex = namedtuple('Simplex', ['ver', 'faces', 'cofaces'])`
<oml> where ver is a tuple of integers, faces is a list of references to simplices one level down, and cofaces is a list of references to simplices one level up
<oml> the data structure / graph / flag complex is then a list of lvl 0 simplices, lvl 1 simplices, lvl 2 simplices, etc.
<cfbolz> Right. So it's all complicated objects, basically
<cfbolz> As I said, I am not sure what happened to the fork friendly GC idea :-(
<cfbolz> arigato: do you remember?
<simpson> oml: A fellow explorer of sheaves!? FWIW last time I did this (not in Python, and not with as much data!) I omitted the cofaces, and required each algorithm to be properly recursive in one direction.
<ronan> oml: BTW, note that simple classes are a bit more pypy-friendly than namedtuples
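To illustrate ronan's suggestion: the same record written as a plain class, here with `__slots__` so each instance carries exactly the three fields. The class name `SimplexObj` is made up for this sketch; only the `Simplex` namedtuple line is from the log.

```python
from collections import namedtuple

# The namedtuple from the discussion:
Simplex = namedtuple('Simplex', ['ver', 'faces', 'cofaces'])

# A plain class with __slots__. Attribute access on such a class maps
# to a fixed field in the instance, which PyPy's JIT handles well,
# instead of going through the tuple machinery namedtuple inherits.
class SimplexObj:
    __slots__ = ('ver', 'faces', 'cofaces')

    def __init__(self, ver, faces, cofaces):
        self.ver = ver
        self.faces = faces
        self.cofaces = cofaces

s = SimplexObj(ver=(0, 1), faces=[], cofaces=[])
t = Simplex(ver=(0, 1), faces=[], cofaces=[])
```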
commandoline has quit [Quit: Bye!]
commandoline has joined #pypy
<LarstiQ> simplices \o/
<oml> everything is more simple with simplices ... right? ;)
<LarstiQ> they sure helped me graduate
<oml> update for those who are interested: I'm gonna offload the data structure to an existing project, flagser/pyflagser. If I can figure out how to bind these read-only to python then everything will fit into memory again :)
<antocuni> oml: years ago I wrote this: https://github.com/antocuni/cffi-shm
<antocuni> basically, it uses cffi to provide "python like" objects inside a shared memory region
<antocuni> so you can start multiple processes, each accessing the shared memory, but you need explicit locks of course
<antocuni> I haven't used it for years though, so I don't even know if it still works
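cffi-shm's own API isn't shown in the log, so no attempt is made to reproduce it here. As a rough stdlib analogue of the idea — a named shared-memory region that several processes can attach to, with locking left to the user — Python 3.8+ ships `multiprocessing.shared_memory`:

```python
from multiprocessing import shared_memory

# Create a named shared block; a second process would attach to the
# same block with SharedMemory(name=shm.name).
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b'hello'

# Simulate the second process attaching by name and reading:
other = shared_memory.SharedMemory(name=shm.name)
data = bytes(other.buf[:5])

other.close()
shm.close()
shm.unlink()   # the creator removes the block when done
```

As antocuni notes for cffi-shm, concurrent writers need explicit locks on top of this; the region itself is just bytes.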
ronan has quit [Quit: Leaving]
<LarstiQ> oml: cool, some new software to look into (flagser). Thanks! :)
<oberstet> oml: fwiw, I've been using FlatBuffers in LMDB and mmap from multiple PyPy processes for such things. great performance / memory usage + transactional safety
<oberstet> also have a Q myself;) I'm running into a strange traceback coming from the pypy crypt stdlib module: https://gist.github.com/oberstet/9ae5b3bd7ea5a3e27f4c7ac346711310
<oberstet> maybe because of "The thread module has been renamed to _thread in Python 3." ?
<cfbolz> oberstet: sounds like a bug. File an issue maybe?
Dejan has quit [Quit: Leaving]
Taggnostr has quit [Remote host closed the connection]
Taggnostr has joined #pypy
<arigato> oberstet: this was fixed, you must not be using the most recent version of pypy3
<arigato> oberstet: sorry, the fix missed the most recent release by a few days, so it will only be in the next one