cfbolz changed the topic of #pypy to: PyPy, the flexible snake (IRC logs: https://quodlibet.duckdns.org/irc/pypy/latest.log.html#irc-end ) | use cffi for calling C | if a pep adds a mere 25-30 [C-API] functions or so, it's a drop in the ocean (cough) - Armin
oberstet has quit [Quit: Leaving]
jcea has quit [Ping timeout: 258 seconds]
wleslie has joined #pypy
todda7 has quit [Ping timeout: 268 seconds]
proteusguy has quit [Remote host closed the connection]
todda7 has joined #pypy
proteusguy has joined #pypy
todda7 has quit [Ping timeout: 272 seconds]
todda7 has joined #pypy
lesshaste has joined #pypy
lesshaste is now known as Guest38903
leshaste has quit [Read error: Connection reset by peer]
oberstet has joined #pypy
Guest38903 has quit [Quit: Leaving]
todda7 has quit [Ping timeout: 265 seconds]
wleslie has quit [Quit: ~~~ Crash in JIT!]
todda7 has joined #pypy
oml has joined #pypy
<oml>
Hi! I'm wondering if pypy together with os.fork() would provide easily shareable read-only(!) memory
<oml>
my problem is as follows: I want to do some analysis on a 122GB chunk of data. And yes, I can't really split it up further without fundamentally rethinking the algorithms involved in the analysis.
<oml>
As the analysis can be done in a map-reduce-fashion I want to make use of the 32 cores the server provides.
<oml>
But right now pypy + os.fork() don't seem to make use of the copy-on-write mechanism my Linux kernel should provide
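For context, here is a minimal, self-contained sketch of the pattern oml is describing (with toy data standing in for the real 122GB structure): build the structure once in the parent, fork workers that only read it, and reduce in the parent. The helper names and the pipe-based result collection are just illustration; the point of the discussion that follows is that even read-only children can end up dirtying pages, because the runtime writes to object headers.

```python
import os

def load_graph():
    # stand-in for loading the real 122GB structure
    return [list(range(1000)) for _ in range(1000)]

def analyze_part(graph, worker_id, n_workers):
    # read-only "map" step: each worker walks its slice of the outer list
    return sum(len(row) for row in graph[worker_id::n_workers])

def run_workers(graph, n_workers=4):
    read_fds = []
    for worker_id in range(n_workers):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                        # child
            os.close(r)
            result = analyze_part(graph, worker_id, n_workers)
            os.write(w, str(result).encode())
            os._exit(0)
        os.close(w)
        read_fds.append(r)
    totals = []
    for r in read_fds:
        totals.append(int(os.read(r, 64) or 0))
        os.close(r)
        os.wait()
    return sum(totals)                      # "reduce" step in the parent

if __name__ == "__main__":
    graph = load_graph()
    print(run_workers(graph))
```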
<LarstiQ>
which memory would be shared? an mmap? Does that need particular flags for that to work?
<oml>
LarstiQ: right now the structure is a list of lists, each storing nodes of an undirected graph
<cfbolz>
oml: yeah, that's a bit tough, that needs support from the GC
<oml>
i was considering just disabling the garbage collector after loading was done
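oml's idea of turning the collector off after loading would look roughly like the sketch below. Two hedges: gc.freeze() exists on CPython 3.7+ specifically for this fork-and-copy-on-write scenario (hence the hasattr guard), and whether disabling the GC actually stops PyPy from writing to object headers is exactly the open question cfbolz addresses next.

```python
import gc
import os

def load_data():
    # stand-in for the expensive load of the big structure
    return [list(range(1000)) for _ in range(1000)]

data = load_data()

gc.collect()            # clean up garbage left over from the loading phase
gc.disable()            # stop automatic collections (on PyPy, minor
                        # nursery collections still run regardless)
if hasattr(gc, "freeze"):
    gc.freeze()         # CPython >= 3.7: move survivors into a permanent
                        # generation so the collector stops rewriting
                        # their GC headers after fork()

pid = os.fork()
if pid == 0:
    # child: read-only work on `data`
    print("child:", sum(len(row) for row in data))
    os._exit(0)
os.waitpid(pid, 0)
```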
<cfbolz>
We always had plans to have a more fork friendly GC
<cfbolz>
I don't know what happened to these
<oml>
because the analysis itself needs about O(1) memory per process, so it would fit in the remaining 6GB of the machine
<oml>
I mean, I understand that with CPython the refcount updates would trigger the copy-on-write mechanism
<oml>
But I'm not at all sure how the whole refcount-stuff works in PyPy
<oml>
(it's my first pypy project, so please bear with me)
<oml>
LarstiQ: as far as I can see my stuff is currently not mmap-able, but it might be
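This is the kind of thing LarstiQ is hinting at with the question about flags: if the graph can be serialized into a flat binary file, a MAP_SHARED read-only mapping is shared between forked processes with no copy-on-write traffic at all. The file name and record layout below are made up for illustration.

```python
import mmap
import os
import struct

PATH = "edges.bin"   # hypothetical flat file of little-endian uint64 pairs

# write a tiny example file so the snippet is self-contained
with open(PATH, "wb") as out:
    for src, dst in [(0, 1), (1, 2), (2, 0)]:
        out.write(struct.pack("<QQ", src, dst))

f = open(PATH, "rb")
# MAP_SHARED + PROT_READ: pages are backed by the file and shared
# read-only between the parent and any forked children
buf = mmap.mmap(f.fileno(), 0, flags=mmap.MAP_SHARED, prot=mmap.PROT_READ)

pid = os.fork()
if pid == 0:
    n_edges = len(buf) // 16
    edges = [struct.unpack_from("<QQ", buf, i * 16) for i in range(n_edges)]
    print("child sees", n_edges, "edges:", edges)
    os._exit(0)
os.waitpid(pid, 0)
buf.close()
f.close()
```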
<oml>
on the other hand: if I rewrite the stuff anyway I can just do it in C/C++/Rust, that's the end goal anyway
<oml>
unfortunately I can't justify getting a server with a few terabytes of RAM to my boss. He would get suspicious :D
jcea has joined #pypy
dddddd has quit [Ping timeout: 256 seconds]
dddddd has joined #pypy
<cfbolz>
oml: we don't use refcounting
<cfbolz>
But there is still an object for every object and we write to that
<cfbolz>
What is the data type in your inner lists? Integer indexes?
<oml>
its `Simplex = namedtuple('Simplex', ['ver', 'faces', 'cofaces'])`
<oml>
where ver is a tuple of integers, faces is a list of references (to simplices one lvl down) and cofaces is a list of references to simplices (one lvl up)
<oml>
the data structure / graph / flag complex is then a list of lvl 0 simplices, lvl 1 simplices, lvl 2 simplices, etc.
<cfbolz>
Right. So it's all complicated objects, basically
<cfbolz>
As I said, I am not sure what happened to the fork friendly GC idea :-(
<cfbolz>
arigato: do you remember?
<simpson>
oml: A fellow explorer of sheaves!? FWIW last time I did this (not in Python, and not with as much data!) I omitted the cofaces, and required each algorithm to be properly recursive in one direction.
<ronan>
oml: BTW, note that simple classes are a bit more pypy-friendly than namedtuples
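ronan's suggestion, spelled out as a direct translation of the Simplex namedtuple above. Whether it helps for a given workload is something to measure; the __slots__ declaration mainly matters on CPython, since PyPy stores instance attributes compactly either way.

```python
class Simplex(object):
    # plain-class translation of
    # namedtuple('Simplex', ['ver', 'faces', 'cofaces'])
    __slots__ = ('ver', 'faces', 'cofaces')

    def __init__(self, ver, faces=None, cofaces=None):
        self.ver = ver                    # tuple of vertex integers
        self.faces = faces or []          # simplices one level down
        self.cofaces = cofaces or []      # simplices one level up

# usage mirroring the description above: a list of levels,
# each level being a list of simplices
levels = [[Simplex((0,)), Simplex((1,))],   # level 0: vertices
          [Simplex((0, 1))]]               # level 1: edges
levels[1][0].faces = [levels[0][0], levels[0][1]]
levels[0][0].cofaces = [levels[1][0]]
levels[0][1].cofaces = [levels[1][0]]
```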
commandoline has quit [Quit: Bye!]
commandoline has joined #pypy
<LarstiQ>
simplices \o/
<oml>
everything is more simple with simplices ... right? ;)
<LarstiQ>
they sure helped me graduate
<oml>
update for those who are interested: I'm gonna offload the data structure to an existing project, flagser/pyflagser. If I can figure out how to bind these read-only into Python then everything will fit into memory again :)
<antocuni>
basically, it uses cffi to provide "python like" objects inside a shared memory region
<antocuni>
so you can start multiple processes, each accessing the shared memory, but you need explicit locks of course
<antocuni>
I haven't used it for years though, so I don't even know if it still works
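Without vouching for whichever project antocuni is referring to, the general idea can be sketched in a few lines: put plain C structs into a MAP_SHARED mapping and view them through cffi, so forked processes see the same bytes and there are no per-object headers for the GC to write to. The struct layout and sizes below are invented for illustration.

```python
import mmap
import os
import cffi

ffi = cffi.FFI()
ffi.cdef("""
    typedef struct { uint64_t src; uint64_t dst; } edge_t;
""")

N_EDGES = 3
buf = mmap.mmap(-1, N_EDGES * ffi.sizeof("edge_t"), flags=mmap.MAP_SHARED)
raw = ffi.from_buffer(buf)              # keeps the mmap buffer exported
edges = ffi.cast("edge_t *", raw)

# parent fills the shared region once, before forking, so the children
# can treat it as read-only and no locks are needed
for i, (s, d) in enumerate([(0, 1), (1, 2), (2, 0)]):
    edges[i].src = s
    edges[i].dst = d

pid = os.fork()
if pid == 0:
    # child reads the same physical pages; no copy is made
    print("child:", [(edges[i].src, edges[i].dst) for i in range(N_EDGES)])
    os._exit(0)
os.waitpid(pid, 0)
buf.close()
```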
ronan has quit [Quit: Leaving]
<LarstiQ>
oml: cool, some new software to look into (flagser). Thanks! :)
<oberstet>
oml: fwiw, I've been using FlatBuffers in LMDB and mmap from multiple PyPy processes for such things. great performance / memory usage + transactional safety
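A rough sketch of the LMDB side of what oberstet describes, using the py-lmdb bindings. In his setup the values would be serialized FlatBuffers rather than the raw bytes used here, but the zero-copy, many-readers pattern looks like this:

```python
import lmdb

PATH = "graph.lmdb"   # hypothetical database directory

# one-off writer: store some records (in oberstet's setup these would
# be FlatBuffers, decoded in place on the reader side)
env = lmdb.open(PATH, map_size=1 << 30)
with env.begin(write=True) as txn:
    txn.put(b"node:0", b"\x01\x02\x03")
    txn.put(b"node:1", b"\x04\x05\x06")
env.close()

# multiple reader processes can do this concurrently; with buffers=True
# the values are zero-copy views into LMDB's read-only mmap
env = lmdb.open(PATH, readonly=True, lock=False)
with env.begin(buffers=True) as txn:
    for key, value in txn.cursor():
        print(bytes(key), bytes(value))
env.close()
```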