adhux0x0f0x3f has quit [Ping timeout: 240 seconds]
adhux0x0f0x3f has joined #mirage
aion has joined #mirage
ouin has quit [Ping timeout: 240 seconds]
_whitelogger has joined #mirage
jnavila has joined #mirage
jnavila has quit [Quit: Konversation terminated!]
TG[m] has quit [Remote host closed the connection]
copy` has left #mirage ["Kicked by @appservice-irc:matrix.org : User has been idle for 30+ days."]
Haudegen[m] has left #mirage ["Kicked by @appservice-irc:matrix.org : User has been idle for 30+ days."]
xuqzab[m] has left #mirage ["Kicked by @appservice-irc:matrix.org : User has been idle for 30+ days."]
Haudegen has joined #mirage
Haudegen has quit [Quit: Bin weg.]
Haudegen has joined #mirage
kensan has quit [Quit: leaving]
kensan has joined #mirage
mahmudov has joined #mirage
<aion>
this is sooooo a fragmentation issue ... now reporting "out of memory" and "65MB+ free" at the same time.
<aion>
(well, fragmentation or alignment needs...)
<tg>
oi, any mirage related things at fosdem this time?
kit_ty_kate has quit [*.net *.split]
kit_ty_kate has joined #mirage
aion_ has joined #mirage
aion_ is now known as kuya
aion has quit [Ping timeout: 240 seconds]
<hannes>
kuya: i read you... i am investigating a similar oom issue in another application (it has ~1GB; once more than 380MB is allocated, there is a very high probability that an out_of_memory is raised)
<kuya>
hannes: filed an issue + pinged you by now.
<hannes>
now, I already added periodic calls to the garbage collector to run a full major collection and compaction (as a workaround / to see whether it changes anything)...
<kuya>
hannes: can not tell about any specific limits. this soooo smells like fragmentation/alignment...
<hannes>
yes
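One way to check the fragmentation suspicion from inside the unikernel is to compare the total free space in the OCaml major heap with the largest single free block. The following is a minimal sketch using only the standard `Gc` module; the function name `fragmentation_report` is hypothetical, and it assumes an OCaml 4.x runtime (as in this discussion). It only sees the OCaml heap, so memory allocated outside it (for example bigarrays or Io_page buffers) is not covered.

```ocaml
(* Sketch: summarise OCaml major-heap fragmentation from Gc.stat.
   Assumption: OCaml 4.x runtime, where the free-list fields are populated.
   Off-heap allocations (bigarrays, Io_page) are NOT visible here. *)
let word_bytes = Sys.word_size / 8

(* [fragmentation_report] is a hypothetical helper name. *)
let fragmentation_report () =
  let s = Gc.stat () in
  Printf.sprintf
    "heap=%dB free=%dB largest_free_block=%dB free_blocks=%d fragments=%d"
    (s.Gc.heap_words * word_bytes)
    (s.Gc.free_words * word_bytes)
    (s.Gc.largest_free * word_bytes)
    s.Gc.free_blocks
    s.Gc.fragments

let () = print_endline (fragmentation_report ())
```

A large `free` value combined with a small `largest_free_block` would be consistent with fragmentation of the OCaml heap; if both look healthy while allocations still fail, the problem is more likely in memory managed outside the OCaml heap.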
<hannes>
i can think of two ways forward: a) attempt to just get rid of any bigarray allocations, or b) rebase and restart the statistical memory profiler
<kuya>
i can run whatever tests needed. (or spin up the guest vms again)
<hannes>
i got it to work with mirage (by exporting the stats on request via tcp)
<kuya>
i was hoping the "4mb" part might be a hint though, because even without any leak/frag issues, 4MB per client VM would be _some_ change.
<hannes>
likely will need to adapt some code (+compiler code) to get that up and running, not sure whether i'll make it before marrakesh
<hannes>
i don't know of any hardcoded "give me 4MB" in the dependency cone, sorry :/
<kuya>
my guess would be "default vchan buf size"
<kuya>
(because that's the only thing in the situation where i see the oom that probably cares about alignment)
<hannes>
yes, maybe.. i don't know too much about the xen backend, for me it's an issue on solo5-hvt / kvm / freebsd bhyve..
<kuya>
oh. so not backend specific?
<hannes>
i sometimes get some reasonable backtraces from the out of memory exception (usually in sexp_conv :/)
<hannes>
yes, well, not entirely sure we have the same issue
<kuya>
not seen any stacktraces, the oom always happens in the same place, and the "sometimes it recovers some memory after some hours" part was rather unexpected.
<hannes>
meh, i should rebase/update that stuff (esp now that in 4.11 we'll get the statmemprof into ocaml-baseline)
<hannes>
i don't know whether it still compiles and works (if it compiles as documented, it'll likely work)
<hannes>
the UI is an emacs "GUI" (where i already forgot what's in the user interface)
<kuya>
*frowns* ... considering just adding some forced Gc.compact calls to the main loop...
<hannes>
the code I linked to above, Io_page.get -- is something that has fewer call sites in newer mirage versions (for various reasons, usually not needed); this also means fewer gc compactions (which may or may not be related..)
<hannes>
there were also changes in the ocaml runtime system (when a collection is triggered), but i don't remember in which version they were introduced..
<hannes>
but now i just drink beer and think i'll do something else with my life ;p
<hannes>
you could try a Gc.compact every 10s to see if it solves anything for you
<kuya>
how would i do an "every 10s" kind of thing? (as in, what's the keyword to google for?)
<hannes>
Lwt.async (fun () -> let rec loop () = Gc.compact () ; Time.sleep_ns (Duration.of_sec 10) >>= fun () -> loop () in loop ());
<kuya>
hm. i could just compact from the mempressure reporting code ...
<hannes>
sure, that as well
<hannes>
that's likely easier :)
<hannes>
otherwise you have to hunt for a Time implementation (or use OS.Time.sleep_ns)
<kuya>
added in front of mem reporting, built+deployed, lets see how that goes.
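Compacting right before the memory report could look roughly like the sketch below. This is not kuya's actual reporting code; `report_memory` and its `log` parameter are hypothetical names, and only the standard `Gc` module is used.

```ocaml
(* Sketch: force a compaction, then report heap figures.
   [report_memory] and [log] are hypothetical; the real reporting code in
   the unikernel under discussion is not shown in this log. *)
let report_memory ?(log = print_endline) () =
  (* Full major collection followed by heap compaction. *)
  Gc.compact ();
  let s = Gc.quick_stat () in
  let bytes words = words * (Sys.word_size / 8) in
  log
    (Printf.sprintf "heap=%dB top_heap=%dB compactions=%d"
       (bytes s.Gc.heap_words)
       (bytes s.Gc.top_heap_words)
       s.Gc.compactions)

let () = report_memory ()
```

Tying the compaction to the existing report avoids a separate timer loop, at the cost of compacting only as often as the report runs.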
<hannes>
cool
<hannes>
in your report, would you mind specifying which OCaml version you are using?
<kuya>
added note on versions to the issue post. basically 4.08.1 with netchannel and mirage-net-xen pinned to --dev