hannes changed the topic of #mirage to: MirageOS are OCaml unikernels - https://mirage.io - this channel is logged at http://irclog.whitequark.org/mirage/ - MirageOS 3.7.1 is released - happy hacking!
Haudegen has quit [Ping timeout: 246 seconds]
mahmudov has quit [Ping timeout: 265 seconds]
adhux0x0f0x3f has quit [Ping timeout: 240 seconds]
adhux0x0f0x3f has joined #mirage
aion has joined #mirage
ouin has quit [Ping timeout: 240 seconds]
_whitelogger has joined #mirage
jnavila has joined #mirage
jnavila has quit [Quit: Konversation terminated!]
TG[m] has quit [Remote host closed the connection]
copy` has left #mirage ["Kicked by @appservice-irc:matrix.org : User has been idle for 30+ days."]
Haudegen[m] has left #mirage ["Kicked by @appservice-irc:matrix.org : User has been idle for 30+ days."]
xuqzab[m] has left #mirage ["Kicked by @appservice-irc:matrix.org : User has been idle for 30+ days."]
Haudegen has joined #mirage
Haudegen has quit [Quit: Bin weg.]
Haudegen has joined #mirage
kensan has quit [Quit: leaving]
kensan has joined #mirage
mahmudov has joined #mirage
<aion> this is sooooo a fragmentation issue ... now reporting "out of memory" and "65MB+ free" at the same time.
<aion> (well, fragmentation or alignment needs...)
<tg> oi, any mirage related things at fosdem this time?
kit_ty_kate has quit [*.net *.split]
kit_ty_kate has joined #mirage
aion_ has joined #mirage
aion_ is now known as kuya
aion has quit [Ping timeout: 240 seconds]
<hannes> kuya: i read you... i am investigating a similar oom issue in another application (has ~1GB, once 380MB allocated is passed, the probability is very high an out_of_memory is raised)
<kuya> hannes: filed an issue + pinged you by now.
<hannes> now, I already added (as workarounds / see whether it changes anything) cyclic calls to the garbage collector to run a full major collection and compaction...
<kuya> hannes: can not tell about any specific limits. this soooo smells like fragmentation/alignment...
<hannes> yes
<hannes> i can think of two ways forward: a) attempt to just get rid of any bigarray allocations, or b) rebase and restart the statistical memory profiler
<kuya> i can run whatever tests needed. (or spin up the guest vms again)
<hannes> more info at https://jhjourdan.mketjh.fr/pdf/jourdan2016statistically.pdf -- this line of work has now been merged into ocaml-master (there are branches for 4.07 etc. around)
<hannes> i got it to work with mirage (by exporting the stats on request via tcp)
<kuya> i was hoping the "4mb" part might be a hint though, because even without any leak/frag issues, 4MB per client VM would be _some_ change.
<hannes> likely will need to adapt some code (+compiler code) to get that up and running, not sure whether i'll make it before marrakesh
<hannes> i don't know of any hardcoded "give me 4MB" in the dependency cone, sorry :/
<kuya> my guess would be "default vchan buf size"
<kuya> (because thats the only thing in the situation where i see the oom that probably cares about alignment)
<hannes> yes, maybe.. i don't know too much about the xen backend, for me its an issue on solo5-hvt / kvm / freebsd bhyve..
<kuya> oh. so not backend specific?
<hannes> i sometimes get some reasonable backtraces from the out of memory exception (usually in sexp_conv :/)
<hannes> yes, well, not entirely sure we have the same issue
<kuya> not seen any stacktraces, the oom always happens in the same place, and the "sometimes it recovers some memory after some hours" part was rather unexpected.
<hannes> but there's this code around https://github.com/mirage/io-page/blob/master/lib/io_page.ml#L36-L45 -- which tries to allocate n pages, and on failure does a gc compaction
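The allocate-then-compact pattern described here can be sketched in plain OCaml. This is a minimal illustration and not the Io_page source: alloc_with_compaction, get_pages and the 4096-byte page size are hypothetical, and the sketch assumes that the allocation is simply retried once after the compaction.

```ocaml
(* Hypothetical helper, not the Io_page implementation.
   [alloc] stands in for any allocator that raises [Out_of_memory],
   such as a Bigarray-backed page allocator. *)
let alloc_with_compaction ~alloc n =
  try alloc n
  with Out_of_memory ->
    (* A compaction may release enough memory for a second attempt.
       If the second attempt fails too, the exception propagates. *)
    Gc.compact ();
    alloc n

(* Assumed page size, for illustration only. *)
let page_size = 4096

(* Usage sketch: allocate [n] pages as one contiguous Bigarray. *)
let get_pages n =
  alloc_with_compaction n ~alloc:(fun n ->
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (n * page_size))
```

Passing the allocator as an argument keeps the retry logic independent of where the memory comes from, which also makes the failure path easy to exercise with a fake allocator.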
<kuya> where can i find your "GC on steroids" version?
<hannes> meh, i should rebase/update that stuff (esp now that in 4.11 we'll get the statmemprof into ocaml-baseline)
<hannes> i don't know whether it still compiles and works (it is likely if it compiles as documented that it'll work)
<hannes> the UI is an emacs "GUI" (where i already forgot what's in the user interface)
<kuya> *frowns* ... considering just adding some forced Gc.compact calls to the main loop...
<hannes> the code I linked to above, Io_page.get -- has fewer and fewer call sites in newer mirage versions (for various reasons, it's usually not needed); this also means fewer gc compactions (which may or may not be related..)
<hannes> there were also changes in the ocaml runtime system (when a collection is triggered), but i don't remember which version that was introduced..
<hannes> but now i just drink beer and think i'll do something else with my life ;p
<hannes> you could try a Gc.compact every 10s to see if it solves anything for you
<kuya> how would i do a "every 10s" kind of thing? (as in, whats the keyword to google for?)
<hannes> Lwt.async (fun () -> let rec loop () = Gc.compact () ; Time.sleep_ns (Duration.of_sec 10) >>= fun () -> loop () in loop ());
<kuya> hm. i could just compact from the mempressure reporting code ...
<hannes> sure, that as well
<hannes> that's likely easier :)
<hannes> Otherwise you've to hunt for a Time implementation (or use OS.Time.sleep_ns)
<kuya> added in front of mem reporting, built+deployed, lets see how that goes.
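Compacting right before the memory report could look roughly like the sketch below. It uses only the standard Gc module; compact_and_report, bytes_of_words and the report format are hypothetical and not taken from the application discussed here. Note that Gc.stat only describes the OCaml heap, so memory held outside it (for example Bigarray data) does not show up in these numbers.

```ocaml
(* Hypothetical helper: names and report format are illustrative only. *)

(* Convert a size in machine words to bytes. *)
let bytes_of_words w = w * (Sys.word_size / 8)

(* Run a full compaction, then summarise the OCaml heap statistics.
   A small [largest_free] next to a large [free_words] would hint at
   fragmentation inside the OCaml heap. *)
let compact_and_report () =
  Gc.compact ();
  let s = Gc.stat () in
  Printf.sprintf
    "heap=%dKiB free=%dKiB largest_free=%dKiB fragments=%d words"
    (bytes_of_words s.Gc.heap_words / 1024)
    (bytes_of_words s.Gc.free_words / 1024)
    (bytes_of_words s.Gc.largest_free / 1024)
    s.Gc.fragments
```

The function returns the report as a string so that the caller decides where to log it.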
<hannes> cool
<hannes> in your report, would you mind specifying which OCaml version you are using?
<kuya> added note on versions to the issue post. basically 4.08.1 with netchannel and mirage-net-xen pinned to --dev
_whitelogger has joined #mirage