avsm changed the topic of #mirage to: mirage 2 released! party on!
demonimin has quit [Ping timeout: 276 seconds]
demonimin has joined #mirage
demonimin has quit [Remote host closed the connection]
demonimin has joined #mirage
rgrinberg has joined #mirage
rgrinberg has quit [Ping timeout: 244 seconds]
copy` has quit [Quit: Connection closed for inactivity]
tizoc has quit [Ping timeout: 240 seconds]
seangrove has joined #mirage
tizoc has joined #mirage
dexterph has joined #mirage
AltGr has joined #mirage
kensan has quit [Read error: Connection reset by peer]
seangrove has quit [Ping timeout: 252 seconds]
dexterph has quit [Remote host closed the connection]
dexterph has joined #mirage
kensan has joined #mirage
kensan has quit [Client Quit]
kensan has joined #mirage
andreas23 has joined #mirage
mort___ has joined #mirage
mort___ has left #mirage [#mirage]
<mato> hannes: got a minute? i have an interesting problem i could use some help with
<mato> hannes: am load-testing static_website, which is always a good way to find bugs, and it seems io-pages are never being recycled :-(
<mato> hannes: at least that's what i can see by tracing calls to sbrk() in Solo5 -- the unikernel asks for a new io-page for each packet sent or received
<hannes> mato: uh :/
<hannes> mato: in the xen world, the allocation is done in mirage-net-xen, and a pool shared between dom0 and domU is used
<mato> hannes: well, the solo5 netif code comes from the mirage-unix code
<hannes> mato: ic. is it then running out of memory?
<mato> eventually, yes
<mato> and i see several sbrk(0x1000) calls per packet, e.g. when testing with ping
<mato> which seems wrong
<hannes> (and how recent was your mirage-net-unix checkout? last summer thomasga and I dug into a leak there... where recv was recursive, but not tail-recursive)
<hannes> mato: who calls sbrk? io-page allocator?
<mato> hannes: dlmalloc
<mato> hannes: io-page allocator calls posix_memalign() which is part of dlmalloc
<mato> just trying to make sense of the mirage-net-solo5 history now, it seems to have the tail-recursion fixes in it, but not clear how they got there
<hannes> mato: I've to swap in those libraries into my brain... it is currently unclear to me how the OCaml GC should know about the memory allocated by io-page in order to free it.. (since it is allocated out-of-band)
<hannes> s/out-of-band/directly by calls to malloc and not registered to the GC/
<mato> right, i gathered that much from the comments
<mato> i'm wondering if i can just kill the io-page stuff, solo5 does not need the buffers to be page-aligned
<hannes> yes
<hannes> I argued to kill io-page for a long time
<hannes> avsm wants to keep it for unknown reasons
<hannes> you can just delegate to Cstruct.create in the Io_page.get
<mato> do i need to change page_aligned_buffer in netif.mli?
<mato> also, what deals with allocating the io-page on the write path?
<hannes> oh I guess it is a rabbit hole to get rid of io_page properly... yes, the page_aligned_buffer is different... Cstruct.t instead of io_page.t... (which are not the same, damn)
<hannes> on the write path, something in the tcpip library calls Io_page.get and shifts it around a bit to fill in the tcp / ip / ethernet headers
sknebel has quit [Quit: sknebel]
sknebel has joined #mirage
<mato> hmm, except i can't substitute Cstruct.t for page_aligned_buffer, since then the interfaces don't match up with types/V1_LWT.mli :(
<hannes> this is not fast code
<hannes> but avoids the C stub
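A minimal sketch of the kind of replacement discussed above, not the code that was actually pasted in the channel. It assumes that `Io_page.t` is a char Bigarray and that `Io_page.get n` returns `n` pages of 4096 bytes each. The stdlib `Bigarray.Array1.create` stands in for `Cstruct.create` so the example has no external dependencies. The buffer is not page-aligned, and it is released by the Bigarray finalizer once the GC collects it.

```ocaml
(* Assumed shape of Io_page.t: a C-layout Bigarray of chars. *)
type io_page =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Assumed page size, matching the sbrk(0x1000) calls seen in the trace. *)
let page_size = 4096

(* Hypothetical stand-in for Io_page.get: allocate [n] pages' worth of
   zeroed, GC-managed memory without any C stub or alignment guarantee. *)
let get n : io_page =
  if n < 1 then invalid_arg "get: need at least one page";
  let buf =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (n * page_size)
  in
  Bigarray.Array1.fill buf '\000';
  buf

let () =
  let pages = get 2 in
  Printf.printf "allocated %d bytes\n" (Bigarray.Array1.dim pages)
```

This only works where the backend does not need page alignment, which is what mato says about Solo5 above.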
<mato> ok, with that it survives the load test longer, but still eventually runs out of memory
<mato> also, it seems to happily allocate all the heap given to it (2GB with ukvm) before (guessing) any kind of gc kicks in
<mato> the fact that it still runs out of memory eventually suggests that those buffers are not being GC'd
mort___ has joined #mirage
mort___ has quit [Client Quit]
<hannes> mato: I'd assume that the mirage-net-unix code is not well tested...
<hannes> since nobody uses it in production... on unix you'd use the socket stack, or use the xen backend and then mirage-net-xen
<hannes> (and as far as I can tell the mirage-net-xen (1.4.2 is what I use) does not leak)
mort___ has joined #mirage
<hannes> mato: you can manually force a GC (call `Gc.full_major ()`) and get some GC stats (`Gc.stat ()`, see http://caml.inria.fr/pub/docs/manual-ocaml/libref/Gc.html) to gather evidence whether a) GC does not kick in or b) it is leaking
<mato> hannes: thx, yeah, just going to experiment with that now.
<hannes> mato: I used to call it every other second and look into the live_words data.. (ignoring the minor_words)
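A small sketch of the measurement hannes describes, using only the stdlib `Gc` module. The `Bytes` buffers and the `retained` list are hypothetical stand-ins for packet buffers. If `live_words` drops after the references are released and a full major collection is forced, the GC can reclaim the memory (case a); if it keeps growing across forced collections, something still holds the buffers (case b). Note that the data of a Bigarray lives outside the OCaml heap, so it is not included in `live_words`.

```ocaml
(* Force a full major collection, then report live heap words. *)
let live_words_after_full_major () =
  Gc.full_major ();
  (Gc.stat ()).Gc.live_words

(* Hypothetical stand-in for buffers that the program keeps referencing. *)
let retained : Bytes.t list ref = ref []

let allocate_buffers n =
  for _ = 1 to n do
    retained := Bytes.create 4096 :: !retained
  done

let baseline = live_words_after_full_major ()

(* While the list holds the buffers, they count as live. *)
let () = allocate_buffers 1000
let while_held = live_words_after_full_major ()

(* After dropping the references, a full major collection can free them. *)
let () = retained := []
let after_drop = live_words_after_full_major ()

let () =
  Printf.printf "live_words: baseline=%d held=%d dropped=%d\n"
    baseline while_held after_drop
```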
mort___ has left #mirage [#mirage]
<mato> ok, ocaml n00b question: i have this code:
<mato> let stats =
<mato> OS.Time.sleep 2.0 >|= fun () ->
<mato> let s = Gc.stat () in
<mato> C.log c (Printf.sprintf "GC: %d %d %d" s.live_words s.minor_collections s.major_collections)
<mato> but can't figure out how to turn it into a loop
<hannes> mato: easiest (for my brain) is a recursive function... make it let rec stats (), change >|= to >>= and append >>= fun () -> stats ()
<hannes> no, append "; stats ()"... no need for another >>=..
<mato> thanks
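A sketch of the loop shape hannes suggests. To keep it runnable without Lwt, the monadic pieces are passed in as parameters, and all of these names are assumptions for illustration: `bind` plays the role of `>>=`, `sleep` of `OS.Time.sleep`, and `log` of `C.log c`. The iteration count `n` and `return` are added only so the example terminates; the loop in the chat runs forever. The point is that the callback given to `bind` returns a promise, so it can end with the recursive call, which the callback given to `>|=` cannot do.

```ocaml
(* Recursive stats loop: sleep, collect, log, then recurse.
   [bind], [return], [sleep] and [log] are hypothetical stand-ins for
   Lwt's (>>=), Lwt.return, OS.Time.sleep and (C.log c). *)
let rec stats ~bind ~return ~sleep ~log n =
  if n = 0 then return ()
  else
    bind (sleep 2.0) (fun () ->
      let s = Gc.stat () in
      log
        (Printf.sprintf "GC: %d %d %d"
           s.Gc.live_words s.Gc.minor_collections s.Gc.major_collections);
      (* [log] returns unit, so the recursive call follows directly;
         this corresponds to appending "; stats ()" as suggested above. *)
      stats ~bind ~return ~sleep ~log (n - 1))

(* Run three iterations with an identity "monad" and a no-op sleep. *)
let lines : string list ref = ref []

let () =
  stats
    ~bind:(fun x f -> f x)
    ~return:(fun () -> ())
    ~sleep:(fun _seconds -> ())
    ~log:(fun line -> lines := line :: !lines)
    3

let () = List.iter print_endline (List.rev !lines)
```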
<mato> so, it seems to be GC'ing something, around line 47, but overall it looks like it's leaking all over the place :(
<mato> this is with your Cstruct-using io-page
<hannes> mato: could you call a Gc.full_major () just before you get stats and print them?
<hannes> (away for lunch)
<mato> bon appétit...
<mato> i've added timestamps, and explicit comments where everything seems to freeze (presumably a different bug)
<mato> so, Gc.full_major() helps, though it's still too slow to stem the leak, the unikernel doesn't survive more than a 5s long siege test
mort___ has joined #mirage
mort___ has quit [Client Quit]
mort___ has joined #mirage
<mato> hannes: tracking this here for now: https://github.com/djwillia/solo5/issues/58
<hannes> hmmyep...
mort___ has quit [Quit: Leaving.]
<hannes> if there's a unix version of that which reproducibly leaks, maybe the spacetime https://github.com/ocaml/ocaml/pull/585 memory profiler helps (would be convenient to have this available on solo5&xen as well, but not sure how much work that is)
mort___ has joined #mirage
<mato> i'll test that, and also see if i can get samoht or someone else to help take a look
<mato> it might be best to debug together around a computer when i'm in cambridge next week
<mato> going to fix some minor bugs in solo5 sbrk() / malloc() i found along the way...
<hannes> sure... I'm in .cam and happy to help out on that issue
<hannes> I might also find some time at some moment to look into the mirage-net-solo5..
<hannes> while looking into solo5 the other day, there's some (non-exported) code like memcmp and friends, which are both in solo5/ and in ocaml-freestanding/nolibc... not sure whether it makes sense to unify / share the code between those entities...
mort___ has quit [Quit: Leaving.]
mort___ has joined #mirage
agarwal1975 has joined #mirage
<mato> hannes: that's deliberate -- the code that's in Solo5 is private (used by Solo5 itself) and not intended to be exported.
<hannes> makes sense... though the implementations differ ;) (sorry for my code duplication OCD) ;)
<mato> hannes: Oh, they differ, yes. That's a different issue -- the implementations in Solo5 are Dan's and the ocaml-freestanding ones are what I lifted from musl.
<hannes> ic
<mato> hannes: I will probably unify them at some stage, although by copying, not via a dependency.
<hannes> ack
<mato> hannes: Since the musl implementations are much better.
<hannes> likely makes sense to look at compiler output whether musl's implementations actually make a difference
<mato> I'd just treat them as "known good" and go with them. The attention to detail and edge cases in musl is impressive.
<hannes> :)
rgrinberg has joined #mirage
mort___ has quit [Quit: Leaving.]
mort___ has joined #mirage
copy` has joined #mirage
andreas23 has quit [Quit: Leaving.]
mort___ has quit [Ping timeout: 258 seconds]
mort___ has joined #mirage
andreas23 has joined #mirage
andreas231 has joined #mirage
agarwal1975 has quit [Quit: agarwal1975]
andreas23 has quit [Ping timeout: 276 seconds]
agarwal1975 has joined #mirage
brson has joined #mirage
rgrinberg has quit [Ping timeout: 260 seconds]
mort___ has quit [Quit: Leaving.]
dexterph has quit [Ping timeout: 250 seconds]
agarwal1975 has quit [Quit: agarwal1975]
agarwal1975 has joined #mirage
mort___ has joined #mirage
mort___1 has joined #mirage
mort___ has quit [Read error: Connection reset by peer]
mort___1 has quit [Quit: Leaving.]
rgrinberg has joined #mirage
agarwal1975 has quit [Quit: agarwal1975]
agarwal1975 has joined #mirage
StrykerKKD has joined #mirage
jermar has joined #mirage
insitu has joined #mirage
insitu has quit [Ping timeout: 260 seconds]
jermar has quit [Ping timeout: 240 seconds]
insitu has joined #mirage
insitu has quit [Client Quit]
insitu has joined #mirage
insitu has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
rgrinberg has quit [Ping timeout: 260 seconds]
AltGr has left #mirage [#mirage]
abeaumont has quit [Remote host closed the connection]
StrykerKKD has quit [Quit: Leaving]
rgrinberg has joined #mirage