<whitequark>
Drup: yes, that's fine. but I wish it would be possible to use syntax(foo).
<Drup>
whitequark: soon™
<whitequark>
\o/
<companion_cube>
oh, you can put attributes on fun% ?
<Drup>
yes
<whitequark>
yeah, pretty much everywhere
<Drup>
on all syntactic constructs
<Drup>
whitequark: I don't know if you saw it, but you should be able to put them in top level bindings too, now
<whitequark>
[@@@foo] ?
<Drup>
no, "let%foo ..."
<Drup>
(not let in)
<whitequark>
ah
<companion_cube>
so I guess while%lwt and such will remain
<companion_cube>
sooo nice
<Drup>
yes
<Drup>
but not the "finally" :/
<Drup>
still not sure about this one
<companion_cube>
try with e -> [@@finally "yolo"] ?
<whitequark>
^
<Drup>
not very nice, to say the least.
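[For readers following along, the syntax under discussion is standard OCaml ≥ 4.02: floating attributes, attributes on expressions, and extension nodes like let%foo (which need a ppx rewriter to compile). A minimal sketch; the attribute name `illustrative` below is made up for the example:

```ocaml
(* Floating attribute: applies to the enclosing structure.
   [@@@warning "-32"] disables the unused-value warning below it. *)
[@@@warning "-32"]

let unused = 0  (* no warning, thanks to the floating attribute *)

(* Attributes attach to almost any construct, expressions included;
   unknown attribute names are simply ignored by the compiler. *)
let answer = (42 [@illustrative "attaches to the literal"])

let () = assert (answer = 42)

(* let%foo e is also valid syntax, but is rejected unless a ppx
   rewriter (e.g. lwt's let%lwt) expands the extension node, so it
   is only mentioned here in a comment. *)
```
]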
<Drup>
"That camlp4 can handle OCaml syntax (two OCaml syntaxes, in fact, the original one and a revised one introduced specifically for camlp4) is just a special case."
<whitequark>
oh
<Drup>
there is some grammatical weirdness in this sentence.
<whitequark>
I accidentally a word
<companion_cube>
ah, GETENV
<companion_cube>
I'd like something like this to embed the current commit hash in the code
<whitequark>
companion_cube: yep. I see no point in spending bytes on explaining how to extract attributes from perhaps the most boring AST I've seen
<whitequark>
so I just cut it down to the simplest possible example
<companion_cube>
indeed
<whitequark>
I'll probably implement your commit hash thing, to experiment with packaging
<whitequark>
and publish
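[The GETENV example being referenced substitutes an environment variable's value at preprocessing time. A toy model of that transform over a hypothetical mini-AST (not the real Parsetree API), just to show the shape of such a rewriter:

```ocaml
(* Stand-in AST: Ext models an extension node like [%getenv "VAR"]. *)
type expr =
  | Const of string            (* string literal *)
  | Ext of string * string     (* extension node: name, payload *)
  | App of expr * expr         (* application *)

(* The rewriter replaces the extension node with a constant read from
   the environment when the rewriter runs, i.e. at compile time. *)
let rec rewrite = function
  | Ext ("getenv", var) -> Const (try Sys.getenv var with Not_found -> "")
  | App (f, x) -> App (rewrite f, rewrite x)
  | e -> e

(* An unset variable rewrites to the empty string: *)
let () = assert (rewrite (Ext ("getenv", "SURELY_UNSET_VAR_123")) = Const "")
```

The commit-hash variant would shell out to `git rev-parse HEAD` instead of reading the environment.]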
<whitequark>
by the way, any idea on naming conventions for ppx extensions? ppx_thing perhaps? similar to pa_thing.
<Drup>
I think ppx_thingy is good
<companion_cube>
my_little_ppx_foobar
<whitequark>
my_little_ppx_friendship_is_magic
<whitequark>
(alluding to the way ppx extensions easily interoperate)
<companion_cube>
whitequark: this is good. You should post it on reddit :)
<Drup>
whitequark: you forgot to mention the important bit
<malvarez>
I had completely overlooked the pos_bol field. of course bol stands for beginning of line...
<Drup>
self explanatory, isn't it ? :D
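[Concretely, pos_bol is the character offset of the beginning of the current line, so a position's column falls out by subtraction; a minimal check:

```ocaml
(* Lexing.position fields: pos_cnum is the absolute character offset,
   pos_bol the offset of the beginning of the current line, so the
   0-based column is pos_cnum - pos_bol. *)
let column (p : Lexing.position) = p.Lexing.pos_cnum - p.Lexing.pos_bol

let () =
  let p = { Lexing.pos_fname = "test.ml";
            pos_lnum = 2; pos_bol = 10; pos_cnum = 14 } in
  assert (column p = 4)   (* fifth character on line 2 *)
```
]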
<Drup>
(tbh, I don't think I actually wrote this code, I probably stole it from someone)
<Drup>
(if I did write it, I didn't remember)
<Drup>
I don't*
* whitequark
headdesks
<whitequark>
the reason camlp4 build failed was that I had a directory called camlp4 in cdpath, and camlp4's build script does 'cd camlp4'
<whitequark>
it still explodes for some entirely unrelated reason though
<whitequark>
can't find debug.ml, apparently
<whitequark>
how does one get a camlp4boot.native ?
<ygrek>
cdpath works in scripts??
<ygrek>
pure crazy
<whitequark>
ygrek: it also prints the path to stdout, exacerbating the failure
<whitequark>
hmm, apparently ocaml tree used to include camlp4/boot/camlp4boot.ml, but camlp4 tree does not
<whitequark>
oddly, transplanting it into camlp4 tree produces a circular build dependency
* whitequark
starts to wonder whether anyone has ever tried to build it at all
<whitequark>
jpdeplaix: were you ever able to build camlp4 successfully with your overlay?
<whitequark>
ohhh. I should have specified --prefix.
<whitequark>
... no, it actually ignores --prefix. I should have performed `make all' on a really clean tree. the cdpath thing screwed something up. nevermind all of the above.
<whitequark>
is it normal that compiling camlp4boot takes over 7.6G of RAM and more than 55 minutes already?
<whitequark>
poking it with gdb reveals that it has unbounded recursion with a cycle somewhere inside:
<whitequark>
#12696 0x00000000004f1150 in camlBtype__iter_type_expr_1466 ()
<whitequark>
#12697 0x00000000004f189f in camlBtype__it_type_expr_1518 ()
<whitequark>
PR6371.
<tautologico>
is that a compiler bug?
<whitequark>
seems so
<whitequark>
currently bisecting it
<whitequark>
so... fb74ef5e51a is responsible
<whitequark>
fixed. garrigue is quick!
<adrien>
oh, HEAD~ is also fairly interesting (requirement for C99)
<adrien>
s/for/of/
<whitequark>
adrien: hm? which commit?
<whitequark>
oh, nevermind, I see.
<adrien>
the one before the fix
<jpdeplaix>
01:24:45 whitequark | I mean, it doesn't even start to build stuff. it just dies somewhere inside ocamlbuild // that's why I said that it's better with pr20 (but you can trick this by: install ocamlfind; install camlp4; reinstall ocamlfind)
<jpdeplaix>
Yes, it's a little bit boring :D
<whitequark>
jpdeplaix: nah, my problem was unrelated to pr20
<whitequark>
it was $CDPATH and dirty tree
<whitequark>
I mean, I'd eventually have bumped into lack of camlp4 META, but not yet
<jpdeplaix>
what's your error message ?
<jpdeplaix>
mmmh I just saw your mantis ticket
<jpdeplaix>
well, ok. I didn't try to compile camlp4 with trunk recently
<whitequark>
jpdeplaix: that's actually a third, unrelated problem :D
<jpdeplaix>
:DD
<whitequark>
and now I have a fourth: camlp4 isn't quite updated enough
<Drup>
whitequark: remind me, why do you want to compile camlp4 against trunk ?
<Drup>
I mean, you don't have to inflict that on yourself
<whitequark>
Drup: ppx works only in trunk. everything else uses camlp4
<whitequark>
e.g. I would depend on lwt and oasis and probably other things
<whitequark>
I want to write my fancy protobuf library over ppx already.
<Drup>
oh, right, you want to compile stuff that uses camlp4
<Drup>
fair enough
<whitequark>
I can't even use utop without that
<gasche>
I think Jérémie plans to upgrade camlp4 to be correct wrt. trunk only after the feature freeze for 4.02
<gasche>
of course, that makes sense for new syntactic constructs to support
<gasche>
but one should still check that camlp4, without support for new constructs, at least compiles and works as expected, because that allows one to spot regressions in the compiler
<gasche>
(as your typing issue)
<gasche>
only it should be OCaml and/or Camlp4's maintainers doing the checking work, not an almost-innocent end-user
<whitequark>
gasche: I've just talked with Anil, he plans to do it sooner
<whitequark>
or maybe me, if I become free earlier
* whitequark
shrugs
<mrvn>
whitequark: CDPATH is evil
<whitequark>
gasche: I'm not used to technology telling me what I can't do. so, often I have to fix it myself. :)
<whitequark>
gasche: actually, camlp4 is mostly upgraded wrt/ trunk. it only misses annotations on one or two nodes, I believe.
<whitequark>
it's not a lot of work at this point
<gasche>
feel free to do the work
<gasche>
but it does have the good property that if you don't, someone else will do it
<whitequark>
yeah.
<gasche>
(which cannot be said of other things in the OCaml ecosystem)
<gasche>
(eg. reviewing Benoît's format+gadt work, which is my focus right now)
<gasche>
amusing format bug: (Printf.printf "%.+f" 3.5)
<whitequark>
weird
<whitequark>
gasche: (other things) yeah, there's a lot of very interesting in-flight patches. ppx and gadt-format included.
<whitequark>
also, record constructors are quite awesome.
<companion_cube>
can't wait
<companion_cube>
:)
<gasche>
I don't think record constructors will be merged in 4.02
<gasche>
(but that's only a personal guess)
<whitequark>
I'm actually quite happy with the evolution of ocaml, compared to what I've seen in other languages
<whitequark>
initially I thought it would be far too conservative, but now I see that it is not the case at all
<whitequark>
as a side note, I really should write a proper LLVM backend sometime, with all the talk about ocamlopt not inlining things where it should
<whitequark>
cmm is less than 100 lines. should be trivial to translate.
<companion_cube>
I think an LLVM backend has been discussed many times
<whitequark>
I think it's been tried at least twice
<companion_cube>
problems being the GC or the calling convention
<ggole>
Does LLVM support precise GC now?
<whitequark>
ggole: there's been some promising work in that direction
<ggole>
(There was somebody beginning work on that.)
<ggole>
Right.
<whitequark>
companion_cube: well... calling convention is simple. GC is more problematic, yes
<whitequark>
how does ocaml handle roots in registers?
<ousado>
does it have any GC?
<mrvn>
whitequark: it doesn't.
<ggole>
Everything is done with frametables afaik.
<NoNNaN>
whitequark: do you think it is possible to create something like this for ocaml? https://scala-lms.github.io/
<whitequark>
ousado: LLVM requires you to spill all roots on stack. from that, you can generate stackmaps with custom C++ code
<whitequark>
mrvn: ggole: then it sounds like LLVM and OCaml are a perfect match. I'd need to think further about it. I think I read an elaborate description of the problems somewhere.
<whitequark>
NoNNaN: BER-MetaOCaml ?
<mrvn>
When ocaml does a function call does it even keep any values in registers?
<mrvn>
ocamlopt that is
<whitequark>
mrvn: I think it spills everything, but mostly to make setjmp/longjmp very fast
<whitequark>
well, setjmp to be specific. with cdecl, it has to spill stuff. with ocaml's calling convention, it's a mov.
<companion_cube>
hmm, isn't one supposed to put everything on the stack in llvm?
<companion_cube>
leaving llvm itself decide of what goes in registers?
<companion_cube>
(and handle GC roots, hopefully)
<NoNNaN>
whitequark: I have checked it; it does not seem to be possible without costly abstractions
<whitequark>
companion_cube: it's the other way around. LLVM is a register machine. but there's a catch.
<whitequark>
companion_cube: the LLVM gc intrinsics don't accept a *register*. they accept an *address*. so you have to explicitly alloca a stack slot.
<companion_cube>
oh
<NoNNaN>
whitequark: you can codegen your own gc from llvm, something like hlvm does
<whitequark>
NoNNaN: it's pointless. I mean, it's not an improvement over the scheme that LLVM already has.
<whitequark>
HLVM doesn't support roots in registers either, and frankly I don't quite see the point of the project as it exists
<whitequark>
I mean, it's not really high-level at this point. *shrug*
<mrvn>
I have to implement my own Thread module. So I'm wondering if I need to dump registers into memory and register them as root in some way. Is everything already spilled and registered when the GC calls one of the hooks?
<whitequark>
mrvn: since GC can only be called from an allocation routine, and with OCaml's calling convention ocamlopt would spill everything, yes
<ousado>
'since GC can only be called from an allocation routine' -why?
<mrvn>
ousado: noalloc functions aren't allowed to call the GC.
<ousado>
in llvm?
<whitequark>
ousado: um... I think that's just how it works? I mean, you can only call GC from safepoints
<whitequark>
how it works in OCaml
<ousado>
ok
<mrvn>
ousado: the noalloc keyword says that the function doesn't call the GC so registers don't need to be spilled. makes them faster.
<NoNNaN>
probably a dumb question, but is it possible to create a subset of ocaml that has linear typing (something like linearml)? then memory usage is known, gc is not required, and an llvm backend could target architectures where gc is not yet possible (ptx, r600, hsail, fpga, etc)
<mrvn>
ousado: best not to use it
<mrvn>
NoNNaN: how do you implement List.map then?
* whitequark
sighs
<Drup>
NoNNaN: afaik, mezzo is sort of going this way
<Drup>
except it's not really a subset of ocaml, more like slightly different :)
<whitequark>
what is up with this odd fetishization of FPGAs as compiler backends?!
<whitequark>
it's not a magical sekret sauce that makes your program fast. if it's written in von neumann style, and chances are that it is, then a von neumann CPU is the best thing for running it
<Drup>
mrvn: it's already there, it's called spoc
<whitequark>
really, if something compiles down to GPUs or FPGAs, it just means that you arbitrarily select a few language constructs and make them optimal, and everything else either doesn't compile, or is so horribly inefficient you wish it didn't
<whitequark>
I mean, even look at verilog (or vhdl), the native language for FPGAs. it's essentially an abstract logic description language from which the pattern matcher in the FPGA toolchain selects the parts it likes
<mrvn>
NoNNaN: How doesn't that require memory allocations?
<whitequark>
as a result, you spend your days twiddling your code until it has just the right syntactic form to generate just the right RTL for your target FPGA
<NoNNaN>
mrvn: in linear typing you use your variables exactly once, so the memory allocation is known at compile time
<whitequark>
there's a few great developments in that area (e.g. migen), but they step even further from traditional languages. </rant>
<mrvn>
NoNNaN: but the amount of memory depends on the length of the list.
<whitequark>
mrvn: I would think that the issue here is not allocation, but destruction
<whitequark>
since you use every cell exactly once, you allocate it at creation and deallocate it at usage.
<ggole>
There are region systems in which you can get rid of GC
<ggole>
But they can use huge amounts of memory
<whitequark>
ggole: rust's region system doesn't, but I believe it substantially differs from ml-with-regions (whatever it was called)
<mrvn>
whitequark: Ok. And when you need it twice you have to call some special function that gives you 2 copies and destroys the input?
<ggole>
MLKit tried to make it work
<whitequark>
mrvn: I guess so
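[The "special function" mrvn is describing is usually called dup, with a matching drop for discarding. OCaml cannot enforce the use-exactly-once rule, so this is only an illustration of the interface a linear system would expose:

```ocaml
(* In a linear discipline every value is consumed exactly once; using
   a value twice requires an explicit dup, and discarding one requires
   an explicit drop. Nothing here is checked by OCaml's type system. *)
let dup x = (x, x)   (* consumes x, yields two copies *)
let drop _ = ()      (* consumes its argument, yields nothing *)

let () =
  let a, b = dup 21 in   (* 21 consumed once, by dup *)
  drop b;                (* b consumed once, by drop *)
  assert (a = 21)        (* a consumed once, here *)
```
]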
<mrvn>
I think I was thinking of region systems
<whitequark>
ggole: I believe the problem with MLKit is the odd way they interpret regions--dynamic arenas associated with stack frames, where values are allocated. in Rust, a linear type is just a wrapper for malloc() and free()
<mrvn>
I've been trying to design a system that would not need dynamic allocation.
<ggole>
The benefit is that you can free the region in constant time
<whitequark>
well, or you can allocate it on stack, but you have to know statically how much stack you would need.
<whitequark>
ggole: the drawback is that you cannot transfer ownership.
<mrvn>
whitequark: and that's where I got stuck
<ggole>
If you have allocation patterns that are stack-shaped, or close, that can work
<ggole>
But when you have uncertain escapey lifetimes it seems problematic
<whitequark>
ggole: and if Rust shows something, it's that passing ownership is an extremely powerful mechanism that makes abstractions work in a region-based lang
<mrvn>
You end up with the halting problem. Deciding how much stack to reserve for arbitrary code is equivalent to the halting problem.
<ggole>
I dunno how that works in a functional lang that depends on persistent structures with sharing, though
<ggole>
You don't have to put regions on the stack
<whitequark>
ggole: Rust has strong/weak reference counting for that.
<mrvn>
whitequark: ref counts breaks with mutables
<ggole>
You can put each one in its own, potentially large, memory area
<whitequark>
mrvn: hm?
<whitequark>
ggole: I know, I know. but still, expandable regions which are the only place to allocate are just asking for bloat trouble
<mrvn>
whitequark: mutables allow cyclic structures and then the refcount never reaches 0
<whitequark>
mrvn: hence strong/weak. that scheme disallows cycles.
<ggole>
It might also be possible to back region storage with GC, so that once a fixed-size region is filled, the rest is allocated normally
<whitequark>
mrvn: essentially, you have a tree of strong pointers, and some backedges via weak ones.
<ggole>
Then you can free the stuff in the fixed-size part in constant time, and have to rely on a tracing GC for the rest as usual
<mrvn>
ggole: or allocate things whose size the compiler can prove in a region, and unknown stuff in the heap.
<Hannibal_Smith>
ggole, this is something not needed in a generational GC, or am I wrong?
<ggole>
Well, you can always fit everything in a region
<whitequark>
ggole: that's a really odd way to solve it. maybe the right thing is not to try to couple lifetimes and allocation arenas?
<ggole>
Since your program has entry and exit points
<ggole>
The problem is bounding the region size
<ggole>
whitequark: yeah, I don't think it works except in certain situations
<ggole>
And I'm not sure that the compiler can tell when those occur reliably
<ggole>
Hannibal_Smith: we're discussing alternatives to GC
<Hannibal_Smith>
NoNNaN, generally when some high level language is faster than C, it's because the GC didn't start compacting
<NoNNaN>
for some problems I would like to use a subset of the language (not solving every problem here) that can run as fast as possible, without gc, so it can run on a gpu or other architectures
<Hannibal_Smith>
ggole, ok sorry
<whitequark>
NoNNaN: the problem is that "it can run on gpu, or other architectures" is so poorly defined, it doesn't define any specific subset of a language at all.
<whitequark>
and often the requirements are so complex it is simply not viable to express them formally in a language specification.
<ggole>
I can imagine having an annotation for "make this thing not escape upwards"
<NoNNaN>
whitequark: I already mentioned, a custom dsl, where I can control the abstraction cost, eg.: https://scala-lms.github.io/
<whitequark>
e.g. take a look at the mess in C and fused multiply-add
<Drup>
NoNNaN: did you had a look at spoc ?
<whitequark>
NoNNaN: that looks really similar to metaocaml, if I understand it correctly
<NoNNaN>
Drup: yes, I did; unfortunately it's not the same
<whitequark>
why do you say it is not?
<whitequark>
oh, well, no heterogenous targets. I would think this is not a fundamental limitation of metaocaml, though.
<NoNNaN>
because I would like to control the cost of abstractions
<ggole>
It'd be nice to have nice packed representations for data types, too
<ggole>
Stuffing an 'a option into a word if the 'a fits, etc
<ggole>
And there are some interesting list compaction tricks
<mrvn>
ggole: that doesn't work with polymorphism
<ggole>
Indeed, you need to specialise for that.
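[The packing ggole describes is easy to observe from the other side: today Some x is always a heap block, even for an immediate int. Obj exposes the current representation (an implementation detail, not a stable API):

```ocaml
(* Obj.is_block tells immediates apart from heap-allocated blocks.
   An int is immediate; wrapping it in Some allocates a block, which
   is exactly the indirection a packed representation would remove. *)
let () =
  assert (not (Obj.is_block (Obj.repr 1)));    (* int: immediate *)
  assert (Obj.is_block (Obj.repr (Some 1)))    (* Some int: boxed *)
```
]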
<ggole>
It would require a very different implementation.
<mrvn>
or polymonomorphism
<NoNNaN>
there are some work in this area: "abstraction without regret"
<ggole>
MLton goes some distance down that road
<mrvn>
It would be nice to have a type foo = packed { ... }
<ggole>
They fully monomorphise and defunctionalise too
<ggole>
But, whole program.
<NoNNaN>
database systems also have extreme specializations to improve instructions per clock cycle, e.g. monetdb will generate specialized code for every primitive operation for every type
<whitequark>
iptables, too
<whitequark>
and I think tcpdump?
<Hannibal_Smith>
(this is very similar to what C++ do with templates?)
<ggole>
Templates are similar in that source is made available in headers
<mrvn>
Hannibal_Smith: which causes totally useless code explosion.
<mrvn>
You don't want to specialize every type.
<NoNNaN>
if you combine extreme specialization with small batching (where the data size is smaller than your cpu cache), your performance will skyrocket
<whitequark>
don't forget instruction cache
<whitequark>
it's a major problem with C++
<ggole>
And compilation time
<ggole>
It does seem as though a careful design could do better
<mrvn>
and if you specialize Pervasives.compare for 1000 types then it will be orders of magnitude slower than the polymorphic one.
<NoNNaN>
mrvn: this is why I would like to control the cost of the abstraction: when I perform operations on lots of data I want specialization; where I have symbolic code I don't
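[The trade-off can be seen in miniature with compare: the polymorphic version inspects runtime representations, while annotating the type lets the compiler emit a direct integer comparison. Same answer either way; the cost is code size once you multiply by every specialized type:

```ocaml
(* A type annotation is enough for the compiler to specialize the
   comparison primitive to a plain machine comparison on ints. *)
let compare_int (a : int) (b : int) = compare a b

let () =
  assert (compare_int 1 2 = compare 1 2);  (* same result as polymorphic *)
  assert (compare_int 1 2 < 0)
```
]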
<ggole>
You might be able to reduce code explosion by mapping equivalent types together
<Hannibal_Smith>
ggole, a sort of "type fusion"?
<mrvn>
NoNNaN: I want to specify the set of types the compiler specializes for. Both in the module implementing and the module using a function.
<whitequark>
aka what LLVM attempts with mergefunc and its structural typing
<ggole>
eg, all types that have two words and two pointers can be fused for the purposes of specialisation
<mrvn>
ggole: or when the code does the same thing for any 'a
<whitequark>
(but mergefunc has a bit of flawed implementation right now. well, it tends to assert, to be specific.)
<mrvn>
ggole: E.g. List.length does not have to be specialized for every type.
<ggole>
You could reorder fields, too
<ggole>
mrvn: you'd have to specialise it for each size of entry
<ggole>
Which would probably be quite affordable
<NoNNaN>
mrvn: it is possible to extend it, so I can control the "abstraction", e.g. how the types will be mapped to primitive types that the cpu understands
<Hannibal_Smith>
ggole, is this what MLTon can do?
<mrvn>
ggole: no. it only cares about the next pointer
<ggole>
mrvn: the offset of the next pointer depends on the size of the entry
<ggole>
...unless the pointer is first, I suppose
<mrvn>
ggole: put it first. cons of 'a list * 'a
<ggole>
Hannibal_Smith: MLton can do something like this, yeah
<mrvn>
ggole: a cons is a pair in ocaml anyway. so the offset is always the same.
<ggole>
If you have a (int * float) list, the MLton cons will look like [header] [int] [float] [pointer to next]
<ggole>
Instead of having pointers everywhere
<ggole>
mrvn: that's because ocaml doesn't specialise
<mrvn>
ggole: you sure the next isn't first?
<ggole>
It could be
<whitequark>
there's also invasive containers. they have their place as well
<mrvn>
ggole: if next isn't first then you can't have any polymorphism at all.
<ggole>
Yes you can?
<ggole>
You just need to specialise
<mrvn>
ggole: no. then every call must be specialized, at least by the offset for next
<mrvn>
ggole: That's what I said
<NoNNaN>
whitequark: I would like to have (extreme) specialization when I do operations on lots of data (gigabytes of it), so instruction cache is not a problem
<ggole>
And you only need to specialise by the size of the list entry, like I said before.
<mrvn>
But keeping a polymorphic flavour is critical.
<Hannibal_Smith>
One moment, even polymorphic is bad for icache no?
<ggole>
No, polymorphic code is considerably smaller
<ggole>
There's one definition for everything.
<mrvn>
Hannibal_Smith: with polymorphic code you have one function that works for any type. Not millions of duplicates.
<ggole>
And there's no explosion of ancillary information (object maps), because everything looks the same
<whitequark>
NoNNaN: have you read ulrich drepper's manuscript on memory hierarchy?
<ggole>
That's the advantage: the disadvantage is that making everything look the same involves introducing lots of pointers
<mrvn>
I like that ocaml doesn't specialize the memory representation (except for a few special cases).
<ggole>
Which on modern hardware is at least half crazy
<mrvn>
ggole: not that much. The amount of sharing of data it allows balances it.
<NoNNaN>
whitequark: yes, I have, and I have also read tons of other publications on in-memory databases too
<ggole>
Sharing a float by pointing at it is not an advantage.
<ggole>
And the same is probably true for most pairs, maybe triples
<whitequark>
NoNNaN: ok, you know more about this topic than me, then :)
<mrvn>
ggole: if it is shared once you break even on 32bit.
<mrvn>
ggole: if it shared a million times ...
<ggole>
No, you lose by having to follow the pointer.
<mrvn>
ggole: and you win when the float is then cached instead of having to find it in memory.
<ggole>
You would have to avoid copying recursive types though
<ggole>
The float would be right there, where the pointer would otherwise be
<mrvn>
I bet there are as many cases where sharing is faster than there are where sharing is slower.
<ggole>
If that storage is not cached then you die just as badly because the pointer will have to be fetched
<mrvn>
ggole: in 32bit the pointer is smaller than a float.
<mrvn>
ggole: and a single float is an extreme case anyway.
<Hannibal_Smith>
Uhm... specialization is only one part of the problem; for example SIMD requires specific alignment too
<ggole>
Even two or three, maybe more elements would be beneficial
<ggole>
Cache lines are fairly large.
<mrvn>
ggole: don't forget the overhead for the GC and the code duplications to deal with the different specializations.
<ggole>
The overhead is *less*, since you don't have to inspect as many pointers
<mrvn>
You are buying the benefit in the memory representation at the cost of the instruction cache.
<ggole>
And you don't have to trace arrays of pointerless elements.
<NoNNaN>
you could do operations on unboxed values a lot faster than boxed
<flux>
mrvn, how many times do you really have such a big function that paying some for the cache costs you..
<ousado>
mrvn: finding the right balance there is what this discussion is about, no?
<flux>
mrvn, given you can now fit more data into the same cache
<Hannibal_Smith>
(even Haskell let you says that embed in a type with packed)
<NoNNaN>
your operations could be directly mapped to cpu operations, no type verification, no pointer magic, just direct operation on data
<mrvn>
flux: It's not the size of the function. It's the number of duplications.
<ggole>
It would also be nice to get rid of int tags, yeah
<mrvn>
NoNNaN: you want to unbox at the start of a function and rebox at the end and keep everything in registers inbetween. Then the boxing hardly matters.
<NoNNaN>
mrvn: if you want an extreme example of polymorphic code, take a look at the K language (www.kx.com); the whole binary is about ~50k (the whole code is ~200 lines), so the whole binary could fit in the cpu instruction cache
<mrvn>
NoNNaN: isn't that an argument for my case?
<Hannibal_Smith>
mrvn, registers are few...no?
<ggole>
Of course you have to rebox on every function call or allocation point.
<mrvn>
ggole: not necessarily.
<ggole>
(Except for floats? I guess those can sit in xmm regs just fine.)
<NoNNaN>
mrvn: and take a look at ocaml current binary sizes
<ggole>
Or non-integral regs on whatever arch
<ggole>
mrvn: hmm... I guess an unboxed int could be marked as "don't touch" in the frame table
<mrvn>
ggole: elf supports annotating which registers hold pointers for every instruction in a binary. the GC could use that so you wouldn't have to box or tag registers for function calls.
<NoNNaN>
mrvn: if you make it fit in the cpu instruction cache, then fine, but currently it's far from that; however, the gcless linearml-generated code is small
<mrvn>
NoNNaN: because it is polymorphic.
<mrvn>
ggole: a while back there was a discussion of changing the GC header format to include info on which fields contain pointers and such. That would be useful for the stack frame but works for any record.
<NoNNaN>
mrvn: no, it's because of the controlled abstractions: the operations can be directly mapped to cpu operations
<ousado>
mrvn: that does only work up to certain sizes, though, right?
<mrvn>
NoNNaN: if you specialize every type you end up with 1000 copies of the function. Even if you get the function to half the size that is still 500 times more than one polymorphic flavour.
<ousado>
*works
<ggole>
mrvn: that could be handy, yeah
<ousado>
one doesn't have to specialize every type
<ggole>
You can make it work with arbitrary sizes by being clever
<mrvn>
ousado: depends. you could make the header arbitrary large.
<ousado>
well, ok
<ggole>
Ie, have a bit pattern that means "next word includes more info"
<ousado>
I'm thinking about that for haxe
<mrvn>
ousado: like the highest bit says there is another header word before this.
<ggole>
And you can also compress the header by bringing pointer fields together
<mrvn>
ousado: How many structures do you have that have more than 8, 16, 32, 64 items?
<ousado>
not many, probably, but it's possible
<ggole>
So instead of a mask, you just need "there are this many pointers/non-pointers at the beginning of this record"
<mrvn>
ggole: reordering would make row types impossible or costly.
<ggole>
You could use another word for those
<ggole>
But yes, there are tradeoffs everywhere in runtime system design
<ousado>
I also thought about that
<ousado>
yes
<ousado>
reordering makes lots of things simpler
<ousado>
also structural subtyping
<mrvn>
I don't think I ever had a record with more than 16 fields that wasn't an array.
<ousado>
but if the language allows it, what can I do?
<ggole>
Sure, but the compiler still has to compile such code
<ggole>
And sometimes people make huge stupid records with hundreds of fields in other langs, they might do the same in OCaml
<mrvn>
allow 16 fields in the first header word with one bit saying there are more header words. Most records won't need more.
<ggole>
There are also techniques for entirely tagless GC
<ggole>
With no header at all
<mrvn>
ggole: impossible
<mrvn>
you have to have the info somewhere
<ggole>
Not at all, go read Appel's paper
<ousado>
is there a copy of that freely available?
<mrvn>
A lot of the time you can use static infos. E.g. each function has the description of its stackframe statically.
<mrvn>
But you need the info somewhere.
<ggole>
In a strongly typed language, the shape is implied by the path taken through the heap to reach it
<ggole>
You need to have type info for the root set.
<ggole>
Ie, stack maps and a map for global values.
<mrvn>
ggole: that's static tags.
<ggole>
So? It isn't a header word.
<mrvn>
ggole: it isn't tagless
<ggole>
There are no tags on values in the heap. That's what it means.
<ggole>
I suspect that it makes GC more expensive, which is why it hasn't been adopted
<mrvn>
I think you can't always statically predict what is going to be a pointer and what not.
<ggole>
It might also make the write barrier more expensive
<ggole>
(Since you need to know the type of the edges recorded in the remembered set, and you won't be able to recover them by walking the entire heap during a minor gc.)
<mrvn>
e.g. type 'a foo = Foo of int | Bar of 'a. The GC would have to know what is a foo and what the Foo and Bar tags mean for the type and what type
<mrvn>
'a is there.
<ggole>
Read the paper. Appel covers all of that.
<mrvn>
The beauty of OCaml is that the memory representation is very simple.
<ggole>
Search for "runtime tags aren't necessary"
<ousado>
thanks
<ggole>
There's also a few followup papers by somebody else if you find that interesting
<ousado>
I don't think we'll try to go for a tagless GC, but it's always good to look at things from different perspectives
<ggole>
Also Tag-Free Garbage Collection for Strongly Typed Programming Languages - Goldberg
<ggole>
Yeah, I'm not sure that it is a good approach
<ggole>
Breaks Obj.magic
<flux>
sounds like a good approach then ;)
<ggole>
It's not clear to me how to type the queue elements in a copying collector
<ggole>
Or how to deal with polymorphic functions which have been tail-called (ie, their caller is no longer available for inspection)
<ggole>
You could have a parallel queue of typeinfo and then discard it at the end of a minor GC, I guess
<ggole>
But the papers don't go into it.
<mrvn>
ggole: ouch. that Appel paper requires the GC to (worst case) do a full backtrace through all stack frames to figure out the type.
<ousado>
ggole: did they implement it?
<mrvn>
ggole: I imagine you can only tail call when the return type allows it.
<mrvn>
ggole: which I think would be always.
<ggole>
ousado: not sure
<ggole>
mrvn: right
<ggole>
mrvn: note that specialisation would solve that problem ;)
<mrvn>
ggole: I don't think there is a problem there in the first place.
<ggole>
I think headers are a much simpler and probably superior approach.
<mrvn>
ggole: or did you mean the backtracing?
<ggole>
But it is a seductive and interesting idea.
<ggole>
mrvn: the backtracing
<mrvn>
ggole: OCaml's simple memory representation is certainly much simpler.
<mrvn>
ggole: The idea is indeed nice. Would be great for debuggers too. The pretty printers could print every value perfectly.
<ggole>
Yeah. The toplevel could certainly use such a feature.
<mrvn>
ggole: In chapter 7 (generational GC) they say to store the type when you copy a record. So only the minor heap is tagless.
<ggole>
In fact if you created a lookaside type graph that echoes the heap structure, you might be able to add such a thing without changing representation.
<ggole>
mrvn: my understanding is that you need the type if you are going to mark+sweep that region
nikki93 has joined #ocaml
<ggole>
But you don't if you are going to trace or copy.
<ggole>
(It's been a little while since I looked at the paper though.)
<ggole>
mrvn: so in a generational gc with a nursery + from and to space, like some JVMs, both spaces could be tagless
lordkryss has quit [Disconnected by services]
<mrvn>
ggole: you construct the type as you mark&sweep. But if you have a ref/mutable then modifying a value outside the minor heap makes that a root for the minor heap and you need the type for every root. The paper says to store the type on copying so it is cached when modification happens.
<mrvn>
ggole: you could limit that to ref/mutable. They are quite rare.
<ggole>
Hmm.
<ggole>
I should read it again.
<ousado>
the new lua GC has an approach to a generational GC without copying
<mrvn>
ggole: I don't see a way around that. Given a record you can't work back to the root to find its type and you don't want to scan the major heap every time to find the type.
<ggole>
mrvn: usually no, but there are some restrictions if you want generational gc
rgrinberg has joined #ocaml
<ggole>
mrvn: you do need to be able to relocate the object, so there needs to be some way to indicate whether something has been forwarded
<mrvn>
ggole: and you do.
<ggole>
You can just use the first pointer in the object, if it has one
<mrvn>
ggole: temporary memory during compaction.
<ggole>
But if it doesn't, then I think you need a header
<ggole>
Or a table, dunno
<ggole>
Right
<mrvn>
ggole: with stop-the-world GC you simply move them all at once.
<ggole>
That might suck for locality though - not sure
tobiasBora has joined #ocaml
maattdd has quit [Ping timeout: 245 seconds]
<ggole>
How do you do that without forwarding?
_andre has joined #ocaml
<mrvn>
ggole: you need a bitmap, pointer array or hashtable to record what has been copied already.
<ggole>
Don't you also need to know *where* they have been copied?
<ggole>
(A hashtable does allow that.)
maattdd has joined #ocaml
<mrvn>
ggole: with a bitmap you store the address as first word of the old record.
<mrvn>
ggole: with array or hashtable you store the new address in the table.
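mrvn's bitmap variant can be sketched in C: a side bitmap records which objects have moved, and the first word of an already-copied object is reused to hold its new location. All names here are illustrative; this is a stop-the-world toy, not OCaml's actual collector.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HEAP_WORDS 64

static uint64_t from_space[HEAP_WORDS], to_space[HEAP_WORDS];
static uint8_t  copied[HEAP_WORDS];   /* side bitmap: 1 = already moved */
static size_t   to_top = 0;

/* Copy the object of `len` words at from-space index `idx` to to-space,
   returning its new index.  If the bitmap says it already moved, its old
   first word holds the forwarding address instead of real data. */
static size_t forward(size_t idx, size_t len) {
    if (copied[idx])
        return (size_t)from_space[idx];      /* first word = new location */
    size_t dst = to_top;
    memcpy(&to_space[dst], &from_space[idx], len * sizeof(uint64_t));
    to_top += len;
    copied[idx] = 1;
    from_space[idx] = dst;                   /* install forwarding address */
    return dst;
}
```

Calling `forward` twice on the same object returns the same destination, which is exactly the property the bitmap buys without a per-object header bit.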
<ggole>
Ah, I guess that would work
<ggole>
I'm still thinking in terms of the classic in-place algo
<mrvn>
ggole: yeah. you would lose that.
<mrvn>
and you need memory to build the type infos.
<mrvn>
Which is kind of a bad thing. You are out of memory or the GC wouldn't be running. Not a good thing to allocate more.
rgrinberg has quit [Ping timeout: 276 seconds]
<mrvn>
Does OCaml's GC allocate memory or only use the stack?
<ggole>
Mmm... and if you reserve space, you are potentially causing your application to OOM
<ggole>
OCaml uses the classic in-place Baker algo afaik
<mrvn>
ggole: if you reserve the space when allocating the heap then you didn't save memory from being tagless.
<ggole>
With a few bits reserved in each header word
<ggole>
Relocation markers aren't really tags, though
<ggole>
But point taken
avsm has joined #ocaml
<mrvn>
ggole: You know how ocaml uses a 0 tag for pointers and 1 for integers? I always wonder if anyone had tried it the other way around and compared what's faster.
thomasga has joined #ocaml
<ggole>
Some lisp impls have zero tags for fixnums
<ggole>
Although they tend to use more tag bits.
<mrvn>
A 0 tag on ints makes arithmetic simpler. e.g. a+b just works without touching the tags. And pointer access can be done with an offset on most cpus.
<mrvn>
should give less extra instructions.
thomasga has quit [Client Quit]
<ggole>
I think you have to be careful about overflow, but yeah
<mrvn>
The drawback being that you can't use scaled pointer access, e.g. R0[R1*8].
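The two tagging schemes being compared can be written down concretely. In OCaml's scheme an integer n is represented as 2n+1, so addition needs a correcting -1; in the flipped scheme mrvn wonders about, n is represented as 2n and addition needs no correction at all. A small C sketch (illustrative, using `intptr_t` as the machine word):

```c
#include <assert.h>
#include <stdint.h>

/* OCaml-style: integers carry a low tag bit of 1 (n is stored as 2n+1).
   (2a+1) + (2b+1) = 2(a+b)+2, so one subtraction restores the tag. */
static intptr_t ocaml_tag(intptr_t n)             { return (n << 1) | 1; }
static intptr_t ocaml_add(intptr_t a, intptr_t b) { return a + b - 1; }

/* Flipped scheme: integers tagged 0 (n stored as 2n), pointers carry
   the 1 bit.  Tagged addition is just machine addition. */
static intptr_t flip_tag(intptr_t n)              { return n << 1; }
static intptr_t flip_add(intptr_t a, intptr_t b)  { return a + b; }
```

The flip side, as noted just above, is that pointer access then needs a -1 offset (cheap on most CPUs) and you give up scaled addressing like `R0[R1*8]`.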
<ggole>
Lisp impls also have clever ways to avoid tagging conses
<ggole>
eg they are placed in a special place in the heap so that pointers to them are recognisable
<ggole>
...which is still tagging in some sense
<ggole>
But they are only two words
<mrvn>
ggole: apropos of special places in the heap: I've been playing with the idea of having lots and lots of heaps. One per type. So all float arrays go to the float array heap, all int*int tuples to the int*int tuple heap and so on.
<ggole>
Yeah, this is known as BIBOP
<ggole>
BIg Bag Of Pages
<mrvn>
For polymorphic functions I would pass an allocator that would point to the right heap for allocations. But that's where it gets tricky.
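The BIBOP idea ggole names can be sketched in a few lines of C: carve the heap into fixed-size pages, keep each page homogeneous, and recover a value's type from its address via a side table, with no per-value header. Page size, table layout, and type ids here are all made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* BIBOP sketch: page index -> type id.  A real allocator would also
   track a free list per page class; omitted here. */
#define PAGE_SHIFT 12            /* 4 KiB pages */
#define NPAGES     16

enum type_id { T_FLOAT_ARRAY = 1, T_INT_PAIR = 2 };

static uint8_t page_type[NPAGES];

/* Recover the type of a heap value purely from its address. */
static unsigned type_of(uintptr_t addr, uintptr_t heap_base) {
    return page_type[(addr - heap_base) >> PAGE_SHIFT];
}
```

The tricky part mrvn mentions — threading the right per-type allocator through polymorphic functions — is exactly what this sketch leaves out.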
<ggole>
I think it's been tried a few times, although not for an ML family language
<ggole>
mrvn: nice chat, I have to go and get some dinner
<mrvn>
lunchtime.
avsm has quit [Quit: Leaving.]
<orbitz>
I wonder how bad it would be to add CSP to Ocaml, something like Goroutines. Message sends are implicit context switches. Could do multicore
<companion_cube>
you need some stack-capturing operator to do the blocking stuff, I think
<companion_cube>
if you don't use the preemptive threads
<companion_cube>
maybe that's doable with delimcc ? :)
<companion_cube>
so, maybe delimcc would actually work (although it would probably be slow)
<orbitz>
it would be nice if lwt or async offered a reasonable message passing framework; then the runtime could add across-core message passing, at which point you could go multi-core just by choosing where your Deferred runs
<companion_cube>
orbitz: the alternative is to use a monadic-ish approach, where the user writes continuations with >>=
<orbitz>
yeah that's a second reasonable option I think
<orbitz>
the problem is a lot of monad code, i think, depends on running in the same memory space
<orbitz>
Maybe you could add a new concept to deferreds for more heavyweight long running things
<companion_cube>
well you would have several processes, anyway, wouldn't you?
<orbitz>
What do you mean?
<Drup>
gasche: I have a gadt-variance question that needs your expertise
<orbitz>
I'm looking for something like Erlang or Go where you are agnostic to whether you have 1 process or multiple, since the message passing takes care of it for you
Hannibal_Smith has quit [Quit: Sto andando via]
<gasche>
(ocamlopt does pass roots in registers)
<gasche>
Drup: ?
<Drup>
gasche: I'm building an AST for Z3 expressions
<Drup>
beh, I will just paste you the code, and you will tell me how terrible it is to use GADTs for this :D
<companion_cube>
orbitz: in a static language like OCaml, you need good serialization support for multi-process things
<companion_cube>
does go really handle multiple processes transparently??
<companion_cube>
I know erlang does, but it's designed specifically to this end
<Drup>
an unsafe and terrible one, yes
<orbitz>
companion_cube: I'm talking about multiplecores, whatever the method of getting there is i'm agnostic
<Drup>
so I'm writing an AST for the formulas in order to be able to manipulate it
<orbitz>
companion_cube: running multiple interpreters inside the same process would be acceptable
ocp has joined #ocaml
<orbitz>
companion_cube: go handles utilizing multiple cores, a does erlang
<gasche>
Drup: you would make your life *much* simpler by separating zint and zreal and adding an explicit cast from int to real
<companion_cube>
ah, yes, but in a single process
<companion_cube>
otoh go has a quite bad GC so far
<orbitz>
yes
<Drup>
gasche: that's what I was thinking
<orbitz>
companion_cube: I just want to seamlessly utilize multiple cores
<Drup>
gasche: but It means I have to duplicate all operators
<Drup>
or add casts everywhere
<companion_cube>
orbitz: heh.
<orbitz>
there is already plenty of work going on to add multi core support to the runtime but it sounds like you're going to have to be aware of the fact that you're doing it
<companion_cube>
I'm pessimistic about this :/
<gasche>
I'd add casts instead of duplication, yeah
<Drup>
huum
<orbitz>
companion_cube: How so? Running multiple interpreters (1 per thread) with message passing between them sounds rather reasonable
<nicoo>
gasche: Also, even when only keeping the S constructor (in +_ t), it failed to typecheck :(
<nicoo>
Drup: ^
<gasche>
but maybe the polymorphic variant thing can work
<companion_cube>
orbitz: it does if you have good serialization
<companion_cube>
why not
<gasche>
it is not covariant, though
iorivur has joined #ocaml
<Drup>
gasche: yes, that's precisely the problem now
<Drup>
the function to_expr works nicely
<Drup>
but of_*_expr doesn't, because of variance issues
<orbitz>
companion_cube: why do you need good serialization?
<mrvn>
orbitz: multiple interpreter requires changing the code to message passing.
<orbitz>
mrvn: I know
<mrvn>
people are working on a multi-core GC so you don't have to.
<gasche>
Drup: I can't try the code myself
<Drup>
gasche: I know, you need the z3 binding :/
<gasche>
if you made it a functor against Z3's signature I could give it a look
<Drup>
urk
dapz has joined #ocaml
<gasche>
only the parts you use
<nicoo>
mrvn: Wasn't the proposal about having multiple independent threads, each with its own (stack, heap, GC)?
<orbitz>
mrvn: my understanding was the semantics of mutation were one of the bigger roadblocks to wanting threads with a shared heap
<mrvn>
nicoo: not sure. didn't read the proposal, was in the kitchen.
<mrvn>
orbitz: one core changing a value while another does a collection is a problem.
<companion_cube>
orbitz: sorry, I was still thinking separate processes
<companion_cube>
the simplest solution imho
sgnb has joined #ocaml
<mrvn>
orbitz: in a trivial implementation every modify would have to tell all cores about it.
<ousado>
or just the original thread
tnguyen_ has quit [Ping timeout: 245 seconds]
<mrvn>
Which brings us back to specialization. The compiler could generate a local-core and a multi-core flavour and call the local-core one for input that is not shared between cores.
<flux>
companion_cube, so I have this program that 1) retrieves jpeg 30 fps from a camera and 2) decodes them and 3) draws them to a bitmap with Cairo 4) shows them with lablgtk. suggestions how I would easily split this into multiple processes? decoding jpeg is the costly process.
<mrvn>
flux: decoding is done in C, right?
<flux>
mrvn, yes. so yes, that can be threadized. but I wasn't asking that ;)
<mrvn>
flux: well, easy would be to just have one thread per core doing decoding without the ocaml runtime lock.
<companion_cube>
flux: you'd need shared memory, I guess
<flux>
also I overlay a set of vector data over the bitmap I've drawn
zpe has joined #ocaml
<companion_cube>
netmulticore does something like this, in ocamlnet, I believe
<flux>
companion_cube, how do I put a Cairo canvas in shared memory? it really works?
<mrvn>
flux: use Bigarray to mmap shared memory between processes and use message passing to pass offsets into that array for the image data.
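The mechanism behind mrvn's Bigarray suggestion is a shared anonymous mapping; a minimal C sketch of the idea (shared pixel buffer, processes exchanging only an offset — here the "message channel" is just fork/waitpid ordering, which a real program would replace with a pipe or socket):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Parent and child share one mapping; the child "decodes" a frame into
   it and the parent reads the result back.  Returns the byte the parent
   observed at the agreed offset. */
static int share_frame(void) {
    size_t len = 1 << 16;
    uint8_t *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    assert(buf != MAP_FAILED);
    size_t frame_off = 4096;          /* the "message": offset of the frame */
    pid_t pid = fork();
    if (pid == 0) {                   /* decoder process */
        buf[frame_off] = 0xAB;        /* pretend-decoded pixel */
        _exit(0);
    }
    waitpid(pid, NULL, 0);            /* stand-in for a real message channel */
    int seen = buf[frame_off];
    munmap(buf, len);
    return seen;
}
```

In OCaml the same mapping would be exposed as a `Bigarray`, which is exactly why mrvn says bindings should move from strings/arrays to Bigarray (or other unmovable types) first.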
<hcarty>
flux: One somewhat simple approach - Use zeromq to send bits back and forth. Have the decoding run in a separate process. It receives the filename/encoded bytes and sends back the decoded bytes as the decoding completes.
<flux>
mrvn, yeah, I should use threads some day, and fix the libjpeg bindings to release the GC lock
<companion_cube>
flux: I don't know!
<flux>
the point I was making, though, was that splitting into processes in the presence of existing code isn't quite as easy as it is with real threads
<mrvn>
hcarty: sending the full image might be costly.
<flux>
and look at Apple's Grand Central Dispatch
<flux>
how great would that be in OCaml?
<hcarty>
mrvn: It could be. Not terribly expensive if both processes are running on the same machine.
<Drup>
a bit quick and dirty, but you should be able to work with it
<flux>
LWT could be churning with the power of 8 threads, even computational tasks
<mrvn>
flux: splitting is easy. splitting efficiently is harder.
zpe has quit [Read error: Connection reset by peer]
<nicoo>
flux: What is Apple's Grand Central Dispatch?
zpe has joined #ocaml
<hcarty>
mrvn: For some definition of "not terribly expensive"
<flux>
nicoo, basically it's a queue where you send lambda functions to be evaluated
<flux>
nicoo, the jobs have dependencies
eizo_ has joined #ocaml
<flux>
so they can be dispatched to run on many cores
<flux>
I haven't used it, only read of it
<mrvn>
flux: do you know if LWT uses threads internally?
<hcarty>
mrvn: It can.
<flux>
mrvn, I don't know, but I would think it would not use threads except for some curious case of working with existing functionality
<hcarty>
mrvn: Lwt_preemptive is the module IIRC
maattdd has quit [Ping timeout: 252 seconds]
tnguyen_ has joined #ocaml
<mrvn>
hcarty: but that then uses the ocamls Thread module, right?
<hcarty>
mrvn: Yes
<flux>
Lwt was just an example, it is certainly built with single-thread processing in mind
<mrvn>
hcarty: Ok. No wonder I couldn't find any Thread implementation in LWT.
contempt has joined #ocaml
<flux>
but a similar 'm:n' thread mapping system could be built with 'real' threading
<flux>
I suppose a library facilitating just as easy use of multiple processes could be built, but for example on my case I would wonder if a Cairo canvas is marshalable or not; I would gess it's not
<adrien_oww>
isn't m:n thought to be way too complex in practice?
<adrien_oww>
at the kernel level that is
<mrvn>
flux: The first thing one has to do is switch the bindings over from using strings / arrays to Bigarray.
<adrien_oww>
well, I'm going back to work instead of commenting on IRC without reading the backlog and without thinking o/
<mrvn>
flux: or other unmovable types.
<flux>
I would guess many folks who write C bindings for stuff don't write - or test - the marshalling functions
<companion_cube>
m:n is hard because of the GC
<hcarty>
flux: If you do a memory-backed canvas then you can marshal the underlying storage
<flux>
adrien_oww, well, GCD is sort of like m:n, where the number of 'user threads' is infinite :)
<flux>
hcarty, I think I can almost guarantee it's going to be more complicated than it is now..
<hcarty>
flux: s/canvas/surface/
<mrvn>
companion_cube: you could run m:n and every now and then stop-the-world and run the GC.
tane has joined #ocaml
<hcarty>
flux: I expect so. It's possible to do. Certainly more complex than a naive sequential approach.
<companion_cube>
mrvn: that's fine the major heap, I guess, but the minor heap has to be extremely fast
thomasga has joined #ocaml
<mrvn>
companion_cube: what I meant was to coordinate the GC across all cores. Do a minor collection on all cores at the same time.
marcux has joined #ocaml
<mrvn>
companion_cube: that avoids the problem of one core modifying data while another runs the GC.
<ousado>
but that synchronization is expensive itself
maattdd has joined #ocaml
<mrvn>
ousado: extremely, if one thread doesn't do allocations.
<ousado>
for certain workloads it might be no problem, though
<mrvn>
You would have to wait for all threads to reach a safe point before the GC can start.
<mrvn>
note: ocaml only does cooperative multitasking.
<adrien_oww>
I wouldn't call it that way
<mrvn>
you can't preempt an ocaml task. It has to reach a safe point first.
<ousado>
in the paper NoNNaN linked they do concurrent mark and sweep without any synchronization
<mrvn>
ousado: with one bit per core?
<ousado>
.. but the execution times of the tests are not competitive
<ousado>
they use color transitions that are specific to dedicated mark and sweep threads
<mrvn>
To me it feels like the compiler has to do some analysing and figure out what data is shared and what not and generate different code then. Keep temporary allocs in thread private heaps and side step the whole problem.
<companion_cube>
minor heap must be very very very fast
<companion_cube>
and different threads might allocate at different rates
<mrvn>
companion_cube: as long as they allocate fast the loss is negligible.
<companion_cube>
well, at each allocation a thread need check whether the minor heap is full
<companion_cube>
if this requires a synchronisation it's a deal breaker
<mrvn>
companion_cube: one minor heap per core.
<companion_cube>
indeed.
<companion_cube>
but then, what about sharing somethihg that's still in the minor heap?
<companion_cube>
you'd get a reference from a foreign core, to the local minor heap
<mrvn>
companion_cube: then when it runs full you stop all threads and do a multi-core GC run across all minor heaps.
<mrvn>
companion_cube: each core has a root set too.
zpe has joined #ocaml
<mrvn>
companion_cube: It's just a simple idea and the obvious problem is having to stop all threads. Unless they do a lot of allocations (which they usually do) stopping can take forever.
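The per-core minor heap idea above can be sketched in C: each core bump-allocates from its own nursery with no synchronisation on the hot path, and a NULL return is the signal to stop the world for a joint minor collection (the collection itself is not shown). Sizes and names are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NCORES        4
#define NURSERY_WORDS 1024

struct nursery { uint64_t words[NURSERY_WORDS]; size_t top; };
static struct nursery nurseries[NCORES];   /* one minor heap per core */

/* Bump-allocate `n` words from this core's nursery.  Returns NULL when
   the nursery is full, i.e. "all cores must rendezvous for a minor GC". */
static uint64_t *nursery_alloc(int core, size_t n) {
    struct nursery *h = &nurseries[core];
    if (h->top + n > NURSERY_WORDS)
        return NULL;
    uint64_t *p = &h->words[h->top];
    h->top += n;
    return p;
}
```

The synchronisation cost the channel worries about lives entirely in the NULL branch: the allocation fast path touches only core-local state.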
maufred has joined #ocaml
zpe_ has joined #ocaml
<NoNNaN>
mrvn: take a look at the openjdk new Shenandoah GC, it's a regional collector
<mrvn>
NoNNaN: meaning the compiler has to analyze the code to statically define regions?
zpe has quit [Ping timeout: 240 seconds]
lostcuaz has joined #ocaml
lostcuaz has quit [Client Quit]
lostcuaz has joined #ocaml
lostcuaz has quit [Read error: Connection reset by peer]
maattdd has joined #ocaml
lostcuaz has joined #ocaml
<mrvn>
Another idea for better multi-core support would be to have 2 kinds of values: private and sharable. Let the user decide whether to use private/fast or shared/slow.
<orbitz>
mrvn: IMO, multiple interpreters with some message passing between them is probably easiest, and then people can build libraries on top of things like async to wrap the built-in message passing operators
Hannibal_Smith has joined #ocaml
<ousado>
NoNNaN: that also looks interesting, but involves copying again, which is an issue for us
<NoNNaN>
ousado: well, you can compact in place, so yes, possible without copy, could you give some pointers to noncopy collectors?
<NoNNaN>
ousado: i mean, noncopy compacting
<ousado>
the luajit one I linked above
<ousado>
I'm not an expert in the field, but the issues of avoiding fragmentation and scanning free lists are orthogonal to compaction/copying
<ousado>
so I don't buy either argument given for compaction there
<mrvn>
NoNNaN: With that every read needs an indirection, all the time. And modify becomes complex. It can fail and then has to roll back and try again.
avsm has joined #ocaml
<mrvn>
NoNNaN: The read bothers me. The write is probably irrelevant.
<ggole>
Read barriers seem like a pretty heavyweight thing
<mrvn>
and you need to annotate values as volatile or you can't do any unboxing/untagging.
pyon is now known as pyon-away
<mrvn>
It all comes back to: ref/mutable is bad
<ousado>
indeed
tnguyen_ has quit [Ping timeout: 265 seconds]
divyanshu has quit [Quit: Computer has gone to sleep.]
<Drup>
gasche, nicoo : removing the subtyping and using explicit cast is a bit of an issue, because I can't do "(3 + x) mod 5" anymore :/
<Drup>
I can't do Q to Int casts, obviously
tnguyen_ has joined #ocaml
<ggole>
Azul seems to have hardware support for their read barrier
<adrien_oww>
yup
<ggole>
Shades of Lisp Machines there
* ggole
wonders what an ML machine would look like
* adrien_oww
throws a lisp machines at ggole
<orbitz>
tall
<NoNNaN>
ggole: take a look at reduceron
<ggole>
There are some cool tricks with list compaction that you could make very cheap with hardware support
<NoNNaN>
ggole: azul rewrote the linux mm, it can allocate/remap memory at TB/sec rate
<ggole>
Remapping = TLB nuke though
<mrvn>
ggole: invlpg
<ggole>
I think that was one of the things they tried to make cheap on their custom hardware
<NoNNaN>
ggole: you could combine it with user level memory manager eg.: by using libdune: http://dune.scs.stanford.edu/
<ggole>
Pretty cool.
<mrvn>
ggole: building custom hardware to make your language fast is cheating.
<orbitz>
Azul does some sweet things
<ggole>
mrvn: I'm not above cheating :)
<ggole>
mrvn: I do think it is risky though
rand000 has quit [Ping timeout: 240 seconds]
<ggole>
There were "Java chips" and "Lisp Machines" and some other custom stuff
<NoNNaN>
there is no (big) reward without risks
<ggole>
All dead
<adrien_oww>
and now we're all going to run with the mill
<ggole>
Actually I think list compaction can be done in software affordably
<NoNNaN>
ggole: well, not yet, http://www.jopdesign.com -> you could do hard real-time stuff, it has a real-time gc too
<ggole>
It just complicates the runtime and GC
<ggole>
NoNNaN: well, real-time and fast aren't the same thing
<mrvn>
my system is real-time. It will at most delay a job for 10 years.
<ggole>
I'm not sure that real-time techniques are a good idea for general purpose systems
<Hannibal_Smith>
<NoNNaN> ggole: well, not yet, http://www.jopdesign.com -> you could do hard real-time stuff, it has a real-time gc too <-IBM Monotone
<mrvn>
ggole: most general purpose problems don't have a deadline.
<Hannibal_Smith>
Uhm...
<adrien_oww>
when you do network, you have deadlines
<ggole>
Sure: that's why real-time techniques have evolved, to specialise systems for this unusual requirement
<adrien_oww>
but tbh, most of the time, a large buffer and a good throughput will do the job well
<NoNNaN>
Hannibal_Smith: far from it, it provides cpu instruction wcet too
<orbitz>
Hannibal_Smith: do you mean metronome?
<Hannibal_Smith>
Yes, I have pretty bad memory
<ggole>
NoNNaN: by the way, the "Java chip" stuff I had in mind was the ARM instruction set, uh, Jazelle
dapz has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<nicoo>
mrvn: Azul also ships a JVM that runs on commodity hardware and uses (AFAIK) a wait-free concurrent GC
<orbitz>
Zing
<NoNNaN>
"At 100 MHz we measured 40 μs maximum blocking time introduced by the GC thread."
marcux has quit [Quit: marcux]
<mrvn>
nicoo: and how much does the read barrier and indirection cost you there?
arjunguha has joined #ocaml
<mrvn>
NoNNaN: One thing I noticed in the shenandoahtake4.pdf. The concurrent makr phase is shown as occuring at the same time. So all threads are stoped. This is a problem in ocaml code since it isn't preemptible like that.
<NoNNaN>
mrvn: "LVB differs from a Brooks-style [6] indirection barrier in that, like a Baker-style [4] read barrier, it imposes invariants on references as they are loaded, rather than applying them as they are used. By applying to all loaded references, LVB guarantees no uncorrected references can be propagated by the mutator, facilitating certain single-pass guarantees."
<mrvn>
But that may be just an artefact of how they drew it.
<nicoo>
mrvn: I never had my hands on it, but from the paper they published, they don't use a read barrier/indirection scheme, but “cheat” using mprotect to intercept writes to pages that are being compacted (all the magic left is “how to get an atomic stack snapshot”, IIRC)
<NoNNaN>
it's from the c4 paper
<mrvn>
nicoo: huh? No. They have indirections all nover the paper.
dapz has joined #ocaml
<mrvn>
NoNNaN: lvb?
<nicoo>
mrvn: I must be confusing with another paper, then