Amorphous has quit [Read error: 110 (Connection timed out)]
Jedai has joined #ocaml
Amorphous has joined #ocaml
ulfdoz_ has joined #ocaml
ulfdoz has quit [Read error: 110 (Connection timed out)]
ulfdoz_ is now known as ulfdoz
julm has joined #ocaml
ulfdoz has quit [Read error: 60 (Operation timed out)]
seafood has quit []
seafood has joined #ocaml
BiDOrD has quit [Read error: 110 (Connection timed out)]
BiDOrD has joined #ocaml
seafood has quit []
bluestorm has joined #ocaml
<kaustuv>
If I resize the minor heap to exactly 1/2 the L2 cache on this dual core system then I get peak performance. Any smaller and it's slower, and it's not any faster with a larger minor heap. Any conjectures as to why this might be the case?
<kaustuv>
Moreover, can I generalize this to L2 size / #cores for other systems, such as the intel i7?
<tsuyoshi>
that should depend on what you're actually doing
<kaustuv>
I'm mainly parsing and type-checking very large source files written in a Pascal-like syntax.
<kaustuv>
Profiling shows that mark/sweep_slice() is the dominant operation
xcthulhu_ has joined #ocaml
sporkmonger has quit []
<brendan>
sounds like the gc is very busy. too small and the allocators always get stuck waiting for it, and too large and either the gc stalls or the app does, either one causing about the same amount of pain. kind of makes sense
<bluestorm>
kaustuv: is your program written to make use of both cores ?
<kaustuv>
bluestorm: no, it's single threaded
<bluestorm>
in that case i'm not sure you should take #cores into account
<kaustuv>
I was figuring some sort of underlying NUMA architecture
<kaustuv>
err, drop "architecture"
<bluestorm>
well I don't know, we should really try with other computers
<brendan>
doesn't sound like #cores to me. it sounds like the gc is doing about half the work :)
<kaustuv>
slightly under 60% of the work, in fact
willb has joined #ocaml
Jedai has quit [Read error: 110 (Connection timed out)]
bluestorm has quit [Remote closed the connection]
kostas has joined #ocaml
kostas has quit [Client Quit]
ulfdoz has joined #ocaml
komar_ has joined #ocaml
Jedai has joined #ocaml
rwmjones has quit [Read error: 113 (No route to host)]
rwmjones has joined #ocaml
hkBst has joined #ocaml
Jedai has quit [Read error: 110 (Connection timed out)]
schme has quit [Read error: 104 (Connection reset by peer)]
schme has joined #ocaml
julm_ has joined #ocaml
julm has quit [Read error: 113 (No route to host)]
Amorphous has quit ["shutdown"]
julm has joined #ocaml
julm_ has quit [Read error: 104 (Connection reset by peer)]
mfp_ has quit [Read error: 110 (Connection timed out)]
schme has quit [Read error: 104 (Connection reset by peer)]
schme has joined #ocaml
mfp has joined #ocaml
Amorphous has joined #ocaml
Amorphous has quit ["shutdown"]
Amorphous has joined #ocaml
komar__ has joined #ocaml
willb has quit [Read error: 110 (Connection timed out)]
komar_ has quit [Read error: 113 (No route to host)]
komar__ has quit [Client Quit]
bluestorm has joined #ocaml
slash_ has joined #ocaml
Cromulent has joined #ocaml
Olf has joined #ocaml
Olf has quit [Read error: 104 (Connection reset by peer)]
Olf has joined #ocaml
Lomono has joined #ocaml
Lomono has left #ocaml []
Cromulent has quit []
seafood has joined #ocaml
Olf has quit [Read error: 60 (Operation timed out)]
julm_ has joined #ocaml
julm has quit [Read error: 54 (Connection reset by peer)]
BiDOrD has quit [Remote closed the connection]
julm has joined #ocaml
julm_ has quit [Read error: 60 (Operation timed out)]
seafood has quit []
M| has joined #ocaml
M| has quit ["Nice Scotty, now beam my clothes up too!"]
jdavis has joined #ocaml
<jdavis>
Sets are lazy, right? I'm new to ocaml, and trying to learn lazy data structures by reading the source for the set module.
BiDOrD has joined #ocaml
<bluestorm>
jdavis: no Set aren't lazy
<bluestorm>
Streams are lazy but are probably a bad example as they're also destructive (really not functional)
<bluestorm>
except Stream, there are afaik no lazy structure in OCaml stdlib
<bluestorm>
but it's very easy to create one yourself
<jdavis>
bluestorm: I did an experiment a while back, where I made a big set (~400MB), and then I (in the interactive shell), I did let newset = Set.add "foo" oldset
^authentic has joined #ocaml
<bluestorm>
besides, lazyness is not used that often in OCaml, so if you're trying to know OCaml better there are other ideas that may be better choices
<bluestorm>
well adding into a set is very fast
<jdavis>
bluestorm: and it didn't seem to take any more memory than before, which made me think it was lazy
<bluestorm>
if it's the set in the stdlib, it wasn't
<jdavis>
bluestorm: but didn't I make two sets? I could still count the cardinality of both oldset and newset
<bluestorm>
oh yeah
<bluestorm>
sets are shared
<jdavis>
What does that mean?
<bluestorm>
as lists and pretty much anything persistent
<bluestorm>
well
<bluestorm>
let list = [1; 2; 3; ...; n]
<bluestorm>
let alist = a :: list
<bluestorm>
let b = b :: list
<bluestorm>
list's elements aren't copied twice
<bluestorm>
(read "let blist = ..", sorry)
sramsay has joined #ocaml
<bluestorm>
alist and blist both point to the same list
<bluestorm>
so their tails are shared
<bluestorm>
similarly, let clist = c :: list won't allocate any memory
<bluestorm>
well, it will allocate a list cell (two pointers basically), but not the whole list
<jdavis>
bluestorm: how is that done? Is it purely done by the compiler, or is it some trick in implementing?
<bluestorm>
purely
<bluestorm>
in OCaml, most values are passed by references
<jdavis>
bluestorm: so how did they define sets in a way that the compiler knew to share?
<jdavis>
Oh.
<bluestorm>
hm
<bluestorm>
i probably haven't understood your ofrmer compiler/implementation question
<bluestorm>
you, the data structure implementer, control the sharing
<bluestorm>
but it comes in a natural way when writing reasonable algorithms
<jdavis>
Ok, can you show me how they accomplished that in the stdlib sets module?
<bluestorm>
they used an algebraic data type
<bluestorm>
you can read the code if you want, it's probably a bit complicated by set balancing invariants
<bluestorm>
but do you know how to add a key to a binary search tree ?
^authentic has quit [Remote closed the connection]
^authentic has joined #ocaml
<jdavis>
it's pretty short actually, about 300 lines, so I think I could probably understand it with a little effort.
<bluestorm>
well if you want
<bluestorm>
but you could use a simpler example
<jdavis>
What do you have in mind?
<bluestorm>
jdavis: do you know the definition of a search tree ? Write the insertion function yourself and I'll show you where the sharing takes place if you don't see it
<bluestorm>
(the definition of a binary search tree in OCaml could be for example : type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree )
<jdavis>
ok
^authentic has quit [Connection reset by peer]
<jdavis>
I'm still working this out
^authentic has joined #ocaml
<bluestorm>
(wich the invariant that for all keys a in the left children of the node with key n, a <= n, and similarly n <= b for right children keys)
<bluestorm>
s/wich/with/
sOpen has joined #ocaml
<jdavis>
ok, I feel like I'm copying the set definition somewhat, but basically something like: let add x t = match t with Empty -> Node (Empty, x, Empty) | Node(l, m, r) -> let c = compare x m in if c > 0 then add x r else if c < 0 then add x l
authentic has quit [No route to host]
^authentic has quit [Remote closed the connection]
authentic has joined #ocaml
<jdavis>
I don't really write much ocaml (yet), so please excuse the obvious syntax errors, etc.
<bluestorm>
you should really use a pastebin next time :)
<jdavis>
Thanks. I didn't want to make you wait as I fiddled with it too much.
<bluestorm>
- you forgot the last "else" case : if .. then .. is equivalent to if .. then .. else (), so types unit, so your function won't type correctly
<bluestorm>
and most importantly, your function doesn't return the tree but only the bottom leaf turned into a node
<jdavis>
Do I want to return unit or raise an exception there?
<jdavis>
Oh, yeah, that would help ;)
<bluestorm>
well you don't handle the "= 0" case
<bluestorm>
either don't add anything (no duplicates in my tree) or add it arbitrarly in one of the branches (say the left one)
<bluestorm>
if you had written hm
<bluestorm>
if foo <= 0 then .. else if foo > 0 then ...
<bluestorm>
(exhaustive)
<bluestorm>
then you should either remove the "> 0" condition (tautological) or add an "else assert false"
<bluestorm>
(assert false types 'a and is the idiomatic way to say "never happens if I didn't screw up", though it generally asks for an explanatory comment)
<bluestorm>
(in that case I think removing the redundant condition is the best choice)
authentic has quit [Success]
authentic has joined #ocaml
<jdavis>
ok. What is a good way to return the tree itself in this situation?
<jdavis>
Also, I see the set implementation uses the "function" keyword, why is that?
<bluestorm>
function is a syntaxic detail, "fun patt -> expr" allows only one pattern, while function allows for different cases
<jdavis>
(first thought, not sure if it's correct yet)
authentic has joined #ocaml
<bluestorm>
looks good
<jdavis>
ok, so now I have two questions: why does the set implementation need to use a lambda-like "function", while I did not? and: where does the sharing happen?
<bluestorm>
"function" is mostly a syntaxic improvement and is not "need"
<bluestorm>
you could have rewritten your own code
<bluestorm>
let rec add x = function Empty -> ... | Node(...) -> ...
<bluestorm>
(you never use the "t" variable except for matching)
<jdavis>
bluestorm: oh, I see.
<bluestorm>
the sharing happens in the second case
<bluestorm>
when you write
<bluestorm>
Node(l, m, add x r)
<bluestorm>
"l" references the left branch of the (sub)tree
<bluestorm>
wich is not copied
<jdavis>
Oh, now I see.
<bluestorm>
so when you insert an element, you allocate one new node per iteration
<bluestorm>
thus you new O(tree depth) new memory
authentic has quit [Connection reset by peer]
<bluestorm>
all the rest is shared with the precedent tree
authentic has joined #ocaml
<bluestorm>
in the Set case, tree is balanced so it's actually O(log n)
<jdavis>
ok, interesting.
<jdavis>
To step back for a minute, is there a normally-accepted way people do stuff like this in an imperative language? it seems like it would be very hard to share parts of trees in C.
<jdavis>
Not impossible, but difficult to do in a general way.
hkBst has quit [Remote closed the connection]
<bluestorm>
oh you could do that with a C tree
<bluestorm>
the problem is that
<bluestorm>
when your nodes are shared, mutation is shared too
<jdavis>
yeah, that's what I mean, you update one and you update all of them, because it's by reference
<bluestorm>
OCaml does the same
<bluestorm>
except it encourages immutables values
<bluestorm>
-s
<bluestorm>
wich you can't mutate so there isn't any problem
<bluestorm>
of course if your tree keys are references instead of immutable values, you will have problems if you mutate one of the trees
<jdavis>
The reason I did that test a while ago was because I though that would be difficult to do in C: create one set, create another set just like it but with an extra element, and still have the original set, but don't copy the whole thing.
<bluestorm>
I'm not sure that's very difficult to do in C
<bluestorm>
(well of course algebraic datatypes and pattern matching make it much easier to formulate such algorithms)
authentic has quit [Connection reset by peer]
authentic has joined #ocaml
<jdavis>
The only way I could think to do it would be to have different snapshots of the tree, and tag different points in the tree so that when, e.g. counting, you would only count the elements for your particular snapshot of the tree.
<bluestorm>
hm
<jdavis>
(in C)
<bluestorm>
that's more or less what OCaml does
<jdavis>
Oh, ok.
<bluestorm>
you can think of "t" and "add key t" as two views of the "same tree"
<bluestorm>
two snapshots
<bluestorm>
except that t informations makes that, when iterating on it, you don't see "add key t" keys
<bluestorm>
and reversely
<bluestorm>
(i mean : keys wich are not shared)
<jdavis>
I see.
<jdavis>
Interesting.
<bluestorm>
that state of affairs is natural in OCaml and of course we prefer to view it as two trees by themselves, with some parts shared
<bluestorm>
there is a nice quizz about that
<jdavis>
Yeah, I like it because, conceptually, they are two distinct immutable values, but you get the benefit of sharing and pass-by-reference.
<bluestorm>
jdavis: write a function complete : int -> unit tree, that creates a complete tree of height n in times O(n)
<flux>
I think one of the main reasons why people don't do that in C is that C doesn't have GC - nor the developers have the mindset for making use of one
<flux>
sharing any kind of stuff is difficult; of course, you could use reference counting, which in addition to not being very efficient, brings other issues
<bluestorm>
(complete tree : all nodes except for the bottom ones have two non-leaf children)
authentic has quit [Read error: 54 (Connection reset by peer)]
<bluestorm>
(bottom nodes : nodes of depth height(tree))
<jdavis>
bluestorm: f x = Node(f (x - 1), f (x - 1))
<jdavis>
bluestorm: f x = if x > 0 then Node( f (x-1), f (x-1) ) else ();;
<jdavis>
bluestorm: f x = if x > 0 then Node( f (x-1), x, f (x-1) ) else ();;
authentic has joined #ocaml
<jdavis>
bluestorm: am I on the right track?
<bluestorm>
hm
<jdavis>
flux: Yeah, I think you're right about C. Most of the time it's just easier to try to make ever-fancier mutable data structures to enable the sharing you want, rather than compose simpler generic data structures.
<bluestorm>
else Leaf
<bluestorm>
but that isn't O(n)
<bluestorm>
you create 2^n leaves, so that's O(2^n)
<bluestorm>
hm
<bluestorm>
sorry
<bluestorm>
have I specified that I take n for the _height_ of the tree ?
<jdavis>
bluestorm: yeah, you said the height of the tree. I was thinking it might share the f (x-1) tree on both sides, rather than compute it twice.
<bluestorm>
(the challenge of creating a 2^n tree in O(n) times makes the question interesting)
<bluestorm>
jdavis: well you have to do that yourself
<bluestorm>
(Ocaml doesn't memoize anything, and I don't know of any language doing that)
<jdavis>
bluestorm: oh, so something like: f x = if x > 0 then let y = f (x-1) in Node( y, x, y ) else Leaf;;
<bluestorm>
yes, that's the correct answer
<jdavis>
bluestorm: oh, cool!
<jdavis>
bluestorm: is it just too hard for the compiler to guess what to memoize?
^authentic has joined #ocaml
<bluestorm>
you could say so
<bluestorm>
eg. the function could have side effects
<flux>
indeed. even haskell doesn't so it by default, although as a pure functional language it would have the possibility to do it.
<flux>
there are some obscure (?) cases that can cause big memory leaks with automatic memoization
authentic has quit [Read error: 60 (Operation timed out)]
<flux>
for example think of a program that produces the list of pi digits
<jdavis>
Oh, I see.
<flux>
then you use some variation of the rabit-and-hare-algorithm (I don't know if it's applicaple ;)) to find a loop in it
<flux>
+b
<flux>
actually, that's not a great example :)
<flux>
..I bet the one given to me was better..
<flux>
in any case you may find yourself holding a lot of data
<flux>
I suppose it would be possible to discard some of that data. on which criteria, is an interesting question.
<jdavis>
Are there subsets of the problem that are easier, like in my solution, where f (x-1) appears twice in the same function?
<jdavis>
I suppose it's also easy for the programmer to see that, though ;)
^authentic has quit [Remote closed the connection]
authentic has joined #ocaml
<jdavis>
Also, is there a reason that, in a hybrid language like ocaml, they don't allow you to tag a function as having side effects, so that the compiler knows what rearrangements are safe?
<bluestorm>
jdavis: you have to know that the ocaml compiler generally doesn't optimize code
<flux>
jdavis, and if accidentally a function is marked pure but it isn't?
<bluestorm>
OCaml programs have good performance because of a very efficient GC and good data representation choices
<bluestorm>
but other than that it does not try to be clever and mostly compiles the code as you write it
<flux>
jdavis, if you are thinking that the compiler can give errors for that case, then how about the case of a function that is pure but for example implements memoization via impure means (a hash table)
<flux>
so, it's not that simple
<bluestorm>
don't expect dubious wonders from a "sufficiently clever compiler"
<flux>
now now, the glasgow haskell compiler is indeed quite clever
<bluestorm>
if you want memoization, you can also write the memoization layer yourself, it's probably a good exercise
<jdavis>
Oh, I see. Interesting.
<flux>
although it can be too easy to inadvertently make a mis-step and disable some cool optimization
<bluestorm>
write memo : ('a -> 'b) -> ('a -> 'b) that takes any one-parameter function, supposing it's functional, and returns a memoized version
<bluestorm>
flux: I have the impression that the haskeller in the street (not the uber-ghc-gurus) have a hard time having reliable and predictible performance with their haskell code
<bluestorm>
but the GHC works is indeed very interesting
authentic has quit [Connection reset by peer]
<jdavis>
flux: yeah, little changes that disable optimizations are tricky. That's the same trouble people get into writing SQL queries :)
<bluestorm>
(and their recent switch to a strict core language is quite revealing)
authentic has joined #ocaml
<jdavis>
bluestorm: what do you mean by a strict core language?
<jdavis>
bluestorm: I thought laziness was next to godliness (or something ... :)
<jdavis>
bluestorm: I tried googling "haskell strict core language" and didn't come up with anything. I know about strict evaluation versus lazy, but I don't know exactly what you mean when you say their core language is strict.
authentic has quit [Remote closed the connection]
<bluestorm>
most compiler have a desugared-decomplexified implementation of their frontend language
<bluestorm>
GHC has a Core language, wich is Haskell reduced to core concepts (no types classes etc.), on wich it applies most of it's optimisation
<bluestorm>
(stream fusion etc.)
<bluestorm>
there was a recent paper from SPJ about a new Core language for GHC, with a strict evaluation semantic, that would allow for a finer-grained representation of calling conventions and such
<jdavis>
I think that it's always possible to make a strict, imperative program with the optimal implementation. But some languages make it much easier to get performance close to the optimal solution without so much work.
<bluestorm>
(don't take it as an attack againt Haskell : I quite appreciate Haskell as a language, I'm just not convinced that lazyness-by-default is the good choice, and apparently SPJ isn't either (famous quote "the next haskell will be pure and strict"); I also think that you made the right choice trying to learn OCaml first : learning the new Haskell concept later is facilited by knowing OCaml, and Ocaml is interesting in it's own and easier to learn for the i
<bluestorm>
mperative newcomer)
<bluestorm>
s/concept/concepts/
<jdavis>
I can't quite explain it, but I've certainly been drawn more to ocaml. I write stuff, and I feel confident that if it compiles, it will probably work.
authentic has joined #ocaml
<jdavis>
Thanks bluestorm and flux, this has been enlightening.
authentic has quit [Connection reset by peer]
authentic has joined #ocaml
authentic has quit [Remote closed the connection]
authentic has joined #ocaml
authentic has quit [Remote closed the connection]
bluestorm has quit [Remote closed the connection]
authentic has joined #ocaml
authentic has quit [Read error: 54 (Connection reset by peer)]
authentic has joined #ocaml
authentic has quit [Remote closed the connection]
authentic has joined #ocaml
authentic has quit [Remote closed the connection]
authentic has joined #ocaml
julm_ has joined #ocaml
authentic has quit [Connection reset by peer]
authentic has joined #ocaml
authentic has quit [Remote closed the connection]
authentic has joined #ocaml
nicholas_ has joined #ocaml
<nicholas_>
I'm getting an error from ocaml that doesn't seem to be happening for the obvious reason: