<thelema>
vixey: there is one in batteries -- Array.mapi
<thelema>
it's pretty easy to write:
<vixey>
I did write one version
<thelema>
or maybe not -- it seems Array.mapi is standard?
<vixey>
oh brilliant
<vixey>
ok I can erase my version :)
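For reference, a minimal sketch of what a hand-written mapi could look like (unnecessary here, since the stdlib already provides Array.mapi; the function is written out only for illustration):

    let mapi f a =
      let n = Array.length a in
      if n = 0 then [||]
      else begin
        let r = Array.make n (f 0 a.(0)) in
        for i = 1 to n - 1 do
          r.(i) <- f i a.(i)
        done;
        r
      end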
GustNG1 has quit [Read error: 110 (Connection timed out)]
<vixey>
there is a strange warning:
<vixey>
this expression should have type unit
<vixey>
and the line of code is: ()
<thelema>
context?
<vixey>
it's in a try ... with
<vixey>
but () does have type unit
<thelema>
yes, it does.
<thelema>
can you pastebin the enclosing function?
<vixey>
oh it doesn't matter, I just thought it was a funny warning
<vixey>
I guess [| |] is camlp4
<thelema>
[| |] is array creation
<vixey>
ohh
longh has quit [Read error: 104 (Connection reset by peer)]
thelema has quit [Read error: 110 (Connection timed out)]
mohbana has joined #ocaml
<mohbana>
i want to buy the Purely Functional Data Structures book but i'm slightly worried about the syntax
<mohbana>
is ocaml ml?
<ozy`>
more or less, yeah
<ozy`>
OCaml is not purely functional, though..... was the original ML purely functional?
<ozy`>
'cause as far as I know, it's basically ML + some syntax additions and such
<ozy`>
(for OOP et al)
<mohbana>
what's the best free ebook available in pdf?
<ozy`>
I'd have to say War and Peace
<ozy`>
but if you're specifically asking about OCaml books, I'm not so sure
mohbana has quit ["Ex-Chat"]
Palace_Chan has quit [Client Quit]
Palace_Chan has joined #ocaml
pastasauce has quit []
Associat0r has quit []
thelema has joined #ocaml
yziquel has quit [Read error: 110 (Connection timed out)]
<lorph>
when changing an element of a binary tree, is it better to recreate the whole tree or to use a tree with mutable refs
<ozy`>
that depends on your concerns with respect to the tree
<lorph>
well i was just wondering, coming from an imperative language
<lorph>
i just find it a hassle to recreate the whole tree
<ozy`>
do you need it to be fast, and use as small an amount of memory as possible? or do you need it to be robust and easily shared among threads and functions?
<lorph>
how is remaking the tree robust
<ozy`>
if you guarantee that a tree's nodes are immutable once they're created, you can reuse them across as many trees as you like
<ozy`>
(ie. different versions of the same tree)
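A minimal sketch of the sharing ozy` describes, with a hypothetical tree type: replacing a value in an immutable binary search tree rebuilds only the nodes on the search path, and every untouched subtree is shared between the old and the new version:

    type 'a tree = Leaf | Node of 'a tree * 'a * 'a tree

    (* rebuilds O(depth) nodes; untouched subtrees are reused as-is *)
    let rec replace cmp x = function
      | Leaf -> Node (Leaf, x, Leaf)
      | Node (l, v, r) ->
        let c = cmp x v in
        if c = 0 then Node (l, x, r)
        else if c < 0 then Node (replace cmp x l, v, r)
        else Node (l, v, replace cmp x r)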
electronx has joined #ocaml
delamarche_ has quit []
sporkmonger has quit []
seafood has quit []
Palace_Chan has quit [Client Quit]
ygrek_ has joined #ocaml
ygrek_ has quit [Remote closed the connection]
ygrek_ has joined #ocaml
ygrek_ has quit [Remote closed the connection]
ygrek_ has joined #ocaml
ygrek_ has quit [Remote closed the connection]
ygrek_ has joined #ocaml
Camarade_Tux has joined #ocaml
ygrek_ has quit [Remote closed the connection]
ygrek_ has joined #ocaml
ched has joined #ocaml
marmotine has joined #ocaml
GustNG has joined #ocaml
ulfdoz has joined #ocaml
authentic has quit [Read error: 104 (Connection reset by peer)]
munga_ has joined #ocaml
munga_ has quit [Client Quit]
Yoric[DT] has joined #ocaml
<Yoric[DT]>
hi
GustNG1 has joined #ocaml
GustNG2 has joined #ocaml
<Yoric[DT]>
grmph, inria has been down for 3 days, now.
GustNG has quit [Read error: 60 (Operation timed out)]
<Camarade_Tux>
well, it's the weekend, I don't think we can expect it to resurrect before Monday
_zack has joined #ocaml
GustNG1 has quit [Read error: 110 (Connection timed out)]
<flux>
yoric[dt], btw, have you thought of adding *Labels-modules more pervasively in Batteries?
<flux>
I don't know if anyone uses them, though :)
<flux>
(but it'd be more consistent)
_zack has quit [Read error: 104 (Connection reset by peer)]
hkBst has joined #ocaml
Linktim has joined #ocaml
<Yoric[DT]>
flux: Yeah, I've thought about it.
<Yoric[DT]>
It's probably going to happen at some point.
<Yoric[DT]>
But not just quite yet :)
<flux>
yoric[dt], I was also wondering whether it'd be cleaner if the Labels version were just a submodule under the main one..
<flux>
actually that might conflict with 'open'ing modules, so perhaps it's not a good idea
<flux>
at least for modules that aren't mostly functors
hkBst has quit []
hkBst has joined #ocaml
hkBst has quit [Remote closed the connection]
hkBst has joined #ocaml
_zack has joined #ocaml
mamie_cracra has joined #ocaml
itewsh has joined #ocaml
marmotine has quit [Read error: 113 (No route to host)]
jlouis has quit ["Leaving"]
snhmib has joined #ocaml
_zack has quit ["Leaving."]
itewsh has quit ["KTHXBYE"]
GustNG1 has joined #ocaml
Snark_ has joined #ocaml
<Yoric[DT]>
flux: Actually, I think it's a good idea.
GustNG3 has joined #ocaml
GustNG4 has joined #ocaml
<Yoric[DT]>
flux: opening the Label version would therefore look like [open Data.Persistent.List, Labels]
<flux>
yoric[dt], not bad
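A hypothetical sketch of the layout being discussed (module and function names made up for illustration, not the actual Batteries code): the labelled variants live in a Labels submodule of the main module, so opening the main module alone keeps only the unlabelled functions in scope:

    module MyList = struct
      let rec map f = function
        | [] -> []
        | x :: tl -> f x :: map f tl

      module Labels = struct
        let map ~f l = map f l   (* delegates to the outer, unlabelled map *)
      end
    end

    (* usage: MyList.Labels.map ~f:succ [1; 2; 3] *)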
<flux>
dang, and it was this weekend when I wanted to google for stuff that's in inria archives..
<Yoric[DT]>
Yeah, me too.
* Yoric[DT]
needed the Camlp4 manual and Averell.
<Yoric[DT]>
Gottago, though.
Yoric[DT] has quit ["Ex-Chat"]
electronx has quit []
jlouis has joined #ocaml
GustNG2 has quit [Read error: 110 (Connection timed out)]
<flux>
hm, increasing space overhead from 80% to 10000% drops run time from 2.4s to 0.9s
<flux>
even 1000% gives only 1.3s
<flux>
whoops, profiling was left on :)
<Camarade_Tux>
flux, you can play with the other parameters too, make the Gc verbose and change the parameters until you see no more activity ;)
<flux>
but I doubt it affects the ratio
<flux>
camarade_tux, yeah, but I suppose I still want _some_ activity..
<flux>
but perhaps not for that stage of the program
<flux>
(it's a mailbox threader)
<flux>
but it just seems so slow :(
<flux>
only 1000 messages
<Camarade_Tux>
flux, exactly, you disable it for a moment and then restore the parameters :)
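A sketch of the kind of experiment Camarade_Tux suggests (the helper name is made up): temporarily relax space_overhead and turn on the GC's verbose messages while timing a phase, then restore the old settings:

    let with_relaxed_gc f =
      let old = Gc.get () in
      Gc.set { old with Gc.space_overhead = 1000; Gc.verbose = 0x01 };
      let result = f () in
      Gc.set old;
      result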
<Camarade_Tux>
what are you processing exactly ?
<flux>
it's not particularly fast even without profiling, 0.5 at its best
<flux>
1000 most recent posts to caml mailing list, and building a threaded data structure out of those
<Camarade_Tux>
maybe you're doing a lot of allocations, that's very easy to do in ocaml
<flux>
regarding the algorithm's original implementation: "My C version of this code was able to thread 10,000 messages in less than half a second on a low-end (90 MHz) Pentium, so the argument that it has to be in the database for efficiency is pure bunk."
<flux>
so I'm not doing very good here :)
GustNG1 has quit [Read error: 110 (Connection timed out)]
GustNG3 has quit [Read error: 110 (Connection timed out)]
<flux>
I was thinking the other day that it might be fun to write profile data analysis software in ocaml..
<flux>
but I guess just reading the data in can be a bit of an effort
* Camarade_Tux
pings Asmadeus so he looks at what flux is doing
<flux>
man, gprof -A apparently doesn't work with ocaml :(
<Camarade_Tux>
flux, not that surprising though
<Camarade_Tux>
flux, may I see the profiling output ?
<flux>
camarade_tux, for what kind of run?
itewsh has joined #ocaml
<Camarade_Tux>
with default gc params ?
<flux>
takes a while (I'm behind a cell phone link), but it's going to : modeemi.fi/~flux/gprof.txt
<flux>
it's there
<Camarade_Tux>
that's gc activity indeed :P
<Camarade_Tux>
you should try to process more mails with the gc disabled; your algorithm is too fast for you to get an informative profile
<flux>
true
<flux>
I guess I have bigger mail boxes around
<flux>
..but transferring loads of data over this network sucks :)
<flux>
perhaps I could just remove the content and leave the headers
mamie_cracra has quit [Excess Flood]
<Camarade_Tux>
flux, I guess ;)
mamie_cracra has joined #ocaml
<Camarade_Tux>
maybe I can get it for you and compress it before transfer ;)
GustNG has joined #ocaml
<flux>
I can access a remote shell, that's no problem
<flux>
finding a proper grep command to extract the significant info out :)
<Camarade_Tux>
lol ;)
<Camarade_Tux>
btw, what are you using to parse the mails ? sexplib ?
<flux>
hmm, how would sexplib help with mailboxes? I have my own custom code to do that
<flux>
the ~90M mailbox compressed to 19M - not good. but the version with headers compressed took only 400k
<Camarade_Tux>
I can't remember precisely, but Asmadeus had shown me that the messages were (approximately) stored as S-expressions; I did not code it myself though
<Camarade_Tux>
90MB is maybe a lot already...
<flux>
about 10k messages
<flux>
I wonder if the algorithm can handle that without errors ;)
<flux>
funny
<flux>
it also takes about 2.0 seconds to handle that data :-)
<flux>
perhaps the time is wasted in reading the mailbox, not in doing the threading :)
<flux>
other alternatives: the contents of v4l mailing list is dramatically different from caml, or my algorithm _really_ scales by being faster when there's more data :)
<Camarade_Tux>
flux, I thought about that while reading the profiling data; it should be in cache. try to cat to /dev/null more unique data that can fit into your RAM, it should show whether it is I/O bound :)
<flux>
camarade_tux, it would be, but parsing the mailboxes could still take some time
<Camarade_Tux>
s/that can fit/than what can fit/
<Camarade_Tux>
flux, of course, but at least it'd show whether I/O is important for your program or not
<flux>
right
<flux>
Evaluating mailbox took 2.676973 seconds
<flux>
Evaluating threading took 0.163878 seconds
<flux>
it's still all cpu time
<flux>
but I guess I found something else to optimize :)
<flux>
with v4lshort (the file I call with the headers of 10k messages) the threading still takes 0.6 seconds
<flux>
and it's still gc-happy
<flux>
reading that v4lshort btw takes 1.4 seconds
<Camarade_Tux>
well, it's not bad already ;)
GustNG4 has quit [Read error: 110 (Connection timed out)]
<flux>
did you read that earlier quote? 0.5 seconds for 10k messages _on a pentium/90MHz_.. ;-)
<Camarade_Tux>
yeah ;)
<Camarade_Tux>
and I can't say anything, I did the same for my patricia tree, getting mad about optimization ;)
<flux>
wasn't there a (discontinued) memory profiler for ocaml?
<flux>
0.7 seconds of 1.4 seconds is taken by caml_input_scan_line
<flux>
ah, never mind, I misread it :-)
<flux>
although it _does_ look suboptimal, with its memmoving stuff :-o. I don't know how often that is performed, though
<flux>
I guess my best bet is still reducing the number of allocations
<Camarade_Tux>
flux, yes, a patch, memprof-ocaml I think
<Camarade_Tux>
I couldn't get it to compile with 3.10 though
<Camarade_Tux>
it works with 3.09, and aren't you using that version ?
<flux>
not here
<flux>
(3.10.2)
<Camarade_Tux>
you can still try to update the patch, I did not try that hard
GustNG1 has joined #ocaml
Linktim_ has joined #ocaml
Associat0r has joined #ocaml
<thelema>
who is omnion or reed wilson?
hsuh has quit [Remote closed the connection]
<flux>
surprisingly (?) removing internizing of headers made a big difference
<flux>
making mailbox parsing more imperative (instead of using Enum) to avoid allocations helped very little
Linktim has quit [Read error: 113 (No route to host)]
<thelema>
Enum seems generally pretty efficient
GustNG has quit [Read error: 110 (Connection timed out)]
ygrek_ has quit [Remote closed the connection]
itewsh has quit [Connection timed out]
itewsh has joined #ocaml
ygrek_ has joined #ocaml
GustNG has joined #ocaml
GustNG has quit [Client Quit]
<flux>
man.. my runtime dropped from 2.3 seconds to 0.5 seconds by replacing Pcre.split ~max:2 ~pat:" " with Pcre.split ~max:2 ~rex:space_re
<flux>
no profiling told me this, though
<flux>
I guess the c code isn't covered by profiling
<flux>
?-o
<flux>
I wonder if just writing that split myself could be even faster..
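A sketch of the change flux describes, assuming the pcre-ocaml API (Pcre.regexp, Pcre.split): ~pat recompiles the pattern on every call, while ~rex reuses a regexp compiled once; the helper name is made up:

    let space_re = Pcre.regexp " "

    (* split a "Key value..." header line into at most two fields *)
    let split_header line =
      Pcre.split ~max:2 ~rex:space_re line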
olegfink has joined #ocaml
Demitar has quit [Remote closed the connection]
<Camarade_Tux>
flux, I think it is somehow but it probably isn't visible in something as big as pcre
longh has joined #ocaml
GustNG1 has quit [Read error: 110 (Connection timed out)]
mamie_cracra has quit [Read error: 113 (No route to host)]
ofaurax has joined #ocaml
<flux>
funny how filtering headers (keys and values) through let internize () = let h = Hashtbl.create 1000 in fun str -> try Hashtbl.find h str with Not_found -> Hashtbl.add h str str; str made a big difference too
<flux>
and to top it, it doesn't even seem to save any memory..
Smerdyakov has quit ["Leaving"]
Linktim has joined #ocaml
Smerdyakov has joined #ocaml
Amorphous has quit [Success]
Linktim has quit ["Quitte"]
Amorphous has joined #ocaml
Linktim_ has quit [Read error: 110 (Connection timed out)]
<mfp>
flux: it *was* shown by the profile:
<mfp>
[3] 76.5 0.00 1.48 pcre_compile_stub [3]
<mfp>
0.00 1.48 51835/51835 caml_alloc_final [2]
<Camarade_Tux>
apparently not taking much time :o
<mfp>
?
<mfp>
that shows that pcre_compile_stub itself takes little time, but its calls to caml_alloc_final represent 76.5% of the CPU time
<mfp>
because of v_rex = caml_alloc_final(4, pcre_dealloc_regexp, 100, 50000); when compiling a regexp
<mfp>
btw. a custom line reader + split-on-space should be quite a bit faster than Pcre.split
<Camarade_Tux>
mfp, right, it's a bit hard to read out of context ;)
<mfp>
split uses an internal array for the subgroups (of len >= 3), and the regexp scans the string slower than a hardcoded loop
<mfp>
iow. by replacing (input_line + Pcre.split) with a custom (block reader + tokenizer), you avoid alloc for the line + for the internal subgroup array, and split faster
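A rough sketch of the custom split-on-space mfp suggests (hypothetical function name), which avoids the regexp machinery and the internal subgroup array entirely:

    let split_kv line =
      try
        let i = String.index line ' ' in
        (String.sub line 0 i,
         String.sub line (i + 1) (String.length line - i - 1))
      with Not_found -> (line, "")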
<flux>
hm, why doesn't pcre_compile_stub show up in the flat profile?
<mfp>
because all the time is spent on its child
<mfp>
the flat profile is self-time
<flux>
and caml_alloc_small also is quite far from the top?
<mfp>
that's because caml_alloc_small doesn't take much time itself either
|aymeric| has quit [Remote closed the connection]
<mfp>
it's the GC triggered by caml_alloc_small that is slow
* thelema
remembers an ocaml slice library
<flux>
so I should've seen the high number of allocations from the table
<mfp>
uh and pcre_compile_stub uses caml_alloc_final anyway, not _small
<mfp>
(even though most of the time should be spent in the regexp matching)
itewsh has quit ["KTHXBYE"]
<flux>
actually only 7% speedup from using custom loop
Demitar has joined #ocaml
<mfp>
flux: getting rid of the allocation comes first, did you ditch input_line?
<mfp>
(optimizing the loop is only going to make a diff if it represents a large % of the CPU time anyway)
pango has joined #ocaml
<flux>
well, getting rid of input_line is bothersome :)
<mfp>
think of it in terms of allocation
<mfp>
what's killing you is the major GC
<mfp>
1 major GC slice per minor GC run
<mfp>
the less allocation, the fewer major GC slices
<mfp>
now, say a typical line takes ~70 chars
<mfp>
the structures allocated by pcre took like 7 words or so
<mfp>
are you on x64 or x86?
<mfp>
if the latter, that represents 28 bytes < 74 bytes attributable to the line
<mfp>
so you need to do w/o input_line to reduce the allocation rate by 75% =~ decreasing major GC time by that amount
* thelema
knows there's a library out there whose fundamental data structure is string * int * int
<mfp>
if enlarging the minor heap makes your program significantly faster, you need to find ways to allocate less (or live with a large minor heap, but that makes other areas slower due to lesser locality :-)
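A small sketch (helper name made up) of how the allocation rate mfp is reasoning about can be measured directly with the stdlib's Gc.quick_stat; minor_words counts the words allocated in the minor heap:

    let words_allocated f =
      let before = Gc.quick_stat () in
      let result = f () in
      let after = Gc.quick_stat () in
      (result, after.Gc.minor_words -. before.Gc.minor_words)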
johnnowak has joined #ocaml
<thelema>
ah, that was it -- bitstring
<thelema>
but it's bits, not bytes
sporkmonger has joined #ocaml
nuncanada has joined #ocaml
<flux>
mfp, the thing is, Std.input_lines f and passing the resulting Enum.t around gives nice and readable code ;)
<flux>
I suppose I could write a module for parsing a file while avoiding allocations as much as possible
<flux>
but replacing the enum with the file handle changed very little performance-wise
<flux>
(and thus avoiding the option-typed return values from Enum)
<thelema>
maybe sticking with enum could allow you to chunk in bigger pieces
<thelema>
(which seems to be mfp's suggestion)
<flux>
but I suppose for this purpose it's not so much about reducing the number of allocations as it is about reducing the total number of bytes allocated
<thelema>
do you do much string copying? ocaml doesn't have great string slicing facilities
<flux>
I do some substringing, but removing that didn't affect much
<thelema>
unlike C, where you can replace some chars with nulls and allocate pointers into your read-in string to produce an array of substrings
<flux>
I guess input_line also does substringing internally to slice result strings from the input buffer
<Raevel>
nah, I /wc
<Raevel>
heh oops
<thelema>
flux: no, it seems to have a c function to search for the next newline and return the number of chars to read
<Smerdyakov>
SML has a great [Substring] module. ;)
<thelema>
Smerdyakov: that's where I'd seen it before. thanks for reminding me
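A minimal sketch of the string * int * int "substring" idea thelema keeps coming back to (all names hypothetical): slices point into the original buffer, so splitting a line allocates no new string data:

    type substring = string * int * int   (* (base string, offset, length) *)

    let of_string s : substring = (s, 0, String.length s)

    let to_string ((s, off, len) : substring) = String.sub s off len

    (* split a slice on a character without copying the underlying string *)
    let split_on_char c ((s, off, len) : substring) : substring list =
      let stop = off + len in
      let rec go start i acc =
        if i >= stop then List.rev ((s, start, stop - start) :: acc)
        else if s.[i] = c then go (i + 1) (i + 1) ((s, start, i - start) :: acc)
        else go start (i + 1) acc
      in
      go off off []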
<flux>
parse_kv is actually strange in that it splits on a space. that's because I used to Pcre.split on the space..
<thelema>
flux: extlib but not batteries... :(
<thelema>
if you're splitting on a single char, jane st.'s core has some good code for that.
pierre- has joined #ocaml
<flux>
I don't think that particular code takes a significant part of the time anymore
<thelema>
fair enough.
<thelema>
then what does take a while? all the code I've read so far seems like it'll operate on small lists, so doesn't need to be efficient
<flux>
well, gc takes 41%
<thelema>
do you have any long message bodies? -- Array.of_list (List.rev (loop [])) could be optimized with an Array.of_list_rev
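A sketch of the Array.of_list_rev thelema mentions (not an existing stdlib function): it fills the array from the back, so the List.rev and its extra allocation disappear:

    let array_of_list_rev = function
      | [] -> [||]
      | (x :: _) as l ->
        let n = List.length l in
        let a = Array.make n x in
        let rec fill i = function
          | [] -> a
          | y :: tl -> a.(i) <- y; fill (i - 1) tl
        in
        fill (n - 1) l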
<flux>
caml_input_scan_line takes 11%
<thelema>
that's the c lookahead / buffer-management function
<flux>
I'd like to see gprof --annotated-source working :)
<flux>
yeah, figured that much. but rewriting that not to be line based would be a bit more low-level than I'd like
<thelema>
List.rev is usually indicative of somewhere that GC could take a while.
<thelema>
instead of your custom loop function, extlib's tail recursive map would work, no?
<thelema>
(in parse)
<mfp>
flux: do you have a recent profile?
<thelema>
oops, you'd need a tail-recursive unfold
<flux>
modeemi.fi/~flux/gprof.txt is updated
<flux>
I don't really have big test data here
<thelema>
Std.input_lines f |> Enum.map (parse_message internize) |> Enum.to_list
<flux>
and generating it is bothersome :)
<flux>
thelema, neat, gotta take a deeper look at what ExtLib can do :)
<flux>
(and batteries follows from that)
<flux>
but I need to be off the irc now
<thelema>
(sorry, there's no Enum.to_list function, but do you really need a list?)
<flux>
thanks for suggestions
<flux>
well, not really
<thelema>
cheers
jeddhaberstro has joined #ocaml
<hcarty>
flux and thelema: I think ExtLib.List has a of_enum or similar function
<thelema>
yup.
<thelema>
and it's tail-recursive and doesn't allocate-then-reverse
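Putting the suggestions together, a sketch of the pipeline under discussion, assuming ExtLib's Std.input_lines, Enum.map and List.of_enum are in scope; parse_message and internize are flux's own functions, so they are taken as parameters here to keep the sketch self-contained:

    let ( |> ) x f = f x

    (* assumes ExtLib is opened so that List.of_enum is available *)
    let parse_mailbox ~parse_message ~internize ic =
      Std.input_lines ic
      |> Enum.map (parse_message internize)
      |> List.of_enum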
* thelema
forgot to check in List
<thelema>
grr, type-conv doesn't compile under 3.11
<mfp>
I don't think that changing the loop in parse will make any diff --- that's only 2 words of allocation per message (which should have several lines), totally dominated by the per-line allocation from input_line inside Std.input_lines
<thelema>
but he keeps all that data, no?
<mfp>
yes, and?
<thelema>
you're arguing for bigger chunks, no?
<thelema>
not line-by-line, but pulling in big buffers and... using String.sub to pull out the lines?
<mfp>
hmm he'd benefit from some string * int * int structure instead of string
<mfp>
no, String.sub defeats the purpose
<thelema>
I guess I should code up an ocaml substring library
<thelema>
Yoric[DT]: Rope.of_enum pulls up to 256 uchars from the input into a buffer, and then concats that buffer's contents onto the end of the existing rope.
<Yoric[DT]>
Where do you see that my functions are identical?
<thelema>
+ let is_lowercase c = match Info.general_category c with `Lu -> true | _ -> false
<thelema>
+ let is_uppercase c = match Info.general_category c with `Lu -> true | _ -> false
<thelema>
you probably meant `Ll in the first one.
<Yoric[DT]>
Ah, in UChar.
<Yoric[DT]>
Oops.
<Yoric[DT]>
Sorry, wife calling.
* thelema
fixes that
pierre- has quit [Read error: 60 (Operation timed out)]
Camarade_Tux has quit [Read error: 110 (Connection timed out)]
<johnnowak>
in ocaml, i can declare type abbreviations (e.g. type foo = int * string), but they will not appear in signatures unless i explicitly give the signature and use the abbreviation
<johnnowak>
is anyone aware of a language that can show a signature, applying abbreviations as appropriate?
<thelema>
correct, type abbreviations like that are optional.
<thelema>
the compiler doesn't look for an appropriate abbreviation when it prints the type of a value
<johnnowak>
aye, that i realize. i'm curious if any compiler does.
<TaXules>
johnnowak: how does it know which one you want ?
<johnnowak>
TaXules: it would likely have to show a "best match" of some sort
<TaXules>
this is why you have to tell it explicitly which one you want
<johnnowak>
i understand why, i'm just asking
<thelema>
none that I know of.
<johnnowak>
hm. thanks.
<TaXules>
yes, it should be able to choose the "longest" type
<thelema>
I guess it's not considered valuable by those writing compilers.
<TaXules>
and there is no perfect solution
<Smerdyakov>
...or it's considered intractable computationally.
<johnnowak>
i was just thinking that, if polymorphic variants were to supplant discriminated unions, you might need such a facility
Fullma` has joined #ocaml
<TaXules>
because if you have foo = int * string and bar = int * string , which one do you want to use ?
Fullma has quit [Nick collision from services.]
<johnnowak>
it could show both
<johnnowak>
foo : int * string -> int * string
<johnnowak>
foo : 'a -> 'a where a = foo | bar
<johnnowak>
or something along those lines
Fullma` is now known as Fullma
<TaXules>
in fact the solution is to tell it what type to choose for a given function and then it knows when you use this function that the type is foo or bar or int * string
<johnnowak>
a related question... is there any language that offers only polymorphic variants?
<TaXules>
so you don't have to tell it at each function
<johnnowak>
i think morrow may be one
<johnnowak>
TaXules: aye, could work
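A small example of the behaviour johnnowak describes: the inferred signature uses the expanded type, and the abbreviation only shows up when you write it yourself, either in an annotation or in an explicit .mli:

    type foo = int * string

    (* inferred as:  val mk : int -> string -> int * string *)
    let mk i s = (i, s)

    (* annotated, so reported as:  val mk_foo : int -> string -> foo *)
    let mk_foo i s : foo = (i, s)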
ofaurax has quit [Remote closed the connection]
Fullma has quit ["Quitte"]
<thelema>
biab
_zack has quit ["Leaving."]
_zack has joined #ocaml
Linktim_ has quit [Read error: 110 (Connection timed out)]
<_zack>
tuareg mode doesn't have a function to switch from .ml to .mli and vice versa, does it?
<_zack>
or am I missing some more generic Emacs way to achieve that?
Snark_ has quit ["Ex-Chat"]
fridim_ has joined #ocaml
<thelema>
C-c C-a
Linktim has joined #ocaml
ygrek_ has quit [Remote closed the connection]
_zack1 has joined #ocaml
_zack has quit [Read error: 104 (Connection reset by peer)]
<_zack1>
thelema: ah, cool, now I only need to generalize it to other languages ;)
<flux>
_zack1, I've got one function that works at least in xemacs. it's also butt-ugly, apparently I didn't know much elisp back then. want it?-)
<_zack1>
flux: sure, why not, paste it into some paste bin, thanks!
<_zack1>
but I'll try to write my own anyhow, it is a good excuse to learn a bit of elisp too
hkBst has quit [Read error: 104 (Connection reset by peer)]
mjonsson has quit [Read error: 110 (Connection timed out)]
<thelema>
hmm, the following line doesn't work under 3.11: | <:ctyp< ? >> -> nil _loc
<thelema>
it wants [a_LIDENT] after the ?
<Yoric[DT]>
Sorry, gottago.
Yoric[DT] has quit ["Ex-Chat"]
Submarine has quit [Remote closed the connection]
Linktim has quit ["Quitte"]
nuncanada has quit ["Leaving"]
zbrown has quit [Remote closed the connection]
zbrown has joined #ocaml
itewsh has joined #ocaml
marmotine has quit ["mv marmotine Laurie"]
itewsh has quit [Remote closed the connection]
sporkmonger has quit []
fridim_ has quit [Remote closed the connection]
<flux>
I don't get it.. Shouldn't internizing the headers of a mailbox (both keys and values) significantly reduce the memory used to store them? yet it looks to have the opposite effect..
<det>
Grr, "ocamlc -where" returns "C:\Ocaml\lib" no matter what I set prefix to on the mingw build
<flux>
(as expected, though, replacing Hashtbl with Map increases memory consumption and takes (much) longer)
<flux>
det, maybe you could find out how -where is originally determined when building
<thelema>
internizing?
<flux>
merging all equal strings into the same object
<thelema>
hmmm... that is odd.
<flux>
it might be that the hash table itself hasn't been collected from memory when I'm looking at that
<flux>
so that could explain it
<flux>
(and the original strings also could still be in the memory, perhaps)
tomh has joined #ocaml
<thelema>
weak hashtable?
<flux>
the hashtable is removed from the memory once the data is loaded
<flux>
I tested my hypothesis, but now it appears all the data is also collected :)
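A sketch of interning through a weak hash set, the direction thelema's "weak hashtable?" question points at (module and function names made up): the pool itself does not keep the strings alive, so interned strings can still be collected once nothing else references them:

    module StringPool = Weak.Make (struct
      type t = string
      let equal = (=)
      let hash = Hashtbl.hash
    end)

    let pool = StringPool.create 1000

    (* returns the pooled copy of s, adding s to the pool if it is new *)
    let intern s = StringPool.merge pool s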