<rproust>
and btw, pretty-printers are not suppose to output pretty source code, but merely source code. The other printer for camlp4 is an AST dump!
<rproust>
you can prettyfy the outputed code with "gg=G" in vim, <don't know> in emacs
ikaros has quit [Quit: Leave the magic to Houdini]
<hcarty>
rproust: I think the printer can do some beautification. There is a big difference between the revised syntax output and the standard syntax output.
MUILTFN has joined #ocaml
joewilliams_away is now known as joewilliams
jonafan_ is now known as jonafan
<thelema>
looking for data structure advice: want to push (item,points) pairs sequentially and get out the top n items by points
<thelema>
Sorted doubly-linked-list seems workable but maybe slow, although n = 100, will push 1E6 pairs in
<thelema>
Is there anything better?
<jld>
Some kind of heap? Or am I missing somehting about the requirements?
<thelema>
well, I don't really want to keep the 999,900 non-top pairs
<jld>
Ah.
<thelema>
specifically, I want to fold across 1E6 pairs and end up with the top 100 items
<thelema>
I'm thinking a map, keyed by points and inserting if better than the worst and dropping the worst on insert (after the map has 100 entries)
<jld>
That's the best thing I've thought of.
<thelema>
thanks
Yoric has quit [Quit: Yoric]
<mrvn>
Make a heap and invert the score. Heap.insert x; if Heap.size > 100 then pop
<mrvn>
At the end pop the 100 elements into a list (which also reverse them back to right order)
<mrvn>
Or just plain keep a sorted list of the top 100 and insert. That's O(100 n)
<mrvn>
Most elements will be less than the first element in the list too (smallest first).
<mrvn>
Keeping the so far 100th element cached and comparing any new element against that before inserting should give a nice speedup if you use some other structure than a list.
<thelema>
mrvn: good ideas. So far I've implemented the map version with the cached min_value
<thelema>
not yet. I'm currently extending the implementation to use many of these
<thelema>
basically I'm categorizing input data (standard mahcine learning), and need to report the n most certain results
<thelema>
in each category
brendan has joined #ocaml
avsm has joined #ocaml
Smerdyakov has joined #ocaml
<thelema>
with line 4: 4.43s, without: 4.25s -- apparently not so important
<thelema>
that said, this is benchmarking the whole fold across all the data, which does more than top_n
<thelema>
mrvn: this is only on 125K datapoints, and I'm still keeping the min_k value...
sepp2k has quit [Quit: Leaving.]
<thelema>
huh, oddly the time goes up when I don't maintain min_k
ulfdoz has joined #ocaml
avsm has quit [Quit: Leaving.]
schme has joined #ocaml
schme has quit [Changing host]
schme has joined #ocaml
Smerdyakov has quit [Quit: Leaving]
<mrvn>
thelema: Does Map have special code to insert a new min element?
<thelema>
mrvn: no
<joelr>
is there a way to write a function that works on all records that have at least field foo? the records will be in different modules but the field name and type will always be the same
<mrvn>
joelr: yes and no. What you do is you git it an accessor function.
<thelema>
joelr: not possible, because the fieldname can't be guaranteed to be in the same offset in the record
<joelr>
darn
<thelema>
you can cheat and use Obj to get data directly out of a certain position from the record, but that's double-plus bad.
<mrvn>
joelr: do_somthing_with_record (fun x -> x.foo) records
<joelr>
mrvn: that do_something fun would need to work on many records of -different- types
<joelr>
thelema: there's a benefit to objects, apparently
<mrvn>
joelr: and for each type you give it a different closure
<thelema>
joelr: each type gets its own x -> x.foo closure to access the right position in x
<joelr>
as i was able to rely on the presence of a method with a given signature
<joelr>
i see
<thelema>
yes, objects are structurally typed, records are nominally typed
<mrvn>
Could one use a functor to eliminate the closure?
<joelr>
and since the closure has the same type it should always work
<thelema>
mrvn: I don't think that would help in terms of performance. It might help in terms of code cleanliness, but functors aren't so light
boscop_ has joined #ocaml
<mrvn>
thelema: sure. but I wonder how one would even write the functor. you can't express "type t = record with field foo"
<thelema>
mrvn: type r val get_foo : r -> int
<mrvn>
thelema: that isn't eliminating the closure. That just gives it a name.
<thelema>
true. Then no, you can't eliminate the closure using a functor.
<mrvn>
would be nice if records had prefix coercion.
boscop has quit [Ping timeout: 246 seconds]
Yoric has joined #ocaml
<joelr>
darn
<rproust>
there was a "polymorphic record" syntax extension available at some point
<rproust>
idk if it has been ported to 3.12
<joelr>
i guess i'll have to gather into a module all the different functions that work on my records and then specialize that on the record type and assign the various functions
<mrvn>
or stick with objects
<joelr>
can't stick with objects :-(
<joelr>
switched from thrift to piqi
<joelr>
the former uses objects and the latter uses records
<thelema>
joelr: any reason you can't use ext-prot?
<joelr>
thelema: well... compatibility with java and such. piqi is protocol buffers.
<joelr>
what's so good about extprot? that was my preference but i have to get the team to agree
<joelr>
thelema: are you the guy behind extprot?
<thelema>
I've never used them. Jane street is behind extprot
<thelema>
iirc
<joelr>
ah!
<joelr>
mfp: ^
<thelema>
or not... mfp
<joelr>
i see that mfp has forked extprot, maybe he knows
<thelema>
I guess jane street only uses sexplib
<joelr>
thelema: how would extprot help me?
<thelema>
ah, jane street uses bin_prot. anyway...
<joelr>
bin_prot? where's that
<thelema>
joelr: I don't know, but I imagine something built in the ocaml world will have less of these impedence mismatches that you're running into
<mfp>
joelr: gtg in a bit, but some info about extprot (I wrote it) > extprot's raison d'être is allowing backward/forward-compatible protocol extensions
fraggle_ has quit [Ping timeout: 276 seconds]
<mfp>
it is designed to allow a posteriori changes to the binary protocol without having to discard all the previous data
<mfp>
which Marshal or bin_prot don't help with
smerz has joined #ocaml
<thelema>
mfp: do you know enough about zeromq or piqi to compare?
<mfp>
I don't know much about piqi; it was released after I'd written extprot, and didn't find any reason to switch to it
<mfp>
not sure it supports algebraic types
<mfp>
as for zeromq, well, it doesn't care about the messages it transports, does it?
<mfp>
(so it's completely orthogonal to the serialization format)
<hcarty>
mrvn: I'm not sure. kaustuv did some first class module tests around 3.12.0's release, and I think they do carry some overhead in a case like this.
* mfp
goes
<joelr>
mfp: thanks
<mrvn>
if I had to implement this I would create a record of all the functions of the module and pass that around.
<hcarty>
mrvn: From what I remember reading, I think that's roughly equivalent to what first class modules do internally
<mrvn>
The difference to objects is then that objects use a hash to lookup methods while first class modules can use a fixed offset.
fraggle_ has joined #ocaml
<mrvn>
can you coerce first class modules?
<orbitz>
What are the language reasons for the compiler know if value v is a record type M.t that v.r is obvioulsy v.M.r?
<orbitz>
what problem does it introduce to assume that?
<mrvn>
orbitz: The compiler doesn't know.
<thelema>
orbitz: it works the other way around - v.M.r implies that v is of type M.t
<hcarty>
mrvn: I'm not sure if it counts as coercion, but I think you apply more restrictive but still matching signatures
<mrvn>
and usualy you don't know that v is a M.t before you infer that from v.M.r
<mrvn>
hcarty: so with include and coercion you have inheritance like objects.
<orbitz>
ah ok
<hcarty>
mrvn: ex. let module M = (val m : M_sig) in ...
<orbitz>
fair enough
<hcarty>
mrvn: I believe so. There was a mailing list post along these lines recently
<mrvn>
orbitz: Worse if you have sub records: v.M.r.M.Sub.x
<orbitz>
mrvn: it quickly becomes painul
<mrvn>
v.M.r already gives the full type so the M.Sub. is a bit useless
<orbitz>
althouhg, if i know v is M.t shouldn't that tell me the rest since that record sohudl have ac omplete type deifiniton?
<orbitz>
mrvn: you stole the words form my mouht :)
ymasory has joined #ocaml
<mrvn>
I guess v.M.r.x would make it a bit random where you need the full path and where not.
<mrvn>
Think of it this way: v is a pointer, M.r and M.Sub.x are offsets and you are accessing *(v + M.r + M.sub.x)