cjohnson has quit [Read error: 131 (Connection reset by peer)]
cjohnson has joined #ocaml
cjohnson has quit ["brb"]
cjohnson has joined #ocaml
Sonarman has joined #ocaml
smimram has quit ["?"]
Skal has quit [Remote closed the connection]
SmerdyMeteor has joined #ocaml
vezenchio has quit ["I live in a yurt on the steppes of Sheepfuckistan. That's why."]
a-zwei has joined #ocaml
<a-zwei>
is anyone in here?
<SmerdyMeteor>
Ooooh yes.
SmerdyMeteor has quit ["but not anymore!"]
<Riastradh>
Nope.
<a-zwei>
I have a question about building a multiple-module library
cjohnson has quit [Read error: 60 (Operation timed out)]
<a-zwei>
I have the one module (.ml file), with interface (.mli), and then I have another auxiliary module in a separate file that needs a level of access to the main module not provided in the .mli
<a-zwei>
is there any good way of doing this sort of thing?
<TheDracle>
What do you mean "level of access"?
<a-zwei>
well, the .mli just declares "t" as an abstract type
<a-zwei>
but the other module needs to know more about the type
<TheDracle>
Maybe you should have it open the module directly then?
<a-zwei>
this would bypass the exported interface?
<a-zwei>
(sorry if I use confusing terminology; I'm new to ocaml)
<TheDracle>
Well, if you don't have a mli, you can just open Mymodule.
<TheDracle>
You'll have access to everything inside of it.
mfurr has quit ["Client Exiting"]
<Riastradh>
It would probably be easiest to write a foo_internal.ml & foo_internal.mli, which the auxiliary module would open and which foo.ml & foo.mli would be thin wrappers over.
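A minimal sketch of Riastradh's foo_internal idea, with the three "files" written as nested modules so it compiles as one unit. The names `Foo_internal`, `Aux`, `to_list` and the record representation are hypothetical. The internal module exposes the representation, the auxiliary module opens it, and `Foo` is a thin sealed wrapper that re-exports both.

```ocaml
(* foo_internal.ml: the concrete representation is visible here.
   The record layout is a hypothetical stand-in. *)
module Foo_internal = struct
  type t = { items : int list; size : int }
  let empty = { items = []; size = 0 }
  let add x s = { items = x :: s.items; size = s.size + 1 }
  let size s = s.size
end

(* aux.ml: opens the internal module, so it can read the record fields *)
module Aux = struct
  open Foo_internal
  let to_list s = s.items
end

(* foo.mli + foo.ml: a thin wrapper that seals t as abstract and
   re-exports the auxiliary function under the same abstract type *)
module Foo : sig
  type t
  val empty : t
  val add : int -> t -> t
  val size : t -> int
  val to_list : t -> int list
end = struct
  include Foo_internal
  let to_list = Aux.to_list
end
```

Users only ever see `Foo.t` as abstract, while `Aux` works on the concrete type; because `Foo` seals both at once, values from `Foo.add` can be passed to `Foo.to_list`.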
<TheDracle>
a-zwei: The compiler will automatically generate a .cmi file for it.
<a-zwei>
true, but the main module contains many helper functions which the user needn't be aware of, and . . . well, I was going by the example of the standard library, thinking that it's desirable to hide implementation details (such as the details of the data structure). is this not how it should be done? I really don't know any ocaml "best practices" . . .
<a-zwei>
hmm . . . the foo_internal idea sounds pretty good
<a-zwei>
thanks! seems to be working well
vodka-goo has quit []
<TheDracle>
a-zwei: You should use modules for doing this.
<TheDracle>
a-zwei: If you want to control what is private and what isn't.
<a-zwei>
you mean actually declare the modules in the file?
gim has quit ["dodo"]
<TheDracle>
Yes.
<a-zwei>
I wondered about that, but I don't actually know how to do it yet
<a-zwei>
I will take a look at that
<TheDracle>
a-zwei: module FooPrivate: sig <exposed vals and types> end = Foo;;
<TheDracle>
Then declare Foo somewhere.
<TheDracle>
And you can have a FooPublic with a more restrictive sig.
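A sketch of TheDracle's pattern: one implementation `Foo`, a wider `FooPrivate` view that re-exports the record representation, and a narrower `FooPublic` view. The module contents (`items`, `size`, `check`) are hypothetical.

```ocaml
(* The full implementation *)
module Foo = struct
  type t = { items : int list; size : int }
  let empty = { items = []; size = 0 }
  let add x s = { items = x :: s.items; size = s.size + 1 }
  let size s = s.size
  (* internal helper that users need not see *)
  let check s = List.length s.items = s.size
end

(* Wider view for auxiliary code: re-exports the record representation *)
module FooPrivate : sig
  type t = Foo.t = { items : int list; size : int }
  val empty : t
  val add : int -> t -> t
  val size : t -> int
  val check : t -> bool
end = Foo

(* Narrower view for users: t is abstract and check is hidden.
   FooPublic.t is a fresh abstract type, not equal to Foo.t. *)
module FooPublic : sig
  type t
  val empty : t
  val add : int -> t -> t
  val size : t -> int
end = Foo
```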
<a-zwei>
cool
<a-zwei>
thanks
<a-zwei>
well, gotta go now, but thanks for the help!
a-zwei has quit ["Leaving"]
__DL__ has quit [Remote closed the connection]
Sonarman has quit [Read error: 104 (Connection reset by peer)]
Sonarman has joined #ocaml
SmerdyMeteor has joined #ocaml
<Nutssh>
TheDracle, I think I tried it but it didn't work, because if FooPublic.t is an opaque type, then FooPrivate.t != FooPublic.t and typechecking fails.
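One way around the problem Nutssh describes, sketched under the assumption that the private view only needs extra *functions* (not the representation): seal once into `FooPrivate`, then build `FooPublic` from `FooPrivate` with a `with type` sharing constraint, so both names denote the same abstract type. `PUBLIC`, `PRIVATE` and `to_list` are hypothetical names.

```ocaml
module type PUBLIC = sig
  type t
  val empty : t
  val add : int -> t -> t
  val size : t -> int
end

(* The private view adds operations but still keeps t abstract *)
module type PRIVATE = sig
  include PUBLIC
  val to_list : t -> int list
end

module Foo = struct
  type t = { items : int list; size : int }
  let empty = { items = []; size = 0 }
  let add x s = { items = x :: s.items; size = s.size + 1 }
  let size s = s.size
  let to_list s = s.items
end

module FooPrivate : PRIVATE = Foo

(* Writing "module FooPublic : PUBLIC = Foo" would create yet another
   abstract type. Sealing FooPrivate with a sharing constraint instead
   makes FooPublic.t and FooPrivate.t the same type. *)
module FooPublic : PUBLIC with type t = FooPrivate.t = FooPrivate
```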
Erzbergwerkzwerg has quit []
monochrom has quit ["me!"]
mlh has quit [Client Quit]
tintin has joined #ocaml
SmerdyMeteor has quit ["sleep"]
Sonarman has quit ["leaving"]
Msandin has joined #ocaml
mrvn has joined #ocaml
mrvn_ has quit [Read error: 60 (Operation timed out)]
Submarine has joined #ocaml
zzorn has joined #ocaml
zzorn is now known as zzorn_away
nlv11757__ has joined #ocaml
Herrchen has joined #ocaml
Skal has joined #ocaml
m3ga has joined #ocaml
__DL__ has joined #ocaml
smimou has joined #ocaml
m3ga has quit ["disappearing into the sunset"]
mlh_ has joined #ocaml
ejt has joined #ocaml
Submarine has quit ["Leaving"]
Msandin has quit [Read error: 104 (Connection reset by peer)]
_shawn has joined #ocaml
cjohnson has joined #ocaml
shawn_ has quit [Read error: 110 (Connection timed out)]
<vincenz>
How would I compare two lists lexicographically?
<ejt>
where the elements of the list have a compare function ?
<mellum>
well, < is defined for everything in Ocaml
<mellum>
So you would use "l1 < l2"
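A small illustration of mellum's point: OCaml's structural comparison orders lists lexicographically, comparing heads first and sorting a proper prefix before the longer list. This assumes the elements are plain data (such as the ints behind `Id.t`); `compare` raises `Invalid_argument` on functional values.

```ocaml
(* Structural (polymorphic) comparison is lexicographic on lists *)
let lex_compare (l1 : 'a list) (l2 : 'a list) : int = compare l1 l2
```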
<ejt>
sure, I wasn't sure that's what vincenz meant
<vincenz>
ejt: yes
<vincenz>
they have type Id.t although under the hood they're just ints
<ejt>
vincenz: give me five minutes to code up an example
<mellum>
let rec list_cmp a b = match a, b with [], _ -> -1 | _, [] -> 1 | a::aa, b::bb -> (match compare a b with 0 -> list_cmp aa bb | x -> x)
<ejt>
vincenz: y
<vincenz>
ejt yours will never give 0
* vincenz
mutters
<mellum>
oh, right. one more case
<vincenz>
this sucks
<ejt>
ah, partition to remove the common prefix, then compare the first elem of the tails ?
mflux has joined #ocaml
<vincenz>
got it
<vincenz>
[] []
<vincenz>
[], _
<vincenz>
_, []
<vincenz>
a::aa; b::bb
<vincenz>
with an internal match compare
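The four cases listed above, written out as a sketch. The element comparison is passed in as a parameter `cmp` (a hypothetical name), so an `Id.compare`, if the `Id` module provides one, could be used instead of the polymorphic `compare`.

```ocaml
(* Lexicographic comparison of two lists, given an element comparison cmp *)
let rec list_cmp cmp a b =
  match a, b with
  | [], [] -> 0                   (* same length, all elements equal *)
  | [], _ -> -1                   (* a is a proper prefix of b *)
  | _, [] -> 1                    (* b is a proper prefix of a *)
  | x :: xs, y :: ys ->
    (match cmp x y with
     | 0 -> list_cmp cmp xs ys    (* heads equal: compare the tails *)
     | c -> c)                    (* first difference decides *)
```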
Submarine has joined #ocaml
cjohnson has quit [Read error: 110 (Connection timed out)]
Submarine has quit [zelazny.freenode.net irc.freenode.net]
Submarine has joined #ocaml
gim has joined #ocaml
<vincenz>
hmm
<vincenz>
my ocaml application is so slow :(
<ejt>
what does it do ?
gim has quit [zelazny.freenode.net irc.freenode.net]
Submarine has quit [zelazny.freenode.net irc.freenode.net]
<vincenz>
analyse a log file
<vincenz>
a WHOLE bunch of different maps
<vincenz>
that get accessed for every packet
nlv11757__ has left #ocaml []
<vincenz>
and I have about 90M packets
<ejt>
!
<vincenz>
or 70M
<ejt>
can you unify the maps into a single map ?
<vincenz>
no
<vincenz>
they contain different information
<ejt>
so define a new type
<vincenz>
?
gim has joined #ocaml
<ejt>
type map_result = Result1 of blah * blah | Result2 of blah * blah
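One possible reading of ejt's suggestion, as a sketch: instead of one map per statistic, a single map keyed by a tagged key. The constructors `By_size`, `By_var`, `By_scope` and the integer counts are hypothetical, and vincenz's objection to this approach follows.

```ocaml
(* Hypothetical tagged key: which statistic, plus the value it is keyed on *)
type key =
  | By_size of int    (* accesses per block size *)
  | By_var of int     (* accesses per variable id *)
  | By_scope of int   (* accesses per scope *)

module KeyMap = Map.Make (struct
  type t = key
  (* structural compare: constructor first, then payload *)
  let compare = compare
end)

(* One find-and-store on the single combined map *)
let bump k m =
  let n = try KeyMap.find k m with Not_found -> 0 in
  KeyMap.add k (n + 1) m
```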
<vincenz>
euh
<vincenz>
the maps are to be tables
<vincenz>
plus I would slow it down
<vincenz>
think about it
<vincenz>
I'm accessing one bigger structure
<ejt>
the map lookup is O(ln n) ?
<vincenz>
yes
<mflux>
maybe you could try hashes instead of trees?
<vincenz>
and
<ejt>
so accessing 1 big structure will be faster than lots of little ones
<vincenz>
no
<vincenz>
you're not thinking
<vincenz>
a * ln na + b * ln nb < (a+b) * ln (na+nb)
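vincenz's inequality follows term by term because ln is increasing. Assuming a lookup in a balanced map of n entries costs about ln n (constant factors and log base dropped), with a lookups into a map of n_a entries and b lookups into one of n_b entries:

```latex
\begin{aligned}
a \ln n_a + b \ln n_b
  &\le a \ln(n_a + n_b) + b \ln(n_a + n_b) \\
  &= (a + b)\,\ln(n_a + n_b)
\end{aligned}
```

The inequality is strict whenever a > 0 and n_b \ge 1 (or b > 0 and n_a \ge 1), i.e. merging two non-empty maps makes each lookup slightly more expensive.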
<mflux>
it would perhaps help if we saw the source ;)
<vincenz>
it's quite extensive
<mflux>
well what kind of log are you parsing?
<vincenz>
it's a binary logfile
<vincenz>
custom format
<vincenz>
all memory accesses and allocations
<mflux>
and by which criteria are you categorizing it?
<vincenz>
lots of different things
<vincenz>
accesses/block size
<vincenz>
access/varid
<vincenz>
accesses/scope
<vincenz>
lifetime of blocks
<vincenz>
etc
<mflux>
how much data you have in the maps after a run?
<vincenz>
no idea to be honest
<vincenz>
not TOO much
<vincenz>
it's mostly
<vincenz>
(find...Add one...store)
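A minimal sketch of the "find, add one, store" pattern vincenz describes, on the standard persistent `Map`, assuming int keys; `incr_count` is a hypothetical name.

```ocaml
module IntMap = Map.Make (struct
  type t = int
  let compare = compare
end)

(* find ... add one ... store: an O(log n) find followed by an O(log n) add,
   which returns a new map sharing most of its structure with the old one *)
let incr_count key m =
  let n = try IntMap.find key m with Not_found -> 0 in
  IntMap.add key (n + 1) m
```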
<mflux>
I was thinking cache-locality and if multi-pass approach could in that case be more efficient
<mflux>
have you profiled for a bottle-neck?
<vincenz>
the temporal aspect is crucial
<vincenz>
I have
<vincenz>
I removed some untyped compares but that is about it
<vincenz>
I even rewrote it once to do away with objects
<vincenz>
cause I'm calling about 200 million methods
<vincenz>
but it didn't matter much
<mflux>
how about using a hash instead of binary trees, which I believe the default Map is?
<vincenz>
you believe this could improve performance?
<mflux>
yes
<ejt>
the map is functional unlike the hash I think
<mflux>
true
<vincenz>
indeed it is
<mflux>
but there are functional hash-implementations around too
<vincenz>
and hash is more performant?
<mflux>
could be
<mflux>
depending on the number of elements
<mflux>
after all, in most cases accessing a hash is O(1) :)
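For comparison, the same counter on the standard imperative `Hashtbl`, as a sketch assuming int keys and counts: the table is updated in place rather than a new map being returned, and a lookup hashes the key instead of walking a tree.

```ocaml
(* Same find / add one / store pattern, updated in place *)
let incr_count (tbl : (int, int) Hashtbl.t) key =
  let n = try Hashtbl.find tbl key with Not_found -> 0 in
  (* replace overwrites the binding instead of stacking a new one *)
  Hashtbl.replace tbl key (n + 1)
```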
<vincenz>
it's about 4 seconds for 1M packets
<vincenz>
and I got about 70M
<mflux>
so it mostly depends on the complexity of the hashing function
<vincenz>
aha
<vincenz>
yes but you're calculating the hash each time
<Herrchen>
theoretically hashes aren't faster than O(log n)
<mflux>
but it's a simple operation which doesn't require memory operations
<Herrchen>
but practically with small maps they can improve performance
<mflux>
herrchen, if the statistical properties of the data are known beforehand, hashes can be O(1)?
<vincenz>
Maybe I should do away with some of the simpler maps and only keep the complex ones and generate the simple ones at the end based on the complex ones
<Herrchen>
mflux: yes - then - if you know much more than just how to compare two keys or calculate a hash value, you can speed up things of course
<mflux>
so the real world is based on approximations of that
<mflux>
of course, in worst case hashes can be O(n), but that needs to be considered only if it can be an attack vector
<Herrchen>
if you are searching for a functional hash-map, maybe look at how Icon does it
<Herrchen>
mflux: you can build hashes with worst case O(log n)
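A sketch of what Herrchen describes: a hash table whose buckets are balanced trees, so that even if every key collides, a lookup costs O(log n) comparisons rather than O(n). This assumes int keys and a fixed number of buckets (no resizing); `table`, `create`, `add`, `find` are hypothetical names.

```ocaml
module IntMap = Map.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)

(* Each bucket is a balanced tree rather than a list *)
type 'v table = 'v IntMap.t array

let create n : 'v table = Array.make n IntMap.empty

(* Hashtbl.hash returns a non-negative int, so the index is in range *)
let index (t : 'v table) k = Hashtbl.hash k mod Array.length t

let add (t : 'v table) k v =
  let i = index t k in
  t.(i) <- IntMap.add k v t.(i)

(* Raises Not_found if k is absent *)
let find (t : 'v table) k = IntMap.find k t.(index t k)
```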
<mflux>
herrchen, well sure, but if you're building hashes, it's not often quite as efficient to make the buckets out of trees..
<Herrchen>
the Icon implementation that is ported to Erlang is quite zippy (iirc its roots were in Icon)
<mflux>
I suppose in most of the cases it wouldn't hurt either, now that I think of it ;)
<mflux>
only when there are collisions
<vincenz>
I do not think the bottleneck is localized
<mflux>
but sometimes if you expect there are only a few collisions, you could still be more efficient, overall, by using linked lists or arrays and performing scans on them
<mflux>
s/ or / of /
<Herrchen>
mflux: of course, if you have only two items you can throw hashes away, too :)
<mflux>
sometimes at work we discussed it would be nice if you could just use A_Data_Structure in the code, perform some runs with test data with profiler, and then recompile the binary with Optimal_Data_Structure decided by the profiler ;)
<Herrchen>
sometimes the optimal data structure isn't implemented ... :p
<Herrchen>
vincenz: are your keys a static set?
<vincenz>
?
<Herrchen>
if so, maybe look at a good paper/textbook describing perfect hashing; this may improve performance
<vincenz>
they are not known at compiletime no
<Herrchen>
Cormen, Leiserson, Rivest, Stein: "Introduction to Algorithms" describes a technique using universal hashing - this should be possible to implement without having to know all keys at compile time
<Herrchen>
but then the keys should be static -- maybe if keys do not change often (so we have a nearly static set of keys), you can gain performance improvements
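The building block of the technique Herrchen mentions is a universal hash family; the sketch below shows only that family (the two-level perfect-hashing scheme built on it is not shown). It assumes a 64-bit OCaml, where `a * k + b` stays below `p * p` and fits in a 63-bit int, and keys in `[0, p)`; `make_hash` is a hypothetical name.

```ocaml
(* Universal hash family h_{a,b}(k) = ((a * k + b) mod p) mod m,
   with p a prime larger than every key, a in [1, p-1], b in [0, p-1]. *)
let p = 1_000_000_007  (* prime; also below 2^30, so Random.int accepts it *)

let make_hash m =
  let a = 1 + Random.int (p - 1) in
  let b = Random.int p in
  fun k ->
    (* maps negative ints into [0, p); keys >= p would fold and lose
       the universality guarantee, so they are assumed not to occur *)
    let k = ((k mod p) + p) mod p in
    ((a * k + b) mod p) mod m
```

Drawing a fresh `(a, b)` pair gives a different member of the family, which is what the perfect-hashing construction relies on when a bucket has too many collisions.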
<Herrchen>
(hope this hasn't been asked before) have you profiled your current implementation?
<vincenz>
I have
<vincenz>
it's pretty spread out
<vincenz>
I don't like doing it too often as it takes LONG
<vincenz>
mostly compare functions
<vincenz>
cause they're called so often due to the hashmaps
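Since the profile is dominated by compare functions, one common thing to try is giving the map a monomorphic key comparison. This sketch assumes the keys are plain ints (the log says `Id.t` is an int under the hood). With the type annotation, the native compiler typically emits a direct integer comparison; the unannotated version goes through the generic structural comparison in the runtime. Whether this matters for this program would need measuring.

```ocaml
(* Generic: compare is polymorphic, so each call uses the runtime's
   structural comparison *)
module GenericMap = Map.Make (struct
  type t = int
  let compare = compare
end)

(* Specialised: the int annotation lets the compiler compare directly *)
module IntMap = Map.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)
```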
<vincenz>
though
<vincenz>
some of the maps can be derived from the others
<vincenz>
so I might remove some
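A sketch of deriving a simpler map from a more detailed one after the run, instead of updating both per packet. The key choice (variable id paired with scope id) and the name `per_var` are hypothetical; this only works when the detailed map keeps enough information to recompute the simpler one.

```ocaml
(* Detailed key: (variable id, scope id) *)
module PairMap = Map.Make (struct
  type t = int * int
  let compare (a1, b1) (a2, b2) =
    match compare (a1 : int) a2 with 0 -> compare (b1 : int) b2 | c -> c
end)

module IntMap = Map.Make (struct
  type t = int
  let compare (a : int) (b : int) = compare a b
end)

(* Collapse per-(variable, scope) counts into per-variable totals, once *)
let per_var (detailed : int PairMap.t) : int IntMap.t =
  PairMap.fold
    (fun (var, _scope) n acc ->
       let prev = try IntMap.find var acc with Not_found -> 0 in
       IntMap.add var (prev + n) acc)
    detailed IntMap.empty
```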
<Herrchen>
buffering comparisons may be an option; is there some way to do so? Or really use a hash-map instead of a tree, which might reduce the number of comparisons
gim has quit [Read error: 110 (Connection timed out)]
gim has joined #ocaml
bzzbzz has joined #ocaml
CosmicRay has joined #ocaml
<vincenz>
hmm
<vincenz>
not sure
mrvn_ has joined #ocaml
vezenchio has joined #ocaml
mrvn has quit [Read error: 110 (Connection timed out)]
_fab has quit [Remote closed the connection]
vezenchio has quit ["I live in a yurt on the steppes of Sheepfuckistan. That's why."]
vezenchio has joined #ocaml
<vincenz>
10GB logfile and counting
<vincenz>
going onto 11
* vincenz
hopes it ends soon
Herrchen has quit ["bye"]
ejt has quit ["leaving"]
Ag_47 has joined #ocaml
<Ag_47>
hi all
Gueben has joined #ocaml
Ag_47 has left #ocaml []
svenl_ has joined #ocaml
svenl has quit [Nick collision from services.]
svenl_ is now known as svenl
pango_ has joined #ocaml
pango_ has quit [Client Quit]
pango_ has joined #ocaml
pango has quit [Read error: 60 (Operation timed out)]
Msandin has joined #ocaml
zzorn_away is now known as zzorn
cognominal has quit [Read error: 54 (Connection reset by peer)]
cognominal has joined #ocaml
bzzbzz has quit ["leaving"]
<mflux>
vincenz, simplest optimization: buy a faster computer!
<vincenz>
DAMN
<vincenz>
one logfile 23 mb
<vincenz>
GB
<vincenz>
I count packets by the million
<vincenz>
1073000000
<vincenz>
-1073000000
<vincenz>
-1072000000
<vincenz>
it overflowed my counter!
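On a 32-bit platform OCaml's `int` has 31 bits (`max_int` = 1073741823), which is why the counter wrapped just past 1.07 billion. A minimal sketch of a counter that does not depend on the word size, using `Int64`; `packets` and `on_packet` are hypothetical names.

```ocaml
(* Int64 arithmetic is 64-bit regardless of the platform word size *)
let packets = ref 0L

let on_packet () =
  packets := Int64.add !packets 1L;
  (* progress report every million packets *)
  if Int64.rem !packets 1_000_000L = 0L then
    Printf.printf "%Ld packets\n%!" !packets
```

`Int64` values are boxed, so this is somewhat slower than plain `int`; on a 64-bit OCaml, `int` has 63 bits and would not overflow here, which is mellum's point.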
<mellum>
Buy a real computer. 32-bit machines are so 90s.
Msandin has quit [Read error: 104 (Connection reset by peer)]
<vincenz>
lol
<mflux>
vincenz, so how's the speed?
bzzbzz has joined #ocaml
_JusSx_ has quit [Read error: 110 (Connection timed out)]
monochrom has joined #ocaml
Zaius has joined #ocaml
Snark has joined #ocaml
TheDracle has quit ["Leaving"]
Snark has quit ["Leaving"]
Submarine has joined #ocaml
Skal has quit ["Client exiting"]
CosmicRay has quit ["Client exiting"]
zzorn has quit ["They are coming to take me away, ha ha"]