#ocaml on 2011-02-11 — irc logs at freenode.irclog.whitequark.org

2010-08-02 14:28 gildor changed the topic of #ocaml to: Discussions about the OCaml programming language | http://caml.inria.fr/ | OCaml 3.12.0 http://bit.ly/aNZBUp

00:15 mnabil has quit [Ping timeout: 255 seconds]

00:25 alexyk has joined #ocaml

00:26 onigiri_ has joined #ocaml

00:26 onigiri_ has left #ocaml []

00:27 mnabil has joined #ocaml

00:29 alexyk has quit [Read error: Connection reset by peer]

00:35 agarwal1975 has quit [Quit: agarwal1975]

00:37 mfp has quit [Ping timeout: 264 seconds]

00:37 mfp has joined #ocaml

00:40 alexyk has joined #ocaml

00:42 onigiri_ has joined #ocaml

00:43 onigiri_ has left #ocaml []

00:43 boscop has quit [Ping timeout: 240 seconds]

00:48 sepp2k has quit [Quit: Leaving.]

01:09 alexyk has quit [Read error: Connection reset by peer]

01:09 arubin has joined #ocaml

01:19 alexyk has joined #ocaml

01:27 tony_ has left #ocaml []

01:29 mnabil has quit [Ping timeout: 276 seconds]

01:30 alexyk has quit [Quit: alexyk]

01:40 mnabil has joined #ocaml

01:50 iris1 has quit [Quit: iris1]

02:01 lopex has quit []

02:14 eaburns has left #ocaml []

02:16 CoryDambach has joined #ocaml

02:23 cthuluh has quit [Ping timeout: 260 seconds]

02:24 cthuluh has joined #ocaml

02:34 philtor has joined #ocaml

02:40 mnabil has quit [Ping timeout: 276 seconds]

02:49 kaustuv has left #ocaml []

02:50 mnabil has joined #ocaml

02:57 <philtor> I've got OCaml 3.11 that comes with Ubuntu 10.10. And I've also got OCaml 3.12 installed via godi.

02:57 <philtor> So I've set up the path to point to the 3.12 one from godi.

02:58 <philtor> which ocaml -> /home/phil/godi-3.12/bin/ocaml

02:58 <philtor> However... when I try to build type-conv and install it...

02:58 <philtor> It tries to install in /usr/local/ocaml/lib

02:58 <philtor> What is that?

02:58 <philtor> s/What/Why/

03:02 bzzbzz has quit [Read error: Connection reset by peer]

03:02 bzzbzz has joined #ocaml

03:08 Associat0r has joined #ocaml

03:08 shachaf has quit [*.net *.split]

03:08 BiDOrD has quit [*.net *.split]

03:08 jlenormand has quit [*.net *.split]

03:08 kig has quit [*.net *.split]

03:08 diml has quit [*.net *.split]

03:08 hcarty has quit [*.net *.split]

03:08 chicco has quit [*.net *.split]

03:11 <thelema> philtor: that's where findlib is configured to install?

03:12 <philtor> How does one change that configuration?

03:12 <philtor> What I found was that if I run this from the commandline it works: ocamlfind install type-conv META pa_type_conv.cmi pa_type_conv.cmo

03:13 <philtor> That's what the makefile is generating - but the ocamlfind being run in the makefile is the old ocamlfind.

03:14 <philtor> From the OCamlMakefile included:

03:14 <philtor> ifndef OCAMLFIND

03:14 <philtor> OCAMLFIND := ocamlfind

03:14 <philtor> endif

03:15 <philtor> that seems to be the 3.11 ocamlfind.

03:15 <philtor> but only inside of the makefile...

03:16 <philtor> From the commandline: which ocamlfind

03:16 <philtor> /home/phil/godi-3.12/bin/ocamlfind

03:16 <philtor> (which is the correct one)

03:17 Amorphous has quit [Ping timeout: 272 seconds]

03:19 <thelema> philtor: ocamlfind uses some environment variable to decide where to install

03:19 <thelema> iirc

03:20 <thelema> and there's an ocamlfind.conf file somewhere

03:20 <thelema> OCAMLPATH is the environment variable

03:23 <philtor> Hmm... I don't have an OCAMLPATH env variable set.

03:24 <thelema> maybe it defaults to ocamlc -where

03:25 <philtor> That points to the 3.12 installation in my case (the correct one)

03:27 arubin has quit [Quit: arubin]

03:28 <philtor> Not finding an ocamlfind.conf anywhere.

03:32 Amorphous has joined #ocaml

03:39 shachaf has joined #ocaml

03:39 BiDOrD has joined #ocaml

03:39 jlenormand has joined #ocaml

03:39 kig has joined #ocaml

03:39 diml has joined #ocaml

03:39 hcarty has joined #ocaml

03:39 chicco has joined #ocaml

03:50 Associat0r has quit [Quit: Associat0r]

04:27 philtor has quit [Remote host closed the connection]

04:38 philtor has joined #ocaml

04:44 <philtor> Ah, there is an ocamlfind.conf in /etc

04:54 <philtor> What's the difference between ocamlfind.conf and findlib.conf? Are they the same thing?

05:22 ulfdoz has joined #ocaml

05:47 Yoric has joined #ocaml

05:54 avsm has quit [Ping timeout: 240 seconds]

06:02 avsm has joined #ocaml

06:13 eye-scuzzy has quit [Quit: leaving]

06:15 eye-scuzzy has joined #ocaml

06:28 philtor has quit [Ping timeout: 265 seconds]

06:36 ulfdoz has quit [Ping timeout: 245 seconds]

06:42 Snark has joined #ocaml

06:45 Yoric has quit [Quit: Yoric]

07:14 edwin has joined #ocaml

07:25 ttamttam has joined #ocaml

08:02 ftrvxmtrx has quit [Quit: Leaving]

08:09 avsm has quit [Quit: Leaving.]

08:09 avsm has joined #ocaml

08:20 Yoric has joined #ocaml

08:33 ikaros has joined #ocaml

08:59 avsm has quit [Quit: Leaving.]

09:03 ftrvxmtrx has joined #ocaml

09:05 ftrvxmtrx has quit [Remote host closed the connection]

09:06 ftrvxmtrx has joined #ocaml

09:07 munga has joined #ocaml

09:18 mnabil has quit [Ping timeout: 276 seconds]

09:22 mnabil has joined #ocaml

09:54 _andre has joined #ocaml

10:19 eaburns has joined #ocaml

10:21 munga has quit [Ping timeout: 240 seconds]

11:09 mnabil has quit [Ping timeout: 241 seconds]

11:15 lopex has joined #ocaml

11:22 mnabil has joined #ocaml

11:25 tony_ has joined #ocaml

11:53 ttamttam has quit [Remote host closed the connection]

12:04 boscop has joined #ocaml

12:19 tony_ has quit [Ping timeout: 246 seconds]

12:28 oriba has joined #ocaml

12:28 eye-scuzzy has quit [Quit: leaving]

12:29 eye-scuzzy has joined #ocaml

12:41 sepp2k has joined #ocaml

13:15 ftrvxmtrx has quit [Read error: Connection reset by peer]

13:15 ftrvxmtrx has joined #ocaml

13:20 mnabil has quit [Ping timeout: 250 seconds]

13:22 mnabil has joined #ocaml

14:40 mnabil has quit [Ping timeout: 276 seconds]

14:49 <thelema> Is anyone involved in ongoing discussions on COCAN?

14:49 myu2 has quit [Remote host closed the connection]

14:50 <thelema> oops, year. Feb 2010 != yesterday

14:50 <thelema> COCAN seems to have quietly passed

14:53 eye-scuzzy has quit [*.net *.split]

14:53 shachaf has quit [*.net *.split]

14:53 BiDOrD has quit [*.net *.split]

14:53 jlenormand has quit [*.net *.split]

14:53 kig has quit [*.net *.split]

14:53 diml has quit [*.net *.split]

14:53 hcarty has quit [*.net *.split]

14:53 chicco has quit [*.net *.split]

14:53 mlh has quit [*.net *.split]

14:53 lopex has quit [*.net *.split]

14:53 f[x] has quit [*.net *.split]

14:53 Julien_T has quit [*.net *.split]

14:53 kerneis has quit [*.net *.split]

14:53 xl0 has quit [*.net *.split]

14:53 lamawithonel has quit [*.net *.split]

14:53 ski has quit [*.net *.split]

14:53 sepp2k has quit [*.net *.split]

14:53 Yoric has quit [*.net *.split]

14:53 mfp has quit [*.net *.split]

14:53 srcerer has quit [*.net *.split]

14:53 trigen has quit [*.net *.split]

14:53 avsm2 has quit [*.net *.split]

14:53 pantsd has quit [*.net *.split]

14:53 ikaros has quit [*.net *.split]

14:53 eaburns has quit [*.net *.split]

14:53 hto has quit [*.net *.split]

14:53 orbitz has quit [*.net *.split]

14:53 mrvn has quit [*.net *.split]

14:53 alpounet has quit [*.net *.split]

14:53 vk0 has quit [*.net *.split]

14:53 pheredhel has quit [*.net *.split]

14:53 deavid has quit [*.net *.split]

14:53 bitbckt has quit [*.net *.split]

14:53 _2x2l has quit [*.net *.split]

14:53 metasyntax` has quit [*.net *.split]

14:53 thelema has quit [*.net *.split]

14:53 hyperboreean has quit [*.net *.split]

14:53 Snark has quit [*.net *.split]

14:53 Amorphous has quit [*.net *.split]

14:53 cthuluh has quit [*.net *.split]

14:53 npouillard has quit [*.net *.split]

14:53 mcclurmc has quit [*.net *.split]

14:53 patronus_ has quit [*.net *.split]

14:53 svenl_ has quit [*.net *.split]

14:53 mehdid has quit [*.net *.split]

14:53 rossberg has quit [*.net *.split]

14:53 noj has quit [*.net *.split]

14:53 mnabil has joined #ocaml

14:54 deavid has joined #ocaml

14:55 joewilliams is now known as joewilliams_away

14:55 eaburns has joined #ocaml

14:56 alpounet has joined #ocaml

14:56 orbitz has joined #ocaml

14:56 hto has joined #ocaml

14:58 orbitz has quit [Client Quit]

14:58 orbitz has joined #ocaml

14:59 joewilliams_away is now known as joewilliams

15:00 thelema_ has joined #ocaml

15:00 lopex has joined #ocaml

15:00 vk0_ has joined #ocaml

15:00 ikaros has joined #ocaml

15:00 mrvn has joined #ocaml

15:00 pheredhel has joined #ocaml

15:00 _2x2l has joined #ocaml

15:00 bitbckt has joined #ocaml

15:00 sepp2k has joined #ocaml

15:00 eye-scuzzy has joined #ocaml

15:00 Yoric has joined #ocaml

15:00 Snark has joined #ocaml

15:00 chicco has joined #ocaml

15:00 hcarty has joined #ocaml

15:00 diml has joined #ocaml

15:00 kig has joined #ocaml

15:00 jlenormand has joined #ocaml

15:00 BiDOrD has joined #ocaml

15:00 shachaf has joined #ocaml

15:00 Amorphous has joined #ocaml

15:00 cthuluh has joined #ocaml

15:00 mfp has joined #ocaml

15:00 srcerer has joined #ocaml

15:00 metasyntax` has joined #ocaml

15:00 thelema has joined #ocaml

15:00 lamawithonel has joined #ocaml

15:00 trigen has joined #ocaml

15:00 mlh has joined #ocaml

15:00 npouillard has joined #ocaml

15:00 hyperboreean has joined #ocaml

15:00 ski has joined #ocaml

15:00 f[x] has joined #ocaml

15:00 avsm2 has joined #ocaml

15:00 mcclurmc has joined #ocaml

15:00 pantsd has joined #ocaml

15:00 Julien_T has joined #ocaml

15:00 kerneis has joined #ocaml

15:00 xl0 has joined #ocaml

15:00 patronus_ has joined #ocaml

15:00 svenl_ has joined #ocaml

15:00 mehdid has joined #ocaml

15:00 rossberg has joined #ocaml

15:00 noj has joined #ocaml

15:01 eaburns is now known as Guest96760

15:01 BiDOrD has quit [Read error: Operation timed out]

15:01 BiDOrD has joined #ocaml

15:02 srcerer has quit [Ping timeout: 262 seconds]

15:03 BiDOrD_ has joined #ocaml

15:03 BiDOrD has quit [Read error: Operation timed out]

15:03 Snark has quit [*.net *.split]

15:03 Amorphous has quit [*.net *.split]

15:03 cthuluh has quit [*.net *.split]

15:03 npouillard has quit [*.net *.split]

15:03 mcclurmc has quit [*.net *.split]

15:03 patronus_ has quit [*.net *.split]

15:03 svenl_ has quit [*.net *.split]

15:03 mehdid has quit [*.net *.split]

15:03 rossberg has quit [*.net *.split]

15:03 noj has quit [*.net *.split]

15:03 hyperbor1ean has joined #ocaml

15:05 Snark has joined #ocaml

15:05 Amorphous has joined #ocaml

15:05 cthuluh has joined #ocaml

15:05 npouillard has joined #ocaml

15:05 mcclurmc has joined #ocaml

15:05 patronus_ has joined #ocaml

15:05 svenl_ has joined #ocaml

15:05 mehdid has joined #ocaml

15:05 rossberg has joined #ocaml

15:05 noj has joined #ocaml

15:06 Guest96760 has quit [Quit: leaving]

15:08 hyperboreean has quit [Write error: Connection reset by peer]

15:08 thelema has quit [Write error: Broken pipe]

15:08 ccasin has joined #ocaml

15:11 <gildor> thelema_: what's the problem with COCAN?

15:17 <thelema_> gildor: Firefox can't establish a connection to the server at www.cocan.org.

15:18 <gildor> thelema_: it has been dead for at least 2 months

15:18 <gildor> I have an old copy on mirror.ocamlcore.org

15:18 <gildor> and Ashish Agarwal is working on transferring part of its content

15:20 <thelema_> ok, thanks

15:20 thelema_ is now known as thelema

15:24 <thelema> wow, another extlib... I'd never heard of the CDK before

15:28 myu2 has joined #ocaml

15:33 <gildor> thelema: where did you hear about the CDK?

15:33 eye-scuzzy has quit [Quit: leaving]

15:33 <thelema> gildor: There was a link here: http://mirror.ocamlcore.org/wiki.cocan.org/humpopaedia.html

15:34 <gildor> the CDK has 3 release and lasted ~1 year

15:34 <gildor> +s

15:34 eye-scuzzy has joined #ocaml

15:34 <gildor> and 80% consist of copy of other project (like lablgtk or part of ocamlnet)

15:34 <gildor> +s

15:35 <thelema> yup, so it seems.

15:35 <gildor> there was probably 20% which was original code

15:35 <f[x]> it is more like batteries than extlib

15:35 <f[x]> pack of useful libraries

15:35 <gildor> and its main author (F. Le Fessant) has forked it many times in various applications

15:35 <thelema> f[x]: there's an extlib within CDK

15:36 <thelema> http://camlcvs.inria.fr/cgi-bin/cvsweb/cdk/extlib/

15:36 <f[x]> gildor, yeah, e.g. mldonkey

15:36 <gildor> mnplight also

15:36 <gildor> I think there is one fork per project

15:36 <gildor> with different extension

15:36 <f[x]> as usual with bundled libs

15:37 <thelema> Maybe I can absorb any good parts of that 20% into batteries...

15:37 <thelema> can't figure out this one: http://camlcvs.inria.fr/cgi-bin/cvsweb/cdk/extlib/todo.mli?rev=1.1

15:37 <gildor> beware the license

15:39 <thelema> gildor: not LGPL+exception? :(

15:39 <gildor> range from Public Domain to GPL

15:40 <gildor> (and for god sake, don't include bundled libraries that continue to live elsewhere)

15:40 <thelema> looks like the bits I'm interested in are GPL. maybe I can get a different License

15:41 <thelema> gildor: would you consider this code as living?

15:41 <gildor> e.g. camlimages, camlzip, configwin, cryptgps, facile, labgl, lablgtk

15:42 <thelema> of course, I won't repeat that mistake.

15:42 <gildor> thelema: if it is just the code, in extlib, just go on

15:44 <thelema> wow, specialized code for float arrays...

15:47 <thelema> It looks like I need to make a benchmark for reading files - I can't figure out why I keep seeing [filename -> string] code that uses a resizing buffer to store chunks of data

15:47 <thelema> plus, what's the optimal buffer size for ext3? Wouldn't it be 4K?
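The [filename -> string] pattern thelema describes, reading fixed-size chunks into a resizing buffer, can be sketched roughly as follows. This is only an illustration written for a current OCaml (4.08 or later, where `Bytes` is separate from `string`), not the code from any of the libraries mentioned; the name `read_file_chunked` and the 4096-byte default are assumptions made for the example.

```ocaml
(* Sketch: read a whole file into a string through a growing Buffer,
   [chunk] bytes at a time. Name and default chunk size are hypothetical. *)
let read_file_chunked ?(chunk = 4096) filename =
  let ic = open_in_bin filename in
  let buf = Buffer.create chunk in
  let tmp = Bytes.create chunk in
  let rec loop () =
    (* [input] returns 0 only at end of file *)
    let n = input ic tmp 0 chunk in
    if n > 0 then begin
      Buffer.add_subbytes buf tmp 0 n;
      loop ()
    end
  in
  (* Close the channel even if reading raises *)
  Fun.protect ~finally:(fun () -> close_in_noerr ic) loop;
  Buffer.contents buf
```

Every chunk is copied from the temporary bytes into the buffer, and the buffer itself is copied again whenever it grows, which is the blitting cost the benchmark is meant to measure.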

15:51 mnabil has quit [Read error: Operation timed out]

15:53 <gildor> I don't think buffer size is a big issue

15:53 <gildor> thelema: ^^^

15:53 <thelema> There's only one way to tell

15:53 <gildor> thelema: I have tried with various buffer size (direct file access) and didn't get a really improved throughput

15:53 <thelema> using Science!

15:54 <thelema> true, it's probably not a huge deal, as the CPU load is much smaller than the drive access time

15:54 <gildor> and 4k buffer can give you almost 130MB/s on a RAID array

15:54 <thelema> sure. but I keep seeing 1K buffers

15:55 <thelema> and bitstring uses 16K buffers

15:55 <gildor> give us your benchmark results when you are done

15:55 <thelema> of course

15:56 <thelema> once I finish my more urgent task for the day

16:01 <thelema> btw, wikipedia says that our new hotness PRNG has very poor behavior: http://en.wikipedia.org/wiki/Lagged_Fibonacci_generator

16:01 lopex has quit []

16:02 <gildor> thelema: the default PRNG is only there for convenience, cryptokit is here to provide better PRNG

16:02 <gildor> (system prng in fact)

16:02 <thelema> true

16:10 philtor has joined #ocaml

16:12 sepp2k1 has joined #ocaml

16:13 sepp2k has quit [Ping timeout: 255 seconds]

16:15 mnabil has joined #ocaml

16:21 ankit9 has joined #ocaml

16:24 trch has joined #ocaml

16:29 <thelema> gildor: except why upgrade it if it's not supposed to be a good PRNG?

16:31 <thelema> why break compatibility with older programs

16:31 <gildor> thelema: not sure to understand, what do you want to upgrade

16:31 <thelema> all users of random are forced to upgrade to the new PRNG

16:32 <gildor> ah, you mean that 3.12 has a new prng!

16:32 <thelema> yes

16:32 <gildor> (I already spotted that in fact)

16:32 <thelema> :)

16:33 <gildor> I have the code of the old PRNG sitting in a project (ocaml-fastrandom)

16:33 <gildor> maybe it is time to release it

16:33 <gildor> "fast"random because it is ~3 times faster than the old PRNG

16:46 jonafan has joined #ocaml

16:47 lopex has joined #ocaml

16:52 arubin has joined #ocaml

17:02 arubin has quit [Remote host closed the connection]

17:07 <thelema> http://pastebin.com/Th1qZDKt

17:08 <thelema> it seems that a 2k buffer is best, closely followed by a 4k buffer

17:09 <thelema> tested on a 183MB file

17:11 <thelema> a file that is almost certainly cached in linux's filesystem buffers by now

17:13 <mrvn> thelema: 4k should be optimal usually because that is the granularity file operations work at in the kernel. Must be some cache issue.

17:14 <thelema> that was my reasoning

17:14 <thelema> I'm surprised that the cost of blitting the data from a string to a buffer isn't more

17:15 Yoric has quit [Quit: Yoric]

17:15 eye-scuzzy has quit [Quit: leaving]

17:16 <mrvn> can you add bigarray mmap?

17:16 <thelema> just loop to copy from the mmap to a string?

17:17 <mrvn> thelema: that would skew the result.

17:17 <thelema> http://eigenclass.org/repos/widefinder/head/bigstring.ml

17:17 <thelema> just map_file?

17:17 <mrvn> I guess you would need to add some access to the read data to all tests for mmap to be fair.

17:17 <mrvn> Without access mmap wouldn't read anything.

17:18 <thelema> yup, it'd be almost a noop

17:18 <mrvn> But you could read into a normal bigarray. The Unix.read copies each chunk it reads.

17:18 <thelema> pastebin the function and I'll add it in

17:20 <mrvn> http://pastebin.com/ATTP1sHA

17:20 <thelema> :P

17:21 <mrvn> Non threaded, lacking the enter/leave_blocking_mode() around the pread.

17:21 <thelema> no threads in my test

17:21 <mrvn> I assumed

17:22 <thelema> #includes?

17:24 <thelema> well, it turns out that a lot of time *is* saved by checking the size of the file and reading exactly that much. what a surprise. http://pastebin.com/T6NG3ncn
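The size-first approach thelema mentions (check the file size, then read exactly that many bytes) might look like the sketch below on a current OCaml. The function name `read_file_exact` is made up for illustration, and the pastebin code itself is not reproduced here. It assumes a regular file; `in_channel_length` is not meaningful for pipes or sockets.

```ocaml
(* Sketch: allocate once and read the whole file in a single call.
   Assumes [filename] is a regular file whose size does not change
   while it is being read. *)
let read_file_exact filename =
  let ic = open_in_bin filename in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () ->
      let len = in_channel_length ic in
      (* really_input_string raises End_of_file if fewer than [len]
         bytes are available *)
      really_input_string ic len)
```

Compared with a chunked loop, this avoids growing a buffer and the final copy out of it, which is consistent with the time saving reported above.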

17:25 <thelema> mrvn: and what type is `buffer`?

17:26 joewilliams is now known as joewilliams_away

17:28 bitbckt_ has joined #ocaml

17:28 mnabil_ has joined #ocaml

17:28 eye-scuzzy has joined #ocaml

17:29 <mrvn> I should polish the bigarray io functions for inclusion in batteries

17:29 sepp2k1 has quit [Ping timeout: 255 seconds]

17:29 _2x2l has quit [Ping timeout: 255 seconds]

17:29 mnabil has quit [Ping timeout: 265 seconds]

17:29 bitbckt has quit [Ping timeout: 265 seconds]

17:29 _2x2l has joined #ocaml

17:29 <thelema> mrvn: sounds good to me

17:30 <mrvn> thelema: Bigarray. any kind should do. I use uint8 with c layout.

17:30 joewilliams_away is now known as joewilliams

17:30 <thelema> ok

17:31 <mrvn> http://pastebin.com/1U1vZyEJ are the includes but you don't need all of them.

17:32 <mrvn> _XOPEN_SOURCE and unistd.h for pread, caml/bigarray.h for bigarray stuff. mlvalues.h I think you need too

17:32 <thelema> it shouldn't hurt to over-include

17:33 <mrvn> yeah, just comment out lines till it gives an error.

17:33 trch has left #ocaml []

17:34 <mrvn> thelema: one thing that might also be interesting would be to create threads and do multiple reads in parallel.

17:34 <thelema> I think someone already did those benchmarks in C

17:34 <mrvn> for sure

17:34 <thelema> http://www.tbray.org/ongoing/When/200x/2007/09/20/Wide-Finder

17:39 sepp2k has joined #ocaml

17:45 <mrvn> thelema: somehow I find optimizing jobs that run under 10s pointless.

17:49 mnabil_ has quit [Remote host closed the connection]

17:54 ftrvxmtrx has quit [Quit: Leaving]

17:54 myu2 has quit [Remote host closed the connection]

17:55 ymasory has joined #ocaml

17:56 myu2 has joined #ocaml

17:58 <thelema> http://pastebin.com/AvP01xA0 <- mmap is too fast to measure, Unix fds are slower than Pervasives in_channels

17:58 <thelema> and buffer size does matter

17:58 arubin has joined #ocaml

17:59 _2x2l has quit [Ping timeout: 255 seconds]

17:59 joewilliams is now known as joewilliams_away

18:00 _2x2l has joined #ocaml

18:00 joewilliams_away is now known as joewilliams

18:00 ulfdoz has joined #ocaml

18:03 bitbckt_ is now known as bitbckt

18:04 bitbckt has quit [Changing host]

18:04 bitbckt has joined #ocaml

18:16 tony_ has joined #ocaml

18:22 ymasory has quit [Remote host closed the connection]

18:25 <mrvn> How can Unix.fd be slower? That is the most basic thing you can use.

18:26 <thelema> here's the whole testfile: http://pastebin.com/VtD2qEsy

18:26 jlenormand has quit [Quit: Leaving]

18:27 <thelema> compile with ocamlfind ocamlopt -linkpkg -thread -package batteries,threads,benchmark,bitstring t_read.ml -o t_read

18:42 _andre has quit [Quit: Zzz]

18:43 <mrvn> thelema: I'm not disputing your results. Just wondering how that can be.

18:49 eye-scuzzy has quit [Quit: leaving]

18:50 eye-scuzzy has joined #ocaml

18:50 Snark has quit [Quit: Ex-Chat]

18:52 <thelema> any jane-street core users here? Is there a helper function in that library to read a file into a string?

18:53 <gildor> thelema: this is in the module bigstring, i think (and it uses bigarray and mmap)

18:53 <bitbckt> Common.read_lines

18:53 <bitbckt> string -> string list, anyhow.

18:54 <thelema> bitbckt: almost. I'm tempted to add filename -> BatRope.t to see how much the buffer costs

18:54 <bitbckt> thelema: That's as close as it gets in Core, I think.

18:55 <gildor> thelema: what is vbu ?

18:55 <thelema> gildor: thanks

18:55 <gildor> thelema: Bigstring.map_file seems nice

18:55 <thelema> variable-sized buffer unix

18:56 <gildor> thelema: everything use channel ?

18:56 <thelema> gildor: doesn't actually read the file into memory - I guess I could String.iter each string to force it into memory

18:57 <thelema> gildor: vbu uses Unix.fd, vbp uses pervasives.in_channel

18:57 <thelema> gildor: batio uses batteries' channel + read_all

18:58 <gildor> 183MB ?

18:58 <thelema> yes, the size of the test file

18:58 <gildor> humm

18:58 <gildor> the test is short, that's what worries me

18:58 <thelema> because it only runs for a few seconds?

18:58 <gildor> can you send me your code

18:59 <thelema> I have larger files I can run it on

18:59 <thelema> 13:48 < thelema> here's the whole testfile: http://pastebin.com/VtD2qEsy

18:59 <gildor> you should use a file bigger than the memory

18:59 <gildor> i.e. ~2 to 4 GB

18:59 <thelema> eep! wouldn't that measure how fast my system can swap?

19:00 <brendan> it'd measure gc and disk throughput. Better to use something that fits in cache

19:00 <gildor> you keep the data in memory after the read ?

19:00 <thelema> I'm reading the whole file into memory, having linux put that memory back into the swap seems silly

19:00 <thelema> yes, I'm testing [filename -> string] functions

19:01 ikaros has quit [Read error: Connection timed out]

19:01 <thelema> mrvn: your pread completes almost as fast as mmap

19:01 <gildor> ah yes, so indeed you will stress Gc and swap

19:01 <gildor> my use case was more io -> process -> io

19:01 <gildor> with a few data in memory

19:01 ikaros has joined #ocaml

19:02 <thelema> I could easily move to that model, but I worry about doing benchmarks on that and sensitivity to disk activity

19:02 <brendan> if you want to test the ocaml-level overhead, it's probably better if the file is in the buffer cache. reading from disk will dominate everything else (though there may be differences in prefetching for read vs mmap)

19:02 <thelema> i.e. me being able to browse the web while the test runs

19:02 <thelema> (this is for my real test, for which I can't share as much code)

19:03 <gildor> my test procedure was: run 3 times, take the best time
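gildor's procedure (run several times, keep the best time) can be sketched with the standard library alone. The helper name `best_of` is hypothetical. Note the assumption baked in: `Sys.time` reports processor time, so time spent blocked on the disk is not counted; a wall-clock source such as `Unix.gettimeofday` would be needed to capture that.

```ocaml
(* Sketch: run [f] [n] times and return the smallest CPU time observed.
   Taking the minimum reduces noise from other activity on the machine. *)
let best_of n f =
  let best = ref infinity in
  for _i = 1 to n do
    let t0 = Sys.time () in
    ignore (f ());
    let dt = Sys.time () -. t0 in
    if dt < !best then best := dt
  done;
  !best
```

A call such as `best_of 3 (fun () -> String.length (String.make 1_000_000 'x'))` returns the fastest of three runs in seconds.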

19:03 <thelema> well, reading the same file a gazillion times I think guarantees the file is in the buffer cache

19:03 <gildor> and I was not able to browse the web while doing this

19:03 <gildor> thelema: no because I took file bigger than memory

19:03 <thelema> yup

19:03 <brendan> twice will do, but only if you don't have something else evict it in between :)

19:04 * gildor gtg

19:04 <thelema> well, I want to compare my network protocol parser with another

19:04 ymasory has joined #ocaml

19:04 <thelema> but I have huge pcap files, and thought that pre-reading them as a string would eliminate the disk access penalty

19:05 <thelema> I'd be happy to mmap, but bitstring requires a string.

19:05 ftrvxmtrx has joined #ocaml

19:05 <thelema> anyway, gotta go, back in 4 hours

19:05 <brendan> might be better to let the kernel prefetcher take care of the read pipeline

19:06 <brendan> you get lower latency to start and shouldn't be much slower overall

19:06 <thelema> I approve of the lower latency, except for the disk access problem

19:06 <brendan> well that's what the prefetcher is supposed to hide :)

19:07 <flux> it would be extra nice if bitstring supported streaming in

19:07 <flux> but, it would break the interface

19:07 <flux> thus such a system would perhaps better be called bitstring2

19:08 rup has joined #ocaml

19:09 Yoric has joined #ocaml

19:09 <mrvn> thelema: So all the delay is caused by ocaml copying buffers around and the GC cleaning up then.

19:10 <mrvn> thelema: mmap should be zero copy as the data is never accessed while pread copies it once from kernel to user space.

19:11 <mrvn> Unix.read copies twice and in_channel probably too.

19:11 ftrvxmtrx has quit [Ping timeout: 276 seconds]

19:11 <flux> hmm..

19:12 <flux> I wonder if mmapped ocaml strings would be feasible..

19:12 <flux> basically it would work like:

19:12 <flux> 1) mmap a region

19:12 <flux> 2) mmap zero before it

19:12 <flux> 3) stick the mmapped region size into it

19:12 <flux> 4) return as ocaml string!

19:12 <flux> but it might not work great with ocaml gc?

19:13 ygrek has joined #ocaml

19:14 ftrvxmtrx has joined #ocaml

19:14 <brendan> is that Bigstring.map_file function doing something like that?

19:17 Associat0r has joined #ocaml

19:18 <flux> I don't recall seeing that

19:19 <flux> perhaps it has arrived at a later version than what I've looked at/used, or it simply reads the file before processing it

19:21 <brendan> it looks like it to me, but I have never learned Obj.magic etc

19:21 <brendan> http://eigenclass.org/repos/widefinder/head/bigstring.ml

19:23 <flux> heh, it appears to do exactly that except in a more researched, ie. more working fashion :)

19:23 <flux> ah, biGstring

19:24 <flux> I read biTstring

19:24 <brendan> yeah, bigstring looks pretty nice

19:25 <mrvn> flux: that would have quite an overhead. A full page.

19:26 <mrvn> flux: and you would have to make it a custom block with finalizer and then pretend it actually is a string. Not sure the two structures are compatible.

19:26 <flux> mrvn, a full page for the integer? well, I would assume it would still be for one, say, pcap file, which would be potentially megabytes

19:26 <mrvn> flux: a page for the GC header before the mmaped data.

19:27 <mrvn> For strings the GC header contains the size. Are those bits free in a custom block with finalizer?

19:28 <mrvn> Never mind. they must be free as they are the size of the block.

19:29 <mrvn> But strings also have a length modifier at the end saying how much of the last word is string iirc. How would you get that after the mmaped data? Add another page and only allow 4k aligned string size?

19:30 <mrvn> Another snag would be that the toplevel wouldn't print the string. But I guess for strings >4k that is a good thing.

19:33 <flux> forgot about that

19:33 <flux> but you can map with private

19:33 <flux> and modify the block

19:33 <flux> which gets copied-of-write

19:34 <flux> s/of/on/

19:36 <mrvn> then the file needs to be 4 bytes longer than what you map or the behaviour is undefined. But you can mmap /dev/zero for the last 4 bytes and copy the last chunk into it plus the 4 bytes.

19:38 <mrvn> I think for those cases a char bigarray is probably better suited. You can create sub arrays from it when you parse the data into more useful chunks and such.

19:38 <flux> well, even bigarrays are unsuitable for bitstring as of now ;)

19:38 <flux> maybe.. a functorized version of bitstring!

19:39 <brendan> but does bigstring work with bitstring?

19:39 <flux> I seriously doubt that

19:39 <flux> but, I need to suspend my laptop, not much power left :)

19:39 <mrvn> what does string have that bigarray doesn't?

19:43 <brendan> bigstring? it looks like it provides a mmaped file as a string, but I am not good enough at ocaml to tell whether it would work with bitstring

19:43 <brendan> it's very short: http://eigenclass.org/repos/widefinder/head/bigstring.ml

19:47 <mrvn> no, I mean string itself. Why does string work for bitstring but not bigarray?

19:55 tony_ has quit [Quit: Ex-Chat]

20:01 arubin has quit [Ping timeout: 240 seconds]

20:11 smerz has joined #ocaml

20:26 arubin has joined #ocaml

20:27 ftrvxmtrx has quit [Ping timeout: 255 seconds]

20:31 ftrvxmtrx has joined #ocaml

20:56 tony_ has joined #ocaml

21:03 eye-scuzzy has quit [Quit: leaving]

21:04 eye-scuzzy has joined #ocaml

21:08 agarwal1975 has joined #ocaml

21:13 thieusoai has joined #ocaml

21:14 agarwal1975 has quit [Ping timeout: 276 seconds]

21:14 agarwal1975 has joined #ocaml

21:25 edwin has quit [Remote host closed the connection]

21:35 ccasin has quit [Quit: Leaving]

21:38 <thelema> mrvn: well, pread is super fast, if that's the case...

21:39 thieusoai has quit [Remote host closed the connection]

21:45 <thelema> mrvn: and the str_only code allocates the right size string and really_inputs into it - there can't be that much overhead in that copying that it would take 100* more time

21:47 <mrvn> thelema: If it is thread safe then it has to allocate a temporary buffer in the C stub, enter_blocking_section(), read into that, leave_blocking_section(), memcpy into the string.

21:47 <thelema> I'm reading the stub now..

21:47 <thelema> http://pastebin.com/RFi4chUN

21:47 <mrvn> The problem is that strings can be moved by the GC unlike the data section of a bigarray.

21:48 <thelema> it looks like it's dealing with buffered IO

21:48 <thelema> true, but I doubt that's causing the higher read time

21:49 <thelema> so I either need to patch bitstring to use bigarrays (or bigstrings) or write my own streaming Pcap parser... :(

21:49 <mrvn> What is the buffer size of a channel?

21:50 <thelema> 4096

21:52 <mrvn> Maybe your kernel is using hardware memcpy

21:53 <thelema> for pread?

21:54 <mrvn> for copy_to_user()

21:55 <thelema> that's an impressive difference, if so

21:55 <mrvn> You said the file is cached so all pread does is copy the data from cache to userspace.

21:56 <mrvn> does the wall clock go down or just the cpu time used?

21:56 <thelema> that's still 180MB of memcopy

21:56 <thelema> pread: 0.01 WALL ( 0.00 usr + 0.00 sys = 0.00 CPU)

21:56 <thelema> for 1.8G of pread

21:56 <mrvn> oehm, that is wrong then. too fast.

21:57 <mrvn> You allocated the bigarray of proper size?

21:57 <thelema> could be a COW

21:57 <thelema> let buf = Array1.create char c_layout len in

21:57 <thelema> let len = (Unix.stat fn).Unix.st_size in

21:58 <mrvn> It could be the bigarray is page aligned and the kernel just mmaps. But I didn't know it checks for that.

21:58 <mrvn> flush the caches and see if it goes down to disk speed.

21:58 <thelema> could it Copy On Write the pages in the cache?

21:59 <mrvn> or try it on a smaller file and check the contents

21:59 <mrvn> thelema: theoretically sure.

22:02 agarwal1975 has quit [Read error: Connection reset by peer]

22:03 agarwal1975 has joined #ocaml

22:07 <thelema> oops, pread was broken

22:07 ymasory has quit [Remote host closed the connection]

22:08 <thelema> running the test again...

22:08 <thelema> I forgot to pass the length to pread as well as for creating the buffer

22:08 <thelema> oops, that's not the length, that's the offset

22:09 <thelema> need more labels

22:15 <mrvn> My stub took the length from the bigarray

22:16 <thelema> yup, I was thinking it seemed weird to pass the length twice

22:16 <mrvn> so you read from the end and that always returned 0 bytes then.

22:17 <thelema> no, I didn't even pass the last parameter, and my code ignore'd the partially applied function

22:17 <thelema> hmm, I wonder if there's a way to get ocaml to warn when ignoreing a functions

22:17 <thelema> *function

22:17 <thelema> as opposed to ignoring a primitive value
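One common idiom for the problem thelema ran into is to annotate the type of the value being ignored, so that a forgotten argument becomes a type error instead of a silently discarded closure. The sketch below uses a made-up stand-in, `fake_pread`, rather than mrvn's actual stub. The OCaml manual also lists warning 5 for a partially applied function whose result is ignored; whether a given compiler version reports it for a particular form of `ignore` is not assumed here.

```ocaml
(* Hypothetical stand-in for a three-argument read stub
   (buffer, offset, length); it only counts how often it really runs. *)
let calls = ref 0

let fake_pread (_buf : Bytes.t) (_offset : int) (len : int) : int =
  incr calls;
  len

let () =
  let buf = Bytes.create 16 in
  (* Missing the last argument: this is just a closure, nothing runs.
     Writing [ignore (fake_pread buf 0)] would typecheck in the same way. *)
  let partial = fake_pread buf 0 in
  assert (!calls = 0);
  (* With the annotation, [ignore (fake_pread buf 0 : int)] is rejected
     by the type checker, because the expression has type [int -> int]. *)
  ignore (partial 16 : int);
  assert (!calls = 1)
```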

22:18 <thelema> anyway, now that I'm iterating through the returned string (or bigarray), ignoring each character, it's taking a bit longer

22:19 Yoric has quit [Quit: Yoric]

22:27 <thelema> grr, forgot the -1 on bigarray iteration, have to restart test

22:27 * thelema puts mmap first

22:28 <thelema> hmm, mmap doesn't seem quite so fast... maybe it's the iteration that's not as fast

22:28 <thelema> oh yeah, didn't specialize the ba type
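A sketch of what specializing the bigarray type can look like: when the element kind and layout are fixed in a type annotation, the native compiler is generally able to compile element accesses inline rather than through the generic access path. The type alias `bytes_ba` and the function `count_newlines` are names invented for this example, and the actual speedup is not measured here. The loop bound also shows the `- 1` mentioned just above.

```ocaml
open Bigarray

(* Concrete kind and layout, as mrvn suggested: unsigned 8-bit, C layout. *)
type bytes_ba = (char, int8_unsigned_elt, c_layout) Array1.t

(* Touch every element so the data is really read; the annotation on
   [ba] is what makes the access monomorphic. *)
let count_newlines (ba : bytes_ba) =
  let n = ref 0 in
  for i = 0 to Array1.dim ba - 1 do
    if ba.{i} = '\n' then incr n
  done;
  !n
```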

22:29 <mrvn> page faults are expensive

22:29 <thelema> well, it beats the time for my str_only function, but not by too much - 18.8s vs. 23.3s

22:30 <thelema> to read and examine 5.4GB

22:30 <thelema> and pread is the same time as really_input

22:31 <thelema> s/time/speed/

22:33 <thelema> hmm, I wonder if the .mli file can specialize codegen in its corresponding .ml

22:33 <thelema> I bet it doesn't, meaning we can squeeze out a bit more performance from mfp's widefinder

22:36 <thelema> http://pastebin.com/9ak2kUfy <- current rankings

22:36 * thelema is going to check this code into batteries VCS and be done for a while

22:40 <thelema> and done

22:56 BiDOrD has joined #ocaml

22:57 BiDOrD_ has quit [Read error: Connection reset by peer]

23:09 ygrek has quit [Ping timeout: 240 seconds]

23:11 <_habnabit> If I'm doing `match i mod 3 with ...`, ocaml complains that my match isn't exhaustive because it doesn't match the value 3.

23:11 <_habnabit> Is there a way to not emit the warning?

23:13 oriba has quit [Quit: Verlassend]

23:14 <brendan> _ -> failwith "impossible" ?

23:15 <_habnabit> I suppose!

23:16 <_habnabit> I was hoping for something else.

23:16 <adrien> for mod 3, if then else will probably be just as good

23:17 <_habnabit> I changed it from an if-then because it looked rather ugly. The match is easier to read.

23:18 ikaros has quit [Quit: Leave the magic to Houdini]

23:18 <mrvn> 0 -> ... | 1 -> ... | _ ->

23:18 <_habnabit> Yeah I guess. :(
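One detail worth keeping in mind for this exchange: the compiler cannot know the range of `i mod 3`, and in OCaml the result of `mod` takes the sign of the dividend, so the catch-all case is reachable for negative `i`. A small sketch, with the helper names `pos_mod` and `classify` chosen for illustration:

```ocaml
(* Non-negative remainder for n > 0; the built-in [mod] can return
   negative values, e.g. (-1) mod 3 = -1. *)
let pos_mod i n = ((i mod n) + n) mod n

let classify i =
  match pos_mod i 3 with
  | 0 -> "zero"
  | 1 -> "one"
  | 2 -> "two"
  | _ -> assert false (* unreachable: pos_mod i 3 is always in 0..2 *)
```

The final arm silences the exhaustiveness warning, and `assert false` documents that the case is believed impossible while still failing loudly if that belief is wrong.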

23:23 seafood has joined #ocaml

23:24 agarwal1975 has quit [Quit: agarwal1975]

23:37 arubin has quit []

23:48 seafood has quit [Ping timeout: 265 seconds]