<m_mans>
yes, of course it fits a very specific case - import, for example
<Regenaxer>
I think this is not the problem
<Regenaxer>
The bottleneck is disk cache
<m_mans>
you could just test it in asm or in pil32 to see the difference. I'm not so fast in C programming
<Regenaxer>
if the accumulated areas being accessed are bigger than the disk caches, it begins to thrash
<Regenaxer>
I did many tests
<Regenaxer>
it is almost only disk caches
<Regenaxer>
if just one index file is bigger than RAM
<Regenaxer>
and accessed in *random* order, it can't be cached by the OS
<Regenaxer>
and gets *very* slow
<Regenaxer>
So what 'create' does is pre-sorting all data
<Regenaxer>
Then it sweeps linearly, not random
<Regenaxer>
so disk caches work perfectly
<Regenaxer>
That's the true bottleneck
<Regenaxer>
The simple example in the ref of 'create'
<Regenaxer>
10 million objects take 50 min here
<Regenaxer>
The naive, random way will not finish in a week I think
<Regenaxer>
(not tried)
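[Editor's sketch] The effect described above can be illustrated with a toy model. This is a minimal sketch, assuming a simple LRU page cache as a stand-in for the OS disk cache; it is not how 'create' is implemented, and every name and size in it (PAGE_SIZE, CACHE_PAGES, NUM_KEYS, NUM_INSERTS, count_misses) is hypothetical. It counts cache misses when the same keys touch an index much larger than the cache, once in random order and once pre-sorted.

```python
import random
from collections import OrderedDict

# All sizes are hypothetical, chosen only so that the index is much
# larger than the cache (the situation discussed in the chat).
PAGE_SIZE = 64         # keys stored per index page
CACHE_PAGES = 100      # pages the simulated cache can hold
NUM_KEYS = 200_000     # key space of the simulated index file
NUM_INSERTS = 20_000   # number of accesses to simulate


def count_misses(keys, cache_pages=CACHE_PAGES, page_size=PAGE_SIZE):
    """Count LRU cache misses when pages are touched in the order of `keys`."""
    cache = OrderedDict()  # page number -> True, oldest entry first
    misses = 0
    for key in keys:
        page = key // page_size
        if page in cache:
            cache.move_to_end(page)        # mark page as most recently used
        else:
            misses += 1                    # page must be fetched from "disk"
            cache[page] = True
            if len(cache) > cache_pages:
                cache.popitem(last=False)  # evict least recently used page
    return misses


rng = random.Random(42)
keys = [rng.randrange(NUM_KEYS) for _ in range(NUM_INSERTS)]

random_misses = count_misses(keys)          # naive: access in arrival order
sorted_misses = count_misses(sorted(keys))  # pre-sorted: linear sweep

print("misses, random order:", random_misses)
print("misses, sorted order:", sorted_misses)
```

In the sorted run every page is fetched at most once, so the miss count equals the number of distinct pages touched. In the random run most accesses land on a page that has already been evicted. The model ignores write-back, read-ahead and real disk latency, so it only shows the direction of the effect, not its magnitude.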
<Regenaxer>
Why do Schemers worry so much about avoiding garbage collection?
<Regenaxer>
I think the opposite
<Regenaxer>
Small memory is better, with fast GC
<m_mans>
did you try to just write a big amount of data with one write call? Is it slow too?
<Regenaxer>
In pil it just takes milliseconds, and is needed anyway to also clean up non-referred and non-dirty DB objects and trees
<Regenaxer>
I think it makes no difference
<Regenaxer>
one big or many small, if the bottleneck is the disk cache
<Regenaxer>
really, try it
<Regenaxer>
it is really dramatic
<razzy>
imho GC requires operations. schemers have the idea that their system is used nonstop and every operation is very valuable.
<Regenaxer>
Why?
<Regenaxer>
GC takes only a very small fraction of the time
<m_mans>
ok, T, I could try it by myself
<m_mans>
must go, bb all
<Regenaxer>
ok, see you!
m_mans has left #picolisp [#picolisp]
<Regenaxer>
afp
razzy has quit [Ping timeout: 250 seconds]
razzy has joined #picolisp
<tankf33der>
i can try 'create' on the fastest intel xeon cpu in february
<tankf33der>
intel xeon gold 6144
<razzy>
maybe you could function without GC. or wait for downtime
<Regenaxer>
tankf33der, great!
<Regenaxer>
razzy, why worry? GC runs several times per second in pil, that's the best I think
orivej has quit [Ping timeout: 250 seconds]
<razzy>
Regenaxer: if you run as a process in another OS, you are probably better off with GC running often and having a small footprint.
<Regenaxer>
Other OS?
<Regenaxer>
Ah, you mean non-PilOS?
<Regenaxer>
Well, a small memory footprint is always better
<Regenaxer>
CPU-cache etc.
<razzy>
if you run as the OS, if you have the whole memory for yourself, it is better to use most of what you have :]
<razzy>
also you need to have all algorithms adjusted to that :]
<razzy>
general advice would be: smaller footprint -> better.
<Regenaxer>
T
<razzy>
Regenaxer: "all algorithms adjusted" is a really big optimization problem :]
<Regenaxer>
I have no clue what you mean
<razzy>
long story, little payoff
orivej has joined #picolisp
orivej has quit [Ping timeout: 268 seconds]
abel-normand has quit [Ping timeout: 250 seconds]
m_mans has joined #picolisp
<Regenaxer>
m_mans, concerning our discussion before: I think all the system calls together take up only a few seconds in a day-long import. All the time is spent inside the Linux kernel juggling the disk buffers
<Regenaxer>
So it will not help at all to optimize simple writes
<Regenaxer>
It is the fact that many places in a huge file are accessed in a random order
<razzy>
pilOS is the answer?
<Regenaxer>
There is no question
m_mans has quit [Quit: Leaving.]
orivej has joined #picolisp
freemint has joined #picolisp
aw- has quit [Quit: Leaving.]
<freemint>
Regenaxer: How "large/redundant" is a PicoLisp DB when compared to the original file/another database file with the same data?
<Regenaxer>
This depends on the number of indexes and joints you put into the model