aw- has quit [Quit: Leaving.]
<Regenaxer> Until now it always crashed in the context of binary read (plio), probably when reading external symbols
rob_w has joined #picolisp
<tankf33der> my plio tests passed gc+
<tankf33der> I ran the gc+ tests all night:
<tankf33der> pil21 @lib/test.l
<tankf33der> the minimal bundle of pil21-tests passed.
<Regenaxer> Good morning tankf33der
<tankf33der> morning all
<Regenaxer> Do your plio tests also handle external symbols?
<Regenaxer> I think it has to do with symbols like {A1}
<Regenaxer> stress.l sends very long lists of external symbols to other processes
<Regenaxer> the crashes occurred there
<Regenaxer> and the last crash was when reading an index node from the DB, which is also a nested list of only external symbols
<Regenaxer> I'm just inspecting the code handling this, but cannot see anything suspicious
<tankf33der> i don't think so
<Regenaxer> because that is not so easy to handle in unit tests
<tankf33der> feel free to send me more tests and I will run them
<Regenaxer> yeah, that's the problem
<Regenaxer> How about very simple ones first? Perhaps it shows something
<tankf33der> from the plio.l file?
<Regenaxer> (out "file" (pr (make (do 1000 (link *DB]
<Regenaxer> (in "file" ...
<Regenaxer> or a pipe
<Regenaxer> Who knows? Perhaps it helps :)
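A minimal sketch of the suggested round-trip, with the elided (in "file" ... part completed for illustration; it assumes a DB is open so that *DB is a valid external symbol, and uses 'test' from lib.l:

   (out "file"                            # write a 1000-element list of external symbols
      (pr (make (do 1000 (link *DB)))) )  # in binary (plio) format
   (test (make (do 1000 (link *DB)))      # the expected value
      (in "file" (rd)) )                  # read back in binary format and compare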
<Regenaxer> But I think the problem is deeper
<Regenaxer> Internal trees
<tankf33der> all db tests also passed, btw
<Regenaxer> so it needs more different symbols
<tankf33der> if you wish, send the full code and I will run it.
<Regenaxer> That's why I think it needs more complicated cases
<Regenaxer> where certain borderline conditions occur
<Regenaxer> But I don't know which condition (yet)
<Regenaxer> I'm just running a test with certain stuff commented out
<Regenaxer> hmm, no, did not help
<Regenaxer> No crash, but a db check error
<Regenaxer> Something internal is messed up
<Regenaxer> and the crash or runtime error occurs much later
<Regenaxer> Can you use random external objects in the test?
<Regenaxer> I think this would work:
<Regenaxer> (id (rand 1 16) (rand 1 99))
<tankf33der> just one line, right?
<Regenaxer> yes, in a loop
<Regenaxer> I think it works without a DB
<Regenaxer> just random symbols
<Regenaxer> Perhaps (id (rand 1 16) (rand 1 63))
<tankf33der> started
<Regenaxer> gives symbols up to {O77}
<Regenaxer> one "hex" letter and two octal digits
<tankf33der> no crash so far
<tankf33der> i have an idea what to run under gc+
<tankf33der> let me try
<Regenaxer> I just know (from the core dump) that all crashes were in 'binRead' (in src/io.l)
<Regenaxer> the last one was binRead -> extern -> consSym
<Regenaxer> and stress.l reads/writes lots of external symbols
<Regenaxer> Studying the code again ...
<tankf33der> i can't write the full code to read/write lots of external symbols - you can.
<Regenaxer> The above random symbols are all I can think of
<Regenaxer> Perhaps lists of lists of random symbols
<Regenaxer> *reading* them
<Regenaxer> I think printing is not the problem
<Regenaxer> always the receiving side
mtsd has joined #picolisp
<Regenaxer> Trying to produce another core
<Regenaxer> No other idea :(
<Regenaxer> tankf33der, have you ever tried stress.l under gc+ directly?
<Regenaxer> Perhaps only with a single process
<Regenaxer> This may be the most promising way
<Regenaxer> *if* it is indeed a gc issue
mtsd_ has joined #picolisp
<tankf33der> it ran for 30 mins yesterday
<Regenaxer> So it is something different
<Regenaxer> Something new
<Regenaxer> Also interesting that it crashes only here on my system
aw- has joined #picolisp
<Regenaxer> Perhaps you are right and it has to do with the llvm version?
mtsd has quit [Ping timeout: 258 seconds]
<Regenaxer> (though I cannot imagine)
<Regenaxer> Here it crashes quite reliably
<Regenaxer> after running stress/9 two or three times
mtsd has joined #picolisp
<tankf33der> i could find a system with llvm9 but it takes time
<tankf33der> i should create a list of versions
<tankf33der> also try to run under llvm-as, without opt -O3
<tankf33der> that will give a faster result.
mtsd_ has quit [Ping timeout: 256 seconds]
<Regenaxer> It never crashed for you, right?
<Regenaxer> I will run on Android now (llvm 10)
<tankf33der> never. but it crashed under opt -O3 a long time ago.
<tankf33der> and maybe that was llvm9
<tankf33der> so you need to check.
<Regenaxer> I have 9 on all Debian systems
<Regenaxer> and 10 on Termux
<Regenaxer> Runs now
<tankf33der> you run without opt -O3, right?
<Regenaxer> I use the standard ASM = opt -O3
<Regenaxer> unmodified Makefile
<tankf33der> compile and run under:
<tankf33der> ASM = llvm-as
<Regenaxer> Just wait until this pass is finished
<Regenaxer> You always test with llvm-as?
<tankf33der> no, I always use the default Makefile
<tankf33der> and everywhere it was ok - ok meaning it passed the tests in the "pil21-tests" bundle
<tankf33der> even solaris :)
<tankf33der> even solaris sparc :)
<tankf33der> llvm on solaris sparc is 10, btw.
<tankf33der> but it seems i tried all versions in the range 7-11.
<tankf33der> pil21 supports only LLVM 7+.
<Regenaxer> What happened below 7?
<Regenaxer> build error?
<tankf33der> build errors in LLVM's mem* primitives.
<Regenaxer> all right
<Regenaxer> No worries about old llvm
<Regenaxer> it will disappear soon
orivej has joined #picolisp
<Regenaxer> Crashed on Termux (LLVM 10) on the 3rd pass
Blue_flame has quit [Quit: killed]
<Regenaxer> Now testing LLVM 9 without opt -O3
Blue_flame has joined #picolisp
aw- has quit [Quit: Leaving.]
<tankf33der> you run the misc/stress.l file without modifications, right?
<Regenaxer> 12 passes with 99 child processes
<Regenaxer> Now 2 passes done without 'opt'
<Regenaxer> parallel on 2 machines
<Regenaxer> If 'opt' were really wrong, I would think the error would be immediately reproducible
<Regenaxer> On the 2nd machine 'opt' is still used
<Regenaxer> it is faster, it seems
<Regenaxer> Started later, but reached pass 3 earlier
<Regenaxer> hmm, strange
<Regenaxer> already on pass 5
<Regenaxer> can't be sooo fast
<Regenaxer> Getting more and more confused
<Regenaxer> That second machine is also slower
<Regenaxer> it has 4 CPUs, as opposed to 8 on the first machine
<Regenaxer> Still it is already on pass 5
<Regenaxer> the first one just started pass 4
<Regenaxer> without 'opt'
<Regenaxer> in the 5th pass
<Regenaxer> So the llvm version and opt don't matter
<tankf33der> ok, good test.
<tankf33der> debian10?
<tankf33der> i will start stress.l again somewhere
<Regenaxer> It is most probably IPC
<Regenaxer> many processes
<Regenaxer> because gc+ did not show anything
<Regenaxer> The binary data interfere
mtsd has quit [Quit: Leaving]
<tankf33der> started stress on oracle 8, llvm9
<tankf33der> before that, void and fedora were always ok
<tankf33der> already two loops passed
<tankf33der> crash, hehe.
<tankf33der> now it somehow hangs.
<tankf33der> eats CPU and doesn't go on to the next loop
<Regenaxer> Yes, saw that too
<Regenaxer> And sometimes you get a data error
<Regenaxer> !? (wipe Lst)
<Regenaxer> ({1641} . "+A") -- Symbol expected
<Regenaxer> 'wipe' is called in 'upd'
<Regenaxer> (commit 'upd) causes a list ({xxx} ...) to be sent to all other processes
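For context, a handler of this shape is what 'upd' amounts to (a sketch, not necessarily the exact lib.l definition):

   (de upd Lst
      (wipe Lst) )   # invalidate the received external symbols, so they get re-read from the DB

So the error above means the receiver decoded a cons pair ({1641} . "+A") where the transmitted list should contain only plain external symbols.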
<Regenaxer> What is received is nonsense
<Regenaxer> "+A" is obviously the class, which is never sent
<Regenaxer> so plio gets messed up completely
<Regenaxer> *But* only sometimes
<Regenaxer> If it is not plio, it *could* also be gc messing up the data, but that seems a bit improbable
<Regenaxer> especially as gc+ did not find anything
<Regenaxer> I have to go through all related sources later
<tankf33der> running under gc+ on oracle8 linux
<Regenaxer> Hmm, remember yesterday when the first key in the repl did not work?
<Regenaxer> It was a change from the day before
<Regenaxer> But I think this change was right
<Regenaxer> So something else is messed up
<Regenaxer> The nonblocking read
<tankf33der> one whole loop is still running under gc+.
<Regenaxer> I now understand what the issue is with the first char in the repl
<Regenaxer> It is in fact a problem of using readline()
<Regenaxer> For that I used the wrong event handling
<Regenaxer> Need to find a solution for that
<Regenaxer> But I think this is not the reason for the crashes (it has an effect on how IPC works though)
<beneroth> Regenaxer, readline() is one of the few external dependencies, right? does the cost hold up so far?
<Regenaxer> Yeah, I'm not so happy about the dependency
<Regenaxer> But it is kind of standard
<Regenaxer> and everybody can configure it as she likes
<Regenaxer> The main problem is that it is difficult to get readline() to work well with the pil event handling, i.e. *Run / task
<Regenaxer> There are some hooks, but not really as I'd like them to be
<Regenaxer> That's why doc/diff says: "Because of readline(3), '*Run' tasks are suspended while typing in the REPL"
<Regenaxer> This is a small restriction compared to old pils
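A small example of the quoted restriction (standard 'task' usage; under pil21 such a timer simply pauses while the REPL is waiting inside readline()):

   (task -1000 0          # run every 1000 ms via the *Run event loop
      (prinl "tick") )    # hypothetical body, just to make the suspension visible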
<Regenaxer> The problem I have atm is that readline does its own key reading
<Regenaxer> So we have that lost first key
<Regenaxer> OK, hopefully I have the right solution for readline/events now
<Regenaxer> Released it
<Regenaxer> However, the stress test still fails
<Regenaxer> Interesting. If I insert a (gc 60)
<Regenaxer> it seems not to crash
<Regenaxer> So it *is* a gc issue?
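For context: (gc 60) forces a garbage collection and then reserves heap so that at least 60 MB of cells stay free, making further collections rare during the test; so if the crash disappears with it, gc activity is at least involved in triggering the bug.

   (gc 60)   # collect now and pre-allocate ~60 MB of free cells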
<tankf33der> the first loop is still running under gc+
<Regenaxer> Too bad that it does not show anything
orivej has quit [Ping timeout: 258 seconds]
ym has joined #picolisp
rob_w has quit [Quit: Leaving]
orivej has joined #picolisp
karswell has joined #picolisp
karswell has quit [Ping timeout: 260 seconds]