ChanServ changed the topic of #picolisp to: PicoLisp language | Channel Log: https://irclog.whitequark.org/picolisp/ | Check also http://www.picolisp.com for more information
aw- has quit [Quit: Leaving.]
<Regenaxer> Until now it always crashed in the context of binary read (plio), probably when reading external symbols
rob_w has joined #picolisp
<tankf33der> my plio tests passed under gc+
<tankf33der> I ran the gc+ tests all night:
<tankf33der> pil21 @lib/test.l
<tankf33der> and
<tankf33der> the minimal bundle of pil21-tests passed.
<Regenaxer> Good morning tankf33der
<tankf33der> morning all
<Regenaxer> Do your plio tests also handle external symbols?
<Regenaxer> I think it has to do with symbols like {A1}
<Regenaxer> stress.l sends very long lists of external symbols to other processes
<Regenaxer> the crashes occurred there
<Regenaxer> and the last crash was when reading an index node from the DB, which also is a nested list of only external symbols
<Regenaxer> I'm just inspecting the code handling this, but cannot see anything suspicious
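For context: external symbols are PicoLisp's persistent database objects and print in braces. 'id' computes such a symbol from file and object numbers, and (as noted further down) works even without an open DB, so they are easy to generate for tests. A minimal sketch:

   (id 1 1)         # external symbol for file 1, object 1 - a name of the {A1} shape
   (pr (id 1 1))    # 'pr' writes it in binary (PLIO) format; 'rd' reads it back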
<tankf33der> I don't think so
<Regenaxer> yeah
<Regenaxer> because it's not so easy to handle in unit tests
<tankf33der> feel free to send me more tests and I will run them
<Regenaxer> yeah, that's the problem
<Regenaxer> How about very simple ones first? Perhaps it shows something
<tankf33der> from plio.l file ?
<Regenaxer> (out "file" (pr (make (do 1000 (link *DB]
<tankf33der> eh
<tankf33der> ah
<tankf33der> ok
<Regenaxer> (in "file" ...
<Regenaxer> or a pipe
<Regenaxer> Who knows? Perhaps it helps :)
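Put together, a minimal sketch of that test, assuming a DB is open so *DB is bound (the file name "file" is arbitrary):

   (let Lst (make (do 1000 (link *DB)))   # a long list of external symbols
      (out "file" (pr Lst))               # write it in binary PLIO format
      (test Lst (in "file" (rd))) )       # read it back and compare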
<Regenaxer> But I think the problem is deeper
<Regenaxer> Internal trees
<tankf33der> all DB tests also passed, btw
<Regenaxer> so it needs more different symbols
<Regenaxer> yeah
<tankf33der> if you wish, send full code and I will run it.
<Regenaxer> That's why I think it needs more complicated cases
<Regenaxer> where certain borderline conditions occur
<Regenaxer> But I don't know which condition (yet)
<Regenaxer> I'm just running a test with certain stuff commented out
<Regenaxer> hmm, no, did not help
<Regenaxer> Not a crash, but a db check error
<Regenaxer> Something internally is messed up
<Regenaxer> and the crash or runtime error occurs much later
<Regenaxer> Can you use random external objects in the test?
<Regenaxer> I think this would work:
<Regenaxer> (id (rand 1 16) (rand 1 99))
<tankf33der> just one line, right?
<Regenaxer> yes, in a loop
<Regenaxer> I think it works without a DB
<Regenaxer> just random symbols
<Regenaxer> Perhaps (id (rand 1 16) (rand 1 63))
<tankf33der> started
<Regenaxer> gives symbols up to {O77}
<Regenaxer> one "hax" letter and two octal digits
<tankf33der> no crash so far
<Regenaxer> hmm :(
<tankf33der> I have an idea what to run under gc+
<tankf33der> let me try
<Regenaxer> good
<Regenaxer> I just know (from the core dumps) that all crashes were in 'binRead' (in src/io.l)
<Regenaxer> the last one was binRead -> extern -> consSym
<Regenaxer> and stress.l reads/writes lots of external symbols
<Regenaxer> Studying the code again ...
<tankf33der> I can't write full code to read/write lots of external symbols - you can.
<Regenaxer> The above random symbols are all I can think of
<Regenaxer> Perhaps lists of lists of random symbols
<Regenaxer> *reading* them
<Regenaxer> I think printing is not the problem
<Regenaxer> always the receiving side
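A sketch that stresses only the receiving side, reading nested lists of random external symbols from a child process through a pipe (all counts are arbitrary):

   (do 100                           # repeat the whole exchange
      (pipe                          # child process: print the data ...
         (pr
            (make
               (do 100
                  (link
                     (make (do 100 (link (id (rand 1 16) (rand 1 63))))) ) ) ) )
         (rd) ) )                    # ... parent process: read it back via 'binRead'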
mtsd has joined #picolisp
<Regenaxer> Trying to produce another core
<Regenaxer> No other idea :(
<Regenaxer> tankf33der, have you ever tried stress.l under gc+ directly?
<Regenaxer> Perhaps only with a single process
<Regenaxer> This may be the most promising way
<Regenaxer> *if* it is indeed a gc issue
mtsd_ has joined #picolisp
<tankf33der> sure
<tankf33der> it ran for 30 mins yesterday
<Regenaxer> hmm
<Regenaxer> So it is something different
<Regenaxer> Something new
<Regenaxer> Also interesting that it crashes only here on my system
aw- has joined #picolisp
<Regenaxer> Perhaps you are right and it has to do with the llvm version?
mtsd has quit [Ping timeout: 258 seconds]
<Regenaxer> (though I cannot imagine)
<Regenaxer> Here it crashes quite reliably
<Regenaxer> after running stress/9 two or three times
mtsd has joined #picolisp
<tankf33der> I could find a system with llvm9, but it takes time
<tankf33der> I should create a list of versions
<tankf33der> also try to run under llvm-as, without opt -O3
<tankf33der> that will give a faster result.
mtsd_ has quit [Ping timeout: 256 seconds]
<Regenaxer> It never crashed for you, right?
<Regenaxer> I will run on Android now (llvm 10)
<tankf33der> never. but it crashed under opt -O3 long ago.
<tankf33der> and maybe that was llvm9
<tankf33der> so you need to check.
<Regenaxer> I have 9 on all Debian systems
<Regenaxer> and 10 on Termux
<Regenaxer> Runs now
<tankf33der> you run without opt -O3, right?
<Regenaxer> I use the standard ASM = opt -O3
<Regenaxer> unmodified Makefile
<tankf33der> no
<tankf33der> compile and run under:
<tankf33der> ASM = llvm-as
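In Makefile terms, the two settings discussed (a sketch, assuming the variable sits in pil21's Makefile with 'opt' as the default mentioned above):

   ASM = opt -O3      # the default: optimize the generated .ll code first
   #ASM = llvm-as     # assemble directly, bypassing the optimizer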
<Regenaxer> ok
<Regenaxer> Just wait until this pass has finished
<Regenaxer> You always test with llvm-as?
<tankf33der> no, I always use default Makefile
<Regenaxer> ok
<tankf33der> and everywhere it was ok - by "ok" I mean it passed the tests in the "pil21-tests" bundle
<tankf33der> even solaris :)
<tankf33der> even solaris sparc :)
<tankf33der> llvm in solaris sparc is 10, btw.
<Regenaxer> I see
<tankf33der> but it seems I tried all versions in the range 7-11.
<tankf33der> pil21 supports only LLVM 7 and later.
<Regenaxer> yeah
<Regenaxer> What happened below 7?
<Regenaxer> build error?
<tankf33der> a build error in llvm's mem* primitives.
<Regenaxer> all right
<Regenaxer> No worries about old llvm
<tankf33der> T.
<Regenaxer> they will disappear soon
orivej has joined #picolisp
<Regenaxer> Crashed on Termux (LLVM 10) on 3rd pass
Blue_flame has quit [Quit: killed]
<Regenaxer> Now testing LLVM 9 without opt -O3
Blue_flame has joined #picolisp
aw- has quit [Quit: Leaving.]
<tankf33der> you run the misc/stress.l file without modifications, right?
<Regenaxer> right
<Regenaxer> 12 passes with 99 child processes
<Regenaxer> Now 2 passes done without 'opt'
<Regenaxer> parallel on 2 machines
<Regenaxer> If 'opt' were really broken, I would think the error would be immediately reproducible
<Regenaxer> On the 2nd machine 'opt' is still used
<Regenaxer> it is faster, it seems
<Regenaxer> Started later, but reached pass 3 earlier
<Regenaxer> hmm, strange
<Regenaxer> already on pass 5
<Regenaxer> can't be sooo fast
<Regenaxer> Getting more and more confused
<Regenaxer> That second machine is also slower
<Regenaxer> has 4 CPUs, as opposed to 8 on the first machine
<Regenaxer> Still it is already on pass 5
<Regenaxer> the first just started pass 4
<Regenaxer> Crash!
<Regenaxer> without 'opt'
<Regenaxer> in the 5th pass
<Regenaxer> So llvm version or opt don't matter
<tankf33der> ok, good test.
<tankf33der> debian10?
<tankf33der> I will start stress.l again somewhere
<Regenaxer> It is most probably IPC
<Regenaxer> many processes
<Regenaxer> because gc+ did not show anything
<Regenaxer> The binary data interfere
mtsd has quit [Quit: Leaving]
<tankf33der> started stress on Oracle Linux 8, llvm9
<tankf33der> before that, Void and Fedora were always ok
<tankf33der> two loops have already passed
<Regenaxer> ok
<tankf33der> crash, hehe.
<Regenaxer> :)
<tankf33der> now it somehow hangs.
<tankf33der> eats CPU and doesn't proceed to the next loop
<Regenaxer> Yes, saw that too
<Regenaxer> And sometimes you get a data error
<Regenaxer> like:
<Regenaxer> !? (wipe Lst)
<Regenaxer> ({1641} . "+A") -- Symbol expected
<Regenaxer> 'wipe' is called in 'upd'
<Regenaxer> (commit 'upd) causes a list ({xxx} ...) to be sent to all other processes
<Regenaxer> What is received is nonsense
<Regenaxer> "+A" is obviously the class, which is never sent
<Regenaxer> so plio gets messed up completely
<Regenaxer> *But* only sometimes
<Regenaxer> If it is not plio, it *could* also be gc messing up the data, but that seems a bit improbable
<Regenaxer> especially as gc+ did not find anything
<Regenaxer> I have to go through all related sources later
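A minimal sketch of the path just described, assuming 'upd' does little more than wiping the changed symbols (as the error trace above suggests):

   (de upd Lst          # runs in every sibling process after a remote (commit 'upd)
      (wipe Lst) )      # drop the cached values of the changed external symbols

   (commit 'upd)        # commit, sending the changed symbols to all other processes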
<tankf33der> running under gc+ on oracle8 linux
<Regenaxer> ok
<Regenaxer> Hmm, remember yesterday when the first key in repl did not work?
<Regenaxer> It was a change from the day before
<Regenaxer> But I think this was right
<Regenaxer> So something else is messed up
<Regenaxer> The nonblocking read
<tankf33der> one whole loop still running under gc+.
<Regenaxer> ok
<Regenaxer> I now understand what the issue is with the first char in repl
<Regenaxer> It is in fact a problem of using readline()
<Regenaxer> For that I used a wrong event handling
<Regenaxer> Need to find a solution for that
<Regenaxer> But I think this is not the reason for the crashes (it has an effect on how IPC works though)
<beneroth> Regenaxer, readline() is one of the few external dependencies, right? Is it worth the cost so far?
<Regenaxer> Yeah, I'm not so happy about the dependency
<Regenaxer> But it is kind of standard
<Regenaxer> and everybody can configure it as they like
<Regenaxer> The main problem is that it is difficult to get readline() to work well with the pil event handling, i.e. *Run / task
<Regenaxer> There are some hooks, but not really as I'd like it to be
<Regenaxer> That's why doc/diff says: " Because of readline(3), '*Run' tasks are suspended while typing in the REPL"
<Regenaxer> This is a small restriction compared to old pils
<Regenaxer> The problem I have atm is that readline does its own key reading
<Regenaxer> So we have that lost first key
<beneroth> I see
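For illustration, a *Run task is what 'task' registers; a minimal (hypothetical) timer task of the kind pil21 suspends while readline(3) reads from the terminal:

   (task -2000 0         # timer task, firing every 2000 milliseconds
      (msg 'tick) )      # suspended while the user is typing in the REPL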
<Regenaxer> OK, hopefully I have the right solution for readline/event handling now
<Regenaxer> Released it
<Regenaxer> However, the stress test still fails
<Regenaxer> Interesting. If I insert a (gc 60)
<Regenaxer> http://ix.io/2Bnt
<Regenaxer> it seems not to crash
<Regenaxer> So it *is* a gc issue?
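For reference: (gc 'cnt) forces a collection and then keeps at least 'cnt' megabytes of free cells reserved, so subsequent collections become rare - which would hide a gc bug rather than fix it:

   (gc 60)      # collect now, then keep at least 60 MB of free cells reserved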
<tankf33der> first loop still running under gc+
<Regenaxer> Too bad that it does not show anything
orivej has quit [Ping timeout: 258 seconds]
ym has joined #picolisp
rob_w has quit [Quit: Leaving]
orivej has joined #picolisp
karswell has joined #picolisp
karswell has quit [Ping timeout: 260 seconds]