aw- has quit [Quit: Leaving.]
<Regenaxer> Until now it always crashed in the context of binary read (plio), probably when reading external symbols
rob_w has joined #picolisp
<tankf33der> my plio tests passed gc+
<tankf33der> I ran the gc+ tests all night:
<tankf33der> pil21 @lib/test.l
<tankf33der> the minimal bundle of pil21-tests passed.
<Regenaxer> Good morning tankf33der
<tankf33der> morning all
<Regenaxer> Do your plio tests also handle external symbols?
<Regenaxer> I think it has to do with symbols like {A1}
<Regenaxer> stress.l sends very long lists of external symbols to other processes
<Regenaxer> the crashes occurred there
<Regenaxer> and the last crash was when reading an index node from the DB, which is also a nested list of only external symbols
<Regenaxer> I'm just inspecting the code handling this, but cannot see anything suspicious
<tankf33der> i don't think so
<Regenaxer> because that is not so easy to handle in unit tests
<tankf33der> feel free to send me more tests and I will run them
<Regenaxer> yeah, that's the problem
<Regenaxer> How about very simple ones first? Perhaps it shows something
<tankf33der> from the plio.l file?
<Regenaxer> (out "file" (pr (make (do 1000 (link *DB]
<Regenaxer> (in "file" ...
<Regenaxer> or a pipe
<Regenaxer> Who knows? Perhaps it helps :)
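A minimal sketch of the suggested round-trip, with the elided (in "file" ... part completed for illustration; it assumes a DB is open so that *DB is a valid external symbol, and uses 'test' from lib.l:

   (out "file"                            # write a 1000-element list of external symbols
      (pr (make (do 1000 (link *DB)))) )  # in binary (plio) format
   (test (make (do 1000 (link *DB)))      # the expected value
      (in "file" (rd)) )                  # read back in binary format and compare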
<Regenaxer> But I think the problem is deeper
<Regenaxer> Internal trees
<tankf33der> all db tests also passed, btw
<Regenaxer> so it needs more different symbols
<tankf33der> if you wish, send the full code and I will run it.
<Regenaxer> That's why I think it needs more complicated cases
<Regenaxer> where certain borderline conditions occur
<Regenaxer> But I don't know which condition (yet)
<Regenaxer> I'm just running a test with certain stuff commented out
<Regenaxer> hmm, no, did not help
<Regenaxer> No crash, but a db check error
<Regenaxer> Something internal is messed up
<Regenaxer> and the crash or runtime error occurs much later
<Regenaxer> Can you use random external objects in the test?
<Regenaxer> I think this would work:
<Regenaxer> (id (rand 1 16) (rand 1 99))
<tankf33der> just one line, right?
<Regenaxer> yes, in a loop
<Regenaxer> I think it works without a DB
<Regenaxer> just random symbols
<Regenaxer> Perhaps (id (rand 1 16) (rand 1 63))
<tankf33der> started
<Regenaxer> gives symbols up to {O77}
<Regenaxer> one "hex" letter and two octal digits
<tankf33der> no crash so far
<tankf33der> i have an idea what to run under gc+
<tankf33der> let me try
<Regenaxer> I just know (from the core dump) that all crashes were in 'binRead' (in src/io.l)
<Regenaxer> the last one was binRead -> extern -> consSym
<Regenaxer> and stress.l reads/writes lots of external symbols
<Regenaxer> Studying the code again ...
<tankf33der> i can't write the full code to read/write lots of external symbols - you can.
<Regenaxer> The above random symbols are all I can think of
<Regenaxer> Perhaps lists of lists of random symbols
<Regenaxer> *reading* them
<Regenaxer> I think printing is not the problem
<Regenaxer> always the receiving side
mtsd has joined #picolisp
<Regenaxer> Trying to produce another core
<Regenaxer> No other idea :(
<Regenaxer> tankf33der, have you ever tried stress.l under gc+ directly?
<Regenaxer> Perhaps only with a single process
<Regenaxer> This may be the most promising way
<Regenaxer> *if* it is indeed a gc issue
mtsd_ has joined #picolisp
<tankf33der> it ran for 30 mins yesterday
<Regenaxer> So it is something different
<Regenaxer> Something new
<Regenaxer> Also interesting that it crashes only here on my system
aw- has joined #picolisp
<Regenaxer> Perhaps you are right and it has to do with the llvm version?
mtsd has quit [Ping timeout: 258 seconds]
<Regenaxer> (though I cannot imagine)
<Regenaxer> Here it crashes quite reliably
<Regenaxer> after running stress/9 two or three times
mtsd has joined #picolisp
<tankf33der> i could find a system with llvm9 but it takes time
<tankf33der> i should create a list of versions
<tankf33der> also try to run under llvm-as, without opt -O3
<tankf33der> that will give a faster result.
mtsd_ has quit [Ping timeout: 256 seconds]
<Regenaxer> It never crashed for you, right?
<Regenaxer> I will run on Android now (llvm 10)
<tankf33der> never. but it crashed under opt -O3 a long time ago.
<tankf33der> and maybe that was llvm9
<tankf33der> so you need to check.
<Regenaxer> I have 9 on all Debian systems
<Regenaxer> and 10 on Termux
<Regenaxer> Runs now
<tankf33der> you run without opt -O3, right?
<Regenaxer> I use the standard ASM = opt -O3
<Regenaxer> unmodified Makefile
<tankf33der> compile and run under:
<tankf33der> ASM = llvm-as
<Regenaxer> Just wait until this pass is finished
<Regenaxer> You always test with llvm-as?
<tankf33der> no, I always use the default Makefile
<tankf33der> and everywhere it was ok - ok meaning it passed the tests in the "pil21-tests" bundle
<tankf33der> even solaris :)
<tankf33der> even solaris sparc :)
<tankf33der> llvm on solaris sparc is 10, btw.
<tankf33der> but it seems i tried all versions in the range 7-11.
<tankf33der> pil21 supports only LLVM 7+.
<Regenaxer> What happened below 7?
<Regenaxer> build error?
<tankf33der> build errors in LLVM's mem* primitives.
<Regenaxer> all right
<Regenaxer> No worries about old llvm
<Regenaxer> it will disappear soon
orivej has joined #picolisp
<Regenaxer> Crashed on Termux (LLVM 10) on the 3rd pass
Blue_flame has quit [Quit: killed]
<Regenaxer> Now testing LLVM 9 without opt -O3
Blue_flame has joined #picolisp
aw- has quit [Quit: Leaving.]
<tankf33der> you run the misc/stress.l file without modifications, right?
<Regenaxer> 12 passes with 99 child processes
<Regenaxer> Now 2 passes done without 'opt'
<Regenaxer> parallel on 2 machines
<Regenaxer> If 'opt' were really wrong, I would think the error would be immediately reproducible
<Regenaxer> On the 2nd machine 'opt' is still used
<Regenaxer> it is faster, it seems
<Regenaxer> Started later, but reached pass 3 earlier
<Regenaxer> hmm, strange
<Regenaxer> already on pass 5
<Regenaxer> can't be sooo fast
<Regenaxer> Getting more and more confused
<Regenaxer> That second machine is also slower
<Regenaxer> it has 4 CPUs, as opposed to 8 on the first machine
<Regenaxer> Still it is already on pass 5
<Regenaxer> the first one just started pass 4
<Regenaxer> without 'opt'
<Regenaxer> in the 5th pass
<Regenaxer> So the llvm version and opt don't matter
<tankf33der> ok, good test.
<tankf33der> debian10?
<tankf33der> i will start stress.l again somewhere
<Regenaxer> It is most probably IPC
<Regenaxer> many processes
<Regenaxer> because gc+ did not show anything
<Regenaxer> The binary data interfere
mtsd has quit [Quit: Leaving]
<tankf33der> started stress on oracle 8, llvm9
<tankf33der> before that, void and fedora were always ok
<tankf33der> already two loops passed
<tankf33der> crash, hehe.
<tankf33der> now it somehow hangs.
<tankf33der> eats CPU and doesn't go on to the next loop
<Regenaxer> Yes, saw that too
<Regenaxer> And sometimes you get a data error
<Regenaxer> !? (wipe Lst)
<Regenaxer> ({1641} . "+A") -- Symbol expected
<Regenaxer> 'wipe' is called in 'upd'
<Regenaxer> (commit 'upd) causes a list ({xxx} ...) to be sent to all other processes
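For context, a handler of this shape is what 'upd' amounts to (a sketch, not necessarily the exact lib.l definition):

   (de upd Lst
      (wipe Lst) )   # invalidate the received external symbols, so they get re-read from the DB

So the error above means the receiver decoded a cons pair ({1641} . "+A") where the transmitted list should contain only plain external symbols.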
<Regenaxer> What is received is nonsense
<Regenaxer> "+A" is obviously the class, which is never sent
<Regenaxer> so plio gets messed up completely
<Regenaxer> *But* only sometimes
<Regenaxer> If it is not plio, it *could* also be gc messing up the data, but that seems a bit improbable
<Regenaxer> especially as gc+ did not find anything
<Regenaxer> I have to go through all related sources later
<tankf33der> running under gc+ on oracle8 linux
<Regenaxer> Hmm, remember yesterday when the first key in the repl did not work?
<Regenaxer> It was a change from the day before
<Regenaxer> But I think this change was right
<Regenaxer> So something else is messed up
<Regenaxer> The nonblocking read
<tankf33der> one whole loop is still running under gc+.
<Regenaxer> I now understand what the issue is with the first char in the repl
<Regenaxer> It is in fact a problem of using readline()
<Regenaxer> For that I used the wrong event handling
<Regenaxer> Need to find a solution for that
<Regenaxer> But I think this is not the reason for the crashes (it has an effect on how IPC works though)
<beneroth> Regenaxer, readline() is one of the few external dependencies, right? does the cost hold up so far?
<Regenaxer> Yeah, I'm not so happy about the dependency
<Regenaxer> But it is kind of standard
<Regenaxer> and everybody can configure it as she likes
<Regenaxer> The main problem is that it is difficult to get readline() to work well with the pil event handling, i.e. *Run / task
<Regenaxer> There are some hooks, but not really as I'd like them to be
<Regenaxer> That's why doc/diff says: "Because of readline(3), '*Run' tasks are suspended while typing in the REPL"
<Regenaxer> This is a small restriction compared to old pils
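A small example of the quoted restriction (standard 'task' usage; under pil21 such a timer simply pauses while the REPL is waiting inside readline()):

   (task -1000 0          # run every 1000 ms via the *Run event loop
      (prinl "tick") )    # hypothetical body, just to make the suspension visible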
<Regenaxer> The problem I have atm is that readline does its own key reading
<Regenaxer> So we have that lost first key
<Regenaxer> OK, hopefully I have the right solution for readline/events now
<Regenaxer> Released it
<Regenaxer> However, the stress test still fails
<Regenaxer> Interesting. If I insert a (gc 60)
<Regenaxer> it seems not to crash
<Regenaxer> So it *is* a gc issue?
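For context: (gc 60) forces a garbage collection and then reserves heap so that at least 60 MB of cells stay free, making further collections rare during the test; so if the crash disappears with it, gc activity is at least involved in triggering the bug.

   (gc 60)   # collect now and pre-allocate ~60 MB of free cells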
<tankf33der> the first loop is still running under gc+
<Regenaxer> Too bad that it does not show anything
orivej has quit [Ping timeout: 258 seconds]
ym has joined #picolisp
rob_w has quit [Quit: Leaving]
orivej has joined #picolisp
karswell has joined #picolisp
karswell has quit [Ping timeout: 260 seconds]