#litex on 2021-04-02 — irc logs at freenode.irclog.whitequark.org

2020-02-07 11:13 _florent_ changed the topic of #litex to: LiteX FPGA SoC builder and Cores / Github : https://github.com/enjoy-digital, https://github.com/litex-hub / Logs: https://freenode.irclog.whitequark.org/litex

00:00 tpb has quit [Remote host closed the connection]

00:00 tpb has joined #litex

01:14 Degi_ has joined #litex

01:17 Degi has quit [Ping timeout: 268 seconds]

01:17 Degi_ is now known as Degi

02:35 CarlFK has quit [Ping timeout: 265 seconds]

04:02 kgugala_ has joined #litex

04:04 kgugala__ has joined #litex

04:04 kgugala has quit [Ping timeout: 240 seconds]

04:07 kgugala_ has quit [Ping timeout: 265 seconds]

04:47 cr1901_modern1 has quit [Read error: Connection reset by peer]

04:52 kgugala has joined #litex

04:54 kgugala__ has quit [Ping timeout: 268 seconds]

05:34 peeps[zen] is now known as peepsalot

06:15 kgugala_ has joined #litex

06:16 kgugala has quit [Ping timeout: 240 seconds]

06:19 Melkhior has joined #litex

06:33 cr1901_modern has joined #litex

06:48 peepsalot has quit [Ping timeout: 246 seconds]

06:49 Billy_ has joined #litex

06:51 Billy_ is now known as fly

06:55 peepsalot has joined #litex

07:13 <Melkhior> @somlo About the micro-sd card; I have a new board for LiteX where the sd-card works better than on my custom carrier (that I used previously for LiteX); which kernel (git/branch/commit/...) would you recommend at this time? I can boot Yocto from a sd-card root, but I have random issues that seem related to sd-card access (buildroot works

07:13 <Melkhior> apparently fine, but is much lighter on storage I/O...) Currently booting the commit pointed to by linux-on-litex-vexriscv.

07:13 <Melkhior> TIA

07:21 Bertl_oO is now known as Bertl_zZ

09:32 hansfbaier has joined #litex

09:59 hansfbaier has quit [Quit: WeeChat 2.8]

11:58 <Melkhior> @somlo tried branch litex-rebase, but it dies during boot after timed-out CMD18: https://pastebin.com/ScJ7WX93

11:58 <tpb> Title: [ 1.291426] mmc0: new SDHC card at address 0001[ 1.299177] mmcblk0: mmc0 - Pastebin.com (at pastebin.com)

12:19 <somlo> @Melkhior: I just rebased litex-rebase again, and put back the restriction to single-block transfers

12:20 <Melkhior> I'm trying on the previous commit, I got to the login prompt

12:20 <somlo> with multi-block, reads are about twice as fast as before, but (large) writes will cause a hard lock-up of kernel and/or gateware, whereas large single-block writes might fail silently but leave the system still operational otherwise

12:21 <Melkhior> it's trying to log me in but that was very slow on the regular kernel as well

12:21 <somlo> _florent_ is looking at the gateware, I've been pretty busy over the last week, didn't have a chance to do anything on my end

12:22 <Melkhior> no problem, just wanted to know what I needed to test with VexRiscv

12:22 <Melkhior> as you're working with Rocket

12:22 <somlo> right, thanks!

12:23 <Melkhior> I'm on the sdcard because I couldn't get the Ethernet to work in Linux :-)

12:23 <somlo> my test is: `mount /dev/mmcblk0p1 /mnt; cp /mnt/boot.bin /root/foo; cp /root/foo /mnt/; umount /mnt; mount /dev/mmcblk0p1 /mnt; md5sum /mnt/*`

12:24 <Melkhior> otherwise NFS would be easier to have a large FS accessible

12:24 <somlo> the idea is after mounting the second time, /mnt/boot.bin and /mnt/foo should have the same md5sum

12:24 <Melkhior> yes they definitely should...
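
The round-trip check somlo describes (copy a file off the card, copy it back, remount, compare checksums) could be sketched in Python roughly as follows. This is only an illustration: the function names and the `remount` hook are hypothetical, and the remount step matters because, without it, the comparison may be served from the page cache rather than from the card.

```python
import hashlib
import shutil


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def round_trip_ok(source, scratch, card_copy, remount=None):
    """Copy source -> scratch -> card_copy, then compare MD5 digests.

    `remount` is a hypothetical callback (e.g. one that runs umount and
    mount) so that the final read comes from the card, not from cache.
    """
    shutil.copyfile(source, scratch)     # e.g. /mnt/boot.bin -> /root/foo
    shutil.copyfile(scratch, card_copy)  # e.g. /root/foo -> /mnt/foo
    if remount is not None:
        remount()
    return md5_of(source) == md5_of(card_copy)
```

With the paths from the shell one-liner, this would be called as `round_trip_ok("/mnt/boot.bin", "/root/foo", "/mnt/foo", remount=...)`; a `False` result corresponds to the silent write failure described in the discussion.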

12:25 <Melkhior> I've gone one further:

12:25 <Melkhior> root@litex-riscv32:/root# mount | grep mmc

12:25 <somlo> which only happens for me with single-block, *and* if I further slow down the sdclock to max out at 12.5MHz :)

12:25 <Melkhior> /dev/mmcblk0p2 on / type ext4 (rw,relatime)

12:25 <somlo> otherwise (full 25MHz sdclock) the write fails silently in single-block mode; in multi-block mode, the write locks up the system at any sdclock (limited or not)

12:26 <somlo> Melkhior: yeah, any *fancy* filesystem (i.e., fancier than fat16 :) ) will involve lots of housekeeping writes (unless you mount r/o)

12:26 <Melkhior> mmm, I'm 4 cores @ 85 MHz so:

12:26 <Melkhior> [ 1.788867] litex-mmc f0006000.mmc: Requested clk_freq=0: set to 332031 via div=256

12:26 <Melkhior> [ 1.808410] litex-mmc f0006000.mmc: Requested clk_freq=12500000: set to 10625000 via div=8

12:26 <Melkhior> [ 1.845192] litex-mmc f0006000.mmc: Requested clk_freq=25000000: set to 21250000 via div=4

12:27 <somlo> and those writes will sometimes fail, because the (gateware x linux_driver) combo is still having some issues...

12:27 <Melkhior> slower than 25 MHz but faster than 12.5

12:28 <Melkhior> have you tried intermediate speed or just an increased divisor going straight from 25 to 12.5 ?

12:30 <somlo> Melkhior: try adding a `div <<= 1;` after this line: https://github.com/litex-hub/linux/blob/litex-rebase/drivers/mmc/host/litex_mmc.c#L87

12:31 <somlo> yeah, no idea if (and how) I could get additional resolution for sdclock beyond factors of 2
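
The divider values in the log lines above (85 MHz system clock, div=8 for a 12.5 MHz request, div=4 for 25 MHz, div=256 for a request of 0) are consistent with rounding the clock ratio up to a power of two. A minimal Python sketch of that arithmetic follows; it is not the driver code itself, and the function name, the `extra_shift` parameter (modelling the suggested `div <<= 1`) and the lower clamp of 2 are assumptions made for illustration.

```python
def sd_clock_divider(sys_clk_hz, requested_hz, extra_shift=0,
                     min_div=2, max_div=256):
    """Return (divider, resulting sdclock in Hz) for a requested frequency.

    Assumption: the divider is the integer clock ratio rounded up to a
    power of two, clamped to [min_div, max_div]. max_div=256 matches the
    log; min_div=2 is a guess.
    """
    if requested_hz == 0:
        div = max_div
    else:
        ratio = max(sys_clk_hz // requested_hz, 1)
        div = 1 << (ratio - 1).bit_length()  # round up to a power of two
    div <<= extra_shift                      # models the extra `div <<= 1`
    div = min(max(div, min_div), max_div)
    return div, sys_clk_hz // div


print(sd_clock_divider(85_000_000, 0))            # (256, 332031)
print(sd_clock_divider(85_000_000, 12_500_000))   # (8, 10625000)
print(sd_clock_divider(85_000_000, 25_000_000))   # (4, 21250000)
print(sd_clock_divider(85_000_000, 25_000_000, extra_shift=1))  # (8, 10625000)
```

Because only power-of-two dividers are produced, the achievable sdclock values at 85 MHz jump from 21.25 MHz straight to 10.625 MHz, which is the "factors of 2" resolution limit mentioned above.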

12:32 <somlo> FWIW, the `dd if=/mnt/boot.bin of=/dev/null` reported time doesn't change by more than 1 or 2 seconds (out of 20) when halving the sdclock

12:32 <somlo> where boot.bin is about 15MB
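
As a rough back-of-the-envelope check on those numbers, the sketch below compares the observed read rate (about 15 MB in about 20 seconds) with the raw data rate of the SD bus. The 4-bit bus width is an assumption, protocol overhead is ignored, and the helper names are made up for this example.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Observed throughput in MB/s."""
    return size_mb / seconds


def raw_bus_mb_per_s(sdclock_hz, bus_width_bits=4):
    """Raw SD bus data rate in MB/s, assuming one transfer per clock
    on each data line (bus width of 4 bits is an assumption)."""
    return sdclock_hz * bus_width_bits / 8 / 1e6


observed = throughput_mb_per_s(15, 20)       # roughly 0.75 MB/s
bus_slow = raw_bus_mb_per_s(12_500_000)      # raw rate at 12.5 MHz
bus_fast = raw_bus_mb_per_s(25_000_000)      # raw rate at 25 MHz
print(observed, bus_slow, bus_fast)
```

Under these assumptions the observed rate is well below the raw bus rate at either clock, which would fit with the read time barely changing when the sdclock is halved.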

12:33 <somlo> all this on Rocket, I *think* it should be fairly similar on vexriscv

12:33 <Melkhior> I guess it would be, but you never know ...

12:33 <Melkhior> I will try with the increased divisor as well

12:33 <Melkhior> buildroot was much more stable than Yocto but probably doesn't do anywhere near as much I/O

12:34 <Melkhior> systemd is ... well, systeld :-)

12:34 <Melkhior> systemd

12:34 <somlo> baby steps :D

12:35 <Melkhior> yes, I guess 4 RV32GCBK cores and a 'full' distro might be a bit ambitious for a soft-SoC :-)

12:36 <somlo> you say that now, but who knows ;)

12:36 <Melkhior> but except for mass I/O it works fairly well !

12:37 <Melkhior> but currently, process '(rdisc)' with parentheses is using a full core at 100% and I get in dmesg:

12:37 <Melkhior> [ 314.216330] rcu: INFO: rcu_sched self-detected stall on CPU

12:37 <Melkhior> [ 314.216701] rcu: 1-....: (52515 ticks this GP) idle=512/1/0x40000004 softirq=7625/7627 fqs=26174

12:37 <Melkhior> [ 314.217306] (t=52509 jiffies g=11841 q=4215)

12:37 <Melkhior> [ 314.217615] Task dump for CPU 1:

12:37 <Melkhior> [ 314.217821] task:(rdisc) state:R running task stack: 0 pid: 140 ppid: 1 flags:0x00000008

12:37 <Melkhior> [ 314.218548] Call Trace:

12:37 <Melkhior> [ 314.218689] [<c00033f0>] walk_stackframe+0x0/0xca

12:38 <somlo> pretty ominous...

12:39 <Melkhior> yes, and I accidentally typed 'tab' and invoked auto-completion ... and the terminal froze

12:39 <Melkhior> from pas experience, after some time (closer to minutes than seconds...) I will get the control back

12:39 <Melkhior> s/pas/past/

12:42 <somlo> tab auto-complete is a good test of how well filesystem reads work...

12:43 <Melkhior> yes, unfortunately a test I often invoke involuntarily...

12:43 <Melkhior> rebooting on latest rebase

12:48 <Melkhior> this time it's dbus-daemon eating up a core; hadn't seen that one before:

12:48 <Melkhior> [ 311.231188] rcu: INFO: rcu_sched self-detected stall on CPU

12:48 <Melkhior> [ 311.231526] rcu: 2-....: (1 GPs behind) idle=99a/1/0x40000004 softirq=7055/7056 fqs=26140

12:48 <Melkhior> [ 311.232099] (t=52509 jiffies g=11441 q=4277)

12:48 <Melkhior> [ 311.232406] Task dump for CPU 2:

12:48 <Melkhior> [ 311.232609] task:dbus-daemon state:R running task stack: 0 pid: 123 ppid: 1 flags:0x00000008

12:48 <Melkhior> [ 311.233326] Call Trace:

12:48 <Melkhior> [ 311.233460] [<c00033f0>] walk_stackframe+0x0/0xca

13:08 <Melkhior> despite that, copying a 12 MiB file seemed to work... same md5sum after reboot

13:40 <somlo> Melkhior: although the output you pasted seems somewhat orthogonal to any sdcard issues, not sure I see a strong link there

13:40 <Melkhior> me neither, but more often than not it's rdisc

13:41 <Melkhior> maybe some I/O issue causing weird behavior to random programs ?

13:41 <Melkhior> I'll have to re-test a buildroot root as well

13:43 <Melkhior> compilation issue would be deterministic across reboots

13:43 <Melkhior> core issue would probably also affect buildroot

13:43 <Melkhior> so I'm guessing I/O to the root FS as yocto is more intensive

13:44 <Melkhior> but I agree it's just a guess and not substantiated by the output in dmesg :-(

14:08 <Melkhior> Buildroot has no apparent issue; of course it starts a lot fewer processes during boot as well...

14:10 <Melkhior> No idea how to figure out what is causing those semi-random failures in Yocto... any suggestions welcome!

14:12 <Melkhior> The only thing that points to the sd-card subsystem is that when there's no 'living dead' process (100% cpu, unkillable, flagged by rcu_sched), then auto-completion is decently fast... not particularly conclusive :-(

14:12 <Melkhior> anyway, will try to figure out the yocto issue as time permits

14:12 <Melkhior> and watch litex-rebase for update :-)

15:07 Bertl_zZ is now known as Bertl

16:40 <geertu> https://society.oftrolls.com/@geert/105996716422075863

16:40 <tpb> Title: Geert Uytterhoeven: "OrangeCrab ECP5 FPGA board + LiteX + VexRiscv + A…" - Society of Trolls (at society.oftrolls.com)

18:22 m4ssi has joined #litex

18:53 kgugala has joined #litex

18:55 kgugala_ has quit [Ping timeout: 240 seconds]

19:38 kgugala_ has joined #litex

19:41 kgugala has quit [Ping timeout: 246 seconds]

22:08 BryceSchroeder has quit [Read error: Connection reset by peer]

22:33 m4ssi has quit [Remote host closed the connection]

23:41 lf has quit [Ping timeout: 250 seconds]

23:41 lf has joined #litex