#litex on 2021-01-28 — irc logs at freenode.irclog.whitequark.org

2020-02-07 11:13 _florent_ changed the topic of #litex to: LiteX FPGA SoC builder and Cores / Github : https://github.com/enjoy-digital, https://github.com/litex-hub / Logs: https://freenode.irclog.whitequark.org/litex

00:00 tpb has quit [Remote host closed the connection]

00:00 tpb has joined #litex

00:43 lf has quit [Ping timeout: 264 seconds]

00:43 lf_ has joined #litex

00:49 Bertl_oO has quit [Ping timeout: 246 seconds]

00:59 peeps[zen] has joined #litex

01:00 peepsalot has quit [Ping timeout: 246 seconds]

01:01 peeps[zen] is now known as peepsalot

01:15 st-gourichon-f has joined #litex

01:16 st-gourichon-fid has quit [Quit: ZNC - https://znc.in]

02:20 tpb has quit [Disconnected by services]

02:21 tpb has joined #litex

02:21 carlomaragno has quit [Ping timeout: 256 seconds]

02:22 carlomaragno has joined #litex

02:52 Bertl_oO has joined #litex

03:21 lkcl has quit [Ping timeout: 264 seconds]

03:43 Degi_ has joined #litex

03:44 lkcl has joined #litex

03:44 Degi has quit [Ping timeout: 256 seconds]

03:44 Degi_ is now known as Degi

05:02 peepsalot has quit [Ping timeout: 246 seconds]

05:05 peeps has joined #litex

05:13 peepsalot has joined #litex

05:15 peeps has quit [Ping timeout: 264 seconds]

06:20 kgugala has quit [Read error: Connection reset by peer]

06:21 kgugala has joined #litex

06:40 Bertl_oO is now known as Bertl_zZ

07:27 lkcl has quit [Ping timeout: 272 seconds]

07:31 kgugala has quit [Quit: -a- Connection Timed Out]

07:31 kgugala has joined #litex

07:39 lkcl has joined #litex

07:48 kgugala_ has joined #litex

07:52 kgugala has quit [Ping timeout: 264 seconds]

08:20 <_florent_> acathla: zyp's explanations make sense to me, if you want to ease investigation on this, you could also use litex_sim with --trace. This would give you direct visibility on all signals of the SoC

08:20 <_florent_> Also not sure you have an instruction cache on your CPU, but if not, adding one (even a small one) could help

08:22 <acathla> _florent_, I use vexriscv-minimal from LiteX, I don't know if there is an instruction cache.

08:24 <_florent_> acathla: Litescope is nice for things that are difficult to simulate but here I would look at this in simulation

08:24 <_florent_> VexRiscv-minimal has the caches disabled IIRC

08:24 <acathla> I understand, but I don't know well how it works yet. I don't know how to add something to the simulation, for example

08:25 <acathla> unless it's available as an option

08:26 <_florent_> acathla: litex_sim simulation is really similar to the target you are running on hardware, it's possible to customize it, add your own peripherals, etc...

08:27 <_florent_> I can try to get you started on this, I'm going to try to reproduce the behaviour you see in simulation

08:28 <acathla> Oh, nice :)

08:29 <acathla> I'm not sure it's critical on the versa, as zyp said, but on the fomu it gets worse and the clock is slower, so we end up with a 32-bit microcontroller slower than everything

08:37 <_florent_> ok, so with a minor modification to the BIOS to reproduce your behavior:

08:37 <_florent_> https://www.irccloud.com/pastebin/tEo539RX/

08:37 <tpb> Title: Snippet | IRCCloud (at www.irccloud.com)

08:38 <_florent_> rm -rf build && lxsim --cpu-type=vexriscv --cpu-variant=minimal --trace

08:38 <_florent_> gtkwave build/sim/gateware/sim.vcd

08:39 <_florent_> you'll get this:

08:39 <_florent_> https://usercontent.irccloud-cdn.com/file/Ntaxggr1/Screenshot%20from%202021-01-28%2009-39-14.png

08:40 <_florent_> which seems to exhibit the same behavior you are seeing on hardware, except that it takes a few seconds to reproduce :)

08:40 <_florent_> now let's try with other variants of VexRiscv

08:41 <_florent_> rm -rf build && lxsim --cpu-type=vexriscv --cpu-variant=lite --trace

08:44 <_florent_> the behaviour is now different:

08:44 <_florent_> https://usercontent.irccloud-cdn.com/file/jx2VROpm/Screenshot%20from%202021-01-28%2009-44-24.png

08:47 <acathla> Hum, I cannot decode the matrix that easily yet...

08:47 <_florent_> and also different with standard: rm -rf build && lxsim --cpu-type=vexriscv --cpu-variant=standard --trace:

08:47 <_florent_> https://usercontent.irccloud-cdn.com/file/GdUWqgXb/Screenshot%20from%202021-01-28%2009-47-17.png

08:48 <_florent_> where we no longer see instruction bus accesses during the while(1) puts("h") loop since instructions are in the cache

08:50 <acathla> But the standard vex cannot fit in an iCE40up5k, right?

08:51 <_florent_> acathla: sure that's not easy to decode when not familiar with it, but it was just to show you that the simulation could be useful for this work and to understand the performance issue and how to improve/avoid it

08:51 kgugala has joined #litex

08:51 <acathla> Ok, thank you.

08:53 kgugala_ has quit [Ping timeout: 260 seconds]

08:53 <acathla> Oh, standard vexriscv fits in a fomu without USB

08:54 <acathla> Since I don't need USB but Infrared, may be it can all fit.

08:55 kgugala_ has joined #litex

08:55 <_florent_> if you want to see nice optimizations on iCE40 with VexRiscv and a specific cache using the SPRAM, I would recommend looking at https://twitter.com/esden/status/1354568108510388232 :)

08:56 <acathla> omg Doom with sound!

08:58 kgugala has quit [Ping timeout: 256 seconds]

09:26 <acathla> _florent_, so you just said that for nice optimizations I should forget about litex and go verilog...

09:27 <acathla> =)

09:31 <_florent_> acathla: that's not necessarily what I'm saying :), but LiteX covers various cases and has to be generic enough to do so. It should provide a good basis and trade-off between genericity and performance, but if you really want to push performance, the first thing is to understand the real bottleneck and then maybe do some optimizations

09:32 <acathla> That's a lot of work...

09:33 <_florent_> what is a lot of work?

09:33 <acathla> SPI RAM seems to be a nice idea

09:33 <acathla> _florent_, understand every level to optimize

09:34 <acathla> and my job is to make a robot with IR communication, from scratch, better than the kilobot.

09:34 <_florent_> ah not necessarily every level, but at least the bottleneck for your use case

09:38 <_florent_> If LiteX provides you 90% of what you need and you just need to plug in a custom module (specific cache, SPI RAM), that can still be interesting compared to verilog, but for sure it relies on the same principles as other SoCs and has the same limitations

09:40 <zyp> custom interconnect, perhaps

09:40 <_florent_> I was sharing the Doom design on iCEBreaker since I find it a good example of what you can do in a constrained environment with things optimized as much as they possibly can be

09:41 <zyp> you'd probably want something more efficient than a plain shared interconnect, but smaller than a full crossbar

09:45 <acathla> I'm not sure I could explain what's a shared interconnect or a full crossbar

09:45 <zyp> _florent_, maybe a general reduced crossbar would be useful? I figure in a small system the only masters would be I-bus and D-bus and the I-bus would probably only need to access one or two of the slaves

09:46 <zyp> acathla, the interconnect is what connects wishbone masters to wishbone slaves -- a shared interconnect can be accessed by one master at a time

09:48 <zyp> so there's an arbiter to select which master to serve followed by an address decoder to select which slave to access

09:48 <acathla> usually there is only one master, right? Unless we add a bridge to debug things.

09:48 <_florent_> zyp: yes, that would indeed be useful

09:48 <zyp> acathla, no, the cpu alone has two; ibus and dbus

09:49 <acathla> to be sure : does that mean instruction-bus and data-bus?

09:49 <zyp> acathla, part of what you're seeing is that when the dbus wants to access CSR, it has to wait because the interconnect is busy with the ibus accessing code

09:49 <zyp> yes

09:50 <acathla> Can't we make the SPRAM accessible only by the CPU so it doesn't have to wait?

09:50 <zyp> a crossbar is a larger interconnect where each master has its own address decoder and each slave has its own arbiter so that masters only have to wait if multiple are trying to access the same slave at a time

09:51 <zyp> so it gets a lot bigger than a shared interconnect

09:52 <zyp> and a reduced crossbar is an optimization where each slave can only be used by the masters that actually need it

09:53 <acathla> Ok.

09:54 <zyp> the dbus generally needs to be able to access everything, because it might need to fetch data that's embedded in the code

09:54 <acathla> For dbus, I understand, but ibus

09:54 <zyp> but in your case, the ibus probably doesn't need to access anything other than spram

09:55 <acathla> I agree (and understand, yay!)

09:56 <zyp> i.e. you could put a decoder on the dbus to let it access everything, and then an arbiter only in front of the spram, to let it be accessed either by the ibus directly, or by the dbus decoder

09:57 <zyp> that way there would only be slowdowns when the dbus needs to access the spram
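
The difference zyp describes between a shared interconnect and a (reduced) crossbar can be illustrated with a toy cycle-count model. This is only a sketch under loud assumptions: every access is taken to complete in one cycle, priority simply rotates between masters, and the names `run`, `ibus`, `dbus`, `spram` and `csr` are illustrative labels, not LiteX APIs. Real Wishbone transfers and the LiteX arbiter behave differently in detail.

```python
from collections import deque

def run(requests, shared):
    """Count stall cycles per master in a toy interconnect model.

    requests: dict master -> list of slave names, accessed in order.
    shared:   True  -> one grant per cycle for the whole interconnect
              False -> one grant per cycle per slave (crossbar-style)
    Assumption: every access completes in a single cycle.
    """
    queues = {m: deque(r) for m, r in requests.items()}
    stalls = {m: 0 for m in requests}
    order = list(requests)           # current priority order
    while any(queues.values()):
        busy = set()                 # slaves already granted this cycle
        granted_any = False          # has anyone been granted this cycle?
        for m in order:
            if not queues[m]:
                continue
            slave = queues[m][0]
            blocked = granted_any if shared else slave in busy
            if blocked:
                stalls[m] += 1       # master retries next cycle
            else:
                queues[m].popleft()
                busy.add(slave)
                granted_any = True
        order.append(order.pop(0))   # rotate priority for the next cycle
    return stalls

# ibus only fetches code from spram, dbus only touches CSRs.
trace = {"ibus": ["spram"] * 4, "dbus": ["csr"] * 4}
print("shared:  ", run(trace, shared=True))
print("crossbar:", run(trace, shared=False))
```

In this model the shared interconnect makes both masters wait on each other even though they never target the same slave, while the per-slave arbitration only stalls a master when two of them want the same slave in the same cycle, which is the case zyp singles out (dbus accessing the spram).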

09:58 <acathla> I understand but not sure I can do that. It already took me hours to try to understand how the UART works

10:09 <acathla> I guess we need to modify the CPU itself as it has only a wishbone interface (well, reconfigure it as it is generated)

10:10 <acathla> or the wishbone interface is okay

10:10 <_florent_> if you are using VexRiscv, it already has separate ibus/dbus

10:18 lkcl has quit [Ping timeout: 264 seconds]

10:31 lkcl has joined #litex

10:44 Zguig has joined #litex

10:55 <_florent_> acathla: here is a quick test to add a direct connection between the ibus of VexRiscv and a peripheral (here the ROM):

10:55 <_florent_> https://www.irccloud.com/pastebin/ISAUB91o/

10:55 <tpb> Title: Snippet | IRCCloud (at www.irccloud.com)

10:56 <_florent_> you could also extend it to have access to other peripherals

10:57 <_florent_> this removes the ibus/dbus bottleneck you were seeing with vexriscv minimal:

10:58 <_florent_> https://usercontent.irccloud-cdn.com/file/IYbCEht9/Screenshot%20from%202021-01-28%2011-56-52.png

10:59 <leons> _florent_: do you happen to accept Git patches to LiteX via email or prefer GitHub PRs?

11:06 <acathla> _florent_, thank you. First time I see pop or Arbiter...

11:07 <acathla> Can I simply replace rom with spram?

11:08 Zguig has quit [Quit: Ping timeout (120 seconds)]

11:09 <zyp> rom is implemented as spram, I'd guess

11:16 <acathla> It's in SPI flash by default

11:16 <acathla> but the code is moved to ram at start

11:25 <acathla> _florent_, how could it work since the ibus is not connected anymore to the ram?

11:25 <acathla> oh, probably no instruction goes into ram in the sim

11:31 <acathla> why do you put in masters (for the Arbiter) the sram itself?

11:36 <zyp> acathla, rom starts out being connected to the shared interconnect, line 37 in the patch disconnects it and reconnects it with the new Arbiter in between

11:52 <zyp> acathla, https://bit.ly/2NF6I0R <- here's a quick illustration

11:52 <tpb> Title: Graphviz Online (at bit.ly)

12:08 <futarisIRCcloud> https://youtu.be/3ZBAZ5QoCAk

12:18 kgugala_ has quit [Quit: -a- Connection Timed Out]

12:18 kgugala has joined #litex

12:20 <_florent_> acathla: yes you can also add the spram to the arbiter

12:21 <keesj> nice video

12:21 <_florent_> leons: I have a preference for PRs but also accept patches via email

12:24 <leons> _florent_: That's great to hear! I've already started a PR for this one, but might resort to patches in the future since they tend to better integrate with my workflow and I don't have to use GitHub then :)

12:55 FFY00 has quit [Ping timeout: 240 seconds]

13:12 Bertl_zZ is now known as Bertl

13:30 <acathla> zyp, thank you for the illustration.

13:31 <acathla> I added two Arbiters, as that seemed logical, one for the ROM (spiflash) and one for the spram, but it does not seem to work.

13:46 lkcl has quit [Ping timeout: 256 seconds]

13:47 lkcl has joined #litex

13:49 <_florent_> acathla: ah sorry, the Arbiter will not work with two slaves, you'll need to use the InterconnectShared, 2s

13:51 <_florent_> something like this should work:

13:51 <_florent_> https://www.irccloud.com/pastebin/UCCQdBql/

13:51 <tpb> Title: Snippet | IRCCloud (at www.irccloud.com)

13:52 <zyp> seems a bit roundabout to have two slave ports on the main interconnect hook up to two master ports on the second interconnect :)

14:07 <acathla> _florent_, it boots!

14:11 <_florent_> zyp: indeed, that's just a quick workaround until we could support it natively :)

14:16 <acathla> And it finally works! The code was able to fill the UART FIFO so I could send a full frame instead of separated bytes !

14:16 <acathla> thank you _florent_ & zyp

14:17 <_florent_> great

14:18 <futarisIRCcloud> https://bostonarch.github.io/2021/

14:18 <tpb> Title: BARC 2021 (at bostonarch.github.io)

14:39 Zguig has joined #litex

14:42 <Zguig> Hi _florent_, just did a commit related to Linux-vexrisc/ECPIX-5 Board and saw that now there is an L2 parameter defined to 2048. This is making the board much slower at boot. Did some tests with and without this parameter: 4 secs without it, and here are the numbers: 72 secs until the random: dd message with it vs. 17 secs without this parameter.

14:42 <Zguig> Is it something normal and expected?

14:46 <_florent_> Zguig: I added this to fix issues with the OrangeCrab and boards that don't support DMs

14:46 <_florent_> this should impact performance a bit, but not that much

14:46 <_florent_> I'm going to do a test

14:51 <Zguig> I thought I had broken everything first, but after being patient it boots until the end. Did a test with same code and only commenting the parameter for the board and everything is much faster

15:30 SpaceCoaster_ has joined #litex

15:30 SpaceCoaster has quit [Ping timeout: 272 seconds]

15:51 <_florent_> Zguig: I just reverted the ECPIX5 to use direct LiteDRAM interface, with the L2 cache I get [ 23.603248] random, with the direct LiteDRAM interface: [ 8.715306] random

16:10 FFY00 has joined #litex

16:36 rohitksingh has quit [Read error: Connection reset by peer]

16:37 rohitksingh has joined #litex

16:37 alanvgreen has quit [Read error: Connection reset by peer]

16:37 alanvgreen has joined #litex

16:53 Zguig has quit [Ping timeout: 248 seconds]

17:07 kgugala has quit [Ping timeout: 256 seconds]

17:07 kgugala has joined #litex

17:57 <somlo> _florent_: commit 2287f739 is very interesting, am I really able to read/write CSRs over jtag while Linux is running on my rocket/litex rig?

17:58 <somlo> and to be precise, the "address" is relative to the soc bus start, so basically an offset as far as the cpu's view of a full MMIO address would be

18:15 FFY00 has quit [Read error: Connection reset by peer]

18:18 FFY00 has joined #litex

18:21 <somlo> _florent_: so I tried using my jtag_bone enabled rocket SoC (on nexys4ddr), with the litex_server (same setup that works with litescope_cli)

18:21 <somlo> and whatever register I'm reading always returns `0xc3bfc3bf`

18:22 <somlo> whether I write anything (else) to it beforehand or not :)

18:39 ranzbak has quit [Ping timeout: 272 seconds]

19:01 <_florent_> somlo: litex_cli --read/--write just provides a simple way to do accesses to the bus of the SoC (with no address translation)

19:02 <_florent_> somlo: if litescope_cli works, this should also work

19:02 <_florent_> with the csr.csv of the SoC in the same directory, you can try litex_cli --regs, this will make a dump of all the available registers

19:05 <somlo> aha, got it, the address *is* absolute, as shown by `--regs` output

19:08 <somlo> but weirdly, if I write `litex_client --write 0x12000004 0x12345678` then `litex_client --read 0x12000004` will return what I wrote; but if I write something else, e.g. 0x123456ff, I read back 0x123456c3

19:08 <somlo> there's some bit masking going on at least, if not something worse...

19:09 <acathla> somlo, you must write in a register where you can write, or in RAM where a program is not also writing

19:10 <somlo> I'm writing to the scratch register, which the linux driver only accesses once during boot, then leaves alone from that point on

19:11 <somlo> 0x12000004 is the scratch CSR on my litex/rocket SoC

19:12 <acathla> Ok. You can try to write to RAM, in the middle.

19:13 ranzbak has joined #litex

19:13 <somlo> not sure how reading ram would work on rocket (ram is connected to a dedicated point-to-point axi interface on the rocket chip, not shared on the same bus where the CSRs are located). So I wanted to start with baby steps -- the scratch CSR, like in the commit log example :)

19:15 <acathla> Why do people use AXI?

19:16 Bertl is now known as Bertl_oO

19:19 <somlo> acathla: I use it because Rocket exposes it as its interface with the outside world :)

19:19 <somlo> the actual pro / con between axi and wishbone is a whole different topic :)

19:33 <somlo> anyway, I prevented the thing from booting into linux, got it at the bios prompt

19:34 <somlo> writing 0xffff to the scratch register reads back 0xc3bf, but it's not a straightforward bit mask that somehow luckily avoids affecting 0x12345678, but something a bit weirder than that...

19:35 <somlo> _florent_: not sure it's only specific to rocket, haven't tried a different cpu / memory map / bus/memory interconnect scheme

20:18 <_florent_> somlo: strange, this would need to be investigated... A good use case for Litescope now that you are familiar with it :)

20:18 <_florent_> I could look at it tomorrow otherwise

20:42 <somlo> _florent_: not sure how litescope would help, but I tried `mem_read` and `mem_write` from the litex bios prompt

20:43 <somlo> initially, read: "0x12000004 78 56 34 12" (from 0x12000004)

20:44 <somlo> wrote 0x0000ffff, read back "0x12000004 ff ff 00 00" -- so it seems to work fine. It's just through the client via server and jtag interface that it turns out weird

20:47 <somlo> _florent_: writing via litex_client -> server -> jtag, I can `mem_read` the right values from the bios prompt, so writes work via jtag

20:47 <somlo> reading back (via jtag) is what's getting messed up

20:48 <_florent_> somlo: ok, could you try lowering adapter_khz in the openocd config file (in prog/openocd_xc7_ftxy.cfg)

20:49 <somlo> it's 2500 now, what's a good test value?

20:49 <somlo> 15000?

20:49 <somlo> ha

20:51 <somlo> nah, for a second I thought I'd gotten it, but nope, writes work, reads are erratic and mostly nonsensical, even with as low as 5000

20:55 <somlo> so I tried the default 25000, then 15000, then 5000, then 500 - and same result, writing works (can confirm by `mem_read` on bios prompt); reading back is all over the place, now I'm getting 0xc3bfc39e (there's that c3bf pattern again)
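
One observation about the byte pattern itself, offered as a hint rather than a diagnosis from the log: `c3 bf` is exactly what the single byte `0xff` becomes when it is treated as a Latin-1 character and written out as UTF-8, and `c3 9e` is the same transformation of `0xde`. Bytes below `0x80` pass through unchanged, which would be consistent with `0x12345678` reading back intact. The sketch below only demonstrates that encoding fact; the helper name `as_utf8_text` is made up for illustration, and where (or whether) such a conversion happens in the client/server/JTAG path is not established anywhere in this log.

```python
def as_utf8_text(raw: bytes) -> bytes:
    """Re-encode raw bytes as if each byte were a Latin-1 character
    that was then written out as UTF-8 text."""
    return raw.decode("latin-1").encode("utf-8")

# Single bytes >= 0x80 expand to two bytes.
print(as_utf8_text(b"\xff").hex())                    # c3bf
print(as_utf8_text(b"\xde").hex())                    # c39e

# Pure 7-bit data is left untouched.
print(as_utf8_text(bytes.fromhex("12345678")).hex())  # 12345678

# A 4-byte word ending in 0xff, truncated back to 4 bytes after expansion.
print(as_utf8_text(bytes.fromhex("123456ff"))[:4].hex())  # 123456c3
```

If this is what is happening, it would point at a text/binary mix-up somewhere on the read path rather than at the JTAG clock rate, but that remains an assumption to be checked.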

21:03 <_florent_> ok, I started really using the jtagbone today but haven't seen this behaviour, I'll continue more testing tomorrow

23:14 feldim2425_ has joined #litex

23:14 feldim2425 has quit [Read error: Connection reset by peer]

23:15 feldim2425_ is now known as feldim2425

23:21 Claude has quit [Disconnected by services]

23:31 vup has quit [Remote host closed the connection]

23:32 vup has joined #litex

23:36 y2kbugger has quit [Ping timeout: 268 seconds]

23:36 esden has quit [Ping timeout: 272 seconds]

23:37 y2kbugger has joined #litex

23:37 esden has joined #litex

23:37 tannewt has quit [Ping timeout: 264 seconds]

23:39 tannewt has joined #litex