kgugala has quit [Read error: Connection reset by peer]
kgugala has joined #litex
Bertl_oO is now known as Bertl_zZ
lkcl has quit [Ping timeout: 272 seconds]
kgugala has quit [Quit: -a- Connection Timed Out]
kgugala has joined #litex
lkcl has joined #litex
kgugala_ has joined #litex
kgugala has quit [Ping timeout: 264 seconds]
<_florent_>
acathla: zyp's explanations make sense to me, if you want to ease investigation on this, you could also use litex_sim with --trace. This would give you direct visibility on all signals of the SoC
<_florent_>
Also not sure you have an instruction cache on your CPU, but if not, adding one (even a small one) could help
<acathla>
_florent_, I use vexriscv-minimal from LiteX, I don't know if there is an instruction cache.
<_florent_>
acathla: Litescope is nice for things that are difficult to simulate but here I would look at this in simulation
<_florent_>
VexRiscv-minimal has the caches disabled IIRC
<acathla>
I understand but I don't know well how it works yet. I don't know how to add something to the simulation, for example
<acathla>
unless it's available as an option
<_florent_>
acathla: litex_sim simulation is really similar to the target you are running on hardware, it's possible to customize it, add your own peripherals, etc...
<_florent_>
I can try to get you started on this, I'm going to try to reproduce the behaviour you see in simulation
<acathla>
Oh, nice :)
<acathla>
I'm not sure it's critical on the versa, as zyp said, but on the fomu it gets worse and the clock is slower, so we end up with a 32-bit microcontroller slower than everything
<_florent_>
ok, so with a minor modification to the BIOS to reproduce your behavior:
<_florent_>
where we no longer see instruction bus accesses during the while(1) puts("h") loop since instructions are in the cache
<acathla>
But the standard vex cannot fit in an iCE40up5k, right?
<_florent_>
acathla: sure that's not easy to decode when not familiar with it, but it was just to show you that the simulation could be useful for this work and to understand the performance issue and how to improve/avoid it
kgugala has joined #litex
<acathla>
Ok, thank you.
kgugala_ has quit [Ping timeout: 260 seconds]
<acathla>
Oh, standard vexriscv fits in a fomu without USB
<acathla>
Since I don't need USB but Infrared, may be it can all fit.
<acathla>
_florent_, so you just said that for nice optimizations I should forget about litex and go verilog...
<acathla>
=)
<_florent_>
acathla: that's not necessarily what I'm saying :), but LiteX covers various cases and has to be generic enough to do so. It should provide a good basis and trade-off between genericity and performance, but if you really want to push performance, the first thing is to understand the real bottleneck and then maybe do some optimizations
<acathla>
That's a lot of work...
<_florent_>
what is a lot of work?
<acathla>
SPI RAM seems to be a nice idea
<acathla>
_florent_, understand every level to optimize
<acathla>
and my job is to make a robot with IR communication, from scratch, better than the kilobot.
<_florent_>
ah not necessarily every level, but at least the bottleneck for your use case
<_florent_>
If LiteX provides you 90% of what you need and you just need to plug in a custom module (specific cache, SPI RAM), that can still be interesting compared to verilog, but it certainly relies on the same principles as other SoCs and has the same limitations
<zyp>
custom interconnect, perhaps
<_florent_>
I was sharing the Doom design on iCEBreaker since I find it a good example of what you can do in a constrained environment with things optimized as much as they possibly can be
<zyp>
you'd probably want something more efficient than a plain shared interconnect, but smaller than a full crossbar
<acathla>
I'm not sure I could explain what's a shared interconnect or a full crossbar
<zyp>
_florent_, maybe a general reduced crossbar would be useful? I figure in a small system the only masters would be I-bus and D-bus and the I-bus would probably only need to access one or two of the slaves
<zyp>
acathla, the interconnect is what connects wishbone masters to wishbone slaves -- a shared interconnect can be accessed by one master at a time
<zyp>
so there's an arbiter to select which master to serve followed by an address decoder to select which slave to access
<acathla>
usually there is only one master, right? Unless we add a bridge to debug things.
<_florent_>
zyp: yes, that would indeed be useful
<zyp>
acathla, no, the cpu alone has two; ibus and dbus
<acathla>
to be sure : does that mean instruction-bus and data-bus?
<zyp>
acathla, part of what you're seeing is that when the dbus wants to access CSR, it has to wait because the interconnect is kept busy by the ibus fetching code
<zyp>
yes
<acathla>
Can't we make the SPRAM accessible only by the CPU so he doesn't have to wait?
<zyp>
a crossbar is a larger interconnect where each master has its own address decoder and each slave has its own arbiter so that masters only have to wait if multiple are trying to access the same slave at a time
<zyp>
so it gets a lot bigger than a shared interconnect
<zyp>
and a reduced crossbar is an optimization where each slave can only be used by the masters that actually need it
<acathla>
Ok.
<zyp>
the dbus generally needs to be able to access everything, because it might need to fetch data that's embedded in the code
<acathla>
For dbus, I understand, but ibus
<zyp>
but in your case, the ibus probably doesn't need to access anything other than spram
<acathla>
I agree (and understand, yay!)
<zyp>
i.e. you could put a decoder on the dbus to let it access everything, and then an arbiter only in front of the spram, to let it be accessed either by the ibus directly, or by the dbus decoder
<zyp>
that way there would only be slowdowns when the dbus needs to access the spram
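The difference between the interconnect options zyp describes can be illustrated with a toy model. This is only a sketch and not LiteX code: it assumes every access takes a single cycle, it counts a master as stalled whenever it loses arbitration in a cycle, and the function names and the access trace are made up for illustration.

```python
# Toy model (not LiteX code): count cycles in which a master has to wait.
# Each trace entry maps a requesting master to the slave it wants this cycle.

def stalls_shared(trace):
    """Shared interconnect: one arbiter for the whole bus,
    so only one master is served per cycle."""
    return sum(max(0, len(cycle) - 1) for cycle in trace)

def stalls_per_slave(trace):
    """Crossbar (full or reduced): one arbiter per slave,
    so masters only collide when they target the same slave."""
    stalls = 0
    for cycle in trace:
        targets = list(cycle.values())
        stalls += len(targets) - len(set(targets))
    return stalls

# Hypothetical trace: the ibus fetches code from spram every cycle,
# while the dbus polls a CSR twice and then reads data from spram once.
trace = [
    {"ibus": "spram", "dbus": "csr"},
    {"ibus": "spram", "dbus": "csr"},
    {"ibus": "spram", "dbus": "spram"},
    {"ibus": "spram"},
]

print("shared interconnect stalls:", stalls_shared(trace))
print("per-slave arbitration stalls:", stalls_per_slave(trace))
```

In this model a reduced crossbar stalls exactly as often as a full one, as long as the ibus only ever targets the spram; what it saves is logic, since the ibus needs no address decoder and the other slaves need no arbiter.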
<acathla>
I understand but not sure I can do that. It already took me hours to try to understand how the UART works
<acathla>
I guess we need to modify the CPU itself as it has only a wishbone interface (well, reconfigure it as it is generated)
<acathla>
or the wishbone interface is okay
<_florent_>
if you are using VexRiscv, it already has separate ibus/dbus
lkcl has quit [Ping timeout: 264 seconds]
lkcl has joined #litex
Zguig has joined #litex
<_florent_>
acathla: here is a quick test to add a direct connection between the ibus of VexRiscv and a peripheral (here the ROM):
<leons>
_florent_: do you happen to accept Git patches to LiteX via email or prefer GitHub PRs?
<acathla>
_florent_, thank you. First time I see pop or Arbiter...
<acathla>
Can I simply replace rom with spram?
Zguig has quit [Quit: Ping timeout (120 seconds)]
<zyp>
rom is implemented as spram, I'd guess
<acathla>
It's in SPI flash by default
<acathla>
but the code is moved to ram at start
<acathla>
_florent_, how could it work since the ibus is not connected anymore to the ram?
<acathla>
oh, probably no instruction goes into ram in the sim
<acathla>
why do you put in masters (for the Arbiter) the sram itself?
<zyp>
acathla, rom starts out being connected to the shared interconnect, line 37 in the patch disconnects it and reconnects it with the new Arbiter in between
kgugala_ has quit [Quit: -a- Connection Timed Out]
kgugala has joined #litex
<_florent_>
acathla: yes you can also add the spram to the arbiter
<keesj>
nice video
<_florent_>
leons: I have a preference for PRs but also accept patches via email
<leons>
_florent_: That's great to hear! I've already started a PR for this one, but might resort to patches in the future since they tend to better integrate with my workflow and I don't have to use GitHub then :)
FFY00 has quit [Ping timeout: 240 seconds]
Bertl_zZ is now known as Bertl
<acathla>
zyp, thank you for the illustration.
<acathla>
I added two Arbiters, as that seemed logical, one for the ROM (spiflash) and one for the spram, but it does not seem to work.
lkcl has quit [Ping timeout: 256 seconds]
lkcl has joined #litex
<_florent_>
acathla: ah sorry, the Arbiter will not work with two slaves, you'll need to use the InterconnectShared, 2s
<Zguig>
Hi _florent_, just did a commit related to the Linux-vexrisc/ECPIX-5 board and saw that there is now an L2 parameter set to 2048. This makes the board much slower at boot. Did some tests with and without this parameter: 4 secs without it, and here are the numbers: 72 secs until the random: dd message with it vs 17 secs without this parameter.
<Zguig>
Is it something normal and expected?
<_florent_>
Zguig: I added this to fix issues with the OrangeCrab and boards that don't support DMs
<_florent_>
this should impact performance a bit, but not that much
<_florent_>
I'm going to do a test
<Zguig>
I thought I had broken everything first, but after being patient it boots until the end. Did a test with same code and only commenting the parameter for the board and everything is much faster
SpaceCoaster_ has joined #litex
SpaceCoaster has quit [Ping timeout: 272 seconds]
<_florent_>
Zguig: I just reverted the ECPIX5 to use direct LiteDRAM interface, with the L2 cache I get [ 23.603248] random, with the direct LiteDRAM interface: [ 8.715306] random
FFY00 has joined #litex
rohitksingh has quit [Read error: Connection reset by peer]
rohitksingh has joined #litex
alanvgreen has quit [Read error: Connection reset by peer]
alanvgreen has joined #litex
Zguig has quit [Ping timeout: 248 seconds]
kgugala has quit [Ping timeout: 256 seconds]
kgugala has joined #litex
<somlo>
_florent_: commit 2287f739 is very interesting, am I really able to read/write CSRs over jtag while Linux is running on my rocket/litex rig?
<somlo>
and to be precise, the "address" is relative to the soc bus start, so basically an offset as far as the cpu's view of a full MMIO address would be
FFY00 has quit [Read error: Connection reset by peer]
FFY00 has joined #litex
<somlo>
_florent_: so I tried using my jtag_bone enabled rocket SoC (on nexys4ddr), with the litex_server (same setup that works with litescope_cli)
<somlo>
and whatever register I'm reading always returns `0xc3bfc3bf`
<somlo>
whether I write anything (else) to it beforehand or not :)
ranzbak has quit [Ping timeout: 272 seconds]
<_florent_>
somlo: litex_cli --read/--write just provide a simple way to do accesses to the bus of the SoC (with no address translation)
<_florent_>
somlo: if litescope_cli works, this should also work
<_florent_>
with the csr.csv of the SoC in the same directory, you can try litex_cli --regs, this will make a dump of all the available registers
<somlo>
aha, got it, the address *is* absolute, as shown by `--regs` output
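Since the addresses are absolute bus addresses, a register can also be looked up by name from csr.csv. The sketch below assumes the file uses rows of the form `csr_register,<name>,<address>,<length>,<mode>`, with `#` comment lines; the embedded sample is a hypothetical excerpt (only the scratch address 0x12000004 comes from this discussion), and `load_registers` is an illustrative helper, not part of LiteX.

```python
import csv
import io

# Hypothetical excerpt; the row layout is an assumption about csr.csv.
# Only the ctrl_scratch address is taken from the discussion.
SAMPLE_CSR_CSV = """\
#--------------------------------------------------
csr_base,ctrl,0x12000000,,
csr_register,ctrl_reset,0x12000000,1,rw
csr_register,ctrl_scratch,0x12000004,1,rw
"""

def load_registers(text):
    """Return {register name: absolute bus address} from csr.csv text."""
    regs = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        if row[0] == "csr_register":
            regs[row[1]] = int(row[2], 0)  # base 0 accepts the 0x prefix
    return regs

regs = load_registers(SAMPLE_CSR_CSV)
print(f"ctrl_scratch is at 0x{regs['ctrl_scratch']:08x}")
```

The resulting address is what would be passed to `--read` or `--write` unchanged.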
<somlo>
but weirdly, if I write `litex_client --write 0x12000004 0x12345678` then `litex_client --read 0x12000004` will return what I wrote; but if I write something else, e.g. 0x123456ff, I read back 0x123456c3
<somlo>
there's some bit masking going on at least, if not something worse...
<acathla>
somlo, you must write in a register where you can write, or in RAM where a program is not also writing
<somlo>
I'm writing to the scratch register, which the linux driver only accesses once during boot, then leaves alone from that point on
<somlo>
0x12000004 is the scratch CSR on my litex/rocket SoC
<acathla>
Ok. You can try to write to RAM, in the middle.
ranzbak has joined #litex
<somlo>
not sure how reading ram would work on rocket (ram is connected to a dedicated point-to-point axi interface on the rocket chip, not shared on the same bus where the CSRs are located). So I wanted to start with baby steps -- the scratch CSR, like in the commit log example :)
<acathla>
Why do people use AXI?
Bertl is now known as Bertl_oO
<somlo>
acathla: I use it because Rocket exposes it as its interface with the outside world :)
<somlo>
the actual pro / con between axi and wishbone is a whole different topic :)
<somlo>
anyway, I prevented the thing from booting into linux, got it at the bios prompt
<somlo>
writing 0xffff to the scratch register reads back 0xc3bf, but it's not a straightforward bit mask that somehow luckily avoids affecting 0x12345678, but something a bit weirder than that...
<somlo>
_florent_: not sure it's only specific to rocket, haven't tried a different cpu / memory map / bus/memory interconnect scheme
<_florent_>
somlo: strange, this would need to be investigated... A good use case for Litescope now that you are familiar with it :)
<_florent_>
I could look at it tomorrow otherwise
<somlo>
_florent_: not sure how litescope would help, but I tried `mem_read` and `mem_write` from the litex bios prompt
<somlo>
wrote 0x0000ffff, read back "0x12000004 ff ff 00 00" -- so it seems to work fine. It's just through the client via server and jtag interface that it turns out weird
<somlo>
_florent_: writing via litex_client -> server -> jtag, I can `mem_read` the right values from the bios prompt, so writes work via jtag
<somlo>
reading back (via jtag) is what's getting messed up
<_florent_>
somlo: ok, could you try lowering adapter_khz in the openocd config file (in prog/openocd_xc7_ftxy.cfg)
<somlo>
it's 2500 now, what's a good test value?
<somlo>
15000?
<somlo>
ha
<somlo>
nah, for a second I thought I'd gotten it, but nope, writes work, reads are erratic and mostly nonsensical, even with as low as 5000
<somlo>
so I tried the default 25000, then 15000, then 5000, then 500 - and same result, writing works (can confirm by `mem_read` on bios prompt); reading back is all over the place, now I'm getting 0xc3bfc39e (there's that c3bf pattern again)
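An observation on the values quoted above, not confirmed anywhere in this discussion: 0xc3 0xbf is the UTF-8 encoding of U+00FF, so the read-back values are consistent with each returned byte being treated as a character, re-encoded as UTF-8 somewhere on the read path, and the result cut back to four bytes. The sketch below only reproduces the numbers from the log under that assumption, with big-endian byte order assumed; `mangle` is a hypothetical helper, and where (or whether) such a conversion happens in the client/server/JTAG chain is not established here.

```python
def mangle(value):
    """Model the suspected corruption: each byte is read as a character,
    re-encoded as UTF-8, and the result truncated to four bytes."""
    raw = value.to_bytes(4, "big")      # byte order is an assumption
    text = raw.decode("latin-1")        # every byte becomes one code point
    reencoded = text.encode("utf-8")    # bytes >= 0x80 grow to two bytes
    return int.from_bytes(reencoded[:4], "big")

# Values written in the log, and what this model predicts is read back.
for written in (0x12345678, 0x123456FF, 0x0000FFFF):
    print(f"wrote 0x{written:08x} -> model reads 0x{mangle(written):08x}")
```

Under this model, values made only of bytes below 0x80 (such as 0x12345678) survive unchanged, which would explain why that particular test value read back correctly.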
<_florent_>
ok, I started really using the jtagbone today but haven't seen this behaviour, I'll continue more testing tomorrow
feldim2425_ has joined #litex
feldim2425 has quit [Read error: Connection reset by peer]