<shorne>
I noticed the performance on mor1kx arty is slow ~1.6mb/sec
<shorne>
I notice that for the dma transfer we use sg_copy_to_buffer(), which will slow things down; the alternative is to use dma channels
<shorne>
I haven't implemented this before but I will look into it to see how it would help, just want to mention it to get any feedback
<shorne>
I am not sure if there is a "real" dma engine with multiple programmable channels that can handle scatter gather
<_florent_>
Hi
<_florent_>
nickoe: would you mind creating an issue in litex with your notes/questions? this will be easier to answer
<_florent_>
shorne: we don't currently have scatter gather for the SDCard's DMA
<_florent_>
but if you think this is useful, I could have a look. It seems that currently the SDCard is a lot slower in Linux than with the BIOS, we should probably try to understand that first since the limitation seems to come from the software.
<somlo>
_florent_, shorne: after having stared at the Linux litesdcard driver for a loooong time, I can't find anything it's doing significantly different from what the LiteX bios does in terms of accessing the card, other than maybe the timing of the requests (although even disabling interrupts on a single-core soc around litex_request() doesn't seem to make a difference)
<somlo>
likely I'm missing something, but the litesdcard FSMs are timing out left and right when driven by the Linux driver, and seem happy and healthy when driven by the bios...
<somlo>
_florent_, shorne: https://github.com/enjoy-digital/litex/pull/820 should ensure that 1. we never time out during sdcardboot (even on weird CPU configurations such as Rocket :) and 2. we don't run into command timeouts with the LiteSDCard FSMs in single-block (cmd17-only) mode on the Linux driver
<somlo>
This should help stabilize the current linux driver for now at no extra penalty for any other sdcard use (larger timeout values don't actually slow down the sdcard FSMs when things work *well*, they just avoid unnecessary errors and retries in linux)
<somlo>
I'm still trying to get to the bottom of why enabling cmd18 multi-block breaks horribly even with the larger timeouts
<somlo>
and will open a separate issue once I have my ducks in a row (have to remember everything I tried and write down a coherent report)
<_florent_>
thanks @somlo, the PR looks fine, I'm going to merge it. For the issue with the Linux driver, I could do some capture of the SDCard signals with an external logic analyzer to try to understand the difference between the BIOS and Linux driver.
<somlo>
_florent_: thanks, that could be helpful! I've been studying the migen sources (and doing remedial learning on streams, as you are aware)
<somlo>
I'm slow almost by definition -- anything I need to touch I need to learn from scratch, first :)
<_florent_>
That's a good approach that I also generally try to apply when time allows it :) (and your feedback while learning is very valuable).
<_florent_>
somlo: when in Linux, can you tell me how you reproduce the timeouts easily? I could try on Arty with Linux-on-LiteX-VexRiscv
<somlo>
I tried to document my understanding of how the mmc subsystem figures out everything else around that setting in the surrounding comments
<somlo>
scatter/gather is off by default (as it should be, for now)
<tmbinc>
_florent_: for programming the sds1104xe.bit, is setting the scope's jumper to "JTAG", then using openocd's "zynqpl_program"/"pld load" the right approach? (I've always used Xilinx tools before on Zynq)
<mikeK_de1soc>
_florent_: I was just wondering, is LiteVideo currently working? I would like to use it for the DE1-SoC board. Thanks!
<shorne>
_florent_: I think the reason the sdcard is slower in linux is because of sg_copy_to_buffer(), I can't easily prove it now, I'll have to think about how to profile the code in the driver.
<shorne>
in linux the driver gets a bunch of blocks in the scatter gather list, it then has to, in software, copy those into a dma buffer (with sg_copy_to_buffer) before sending to the sdcard (similar for reading).
<shorne>
that extra copy slows it down, is my assumption. With a dma engine the hardware could handle the sg lists directly by queuing multiple small dma transactions.
<shorne>
That's what I gather from my investigation last night reading the kernel code and some dma controller specs
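
The difference shorne describes, one CPU copy into a contiguous DMA buffer versus one small DMA transaction queued per scatter-gather segment, can be sketched as a toy Python model. Everything here (the `Segment` type, the function names, the `BOUNCE_ADDR` constant and the `(address, length)` descriptor tuples) is hypothetical and for illustration only; it is not the Linux litesdcard driver or any LiteSDCard interface, and it says nothing about actual throughput.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical address of the single contiguous bounce buffer.
BOUNCE_ADDR = 0x4000_0000


@dataclass
class Segment:
    """One entry of a scatter-gather list (illustrative only)."""
    addr: int    # hypothetical physical address of this chunk
    data: bytes  # payload stored at that address


def bounce_copy(segments: List[Segment]) -> Tuple[bytes, int, List[Tuple[int, int]]]:
    """Model of the copy-based path: the CPU gathers every segment into one
    contiguous buffer, then a single DMA transaction covers that buffer."""
    buf = bytearray()
    for seg in segments:
        buf += seg.data              # this is the extra software copy
    cpu_bytes_copied = len(buf)
    descriptors = [(BOUNCE_ADDR, len(buf))]
    return bytes(buf), cpu_bytes_copied, descriptors


def queue_per_segment(segments: List[Segment]) -> Tuple[int, List[Tuple[int, int]]]:
    """Model of the scatter-gather path: no CPU copy, one DMA descriptor
    is queued per segment and points at the segment's own address."""
    descriptors = [(seg.addr, len(seg.data)) for seg in segments]
    return 0, descriptors


if __name__ == "__main__":
    sg_list = [Segment(0x1000, b"a" * 512), Segment(0x8000, b"b" * 1024)]
    _, copied, one_desc = bounce_copy(sg_list)
    not_copied, many_desc = queue_per_segment(sg_list)
    print("bounce buffer: CPU copied", copied, "bytes,", len(one_desc), "descriptor")
    print("per segment:   CPU copied", not_copied, "bytes,", len(many_desc), "descriptors")
```

In this model the trade-off is visible directly: the copy-based path touches every byte on the CPU but needs only one descriptor, while the per-segment path copies nothing and instead needs a queue at least as deep as the number of segments (or software that feeds them one at a time).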
<_florent_>
tmbinc: if you generated the bitstream with the target from litex-boards, you can also use --load (it uses VivadoProgrammer)
<_florent_>
tmbinc: I could provide you some test bitstreams if you want to check your hardware (tomorrow)
<_florent_>
mithro: thanks for the pythondata repos, I'll have a closer look tomorrow and do the integration
<_florent_>
shorne: thanks, interesting. Do you know what the typical queue depth is for scatter gather in similar cases? Implementing it should not be too complicated if you think this can be useful (just adding a FIFO for the queue)
<shorne>
_florent_: I am trying to figure that out too at the moment, I know they talk about having 32/64 channels, but I think that is for each device connected, the queue depth is separate
<mikeK_de1soc>
_florent_: Do you have an example of how to implement the LiteVideo submodule?
<tmbinc>
_florent_: thanks - I've (now) noticed --load, and I'll check what Vivado does differently than my openocd setup. In neither case could I get UDP traffic working, maybe I should route the serial port to some physical pins first
<tmbinc>
(I didn't use --load but used hardware manager manually since I need a Xilinx Virtual Cable setup)
<shorne>
_florent_: it looks like for some dma engines the command queue depth is just 1, and the queue is maintained per channel on the kernel side (reading the r-car dma controller driver).
<somlo>
which is before any data transfer comes into play
<shorne>
I will read more, maybe I can try programming a 1-channel dma engine for sdcard
<shorne>
I see
<shorne>
I didn't notice the timeouts
<shorne>
I will see if that is happening for me too
<shorne>
Got to go
<somlo>
so while I agree SG will probably make it faster in the end, I'd like to understand why simply increasing the command, data, (and sure, DMA) timeouts in the linux driver to as large as we can *still* won't make the errors go away :)
<tmbinc>
_florent_: I'll port over the stuff I had (will probably look rough :) regarding the DAC analog MUX and the other components to configure the frontend. I don't remember exactly but I remember we've been using PS SPI for PLL setup, I need to check if this can be done from PL side as well
<tmbinc>
_florent_: LCD was done by G33KatWork or q3k, I need to check
<tmbinc>
(it already worked when I started on it)
<_florent_>
tmbinc: ok good you got it working. So now basically you'll be able to control the SoC over UDP with litex_server
<tmbinc>
yep - very nice! (Never worked on an ethernet-enabled litex device, so this is more friendly than I imagined :)
<_florent_>
just need to start the server with litex_server --udp and then execute your scripts to control your DAC registers