st-gourichon-fid has quit [Ping timeout: 256 seconds]
<gregdavill>
_florent_: Thanks for looking into that so quickly! I've pulled your litedram changes and altered the CSR module in my design with the external reset signal. It's looking good here.
<gregdavill>
An interesting observation, I'm getting different bitslip results if I load my design via JTAG, compared to if it's loaded from FLASH.
Degi has quit [Ping timeout: 260 seconds]
Degi has joined #litex
jaseg has quit [Ping timeout: 272 seconds]
jaseg has joined #litex
guan has quit [Read error: Connection reset by peer]
bubble_buster has quit [Ping timeout: 240 seconds]
mithro has quit [Ping timeout: 272 seconds]
levi has quit [Read error: Connection reset by peer]
mithro has joined #litex
bubble_buster has joined #litex
guan has joined #litex
levi has joined #litex
m4ssi has joined #litex
kgugala_ has joined #litex
kgugala has quit [Ping timeout: 265 seconds]
m4ssi has quit [Remote host closed the connection]
st-gouri- has joined #litex
st-gourichon-f has quit [Ping timeout: 264 seconds]
kgugala has joined #litex
kgugala_ has quit [Ping timeout: 260 seconds]
gregdavill has quit [Ping timeout: 264 seconds]
leons has quit [Quit: killed]
CarlFK[m] has quit [Quit: killed]
disasm[m] has quit [Quit: killed]
sajattack[m] has quit [Quit: killed]
nrossi has quit [Quit: killed]
david-sawatzke[m has quit [Quit: killed]
john_k[m] has quit [Quit: killed]
xobs has quit [Quit: killed]
david-sawatzke[m has joined #litex
gregdavill has joined #litex
xobs has joined #litex
disasm[m] has joined #litex
john_k[m] has joined #litex
CarlFK[m] has joined #litex
nrossi has joined #litex
sajattack[m] has joined #litex
leons has joined #litex
kgugala_ has joined #litex
scanakci has quit [Quit: Connection closed for inactivity]
kgugala_ has quit [Read error: Connection reset by peer]
kgugala has quit [Ping timeout: 258 seconds]
kgugala has joined #litex
<_florent_>
gregdavill: indeed i also get different bitslip results when loading multiple times via JTAG, that's the next thing to investigate :)
<somlo>
_florent_, gregdavill: memtest on rocket+litex on the trellisboard (ecp5-85k) used to fail 30-50% of the time, depending on the week. After yesterday's litedram update, it's down to 10% :) I used to think it had maybe something to do with the board and chips warming up after a while (when the error rate would decrease)...
<somlo>
not sure I'm helping, but figured I'd throw in an extra data point, fwiw...
<somlo>
never made it to actually tinkering with the litedram settings in a systematic way, so thanks for doing that!
Skip has joined #litex
gregdavill has quit [Ping timeout: 240 seconds]
<_florent_>
somlo: thanks for the feedback
CarlFK has quit [Ping timeout: 240 seconds]
CarlFK has joined #litex
<st-gouri->
Hi! We are sending bulk (around 150kbytes total) data to a wishbone target, and it's very very slow, around 50 bytes per second. Is there some documentation about overhead and what we could do?
<lf>
st-gouri-: i am new to this but could you give some extra info, like: is that over a bridge, how many masters are on the bus?
<st-gouri->
lf, sure, thanks.
<st-gouri->
To be clearer: the PC runs wishbone client code in Python, connected to a litex_server; that litex_server sends data through a 3-wire UART to the design. So far so good?
<st-gouri->
The bulk data is used to drive a second UART that sends the bulk data to some other device.
<st-gouri->
Currently, we send bytes one by one and it's not clear if that is the actual performance killer.
<st-gouri->
We have understood that an event register is necessary to read data back to our client. Writing 2 to the UART_EV_PENDING register signals that we have read the byte from UART_RXTX, so that the design makes the next byte received by that UART available at the register.
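A minimal sketch of that read-side handshake as seen from a litex_server client; the CSR names (uart_rxtx, uart_ev_pending) and the RX bit value (2) follow the messages above, but the exact names depend on how the second UART was added to the SoC:

```python
from litex import RemoteClient

UART_EV_RX = 0x2  # writing 2 to EV_PENDING acks the RX event, as described above

def read_rx_byte(wb):
    byte = wb.regs.uart_rxtx.read()            # byte currently held in UART_RXTX
    wb.regs.uart_ev_pending.write(UART_EV_RX)  # ack so the design presents the next byte
    return byte

wb = RemoteClient()  # talks to a running litex_server
wb.open()
print(hex(read_rx_byte(wb)))
wb.close()
```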
<zyp>
AFAIK the uart protocol does 32-bit transfers, so if you're only using 8 of them, that's a 4x overhead in itself
<lf>
but still you should be able to get nearly 1000 requests per sec
<st-gouri->
lf, interesting.
<zyp>
what baudrate does the bridge run at?
<st-gouri->
Let me check.
<st-gouri->
zyp, bridge at 115200 bauds.
<st-gouri->
Any idea about the overhead of a request?
<zyp>
are you doing interleaved reads and writes as well?
<zyp>
if so, you're also subject to the latency of any usb-uart involved
<zyp>
I've found that one of my usb-uart adapters takes like 10x longer to transfer a litescope buffer than another, because it's using a buffering strategy optimized for throughput rather than latency, or something like that
<_florent_>
st-gouri-: i also think it could be related to interleaved reads/writes as zyp suggests
<_florent_>
st-gouri-: the UART bridge is not very fast, but it shouldn't be that slow...
<st-gouri->
How many bytes are sent by litex_server for each request?
<zyp>
I'm guessing 8
<zyp>
four for the address and four for the data
<st-gouri->
Sounds reasonable.
<lf>
cmd, length, 4x addr, 4x data i think for write
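A rough sketch of the framing lf describes, just to show where the ~10 bytes per single 32-bit write go; the command value and the big-endian layout here are assumptions, not taken from the LiteX sources:

```python
import struct

CMD_WRITE = 0x01  # hypothetical command value

def frame_write(addr, word):
    # 1 cmd + 1 length + 4 address + 4 data = 10 bytes on the wire per 32-bit write
    return struct.pack(">BBII", CMD_WRITE, 1, addr, word)

print(len(frame_write(0x82000000, 0x41)))  # -> 10
```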
<st-gouri->
Regarding interleaved read and write, in spirit yes, let me check.
<zyp>
ah, yeah, of course there has to be a cmd as well
<zyp>
yeah, lf is correct, here's a usb-capture I did of wishbone-bridge traffic some weeks ago
<st-gouri->
lf, both from litex. Not much activity inside the design at this point in time.
<lf>
st-gouri-: so only one master on the bus?
<st-gouri->
zyp, interesting.
<st-gouri->
Mhmhm, most probably only one active master on the bus, the uart fed by the litex_server.
<zyp>
st-gouri-, what is your goal for what you're doing?
<st-gouri->
Our design is a kind of I/O chip for an external (separate chip) CPU.
<zyp>
and you're planning to use uart for the interface towards the CPU?
<st-gouri->
The UART that we drive from the PC, along with some GPIO pins, is now capable of sending the correct sequence to boot the CPU (microcontroller). But it's slow.
<st-gouri->
Booting the CPU involves two (2) GPIOs and one (1) UART.
<zyp>
and what will drive this?
<st-gouri->
zyp, not sure what you're asking.
<lf>
ok, if it looks like this: "INFO:SoCBusHandler:Interconnect: InterconnectShared (1 <-> 2)." then i can cross a rogue bus master off my list and move on to the python/usb side.
<zyp>
st-gouri-, right now you're using a PC to talk to the wishbone bridge
<st-gouri->
After the CPU is booted, it will drive the I/O chip by targeting the wishbone bus. Probably through SPI.
<st-gouri->
In the field, the design will manage to setup the CPU by itself. No PC will be needed.
<zyp>
so if you won't use the uart bridge in the field, I guess the performance of it is kinda moot
<st-gouri->
One simple solution in the field is to have the design (in an FPGA) boot a softcore that runs some code to target wishbone. Same actions, lower overhead.
<st-gouri->
zyp, not so moot, because currently, to boot the CPU we need to send 160k of data and that takes 41 minutes. That is a problem.
<zyp>
if you want a faster bridge, I guess you could try valentyusb
<st-gouri->
In another setup, the PC directly driving the CPU with an UART, we get 10kbytes/s, not 50 bytes/sec.
<st-gouri->
zyp, at this point I'm trying to understand the actual source of slowness. (I have a simple external logic analyzer at my disposal, too.)
<lf>
load a bitstream to connect the pins, boot the cpu, then load a new bitstream?
<lf>
you are polling the status register to see if it finished?
<_florent_>
st-gouri-: are you able to provide a minimal design to reproduce the issue?
<_florent_>
i could look at this
<st-gouri->
lf, as planned in the field: the FPGA gets the design from flash, which includes a small softcore CPU. That CPU boots from the same flash and runs software. That software drives the GPIO and UART through wishbone. These boot the main processor. Then the main processor (fast) can do whatever, targeting wishbone through SPI.
<_florent_>
the bridge supports bursts, but not sure the current software makes use of it
<st-gouri->
lf, which status register? EV_PENDING ?
<st-gouri->
_florent_, bursts to same address?
<_florent_>
st-gouri-: no this is incrementing
<_florent_>
st-gouri-: but we could eventually add a different command for non-incrementing writes
<st-gouri->
If the UART has, say, a 16-byte buffer, and we send in bursts of 16 bytes, then we might have only 60% overhead instead of 1000%.
<st-gouri->
But I suspect there is something else.
<_florent_>
st-gouri-: yes i also suspect there is something else
<_florent_>
can you share the part of the python code on the host that is doing the upload?
<st-gouri->
Yes.
<st-gouri->
Will be all open-source eventually, but is not yet. Will pastebin parts.
<lf>
and can you hook up a logic analyser to both uarts? maybe the USB-UART adapter is really slow with short messages
<st-gouri->
One question about EV_PENDING. Is it set after any byte is sent? Or when the send buffer becomes empty?
<st-gouri->
That's very nice for a local ISR. This allows nice pipelining to use the full UART bandwidth. Cool.
<_florent_>
st-gouri-: could you do a quick test to see if the issue is related to the interleaved reads/writes: remove the while (self.isTxFull()) and use a time.sleep() after self._wb.regs.uart_rxtx.write(byte)
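A sketch of the quick test _florent_ suggests: drop the isTxFull() polling (the per-byte interleaved read) and pace the writes with a sleep instead; the attribute and method names follow the snippets quoted above and are otherwise assumptions:

```python
import time

def upload(self, data, delay=0.001):
    for byte in data:
        # previously: while self.isTxFull(): pass   <- one read per written byte
        self._wb.regs.uart_rxtx.write(byte)
        time.sleep(delay)  # pace the writes instead of polling the TX-full flag
```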
<st-gouri->
MMh, I have a strange unrelated error "index out of range", have to check that.
<st-gouri->
pycharm does not let me know exactly where the exception is raised.
FFY00 has quit [Ping timeout: 260 seconds]
<st-gouri->
Ah, some progress.
FFY00 has joined #litex
FFY00 has quit [Remote host closed the connection]
FFY00 has joined #litex
<st-gouri->
_florent_, I simply disabled the call to isTxFull(), and got the speed up to 973 bytes/sec.
<st-gouri->
So, the problem is most probably the interleaved read/writes.
<st-gouri->
973 bytes/sec is much more in line with what is expected from a 10-byte-long packet sending one payload byte.
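A back-of-the-envelope check of that figure, assuming 8N1 framing on the bridge UART and ~10 bridge bytes per payload byte:

```python
bridge_baud = 115200
chars_per_s = bridge_baud / 10    # 8N1: 10 bit times per UART character
payload_per_s = chars_per_s / 10  # ~10 bridge bytes per payload byte
print(payload_per_s)              # ~1152 bytes/s, so 973 observed is in the right range
```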
<st-gouri->
The transfer was successful and took 2 minutes 25 seconds instead of 41 minutes when interleaving reads and writes.
<st-gouri->
I know it's successful because the CPU has definitely booted and runs our code.
<lf>
ya, i would say zyp is right about the slow uart adapter. it probably takes some time for it to flush its buffer.
<st-gouri->
Is the problem in the design of the UART, litex-level?
<st-gouri->
Or in a third-party USB-UART bridge?
<lf>
not sure. but i do know that some ppl curse about uart adapters buffering small transfers for long times.
<st-gouri->
In this experiment, the UART adapter is soldered in the TinyFPGA BX board being used.
<st-gouri->
Ah, no wrong.
<st-gouri->
The UART advertises FTDI TTL232R-3V3 idVendor=0403, idProduct=6001, bcdDevice= 6.00
<st-gouri->
Do you expect we might have better performance with another one?
<lf>
ya i never had that problem so i don't know.
<lf>
and if it were a problem with that chip i don't think they would use it
<st-gouri->
I was wrong when mentioning the TinyFPGA BX. It's not that one. It's a separate independent cable, with the device details I provided.
<st-gouri->
So, now I'm sending data as fast as I can without checking the TX-Full register, and it definitely can't be full, because it has n times more time than it needs, where n is the size of the etherbone packet!
<lf>
ok the ftdi should flush its buffer every 16ms
<lf>
mmh, that's like 60 Hz, or you know, ~50 bytes over the uart
<st-gouri->
"For application programmers it must be stressed that data should be sent or received using buffersand not individual characters." ... well, this protocol kind of needs interleaving read and writes.
<st-gouri->
"3.2Adjusting the Receive Buffer Latency Timer" -> howdo you know how to to that on Linux? From Python?
<lf>
ya, the problem is the read, as the response from the bridge gets stuck in the buffer
<tpb>
Title: FTDI Linux USB latency - Granite Devices Knowledge Wiki (at granitedevices.com)
<st-gouri->
Thanks lf for those relevant links!
<st-gouri->
Setting latency_timer to 1 I get 218b/s instead of 50b/s.
<st-gouri->
This shows that the latency timer is indeed involved in the delay.
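One way to set that timer from Python on Linux is to write the sysfs attribute exposed by the ftdi_sio driver; the ttyUSB0 path here is an example and the file needs write permission:

```python
def set_ftdi_latency(tty="ttyUSB0", ms=1):
    # same effect as: echo 1 > /sys/bus/usb-serial/devices/ttyUSB0/latency_timer
    with open(f"/sys/bus/usb-serial/devices/{tty}/latency_timer", "w") as f:
        f.write(str(ms))

set_ftdi_latency("ttyUSB0", 1)
```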
<lf>
well, that's that mystery solved. but how to solve the problem?
<st-gouri->
Let n=10 be the size of an Etherbone packet writing one byte to the UART.
<st-gouri->
As long as the baudrate PC side is not higher than n times the baudrate on the other UART, we're safe.
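A quick sanity check of that rule, assuming both UARTs run at 115200 baud and n = 10 bridge bytes per payload byte:

```python
n = 10                    # bridge bytes per payload byte
bridge_baud = 115200      # bridge UART (PC side)
downstream_baud = 115200  # second UART (towards the CPU)

assert bridge_baud <= n * downstream_baud  # the stated safety condition
print((bridge_baud / 10) / n,              # ~1152 payload bytes/s arriving
      downstream_baud / 10)                # ~11520 bytes/s drained: no overflow
```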
<lf>
lol yes
<st-gouri->
And... guess what the next step will be to gain some performance. ;-)
<st-gouri->
More seriously, what we have done today is very good. We understood the reason for such slow performance, got a fix, and proved that, ugly as it looks, it is actually safe.
<lf>
do i even dare guessing
<st-gouri->
Currently, the litex_server runs at 115200. One solution is to pump it up to 10 times that speed and call it a day.
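A sketch of what that might look like on both ends, assuming the SoC uses add_uartbone() for the bridge (argument names vary between LiteX versions); the matching litex_server invocation is shown as a comment:

```python
from litex.soc.integration.soc_core import SoCCore

# Fragment of a hypothetical build script, not a complete design.
class MySoC(SoCCore):
    def __init__(self, platform, sys_clk_freq):
        SoCCore.__init__(self, platform, sys_clk_freq, cpu_type=None, with_uart=False)
        # wishbone bridge over UART at 10x the usual rate
        self.add_uartbone(name="serial", baudrate=1152000)

# Host side must match, e.g.:
#   litex_server --uart --uart-port /dev/ttyUSB0 --uart-baudrate 1152000
```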
<lf>
st-gouri-: can you just bypass all the logic and switch the bypass off after you are done? but sure, if it's only for development it's probably the best solution
<st-gouri->
Bypass all which logic?
<lf>
my writing is bad
<lf>
just connect uart_a to uart_b with comb logic, like a mux on the tx pin
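A very rough Migen sketch of that mux on the tx pin; the signal names are placeholders, not taken from the actual design:

```python
from migen import Module, Signal, Mux

class TxBypassMux(Module):
    # Placeholder names: pc_rx is the PC-facing UART RX, soc_tx is what the SoC
    # logic would normally drive, cpu_tx_pin is the pin feeding the external CPU's UART.
    def __init__(self, pc_rx, soc_tx, cpu_tx_pin):
        self.bypass = Signal()
        self.comb += cpu_tx_pin.eq(Mux(self.bypass, pc_rx, soc_tx))
```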
FFY00 has quit [Read error: Connection reset by peer]
FFY00 has joined #litex
FFY00 has quit [Remote host closed the connection]
FFY00 has joined #litex
<st-gouri->
Mfmfmf. We would lose the multiplexing property of the wishbone bridge. Would need to kill litex_server to free the UART. Then would need some way to revert. Once booted the CPU can do that. That could actually work.
<st-gouri->
If we can beef up the first UART to 1152000 we have the same benefits and no downside.
<lf>
true
<st-gouri->
Still, it's interesting.
<st-gouri->
Many ideas flying. Even if we don't implement most, it's still interesting.
FFY00 has quit [Read error: Connection reset by peer]
FFY00 has joined #litex
m4ssi has joined #litex
<zyp>
sorry, I had to go put the kid to bed :)
<zyp>
the 16ms buffer flush that lf is quoting sounds like the same thing I ran into
<zyp>
one of the adapters I used was based on FT232X
<zyp>
and a different adapter (based on a stm32 running a usb-uart firmware) got 10x the throughput at the same baudrate
<zyp>
before I left I proposed looking into running the valentyusb bridge, i.e. usb directly to the fpga instead of uart
<lf>
zyp: we could add an option to send an event character at the end of a read. but you would still need to configure that in the ftdi chip
<zyp>
I haven't tested the performance of that myself, but it should eliminate the buffer latency completely
<zyp>
the valentyusb bridge stuff is based on control requests, which usb should be able to send a couple thousand of per second, so if I'm guessing, you might see a kilobyte or more per second from that
<zyp>
bottleneck is going to be how fast the host side is passing requests and replies between userland and the usb controller
<lf>
ya
<st-gouri->
reading
<st-gouri->
Thanks for the hints.
<lf>
i would need to look into litescope, but maybe there is a way to change the reading behaviour to get less overhead when reading, or optimise it more for this buffering behaviour
<st-gouri->
TinyFPGA BX and Fomu use valentyUSB for their bootloader, IIRC.
<zyp>
st-gouri-, yes
<zyp>
lf, the problem with the wishbone bridge is that currently the only way to get flow control is to poll a register in between reads or writes
<zyp>
although litescope shouldn't need that when dumping the buffer, so the protocol could be extended to add a «read address A N times»
<zyp>
as far as I can see, the length argument is not considered for reads currently, only writes
<zyp>
and non-incrementing reads and writes are necessary for dealing with fifo registers
<lf>
but does that help? it will just read the address as fast as the bus can. i think using the event char or the flow control lines of the ftdi chip to force a buffer flush would be more helpful.
<lf>
or using the usb bridge
<zyp>
help what? it'd help for litescope
<zyp>
the slow part of using litescope is dumping the capture buffer after the capture is finished
<lf>
ah, i have not read how litescope transfers data. but if that is behind one address then yes, that should give a big boost
<zyp>
yeah, litescope puts everything behind a set of CSRs
<zyp>
one of them pops from a fifo
<zyp>
actually, it's not even doing flow control, it's just the constant back and forth between «READ 1 from ADDR» «DATA» with latency in between
<lf>
ya you could just not wait for the response and send the next request
<zyp>
that is true
<zyp>
but I wonder if that would risk overflowing the uart on the fpga
<zyp>
although as long as it has deep enough buffers it should be fine
<lf>
i don't think UARTBone has a buffer. only the CSR UART has fifos.
<lf>
ah there is a "FT245 Asynchronous FIFO mode" but i think that uses extra pins
<zyp>
but we're discussing software fixes now :)
<lf>
ya
<zyp>
hardware solutions are not very useful if you already have a hardware design you don't want to modify
<st-gouri->
will go afk
<lf>
but i think you are right, that would overwhelm the uart.
<zyp>
yeah, it'd need enough buffering that it could start receiving the second read while replying to the first
<lf>
you could set the event char to zero because reading a csr register will always return 3 zero bytes
<zyp>
not for 32-bit CSRs
<lf>
arg right
<lf>
we need our own uart adapter that is uartbone aware
<st-gouri->
What? Sorry.
<zyp>
lf, just have the rx buffer flush at four bytes :)
<st-gouri->
A modified UARTbone could be fed 4 bytes at a time... but how to tell it there are only 3 bytes?
<lf>
ya, or let it handle the reads/writes and you just give it a list of the addresses you need
<st-gouri->
The fourth byte would be a status to tell if there's one, two or three bytes in the packet?
<st-gouri->
Overhead divided by 3.
<st-gouri->
Not only when driven through litex_server. Also locally.
<st-gouri->
Unused bits of the fourth byte could be UART lines.
<st-gouri->
But I digress.
<lf>
ya, for now the biggest delay is the 1-16ms buffer delay of the ftdi. and without changing the bitstream, the only way to make that fast is to put litex_server right at the adapter
<lf>
mmh pi zero?
<st-gouri->
Seeya.
<lf>
bye
<lf>
i think when i get that problem i will just try some cortex-a with linux and run litex_server on that with its native uart.
Skip has quit [Remote host closed the connection]
<lf>
n8
daveshah has quit [Read error: Connection reset by peer]
daveshah has joined #litex
st-gouri- has quit [Ping timeout: 240 seconds]
m4ssi has quit [Remote host closed the connection]