futarisIRCcloud has quit [Quit: Connection closed for inactivity]
CarlFK has joined #litex
<shuffle2>
would it make sense to use dsp blocks for ip checksum on ecp5? clarity has an "adder_tree" which results in ALU54Bs+MULT18X18Ds..i dont see perf/operation really described anywhere tho
<shuffle2>
alternatively i can just hack it up as most of header is static in my case, but Feels Bad :)
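For reference, the IP header checksum discussed above is a 16-bit ones' complement sum (RFC 1071), so the "mostly static header" shortcut amounts to pre-summing the constant words and adding only the fields that change. A minimal Python sketch of both, assuming total length and identification are the only variable fields; the function names are illustrative and not taken from LiteEth or any Lattice IP:

```python
def ones_complement_sum16(data: bytes) -> int:
    """Fold a byte string into a 16-bit ones' complement sum (RFC 1071)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # end-around carry
        total = (total & 0xFFFF) + (total >> 16)
    return total


def ipv4_checksum(header: bytes) -> int:
    """Checksum of a header whose checksum field is currently zero."""
    return ~ones_complement_sum16(header) & 0xFFFF


def checksum_from_partial(static_sum: int, *variable_words: int) -> int:
    """Finish a checksum from a precomputed sum of the static words."""
    total = static_sum + sum(variable_words)
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# 20-byte example header with the checksum field zeroed.
header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
print(hex(ipv4_checksum(header)))  # 0xb861

# Same header with total length and identification zeroed: the static part.
static = bytes.fromhex("450000000000400040110000c0a80001c0a800c7")
static_sum = ones_complement_sum16(static)
print(hex(checksum_from_partial(static_sum, 0x0073, 0x0000)))  # 0xb861
```

In hardware the second form needs only a couple of 16-bit adds with end-around carry per packet, which may be cheap enough in plain fabric without DSP blocks.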
Skip has quit [Remote host closed the connection]
Degi has quit [Ping timeout: 272 seconds]
Degi has joined #litex
CarlFK has quit [Ping timeout: 260 seconds]
_whitelogger has joined #litex
FFY00 has quit [Read error: Connection reset by peer]
FFY00 has joined #litex
HoloIRCUser has joined #litex
HoloIRCUser2 has quit [Ping timeout: 265 seconds]
awordnot has quit [Ping timeout: 240 seconds]
awordnot has joined #litex
CarlFK has joined #litex
HoloIRCUser1 has joined #litex
HoloIRCUser has quit [Ping timeout: 256 seconds]
futarisIRCcloud has joined #litex
<_florent_>
scanakci: nice! congrats
<_florent_>
benh: not sure we tried to optimize the memtest speed since it was fast enough with the other CPUs, but it could now indeed make sense to have a closer look
HoloIRCUser has joined #litex
HoloIRCUser1 has quit [Ping timeout: 272 seconds]
<benh>
_florent_: oh I just used the lfsr that Anton wrote for microwatt test suite and it's a lot faster now
<benh>
_florent_: VexRiscv Mini takes a couple of seconds to init the Arty now. It was a lot slower before, maybe 5 to 10s ?
<benh>
_florent_: I'll clean it up and send you the patch tonight (hopefully)
<benh>
_florent_: it's as good a rng as the multiplication method when it comes to generating memory test patterns I reckon
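To illustrate why an LFSR is attractive for memory test patterns: each step is a shift and a conditional XOR, with no multiply. The sketch below is an assumption-laden Python model, not the LFSR from the microwatt test suite or the LiteX BIOS; the 32-bit Galois feedback mask `0x80200003` is a commonly cited choice, and the function names are made up for illustration.

```python
LFSR_MASK = 0x80200003  # assumed feedback mask, not taken from the patch


def lfsr32_step(state: int) -> int:
    """Advance a 32-bit Galois LFSR by one step (right shift)."""
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= LFSR_MASK
    return state


def memtest_fill(mem: list, seed: int = 1) -> None:
    """Write the pseudo-random sequence into every word of mem."""
    state = seed
    for i in range(len(mem)):
        state = lfsr32_step(state)
        mem[i] = state


def memtest_check(mem: list, seed: int = 1) -> int:
    """Regenerate the sequence from the same seed and count mismatches."""
    state = seed
    errors = 0
    for word in mem:
        state = lfsr32_step(state)
        if word != state:
            errors += 1
    return errors


mem = [0] * 1024
memtest_fill(mem)
print(memtest_check(mem))  # 0
mem[17] ^= 0x4             # simulate a stuck or flipped bit
print(memtest_check(mem))  # 1
```

The seed must be non-zero, since an all-zero state never changes.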
<benh>
_florent_: sent
<_florent_>
benh: thanks!
HoloIRCUser1 has joined #litex
HoloIRCUser has quit [Ping timeout: 272 seconds]
<benh>
_florent_: for 64-bit CPUs, would it make sense to have a topology with a 64-bit WB going out with 2 slave legs, a 64-bit one to memory and a converted-to-32-bit one for all the IOs ?
<benh>
_florent_: also powerpc64 is really meant to be used on fully cache coherent systems (I could elaborate on the reasons if you want)
<benh>
_florent_: so at some point, we might need to figure out how to implement cache coherent DMA in LiteX :)
<benh>
_florent_: my initial plan was to do something like a snoop-fifo where all addresses go back to the core, and have the core's 'sync' instruction go out as a special signal that waits for that fifo to drain
<benh>
(well or drain anything prior to that signal being asserted)
<_florent_>
benh: we are currently working on similar things with VexRiscv SMP
<benh>
but it's just ideas so far
<sorear>
non-coherence on riscv sounds like a bad time
<benh>
ah SMP would require a real cache coherency protocol
<benh>
yeah non-coherence is a mess on CPUs with speculative loads
<sorear>
especially when there are 0 data cache instructions in your ISA
<_florent_>
VexRiscv SMP has two dedicated instruction/data ports directly connected to the DRAM with a larger data-width (128-bit for now to ease testing)
<benh>
bcs you almost always end up with weird collisions between cachable and non-cachable loads
<benh>
happens on ARM since v7, powerpc, ...
<benh>
sorear: yeah that doesn't help :-)
<benh>
_florent_: some kind of MERSI protocol ?
<_florent_>
we are also going to work on cache coherent DMA, so most of it could probably be reused by Microwatt later
<benh>
ok good
<benh>
thx
<benh>
one thing to avoid btw... I noticed it's somewhat doable currently
<benh>
is have the control reg path to the DMA engine be a separate bus from the data path where DMAs occur
<_florent_>
the implementation still has to be discussed with Charles, who is doing the VexRiscv SMP work
<benh>
it's a recipe for interesting ordering issues
<benh>
for example that happened on some IBM Cell that used a sideband bus to control the ethernet DMA engine
<benh>
you would stop the engine via that (CSR style)
<benh>
but the writes it did to memory might still be in various pipeline/bridge buffers and have not reached memory yet
<benh>
however, SW has already freed the memory and given it up to something else, it then gets corrupted by DMA data
<benh>
ie, unless the control registers are on the same data path as the DMA, or some other mechanism allows to ensure that the full DMA path to memory has been flushed, that problem will potentially exist
<benh>
on things like PCIe it's typically a non-issue despite fairly large bridge induced latencies because the control path is ordered vs the data path, so for example, reading a DMA status reg will have the read response behind all previous DMA writes to memory
<benh>
by the time the CPU gets it, all the previous DMAs have hit coherency
<benh>
when control is via some CSR bus that might not be on the same path as the DMA -> memory path, you lose that property
<_florent_>
thanks interesting, we'll have indeed to be careful on these things
<benh>
the best way is to ensure that the control path from the CPU to a device (MMIO) is ordered in some way with the DMA data path from that device to memory
<benh>
yup, it's bitten folks in the past :)
<benh>
when adapting esp. old school "simple" design to more recent CPUs, adding bridges etc...
<benh>
for example powerpc 4xx embedded has a "DCR" bus (a bit like CSRs but special core instructions)
<benh>
that's a complete sideband
<benh>
it used that to talk to the DMA engine
<benh>
that was ok when there were only simple busses and, generally, not much buffering
<benh>
but that whole architecture was then ported to an IO chip for the Cell processor with 3 layers of bridging and pipelining
<benh>
and hell broke loose
<benh>
the most typical example is probably old school PCI devices (before MSIs)
<benh>
a device writes a packet to memory, then sends an interrupt
<benh>
the interrupt is a wire, so out of band, it often arrives before the DMA data has reached coherency (or a point where it's visible to the CPU)
<benh>
it could be in pipeline buffers on the bus or anything
<benh>
the CPU gets the interrupt and reads an MMIO register from the device, usually some kind of interrupt status
FFY00 has quit [Ping timeout: 244 seconds]
FFY00 has joined #litex
<benh>
what a lot of folks didn't realize is that the key purpose of that read is not only to know what happened
<benh>
but to have the response to that read be queued behind all the previous DMAs done by the device so that by the time the CPU gets it
<benh>
it will also "see" all the DMA data
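The ordering argument above can be shown with a toy Python model. It assumes a single FIFO from the device towards the CPU that carries both DMA writes and MMIO read responses, while the interrupt is a separate wire. All class and function names here are hypothetical and do not model any real bus or LiteX component.

```python
from collections import deque


class OrderedPath:
    """One FIFO shared by DMA writes and MMIO read responses."""

    def __init__(self):
        self.fifo = deque()   # stands in for bridge/pipeline buffers
        self.memory = {}      # what the CPU can actually observe

    def dma_write(self, addr, value):
        self.fifo.append(("write", addr, value))

    def mmio_read_response(self, value):
        self.fifo.append(("resp", value))

    def drain_until_response(self):
        """Deliver queued items in order until the read response arrives."""
        while self.fifo:
            item = self.fifo.popleft()
            if item[0] == "write":
                self.memory[item[1]] = item[2]
            else:
                return item[1]
        raise RuntimeError("no read response queued")


def cpu_sees(do_status_read: bool):
    """Return what the interrupt handler finds at the packet address."""
    path = OrderedPath()
    path.dma_write(0x1000, "packet")  # still in flight in the buffers
    # The interrupt wire fires here, out of band, ahead of the data.
    if do_status_read:
        # The device's response is queued behind its earlier DMA writes.
        path.mmio_read_response("irq_status")
        path.drain_until_response()
    return path.memory.get(0x1000)


print(cpu_sees(do_status_read=False))  # None
print(cpu_sees(do_status_read=True))   # packet
```

If the status register lived on a separate sideband bus, the read response would not pass through this FIFO and the second case would behave like the first.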
<benh>
now, apologies if I haven't fully understood the LiteX design, but from what I've seen, it *seems* like your DMA engine has its own port to the memory controller
<benh>
thus is not ordered vs CSRs to device that can trigger DMA
<benh>
as long as there isn't much buffering/pipelining it's probably fine
<benh>
but in a world of delays introduced by cache coherency protocols etc... that can quickly fall apart
<benh>
I hope I'm clear :) Otherwise let me know
FFY00 has quit [Ping timeout: 260 seconds]
FFY00 has joined #litex
<shuffle2>
LiteEthPHYHWReset is just a delay counter. does this exist for some specific reason?
<dkozel>
Mine should arrive tomorrow but I don't have any way of using it until I find an interface that will work outside of the computer
<dkozel>
the heatsink/fan is too large for my desktop's m.2 slot and I don't have room for another full size PCIe card.
<_florent_>
you also need a specific cable for the JTAG (i soldered it on mine, but will try to find/order a cable)
<dkozel>
Thus my interest in the USB 3 PIPE interface or an m.2 thunderbolt enclosure. I've been very confused about the seeming specificity of NVME support in the adapters
<shuffle2>
(i wanted to keep some other modules in reset if link is down)
<shuffle2>
is there a way to chain reset of some clock domains from another? when i click reset button i'd like eth_tx/eth_rx to reset as well as sys (it's currently tied to sys via AsyncResetSynchronizer)
<shuffle2>
eh, using ResetSignal('sys') is good enough i guess
gregdavill has quit [Ping timeout: 240 seconds]
FFY00 has quit [Remote host closed the connection]
FFY00 has joined #litex
<_florent_>
shuffle2: yes sure it would be possible to integrate it, this could be an optional module
<dkozel>
xobs: Works, perfect.
<xobs>
dkozel: Released! (Or at least pushed the tags. Give it five minutes or so.)
<dkozel>
That last commit was a real reshuffle. Thanks for all the time that must have taken.
<xobs>
It seemed the best way to do it.
<xobs>
As a result, offsets are computed for everything, including uart and gdb fields.
<dkozel>
gdb I haven't tried yet, but terminal worked.
<zyp>
xobs, is «load» in gdb supposed to work?
<zyp>
it doesn't seem to be working properly for me, but I haven't looked into why yet, I guess it might not be fully implemented so I figured I'd ask before spending time to only discover that
<xobs>
zyp: nope. it's unclear how that would work.
<zyp>
how so?
<xobs>
Well, if your program is XIP, then it would need to know how to program flash. And I gather most programs are XIP.
<xobs>
I guess if it's entirely in RAM that would work.
<xobs>
But no, it hasn't been implemented yet.
<zyp>
ah, yes, flash would require knowledge of the specific flash, I was thinking ram
<zyp>
I didn't know litex supported XIP flash, does it?
<tpb>
Title: GitHub - litex-hub/litespi: Small footprint and configurable SPI core (at github.com)
<zyp>
I've written flash programming code for a couple of various microcontrollers for black magic probe some years ago, so maybe I'll have a go at adding that at some point then
<zyp>
shouldn't be too hard to find the flash from the csr.csv
<xobs>
Yep, you probably can figure out what it is by comparing names.
<xobs>
For example, if there are three addresses next to each other called `???_bitbang`, `???_miso`, and `???_bitbang_en`, then that's probably a litex SPI block.
<xobs>
What to do when you encounter two blocks that match that is left as an exercise to the implementer :P
<zyp>
map both, obviously
<xobs>
Oh, good point. Obviously.
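The name-matching idea above could look roughly like the Python sketch below. It assumes csr.csv rows of the form `csr_register,<name>,<address>,<size>,<mode>`, which should be checked against the file your build actually produces; the sample data, addresses and function name are invented for illustration.

```python
import csv
import io

SPI_SUFFIXES = ("_bitbang", "_miso", "_bitbang_en")


def find_spi_blocks(csv_text: str) -> dict:
    """Return {prefix: {register_name: address}} for every SPI-looking block."""
    regs = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        if len(row) >= 3 and row[0] == "csr_register":
            regs[row[1]] = int(row[2], 0)
    blocks = {}
    for name in regs:
        if name.endswith("_bitbang"):
            prefix = name[: -len("_bitbang")]
            needed = [prefix + suffix for suffix in SPI_SUFFIXES]
            if all(n in regs for n in needed):
                blocks[prefix] = {n: regs[n] for n in needed}
    return blocks


SAMPLE = """\
# hypothetical csr.csv excerpt
csr_register,uart_rxtx,0x82001800,1,rw
csr_register,spiflash_bitbang,0x82002000,1,rw
csr_register,spiflash_miso,0x82002004,1,ro
csr_register,spiflash_bitbang_en,0x82002008,1,rw
csr_register,spiflash2_bitbang,0x82002800,1,rw
csr_register,spiflash2_miso,0x82002804,1,ro
csr_register,spiflash2_bitbang_en,0x82002808,1,rw
"""

for prefix, block in find_spi_blocks(SAMPLE).items():
    print(prefix, {name: hex(addr) for name, addr in block.items()})
```

When two blocks match, both are returned, which is the "map both" approach.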
FFY00 has quit [Remote host closed the connection]
<zyp>
but yeah, one thing I miss in the csr.csv is a «type» field
FFY00 has joined #litex
<rvense>
been trying again to get my ice40 hx8k evb working with litex, with the stub firmware and an lm32 processor, i get nothing on the serial port, but i do see a slow binary count on the LEDs, does anyone know where that might come from?
<xobs>
rvense: address lines somehow got wired to the LEDs, and it's falling along a nop sled?
<rvense>
does it even expose its address bus by default?
<xobs>
No, but apparently 0xffffffff is `cmpne ba, ba, ba`, which could effectively be a nop.
<xobs>
You could take a look at `top.v` and trace backwards what you have mapped to the io pins.
<rvense>
good point
<xobs>
While consulting `top.pcf` to make sure they're assigned to what you think they are.
<rvense>
yeah, i'll look at that later, thanks
CarlFK has quit [Ping timeout: 260 seconds]
Skip has joined #litex
CarlFK has joined #litex
<tmbinc>
are there tools that parse VCD/FST and extract for example wishbone transfers, VexRiscv PC traces and things like that? I'm running a (bit hacked up) lxsim SoC with an existing RISC-V binary (for which I want to build an emulation environment)
<tmbinc>
i can manually inspect the --trace and usually figure out why it's crashing (mostly unimplemented peripherals etc.), but is there a more automated way?