<somlo>
_florent_: somewhere between 21d38701 and 4a15c3e2 I lost the ability to build LiteX + Rocket for any FPGA: trellisboard and nexys4ddr both hang after displaying "__ _" (the top of the 'L' and 'i' of the bootsplash logo)
<somlo>
still works in the simulator, so that's not useful to troubleshoot :)
<somlo>
first problem is we need a "from litedram.frontend.axi import LiteDRAMAXI2Native" in soc/integration/soc.py for the direct point-to-point rocket <-> litedram path
<somlo>
but that's just to compile the bitstream, which then hangs
<somlo>
tried to bisect, but I keep landing on commits that won't build for other reasons, so for now I'm stuck...
<acathla>
_florent_, perfect. Should I create a pull-request?
<daveshah>
somlo: may or may not be relevant, I had that problem when doing some VexRiscv debugging a while ago and it was interrupt related in the end
freemint has quit [Ping timeout: 248 seconds]
freemint has joined #litex
<somlo>
daveshah: interrupts is what I suspect as well -- I almost wish the simulator had been broken too, that way debugging would have been much faster :)
<somlo>
probably going to start removing anything with an IRQ that isn't also present on the simulator (probably LiteETH), and see if that influences the outcome in any way...
<somlo>
daveshah: also, with the new soc.py, rocket on ecp5_versa is back to 101% slice utilization... So I'm only able to try things on the trellisboard and the nexys4ddr (with vivado)
<_florent_>
somlo: for the functional regression, in the log, can you check that all Bus Regions are what you expect? (origin, size, mode, cached?)
<_florent_>
acathla: yes sure a PR is welcome
<_florent_>
somlo: while doing this, i indeed only tested for regressions with Rocket in simulation. Last time i had a similar issue, it was caused by a CSR mapping problem.
<somlo>
_florent_: thanks for the tips, I'll keep digging. Do you want the soc/integration/soc.py "from litedram.frontend.axi import LiteDRAMAXI2Native" as a PR, or would you rather just add it in directly?
<somlo>
(I'm using direct point-to-point AXI between Rocket and LiteDRAM, so I need that extra import)
<acathla>
I've got an error when programming the Versa board: Error: tdo check error at line 11
<acathla>
Error: READ = 0x2224086
<acathla>
Error: WANT = 0x41112043
<acathla>
The example of prjtrellis still works fine with this error (Hello, World, etc on the LED), but that's all. OpenOCD too old?
<_florent_>
acathla: strange, seems shifted by 1 bit
<_florent_>
somlo: i pushed the fix
<_florent_>
somlo: are you running the simulation with direct point-to-point?
<_florent_>
somlo: btw, it's now possible to simulate with any type of memory (--sdram-module) and any data width (--sdram-data-width), so even if the simulation does not default to point-to-point, you should be able to test it
<somlo>
_florent_: I didn't simulate with any memory, so no to your first question
<somlo>
I did try the wishbone converter option (different litedram port width than rocket's axi port), which forces the wishbone conversion, got the same results
<somlo>
_florent_: like daveshah mentioned, the fact that I get three or four putchars before things hang *feels* like an IRQ problem
<somlo>
so I'm going to try turning off LiteETH when building bitstream next, to see if that makes a difference
<somlo>
it's the only thing with an IRQ that's not in the default simulator config that I can think of :)
<_florent_>
somlo: i already had a similar issue, caused by a cached/uncached CSR mapping problem (IIRC CSR was mapped to a cached region)
<somlo>
ok, so I need to make sure that in my case no CSRs are mapped above 0x80000000 (which is Rocket's cached region, routed through its mem_axi port)
<_florent_>
somlo: we have the current mapping in Rocket:
<_florent_>
for other CPUs, we are putting rom and sram in a cached region
<somlo>
right, rocket overrides the mem_map property, and everything (rom, sram, csr) is accessed uncached via the mmio-axi port that's converted to wishbone and ends up as the wishbone bus master
<somlo>
0x80000000 and above goes through rocket's internal L1 cache, and out via the mem-axi port
<somlo>
and looking through csr.csv for my latest nexys4ddr build, all CSRs are below 0x80000000, where they *should* be -- so accessing CSRs through the cache doesn't seem to be my issue
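The csr.csv check described above can be automated. The following is a minimal sketch, assuming each row of csr.csv has the form `kind,name,address,...` with `kind` being `csr_base` or `csr_register` and the address in hex; the function name `cached_csrs` and the sample rows are hypothetical and not taken from a real build.

```python
import csv
import io

# Per the discussion: Rocket treats 0x80000000 and above as cached.
CACHED_BASE = 0x80000000

def cached_csrs(csv_text, cached_base=CACHED_BASE):
    """Return (name, address) pairs for CSR rows at or above cached_base."""
    hits = []
    for row in csv.reader(io.StringIO(csv_text)):
        # Skip blank lines, comments, and rows too short to carry an address.
        if len(row) < 3 or row[0].startswith("#"):
            continue
        kind, name, addr = row[0], row[1], row[2]
        # Assumed row kinds; other rows (constants, memory regions) are ignored.
        if kind not in ("csr_base", "csr_register"):
            continue
        address = int(addr, 16)
        if address >= cached_base:
            hits.append((name, address))
    return hits

# Hypothetical sample data for illustration only.
SAMPLE = """\
csr_base,uart,0x12001000,,
csr_register,uart_rxtx,0x12001000,1,rw
csr_base,bogus,0x80002000,,
"""

if __name__ == "__main__":
    for name, address in cached_csrs(SAMPLE):
        print(f"{name} is mapped in the cached region at {address:#x}")
```

An empty result means no CSR sits in the cached region, which is the outcome reported for the nexys4ddr build.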
freemint has joined #litex
<somlo>
_florent_, daveshah: so I tried "litex/litex/tools/litex_sim.py --csr-data-width 32 --with-ethernet --cpu-type rocket" from a freshly cloned repo (still bisecting inside my main one :) )
<somlo>
and it still works
<somlo>
was hoping the ethernet thing would trigger the bug I saw on the fpga, but no :)
<_florent_>
somlo: FYI i tried a build on Arty and reproduced your issue
<somlo>
_florent_: thanks for the confirmation, and sorry to be the bearer of bad news :)
<somlo>
I had to skip a couple of commits during bisect, still hoping to narrow down the list of things to look at for clues (not sure that'll end up being useful, but since I started it I'll take it as far as I can)...
<_florent_>
it's better to know now than later :). i could do more tests tomorrow; if you bisect it more closely, please post the commits here and i'll try to understand
<somlo>
will do, and thanks again!
rohitksingh has joined #litex
rohitksingh has quit [Ping timeout: 268 seconds]
<_florent_>
somlo: i found a regression on CSR alignment with Rocket (it was set to 32 instead of 64), i'm building a SoC on Arty with the fix
<somlo>
_florent_: for better or worse, bisect blames commit 29bbe4c0 :)
<somlo>
not sure if that's consistent with what you found, and I just got that information 5 seconds ago, so I haven't looked inside yet :)
<tpb>
Title: soc/add_cpu: use cpu.data_width as CSR alignment, fix regression on R… · enjoy-digital/litex@5b34f4c · GitHub (at github.com)
<somlo>
building on trellisboard now...
<somlo>
_florent_: while I'm waiting for the build, I dug around, and it was actually commit 84b5df78 that removed the line "csr_alignment = self.cpu.data_width" from soc_core.py
<somlo>
which is where we used to adjust alignment, specifically for rocket at that time
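To illustrate why a CSR alignment of 32 instead of 64 can break a 64-bit CPU, here is a simplified model, not LiteX's actual address generator. It assumes consecutive CSR locations are spaced `alignment / 8` bytes apart; the base address and the helper name `csr_addresses` are hypothetical.

```python
def csr_addresses(base, n_regs, alignment_bits):
    """Byte addresses of n_regs consecutive CSR locations (simplified model)."""
    stride = alignment_bits // 8  # bytes between consecutive locations
    return [base + i * stride for i in range(n_regs)]

BASE = 0x12000000  # hypothetical CSR base address

hardware = csr_addresses(BASE, 4, 32)  # gateware built with alignment 32
software = csr_addresses(BASE, 4, 64)  # software expecting alignment 64

if __name__ == "__main__":
    for hw, sw in zip(hardware, software):
        status = "ok" if hw == sw else "MISMATCH"
        print(f"hw={hw:#x} sw={sw:#x} {status}")
```

In this model only the first location lines up; every later access from software lands on a different register than intended.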
<somlo>
so I guess if I have to "git bisect skip", I may end up randomly blaming the wrong commit for whatever upsets me at that moment :)
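The ambiguity introduced by skipped commits can be sketched on a linear history. This is a toy model under stated assumptions, not git's actual bisect algorithm: every commit carries a verdict of "good", "bad", or "skip", and the first bad commit can be any commit after the last known good one, up to and including the first known bad one. The commit names are hypothetical.

```python
def first_bad_candidates(history):
    """history: list of (commit, verdict) pairs, oldest first.

    Returns the commits that could be the first bad one, given that
    skipped commits have no verdict.
    """
    good = [i for i, (_, v) in enumerate(history) if v == "good"]
    last_good = max(good) if good else -1
    bad = [i for i, (_, v) in enumerate(history) if v == "bad" and i > last_good]
    if not bad:
        raise ValueError("no bad commit after the last good one")
    first_bad = min(bad)
    return [commit for commit, _ in history[last_good + 1:first_bad + 1]]

# Hypothetical history: two commits could not be built and were skipped.
HISTORY = [
    ("c1", "good"),
    ("c2", "skip"),
    ("c3", "skip"),
    ("c4", "bad"),
    ("c5", "bad"),
]

if __name__ == "__main__":
    print(first_bad_candidates(HISTORY))
```

With no skips the result is a single commit; each skipped commit next to the good/bad boundary widens the set of suspects.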
<somlo>
the more interesting question to me is, why was this working in simulation ? :)
<somlo>
_florent_: \o/
<somlo>
it now hangs during Linux boot when attempting to initialize the LiteETH driver, but that probably means I have to redo the driver initialization to account for whatever changes occurred in LiteETH over the last few days
<somlo>
either way, thanks for catching the alignment thing!