_florent_ changed the topic of #litex to: LiteX FPGA SoC builder and Cores / Github : https://github.com/enjoy-digital, https://github.com/litex-hub / Logs: https://freenode.irclog.whitequark.org/litex
tpb has quit [Remote host closed the connection]
tpb has joined #litex
Degi has quit [Ping timeout: 240 seconds]
Degi has joined #litex
midnight has joined #litex
_whitelogger has joined #litex
CarlFK has quit [Read error: Connection reset by peer]
CarlFK has joined #litex
st-gourichon-fid has joined #litex
<_florent_> thanks for looking at this somlo, zyp. That's also a bit out of my comfort zone, but I could try to help. We could use the simulation with verilator to reproduce the issue and fix it correctly on at least one CPU first (running a simulation with SERV or VexRiscv takes just a few seconds) and I could apply the change to the other CPUs.
_whitelogger has joined #litex
tcal has quit [Remote host closed the connection]
<zyp> I fixed the CRC code as well and am building for vexriscv now
<zyp> fun, that broke everything because apparently the BIOS for vexriscv adds a .got section
<zyp> I've got a solution that appears to work on vexriscv now, I'll spend some time looking at the other cpus as well and then submit a PR
<zyp> but I've got some errands to run before that, so it might be a while
<benh> right I noticed our .lds might need to be enriched to support more sections
<benh> .got comes to mind
<benh> the kernel is a good example of how messy it can get :)
<benh> some archs might have a .toc or .opd
<somlo> zyp: commenting out the crc function allowed the bios to proceed, but then it went on to print out a bunch of garbage and eventually crash
<somlo> printf says content (at least first 16 bytes) starting at 0x10008600 are not the same as content starting at 0x11000000
<somlo> gotta be afk for a few hours, bbl
<zyp> I tried --cpu-type rocket, but litex is failing with this: https://paste.jvnv.net/view/dsI8G
<tpb> Title: JVnV Pastebin View paste – Untitled (at paste.jvnv.net)
<zyp> apparently it wants to connect a 64-bit cpu port to a 128-bit memory port, and upconversion is not supported
<zyp> okay, --cpu-variant linuxd solves that
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
<zyp> appears to work perfectly here on both vexriscv and rocket now: https://paste.jvnv.net/view/WFsom
<tpb> Title: JVnV Pastebin View paste – Untitled (at paste.jvnv.net)
<benh> Memspeed Writes: 1047Mbps Reads: 1051Mbps
<benh> :-)
<benh> Microwatt's getting there ... (standalone MW with a full 64-bit pipelined wishbone and my L2 cache)
<benh> hopefully we'll get that in LiteX soon
Skip has joined #litex
somlo has quit [Remote host closed the connection]
somlo has joined #litex
<somlo> zyp: mind posting a patch (or sending a PR, and linking issue #566)?
<zyp> I'll do a PR, but I'm looking at having a go at all the other cpus as well
<zyp> benh, as far as I can tell, microwatt/crt0.S doesn't even set up .bss, is that correct?
<keesj> what is microwatt?
<tpb> Title: litex/core.py at master · enjoy-digital/litex · GitHub (at github.com)
<keesj> wow.. it is vhdl code
<keesj> (just had a 4 days advanced security training using FPGA / verilog) https://advancedsecurity.training/training/live-fpga-hacking . I became better at verilog but.. well..
<keesj> I also tried to do the assignments with litex (parts of them) but I was kinda lacking a UART RX/TX block that is simple enough to mess around with (not one to access via wishbone or .. too complicated)
<keesj> and the second "problem" I had (that I solved) was that using additional pins is just a little bit cumbersome if you are trying to prove "litex is much better"
<tpb> Title: bios/linker: Place .data in sram with initial copy in rom. by zyp · Pull Request #567 · enjoy-digital/litex · GitHub (at github.com)
tcal has joined #litex
<somlo> zyp: thanks! I tried it, but it's printing garbage to the terminal for me (starting with the sdram initialization output), then eventually hangs
<somlo> well, if I don't try to boot from sdcard it works (boots from ethernet)
<somlo> so I think we're on the right track, just not 100% there yet
Skip has quit [Remote host closed the connection]
Dolu has joined #litex
<Dolu> futarisIRCcloud, about the DMIPS in linux. One big issue currently is that the strcmp used for userspace is a stupid loop which checks one byte at a time, instead of the word-based one which is used in baremetal/newlib
<Dolu> So, in baremetal, the CPU configuration used is about 1.1 or 1.2 DMIPS/MHz
<Dolu> (baremetal => not the same libc => optimized strcmp)
<Dolu> Not sure if that was strcmp or strlen XD, but one of those functions was really dumbly implemented in the linux libc
<Dolu> Then about the memory speed, in addition to what florent said, there is also the fact that the i$ bus is 128 bits wide / d$ bus is 64 bits wide on the SMP cluster. On the single core version they are both only 32 bits wide, which results in a high cache miss penalty (transfer time)
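The byte-at-a-time versus word-based comparison Dolu describes can be illustrated with a minimal sketch. It is not the libc or newlib code: the function names are hypothetical, it compares fixed-length buffers (memcmp-like, no NUL handling), and a 4-byte word is assumed. It only counts loop iterations to show why the word-based variant needs fewer of them.

```python
def compare_bytewise(a: bytes, b: bytes):
    """Compare one byte per loop iteration; return (equal, iterations)."""
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps
    return len(a) == len(b), steps


def compare_wordwise(a: bytes, b: bytes, word: int = 4):
    """Compare `word` bytes per loop iteration; return (equal, iterations).

    The 4-byte word size is an assumption for a 32-bit CPU.
    """
    steps = 0
    n = min(len(a), len(b))
    for i in range(0, n, word):
        steps += 1
        if a[i:i + word] != b[i:i + word]:
            return False, steps
    return len(a) == len(b), steps


if __name__ == "__main__":
    buf = b"x" * 32
    print(compare_bytewise(buf, buf))  # 32 iterations for 32 equal bytes
    print(compare_wordwise(buf, buf))  # 8 iterations for the same input
```

On equal 32-byte inputs the byte loop runs 32 times and the word loop 8 times, which is the kind of gap that shows up as extra instructions per benchmark loop.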
<zyp> somlo, I wonder if the problem is that putting .data in sram leaves less room for the stack, so stack grows into .bss
<zyp> somlo, any chance you could try simply increasing sram size?
<zyp> worst case we rule out that as an issue
<somlo> zyp: trying now (it's only 4k by default, trying 32k). If that works, I'll have to also try this whole thing on an 85k ecp5 board, there might be less wiggle room there...
Skip has joined #litex
Dolu has quit [Read error: Connection reset by peer]
kgugala_ has joined #litex
kgugala has quit [Ping timeout: 264 seconds]
<zyp> if 4k is too little, it's probably enough to increase by one block or so, as far as I can tell .data itself tends to be less than 1k
Dolu has joined #litex
<zyp> it was 776B in the bios.elf you sent me yesterday
<Dolu> There is a pastebin with the two versions of memcmp : https://pastebin.com/QL9CSuF6 . Quite some difference. I'm now measuring the dhrystone run in a simulation, to be sure there is not something else hidden.
<tpb> Title: [M68000 Assembler] linux libc : 0007764c : 7764c: 00150513 add - Pastebin.com (at pastebin.com)
kgugala has joined #litex
kgugala_ has quit [Ping timeout: 246 seconds]
<somlo> zyp: works like a charm with 8x the default :)
<somlo> we should come up with a reasonable number for the new default, maybe incorporate that into the PR with a note in the commit blurb
<somlo> probably doubling it to 8k will work, I can try it now that my knee-jerk "make it BIG" thing has panned out :)
<zyp> if it worked at 4k before, I'm confident it'd work at 6k now, unless it actually had a collision before as well that just didn't show any symptoms
<zyp> but yeah, 8k is a rounder number
<zyp> .data and .bss are easy to find the requirements for, since you can just read those right from the elf header, but stack usage is harder
<zyp> I did some experiments many years ago extracting function stack frame and call tree information from gcc to find the maximum stack usage, but as soon as you have any indirect calls, building a complete call tree gets pretty hard
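The SRAM budget being discussed can be sketched as simple arithmetic. The 776-byte .data size and the 4k/8k SRAM sizes come from the conversation; the .bss size below is a hypothetical placeholder, since no figure for it appears in the log, and the function name is illustrative only.

```python
def stack_headroom(sram_size: int, data_size: int, bss_size: int) -> int:
    """Bytes left for the stack once .data and .bss are placed in SRAM."""
    return sram_size - data_size - bss_size


DATA_SIZE = 776   # .data size zyp measured in somlo's bios.elf
BSS_SIZE = 1024   # hypothetical value, not taken from the log

if __name__ == "__main__":
    for sram in (4096, 8192):  # old default and proposed new default
        room = stack_headroom(sram, DATA_SIZE, BSS_SIZE)
        print(f"sram={sram}: {room} bytes left for the stack")
```

This only bounds the space available to the stack. As zyp notes, how much of it the stack really uses is the hard part to determine.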
<tpb> Title: litex/soc_core.py at master · enjoy-digital/litex · GitHub (at github.com)
tcal has quit [Ping timeout: 260 seconds]
<somlo> zyp: doubling sram works (is sufficient). Would you mind pushing an additional commit to PR #567 to modify the default in soc_core.py?
<zyp> sure, only need to change those two places?
tcal has joined #litex
<somlo> that's where the default comes from on every target's command line, as far as I can tell
<zyp> done
<somlo> zyp: cool beans, I changed my hacky workaround for the global variable to let it remain in .data :)
<zyp> great, now we just need the two last cpus fixed as well
FFY00 has quit [Remote host closed the connection]
FFY00 has joined #litex
st-gourichon-fid has quit [Ping timeout: 265 seconds]
<Dolu> futarisIRCcloud: I investigated more about the dhrystone package in buildroot and how that's compiled. And basically that's shitty. It's compiled with -Os instead of -O3, and I can't spot any -fno-inline. The CPU spends most of its time in strcpy and strcmp, which are pretty bad compared to the bare metal versions.
<Dolu> with the SoC in baremetal, you should get about 1.08 DMIPS/MHz, I turned off quite some features while I was trying to boost the cluster frequency (should be able to go up to 125 MHz with some margin)
<zyp> -Os doesn't tend to be too bad though, it's -O2 with a few speed for size tradeoffs
<Dolu> zyp: In that specific case, that's pretty bad, basically, the strcpy of the benchmark is using 200 instructions per benchmark loop, instead of 20. On bare metal, one benchmark loop is 322 instructions.
<zyp> fair enough
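A rough way to see the weight of those numbers, as a sketch under two assumptions not stated in the log: the 20-instruction strcpy is included in the 322-instruction bare metal loop, and strcpy is the only thing that differs between the two builds.

```python
BAREMETAL_LOOP = 322  # instructions per benchmark loop on bare metal (from the log)
STRCPY_FAST = 20      # strcpy instructions per loop, bare metal (from the log)
STRCPY_SLOW = 200     # strcpy instructions per loop, Linux libc build (from the log)


def loop_with_slow_strcpy(baremetal_loop: int, fast: int, slow: int) -> int:
    """Estimated loop length if only strcpy is swapped for the slow version."""
    return baremetal_loop - fast + slow


if __name__ == "__main__":
    linux_loop = loop_with_slow_strcpy(BAREMETAL_LOOP, STRCPY_FAST, STRCPY_SLOW)
    ratio = linux_loop / BAREMETAL_LOOP
    print(f"estimated loop: {linux_loop} instructions, {ratio:.2f}x bare metal")
```

Under those assumptions the loop grows from 322 to 502 instructions, about 1.56 times longer, before counting any strcmp difference.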
tpearson-mobile has joined #litex
<tpearson-mobile> So I'm trying to use the SpiFlash peripheral in MMIO mode with a Micron SPI part
<tpearson-mobile> when I try to do a dump of the MMIO space, all I get is 0xa0000000 00 00 00 00 44 44 44 44 88 88 88 88 cc cc cc cc
<tpearson-mobile> 0xa0000010 00 00 00 00 44 44 44 44 88 88 88 88 cc cc cc cc ....DDDD........
<tpearson-mobile> etc.
<tpearson-mobile> obviously I'm missing something, any hints? :)
<tpearson-mobile> is this what the peripheral does when it can't talk to an external SPI (it does the same thing with the SPI pins completely disconnected, FWIW) or is this a more fundamental problem with the SoC configuration?
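One thing that can be read off the dump itself: every byte equals the low nibble of its word address repeated twice (0x0 gives 00, 0x4 gives 44, 0x8 gives 88, 0xc gives cc), and the row at 0xa0000010 repeats the same values. The sketch below only checks that observation against the bytes quoted above. It does not identify a cause, and the function names are hypothetical.

```python
def expected_byte(addr: int) -> int:
    """Low nibble of the word-aligned address, repeated in both nibbles."""
    word_addr = addr & ~0x3
    nibble = word_addr & 0xF
    return nibble * 0x11


def matches_address_pattern(base: int, data: bytes) -> bool:
    """True if every byte in `data` is derived from its own address."""
    return all(b == expected_byte(base + i) for i, b in enumerate(data))


# One 16-byte row exactly as quoted in the dump.
ROW = bytes.fromhex("00000000 44444444 88888888 cccccccc")

if __name__ == "__main__":
    for base in (0xA0000000, 0xA0000010):
        print(hex(base), matches_address_pattern(base, ROW))
```

Data that tracks the address rather than the flash contents would be consistent with the reads not returning anything from the SPI part, which fits the same result appearing with the pins disconnected.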
Dolu has quit [Ping timeout: 264 seconds]
<awordnot> tpearson-mobile: which flash part are you using? Looking at the SpiFlash peripherals' code it seems like they only support spi mode 3 so the part will need to support that
<tpearson-mobile> yeah it supports mode 3
<tpearson-mobile> N25Q512
<tpearson-mobile> nearly identical to the N25Q128 that's already on the Versa board
<awordnot> and you're sure you're looking at the right MMIO address range?
<tpearson-mobile> should be, yeah
<tpearson-mobile> mem_map = { "hostspiflash": 0xa0000000, }
<tpearson-mobile> and "mr 0xa0000000 32"
<tpearson-mobile> I was honestly expecting this to "just work"
<awordnot> hmm, well I don't see anything in the code that would artificially generate that pattern you're seeing (unless I'm missing it)
<awordnot> which is strange
<tpearson-mobile> here's the tie in: https://paste.ee/p/lZ9Yw
<tpb> Title: Paste.ee - View paste lZ9Yw (at paste.ee)
<tpearson-mobile> and yeah, I'm puzzled
<awordnot> when you run the soc builder, does your 'hostspiflash' memory region get printed out?
<tpearson-mobile> INFO:SoCBusHandler:hostspiflash Region added at Origin: 0xa0000000, Size: 0x04000000, Mode: RW, Cached: True Linker: False.INFO:SoCBusHandler:hostspiflash added as Bus Slave.INFO:SoCCSRHandler:hostspiflash CSR allocated at Location 7.
<awordnot> hmm yeah everything looks good in terms of the SpiFlash instantiation. I guess the next thing to check would be the pinout
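The region reported by the builder can be sanity-checked against the read command and the flash size. A minimal sketch, assuming the 512 in N25Q512 denotes a 512 Mbit part; the helper name is hypothetical and this is plain arithmetic, not a LiteX API.

```python
ORIGIN = 0xA0000000  # region origin from the SoCBusHandler log line
SIZE = 0x04000000    # region size from the SoCBusHandler log line


def in_region(addr: int, length: int, origin: int, size: int) -> bool:
    """True if [addr, addr + length) lies entirely inside the region."""
    return origin <= addr and addr + length <= origin + size


# Assumption: N25Q512 is a 512 Mbit device, converted here to bytes.
FLASH_BYTES = 512 * 1024 * 1024 // 8

if __name__ == "__main__":
    print(in_region(0xA0000000, 32, ORIGIN, SIZE))  # the "mr 0xa0000000 32" read
    print(SIZE == FLASH_BYTES)                      # region covers the whole part
```

Both checks pass, which agrees with awordnot's conclusion that the address map is not the problem.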