<somlo> _florent_, daveshah: over the weekend I ran a quick experiment on how performance would be affected by the data width of a future direct link between Rocket's cached-ram mem_axi port and LiteDRAM: https://github.com/enjoy-digital/litex/issues/299
<daveshah> somlo: very interesting
<daveshah> impressed how big the linpack difference is
<somlo> should more or less match the "integer" performance portion of NBench
<somlo> since fp is emulated in bbl (can't fit a real FPU on ecp5)
<somlo> so, in conclusion, I'm tempted to add a new Rocket variant to litex, one with a 256-bit mem_axi port, which would access main-ram in bursts of 2 x 256-bit beats instead of 8 x 64-bit, and which would perfectly match the trellisboard
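A quick sanity check of those beat counts (plain arithmetic; the 512-bit line size is confirmed a few lines further down):

    # Rocket refills one full L1 cache line per AXI burst on mem_axi.
    CACHE_LINE_BITS = 512

    for axi_width in (64, 128, 256):
        beats = CACHE_LINE_BITS // axi_width
        print(f"{axi_width:3}-bit mem_axi -> {beats} beats per cache-line refill")

    #  64-bit mem_axi -> 8 beats per cache-line refill
    # 128-bit mem_axi -> 4 beats per cache-line refill
    # 256-bit mem_axi -> 2 beats per cache-line refill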
<daveshah> 2 beats seems fairly low for ddr3
<somlo> during early experimentation I noticed that over the 64-bit mem_axi there are always 8 accesses at a time, which tells me the cache line is 512 bits
<somlo> * L1 cache line (only cache there is on Rocket, atm)
<daveshah> oh nvm, I realise that that's 2 system clock cycles
<daveshah> which is 8 memory cycles, which is the burst length of the dram
<daveshah> all makes sense
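Putting numbers on that reconciliation (a sketch; the 64-bit DRAM bus width and 1:2 PHY ratio are assumptions, chosen to be consistent with the figures quoted above):

    # DDR3 always bursts 8 data beats (BL8), two beats per DRAM clock (DDR).
    DRAM_BUS_BITS          = 64   # assumed trellisboard DDR3 bus width
    BURST_LENGTH           = 8    # DDR3 is BL8-only
    TRANSFERS_PER_DRAM_CLK = 2    # double data rate
    PHY_RATIO              = 2    # assumed: system clock = DRAM clock / 2

    # Bits the LiteDRAM user port moves per system clock cycle:
    port_width = DRAM_BUS_BITS * TRANSFERS_PER_DRAM_CLK * PHY_RATIO  # 256

    # Bits delivered by one BL8 DRAM burst:
    bits_per_bl8 = DRAM_BUS_BITS * BURST_LENGTH                      # 512

    # System clock cycles to drain one burst = AXI beats at 256-bit width:
    print(bits_per_bl8 // port_width)  # 2 -> the 2-beat "AXI burst" and the
                                       # 8-beat "ddr3 burst" are one transfer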
<somlo> oh, was this an "axi burst" vs. "ddr3 burst" terminology overload thing? :)
<daveshah> yeah
<somlo> So given that Rocket has 512-bit cache lines internally, my instincts tell me it'll always be more efficient to dump as much of each line per system clock cycle as LiteDRAM is capable of accepting, rather than chopping it into smaller slices internally only to have data-width conversion RTL reassemble them into 256-bit slices downstream, before passing them to LiteDRAM
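A minimal sketch of what such a direct link could look like on the LiteX side, assuming litedram's LiteDRAMAXI2Native frontend; this is illustrative wiring under those assumptions, not the actual code from the PR linked below:

    from litedram.frontend.axi import LiteDRAMAXI2Native

    def connect_rocket_mem_axi(soc):
        # Ask the LiteDRAM crossbar for a native port at the full data
        # width (256 bits in the trellisboard case discussed above)...
        port = soc.sdram.crossbar.get_port(mode="both", data_width=256)

        # ...and bridge Rocket's (assumed 256-bit) mem_axi master straight
        # onto it: each 2-beat AXI burst then maps onto one BL8 DRAM burst,
        # with no data-width conversion RTL in between.
        soc.submodules.axi2native = LiteDRAMAXI2Native(soc.cpu.mem_axi, port)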
<somlo> admittedly, the performance delta between my 128-bit measurements and the 256-bit ones is less spectacular than I would have expected :)
<daveshah> Guess it is possible that memory just isn't the bottleneck for those
<somlo> but then I also think having *some* L1 cache alleviates that difference, and if I rebuilt everything with less (or no) L1 cache on the rocket side, I'd see more of an impact
<daveshah> Rocket isn't awfully high IPC anyway
<daveshah> boom would be much too big for an ecp5, but it might be interesting to repeat the experiment with it...
<somlo> _florent_, daveshah: LiteX code I used in the experiment is here: https://github.com/enjoy-digital/litex/pull/300
<tpb> Title: RFC: Direct link between Rocket/mem_axi <--> LiteDRAM dataport by gsomlo · Pull Request #300 · enjoy-digital/litex · GitHub (at github.com)
<somlo> (minus the additional Rocket variants themselves, which are simply more pre-built verilog variants in rocket-litex-verilog, with additional width options for mem_axi)
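For context, picking one of those pre-built variants from a target would look roughly like this; the platform import path and the "linux-256" variant name are hypothetical stand-ins for whatever rocket-litex-verilog actually ships:

    from litex_boards.platforms import trellisboard  # assumed module path
    from litex.soc.integration.soc_sdram import SoCSDRAM

    platform = trellisboard.Platform()
    soc = SoCSDRAM(platform, clk_freq=int(75e6),
                   cpu_type="rocket",
                   cpu_variant="linux-256")  # hypothetical variant name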
<_florent_> somlo: thanks, interesting, i'll have a closer look at it a bit later