jevinskie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
OmniMancer has joined ##openfpga
noobineer has joined ##openfpga
ZombieChicken has joined ##openfpga
unixb0y_ has joined ##openfpga
unixb0y has quit [Ping timeout: 250 seconds]
jevinskie has joined ##openfpga
jevinskie has quit [Quit: Textual IRC Client: www.textualapp.com]
gsi__ has joined ##openfpga
gsi_ has quit [Ping timeout: 244 seconds]
jevinskie has joined ##openfpga
mumptai_ has joined ##openfpga
mumptai has quit [Ping timeout: 245 seconds]
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
gsi__ is now known as gsi_
ZombieChicken has quit [Ping timeout: 256 seconds]
ZombieChicken has joined ##openfpga
futarisIRCcloud has joined ##openfpga
Jybz has joined ##openfpga
emeb_mac has quit [Ping timeout: 252 seconds]
Jybz has quit [Ping timeout: 264 seconds]
Jybz has joined ##openfpga
Dolu1990 has joined ##openfpga
GuzTech has joined ##openfpga
jevinski_ has joined ##openfpga
jevinskie has quit [Ping timeout: 248 seconds]
rohitksingh has joined ##openfpga
<Sprite_tm> Hey all, is there an example for instantiating a MULT18X18D in an ECP5 somewhere? Preferably in the form of a 32x32-bit multiplication example :3
<daveshah> Sprite_tm: this is a small 16x16 example
<daveshah> I realise there is no timing data for the DSPs yet, so you might have to be a bit careful with frequency
<Sprite_tm> daveshah: Eh, she'll be fine :P I can register the in- and outputs; I don't really care *that* much about speed, I just don't want to spend 20% of my FPGA on two multipliers :)
Jybz has quit [Quit: Konversation terminated!]
Jybz has joined ##openfpga
Jybz has quit [Client Quit]
Jybz has joined ##openfpga
Jybz has quit [Quit: Konversation terminated!]
Jybz has joined ##openfpga
flea86 has joined ##openfpga
Asu has joined ##openfpga
s_frit has quit [Remote host closed the connection]
s_frit has joined ##openfpga
s_frit has quit [Remote host closed the connection]
s_frit has joined ##openfpga
ym has quit [Remote host closed the connection]
rohitksingh has quit [Ping timeout: 268 seconds]
rohitksingh has joined ##openfpga
Dolu1990 has quit [Quit: Leaving]
Dolu has joined ##openfpga
rohitksingh has quit [Ping timeout: 246 seconds]
rohitksingh has joined ##openfpga
Dolu has quit [Quit: Leaving]
Dolu1990 has joined ##openfpga
rohitksingh has quit [Ping timeout: 244 seconds]
X-Scale has quit [Ping timeout: 246 seconds]
X-Scale has joined ##openfpga
lutsabound has joined ##openfpga
s_frit has quit [Remote host closed the connection]
s_frit has joined ##openfpga
rohitksingh has joined ##openfpga
<sorear> daveshah: roughly what does utilization/timing look like for https://twitter.com/fpga_dave/status/1124974146625187840 ?
s_frit has quit [Remote host closed the connection]
s_frit has joined ##openfpga
cr1901_modern1 has joined ##openfpga
cr1901_modern has quit [Ping timeout: 250 seconds]
cr1901_modern has joined ##openfpga
cr1901_modern1 has quit [Ping timeout: 246 seconds]
<daveshah> sorear: about half the 45k's slices (would be significantly less with DSP inference), and 70% of its block RAM (76 18kbit blocks)
<daveshah> Runs at 75MHz, Fmax is 80-85MHz at present (also seems rock solid at 100MHz on my board, DDR controller seems to break at 125MHz)
cr1901_modern1 has joined ##openfpga
OmniMancer has quit [Quit: Leaving.]
cr1901_modern has quit [Ping timeout: 255 seconds]
<Sprite_tm> daveshah: If you have some time, can you perhaps help me? I'm having essentially the same issue as https://github.com/YosysHQ/nextpnr/issues/208, but (probably) with the alu slices. How can I figure out which things are the offenders?
<daveshah> Sprite_tm: The ALU slices aren't really supported
<Sprite_tm> Aw :/ that means I need to come up with a 32-bit multiplier myself.
<daveshah> In theory they might work with careful manual placement, but as the current packer doesn't consider the direct connections to multipliers (which are required for the ALUs to have any useful function) they certainly won't be correctly automatically placed
<Sprite_tm> Hmm, that sounds like too much work... if my excuse to dive deeper into FPGA fabric stuff ever materializes, I may look into that.
<daveshah> If you are curious what the problem net is, running with --debug --verbose should give much more insight into the routing
<Sprite_tm> Gotcha. It sounds like the chance that just removing the offending nets clears up the issue is nil, though, so I won't bother (unless it's useful to you).
<daveshah> Yeah, I'll be the first to admit that DSPs are very much the weak link in the ECP5 flow at the moment
<Sprite_tm> No worries, I'm already 110% happy the 'normal' fabric works like a charm.
<Sprite_tm> Would be nice to get that RiscV multiply instruction done by the hw multipliers, though.
<Sprite_tm> On that matter, thanks for your work on the Trellis and nextpnr stuff. If you ever happen to be in Shanghai, hit me up and I'll buy you a beer or two :)
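[Editor's sketch] One common way to build the 32-bit multiply Sprite_tm mentions without the ALU slices is to split each operand into 16-bit halves and sum four shifted partial products. The Python below is only a software model of that arithmetic, assuming unsigned operands; `mul16` stands in for one hardware multiplier (16-bit unsigned inputs fit within an 18x18 block), and all names are hypothetical rather than taken from any ECP5 library.

```python
MASK16 = 0xFFFF
MASK64 = (1 << 64) - 1


def mul16(a, b):
    """Stand-in for one hardware multiplier: unsigned 16x16 -> 32 bits."""
    assert 0 <= a <= MASK16 and 0 <= b <= MASK16
    return a * b


def mul32x32(a, b):
    """Unsigned 32x32 -> 64-bit product built from four 16x16 products."""
    a_lo, a_hi = a & MASK16, (a >> 16) & MASK16
    b_lo, b_hi = b & MASK16, (b >> 16) & MASK16

    ll = mul16(a_lo, b_lo)  # weight 2^0
    lh = mul16(a_lo, b_hi)  # weight 2^16
    hl = mul16(a_hi, b_lo)  # weight 2^16
    hh = mul16(a_hi, b_hi)  # weight 2^32

    # In hardware these shifted additions would be done in fabric adders.
    return (ll + ((lh + hl) << 16) + (hh << 32)) & MASK64
```

In fabric, the final additions can be registered to ease timing, at the cost of extra latency.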
<Adluc> yo guys, just out of curiosity, what's the approximate improvement in used LUTs when using homebrew tools like nextpnr instead of the original ones from Lattice?
emeb has joined ##openfpga
<daveshah> Right now it's often a step backwards by 10-20%, but hopefully that will improve
<daveshah> otoh, the open source tools can often be significantly faster
<daveshah> *faster runtime
jevinski_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<sorear> daveshah: nice (half of 45k & 100mhz)
<daveshah> Much better than Rocket...
flea86 has quit [Quit: Goodbye and thanks for all the dirty sand ;-)]
lutsabound has quit [Quit: Connection closed for inactivity]
lutsabound has joined ##openfpga
<sorear> that’s what Rocket gets… on the vc707
<somlo> daveshah, sorear: right now, 64bit rocket (without fpu) with LiteX uses about 96% of the 45k ecp5, and (sometimes) passes timing at 50MHz
<daveshah> Oh, 50MHz is better than expected
<somlo> of course, I haven't added the plic handler code yet, so it doesn't actually *work* yet
<daveshah> which ECP5?
<somlo> 5g 45k (versa board)
<Dolu1990> I will check where the critical paths are and the core logic usage
<Dolu1990> Not sure how fast VexRiscv can go on ECP5. How does ECP5 compare to, let's say, a Cyclone IV? Or an Artix?
<daveshah> It's quite a bit slower than Artix
<daveshah> Not sure about Cyclone IV
<Adluc> what's Rocket?
<daveshah> Rocket is a 64-bit RISC-V processor
<Adluc> found repo, thx
<sorear> I’m also wondering whether there was any significant worsening of utilization for the new MMU
<daveshah> No, no significant change
<daveshah> a few hundred LUTs max
<daveshah> Dolu1990: critical path at the moment is from SEIE, through interrupt code stuff, jump interface and to the IBus pc
rohitksingh has quit [Ping timeout: 248 seconds]
rohitksingh has joined ##openfpga
<Dolu1990> daveshah: Thanks :)
<daveshah> This is using this Verilog https://github.com/enjoy-digital/linux-on-litex-vexriscv/blob/master/VexRiscv.v if you want to correlate back the _zz_s
<Dolu1990> Should be possible to relax that timing by setting this option: https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/plugin/IBusCachedPlugin.scala#L29 I will give it a try
<Dolu1990> It would avoid having the next pc directly going to the instruction cache rams address, by adding an extra fetch stage.
Asu has quit [Read error: Connection reset by peer]
Asu has joined ##openfpga
X-Scale has quit [Quit: Try HydraIRC -> http://www.hydrairc.com <-]
<daveshah> So tried that and seems pretty much the same Fmax, but now I think the critical path is the multiplication
<daveshah> let me see what iterative multiplication does
zng_ has joined ##openfpga
zng has quit [Ping timeout: 246 seconds]
<daveshah> that gets it up to 90MHz, and utilisation now 42% rather than 52% of the ECP5 (but obviously worse performance too)
<Dolu1990> Ahh nice
<Dolu1990> Then at 90 MHz, what is the critical path?
carl0s has joined ##openfpga
Dolu1990 has quit [Read error: Connection reset by peer]
Laksen has joined ##openfpga
Dolu has joined ##openfpga
<daveshah> Dolu: seems to be the exception and pc stuff again
<Dolu> daveshah, thanks! Yes, you are right. I will check whether there is a way to cut some combinatorial links there. But without any design changes, maybe removing the static branch prediction will relax the next PC calculation, as it will remove a 32-bit adder -> branch interface from the decode stage: --prediction none
<daveshah> Dolu: --prediction none gets up to 98.9MHz
<daveshah> critical path now
<daveshah> VexRiscv.v:4489 is `assign MmuPlugin_ports_0_cacheHits_2 = ((MmuPlugin_ports_0_cache_2_valid && (MmuPlugin_ports_0_cache_2_virtualAddress_1 == DBusCachedPlugin_mmuBus_cmd_virtualAddress[31 : 22])) && (MmuPlugin_ports_0_cache_2_superPage || (MmuPlugin_ports_0_cache_2_virtualAddress_0 == DBusCachedPlugin_mmuBus_cmd_virtualAddress[21 : 12])));`
<Dolu> Iaaarg nearly 100 XD
<daveshah> I'm sure it will make 100 with another seed...
<Dolu> ^^
<adamgreig> does it have a built in mode for my current "up, delete last seed, increment by one, retry, check if we met timing" thing? :P
<adamgreig> better yet, run 16 seeds in parallel...
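[Editor's sketch] The manual loop adamgreig describes (bump the seed, retry, check timing) can be run across several seeds at once. The Python below is a minimal sketch of that idea only; `run_pnr` is a hypothetical callable that takes a seed and returns the achieved Fmax in MHz, and `fake_pnr` is a placeholder with made-up numbers so the example runs without any toolchain. A real `run_pnr` would launch the place-and-route tool with that seed and parse its timing report, which is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def sweep_seeds(run_pnr, seeds, target_mhz, workers=16):
    """Try every seed in parallel and report the best one.

    run_pnr(seed) must return the achieved Fmax in MHz.
    Returns (best_seed, best_fmax, met_timing).
    """
    seeds = list(seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so seeds and results line up.
        results = list(zip(seeds, pool.map(run_pnr, seeds)))
    best_seed, best_fmax = max(results, key=lambda r: r[1])
    return best_seed, best_fmax, best_fmax >= target_mhz


def fake_pnr(seed):
    """Placeholder only: deterministic, invented Fmax values for the demo."""
    return 95.0 + (seed * 7 % 11) * 0.5


if __name__ == "__main__":
    print(sweep_seeds(fake_pnr, range(1, 17), target_mhz=100.0))
```

Threads are enough here because a real `run_pnr` would spend its time waiting on an external process; the worker count of 16 just mirrors the number mentioned above.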
emeb_mac has joined ##openfpga
rohitksingh has quit [Ping timeout: 246 seconds]
zachjs has joined ##openfpga
<Dolu> So it looks like it is the DBus MMU translation, which has a whole cycle in the memory stage to go from virtual to physical address. For this one, I would say an option could be added to relax the input virtual address register.
<Dolu> The virtual address register is shared with the ALU result and similar values from the execute stage to save resources. But it adds constraints on routing. (https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/plugin/DBusCachedPlugin.scala#L222 -> https://github.com/SpinalHDL/VexRiscv/blob/master/src/main/scala/vexriscv/ip/DataCache.scala#L415)
<Dolu> It would be easy to relax it with a dedicated register to go from the 32-bit adder in the execute stage to the memory stage
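[Editor's sketch] The `cacheHits` assignment daveshah quoted compares two 10-bit slices of the virtual address against a cached translation entry, with the lower compare bypassed for superpages. The Python below restates that one expression as a software model; the function and parameter names are hypothetical, and only the bit ranges and boolean structure are taken from the quoted Verilog line.

```python
def mmu_cache_hit(valid, super_page, entry_vpn1, entry_vpn0, virtual_address):
    """Model of one MMU cache entry's hit test for a 32-bit virtual address.

    entry_vpn1 is compared with address bits [31:22] and entry_vpn0
    with bits [21:12]; a superpage entry ignores the lower field.
    """
    vpn1 = (virtual_address >> 22) & 0x3FF  # bits [31:22]
    vpn0 = (virtual_address >> 12) & 0x3FF  # bits [21:12]
    return bool(valid and entry_vpn1 == vpn1 and (super_page or entry_vpn0 == vpn0))


if __name__ == "__main__":
    va = (0x048 << 22) | (0x345 << 12) | 0x678
    print(mmu_cache_hit(True, False, 0x048, 0x345, va))
```

Both compares depend on the virtual address arriving from the shared register, which is why the routing of that register shows up in the critical path discussed above.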
Asu has quit [Ping timeout: 252 seconds]
Asu` has joined ##openfpga
zachjs has quit [Remote host closed the connection]
zachjs has joined ##openfpga
<zachjs> I've been working on a research project for converting SystemVerilog to Verilog and heard that people here might be interested. If you're interested, the repo is at https://github.com/zachjs/sv2v
<daveshah> zachjs: Awesome!
Jybz has quit [Quit: Konversation terminated!]
jevinskie has joined ##openfpga
<hackerfoo> zachjs: I just found this, which might be useful: https://github.com/freechipsproject/firrtl
carl0s has quit [Ping timeout: 256 seconds]
<mithro> zachjs: I just discovered the above too
<zachjs> mithro: I saw that project a bit ago, but found that it was limited in scope. For example, it doesn't appear to work on interfaces. Also, I found that verilator (which it relies on) has some odd limitations, for example, with the sizing of members in struct literals.
<zachjs> The output also isn't fully Verilog-2005 compliant, still producing logics.
<zachjs> hackerfoo: Thanks for the link, I'll check it out!
<sorear> difficult for me to get enthusiastic over something that will add more build steps and more opportunities to have error messages with zero correlation to the source
wpwrak has quit [Read error: Connection reset by peer]
ZombieChicken has quit [Remote host closed the connection]
ZombieChicken has joined ##openfpga
GuzTech has quit [Remote host closed the connection]
<mithro> anyone know anything about https://github.com/pablomarx/rodinia ?
Asu has joined ##openfpga
Asu has quit [Client Quit]
Asu` has quit [Ping timeout: 258 seconds]
cr1901_modern1 has quit [Quit: Leaving.]
cr1901_modern has joined ##openfpga
kerel has joined ##openfpga
<zachjs> sorear: That's a great point. The larger goal of the project is to enable the synthesis of SystemVerilog without using commercial tools, rather than to completely cut out commercial tools from the development process.
wpwrak has joined ##openfpga
<zachjs> The Verilog specification has a preprocessor directive which can indicate what source file/line generated output came from. yosys already supports that directive. I'd be very interested to hear how outputting such directives, or maybe some other methods, might make debugging the output of the tool easier.
<whitequark> that directive is not particularly useful
<whitequark> sure, it gives you the rough indication, but you still have to read generated source
Dolu has quit [Quit: Leaving]
mumptai_ has quit [Quit: Verlassend]
Dolu1990 has joined ##openfpga
<zachjs> whitequark: I'd be eager to try out anything that could make that process easier. The output isn't too rough, though I'm sure it could be refined. We were able to get a RISC-V core converted and booted without much issue.
<whitequark> zachjs: you could use the (* loc *) attributes
<whitequark> in principle that would be as good as using yosys' verilog or sv frontend
<zachjs> whitequark: Thanks for the heads up! I'll be sure to look into that.
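[Editor's sketch] The preprocessor directive zachjs refers to has the Verilog-2005 form `` `line number "filename" level ``, which sets the source position reported for the line that follows. The Python below is a minimal sketch of how a converter might emit it, assuming each generated line is tagged with the file and line it came from; the function name and the input shape are hypothetical and do not describe sv2v's actual implementation.

```python
def emit_with_line_directives(chunks):
    """Join generated lines, inserting `line directives where needed.

    chunks is an iterable of (text, source_file, source_line) tuples.
    A directive is emitted only when the source position differs from
    what a reader would infer by counting on from the previous line.
    """
    out = []
    expected = None  # (file, line) implied for the next output line
    for text, src_file, src_line in chunks:
        if (src_file, src_line) != expected:
            # Level 0: neither entering nor leaving an include file.
            out.append(f'`line {src_line} "{src_file}" 0')
        out.append(text)
        expected = (src_file, src_line + 1)
    return "\n".join(out)


if __name__ == "__main__":
    print(emit_with_line_directives([
        ("wire a;", "top.sv", 10),
        ("wire b;", "top.sv", 11),
        ("assign a = b;", "top.sv", 40),
    ]))
```

As whitequark notes, this only gives a rough pointer back to the source; per-object attributes carry location information further through the flow.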
knielsen has quit [Ping timeout: 250 seconds]
Laksen has quit [Quit: Leaving]
Dolu1990 has quit [Quit: Leaving]
lutsabound has quit [Quit: Connection closed for inactivity]
knielsen has joined ##openfpga