tpb has quit [Remote host closed the connection]
tpb has joined #symbiflow
<sf-slack> <pgielda> That will happen most probably tomorrow
<sf-slack> <pgielda> Sorry for the delay, one of the engineers responsible for this part happens to be out of office this week
<_whitenotifier-f> [yosys] HackerFoo opened issue #79: Need updated Yosys to successfully run attosoc test with nextpnr-xilinx - https://git.io/JfyNj
<sf-slack> <pgielda> But thanks for the comments on the PR; it would be great to get it in and then expand and fix things
<sf-slack> <pgielda> We will also then have one set of packages for both xc7 and eos3 flows which would be a dream ;)
<HackerFoo> nextpnr takes 104s on vexriscv, where essentially all of that time (all but ~100ms) is spent in fasm and bitstream generation: https://docs.google.com/document/d/1-lDeYYwmfxanod441FUjIoAKW061qmgKsqTcKTb_mXE/edit?usp=sharing
andrewb1999 has quit [Read error: Connection reset by peer]
<tpb> Title: Performance Comparison - Google Docs (at docs.google.com)
<HackerFoo> At least according to fpga-tool-perf
<mithro> @HackerFoo Did it actually generate anything? There is a lot of N/A output
<HackerFoo> mithro: It generated a 2MB .bit file; I haven't tried it. fpga-tool-perf doesn't seem to keep the output from nextpnr yet, but it didn't seem to fail.
<HackerFoo> I re-ran it to capture the output: https://gist.github.com/HackerFoo/7b2a7bed7632a981399d54cba389d749
<tpb> Title: vexriscv/nextpnr-xilinx log · GitHub (at gist.github.com)
<HackerFoo> Placement takes about 10 seconds.
citypw has joined #symbiflow
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
rvalles_ has joined #symbiflow
rvalles has quit [Ping timeout: 256 seconds]
Degi has quit [Ping timeout: 246 seconds]
Degi has joined #symbiflow
rvalles_ is now known as rvalles
az0re has quit [Remote host closed the connection]
futarisIRCcloud has joined #symbiflow
tpagarani has joined #symbiflow
smkz has quit [Quit: reboot@]
smkz has joined #symbiflow
smkz has quit [Client Quit]
smkz has joined #symbiflow
tpagarani has quit [Remote host closed the connection]
lns1 has joined #symbiflow
lns1 has left #symbiflow [#symbiflow]
OmniMancer has joined #symbiflow
OmniMancer1 has quit [Ping timeout: 256 seconds]
az0re has joined #symbiflow
epony has quit [Quit: QUIT]
epony has joined #symbiflow
epony has quit [Max SendQ exceeded]
epony has joined #symbiflow
kraiskil has joined #symbiflow
<sf-slack> <acomodi> nextpnr in tool-perf needs to be fixed to have all the output results correctly parsed, I am dealing with that now
mkru has joined #symbiflow
mkru has quit [Quit: Leaving]
lnsharma has joined #symbiflow
lnsharma has left #symbiflow [#symbiflow]
craigo has joined #symbiflow
<Lofty> tnt: you around?
<tnt> Lofty: I am
<Lofty> Could you send me your designs so I can play about with them?
<tnt> Note that you can only run it through yosys atm ... you can't actually run VPR because I haven't ported the RAM or IO (so you have SB_IO and SB_RAM_4K blackboxes ...), nor worked out the proper interconnect.
anuejn_ is now known as anuejn
<tnt> But so far it's been good enough for me to look at the deepest path with 'show' (something like show n:$abc$8664$techmap\tx_pkt_I.$0\len[10:0][6] %ci*:-dff:-dffc:-SB_RAM40_4K ) and see if I can do better.
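A minimal Yosys sketch of this kind of path inspection (the net name is a placeholder, the exclusion list is taken from tnt's command above, and it assumes the design has already been through synthesis):

    ltp -noff                  # report the longest topological path, ignoring flip-flops
    # trace the input cone of one path endpoint, excluding FFs and the RAM blackboxes:
    show -width n:some_net %ci*:-dff:-dffc:-SB_RAM40_4K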
<Lofty> If ABC9 gets hooked up there'll be sta, which is even more useful
<Lofty> tnt: Is this after you've manually optimised it, or before?
<tnt> What I sent? That's before I did anything.
<tnt> I haven't actually done any optimization anywhere other than on paper ...
<Lofty> Ah, okay
<Lofty> Let's start by seeing what happens if we map muxes before giving them to ABC
<Lofty> Since implementing a MUX8 in pure LUTs requires 11 inputs (8 data + 3 select), and ABC thinks we only have 4-input LUTs
kraiskil has quit [Ping timeout: 256 seconds]
<Lofty> Doesn't seem like you have anything mappable there. At least with `muxcover -nodecode`
<Lofty> Let's see what happens if we allow decoders
<Lofty> Again, nothing
<Lofty> Ah well
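A sketch of the experiment above, assuming it is inserted after the coarse-grained synthesis steps and before ABC; the exact position in the script is a guess:

    techmap                    # lower $mux cells to gate-level $_MUX_ cells so muxcover can see them
    muxcover -nodecode         # try to cover $_MUX_ trees with wide mux cells, without decoder logic
    stat                       # check whether any wide mux cells were actually produced
    # dropping -nodecode corresponds to the "allow decoders" variant tried above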
<tnt> Unfortunately they're probably never pure mux. The 8:1 mux example is just a synthetic worst case. For the few paths I looked at, it's more subtle how you could use the muxes better.
<Lofty> Alright, 1467 cells to 1261 cells by using `abc -luts 1,2,2,4`
<Lofty> Let's use a crappy formula for LC usage: LCs = LUT2/2 + LUT3/2 + LUT4
<Lofty> Before: 385/2 + 509/2 + 56 = 407 LCs
<Lofty> After: 107/2 + 420/2 + 217 = 498 LCs.
<Lofty> Hmm
<tnt> How's the ltp ?
<Lofty> length=7
<Lofty> So equivalent to before it seems
<Lofty> Okay, what happens if I reduce the area cost of LUT4s to make ABC try to cover more logic with them?
<Lofty> 109/2 + 389/2 + 238 = 486 LCs
kraiskil has joined #symbiflow
<Lofty> That's `abc -luts 1,2,2,3`
<Lofty> I just double-checked with flowmap: it seems the critical path is 7 and it can't actually be shortened
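A guess at how that depth check could look, assuming flowmap is run on a saved copy of the pre-ABC netlist so the normal mapping can continue afterwards:

    design -save pre_abc       # keep the current netlist
    flowmap -maxlut 4          # depth-optimal LUT4 mapping (FlowMap); area is not a goal here
    ltp -noff                  # if this also reports 7 levels, mapping alone can't reduce depth
    design -load pre_abc       # restore and carry on with the usual abc mapping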
<Lofty> What if we make LUT4s the same cost as LUT3s?
<Lofty> 87/2 + 322/2 + 305 = 509 LCs
<Lofty> Hrm.
<Lofty> What if we make LUT4s unnecessarily expensive?
<tnt> I'll try to come up with a more synthetic example that's easier to analyze and that comes up quite a bit (which is basically a loadable counter with enable).
<Lofty> Sure
<tnt> For instance, it doesn't _use_ dff enable from what I've seen, which increases the length of the logic for nothing.
<Lofty> That's because the flow doesn't map dff enable
<Lofty> Which might help :P
<Lofty> I have no idea what the primitive for it is
<tnt> dffe ( Q, D, CLK, EN );
<Lofty> Okay, so literally just inserting a dff2dffe in the flow pulls out 191 dffes
<Lofty> <Lofty> Before: 385/2 + 509/2 + 56 = 407 LCs
<Lofty> After: 472/2 + 375/2 + 46 = 469
<Lofty> Eh?
<Lofty> ...Ah
<Lofty> Here's the answer: 385/2 + 509/2 + 56 = 502 LCs :P
<Lofty> Thanks windows calculator
<Lofty> Real helpful of you
<tnt> lol
<Lofty> Okay, so, dffes help
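A sketch of where the dff2dffe call might sit in the synthesis script; the surrounding passes are assumptions:

    # before FF techmapping: fold enable-mux feedback loops into enable flip-flops
    dff2dffe
    opt_clean
    stat                       # the run above reports 191 dffes pulled out this way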
<Lofty> Let's go back to LUT mapping with the math problems solved
<Lofty> 186/2 + 345/2 + 178 = 443 LCs (abc -luts 1,2,2,4)
<Lofty> So I've shaved 60 LCs by configuring ABC better and using DFFEs
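And a sketch of the matching ABC call, using the per-size LUT area costs quoted above (LUT1=1, LUT2=2, LUT3=2, LUT4=4):

    abc -luts 1,2,2,4          # per-size LUT costs steer ABC's area recovery
    clean
    stat                       # compare the LUT2/LUT3/LUT4 counts against the earlier runs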
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
citypw has quit [Ping timeout: 240 seconds]
<Lofty> Surprisingly, this design isn't too awful to run `freduce` on
<Lofty> 200/2 + 291/2 + 168 = 413 LCs
<Lofty> Although maybe put that behind an option :P
<tnt> What's freduce ?
<Lofty> It looks for functionally-equivalent subcircuits (i.e. subcircuits that return the same value) to reduce area
<Lofty> Unfortunately the pass is horrifically slow for anything larger than toy examples
<Lofty> Because it attempts to SAT compare stuff without even trying
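A sketch of adding freduce to the flow; where exactly it went in the experiment above isn't stated, so the position is a guess, and it can be very slow on larger designs:

    freduce -v                 # SAT-based search for functionally equivalent signals, merged to save area
    opt_clean                  # sweep drivers that became unused
    stat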
olegfink1 has joined #symbiflow
<Lofty> Hmmm
<Lofty> Without going to the effort of writing an ABC9 flow, I think I've mostly exhausted the things I can think of
smkz has quit [Ping timeout: 256 seconds]
<Lofty> Actually, there is one
<Lofty> Nope, doesn't apply here.
xobs1 has joined #symbiflow
FFY00 has quit [*.net *.split]
promach3 has quit [*.net *.split]
xobs has quit [*.net *.split]
olegfink has quit [*.net *.split]
<Lofty> Anyway, 60 LCs by configuring ABC better, another 30 by using freduce
FFY00 has joined #symbiflow
<tnt> Do you have a diff I can apply to yosys to try that here ?
<tnt> Did the dffe have any impact on ltp btw ?
<tpb> Title: synth_quicklogic.diff · GitHub (at gist.github.com)
<Lofty> No, but it reduces area
<Lofty> As I said: you can't do any better with depth
<Lofty> At least without resynthesis
<tnt> I'm also wondering why the flow is two steps of mapping. synth_quicklogic is the first stage but then they actually re-run yosys a second time with a different techmapping pass.
smkz has joined #symbiflow
<Lofty> ?!
<Lofty> That's...what.
<tpb> Title: synth.sh: yosys -p "tcl ${SYNTH_TCL_PATH}" -l $LOG ${VERILOG_FILES[*]} pytho - Pastebin.com (at pastebin.com)
<Lofty> This should be within synth_quicklogic
<tnt> Yeah, they have a second level cells_map.v that maps to T_FRAG / B_FRAG and other types of cells used by vtr rather than lut{2,3,4}.
* Lofty sighs
<tpb> Title: [VeriLog] // ============================================================================ - Pastebin.com (at pastebin.com)
<tnt> It seems to also have stuff like "// FIXME: Always select QDI as the FF's input" which seems rather sub-optimal ... or maybe vpr's packing step can 'fix' that if it sees it can pack the Q FRAG with the T/B FRAG
<tnt> Also the LUT4s are mapped to 2 LUT3s + 1 F_FRAG; not sure why that is. I'm hoping the F_FRAG can also be placed by VTR as the TBS_mux and not just the bottom Fmux, because otherwise that seems like a huge waste.
<Lofty> Mmm
az0re has quit [Remote host closed the connection]
<Lofty> tnt: But presumably shaving 20% off the area helps a bit, right?
<tnt> It helps for sure, but atm I'm more concerned about the fmax / depth.
<tnt> I have to finish porting so I can do a full synth->pnr run and see the actual frequency. Less area will help placement for sure.
<Lofty> I'm fairly sure ABC9 will help a lot here
<Lofty> By the way, tnt, I added some other stuff you'll probably find helpful
<tnt> where ?
<Lofty> I just updated the diff I gave earlier
<Lofty> Sorry, Git was being a pain
<Lofty> You'll now see the names of wires in the LTP output
<Lofty> Should help your optimisation efforts
<tnt> Ah yeah, naming :)
<tnt> tx
<tnt> Lofty: it might not solve the longest one, but still provides a good reduction : https://pastebin.com/Swhjs6Gn
<tpb> Title: Paths length distribution: Before After 0: 270 307 1: 19 - Pastebin.com (at pastebin.com)
<tnt> (the method I use to collect them is flawed, but I think it still gives a decent idea ...)
<Lofty> Yay
<Lofty> That should improve routability a bit then
<Lofty> i.e. less shitty runtime
futarisIRCcloud has joined #symbiflow
kraiskil has quit [Ping timeout: 265 seconds]
kraiskil has joined #symbiflow
<sf-slack> <acomodi> mithro, HackerFoo: I have merged a fix in tool-perf to parse nextpnr results (frequency, runtime and resources)
OmniMancer has quit [Quit: Leaving.]
OmniMancer has joined #symbiflow
OmniMancer has quit [Client Quit]
epony has quit [Ping timeout: 258 seconds]
mangelis has quit [Ping timeout: 258 seconds]
craigo has quit [Quit: Leaving]
kraiskil has quit [Ping timeout: 256 seconds]
mkru has joined #symbiflow
mangelis has joined #symbiflow
mkru has quit [Quit: Leaving]
<mithro> Lofty: awesome work on the synthesis stuff
<Lofty> It was a handful of hours that I could spare
<mithro> Lofty: did you see that QuickLogic provide full liberty files which include timing for all the parts?
<tpb> Title: EOS-S3/Timing Data Files at master · QuickLogic-Corp/EOS-S3 · GitHub (at github.com)
<Lofty> I did, yes
<Lofty> But porting to ABC9 is a big enough task that I'm not going to do it for free
<Lofty> Household expenditures, etc
<sf-slack> <acomodi> litghost, HackerFoo: I have an almost-working lookahead. I have merged the lookahead creation from the connection box and used the SRC/OPIN --> CHAN and CHAN --> IPIN information from the upstream lookahead map
<litghost> acomodi: Define almost working? It routes well?
<sf-slack> <acomodi> almost working because it still takes more time than the connection box lookahead (271 seconds vs ~130 seconds)
<sf-slack> <acomodi> I still need to check whether the bitstream works though, but the routing step completed successfully
<litghost> acomodi: How long did the untuned version take?
<sf-slack> <acomodi> By untuned you mean without the information on SRC/OPIN --> CHAN and CHAN --> IPIN?
<sf-slack> <acomodi> If that's the case it took 313 seconds
<sf-slack> <acomodi> Also, the lookahead generation took 473.11 seconds with NUM_WORKERS set to 20
<litghost> acomodi: I assume this is on the A50 fabric?
<sf-slack> <acomodi> Yes
<sf-slack> <acomodi> I still need to do more testing and probably fix some things, but I think we are on the right path
<litghost> acomodi: 313 -> 271 isn't a huge improvement, but it is moving in the right direction. It is likely worth comparing how and when the lookahead mispredictions occur. I expect you can use that to inform how to iterate
az0re has joined #symbiflow
epony has joined #symbiflow
kraiskil has joined #symbiflow
mkru has joined #symbiflow
<sf-slack> <acomodi> Sure, I'll try and get it down to reasonable run-time. Hopefully matching the current lookahead. I have tested the routed design on HW and it works
<litghost> acomodi: The runtime doesn't have to match exactly, but 271 vs 130 is a large enough difference to suggest the refactored map lookahead is still mispredicting pretty badly
<litghost> acomodi: Identifying how the new map lookahead mispredicts will enable targeted development to reduce the mispredicts
<litghost> acomodi: Another thing I expect is the new map lookahead output file to be significantly smaller than the connection box lookahead. Is that true?
<sf-slack> <acomodi> Indeed, here there is a pretty huge benefit: • refactored lookahead: 17M • connection box lookahead: 557M
<sf-slack> <acomodi> I will need to run more detailed routes, maybe there are some specific nets that encounter lots of mispredicts
<litghost> acomodi: That is really good! Hopefully with some further development we can capture the relevant data to make the map lookahead as good as the connection box lookahead, without the extra data
<litghost> acomodi: When I was doing initial development for the connection box lookahead, I enabled router profiling, and made sure to print per-connection route times per iteration, and printed the worst connection (OPIN -> IPIN) times
<litghost> acomodi: From there I used the routing_diag tool at criticality == 1 to find lookahead mispredictions
<litghost> acomodi: You may find that you need to look at several connections to suss out where the mispredictions are coming from
<litghost> acomodi: However, if the lookahead is close, you will likely see a step function somewhere in the path that points to the current error source
<litghost> acomodi: Another thing is when you are examining just 1 connection, you can easily compare the ideal route (e.g. astar_fac = 0) versus the route at various A* levels
<litghost> acomodi: With a well-tuned lookahead, the router should behave similarly at A* = 0 and A* ~= .1 - .5
<litghost> acomodi: Once that is working well, A* = 1 should return a good route quickly, and A* = 1.05 ~ 1.2 should return a route quickly, but with somewhat worse path delay
<litghost> acomodi: Ideally A* <= 1 should all return the best (or close to best) delay, and 1 < A* < 1.2 should return worse path delay, but with increased speed (via a decreased search space)
<sf-slack> <acomodi> litghost: All right, thanks for the insight, I will get to analyze all of this on smaller circuits then, to keep things simple
<litghost> acomodi: No, my suggestion was not to use a smaller circuit, but to take a full circuit, pick the worst connection route time, and debug that connection specifically
<sf-slack> <acomodi> One question, by router profiling you mean to have the debug logging enabled or is that something else entirely?
<tpb> Title: vtr-verilog-to-routing/route_profiling.h at master+wip · SymbiFlow/vtr-verilog-to-routing · GitHub (at github.com)
<tpb> Title: vtr-verilog-to-routing/route_profiling.cpp at master+wip · SymbiFlow/vtr-verilog-to-routing · GitHub (at github.com)
<sf-slack> <acomodi> Ok, great, thanks
mkru has quit [Quit: Leaving]
mkru has joined #symbiflow
mkru has quit [Remote host closed the connection]
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
kraiskil has quit [Ping timeout: 246 seconds]
kraiskil has joined #symbiflow
tonlage has joined #symbiflow
OmniMancer has joined #symbiflow
andrewb1999 has joined #symbiflow
<andrewb1999> Is there a reason that the wire CLK_HROW_TOP_R_X60Y130/CLK_HROW_CK_BUFHCLK_L7 would exist in the database but not CLK_HROW_TOP_R_X60Y130/CLK_HROW_CK_BUFHCLK_L6?
<andrewb1999> Both of these wires were chosen by Vivado as partition pins, but the xray python lookup scripts can find L7 and not L6
<andrewb1999> All other partition pins can be found fine too
<andrewb1999> Or is it possible this is an issue with using 2 clocks?
<andrewb1999> Does symbiflow support multiple clocks yet?
<litghost> andrewb1999: 7-series VPR does support multiple clocks, but it doesn't currently support explicit BUFH instancing
<litghost> andrewb1999: There is not support for fully route-through sites right now in VPR
<litghost> andrewb1999: You can operate with all explicit BUFH instancing, or all implicit BUFH instancing (via routing)
<litghost> andrewb1999: Mixing them is not supported at this time
<litghost> andrewb1999: Support could be added, but it is unclear why it would be a priority
<andrewb1999> litghost: Ok thanks, will probably switch to one clock now for simplicity
<litghost> andrewb1999: FYI, one vs multiple clocks has no relevance to the BUFH discussion
<andrewb1999> litghost: Oh ok
futarisIRCcloud has joined #symbiflow
az0re has quit [Remote host closed the connection]
az0re has joined #symbiflow
gsmecher has quit [Ping timeout: 246 seconds]
kraiskil has quit [Ping timeout: 246 seconds]
<andrewb1999> Is the difference between graph_limit and roi just whether synth_tiles are created and if fasm from the harness gets merged?
<andrewb1999> Or does it get treated differently in other ways?
<litghost> andrewb1999: The ROI provides the graph limit as part of the ROI spec (along with the FASM for the bits for the harness)
<litghost> andrewb1999: The graph_limit is for ROI-less designs, where we want a sub-graph excluding some of the fabric
tonlage has quit [Remote host closed the connection]