#nmigen on 2020-12-06 — irc logs at freenode.irclog.whitequark.org

2020-11-16 20:55 ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at https://github.com/nmigen · logs at https://freenode.irclog.whitequark.org/nmigen · IRC meetings each Monday at 1800 UTC · next meeting November 23rd

00:24 nelgau has joined #nmigen

00:28 nelgau has quit [Ping timeout: 272 seconds]

01:00 <lkcl> whitequark: if we needed to do a parallel (SIMD) version of Module, having all of the with m.If/Else capability

01:01 <lkcl> here would be where the AST to do so would be needed, is that right?

01:01 <lkcl> https://github.com/nmigen/nmigen/blob/59ef6e6a1c4e389a41148554f2dd492328820ecd/nmigen/hdl/dsl.py#L447

01:04 nelgau has joined #nmigen

01:06 <whitequark> needs more context

01:07 <lkcl> cf the (brief) conversation we had a couple months back: we're doing a SIMD-capable processor

01:08 <lkcl> this could be done in an absolutely terrible way: one pipeline for 1x64, one for 2x32, one for 4x16, another for 8x8

01:09 nelgau has quit [Ping timeout: 256 seconds]

01:09 <lkcl> turns out that by inserting 7 "partition" bits, one at every byte boundary, and using carry-overflow you can subdivide a 64-bit adder to do all those combinations

01:10 <lkcl> we then went, "well if add can be dynamically partitioned to do all the different SIMD operations, can shift, greater, less-than be partitioned as well?"

01:10 <lkcl> and the answer turns out to be *yes*

01:10 <lkcl> :)
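
The partition-bit trick lkcl describes for the adder can be modelled in plain Python, without nMigen. This is only a sketch under assumptions: the name `partitioned_add` and the `part_points` argument are hypothetical, and it assumes the formulation where one extra bit is inserted between bytes, set to 1 in one operand when the carry should ripple through and left at 0 when the lanes are split there.

```python
def partitioned_add(a, b, part_points, width=64, lane=8):
    """Add a and b as SIMD lanes; part_points[i] is True to split after byte i.

    Hypothetical helper: a software model of the extra-bit trick only.
    """
    n = width // lane
    assert len(part_points) == n - 1
    mask = (1 << lane) - 1
    ea = eb = 0                  # operands widened by one extra bit per partition point
    pos = 0
    for i in range(n):
        ea |= ((a >> (i * lane)) & mask) << pos
        eb |= ((b >> (i * lane)) & mask) << pos
        pos += lane
        if i < n - 1:
            if not part_points[i]:
                ea |= 1 << pos   # 1 + 0 + carry-in forwards the carry to the next byte
            pos += 1             # left at 0, the extra bit absorbs the carry instead
    total = ea + eb              # one ordinary addition covers every lane layout
    out = 0
    pos = 0
    for i in range(n):
        out |= ((total >> pos) & mask) << (i * lane)   # drop the extra bits again
        pos += lane + 1
    return out


one_lane = partitioned_add(0xFF, 0x01, [False] * 7)    # 1x64: carry crosses into byte 1
eight_lanes = partitioned_add(0xFF, 0x01, [True] * 7)  # 8x8: byte 0 wraps on its own
```

The same adder serves 1x64, 2x32, 4x16 and 8x8 because only the values of the seven extra bits change between layouts.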

01:11 <lkcl> the next logical question is, then: given a suite of dynamically-partitionable logical and arithmetic primitives, can you provide nmigen m.If/Elif/Else and m.Switch/Case support on top of those primitives?

01:12 <lkcl> such that, with a partition context (that tells the Signal to subdivide into 1x64, 2x32, 4x16 or 8x8 dynamically)

01:12 <lkcl> you can still do even boolean if/else constructs that *also* subdivide into 1x64, 2x32, 4x16 or 8x8

01:13 <lkcl> thus eliminating completely the need to have 4 utterly separate pipelines with 4x the amount of gates.
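
What a partition-aware if/else would have to compute can be sketched in plain Python. This is a guess at the intended semantics, not lkcl's design: the helpers `lanes` and `partitioned_select` are hypothetical, the condition is fixed to an unsigned `a > b`, and the result picks `x` or `y` independently in each lane.

```python
def lanes(part_points, lane=8):
    """Yield (low_bit, width) for each lane implied by the partition points."""
    start = 0
    # The top of the word always closes the last lane.
    for i, split in enumerate(list(part_points) + [True]):
        if split:
            end = (i + 1) * lane
            yield start, end - start
            start = end


def partitioned_select(a, b, x, y, part_points):
    """Per lane: take x where a > b (unsigned), otherwise take y."""
    out = 0
    for low, width in lanes(part_points):
        mask = (1 << width) - 1
        take_x = ((a >> low) & mask) > ((b >> low) & mask)
        src = x if take_x else y
        out |= ((src >> low) & mask) << low
    return out


split_8x8 = partitioned_select(0x0105, 0x0203, 0xAAAA, 0xBBBB, [True] * 7)
whole_1x64 = partitioned_select(0x0105, 0x0203, 0xAAAA, 0xBBBB, [False] * 7)
```

With the word split into bytes, the two low lanes make different choices; with no split, the comparison is made once over the whole word.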

01:14 nelgau has joined #nmigen

01:16 <whitequark> sure

01:17 <lkcl> i described this to Paul Mackerras, who is implementing microwatt Vectors in VHDL, and he went, "cool, i'm nicking that carry-propagating idea" :)

01:18 <lkcl> gt/lt/ge was hairy. shift was *real* hairy. multiply took around... 8 weeks to develop (!)

01:19 nelgau has quit [Ping timeout: 256 seconds]

01:19 <lkcl> m.If/Elif/Else is the last piece of the puzzle and we would not have to do massive rewrite / code-duplication (yay)

01:30 Degi_ has joined #nmigen

01:33 Degi has quit [Ping timeout: 240 seconds]

01:33 Degi_ is now known as Degi

01:59 nelgau has joined #nmigen

02:03 nelgau has quit [Ping timeout: 256 seconds]

02:23 nelgau has joined #nmigen

02:39 nelgau has quit [Ping timeout: 246 seconds]

03:29 lkcl has quit [Ping timeout: 256 seconds]

03:39 nelgau has joined #nmigen

03:42 electronic_eel has quit [Ping timeout: 260 seconds]

03:42 lkcl has joined #nmigen

03:42 electronic_eel has joined #nmigen

03:44 nelgau has quit [Ping timeout: 260 seconds]

03:58 lkcl has quit [Ping timeout: 272 seconds]

04:11 lkcl has joined #nmigen

04:13 nelgau has joined #nmigen

04:18 nelgau has quit [Ping timeout: 260 seconds]

04:20 PyroPeter_ has joined #nmigen

04:24 PyroPeter has quit [Ping timeout: 265 seconds]

04:24 PyroPeter_ is now known as PyroPeter

04:38 nelgau has joined #nmigen

04:43 nelgau has quit [Ping timeout: 260 seconds]

05:30 lkcl has quit [Ping timeout: 272 seconds]

05:37 nelgau has joined #nmigen

05:42 lkcl has joined #nmigen

05:42 nelgau has quit [Ping timeout: 240 seconds]

05:53 nelgau has joined #nmigen

06:03 nelgau has quit [Ping timeout: 272 seconds]

06:06 nelgau has joined #nmigen

06:07 lkcl has quit [Ping timeout: 260 seconds]

06:11 nelgau has quit [Ping timeout: 272 seconds]

06:20 lkcl has joined #nmigen

06:27 lkcl has quit [Ping timeout: 260 seconds]

06:41 lkcl has joined #nmigen

06:43 nelgau has joined #nmigen

06:48 nelgau has quit [Ping timeout: 272 seconds]

06:51 nelgau has joined #nmigen

06:56 nelgau has quit [Ping timeout: 272 seconds]

07:31 lkcl has quit [Ping timeout: 240 seconds]

07:45 lkcl has joined #nmigen

07:51 nelgau has joined #nmigen

07:53 emeb_mac has quit [Quit: Leaving.]

07:55 nelgau has quit [Ping timeout: 240 seconds]

08:44 nelgau has joined #nmigen

08:56 nelgau has quit [Ping timeout: 260 seconds]

09:51 jeanthom has joined #nmigen

11:12 emily has quit [Ping timeout: 244 seconds]

11:12 cesar[m] has quit [Ping timeout: 240 seconds]

11:13 gkelly has quit [Ping timeout: 268 seconds]

11:15 jfng has quit [Ping timeout: 240 seconds]

11:25 cesar[m] has joined #nmigen

11:38 jfng has joined #nmigen

11:41 emily has joined #nmigen

11:45 gkelly has joined #nmigen

11:50 nelgau has joined #nmigen

12:31 <pepijndevos> woooow, I tried to convert a design that simulates fine to verilog and got a huuuuuge stacktrace

12:32 <pepijndevos> https://bpa.st/3AVQ

12:41 <pepijndevos> I have no idea what it means, but it appears to be coming from my least favourite hack: the QArray, which is like an nMigen array for my custom datatype

14:30 <lkcl> pepijndevos: that's not huge :)

14:30 <pepijndevos> ...

14:30 <lkcl> and it is no use to provide just the stack trace, you need to show the source code

14:31 <lkcl> i know what the problem is (because i have seen this before): however i cannot point it out to you without the source code

14:31 chipmuenk has joined #nmigen

14:31 <pepijndevos> yea... I don't really have a minimal example though.

14:32 <lkcl> is the source closed?

14:32 <lkcl> i don't need a minimal example, just the existing source code will do.

14:33 <lkcl> i know what the problem is because i have made this mistake myself before

14:33 <pepijndevos> https://github.com/pepijndevos/excellerate

14:33 <lkcl> you are using a temporary intermediate expression to access an Array

14:34 <pepijndevos> https://github.com/pepijndevos/excellerate/blob/master/excellerate/fixedpoint.py#L169 this is where one line in the stacktrace pointed to iirc

14:34 <lkcl> where what you needed to do is: assign the thing "index" to a Signal

14:34 <lkcl> then use that Signal via __getitem__

14:35 <lkcl> whereas

14:35 <lkcl> you have fallen into the common mistake of forgetting that a python variable which contains a piece of Abstract Syntax Tree

14:35 <lkcl> is not equal to

14:35 <lkcl> an actual Signal which has a NETLIST associated with it

14:36 <lkcl> let's see if the stacktrace contains the location where you're calling that...

14:36 <lkcl> no, unfortunately it doesn't.

14:37 <lkcl> so, _somewhere_ in your code, you have this:

14:37 <lkcl> x = QArray(...)

14:37 <pepijndevos> I'm guessing somewhere around here https://github.com/pepijndevos/excellerate/blob/master/excellerate/functions.py#L57

14:37 <lkcl> y = Signal(...)

14:37 <lkcl> expression = y[2:3] <<<--- here is the problem

14:38 <lkcl> z = x[expression] <<<--- QArray indexed by an expression

14:38 <pepijndevos> hmmmm

14:38 <lkcl> you want this:

14:38 <lkcl> index = Signal(...)

14:38 <lkcl> m.d.comb += index.eq(expression)

14:39 <lkcl> followed by

14:39 <lkcl> z = x[index] <<--- this is safe because QArray.__getitem__ is called with a Signal *NOT* a piece of Abstract Syntax Tree

14:40 <pepijndevos> I see... now... I need to find where this is happening...

14:40 <lkcl> trying to do this results in abbbbsoltuuutely f*****g awwwwfull HDL

14:41 <lkcl> as in: if it succeeded (was compiled correctly), the pmux that was generated would have that expression copied into it *multiple times*!

14:41 <lkcl> in every single place where you used that pmux!

14:41 <lkcl> and if that pmux is then *also* assigned to a piece of AST, that also gets copied multiple times into the output HDL!

14:42 <lkcl> can you imagine what that would look like, at the gate level?

14:42 <pepijndevos> awful

14:42 <lkcl> hundreds to thousands of gates!

14:42 <lkcl> and the chances of yosys spotting it, and optimising them out? slim-to-negligible

14:43 <lkcl> bottom line is, the abstraction that nmigen offers, you really do have to be very careful

14:44 <pepijndevos> Problem is... the only place I can find where I'm indexing an array is in https://github.com/pepijndevos/excellerate/blob/master/excellerate/functions.py#L57 and that's definitely a signal

14:46 <lkcl> pepijndevos: no, it's a piece of Abstract Syntax Tree fragment that requests that a pmux be created, should that AST fragment be handed to the IL compiler

14:46 <lkcl> you've *named* it sig but it is an expression

14:46 <lkcl> counter is a Signal

14:47 <lkcl> however at line 58, can you see how you perform an addition of acc+sig?

14:47 <lkcl> that is trying to use that copy of the AST-fragment "sig" in the RHS of *another* piece of AST fragment

14:47 <lkcl> so here is the first duplication of that (awful) pmux AST fragment

14:48 <lkcl> now we go on to line 66, see there is a second one?

14:48 <pepijndevos> ah, okay... in your example you suggested the problem is that the index to the array is an expression, but in my case the problem is that the *result* of the index operation is an expression that gets used multiple times?

14:48 <lkcl> so that's two copies of the AST-fragment requesting that the exact same pmux be inserted into the HDL

14:48 <lkcl> correct

14:48 <lkcl> so you can solve this with:

14:48 <lkcl> sig = Signal(....)

14:48 <lkcl> m.d.comb += sig.eq(args[counter])

14:48 <lkcl> that should do the job

14:49 <lkcl> he said

14:49 <pepijndevos> thanks, lemme try...

14:50 <lkcl> you should also consider having the sum acc+sig in a separate Signal so as to not have the HDL circuitry duplicated

14:51 <lkcl> although it is "not a lot of gates", yosys might not be able to detect the duplication at the depth involved in the FSM

14:52 <pepijndevos> hrm you're probably right
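
The concern lkcl raises, that a Python name bound to an expression is expanded again at every use while a named Signal is not, can be shown with a toy expression tree. This is a plain-Python stand-in and not nMigen's real AST: `Node`, `Sig`, `emit` and `array_read` are hypothetical names, and whether a real toolchain later merges the duplicates is not modelled.

```python
class Node:
    """Tiny stand-in for an HDL expression tree (not nMigen's real AST)."""
    def __init__(self, op, *operands):
        self.op = op
        self.operands = operands

    def __add__(self, other):
        return Node("add", self, other)


class Sig(Node):
    """A named wire: references are emitted by name, never re-expanded."""
    def __init__(self, name):
        super().__init__("sig")
        self.name = name


def emit(node):
    """Render an expression as text, expanding every non-signal operand in place."""
    if isinstance(node, Sig):
        return node.name
    return f"{node.op}({', '.join(emit(o) for o in node.operands)})"


def array_read(items, index):
    """Stand-in for indexing an array by a signal: returns a mux expression."""
    return Node("mux", index, *items)


args = [Sig("a0"), Sig("a1"), Sig("a2")]
counter, acc = Sig("counter"), Sig("acc")

# Case 1: `sig` is only a Python name bound to an expression.
sig = array_read(args, counter)
inline = "; ".join(emit(e) for e in (acc + sig, sig + acc))

# Case 2: the expression is assigned once to a named signal, which is then reused.
named = Sig("sig")
assign = f"{named.name} = {emit(array_read(args, counter))}"
reused = "; ".join(emit(e) for e in (acc + named, named + acc))
```

In case 1 the mux text is written out once per use; in case 2 it appears only in the single assignment, which mirrors the `sig.eq(args[counter])` fix suggested above.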

14:56 <pepijndevos> Welp, another huge stacktrace... but maybe I can figure this one out... although it doesn't even mention my own code...

14:57 <pepijndevos> AssertionError: Invalid constant <function signed at 0x7f627e783160>

14:57 <pepijndevos> is the bottom line...

14:58 <pepijndevos> I guess it refers to https://github.com/pepijndevos/excellerate/blob/master/excellerate/fixedpoint.py#L80

14:59 <pepijndevos> which is weird... because it's a property, nobody should ever see that as a function... so maybe it's something else...

16:57 emeb has joined #nmigen

18:22 pilmihilmi has joined #nmigen

18:25 jeanthom has quit [Ping timeout: 265 seconds]

18:35 chipmuenk has quit [Quit: chipmuenk]

18:39 thomas has joined #nmigen

18:39 thomas is now known as Guest41271

18:40 Guest41271 has quit [Client Quit]

18:41 Guest41271 has joined #nmigen

18:41 Guest41271 is now known as coldelectrons

18:42 coldelectrons has left #nmigen [#nmigen]

18:42 coldelectrons has joined #nmigen

18:51 <vup> coldelectrons: something like this: https://github.com/nmigen/nmigen/blob/7dfd7fb/examples/board/01_blinky.py ?

18:55 <coldelectrons> vup: maybe, but what I'm trying to do is an actual array, and I'm sure there must be something more elegant than typing out an m.d.comb statement for each led

18:58 <coldelectrons> The sad part is that I managed to do this sometime earlier this year, and I can't find my code or remember what I did

19:03 <vup> the convention is to add multiple leds as separate resources

19:03 <vup> but you can do something like this: https://paste.niemo.de/eboqicitoy.py

19:03 <vup> coldelectrons: ^

19:07 <coldelectrons> vup: thank you!

19:07 <d1b2> <dub_dub_11> there is also an LEDResources resource that is standardised

19:07 chipmuenk has joined #nmigen

19:08 <vup> @dub_dub_11: yes, but you can't assign to all leds in a LEDResources at once

19:08 <vup> as it creates a Resource for each led

19:09 <d1b2> <dub_dub_11> oh really

19:09 <d1b2> <dub_dub_11> all the examples I'd seen create a single array

19:10 <d1b2> <dub_dub_11> like this from Arty A7 *LEDResources(pins="H5 J5 T9 T10", attrs=Attrs(IOSTANDARD="LVCMOS33")),

19:10 <vup> well yes

19:10 <vup> but this creates multiple Resources with the name "led"

19:10 <vup> you can't request them all at once

19:11 <vup> if you want all of them, you have to request them in a loop and build a list of them (and then you can assign to them all at once using `Cat`)

19:11 <vup> like this does: https://github.com/nmigen/nmigen-boards/blob/342b009/nmigen_boards/test/blinky.py

19:11 <d1b2> <dub_dub_11> oh right, I see the loop in blinky yeah
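
The "request them in a loop and build a list" approach vup describes can be sketched against a mock platform, so it runs without nMigen or a board. `FakePlatform` and this `ResourceError` are hypothetical stand-ins written for the example; the mock returns strings where a real platform would return pin objects that could then be combined with `Cat`.

```python
import itertools


class ResourceError(Exception):
    """Stand-in for the error raised when a requested resource does not exist."""


class FakePlatform:
    """Hypothetical mock: exposes n_leds separate resources, all named "led"."""
    def __init__(self, n_leds):
        self.n_leds = n_leds

    def request(self, name, number=0):
        if name != "led" or number >= self.n_leds:
            raise ResourceError(f"resource {name}#{number} does not exist")
        return f"{name}_{number}"


def get_all_resources(platform, name):
    """Request name#0, name#1, ... until the platform has no more of them."""
    resources = []
    for number in itertools.count():
        try:
            resources.append(platform.request(name, number))
        except ResourceError:
            break
    return resources


leds = get_all_resources(FakePlatform(4), "led")
```

Looping until the request fails means the same code works on boards with any number of LEDs, including boards that declare them in several separate resources.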

19:11 <d1b2> <dub_dub_11> on the subject I do have a question about that actually

19:12 <d1b2> <dub_dub_11> I've got a board where the user LEDs are on different banks with different voltages

19:14 <d1b2> <dub_dub_11> to handle that in the platform file so the user can treat it the same as LEDResources, that presumably means I can create multiple Resources named "led" then?

19:15 <vup> why not use multiple `LEDResources`?

19:16 <vup> oh wait that doesn't work

19:17 <vup> that's unfortunate

19:17 jeanthom has joined #nmigen

19:17 <vup> so yes, multiple Resources named "led" seems the way to go

19:18 <coldelectrons> I think the pythonic list comprehension is what I ended up doing before

19:55 <_whitenotifier-4> [nmigen] cestrauss commented on issue #439: fsm_state changes mid cycle - https://git.io/JIWoX

20:04 <awygle> whitequark: am i correct that adding ghdl to yowasp would be prohibitive due to it being written in ada?

20:36 emeb_mac has joined #nmigen

20:39 <_whitenotifier-4> [nmigen] whitequark commented on issue #439: fsm_state changes mid cycle - https://git.io/JIWXD

20:51 <whitequark> awygle: there's no ada compiler that targets wasm

20:51 <whitequark> well, there's gnat-llvm, but until someone proves that can successfully build against wasi, i'm not even going to try to use it

20:52 <whitequark> coldelectrons: indeed, a list comprehension that requests LED resources in a loop is the way to go

20:56 <awygle> yeah that's what i figured

20:56 <awygle> sounds like a fun project, but not something important enough to justify the time

21:00 petitionynd has joined #nmigen

21:07 <whitequark> awygle: have some time to discuss #324?

21:12 petitionynd has quit [K-Lined]

21:12 chipmuenk has quit [Quit: chipmuenk]

22:13 <awygle> whitequark: sure, how can I be useful?

22:36 FFY00 has quit [Remote host closed the connection]

22:42 <whitequark> awygle: so, there are only three significant issues left with cxxrtl

22:42 <whitequark> but they're thorny enough i'm not entirely sure which way of dealing with them is best

22:43 <whitequark> first... let's say #439

22:43 <whitequark> that's an incredibly long thread (don't bother reading it), i'll summarize the problem

22:45 <whitequark> the problem is that for pysim, all signals are effectively `wire`, double-buffered, with an init value

22:46 <whitequark> but for cxxsim (which inherits this behavior from back.rtlil), undriven signals are treated specially: they are turned into inputs

22:46 <whitequark> the cause of #439 is that their init values are lost

22:47 <whitequark> so when you're simulating a part of a design, you don't normally carefully expose all inputs as ports (or at least, it's not something we require right now)

22:48 <whitequark> which means that there's no difference between "input" and "stays at reset value"

22:52 <awygle> i see

22:52 <awygle> and we can't have back.rtlil emit a constant instead of an input because it's also used for synthesis i suppose? although inputs in synthesis should be ports i guess so maybe that's not true

22:53 <whitequark> we can trivially make it emit a constant

22:53 <whitequark> the problem is that then you can't drive/override it from a testbench

22:53 <awygle> oh, sure

22:53 <whitequark> (well, it would not be very nice to make back.rtlil behave differently for synthesis or simulation, but it's not *that* bad)

22:55 <awygle> thinking out loud - emit an sdff for undriven non-port signals? and rely on yosys to do the right thing in synthesis?

22:55 <awygle> or rdff i dunno whatever yosys calls "ff with initial value"

22:55 <whitequark> ff.

22:55 <whitequark> well, dff.

22:56 <whitequark> this would cause issues for both synthesis and simulation

22:56 <whitequark> hm, let me check something

23:00 jeanthom has quit [Ping timeout: 256 seconds]

23:02 <whitequark> awygle: ok yeah, opt_dff will collapse that ff out of existence

23:02 <whitequark> but... can we rely on every proprietary toolchain optimizing out `always @(posedge clk) r <= 1'b0;` ?

23:03 <awygle> if we can't rely on that like.... i dunno what to say

23:03 <awygle> that's the most trivial optimization possible

23:03 <awygle> can you run opt_dff before technology mapping? we could run that before write_verilog, i guess, but that seems fraught

23:04 <whitequark> nope

23:04 <whitequark> doesn't work if there are processes

23:05 <awygle> ah, figures

23:06 <whitequark> so there are some solutions i've been considering

23:08 <whitequark> first, i could slap an init value on every input. like `input reg i = 1'b1;`. which is obviously not valid verilog etc

23:08 <whitequark> this is the workaround i posted in the last comment in #439

23:10 <whitequark> unfortunately it seems like it interacts badly with other stuff i haven't yet looked into, which isn't really surprising

23:13 <whitequark> second, i could always enumerate through every signal at reset, check if it's an undriven input, and set the proper reset value to it

23:13 <whitequark> which is something i've tried hard to avoid, since ffi from python is slow
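
The second option, walking every signal at reset and restoring the reset value of undriven ones, can be modelled in a few lines of plain Python. This is a toy model and not cxxrtl or nMigen code: `Sig` and both functions are hypothetical, and it assumes an input that has lost its init value simply starts at 0.

```python
from dataclasses import dataclass


@dataclass
class Sig:
    name: str
    reset: int = 0
    driven: bool = False   # True if some logic in the design drives this signal


def initial_state_inputs_zeroed(signals):
    """Model the symptom: undriven signals become inputs and lose their init value."""
    return {s.name: (s.reset if s.driven else 0) for s in signals}


def apply_resets_to_undriven(state, signals):
    """Model the workaround: at reset, visit every signal and restore undriven ones."""
    for s in signals:
        if not s.driven:
            state[s.name] = s.reset
    return state


signals = [Sig("en", reset=1), Sig("count", reset=0, driven=True)]
state = initial_state_inputs_zeroed(signals)
fixed = apply_resets_to_undriven(dict(state), signals)
```

The loop touches every signal on every reset, which is the per-signal cost whitequark wants to avoid when each access crosses the Python FFI boundary.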

23:15 <awygle> yeah i see why neither of those is desirable

23:19 <awygle> what does happen if the synthesis toolchain doesn't eat the flop that has a constant value? is it something worse than "one extra flop is used"?

23:27 <whitequark> it prevents further optimizations

23:27 <whitequark> also, don't forget that i can't emit a $dff cell

23:28 <whitequark> it's more complicated because it has to be accessible through introspection

23:28 <awygle> right

23:29 <whitequark> at least, the cell's clock has to be a wire

23:29 <whitequark> and not a wire tied to a constant either

23:29 <awygle> i'm not too fussed about preventing further optimizations tbh, on the theory that any toolchain which can't optimize out a constant flip flop isn't going to do any useful optimizations anyway

23:29 <whitequark> yeah, but like i mentioned, there are other issues here

23:29 <awygle> as for the rest of it i don't have a clear enough picture of the mechanics at that level

23:31 <whitequark> the other issue i have to consider is what happens if e.g. a clock becomes driven by a DFF

23:31 <whitequark> right now this will uh... just completely prevent any logic connected to that clock from firing

23:31 <whitequark> (this is another open issue i was going to raise)

23:32 <awygle> it feels to me like conceptually we're talking about an object which is a flip flop if driven by the testbench and a constant otherwise, which is why i find myself drawn to "let's pawn that determination off on the toolchain"

23:32 <whitequark> almost but not quite

23:32 <whitequark> it is an object that has state in a testbench

23:32 <awygle> the other option that makes sense to me conceptually is to require you to expose any input as a port

23:32 <whitequark> but it is not a real FF since there is no clock

23:33 <awygle> aren't the testbench processes clocked though? at least usually

23:33 <whitequark> in nmigen/cxxrtl "clocked" only applies to netlists

23:33 <whitequark> testbench processes can wait on a trigger

23:34 <awygle> mm, ok

23:34 <whitequark> basically, the undriven signal must stay a "wire" in cxxrtl

23:34 <whitequark> rather than the "value" it becomes now because it is an input

23:34 <whitequark> ... in fact, if it will *not* become an input, it might actually do what i want?

23:35 <whitequark> hold on i have an idea