ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at https://github.com/nmigen · logs at https://freenode.irclog.whitequark.org/nmigen · IRC meetings each Monday at 1800 UTC · next meeting November 23rd
nelgau has joined #nmigen
nelgau has quit [Ping timeout: 272 seconds]
<lkcl> whitequark: if we needed to do a parallel (SIMD) version of Module, having all of the with m.If/Else capability
<lkcl> here would be where the AST to do so would be needed, is that right?
nelgau has joined #nmigen
<whitequark> needs more context
<lkcl> cf the (brief) conversation we had a couple months back: we're doing a SIMD-capable processor
<lkcl> this could be done in an absolutely terrible way: one pipeline for 1x64, one for 2x32, one for 4x16, another for 8x8
nelgau has quit [Ping timeout: 256 seconds]
<lkcl> turns out that by inserting 7 "partition" bits (one at every byte boundary) and using carry-overflow you can subdivide a 64-bit adder to do all those combinations
<lkcl> we then went, "well if add can be dynamically partitioned to do all the different SIMD operations, can shift, greater, less-than be partitioned as well?"
<lkcl> and the answer turns out to be *yes*
<lkcl> :)
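A minimal sketch of the carry trick lkcl describes, written as a plain-Python bit-level model rather than nMigen. The function name `partitioned_add`, the `breaks` list, and the choice to pad operand `a` with a 1 wherever the carry should pass through are assumptions made for illustration, not the project's actual code.

```python
def partitioned_add(a, b, breaks, width=64, lane=8):
    """Add a and b as SIMD lanes using one wide adder (illustrative model).

    breaks[i] is True when a lane boundary sits just above byte i.
    One padding bit is inserted above every byte except the last:
      * boundary closed (break):  pad bits are 0 and 0, so an incoming
        carry lands in the pad bit and stops there;
      * boundary open (no break): pad bits are 1 and 0, so an incoming
        carry flips the pad bit to 0 and ripples on into the next byte.
    """
    lanes = width // lane
    assert len(breaks) == lanes - 1
    mask = (1 << lane) - 1

    # Build the widened operands, one byte at a time.
    wide_a = wide_b = 0
    pos = 0
    for i in range(lanes):
        wide_a |= ((a >> (i * lane)) & mask) << pos
        wide_b |= ((b >> (i * lane)) & mask) << pos
        pos += lane
        if i < lanes - 1:
            if not breaks[i]:
                wide_a |= 1 << pos  # open boundary: let the carry through
            pos += 1

    # A single ordinary addition does the work for every partitioning.
    total = wide_a + wide_b

    # Drop the padding bits again to recover the packed result.
    result = 0
    pos = 0
    for i in range(lanes):
        result |= ((total >> pos) & mask) << (i * lane)
        pos += lane + 1
    return result


one_by_64 = [False] * 7
eight_by_8 = [True] * 7
four_by_16 = [False, True, False, True, False, True, False]
```

With `one_by_64` the carry out of the low byte reaches the next byte, with `eight_by_8` it is discarded at the first boundary, and `four_by_16` lets it cross only the odd byte boundaries.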
<lkcl> the next logical question is, then: given a suite of dynamically-partitionable logical and arithmetic primitives, can you provide nmigen m.If/Elif/Else and m.Switch/Case support on top of those primitives?
<lkcl> such that, with a partition context (that tells the Signal to subdivide into 1x64, 2x32, 4x16 or 8x8 dynamically)
<lkcl> you can still do even boolean if/else constructs that *also* subdivide into 1x64, 2x32, 4x16 or 8x8
<lkcl> thus eliminating completely the need to have 4 utterly separate pipelines with 4x the amount of gates.
nelgau has joined #nmigen
<whitequark> sure
<lkcl> i described this to Paul Mackerras, who is implementing microwatt Vectors in VHDL, and he went, "cool, i'm nicking that carry-propagating idea" :)
<lkcl> gt/lt/ge was hairy. shift was *real* hairy. multiply took around... 8 weeks to develop (!)
nelgau has quit [Ping timeout: 256 seconds]
<lkcl> m.If/Elif/Else is the last piece of the puzzle and we would not have to do massive rewrite / code-duplication (yay)
Degi_ has joined #nmigen
Degi has quit [Ping timeout: 240 seconds]
Degi_ is now known as Degi
nelgau has joined #nmigen
nelgau has quit [Ping timeout: 256 seconds]
nelgau has joined #nmigen
nelgau has quit [Ping timeout: 246 seconds]
lkcl has quit [Ping timeout: 256 seconds]
nelgau has joined #nmigen
electronic_eel has quit [Ping timeout: 260 seconds]
lkcl has joined #nmigen
electronic_eel has joined #nmigen
nelgau has quit [Ping timeout: 260 seconds]
lkcl has quit [Ping timeout: 272 seconds]
lkcl has joined #nmigen
nelgau has joined #nmigen
nelgau has quit [Ping timeout: 260 seconds]
PyroPeter_ has joined #nmigen
PyroPeter has quit [Ping timeout: 265 seconds]
PyroPeter_ is now known as PyroPeter
nelgau has joined #nmigen
nelgau has quit [Ping timeout: 260 seconds]
lkcl has quit [Ping timeout: 272 seconds]
nelgau has joined #nmigen
lkcl has joined #nmigen
nelgau has quit [Ping timeout: 240 seconds]
nelgau has joined #nmigen
nelgau has quit [Ping timeout: 272 seconds]
nelgau has joined #nmigen
lkcl has quit [Ping timeout: 260 seconds]
nelgau has quit [Ping timeout: 272 seconds]
lkcl has joined #nmigen
lkcl has quit [Ping timeout: 260 seconds]
lkcl has joined #nmigen
nelgau has joined #nmigen
nelgau has quit [Ping timeout: 272 seconds]
nelgau has joined #nmigen
nelgau has quit [Ping timeout: 272 seconds]
lkcl has quit [Ping timeout: 240 seconds]
lkcl has joined #nmigen
nelgau has joined #nmigen
emeb_mac has quit [Quit: Leaving.]
nelgau has quit [Ping timeout: 240 seconds]
nelgau has joined #nmigen
nelgau has quit [Ping timeout: 260 seconds]
jeanthom has joined #nmigen
emily has quit [Ping timeout: 244 seconds]
cesar[m] has quit [Ping timeout: 240 seconds]
gkelly has quit [Ping timeout: 268 seconds]
jfng has quit [Ping timeout: 240 seconds]
cesar[m] has joined #nmigen
jfng has joined #nmigen
emily has joined #nmigen
gkelly has joined #nmigen
nelgau has joined #nmigen
<pepijndevos> woooow, I tried to convert a design that simulates fine to verilog and got a huuuuuge stacktrace
<pepijndevos> https://bpa.st/3AVQ
<pepijndevos> I have no idea what it means, but it appears to be coming from my least favourite hack: the QArray, which is like an nMigen array for my custom datatype
<lkcl> pepijndevos: that's not huge :)
<pepijndevos> ...
<lkcl> and it is no use to provide just the stack trace, you need to show the source code
<lkcl> i know what the problem is (because i have seen this before): however i cannot point it out to you without the source code
chipmuenk has joined #nmigen
<pepijndevos> yea... I don't really have a minimal example though.
<lkcl> is the source closed?
<lkcl> i don't need a minimal example, just the existing source code will do.
<lkcl> i know what the problem is because i have made this mistake myself before
<lkcl> you are using a temporary intermediate expression to access an Array
<pepijndevos> https://github.com/pepijndevos/excellerate/blob/master/excellerate/fixedpoint.py#L169 this is where one line in the stacktrace pointed to iirc
<lkcl> where what you needed to do is: assign the thing "index" to a Signal
<lkcl> then use that Signal via __getitem__
<lkcl> whereas
<lkcl> you have fallen into the common mistake of forgetting that a python variable which contains a fragment of Abstract Syntax Tree
<lkcl> is not equal to
<lkcl> an actual Signal which has a NETLIST associated with it
<lkcl> let's see if the stacktrace contains the location where you're calling that...
<lkcl> no, unfortunately it doesn't.
<lkcl> so, _somewhere_ in your code, you have this:
<lkcl> x = QArray(...)
<lkcl> y = Signal(...)
<lkcl> expression = y[2:3] <<<--- here is the problem
<lkcl> z = x[expression] <<<--- QArray indexed by an expression
<pepijndevos> hmmmm
<lkcl> you want this:
<lkcl> index = Signal(...)
<lkcl> m.d.comb += index.eq(expression)
<lkcl> followed by
<lkcl> z = x[index] <<--- this is safe because QArray.__getitem__ is called with a Signal *NOT* a piece of Abstract Syntax Tree
<pepijndevos> I see... now... I need to find where this is happening...
<lkcl> trying to do this results in abbbbsoltuuutely f*****g awwwwfull HDL
<lkcl> as in: if it succeeded (was compiled correctly), the pmux that was generated would have that expression copied into it *multiple times*!
<lkcl> in every single place where you used that pmux!
<lkcl> and if that pmux is then *also* assigned to a piece of AST, that also gets copied multiple times into the output HDL!
<lkcl> can you imagine what that would look like, at the gate level?
<pepijndevos> awful
<lkcl> hundreds to thousands of gates!
<lkcl> and the chances of yosys spotting it, and optimising them out? slim-to-negligible
<lkcl> bottom line is, the abstraction that nmigen offers, you really do have to be very careful
<pepijndevos> Problem is... the only place I can find where I'm indexing an array is in https://github.com/pepijndevos/excellerate/blob/master/excellerate/functions.py#L57 and that's definitely a signal
<lkcl> pepijndevos: no, it's a piece of Abstract Syntax Tree fragment that requests that a pmux be created, should that AST fragment be handed to the IL compiler
<lkcl> you've *named* it sig but it is an expression
<lkcl> counter is a Signal
<lkcl> however at line 58, can you see how you perform an addition of acc+sig?
<lkcl> that is trying to use that copy of the AST-fragment "sig" in the RHS of *another* piece of AST fragment
<lkcl> so here is the first duplication of that (awful) pmux AST fragment
<lkcl> now we go on to line 66, see there is a second one?
<pepijndevos> ah, okay... in your example you suggested the problem is that the index to the array is an expression, but in my case the problem is that the *result* of the index operation is an expression that gets used multiple times?
<lkcl> so that's two copies of the AST-fragment requesting that the exact same pmux be inserted into the HDL
<lkcl> correct
<lkcl> so you can solve this with:
<lkcl> sig = Signal(...)
<lkcl> m.d.comb += sig.eq(args[counter])
<lkcl> that should do the job
<lkcl> he said
<pepijndevos> thanks, lemme try...
<lkcl> you should also consider having the sum acc+sig in a separate Signal so as to not have the HDL circuitry duplicated
<lkcl> although it is "not a lot of gates", yosys might not be able to detect the duplication at the depth involved in the FSM
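The point lkcl is making about reused expressions can be illustrated with a toy model in plain Python. Nothing here is nMigen: the `Expr`, `Sig`, `BinOp` and `Select` classes and the `emit_all` helper are hypothetical stand-ins that only print an expression tree as text, and a real backend or yosys may well share or merge the duplicated logic.

```python
class Expr:
    """Toy expression node; only models how an expression tree prints."""

    def __add__(self, other):
        return BinOp("+", self, other)

    def __sub__(self, other):
        return BinOp("-", self, other)


class Sig(Expr):
    """A named wire: printing it yields just the name."""

    def __init__(self, name):
        self.name = name

    def emit(self):
        return self.name


class BinOp(Expr):
    def __init__(self, op, lhs, rhs):
        self.op, self.lhs, self.rhs = op, lhs, rhs

    def emit(self):
        return f"({self.lhs.emit()} {self.op} {self.rhs.emit()})"


class Select(Expr):
    """Stand-in for indexing an array by a signal (a mux)."""

    def __init__(self, items, index):
        self.items, self.index = items, index

    def emit(self):
        choices = ", ".join(item.emit() for item in self.items)
        return f"mux({self.index.emit()}: {choices})"


def emit_all(assignments):
    """Print each (target, value) pair as one line of text."""
    return "\n".join(f"{t.emit()} = {v.emit()}" for t, v in assignments)


args = [Sig("a0"), Sig("a1"), Sig("a2")]
counter, acc = Sig("counter"), Sig("acc")
out1, out2 = Sig("out1"), Sig("out2")

# The Python variable holds a tree, so every use prints the whole mux again.
picked = Select(args, counter)
direct = emit_all([(out1, acc + picked), (out2, acc - picked)])

# Assigning the tree to a named wire first means later uses print only the name.
sig = Sig("sig")
named = emit_all([(sig, picked), (out1, acc + sig), (out2, acc - sig)])
```

In this model `direct` contains the mux text twice and `named` contains it once, which is the textual duplication being described; whether it survives synthesis is a separate question.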
<pepijndevos> hrm you're probably right
<pepijndevos> Welp, another huge stacktrace... but maybe I can figure this one out... although it doesn't even mention my own code...
<pepijndevos> AssertionError: Invalid constant <function signed at 0x7f627e783160>
<pepijndevos> is the bottom line...
<pepijndevos> which is weird... because it's a property, nobody should ever see that as a function... so maybe it's something else...
emeb has joined #nmigen
pilmihilmi has joined #nmigen
jeanthom has quit [Ping timeout: 265 seconds]
chipmuenk has quit [Quit: chipmuenk]
thomas has joined #nmigen
thomas is now known as Guest41271
Guest41271 has quit [Client Quit]
Guest41271 has joined #nmigen
Guest41271 is now known as coldelectrons
coldelectrons has left #nmigen [#nmigen]
coldelectrons has joined #nmigen
<coldelectrons> vup: maybe, but what I'm trying to do is an actual array, and I'm sure there must be something more elegant than typing out an m.d.comb statement for each led
<coldelectrons> The sad part is that I managed to do this sometime earlier this year, and I can't find my code or remember what I did
<vup> the convention is to add multiple leds as separate resources
<vup> but you can do something like this: https://paste.niemo.de/eboqicitoy.py
<vup> coldelectrons: ^
<coldelectrons> vup: thank you!
<d1b2> <dub_dub_11> there is also an LEDResources resource that is standardised
chipmuenk has joined #nmigen
<vup> @dub_dub_11: yes, but you can't assign to all leds in a LEDResources at once
<vup> as it creates a Resources for each led
<d1b2> <dub_dub_11> oh really
<d1b2> <dub_dub_11> all the examples I'd seen create a single array
<d1b2> <dub_dub_11> like this from Arty A7 *LEDResources(pins="H5 J5 T9 T10", attrs=Attrs(IOSTANDARD="LVCMOS33")),
<vup> well yes
<vup> but this creates multiple Resources with the name "led"
<vup> you can't request them all at once
<vup> if you want all of them, you have to request them in a loop and build a list of them (and then you can assign to them all at once using `Cat`)
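A sketch of the "request them in a loop and build a list" pattern vup describes. To keep it runnable without hardware or nMigen installed, `FakePlatform` is a hypothetical stand-in that only mimics `request(name, number)`, and `request_all` is an illustrative helper name. On a real platform the objects returned would be pins to combine with `Cat`, and the exception to catch would be nMigen's own resource error rather than `LookupError`.

```python
class FakePlatform:
    """Stand-in for a board platform; only mimics request(name, number)."""

    def __init__(self, led_count):
        self._led_count = led_count

    def request(self, name, number=0):
        # Assumption: a missing resource is reported by raising an exception.
        if name != "led" or number >= self._led_count:
            raise LookupError(f"resource {name}#{number} does not exist")
        return f"{name}_{number}"


def request_all(platform, name, error=LookupError):
    """Request name#0, name#1, ... until the platform runs out of them."""
    found = []
    number = 0
    while True:
        try:
            found.append(platform.request(name, number))
        except error:
            return found
        number += 1


leds = request_all(FakePlatform(led_count=4), "led")
```

When the number of LEDs is known in advance, the same list can be built with a comprehension such as `[platform.request("led", i) for i in range(4)]`.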
<d1b2> <dub_dub_11> oh right, I see the loop in blinky yeah
<d1b2> <dub_dub_11> on the subject I do have a question about that actually
<d1b2> <dub_dub_11> I've got a board where the user LEDs are on different banks with different voltages
<d1b2> <dub_dub_11> to handle that in the platform file so the user can treat it the same as LEDResources, that presumably means I can create multiple Resources named "led" then?
<vup> why not use multiple `LEDResources`?
<vup> oh wait that doesn't work
<vup> that's unfortunate
jeanthom has joined #nmigen
<vup> so yes, multiple Resources named "led" seems the way to go
<coldelectrons> I think the pythonic list comprehension is what I ended up doing before
<_whitenotifier-4> [nmigen] cestrauss commented on issue #439: fsm_state changes mid cycle - https://git.io/JIWoX
<awygle> whitequark: am i correct that adding ghdl to yowasp would be prohibitive due to it being written in ada?
emeb_mac has joined #nmigen
<_whitenotifier-4> [nmigen] whitequark commented on issue #439: fsm_state changes mid cycle - https://git.io/JIWXD
<whitequark> awygle: there's no ada compiler that targets wasm
<whitequark> well, there's gnat-llvm, but until someone proves that can successfully build against wasi, i'm not even going to try to use it
<whitequark> coldelectrons: indeed, a list comprehension that requests LED resources in a loop is the way to go
<awygle> yeah that's what i figured
<awygle> sounds like a fun project, but not something important enough to justify the time
petitionynd has joined #nmigen
<whitequark> awygle: have some time to discuss #324?
petitionynd has quit [K-Lined]
chipmuenk has quit [Quit: chipmuenk]
<awygle> whitequark: sure, how can I be useful?
FFY00 has quit [Remote host closed the connection]
<whitequark> awygle: so, there are only three significant issues left with cxxrtl
<whitequark> but they're thorny enough i'm not entirely sure which way of dealing with them is best
<whitequark> first... let's say #439
<whitequark> that's an incredibly long thread (don't bother reading it), i'll summarize the problem
<whitequark> the problem is that for pysim, all signals are effectively `wire`, double-buffered, with an init value
<whitequark> but for cxxsim (which inherits this behavior from back.rtlil), undriven signals are treated specially: they are turned into inputs
<whitequark> the cause of #439 is that their init values are lost
<whitequark> so when you're simulating a part of a design, you don't normally carefully expose all inputs as ports (or at least, it's not something we require right now)
<whitequark> which means that there's no difference between "input" and "stays at reset value"
<awygle> i see
<awygle> and we can't have back.rtlil emit a constant instead of an input because it's also used for synthesis i suppose? although inputs in synthesis should be ports i guess so maybe that's not true
<whitequark> we can trivially make it emit a constant
<whitequark> the problem is that then you can't drive/override it from a testbench
<awygle> oh, sure
<whitequark> (well, it would not be very nice to make back.rtlil behave differently for synthesis or simulation, but it's not *that* bad)
<awygle> thinking out loud - emit an sdff for undriven non-port signals? and rely on yosys to do the right thing in synthesis?
<awygle> or rdff i dunno whatever yosys calls "ff with initial value"
<whitequark> ff.
<whitequark> well, dff.
<whitequark> this would cause issues for both synthesis and simulation
<whitequark> hm, let me check something
jeanthom has quit [Ping timeout: 256 seconds]
<whitequark> awygle: ok yeah, opt_dff will collapse that ff out of existence
<whitequark> but... can we rely on every proprietary toolchain to optimize out `always @(posedge clk) r <= 1'b0;` ?
<awygle> if we can't rely on that like.... i dunno what to say
<awygle> that's the most trivial optimization possible
<awygle> can you run opt_dff before technology mapping? we could run that before write_verilog, i guess, but that seems fraught
<whitequark> nope
<whitequark> doesn't work if there are processes
<awygle> ah, figures
<whitequark> so there are some solutions i've been considering
<whitequark> first, i could slap an init value on every input. like `input reg i = 1'b1;`. which is obviously not valid verilog etc
<whitequark> this is the workaround i posted in the last comment in #439
<whitequark> unfortunately it seems like it interacts badly with other stuff i haven't yet looked into, which isn't really surprising
<whitequark> second, i could always enumerate through every signal at reset, check if it's an undriven input, and set the proper reset value to it
<whitequark> which is something i've tried hard to avoid, since ffi from python is slow
<awygle> yeah i see why neither of those is desirable
<awygle> what does happen if the synthesis toolchain doesn't eat the flop that has a constant value? is it something worse than "one extra flop is used"?
<whitequark> it prevents further optimizations
<whitequark> also, don't forget that i can't emit a $dff cell
<whitequark> it's more complicated because it has to be accessible through introspection
<awygle> right
<whitequark> at least, the cell's clock has to be a wire
<whitequark> and not a wire tied to a constant either
<awygle> i'm not too fussed about preventing further optimizations tbh, on the theory that any toolchain which can't optimize out a constant flip flop isn't going to do any useful optimizations anyway
<whitequark> yeah, but like i mentioned, there are other issues here
<awygle> as for the rest of it i don't have a clear enough picture of the mechanics at that level
<whitequark> the other issue i have to consider is what happens if e.g. a clock becomes driven by a DFF
<whitequark> right now this will uh... just completely prevent any logic connected to that clock from firing
<whitequark> (this is another open issue i was going to raise)
<awygle> it feels to me like conceptually we're talking about an object which is a flip flop if driven by the testbench and a constant otherwise, which is why i find myself drawn to "let's pawn that determination off on the toolchain"
<whitequark> almost but not quite
<whitequark> it is an object that has state in a testbench
<awygle> the other option that makes sense to me conceptually is to require you to expose any input as a port
<whitequark> but it is not a real FF since there is no clock
<awygle> aren't the testbench processes clocked though? at least usually
<whitequark> in nmigen/cxxrtl "clocked" only applies to netlists
<whitequark> testbench processes can wait on a trigger
<awygle> mm, ok
<whitequark> basically, the undriven signal must stay a "wire" in cxxrtl
<whitequark> rather than the "value" it becomes now because it is an input
<whitequark> ... in fact, if it will *not* become an input, it might actually do what i want?
<whitequark> hold on i have an idea