<lkcl>
cf the (brief) conversation we had a couple months back: we're doing a SIMD-capable processor
<lkcl>
this could be done in an absolutely terrible way: one pipeline for 1x64, one for 2x32, one for 4x16, another for 8x8
nelgau has quit [Ping timeout: 256 seconds]
<lkcl>
turns out that by inserting 7 "partition" bits, one at every byte boundary, and using carry-overflow, you can subdivide a 64-bit adder to do all those combinations
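A rough software model of the partitioned add described above may help. This is only a sketch in plain Python, not the actual hardware or nmigen code: the name `partitioned_add`, its parameters, and the convention that a set partition bit means "split here" are assumptions made for illustration. Each operand is widened by one extra bit per byte boundary, so a single ordinary addition either forwards or swallows the carry at that boundary.

```python
def partitioned_add(a, b, partition_bits, width=64, lane=8):
    """Add two `width`-bit integers, blocking carries at chosen byte boundaries.

    Bit i of `partition_bits` set means: split between byte i and byte i+1.
    (Hypothetical convention for this sketch.)
    """
    n_lanes = width // lane
    mask = (1 << lane) - 1
    exp_a = exp_b = 0
    pos = 0  # current bit position in the widened operands
    for i in range(n_lanes):
        exp_a |= ((a >> (i * lane)) & mask) << pos
        exp_b |= ((b >> (i * lane)) & mask) << pos
        pos += lane
        if i < n_lanes - 1:
            split = (partition_bits >> i) & 1
            if not split:
                # 1 + 0 + carry_in: the extra bit passes the carry upwards.
                exp_a |= 1 << pos
            # When split, 0 + 0 + carry_in absorbs the carry instead.
            pos += 1
    total = exp_a + exp_b  # one plain addition does all lanes at once
    result = 0
    for i in range(n_lanes):
        # Drop the extra bits and repack the bytes.
        result |= ((total >> (i * (lane + 1))) & mask) << (i * lane)
    return result
```

With no partition bits set, `partitioned_add(0xFF, 0x01, 0)` carries into the next byte and gives `0x100`; with all seven set (`0x7F`), the carry is dropped and the result is `0x00`.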
<lkcl>
we then went, "well if add can be dynamically partitioned to do all the different SIMD operations, can shift, greater, less-than be partitioned as well?"
<lkcl>
and the answer turns out to be *yes*
<lkcl>
:)
<lkcl>
the next logical question is, then: given a suite of dynamically-partitionable logical and arithmetic primitives, can you provide nmigen m.If/Elif/Else and m.Switch/Case support on top of those primitives?
<lkcl>
such that, with a partition context (that tells the Signal to subdivide into 1x64, 2x32, 4x16 or 8x8 dynamically)
<lkcl>
you can still do even boolean if/else constructs that *also* subdivide into 1x64, 2x32, 4x16 or 8x8
<lkcl>
thus eliminating completely the need to have 4 utterly separate pipelines with 4x the amount of gates.
nelgau has joined #nmigen
<whitequark>
sure
<lkcl>
i described this to Paul Mackerras, who is implementing microwatt Vectors in VHDL, and he went, "cool, i'm nicking that carry-propagating idea" :)
<lkcl>
gt/lt/ge was hairy. shift was *real* hairy. multiply took around... 8 weeks to develop (!)
nelgau has quit [Ping timeout: 256 seconds]
<lkcl>
m.If/Elif/Else is the last piece of the puzzle and we would not have to do a massive rewrite / code-duplication (yay)
<pepijndevos>
woooow, I tried to convert a design that simulates fine to verilog and got a huuuuuge stacktrace
<pepijndevos>
I have no idea what it means, but it appears to be coming from my least favourite hack: the QArray, which is like an nMigen array for my custom datatype
<lkcl>
pepijndevos: that's not huge :)
<pepijndevos>
...
<lkcl>
and it is no use to provide just the stack trace, you need to show the source code
<lkcl>
i know what the problem is (because i have seen this before): however i cannot point it out to you without the source code
chipmuenk has joined #nmigen
<pepijndevos>
yea... I don't really have a minimal example though.
<lkcl>
is the source closed?
<lkcl>
i don't need a minimal example, just the existing source code will do.
<lkcl>
i know what the problem is because i have made this mistake myself before
<lkcl>
pepijndevos: no, it's an Abstract Syntax Tree fragment that requests that a pmux be created, should that AST fragment be handed to the IL compiler
<lkcl>
you've *named* it sig but it is an expression
<lkcl>
counter is a Signal
<lkcl>
however at line 58, can you see how you perform an addition of acc+sig?
<lkcl>
that is trying to use that copy of the AST-fragment "sig" in the RHS of *another* piece of AST fragment
<lkcl>
so here is the first duplication of that (awful) pmux AST fragment
<lkcl>
now we go on to line 66, see there is a second one?
<pepijndevos>
ah, okay... in your example you suggested the problem is that the index to the array is an expression, but in my case the problem is that the *result* of the index operation is an expression that gets used multiple times?
<lkcl>
so that's two copies of the AST-fragment requesting that the exact same pmux be inserted into the HDL
<lkcl>
correct
<lkcl>
so you can solve this with:
<lkcl>
sig = Signal(....)
<lkcl>
m.d.comb += sig.eq(args[counter])
<lkcl>
that should do the job
<lkcl>
he said
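To make the duplication argument above concrete, here is a toy model in plain Python. It is not nmigen and does not reflect nmigen's real AST classes: `Expr`, `Sig`, `index` and `emit` are hypothetical names invented for this sketch. It only illustrates the point being made, that an expression reused in two places is expanded twice, while a named signal driven once is referenced by name.

```python
class Expr:
    """Node in a toy expression tree (a stand-in, not nmigen's real AST)."""

    def __init__(self, op, *operands):
        self.op = op
        self.operands = operands

    def __add__(self, other):
        return Expr("add", self, other)


class Sig(Expr):
    """Named leaf: using it emits only its name, never a copy of its driver."""

    def __init__(self, name):
        super().__init__("sig")
        self.name = name


def index(array, idx):
    """Model array[idx] as a mux expression over all array elements."""
    return Expr("mux", idx, *array)


def emit(expr):
    """Flatten an expression to text, expanding every sub-expression in place."""
    if isinstance(expr, Sig):
        return expr.name
    return f"{expr.op}({', '.join(emit(o) for o in expr.operands)})"


args = [Sig("a0"), Sig("a1"), Sig("a2")]
counter, acc = Sig("counter"), Sig("acc")

# Named like a signal, but really an expression: each use re-expands the mux.
sig_expr = index(args, counter)
uses = [emit(acc + sig_expr), emit(sig_expr + acc)]
mux_copies_expr = sum(text.count("mux(") for text in uses)

# The fix: drive one named signal from the mux, then reuse only the name.
sig = Sig("sig")
driver = emit(index(args, counter))  # plays the role of sig.eq(args[counter])
uses = [emit(acc + sig), emit(sig + acc)]
mux_copies_sig = sum(text.count("mux(") for text in [driver] + uses)
```

In this model the first approach emits the mux twice and the second emits it once, which mirrors the two copies lkcl points out at lines 58 and 66 of the design under discussion.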
<pepijndevos>
thanks, lemme try...
<lkcl>
you should also consider having the sum acc+sig in a separate Signal so as to not have the HDL circuitry duplicated
<lkcl>
although it is "not a lot of gates", yosys might not be able to detect the duplication at the depth involved in the FSM
<pepijndevos>
hrm you're probably right
<pepijndevos>
Welp, another huge stacktrace... but maybe I can figure this one out... although it doesn't even mention my own code...
<pepijndevos>
AssertionError: Invalid constant <function signed at 0x7f627e783160>
<coldelectrons>
vup: maybe, but what I'm trying to do is an actual array, and I'm sure there must be something more elegant than typing out an m.d.comb statement for each led
<coldelectrons>
The sad part is that I managed to do this sometime earlier this year, and I can't find my code or remember what I did
<vup>
the convention is to add multiple leds as separate resources
<d1b2>
<dub_dub_11> oh right, I see the loop in blinky yeah
<d1b2>
<dub_dub_11> on the subject I do have a question about that actually
<d1b2>
<dub_dub_11> I've got a board where the user LEDs are on different banks with different voltages
<d1b2>
<dub_dub_11> to handle that in the platform file so the user can treat it the same as LEDResources, that presumably means I can create multiple Resources named "led" then?
<vup>
why not use multiple `LEDResources`?
<vup>
oh wait that doesn't work
<vup>
that's unfortunate
jeanthom has joined #nmigen
<vup>
so yes, multiple Resources named "led" seems the way to go
<coldelectrons>
I think the pythonic list comprehension is what I ended up doing before
<whitequark>
awygle: there's no ada compiler that targets wasm
<whitequark>
well, there's gnat-llvm, but until someone proves that can successfully build against wasi, i'm not even going to try to use it
<whitequark>
coldelectrons: indeed, a list comprehension that requests LED resources in a loop is the way to go
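The shape of that list comprehension can be sketched as follows. To keep the snippet runnable without a board definition, `StubPlatform` is a hypothetical stand-in written for this example; it only imitates the `request(name, number)` call shape of a platform object and returns a string where a real platform would return a pin object. The LED count of 4 is likewise an assumption.

```python
class StubPlatform:
    """Hypothetical stand-in for a board platform (illustration only)."""

    def __init__(self, n_leds):
        self.n_leds = n_leds

    def request(self, name, number=0):
        # Only numbered "led" resources exist on this imaginary board.
        if name != "led" or not 0 <= number < self.n_leds:
            raise KeyError(f"no such resource: {name}#{number}")
        return f"{name}_{number}"


platform = StubPlatform(n_leds=4)

# One request per LED, gathered into a list instead of four separate variables.
leds = [platform.request("led", i) for i in range(4)]
```

In an actual design, the objects in `leds` would then be driven inside a loop rather than by one hand-written statement per LED.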
<awygle>
yeah that's what i figured
<awygle>
sounds like a fun project, but not something important enough to justify the time
petitionynd has joined #nmigen
<whitequark>
awygle: have some time to discuss #324?
petitionynd has quit [K-Lined]
chipmuenk has quit [Quit: chipmuenk]
<awygle>
whitequark: sure, how can I be useful?
FFY00 has quit [Remote host closed the connection]
<whitequark>
awygle: so, there are only three significant issues left with cxxrtl
<whitequark>
but they're thorny enough i'm not entirely sure which way of dealing with them is best
<whitequark>
first... let's say #439
<whitequark>
that's an incredibly long thread (don't bother reading it), i'll summarize the problem
<whitequark>
the problem is that for pysim, all signals are effectively `wire`, double-buffered, with an init value
<whitequark>
but for cxxsim (which inherits this behavior from back.rtlil), undriven signals are treated specially: they are turned into inputs
<whitequark>
the cause of #439 is that their init values are lost
<whitequark>
so when you're simulating a part of a design, you don't normally carefully expose all inputs as ports (or at least, it's not something we require right now)
<whitequark>
which means that there's no difference between "input" and "stays at reset value"
<awygle>
i see
<awygle>
and we can't have back.rtlil emit a constant instead of an input because it's also used for synthesis i suppose? although inputs in synthesis should be ports i guess so maybe that's not true
<whitequark>
we can trivially make it emit a constant
<whitequark>
the problem is that then you can't drive/override it from a testbench
<awygle>
oh, sure
<whitequark>
(well, it would not be very nice to make back.rtlil behave differently for synthesis or simulation, but it's not *that* bad)
<awygle>
thinking out loud - emit an sdff for undriven non-port signals? and rely on yosys to do the right thing in synthesis?
<awygle>
or rdff i dunno whatever yosys calls "ff with initial value"
<whitequark>
ff.
<whitequark>
well, dff.
<whitequark>
this would cause issues for both synthesis and simulation
<whitequark>
hm, let me check something
jeanthom has quit [Ping timeout: 256 seconds]
<whitequark>
awygle: ok yeah, opt_dff will collapse that ff out of existence
<whitequark>
but... can we rely on every proprietary toolchain optimizing out `always @(posedge clk) r <= 1'b0;`?
<awygle>
if we can't rely on that like.... i dunno what to say
<awygle>
that's the most trivial optimization possible
<awygle>
can you run opt_dff before technology mapping? we could run that before write_verilog, i guess, but that seems fraught
<whitequark>
nope
<whitequark>
doesn't work if there are processes
<awygle>
ah, figures
<whitequark>
so there are some solutions i've been considering
<whitequark>
first, i could slap an init value on every input. like `input reg i = 1'b1;`. which is obviously not valid verilog etc
<whitequark>
this is the workaround i posted in the last comment in #439
<whitequark>
unfortunately it seems like it interacts badly with other stuff i haven't yet looked into, which isn't really surprising
<whitequark>
second, i could always enumerate through every signal at reset, check if it's an undriven input, and set the proper reset value to it
<whitequark>
which is something i've tried hard to avoid, since ffi from python is slow
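The second option can be pictured with a small plain-Python model. Nothing here is cxxrtl or nmigen API: `SigInfo`, `reset_undriven_inputs` and the `state` dictionary are hypothetical names used only to show the idea of walking every signal at reset and restoring the init value of the undriven ones. In the real simulator each such write would cross the Python FFI boundary, which is the cost being objected to.

```python
from dataclasses import dataclass


@dataclass
class SigInfo:
    name: str
    init: int     # init/reset value declared in the design
    driven: bool  # True if some logic in the design drives this signal


def reset_undriven_inputs(signals, state):
    """Write each undriven signal's init value into the simulator state."""
    for s in signals:
        if not s.driven:
            state[s.name] = s.init  # one write per undriven signal
    return state


signals = [
    SigInfo("en", init=1, driven=False),
    SigInfo("count", init=0, driven=True),
    SigInfo("mode", init=3, driven=False),
]

# Everything starts at 0 here, modelling inputs whose init values were lost.
state = {s.name: 0 for s in signals}
reset_undriven_inputs(signals, state)
```

After the call, `en` and `mode` hold their declared init values again, while the driven signal `count` is left alone.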
<awygle>
yeah i see why neither of those is desirable
<awygle>
what does happen if the synthesis toolchain doesn't eat the flop that has a constant value? is it something worse than "one extra flop is used"?
<whitequark>
it prevents further optimizations
<whitequark>
also, don't forget that i can't emit a $dff cell
<whitequark>
it's more complicated because it has to be accessible through introspection
<awygle>
right
<whitequark>
at least, the cell's clock has to be a wire
<whitequark>
and not a wire tied to a constant either
<awygle>
i'm not too fussed about preventing further optimizations tbh, on the theory that any toolchain which can't optimize out a constant flip flop isn't going to do any useful optimizations anyway
<whitequark>
yeah, but like i mentioned, there are other issues here
<awygle>
as for the rest of it i don't have a clear enough picture of the mechanics at that level
<whitequark>
the other issue i have to consider is what happens if e.g. a clock becomes driven by a DFF
<whitequark>
right now this will uh... just completely prevent any logic connected to that clock from firing
<whitequark>
(this is another open issue i was going to raise)
<awygle>
it feels to me like conceptually we're talking about an object which is a flip flop if driven by the testbench and a constant otherwise, which is why i find myself drawn to "let's pawn that determination off on the toolchain"
<whitequark>
almost but not quite
<whitequark>
it is an object that has state in a testbench
<awygle>
the other option that makes sense to me conceptually is to require you to expose any input as a port
<whitequark>
but it is not a real FF since there is no clock
<awygle>
aren't the testbench processes clocked though? at least usually
<whitequark>
in nmigen/cxxrtl "clocked" only applies to netlists
<whitequark>
testbench processes can wait on a trigger
<awygle>
mm, ok
<whitequark>
basically, the undriven signal must stay a "wire" in cxxrtl
<whitequark>
rather than the "value" it now becomes because it is an input
<whitequark>
... in fact, if it will *not* become an input, it might actually do what i want?