ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at · logs at
<agg> is it possible to get an ice40 SB_IO in "PIN_OUTPUT_REGISTERED_ENABLE" mode (i.e., enable is *not* registered) without just instantiating the SB_IO myself and putting dir="-"?
<agg> as far as I can see from vendor/lattice_ice40 it's not an option, all dir=io xdr=1 will have registered enable
<zignig> I think I have worked it out: using main generate, rather than platform build, seems to work.
<zignig> at least it makes some .cpp now.
<zignig> thanks anyway agg. ;)
<agg> sorry, i was asking my own question :X
<zignig> agg: erk, sorry.
<zignig> agg: so you want a bidirectional pin that you can select input / output ?
<agg> my bus is contending because my OE is deasserted in good time but the extra flop means a half period of sadness
<agg> I have dir="io" xdr=1, which generates an SB_IO with output mode PIN_OUTPUT_REGISTERED_ENABLE_REGISTERED
<agg> registering the output is fine, but i don't want the enable registered
<agg> the hardware supports it, but I don't think nmigen has a way to emit it
<agg> maybe I shouldn't use xdr=1 at all for this bus... hmm....
<zignig> try without it. how are you swapping your output mode, sync or comb?
<agg> oe is driven comb, which is why it toggles so soon after the clock rising edge
<agg> but there's a flop inside the SB_IO instance which is enabled by the platform code when xdr=1, so the output isn't disabled for an extra clock period
<agg> I wanted xdr=1 because this bus is externally clocked and that way you can directly feed the input clock to the flop inside the SB_IO, ensuring it gets sampled right on the clock edge
<agg> (I guess ideally I want to specify input and output xdr differently :P)
<zignig> agg: is that even possible ?
<zignig> can you use xdr=2 and navigate the doubling in code ?
<agg> xdr=2 also registers the enable
<agg> it's possible in that if you set a parameter bit differently in the SB_IO instance it will do what I want; it's a supported mode in the datasheet; just I don't think nmigen has a way to do it with the built in platform
<_whitenotifier-f> [nmigen] shawnanastasio opened pull request #394: hdl.rec: don't save casted shapes in Layout constructor -
<zignig> agg: you could use an Instance and hard declare _that_ pin with the correct settings
<zignig> but that takes you back to your original question ....
<agg> yea, i'm happy to do that if need be, might just be an unusual use case
<agg> going to stare at my timing diagrams harder to see if i can tell that OE needs to die one cycle earlier than currently, which would also fix it
<zignig> agg: staring at timing diagrams to find ZEN huh ? ;P
<agg> in theory digital logic seems so simple: everything happens on the clock, time is discrete, everyone's happy
<agg> but it all breaks down so quickly
<zignig> and then physics gets in the way ! so rude.
<agg> so rude!
<agg> even just the logic of a d flop sampling on the clock and holding that sampled value for the rest of the clock period... but a second series d-flop would sample its old value, i.e. the clock-to-out time is always larger than the input hold time, so all the transitions on my diagram are really little deltas away from the clocks
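agg's observation above can be sketched with a toy, zero-delay discrete-time model (plain Python, all names hypothetical): because clock-to-out is always longer than input hold time, a second flop in series always samples the first flop's *pre-edge* value, so q2 lags q1 by exactly one cycle.

```python
# Toy model of two cascaded D flip-flops sampling on the same edge.
# Because clock-to-out always exceeds input hold time, flop 2 captures
# the value flop 1 was holding *before* this edge, never the new one.
def clock_edge(state, d):
    q1, q2 = state
    # Both flops sample simultaneously: flop 2 sees the old q1.
    return (d, q1)

state = (0, 0)
outs = []
for d in [1, 0, 1, 1]:
    state = clock_edge(state, d)
    outs.append(state)
# q2 trails q1 by one full cycle: [(1, 0), (0, 1), (1, 0), (1, 1)]
```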
<zignig> just set your clock speed to zero Hz, problem solved.
<_whitenotifier-f> [nmigen] codecov[bot] commented on pull request #394: hdl.rec: don't save casted shapes in Layout constructor -
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #394: hdl.rec: don't save casted shapes in Layout constructor -
<awygle> "PIN_OUTPUT_REGISTERED_ENABLE_REGISTERED" wow, best constant ever
<agg> they do it with "_INVERTED" at the end too
<agg> (though I don't think nmigen ever uses the output inverter)
<_whitenotifier-f> [nmigen] shawnanastasio commented on pull request #394: hdl.rec: don't save casted shapes in Layout constructor -
<agg> hmm, I have to be explicit about using a global input, right? just realised I've not set my bus clock to global in, nextpnr says the net is being promoted to use a gbuf but presumably it's not using the pin to drive it then
<_whitenotifier-f> [nmigen] shawnanastasio synchronize pull request #394: hdl.rec: don't save casted shapes in Layout constructor -
<_whitenotifier-f> [nmigen] codecov[bot] edited a comment on pull request #394: hdl.rec: don't save casted shapes in Layout constructor -
_whitelogger has joined #nmigen
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
Degi has quit [Ping timeout: 246 seconds]
Degi has joined #nmigen
_whitelogger has joined #nmigen
<_whitenotifier-f> [nmigen] whitequark reviewed pull request #394 commit -
<_whitenotifier-f> [nmigen] whitequark reviewed pull request #394 commit -
<whitequark> Degi: try `extref1.attrs["LOC"] = "EXTREF1"`
<whitequark> awygle: a value stashed in the platform yes
<whitequark> awygle: a probe is a value stashed in the platform but not only that; at least it should have an associated clock domain, possibly other things too
<whitequark> agg: no, doesn't provide registered output + unregistered enable, the same way it doesn't let you make e.g. output at xdr=1 and input at xdr=2 or something like that
<whitequark> and i think it's going to stay that way absent a very good reason why it shouldn't. it's geared towards source synchronous buses with dir="-" provided as a fallback for anything weird you might want to do
<whitequark> i'm open to considering reasons why it should be changed though
<whitequark> the problem with registered output, unregistered enable is that clock-to-out timings are well defined but clock-to-oe are just whatever
<whitequark> so i don't feel like nmigen should provide a shortcut for this problematic mode of operation
<whitequark> since it doesn't prevent you from doing it yourself
<whitequark> i also think not all of our platforms support it, though i might be wrong about that
<whitequark> but regardless, platform support is a secondary consideration here
<whitequark> agg: regarding global inputs... nmigen provides a fake "GLOBAL" attribute in the platform abstraction that causes it to use SB_GB_IO, since SB_GB_IO works very unlike SB_IO+SB_GB
<whitequark> other than that it doesn't do anything for you
<whitequark> awygle: regarding UserValue, that's a good point, yeah. can you file an issue on that so we can consider it?
<awygle> whitequark: sure
<awygle> It came up in the context of considering what Values didn't have DUIDs
<whitequark> awygle: i think i'm going to get the glasgow ELA done today to free you up for ILA work
<whitequark> ugh, DUIDs are a hack
<whitequark> please don't think too much about them
<awygle> Oh thanks, I already moved back to it but the ELA would still be hugely useful
<whitequark> they're just there to get consistent ordering and hashing between python runs
<whitequark> it's ... taking you a lot of time and i feel like i could make your life easier by just doing it
<awygle> lol
<awygle> delicately put :P
<awygle> i have spent exactly 2.5 hours on it
<awygle> but yeah i decided to move back to the ILA stuff already
<awygle> got a fair bit of playing around the design space done today all things considered, hope to have something to show tomorrow
<awygle> been trying to capture design decision points as i come across them, lay them out, and record why i picked what i did
<whitequark> cool!
<whitequark> my approach for this would be to bring in the smallest possible chunks of functionality into core but make sure it works well
<whitequark> not unlike what rust does
<whitequark> we can't do exactly like rust because rust has cargo and python has... sadness
<awygle> i am fully prepared for you to hate everything about what i've done but it should at least serve to center discussion :p
<awygle> lmao yes
<whitequark> but there's still a lot of value in cribbing their approach because nmigen tries to provide actual backwards compatibility
<awygle> yup understood. minimum core disruption.
<whitequark> so i think *eventually* it would be extremely valuable to provide a turnkey ILA shipped with nmigen but i think it's more important to have a common API in core that we won't regret
<whitequark> just my general approach
<awygle> totally agree
<awygle> that said i'm still prototyping the whole thing because i think it's important to see how the API would be used
<awygle> and also i uh... want an ILA i can use >_>
<awygle> so i'll have the core changes and then also an example of using them
<whitequark> yeah
<whitequark> that's a good approach
<awygle> mk well if i'm gonna have results tomorrow i better get to sleep
<awygle> night wq
<whitequark> night!
_whitelogger has joined #nmigen
thinknok has joined #nmigen
Guest30583 has joined #nmigen
Asu has joined #nmigen
_whitelogger has joined #nmigen
<zignig> whitequark: is there an easy(ish) way to get nmigen -> cxxrtl name mapping ?
<whitequark> not yet, i'm working on it right now
<zignig> I'm considering grepping and munging ::commit() for now.
<zignig> thanks anyway.
<zignig> oh, and carry on !
<whitequark> you can get the mapping from back.rtlil and join the hierarchy with dots
<whitequark> and then you can map it to C++ names by looking at the algorithm in the pass
<whitequark> but it's not ideal
Asu has quit [Remote host closed the connection]
Asu has joined #nmigen
Guest30583 has quit [Quit: Nettalk6 -]
<whitequark> agg: re nmigen not using the output inverter, the reason for that is it's only available in one single mode
<_whitenotifier-f> [nmigen/nmigen] whitequark pushed 1 commit to master [+0/-0/±1]
<_whitenotifier-f> [nmigen/nmigen] whitequark afa4345 - vendor.lattice_ice40: reword confusing comment. NFC.
<agg> whitequark: thanks, that all makes sense. for my case I've swapped to xdr=0 and put input registers into my logic and left the output unregistered which works better for this anyway
<agg> Hadn't realised all this time I've had clocks coming into GBIN but not using the GB io and just having nextpnr promote it to a gbuf in logic, heh
<agg> Worked fine...
<whitequark> input registers in logic don't give you a defined phase relationship
<whitequark> this has bitten me badly, took months to figure out (and help of tnt)
<agg> whitequark: as in, each bit in the signal will be sampled at a different instant?
<agg> But presumably all will be strictly after the clock rising edge enters the device? But possibly any time before the subsequent rising edge?
<agg> The input registers in logic are still clocked from a gbuf that's driven by the external clock signal, is there also a concern that the internal clock signal might have significant delay compared to the external clock?
<agg> I guess if there's no defined phase relationship between the external clock and the internal flops clocked from a gbuf driven by it then my output registers are in trouble too, hmm
<whitequark> yes and yes
<whitequark> and yes re trouble
<whitequark> it will change with pnr seed
<whitequark> you might be able to use explicit LOC constraints to work around that
<agg> At the moment my output data comes from a bram and is then directly connected to output signals, so the bram registers drive it
<agg> Enabling the io output registers adds an annoying extra cycle of latency
<agg> How come the GBIO in/out regs are well defined but logic registers are not from the same gbuf? Is it because the signal routing is variable delay even though the clock is on the gbuf?
<whitequark> agg: routing latency
<whitequark> yes
<whitequark> each routing point in ice40 is a buffer
<agg> Right, makes sense
<agg> So the phase between external clock rising edge and my data output changing is the clock gbuf latency plus the bram clock to out plus the variable bram out to io routing?
<agg> In principle that total latency is constrained to less than the clock period plus the io-to-gbuf or I'd fail timing, right? So a sufficiently strict timing constraint would ensure the outputs made it in time?
<agg> Oh well, sounds like in practice I had better tell the stm32 to add one more clock of data latency between address and data phases, put xdr=1, and live with half a cycle of contention
<agg> At higher clocks the stm32 adds more cycles between transfers anyway so I think the contention will go away due to that
<whitequark> yes
<agg> Thanks for saving me loads of head scratching months down the line when the design suddenly starts failing :p
<whitequark> glad to help
thinknok has quit [Remote host closed the connection]
thinknok has joined #nmigen
<kbeckmann> Is there an elegant way in nMigen when doing formal assertions with Past() to ensure they only are done when the past data is valid? Currently I have a counter and do something like 'with m.If(count > 5): m.d.comb += Assert(signal1 == Past(signal2, clocks=5))'
<whitequark> define valid?
<ZirconiumX> As in "has data", presumably
<whitequark> right I guessed that, I'd like to have kbeckmann's confirmation
<kbeckmann> yes that's right
<whitequark> what does SVA do here?
<kbeckmann> if i skip the if case, we're trying to look into the past that hasn't happened yet so it will be incorrect.
<kbeckmann> oh i don't know..
<whitequark> so to discuss the way to go here we first should take a look at the way Past() gets lowered
<whitequark> which is to say, it becomes a wide shift register essentially
<kbeckmann> ah, i see
<kbeckmann> and it gets initialized with zeroes?
<whitequark> yes
<whitequark> for BMC this is trivially ok, for induction not quite as simple
<whitequark> i think we can do better here though
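The lowering whitequark describes can be modelled in plain Python (a hypothetical sketch, not nmigen's actual implementation): `Past(sig, clocks=n)` behaves like an n-deep shift register initialized to zeros, which is why the first n cycles read back 0 rather than valid past data — exactly the window kbeckmann's counter guard excludes.

```python
from collections import deque

# Toy model of how Past(sig, clocks=n) lowers: an n-deep shift
# register, zero-initialized, so the first n reads return 0.
class PastModel:
    def __init__(self, clocks):
        self.sr = deque([0] * clocks, maxlen=clocks)

    def tick(self, current):
        past = self.sr[0]        # value from `clocks` cycles ago
        self.sr.append(current)  # shift the current sample in
        return past

p = PastModel(clocks=2)
reads = [p.tick(v) for v in [3, 1, 4, 1, 5]]
# First two reads are just the zero initialization: [0, 0, 3, 1, 4]
```

This is why the `with m.If(count > 5)` guard matters: for BMC the zero initialization is trivially fine, but for induction the engine may start mid-trace, where those zeros are unconstrained.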
Marc93 has quit [Ping timeout: 245 seconds]
<lkcl_> kbeckmann: use Initial()
<lkcl_> if you do not use Initial(), random data is inserted into the signals
<whitequark> yeah, that's the part where we could do better
<whitequark> nmigen can generate formal statements, but it's not really nicely integrated with the engine
<whitequark> and it should be
<lkcl_> whitequark: we seem to be getting on fine :)
<lkcl_> we do however critically rely on this: from nmigen.test.utils import FHDLTestCase
<whitequark> nmigen.test is a private module
<whitequark> you should copy FHDLTestCase into your own code
<lkcl_> whitequark: then we will need to take a copy, which is... redundant and... yeah.
<whitequark> it's not redundant
<lkcl_> we seem to be taking copies of quite a lot of nmigen functionality
<whitequark> FHDLTestCase isn't nmigen functionality
<whitequark> it's a helper class I wrote in an hour without much thought as to its design or future evolution
<whitequark> it's only part of the package because of an oversight, actually
<lkcl_> and it's extremely useful and valuable
<whitequark> it has several major issues, such as "me not bothering to read unittest docs"
<lkcl_> i don't think i've read them either :)
<lkcl_> i learned by a process of cut/paste and osmosis
<whitequark> and "hardcoding a specific SMT engine"
<whitequark> well that seems like a problem to me too
<whitequark> it doesn't help that nmigen doesn't yet have much in the way of docs (the WIP branch notwithstanding)
<lkcl_> after 20 years of python-unit-test osmosis you kinda learn the ropes :0
<whitequark> that doesn't really work in my experience
<lkcl_> fortunately our team is mostly autodidacts, used to examining the code
<whitequark> yeah, that also doesn't work
<lkcl_> everyone's different
<whitequark> if you look at the code, you see one particular implementation, but that doesn't tell you about the API contract
<lkcl_> without so many auto-learners on the team, there'd be absolutely no way we could write 75,000 lines of nmigen code
<whitequark> so when the implementation changes? if you rely on details that weren't a part of the contract, your code breaks
<whitequark> so, sure, read the code to understand how it works, and then read the docs to understand what you can actually rely upon
<lkcl_> yyeah that was quite irritating to have the shift behaviour change without warning. 10,000 lines of extremely complex IEEE754 FPU code that took 6 months to write suddenly "stopped working"
<whitequark> it's like when people "learn" C by trying different things and seeing what happens. they don't end up knowing how C works, they end up knowing how their specific version of a specific C compiler happens to work
<lkcl_> i found the discussion retrospectively
<lkcl_> i did that :)
<whitequark> then they upgrade their compiler and complain about it "breaking" their code, when in fact it was broken in first place and the upgrade just made that visible
<lkcl_> but i was lucky that my first major c project was samba, made extensive use of autoconf, and worked on pretty much absolutely every broken variant of any broken c compiler.
<lkcl_> except sunos 4.1.3's cc which was just... far too brain-dead.
<whitequark> now C isn't a very good example because the language makes it really hard to write robust code
<lkcl_> it's evolved considerably
<whitequark> regarding the shift change, I believe that was a bug
<whitequark> in both nmigen and yosys simultaneously
<lkcl_> interesting.
<whitequark> were you referring to ?
<lkcl_> ok i'm now not annoyed, i'm grateful.
<whitequark> I don't recall any other shift change, at least
<lkcl_> yes that was the one. we got away with it for some reason
<whitequark> right so this was always prohibited in RTLIL
<whitequark> but... not checked or documented
<whitequark> it was just silently broken if you gave it as input to yosys
<whitequark> same for nmigen
<lkcl_> lovely. and appreciated that it was spotted.
<whitequark> i had to fix yosys' validator, yosys' manual, then nmigen itself to make sure you can't end up with undefined behavior
<whitequark> there will likely be more instances of this
<lkcl_> painstaking and necessary.
<whitequark> as with any other massively complex software project, really
<lkcl_> hmmm.... would formal correctness proofs actually have caught that?
<lkcl_> argh.
<whitequark> i try to systematically eliminate these issues when i see them, and i did in a few other cases, but it's very time-consuming
<lkcl_> yes, not surprising
<lkcl_> i wonder...
<whitequark> you can look at my commits to yosys manual if you want to see the other cases
<whitequark> there were *quite* a few ways where RTLIL's contract was totally different from what you'd expect by investigating yosys' output
<lkcl_> we were looking to use coriolis2 gate-level simulation capabilities.
<lkcl_> appreciated the heads-up
<whitequark> so... yeah, i learned how RTLIL works by examining yosys' implementation, and i was often very wrong in the end
<whitequark> this is a serious hazard with self-learning in that way
<whitequark> you don't know the *intent*
<whitequark> but the maintainers of the project follow it whether you know it or not
<whitequark> regarding formal proofs: sometimes, sometimes not
<lkcl_> i wonder if we could do a comparison of the gate-level simulation against the RTL simulations
<whitequark> for example, if you used a shift by negative amount in your proof (because you used nmigen for that), then it'd be undefined at the same time
<lkcl_> actually use the exact same unit tests, pre- and post- RTL
<whitequark> depending on the proof approach, it might be that the proof engine will throw out the cases where the spec is doing something weird, assuming they're unreachable
<whitequark> i don't know the exact details
<lkcl_> yehyeh, we try to rewrite the formal proofs using a different algorithm
<whitequark> in my view, formal verification really shines when you apply the same spec to many different implementations
<lkcl_> appreciated. so it's not a perfect approach. noted
<whitequark> if you co-evolve a spec and an implementation in lockstep you risk introducing identical bugs
<whitequark> it's still quite useful, just needs to be handled with care
<lkcl_> yehhh, we're trying to encourage other teams to engage with the POWER9 formal proofs we're doing
<whitequark> sounds good
<lkcl_> but, like symbioticeda did, a common interface (RVF) is really needed
<whitequark> yeah
<lkcl_> it's really cool work, btw :)
<lkcl_> we're up to 75,000 lines of nmigen code (!)
<lkcl_> which is pretty mental.
<whitequark> cool
<lkcl_> anyway: kbeckmann: use Initial() :)
<lkcl_> with m.If(Initial()):
<whitequark> this gets complicated if you use induction
<whitequark> well, more complicated than with BMC
<lkcl_> comb += Assume(ResetSignal() == 1)
<lkcl_> i did formal induction... errrrr.... 30 years ago. i barely remember it
<whitequark> i'm talking about symbiyosys induction specifically
* lkcl_ just re-familiarising myself with the concept of induction (in general)
<lkcl_> this is the "prove" mode, right?
<whitequark> yeah
<lkcl_> ok. going to take a while to read that
<_whitenotifier-f> [nmigen/nmigen] whitequark pushed 1 commit to master [+0/-0/±1]
<_whitenotifier-f> [nmigen/nmigen] whitequark 9c80c32 - setup: exclude tests.
<whitequark> lkcl_: btw, which other parts of nmigen were you duplicating?
<whitequark> I recall Record-related stuff (there's a plan to address that in a nice way), anything else?
<lkcl_> whitequark: yes, Record. we created RecordObject (and use it extensively)
<lkcl_> Minerva also created something equivalent to RecordObject
<lkcl_> ah. flatten. i know it's silly: a 2-line function.
<whitequark> in master, Record is a subclass of UserValue, which means it's not really special
<lkcl_> ahh ok.
<whitequark> the problem is that the existing design for UserValue (and by extension Record, both in master and in 0.2) is seriously flawed in some aspects, so there would have to be a few changes
<whitequark> but after that, it should be possible to safely define e.g. Record-like objects in downstream code that will be robust against changes in upstream nmigen
<lkcl_> btw we're now also critically dependent on Record.connect
<whitequark> yeah, so
<whitequark> Record.connect in its current form will be gone and replaced with something fairly similar but of a much more sensible design, which I'm borrowing from FIRRTL
<lkcl_> will it have the "direction" capability? (in some form)
<whitequark> which I think will address many complaints, including some of yours
<whitequark> yes
<lkcl_> ah whew.
<whitequark> that's really the primary problem it will be solving
<whitequark> FIRRTL does it in a very elegant way, which I could not come up with myself
<lkcl_> Chisel3 has direction built-in to absolutely everything. it's fundamental.
<whitequark> sort of, you have passive values in it
<whitequark> in FIRRTL at least
<whitequark> but for the most part that's correct
<ZirconiumX> But Yosys infers direction from usage, right?
<lkcl_> nice. i am all in favour of not doing design work that someone else already came up with :)
<whitequark> it's what oMigen's Record was trying (and failing) to be, and what the people were complaining about in nMigen
<lkcl_> yosys infers from usage? i honestly don't know
<whitequark> I didn't recognize the complaints though until I read the FIRRTL spec and then it clearly was a superior design
<whitequark> ZirconiumX: direction for connecting
<ZirconiumX> Ah, okay
<lkcl_> Chisel3 has a special function (pair of functions) which allow you to "turn round" a signal
<whitequark> ZirconiumX: the major reason we need it is actually related to your proposal
<whitequark> for typed ports in nmigen
<ZirconiumX> Excellent
<whitequark> imagine a module with a wishbone input and output
<whitequark> unless there are interior directions in records, you can't express that without "exploding" the wishbone buses
<lkcl_> [for when current conversation thread is done] returning to Record / UserValue: the one thing we need to be able to do is to iterate through all the *values* in it (recursively) - not the bits
<whitequark> but you'd want to do something like `i : Input[wishbone.Interface]; o : Output[wishbone.Interface]` or whatever
<lkcl_> yeah we have some massive records that we need to assign, and some of the fields are different directions
<whitequark> lkcl_: once the design work for robust UserValue is done, you should just grab the current Record impl as-is and rely on it
<whitequark> since you already have a custom record
<whitequark> then, once the design work for better connect() is done, you can migrate to that
<lkcl_> it would be an utter chaotic mess if we had to walk through them
<lkcl_> ok. appreciated.
<lkcl_> it's... well... quite a lot of work to do back-porting.
<whitequark> yes, which is actually one reason I focus on UserValue first
<lkcl_> we created a mirror-image of the python "operator" module, called "nmoperator".
<lkcl_> it contains nmoperator.eq, and a few others
<Sarayan> never mind operator? ;-)
<whitequark> this means you can upgrade to newer nmigen, even a version that changes the way Record works, without having to migrate all your code to the new design
<whitequark> so that you can e.g. benefit from cxxsim
<lkcl_> and it assumes that each argument is iterable by its *values* (recursively)
<lkcl_> Sarayan, :)
<whitequark> the existing design where Record inherits from Value cannot allow iterating by values
<lkcl_> whitequark, okaaay. appreciated about cxxsim
<whitequark> but the good news is that I'm dropping that
<whitequark> UserValue won't inherit from Value anymore and so it will in fact be possible to iterate a record's fields
<lkcl_> whitequark: that's why we over-rode __iter__ in RecordObject.
<whitequark> yeah, you violate LSP there
<whitequark> but the bigger insight than "don't do that" is that UserValue/Record should have never been subject to LSP in first place
<Sarayan> Least Surprise Principle?
<whitequark> Liskov substitution principle
<lkcl_> we can then do python-style OO inheritance, to any depth.
<lkcl_> that's why i wrote nmoperator.
<whitequark> lkcl_: the problem was that you've indirectly pointed out a flaw in nmigen, which is that UserValue/Record unnecessarily derived from Value
<lkcl_> interesting
<whitequark> but the way you did it is by insisting on violating LSP, which is a pretty bad idea
<whitequark> the good idea was to change the design so that violating LSP is no longer necessary to achieve the same result
<Sarayan> Ok, didn't know the name, but yeah, that's rather fundamental in OO
<whitequark> this is why I insist on examining use cases for every proposed language change
<lkcl_> whitequark: amazingly, overriding __iter__ doesn't break anything. i checked.
<ZirconiumX> I suspect this is "doesn't *presently* break anything"
<whitequark> more often than not it turns out that a proposal for change correctly indicates the presence of a flaw, but not the specific place where the flaw is
<whitequark> such as here
<whitequark> lkcl_: remember our conversation from 20 min ago?
<whitequark> it doesn't break the *code*, but it breaks the *contract*
<lkcl_> what happens is that by returning the sequence of values, the values themselves are iterable, and consequently at the *next* level they get converted to bits.
<lkcl_> whitequark, yehyeh :)
<whitequark> so suppose nmigen starts to assume somewhere that it can safely iterate anything that is isinstance(x, Value)
<whitequark> (there is no reason I don't do this other than trying to avoid unreadable code)
<whitequark> then your implementation will break
<whitequark> if a RecordObject is-a Value, then anything that Value does, it should also do, including the way a Value iterates
<lkcl_> indeed.
<whitequark> now it's not like I'm going to intentionally break your LSP-violating code, I'm just explaining why violating contracts is bad
<whitequark> just like in real life
<whitequark> you can do it, sure! but then you don't get to complain when the other party doesn't uphold its end either
<Sarayan> Oh wq, since you're in a design mood, I have an interesting syntax(?) problem
<whitequark> sure
<whitequark> (i'm not but let's hear you out)
<whitequark> (i'm fighting with yosys)
<Sarayan> I'm reimplementing a fully synchronous chip in nmigen, with a phase signal in input
<Sarayan> which means it's almost entirely in a with m.If(self.i_phase): m.d.sync += lots of stuff
<Sarayan> but I have a number of comb to keep things readable, or because some combs are shared between multiple sync
<whitequark> are there multiple phases?
<Sarayan> if I put the comb in the m.If(), the signals are going to be zeroed when the phase signal is 0
<Sarayan> no, only one
<Sarayan> it's going to work, but zero outside is probably more expensive, slower in sim, and annoying in debug
<whitequark> ditch m.If and use EnableInserter(i_phase) in toplevel
<whitequark> that will do exactly what you want, I think
<whitequark> i.e. comb always runs, sync only changes when i_phase is high
<whitequark> always changes*
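The semantics whitequark describes can be sketched in plain Python (a toy single-register model, names hypothetical, not nmigen code): combinational outputs are always recomputed, while the sync register only updates on edges where the enable (i_phase) is high.

```python
# Toy model of EnableInserter semantics for one register:
# comb logic always runs; sync state only changes when enabled.
def step(q, d, phase):
    comb_out = d ^ 1             # example comb logic: always recomputed
    q_next = d if phase else q   # sync: hold unless enable is high
    return q_next, comb_out

q = 0
trace = []
for d, phase in [(1, 0), (1, 1), (0, 0), (0, 1)]:
    q, comb = step(q, d, phase)
    trace.append(q)
# q only tracks d on enabled cycles: [0, 1, 1, 0]
```

Contrast with the `with m.If(self.i_phase): m.d.comb += ...` version, where the comb signals would be zeroed whenever the phase signal is low.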
<Sarayan> ok, I was wrong, there are kinda multiple phases
<Sarayan> specifically, config register write is async w.r.t the phase signal
<Sarayan> so yeah you could say there are two phases, i_phase and i_cs, sorry
<lkcl_> Sarayan: are you trying to write a pipeline?
<whitequark> okay, then use something like
<lkcl_> where combinatorial blocks are basically passed along, based on some "signal"?
<lkcl_> oo diagraaams :)
<lkcl_> ooo gaate level diagraaaams, i love those
<Sarayan> extracted from die "shot" by furrtek
<whitequark> DomainRenamer({"phase":"sync", "cs":"sync"})(EnableInserter({"phase":i_phase, "cs":i_cs})(your_stuff))
<whitequark> in toplevel
<whitequark> and then put your logic into two domains m.d.phase and m.d.cs
<whitequark> (adjust names as wanted)
<Sarayan> that's sounds interesting
<ZirconiumX> That's...really elegant, wow
<whitequark> (could do e.g. DomainRenamer({"cs":"sync"})(EnableInserter({"sync":i_phase, "cs":i_cs})(your_stuff)) )
<Sarayan> where is toplevel?
<ZirconiumX> As in, your top-level module
<whitequark> do you have a module with no sync logic that includes all other modules and connects them?
<whitequark> that's toplevel
<whitequark> if not, add one
<Sarayan> hmmmm
<whitequark> actually
<whitequark> you could even do something like
<whitequark> m = Module()
<whitequark> return DomainRenamer({"cs":"sync"})(EnableInserter({"sync":i_phase, "cs":i_cs})(m))
<whitequark> ...
<whitequark> at the root of your module hierarchy
<whitequark> though it's a bit awkward
<whitequark> ZirconiumX: yep! there's no EnableSignal() yet, which would complete the picture
<Sarayan> ok..... I'm trying to understand all that correctly
<lkcl_> whitequark: this sounds extremely powerful and compact. are there examples anywhere?
<whitequark> lkcl_: not yet but it will be definitely featured in the docs
<Sarayan> why can't the magic be inside the 53251 and keep the interface outside self-contained?
<lkcl_> whitequark: ok.
<lkcl_> btw, suggestion (code-style)
<whitequark> Sarayan: i need more context to answer that
<Sarayan> the 53251 is but one chip in the arcade boards
<lkcl_> if not isinstance(newfragment) or not new_fragment.type in ...:
<whitequark> ohh
<lkcl_> return new_fragment
<Sarayan> there's going to be others, plus cpus, plus other stuff
<lkcl_> then the same trick at the next if-statement
<Sarayan> they work on varioua phase signals
<whitequark> Sarayan: yeah i wasn't clear, sure
<whitequark> *sorry
<lkcl_> for multi-nested if statements it dramatically reduces indentation.
<Sarayan> and the top level pretty much links them together and implements the TTL that's between them
<whitequark> it doesn't actually have to be the toplevel module of the entire design. in fact there's no actual requirement for it to relate to the hierarchy in any particular way
<Sarayan> so subchip (module) has one or more clock signals that are phase signals
<ZirconiumX> It's possible for it to be "the top level of your 53251" instead of "the top level of your board"
<lkcl_> (wq: will email you a diff-patch)
<Sarayan> ok, that sounds much better
<whitequark> Sarayan: what ZirconiumX said
<whitequark> lkcl_: code style is subjective and i'm not going to change it to suit anyone's particular preference
<whitequark> i'm not telling you how to format your code either
<lkcl_> whitequark: hmmm that seems like an overly harsh response, based on a misunderstanding
<whitequark> sorry, wasn't intended to be that
<Sarayan> so either I put the magic on a submodule that's the real working part of the chip, or I massage the module with it just before returning it?
<lkcl_> no problem, no offense taken
<ZirconiumX> Pretty much, Sarayan
<whitequark> yup
<Sarayan> okay, very nice
<ZirconiumX> But I mean, you'd want a single chip module to instantiate anyway, so you can put the magic in there
<Sarayan> So DomainRenamer copies the sync domain into two other, and EnableInserter plugs a phase signal into a domain
<whitequark> in the inverse order, yes
<whitequark> so first you add one enable per domain
<whitequark> and *then* you turn the two domains into one
<Sarayan> ohhhhhh
<whitequark> this might make it more clear
<whitequark> m = EnableInserter({"sync":i_phase, "cs":i_cs})(m)
<whitequark> m = DomainRenamer({"cs":"sync"})(m)
<whitequark> return m
<Sarayan> so you have to declare the domains in the constructor, right?
<whitequark> can you elaborate?
<Sarayan> If you go and use m.d.cs += xxx at the beginning of the module, is it going to work, or do you have to say that cs exists first?
<whitequark> former
<whitequark> the "cs" domain here is a sort of fake
<whitequark> it just exists to group logic
<Sarayan> automagically added?
<whitequark> no, late bound
<whitequark> like all others
<whitequark> think of it as an argument
<whitequark> so your k53251 is like a "function" which takes cs and sync as "arguments"
* lkcl_ celebrates. found and fixed a bug in a ridiculously-complex FSM
<whitequark> and until you "call" it (add as submodule) the actual domains that will be referred to by m.d.cs and m.d.sync aren't known
<whitequark> but once you do, they can just be the same thing
<whitequark> DomainRenamer takes a "function" that takes cs and sync as "arguments" and returns a "function" that takes just sync as an "argument" and uses it for cs too
<whitequark> does this make sense?
<Sarayan> Oh, it was more about the declaration-before-use aspect with the +=
<whitequark> yes
<whitequark> you can use any domain you want with m.d.<domain> +=
<Sarayan> You don't have to say first that you're going to then
<whitequark> yep
<Sarayan> that works, you can always check at the end if they're bound, so typos are found
<whitequark> yep
<Sarayan> That looks very, very elegant
<lkcl_> so... does DomainRenamer "remap" anything that you do "m.d.sync" to "m.d.{somethingelse}"?
<whitequark> that's what it does, yeah
<whitequark> the exact details at the moment are a little bit odd around edge cases, in particular because we don't have EnableSignal
<whitequark> that's going to improve soon-ish
<Sarayan> What will EnableSignal do?
<lkcl_> and... EnableInserter is... the industry-standard term is, i believe, a "clock gater"?
<whitequark> Sarayan: EnableSignal : EnableInserter :: ResetSignal : ResetInserter
<whitequark> lkcl_: not exactly
<Sarayan> except I don't grok ResetSignal yet either
<lkcl_> i.e. when the "EnableSignal" is true, everything... ok
<whitequark> EnableInserter turns DFFs into DFFEs
<lkcl_> ok, so when EnableSignal is... oooo :)
<whitequark> this is only a logical view
<lkcl_> DFFE Primitive
<lkcl_> The DFFE primitive allows you to specify a D-type flipflop with clock enable.
<whitequark> e.g. you can use it many times and there are no DFFs with more than one E input
<whitequark> but if you think about your design in terms of idealized flip-flops, then EnableInserter adds a fresh CE input and wires it to whatever you specified
<lkcl_> iiinteresting. and the output is frozen, even combinatorially?
<whitequark> EnableInserter and ResetInserter only affect sync logic
<whitequark> since they operate on clock domains
<lkcl_> whitequark: ahh ok. we have a situation where we need a "latch bypass"
<lkcl_> it is a hybrid of a DFF and a combinatorial bypass.
<whitequark> nmigen has no direct support for latches, very deliberately so
<whitequark> you can instantiate a vendor primitive, of course, but they're not a part of HDL
<Sarayan> wq: annoyingly, NMOS is latches everywhere. At least cell-based CMOS is FFs
<whitequark> Sarayan: doesn't apply to you
<lkcl_> yeh we wrote a function which creates one.
<whitequark> since you can transform latches into clocked logic, and in fact you will have to do it
<whitequark> if you want stuff to run fast on cxxsim or fpgas
<whitequark> but lkcl_ is working on an ASIC which does have latches as actual primitives
<Sarayan> cute
<whitequark> a DFF is two latches back to back, usually
<Sarayan> but when I convert something like the via6522, it's latches everywhere, with the effect that multiple latches in a chain on the same clock signal all change on the same clock
<Sarayan> which I can't convert to multiple sync, that would add a delay that doesn't exist
<lkcl_> Sarayan: don't ask :) we have a number of 2D matrices which in the full multi-issue version will result in over a quarter of a million gates if we use DFFs. blech :)
<whitequark> well, let's look at it this way
<whitequark> if you don't convert them, then you won't ever simulate that on an FPGA
<ZirconiumX> I don't think Yosys handles latches very well either, since they're logic loops, right?
<whitequark> if you're fine with that... you can do Instance("$dlatch") and cxxrtl will simulate it fine
<whitequark> ZirconiumX: yosys allows certain logic loops
<whitequark> (namely, ones that it can transform into not-logic-loops, breaking the loop with a $dlatch cell)
<whitequark> (which happens as a part of the proc_dlatch pass)
<lkcl_> ZirconiumX: no "modern" proprietary tools handle them. if you try to use them, you're into $100 million custom ASIC territory.
<Sarayan> lkcl: how many gates with latches instead of dff?
<whitequark> interestingly, latches are better for timing... if your timing analyzer can handle them
<lkcl_> Sarayan: an SR NOR (or SR NAND) is 2 gates.
<lkcl_> that's it.
<lkcl_> a DFF is around 10.
<whitequark> just to be clear, above i have used "latch" as a shorthand for "D-latch"
<whitequark> SR-latches are a totally different beast and are way worse for timing analysis
<lkcl_> we're doing a multi-issue Out-of-Order execution engine, which requires Dependency Matrices.
<lkcl_> whitequark: ah, appreciated
<lkcl_> the DMs are N(registers) x N(ReservationStations)
* ZirconiumX is not really up-to-date on GPU architecture
<ZirconiumX> The GPU I'm most familiar with is the PlayStation 2's Graphics Synthesiser. Which is fixed-function.
<lkcl_> so in the GPU version of our processor, that's a whopping 128 x 30 matrix.
<lkcl_> ZirconiumX: we're kinda cheating, by using a multi-issue out-of-order execution engine, and a hardware "for-loop" which pauses the Program Counter and shoves a shed-load of "element" operations into the multi-issue engine
<lkcl_> and lets the Dependency Matrices sort it out
<lkcl_> we are *not* doing a SIMD ISA... but it's SIMD at the back-end though.
<Sarayan> when you're at the hardware level, you better cheat as much as you can, else what's the point :-)
<lkcl_> things have moved on in the 3D industry (from fixed-function), although if you study GPLGPU closely (it implemented Plan9), as Jeff Bush did, you find that, architecturally, fixed-function and shader GPUs have a lot in common
<lkcl_> Sarayan: not too much though :)
<Sarayan> ooo and predictions are cheating, in a way
<ZirconiumX> So, are you relying on aggressive IPC to make up for die area that might be used for a couple of simpler in-order cores?
<Sarayan> Don't ever have a quartus.ini file with "PDB_ASCII_DUMP=on" in the current directory when running quartus commands btw, I'm sure it's Bad
<ZirconiumX> Oh, yeah, terrible
<Sarayan> just so you don't do that by mistake
<sorear> so you're basically doing what Itanium did (no SIMD, hardware designed for wide issue with all2all bypassing, SIMD instructions in (x86 compat mode) input are broken up into operations)
<sorear> this didn't work so well last time, but I'm interested to see a fair benchmark comparison when it's ready and when a real RVV impl is ready
<ZirconiumX> sorear: or more aptly what Terascale did, right?
<kbeckmann> lkcl_: thanks for the Initial() tip, completely missed that. Your method works great when only one cycle history is required, but I need 5 so I created a signal that i set using Fell(Initial(), clocks=5). Seems to work fine.
<whitequark> kbeckmann: ah, sorry i wasn't clear enough, i assumed you know about Initial() and were suggesting that nmigen do this Fell() trick
<kbeckmann> ah, yeah that would be neat indeed. still learning all of this :). to be honest, moving from a manual counter to Fell() feels a lot less like a hack, so I'm quite happy with this solution.
<whitequark> are you using induction yet?
<kbeckmann> i have tried it with symbiyosys some time ago, but haven't with nMigen. it seems quite powerful especially when you want to test something that normally would require a lot of iterations to get to a bad state.
<whitequark> right, so, i'm not sure if the Fell trick would work with induction
<kbeckmann> that's right. just tried it and it fails during step 0
<whitequark> i think you'll have to add assumptions that aren't entirely intuitive
<daveshah> step 0?
<daveshah> Failures in the induction part are invariably at step K, aiui
<kbeckmann> step 0: immediately, not even one clock period is shown in the trace
<daveshah> hmm
<kbeckmann> oh right!
<kbeckmann> i have to adjust for that
<kbeckmann> i take it back! it works. just tried it: i added a bug that appears after 32k iterations in my code, and running with "mode prove" found it.
<kbeckmann> forgot how powerful this was.
<whitequark> awygle: ohhh I remember now why you can't use arbitrary Values in Past() etc
<whitequark> because of the reset values
<lkcl_> kbeckmann, cooool