<d1b2>
<a> but i'm getting weird responses when i try to use it
<d1b2>
<a> i can tell that it's working at least at the basic level, since the command-sequence for "write" works consistently and it gives some sort of error when an invalid i2c slave address is given
<d1b2>
<a> but when i try to do a read operation, i get back a 1 response from the address-write portion of it, and then 255, 255 from the read portion
peepsalot has quit [Quit: Connection reset by peep]
<d1b2>
<TiltMeSenpai> uhh i2c has pullups
<d1b2>
<TiltMeSenpai> if you see nack and 255 on data, it means you're knocking on the door but nobody's home
<d1b2>
<TiltMeSenpai> if I'm interpreting your question right
<d1b2>
<a> huhh
<d1b2>
<a> but when i try to write it works fine
<d1b2>
<a> w/ the same periph addr
<d1b2>
<TiltMeSenpai> the device might not support read addresses? I don't really know
<d1b2>
<TiltMeSenpai> or it could be looking for some non-7-bit address?
<d1b2>
<a> works fine with a different i2c impl
<d1b2>
<TiltMeSenpai> do you have an oscilloscope?
<d1b2>
<TiltMeSenpai> or logic analyzer
<d1b2>
<TiltMeSenpai> something weird is going on that's stopping the target from driving the bus, but it's hard to say what without looking at the waveform
<d1b2>
<TiltMeSenpai> oh wait if you're running on a glasgow, you can use the trace option
<d1b2>
<a> i can hook up the i2c output to a logic analyzer and grab waveforms
<d1b2>
<TiltMeSenpai> yeah if you add --trace output.vcd the glasgow should end up writing a vcd with measured values to output.vcd
<d1b2>
<TiltMeSenpai> might be easier than grabbing a logic analyzer and hooking things up
<d1b2>
<a> oh this is running on a different fpga
<d1b2>
<TiltMeSenpai> oh are you just using the gateware?
<d1b2>
<a> yeah i just pulled the relevant gateware into a normal fpga project
<d1b2>
<TiltMeSenpai> ah I see
daknig2 has quit [Ping timeout: 256 seconds]
<whitequark>
a: that i2c implementation isn't the best in the world, but i think i tested it quite a bit
<whitequark>
awygle: we should discuss it at today's meeting
<awygle>
oh, sure
<d1b2>
<a> huhh there might be some FIFO issues here too actually, is there any reason a SyncFIFOBuffered would behave weirdly?
<d1b2>
<a> reading from the same FIFO at different times is yielding different results
<whitequark>
hmm
<whitequark>
which fpga? is it multiclock?
<d1b2>
<a> ECP5, one clock
<whitequark>
nothing that comes to my mind
<d1b2>
<a> there's a 1 second delay after the i2c operations are written to the command fifo before i try to read from the output fifo
<d1b2>
<a> but if i let other things happen before reading from the FIFO, then the FIFO output is different
<d1b2>
<a> depth is v large so that doesn't appear to be an issue
* awygle
confirms meeting time for the fourth time
daknig2 has joined #nmigen
jeanthom has joined #nmigen
hitomi2504 has joined #nmigen
<d1b2>
<a> fixed it, turned out that the data wasn't being put into the FIFO fast enough to keep up with the I2C transaction
<d1b2>
<a> that and a small bug in the FIFO read logic
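[A generic sketch of the SyncFIFOBuffered handshake being debugged here, not a's actual design; the width and depth are invented. The usual read-side bug is asserting r_en without checking r_rdy, so the read port is "consumed" while the FIFO is still empty; the write side likewise needs to wait for w_rdy.]

    from nmigen import Elaboratable, Module, Signal
    from nmigen.lib.fifo import SyncFIFOBuffered

    class FifoLoop(Elaboratable):
        """Pass data through a FIFO, honouring both handshakes."""
        def __init__(self):
            self.i       = Signal(8)
            self.i_valid = Signal()
            self.o       = Signal(8)
            self.o_valid = Signal()

        def elaborate(self, platform):
            m = Module()
            m.submodules.fifo = fifo = SyncFIFOBuffered(width=8, depth=64)
            m.d.comb += [
                fifo.w_data.eq(self.i),
                fifo.w_en.eq(self.i_valid & fifo.w_rdy),  # only write when there is space
                fifo.r_en.eq(fifo.r_rdy),                 # only read when data is present
                self.o.eq(fifo.r_data),
                self.o_valid.eq(fifo.r_rdy),
            ]
            return m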
<d1b2>
<a> how does glasgow handle this? does it store the data in a separate FIFO until it's all been received over USB, and then quickly dump it into the main FIFO?
jock-tanner has joined #nmigen
proteus-guy has joined #nmigen
jeanthom has quit [Ping timeout: 256 seconds]
Asu has joined #nmigen
jock-tanner has quit [Ping timeout: 256 seconds]
jeanthom has joined #nmigen
daknig2 has quit [Ping timeout: 256 seconds]
daknig2 has joined #nmigen
daknig2 has quit [Ping timeout: 256 seconds]
nengel has joined #nmigen
jeanthom has quit [*.net *.split]
nengel has quit [*.net *.split]
Degi has quit [*.net *.split]
alexhw_ has quit [*.net *.split]
mwk has quit [*.net *.split]
nengel has joined #nmigen
jeanthom has joined #nmigen
alexhw_ has joined #nmigen
mwk has joined #nmigen
Degi has joined #nmigen
proteus-guy has quit [Ping timeout: 256 seconds]
jock-tanner has joined #nmigen
jock-tanner has quit [Ping timeout: 260 seconds]
emeb has joined #nmigen
cstrauss[m] has joined #nmigen
<DaKnig>
where can I look for examples of using Array, FIFO, Memory...?
<DaKnig>
I'm using vivado, for me it spits out verilog
<lkcl_>
DaKnig: no, it doesn't. it generates yosys native ilang.
<lkcl_>
you can then pass that *to* yosys, by using "read_ilang {filename}" then "write_verilog {filename}" if you want verilog
<DaKnig>
oh cool
<DaKnig>
so that's how it does it
<lkcl_>
but that is yosys's job, not nmigen's
<DaKnig>
"the nmigen toolchain" :)
<lkcl_>
yup. there's also a ghdl plugin for yosys, it works, but people here recommend using verilator for conversion of vhdl, as being more complete. i think it's verilator
<FL4SHK>
the GHDL plugin doesn't support records with unconstrained elements in them
<FL4SHK>
i.e. one of the most desirable features from VHDL 2008
<lkcl_>
FL4SHK: ahh this is what the microwatt team ran into.
<DaKnig>
I would really like VHDL to get as much attention as Verilog in the open source circles
<DaKnig>
not enough tools, and they don't work as well...
<FL4SHK>
old VHDL is pretty good btw
<lkcl_>
DaKnig: from working with microwatt for several months, i am now deeply impressed with VHDL
<FL4SHK>
it's just not as good as VHDL 2008
<FL4SHK>
...VHDL 2008's generic packages feature, one of the best things about the language, is very poorly supported
<FL4SHK>
now, nMigen is actually a lot better in the high level things department
<lkcl_>
i had a lot of trouble compiling microwatt.
<lkcl>
followed by "python3 {whatever.py} generate -t il"
<lkcl>
if you want to see the equivalent verilog, change that to "-t v"
<lkcl>
DaKnig: i thoroughly recommend opening the ilang in yosys ("read_ilang {filename.il}") and doing "show top" or "show {press tab key}" if there are submodules
<lkcl>
you'll need xdot and graphviz installed (apt-get install graphviz) for that to work
nengel has quit [Ping timeout: 240 seconds]
<lkcl>
it's quite fascinating to see the results of a linearly-written program as a gate-level tree, graphically.
<lkcl>
i've fixed several early rookie mistakes by always examining the graphviz after every edit
FFY00 has quit [Quit: dd if=/dev/urandom of=/dev/sda]
FFY00 has joined #nmigen
<DaKnig>
does it show the primitives too?
hitomi2504 has quit [Quit: Nettalk6 - www.ntalk.de]
<whitequark>
a: glasgow has two FIFOs, the one that holds data while it's being received through USB is in the FX2
<whitequark>
DaKnig: regarding arrays with heterogeneous elements, an array is basically a more compact way to write a Switch()/Case() construct
<whitequark>
if you use an array on the left-hand side, what happens is it expands into a Switch(index) and every Case() contains a single eq where the right-hand side is the same
<whitequark>
if you use an array on RHS, then in every Case the LHS is the same
<whitequark>
(though you could use an Array on LHS and Array on RHS and that has a more complex expansion, but the basic principle is the same)
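[A small sketch of the expansion whitequark describes; the register file below is invented for illustration. Indexing an Array with a Signal on the left-hand side behaves like a Switch over the index with one assignment per Case.]

    from nmigen import Array, Module, Signal

    def write_with_array(m, regs, index, data):
        # Array on the LHS: one line...
        m.d.sync += regs[index].eq(data)

    def write_with_switch(m, regs, index, data):
        # ...which expands into roughly this Switch/Case form.
        with m.Switch(index):
            for i in range(len(regs)):
                with m.Case(i):
                    m.d.sync += regs[i].eq(data)

    m     = Module()
    regs  = Array(Signal(8, name=f"reg{i}") for i in range(4))
    index = Signal(range(4))
    data  = Signal(8)
    write_with_array(m, regs, index, data)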
<FL4SHK>
still working on my Python-based assembler here
<FL4SHK>
going to support variable width instructions
<FL4SHK>
...in reality it's just that a source code instruction may end up as two instructions in the final binary
<FL4SHK>
specifically, instructions that use 32-bit immediates
<whitequark>
kinda like boneless :)
<FL4SHK>
yes
<FL4SHK>
does boneless do that?
<FL4SHK>
instructions that use immediates basically have two possible sizes in my architecture
<FL4SHK>
...in reality there's just a `pre` instruction that expands the size of the immediate in the following instruction
<FL4SHK>
as such it's not truly variable width
<FL4SHK>
`pre` is a real, separate instruction
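[A toy sketch of the `pre` idea FL4SHK describes; the opcodes and field widths here are invented, not his encodings or boneless's. An immediate that fits the base field is emitted as one word, otherwise a `pre` word carries the upper bits first.]

    IMM_BITS    = 4        # immediate field of the base instruction (invented)
    PRE_OPCODE  = 0xF000   # invented opcode for the `pre` prefix
    ADDI_OPCODE = 0x1000   # invented opcode for an add-immediate

    def assemble_addi(reg, imm):
        """Return one or two 16-bit words for `addi reg, imm`."""
        words = []
        if imm >= (1 << IMM_BITS):
            words.append(PRE_OPCODE | (imm >> IMM_BITS))      # upper bits go in `pre`
            imm &= (1 << IMM_BITS) - 1
        words.append(ADDI_OPCODE | (reg << IMM_BITS) | imm)   # low bits in the instruction
        return words

    assert assemble_addi(3, 0x7) == [0x1037]             # fits: one word
    assert assemble_addi(3, 0x345) == [0xF034, 0x1035]   # needs a `pre` prefix word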
<tpw_rules>
that's precisely what boneless does
<FL4SHK>
ah
<lkcl>
DaKnig: yes, everything is shown. do try it out, you'll soon see
<FL4SHK>
seems great minds think alike
<FL4SHK>
though in my case, I didn't get the idea on a whim or anything
<whitequark>
i got it from Philip Freidlin I think
<lkcl>
whitequark: cool! i didn't realise Arrays could be used as a type of Switch statement
<FL4SHK>
I was the TA for a course my last semester of grad school
<whitequark>
lkcl: that's basically all they are
<whitequark>
*Philip Freidin
<FL4SHK>
And the architecture that the students had to work with had a `pre` instruction
<tpw_rules>
FL4SHK: one quirk in boneless is that since there's only 3 immediate bits in the arithmetic instructions, they actually index a lookup table if not preceded by `pre` (EXTI)
<tpw_rules>
with like 0, 1, 0xFF, and some others
<FL4SHK>
3 immediate bits???
<lkcl>
immediates are one of the biggest nuisances in ISAs. the Mill uses bit-level compression
<FL4SHK>
just 3?
<FL4SHK>
`pre`, or EXTI, pretty much solves the issue IMO
<tpw_rules>
some have 5 and some have 8 but the arithmetic are 3
<FL4SHK>
3 is very tiny
<FL4SHK>
the architecture the students had to work with was 16-bit with 4-bit immediates
<FL4SHK>
everyone had to provide their own encoding
<lkcl>
tpw_rules: the Mill does a really fascinating compact job: kinda like Huffman Encoding but targeted at FP as well as INT
<FL4SHK>
I personally love the task of building an encoding from an instruction set
<lkcl>
FL4SHK: ooo, 4-bit immediates :)
<FL4SHK>
...as well as building the instruction set
<FL4SHK>
it was pretty ARM-like
<FL4SHK>
the instructor called it PINKY
<FL4SHK>
only had a z flag
<FL4SHK>
still had the set less than instruction of MIPS
<tpw_rules>
lkcl: that sounds kind of bothersome? the two bit aligned ISAs i know are slow and excessively complex
<lkcl>
FL4SHK: me too. i designed an instruction set based around 2-bit "groupings", in 1990.
<lkcl>
tpw_rules: they have the advantage of static data allocation and it really shines for vectorisation. i don't know the full details, they have a working LLVM port
<lkcl>
tpw_rules: for vectors, being able to expand a single bit "0" out to a full 8/16/32/64-bit register is a hell of a saving.
<FL4SHK>
Here's what the architecture I'll be building my main computer with is like
<FL4SHK>
it's a vector machine
<FL4SHK>
and it treats cache lines as vectors
<FL4SHK>
cache lines are used as the primary thing over registers
<lkcl>
FL4SHK: ooo :)
<FL4SHK>
I built a machine like this once
<FL4SHK>
didn't have a data cache
<FL4SHK>
did have instruction cache, though
<FL4SHK>
I think it was a 32 kiB icache?
<FL4SHK>
I can't remember.
<lkcl>
there's... this is a known type of architecture. i forget the name.
<FL4SHK>
The machine didn't have a hardware-enforced mapping from cache line number to address btw
<lkcl>
FL4SHK: it's designed for 3D, right?
<FL4SHK>
well, it would certainly be *good* at that
<FL4SHK>
load instructions and store instructions would set the mapping of cache line number to address
<FL4SHK>
and as such multiple cache lines could share the same data by way of sharing address
<lkcl>
ok. the reason i ask is because in 3D (as we're finding out for LibreSOC), the workloads are basically large-amounts-of-LD, large-amounts-of-processing, large-amounts-of-ST
<FL4SHK>
this type of machine that I built is called "Line Associative Registers"
<FL4SHK>
such a machine hasn't been manufactured before TMK
<lkcl>
there's no overlap. the data that's processed is *not* shared significantly with other batches.
<lkcl>
FL4SHK: it's cool coming up with something new, isn't it? :)
<FL4SHK>
I didn't come up with Line Associative Registers
<lkcl>
awww
<FL4SHK>
but my master's project was to create a LARs processor
<FL4SHK>
I made the first one with floating point of any variety
<FL4SHK>
...but it was this bad fp format, bfloat16
<lkcl>
interesting
<FL4SHK>
it has type tagged registers
<lkcl>
urrrr i know it
<FL4SHK>
bfloat16 was only chosen for the simplicity of implementation
<FL4SHK>
and simplicity of, well, testing!
<lkcl>
i love type-tagged registers: it's what we'll be using in Libre-SOC.
<FL4SHK>
here the type tag is set by the load or store instruction
<lkcl>
oh: have you looked at the Mill?
<FL4SHK>
Not yet
<FL4SHK>
I know of it
<lkcl>
yeeees, i was going to say, that's exactly what the Mill does :)
<FL4SHK>
the Mill sounded interesting
<DaKnig>
lkcl: don't forget, mov is turing complete
<lkcl>
the LD operation basically specifies the width... that's it.
<lkcl>
there's no ADD8, no ADD16, no FPADD16 instruction: there's just... ADD.
<FL4SHK>
LARs as a concept was designed to mitigate the memory bottleneck
<DaKnig>
ld+st is turing complete so you don't need actual processing :)
<lkcl>
DaKnig: cool! :)
<FL4SHK>
transport triggered architectures come to mind
<lkcl>
FL4SHK: however you'll likely find, just like the Mill, that you'll need a "widen" and a "narrow" instruction
<FL4SHK>
lkcl: the thing I built didn't have those
<FL4SHK>
loads and stores set both the address and type tag of a cache line
<lkcl>
which (for int) does zero/sign-extending and (for float) does FP conversion
<FL4SHK>
the architecture does automatic casting
<FL4SHK>
if two registers have different types
<FL4SHK>
let's say you add rB and rC, storing the result in rA
<lkcl>
FL4SHK: yep, totally with you. i totally get it: this is the basis of Simple-V, the Vectorisation system i invented
<FL4SHK>
oh
<lkcl>
and the Mill does it as well
<FL4SHK>
okay then
<FL4SHK>
what are `widen` and `narrow`?
<lkcl>
Mill "widen" instruction: sign-extend and zero-extend. anything that was tagged as say "INT8" will be "widened" to whatever-the-widen-instruction says
<lkcl>
INT16, INT32 etc.
<FL4SHK>
LARs instruction sets don't have specific sign extend and zero extend instructions
<FL4SHK>
an add instruction will take care of it
<FL4SHK>
due to automatic casting
<lkcl>
basically it uses the "tag" that originally came from the LD, and the new "tag" of the...
<lkcl>
ah yes!
<lkcl>
ah... but does the "ADD" instruction *contain* the new tag?
<FL4SHK>
no, the tag is only set by loads and stores
<FL4SHK>
loads and stores also don't necessarily access memory due to associativity
<lkcl>
ok... so to do a convert from INT8 to INT32 you would need to do a "fake load of zero" into an INT32
<lkcl>
*then* ADD the INT8 number to that zero-loaded register
<FL4SHK>
loads being associative means you can just walk a register without needing to actually touch memory
<lkcl>
which in terms of the number of opcodes and cycles is sub-optimal
* lkcl
tck, tck.... thinking....
<FL4SHK>
you might have a delay of like, one clock cycle
<FL4SHK>
zero extension and sign extension are less necessary on this machine because you can just do 8-bit, 16-bit, 32-bit, etc. arithmetic natively
<lkcl>
i'm trying to think this through... what does it mean "loads are associative"?
<FL4SHK>
well
<FL4SHK>
if you load from an address already in the LAR file
<FL4SHK>
you don't need to read from memory
<DaKnig>
how long are those registers, again?
<DaKnig>
lines, whatever you called em
<lkcl>
ok, yes. got it.
<FL4SHK>
you just set the destination LAR to the contents of your other LAR that already has this data
<FL4SHK>
a load instruction, counterintuitively, might cause you to actually write back to memory
<lkcl>
so one of those would best be "reserved" as a "always containing zeros" line, by convention at the assembly / ABI level
<FL4SHK>
that's probably something you want
<FL4SHK>
a zero register
<FL4SHK>
DaKnig: like register widths, cache line widths, etc. in a normal architecture, that's set by whoever makes the instruction set
<lkcl>
yehyeh i was thinking that.
<FL4SHK>
I don't remember what my most recent LARs architecture did for that
<FL4SHK>
I think I picked 256 bytes per LAR?
<FL4SHK>
64 LARs, 256 bytes each
<FL4SHK>
oh, other thing, lkcl
<FL4SHK>
Fully LARs-based machines have only LARs, no regular cache or registers
<FL4SHK>
this includes the instruction side
<FL4SHK>
you have instruction LARs and data LARs
<FL4SHK>
instruction LARs get loaded via `fetch` instructions
* lkcl
raises eyebrows at instruction LARs :)
<FL4SHK>
...and normally you'd want your source code to not have `fetch` instructions
<FL4SHK>
software is supposed to provide a guarantee that the pipeline fetching always fetches from an ILAR
<FL4SHK>
...I'd personally say to throw an exception if there's a miss
<FL4SHK>
I was planning on doing that.
<FL4SHK>
here's how software provides the guarantee
<FL4SHK>
get the Binutils level software inserting `fetch`es
<FL4SHK>
for my next LARs processor, which is *not* the one I want to use for my main computer
<FL4SHK>
I want to not have virtual memory in that machine
<FL4SHK>
DaKnig: don't forget about my FIFOs that I made
<FL4SHK>
those also show how to use arrays
<FL4SHK>
I'm also using one of my FIFOs
<FL4SHK>
I was originally not using first word fall through
<FL4SHK>
but now I am
<lkcl>
FL4SHK: fwft is reeaaally tricky. whitequark went to a lot of trouble to write formal correctness proofs for the FIFO classes in nmigen
<FL4SHK>
lkcl: it is?
<FL4SHK>
Maybe it's not first word fall through that I did, then?
<FL4SHK>
I tested it
<FL4SHK>
I might have implemented something other than fwft
<lkcl>
FL4SHK: well... it is for most people. you appear to have a well-above-average capability in hardware design :)
<FL4SHK>
like... I find CPUs much more difficult than I did that thing
<FL4SHK>
or at least the types of CPUs I'm making
<FL4SHK>
simple ones are easy
<lkcl>
stuff that's known to be *really* hard computer science you're like, "pffh" :)
<lkcl>
yehyeh
<FL4SHK>
I can make a multi-cycle CPU in my sleep
<FL4SHK>
...er, by that I mean, a big freaking state machine CPU
<lkcl>
FL4SHK: try a multi-issue Out-of-Order engine some time
<FL4SHK>
*that's* something I haven't done before
<lkcl>
yehh i went with a FSM for the early version of Libre-SOC, just to get "instructions into pipelines" without having to worry about register dependencies
<FL4SHK>
I want to make a multi-core, out of order, multi-issue LARs machine
<FL4SHK>
nobody has done this before
<FL4SHK>
...oh, and virtual memory
<lkcl>
it took me *six months* of study with Mitch Alsup's help to understand the CDC 6600.
<FL4SHK>
what kind of things are you referring to as really hard computer science?
<FL4SHK>
all I did was shift reads to be asynchronous
<FL4SHK>
just like I've done with block RAM before
<awygle>
FWIW I found the docs page of Array more confusing than helpful. Once I realized it's a list you can index with a signal it clicked.
<Yehowshua>
Imagine if one day you could just do something like m.submodules.pcie = PCIe()
<lkcl>
FL4SHK: ah yes that *might* be different.
<lkcl>
Yehowshua: yeah, litex and fusesoc are intended to be that kind of level. and it's what we'll need
<FL4SHK>
I referred to it as a first word fall through FIFO but
<FL4SHK>
it still does some stuff synchronously
<FL4SHK>
it does what I needed it to
<FL4SHK>
Maybe first word fall through isn't what I needed at all
<lkcl>
FL4SHK: does it mean that: if the FIFO is empty, and it is written to, that the data coming in is available for reading *on the same cycle*
<lkcl>
that's "fwft" as best i can tell.
<lkcl>
without fwft, you will always have a 1 clock cycle delay, guaranteed, between incoming and outgoing data.
<lkcl>
even if the FIFO is currently empty
Yehowshua has quit [Remote host closed the connection]
<lkcl>
FL4SHK: if you're looking to do Out-of-Order, i recommend looking up "Design of a Computer" by James Thornton. it's available online (free) thanks to Thornton giving permission around 2010
<lkcl>
he was very old. his wife wrote a hand-written letter to the person who asked if he could put a copy of the book online
<lkcl>
and if you're interested in precise exceptions, branch speculation etc. i have some augmentation chapters written by Mitch Alsup that help explain how to do that, on top of the original 6600.
<FL4SHK>
don't need branch prediction for the type of thing I'm doing
<lkcl>
he also showed me how to do O-o-O memory management, which would be relevant for the LARs concept
<d1b2>
<TiltMeSenpai> is this "The Control Data 6600"?
<FL4SHK>
the fact that software guarantees no instruction ILAR misses means you can make some other assumptions
<lkcl>
that took me 3-4 weeks to understand, on its own.
<lkcl>
d1b2, TiltMeSenpai: yes :)
<FL4SHK>
I didn't have a delay of 1 clock cycle for reading
<FL4SHK>
but I did for writing
<FL4SHK>
so this is probably something else
<lkcl>
what about simultaneous read-and-write, on the same clock cycle? what happens then?
<FL4SHK>
the only thing that's not synchronous is reading from the array inside the FIFO
<lkcl>
yes it sounds like it isn't fwft. fwft is definitely the following conditions (all on same clock cycle):
<lkcl>
* FIFO is empty
<lkcl>
* write occurs
<lkcl>
* read occurs
<lkcl>
* write "falls through" to the read port
<FL4SHK>
that really doesn't sound that bad
<FL4SHK>
it sounds a lot like something I've done for register files before
<lkcl>
it's the sort of thing that's enough of a nuisance that people don't want to have to reinvent it (and get it wrong)
<FL4SHK>
Where you'd need to read what was currently being written
<lkcl>
yes, funnily enough, it's exactly the same.
<Lofty>
Isn't that transparent read?
<FL4SHK>
that's really not that hard to me
<lkcl>
FL4SHK: yes - but it takes time, and people get it wrong, and then things break.
<FL4SHK>
doesn't sound hard at all to me
<lkcl>
Lofty: on regfiles? i believe so. it's kinda like having an operand forwarding bus built-in to the regfile
<FL4SHK>
it's just a matter of dealing with "next state" stuff
<Lofty>
It's not specific to register files
<Lofty>
It's a property of the memory
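[A short sketch of the transparent read property Lofty mentions, using nMigen's Memory; the sizes are invented. With transparent=True, a read of the address being written in the same cycle returns the new data rather than the old contents.]

    from nmigen import Elaboratable, Module, Memory

    class TransparentRegfile(Elaboratable):
        def __init__(self, width=32, depth=16):
            self.mem    = Memory(width=width, depth=depth)
            self.rdport = self.mem.read_port(transparent=True)
            self.wrport = self.mem.write_port()

        def elaborate(self, platform):
            m = Module()
            m.submodules.rdport = self.rdport
            m.submodules.wrport = self.wrport
            # drive wrport.addr/data/en and rdport.addr from the surrounding logic
            return m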
<FL4SHK>
I haven't needed a first word fall through FIFO before, though
<lkcl>
FL4SHK: it took our team 2-3 weeks to write the regfiles with transparent reads, and unit tests, and formal correctness proofs.
<FL4SHK>
what
<FL4SHK>
why?
<FL4SHK>
it took me that long for, say, my LAR file in my original LARs machine
<FL4SHK>
*that* was a hard project
<lkcl>
because this stuff is hard - for us - and we're not confident it would "work", so had to make sure by spending the time writing unit tests that gave us the confidence in the code
<lkcl>
i don't think you fully appreciate: you really do have a well-above-average level of competence in hardware design :)
<lkcl>
that was a compliment btw
<FL4SHK>
oh, well, thanks
<whitequark>
Lofty: that's a transparent read if you look at a memory alone
<whitequark>
but it's called first-word fallthrough on a FIFO
<whitequark>
same concept though
<Lofty>
Ah, I see, thank you
<awygle>
isn't it still FWFT even if it has latency >0, as long as you don't have to tick the output port to get the new data to show up?
peepsalot has joined #nmigen
<FL4SHK>
lkcl: to me, a register file is something you don't even really need to verify
<FL4SHK>
other than by looking at the source code
<lkcl>
FL4SHK: certain industries absolutely cannot take the author's word for it - or the source code.
<FL4SHK>
lkcl: I don't think it takes much to formally verify a register file
<FL4SHK>
like, 2 to 3 weeks is a *lot*
<FL4SHK>
a LAR File, on the other hand
<FL4SHK>
that's difficult to formally verify
<FL4SHK>
or at least it was for me
<FL4SHK>
...I'd probably have an easier time with it today
<FL4SHK>
LAR files are not simple like caches
<lkcl>
indeed. as will be the OoO Dependency Matrices.
<lkcl>
when you have an out-of-order design, formal correctness proofs - that data has been correctly read/written in the right order to the register file - become far more challenging
<FL4SHK>
when you say formal verification, do you mean with yosys?
<FL4SHK>
because that's the definition I was using
<lkcl>
symbiyosys - yes.
<lkcl>
which uses SAT solvers like yices2, etc., yes
<lkcl>
i have been thinking about how to verify the OoO Dependency Matrices for some time.
<lkcl>
how to guarantee that the instruction issue order is the same as the completion order *where it actually matters*
<FL4SHK>
I'd write everything in nMigen at this point
<lkcl>
because - haha - in some cases it doesn't matter. yes, that's what we're doing. everything's in nmigen.
<FL4SHK>
I'll need to study up on out of order machines
<lkcl>
"add r1, r2, r3" and "add r4, r5, r6" do *not* matter what the completion order is because there's no dependency hazards
<FL4SHK>
I understand that much
<lkcl>
FL4SHK: the "normal" algorithm - the one that everyone quotes - is the Tomasulo Algorithm.
<FL4SHK>
I've heard of it
<lkcl>
there's a really good video on youtube by an indian guy, who explains it really well
<FL4SHK>
but I don't know its details
<FL4SHK>
since I'm unaware of its details, I might come up with my own thing
<FL4SHK>
it'd be fun to say "hey, look, this is my own algorithm"
jeanthom has quit [Ping timeout: 240 seconds]
<lkcl>
once you understand that, i can point you at a page which allows understanding of the precise capability of the (augmented) 6600.
<lkcl>
:)
<FL4SHK>
I don't think I want to see the Tomasulo Algorithm
<lkcl>
there's some things you definitely need to think through.
<lkcl>
do you want interrupts to be serviceable immediately?
<FL4SHK>
if it's a machine with out of order execution, probably not
<lkcl>
do you want multiple LOAD/STOREs to be done in parallel without data corruption?
<lkcl>
FL4SHK: actually, the precise-augmented 6600 *can* handle interrupt-servicing immediately, because there's a way to cancel outstanding in-flight instructions
<FL4SHK>
What does "outstanding" mean in this case?
<lkcl>
"work that's in pipelines or waiting to be put *into* pipelines that hasn't hit the register file yet"
<lkcl>
aka "in-flight"
<FL4SHK>
I'll need to think about what I should do for micro ops
<lkcl>
some OoO designs use a "rollback history" system. others "hold off" from writing anything that could cause "damage"
<FL4SHK>
I don't want to study existing ideas for micro ops
<FL4SHK>
nooo don't tell me
<lkcl>
micro-ops according to Mitch Alsup is... haha :)
<lkcl>
you really do want to discover this stuff for yourself, don't you? :)
<FL4SHK>
yes
* lkcl
zzip. with extra gaffa tape.
<lkcl>
mmMmmh, mmhmhhh!
<lkcl>
if you get stuck just ask.
<FL4SHK>
things that I don't outright need to know like what the virtual memory system needs to do for OSes to work
<FL4SHK>
I don't know if I want to know much in advance.
Yehowshua has joined #nmigen
<Yehowshua>
I guess its meeting time?
<lkcl>
FL4SHK: what will be fascinating is if you document all of this and put it online as libre software
<lkcl>
oh?
<lkcl>
oh!
<whitequark>
yep, meeting time
<FL4SHK>
software, eh
<FL4SHK>
I thought this was hardware!!!
<FL4SHK>
the reality is that hardware and software might as well be the same thing...
<whitequark>
i'll begin with the status update. not much to report; i've been looking into cleaning up cxxsim and getting it into master proper
<whitequark>
that will probably take a little more time
<whitequark>
the main issue i'm having is organizing the simulator guts; there are a bunch of things that are shared (the public interface essentially) and a bunch of things that are similar but not really shared exactly
<whitequark>
right now there are two Simulator classes that inherit from the same "core" class
<whitequark>
i think that's not a particularly great design; it's confusing which one you're referring to, it's easy to have their interfaces accidentally diverge, and it requires you to substitute or rename imports
<Yehowshua>
So I remember sometime ago that you can express new logic **in** simulation
<Yehowshua>
How does CXX handle that?
<whitequark>
oh?
<whitequark>
can you elaborate?
<Yehowshua>
Yes - you could do a.eq(b) in a process
<Yehowshua>
Lemme find an example
<whitequark>
oh right
<whitequark>
when you do that in a process, it doesn't add logic; it executes once and instantaneously
<whitequark>
more like a regular assignment
<lkcl>
whitequark: yes. we've already started doing this:
<lkcl>
cxxsim = True
<lkcl>
if cxxsim:
<lkcl>
    from nmigen.sim.cxxsim import Simulator, Settle
<lkcl>
else:
<lkcl>
    from nmigen.back.pysim import Simulator, Settle
<whitequark>
lkcl: yeah so that sucks imo
* cr1901_modern
is here in read only mode mostly
<whitequark>
i mostly did it because i wanted to get something out for you folks to test
<lkcl>
whitequark: really appreciated
<lkcl>
the usual "solution" is a Factory class system
<whitequark>
what i think would be a better design is having a single Simulator, all the commands, etc, and the Simulator would take a SimulationEngine that would actually implement it
<whitequark>
the Engine would be mostly or completely opaque to user code
<whitequark>
so it'd be something like `from nmigen.sim import PythonEngine, CxxEngine`
<lkcl>
that'd work. it's one step away from a full "class Factory" (where engine is a string and the Factory class looks that up in a dictionary)
<whitequark>
there is a good reason I don't want to make the engine argument a string
<whitequark>
the reason is that importing CxxEngine, right now, pulls in Python's ctypes
<whitequark>
but... that's unlikely to work all that well on PyPy, and I know you folks use PyPy among other things
<whitequark>
PyPy needs cffi, but that's an external dependency
<lkcl>
yuk
<lkcl>
oh
<whitequark>
well, it doesn't strictly speaking *need* cffi
<whitequark>
it just works much faster with cffi
<whitequark>
and this is hella important, because the ctypes overhead is massive
<lkcl>
i meant to say: i had a suggestion instead of using ctypes?
<awygle>
i'm here. sigh.
* lkcl
waves to awygle
<whitequark>
it is in fact so large that cxxsim is only ~2x faster than pysim!
<whitequark>
where in reality it should be something like ~100x faster
<lkcl>
the idea is: at the same time as auto-generating the cxxsim.cc, actually auto-generate a c-based python module that matches it.
<Yehowshua>
So it's not very hard to write driver code around CXXRtl directly in C++.
<whitequark>
lkcl: yeah and that would work even worse on pypy
<Yehowshua>
I'm also not opposed to writing such drivers
<lkcl>
rather than try to use swig (or other c-to-python wrapper), just auto-generate the c module.
<lkcl>
whitequark: sigh :)
<Yehowshua>
Like for a large design, the user could double down and write the driver code themselves
<whitequark>
Yehowshua: i'm quite certain we can speed things up
<whitequark>
i'm not presenting this to you as some sort of insurmountable problem
<whitequark>
i'm merely describing how much overhead ctypes has
<Yehowshua>
Ah
<whitequark>
i haven't even tried solving this so far in the code that i wrote
<lkcl>
Yehowshua: true. we just discussed that yesterday, how simulated verilator peripherals are written in c
<whitequark>
simulated cxxrtl peripherals (aka cxxrtl black boxes) are, naturally, written in c++
<lkcl>
i wasn't suggesting *replacing* the use of ctypes. but... if a speed up of 50x is achievable for /usr/bin/python3 by not using ctypes, that's quite compelling
<whitequark>
a speed up of 50x would, i think, only be possible by ditching python completely
<whitequark>
i.e. have the simulation call back into python only when it actually needs something from python
<Yehowshua>
hmm... what about pypy... Does it have CTypes support?
<Yehowshua>
Jitted pypy gets quite fast
<whitequark>
basically, nmigen would tell cxxrtl "here is your clocks, and here is the trigger you need to call me back on, and now do this all on your own"
<whitequark>
Yehowshua: yes but ctypes has some design problems that prevent pypy from being efficient with it
<whitequark>
which is one reason for cffi's existence
<cr1901_modern>
I'm confused... Is the "only 2x faster" thing a new regression? I could've sworn cxxsim was an order of magnitude faster previously...
<whitequark>
cr1901_modern: it's only 2x (on minerva, the speedup will be higher on larger designs) faster when used through nmigen
<whitequark>
let me rephrase
<whitequark>
cxxrtl is 100x faster than pysim, but cxxsim (nmigen's cxxrtl integration) is 2x faster
<cr1901_modern>
ahhh okay
<lkcl>
with LibreSOC, we're seeing about... 10-20 instructions per second executed in pysim
<lkcl>
that's without the IEEE754 FPU added
<whitequark>
what about cxxsim?
<lkcl>
we're running into that ready/valid bug, on every aspect of the design
<whitequark>
right but i don't think it should affect how fast the design runs
<lkcl>
ready/valid signalling is a core aspect of the data "management" because it's an OoO design rather than a simple, straightforward pipeline
<whitequark>
(the problem is virtually certainly on python side)
<lkcl>
none of the unit tests pass, so i can't get it running to the point where i can tell how fast it is
<whitequark>
mm, okay
<lkcl>
because they _all_ use ready/valid style communication, unfortunately.
<whitequark>
btw Yehowshua could you minimize the testcase further? that would really speed up the process of fixing the bug
<whitequark>
it's somewhat subtle
<Yehowshua>
Yes. I was having Michael work on that
<whitequark>
great
<lkcl>
Yehowshua: cutting out the actual "shift" should do the trick and just have a countdown.
<whitequark>
anyway, let's go on to the next item
<Yehowshua>
I'm curious about nmigen-soc
<whitequark>
yeah?
<Yehowshua>
What's the vision? How does it compare/complement LiteX?
<lkcl>
and with OpenPITON?
<whitequark>
nmigen-soc is a SoC *gateware toolkit*
<whitequark>
so it gives you all the buses, and it gives you a BSP generator
<whitequark>
but it doesn't, for example, know how to build firmware
<Yehowshua>
OK
<whitequark>
and it doesn't have a BIOS, it doesn't have any preference on what your language is
<whitequark>
could be C, C++, Rust, whatever
<whitequark>
LiteX could (in principle) be built on top of nmigen-soc
<Yehowshua>
I see now
<whitequark>
if/when LiteX migrates to nmigen
<lkcl>
whitequark: it sounds very much like what we need to complete Libre-SOC.
<lkcl>
if you're familiar with OpenPITON, they can specify the full spec of (an) SoC as a JSON file.
<lkcl>
templating *shudder* then creates the ennntiirre SoC including AXI4 bus infrastructure
<Yehowshua>
Looking at issue 10, I see that the peripherals would be asynchronous - as in able to cross multiple clock domains
<lkcl>
it sounds to me like nmigen-soc would be the "bedrock" of a nmigen equivalent of that
<whitequark>
yeah
<lkcl>
how do CSRs work? they're just registers (in effect) but named so that, if needed, they can be "addressed" by wishbone/AXI4?
<whitequark>
yeah more or less
<lkcl>
ok
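[A minimal sketch of lkcl's question, assuming the nmigen-soc csr API (csr.Element, csr.Multiplexer) roughly as it existed at the time; the register and widths are invented. The Element is just a register; the Multiplexer gives it an address so a Wishbone or AXI bridge can reach it.]

    from nmigen import Elaboratable, Module
    from nmigen_soc import csr

    class ExamplePeripheral(Elaboratable):
        def __init__(self):
            self.ctrl = csr.Element(8, "rw")                         # an 8-bit R/W register
            self.mux  = csr.Multiplexer(addr_width=2, data_width=8)  # assigns it an address
            self.mux.add(self.ctrl)
            self.bus  = self.mux.bus    # CSR bus; bridge this to Wishbone/AXI as needed

        def elaborate(self, platform):
            m = Module()
            m.submodules.mux = self.mux
            # self.ctrl.w_data/w_stb and r_data/r_stb connect the register to user logic
            return m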
<awygle>
correct me if i'm wrong, but it seems like nmigen-soc is a placeholder and a sketch right now, and there's a fair bit of design work to be done before it's "finalized". is that accurate?
<jfng>
yes
<awygle>
hi jfng, was wondering if you were here :)
<whitequark>
awygle: i think the parts that are already there are reasonably functional, and shouldn't radically change
<whitequark>
but there are a lot of things missing
<lkcl>
oh.
<jfng>
you can already use parts of it to build SoCs (e.g. the busses part)
<whitequark>
so i wouldn't say it's a placeholder (nmigen-stdio is, though), but it's definitely unfinalized
<Yehowshua>
So jnfg, I've talked to you a bit before. awygle, haven't really said hi to you before - so hello.
<jfng>
what is currently lacking is the integration tools
<awygle>
howdy :D
<FL4SHK>
calculus seems a little out of the ordinary for hardware
<lkcl>
right. one thing that we need for Libre-SOC is a way to check if a particular address is valid
<FL4SHK>
except not really
<lkcl>
*before* actually issuing the request.
<FL4SHK>
sorry, just a joke
* FL4SHK
leaves again
<lkcl>
FL4SHK :)
<whitequark>
the main reason i haven't spent a lot of time on stdio/soc yet is because we don't have streams or interfaces yet
<cr1901_modern>
I remember a long time ago (~1 year ago) we discussed "how do we build firmware for nmigen SoCs"? Is the idea now that nmigen-soc will _not_ handle building at all?
<whitequark>
which is why those are on the agenda today, among other things
<jfng>
a subset of which (peripherals) is the current focus of development
<lkcl>
this because we're doing an out-of-order design and there will be multiple (parallel) memory requests outstanding
<FL4SHK>
what do you mean by interfaces whitequark?
<whitequark>
FL4SHK: we'll get there in a bit :)
<FL4SHK>
might be SV interfaces
<jfng>
(hi awygle :) )
<FL4SHK>
but I find that classes do that job mostly
<whitequark>
what i have in mind is not intentionally related to SV interfaces
<whitequark>
yeah sorry :/ we should get one of those IRC bots that convert links to titles
<Yehowshua42>
lkcl will be replaced
<whitequark>
anyway, I wanted to make nMigen records something that you could use like you use any ordinary value
<lkcl>
Yehowshua42: lol
Yehowshua has quit [Ping timeout: 245 seconds]
<whitequark>
initially, Record was a direct subclass of Value, and treated specially everywhere
<whitequark>
this was somewhat controversial because people tried to inherit from it, and I didn't really want to support that
<lkcl>
whitequark: yeah. if it was vhdl it would not be possible. however: python, go figure. it just feels... "natural" to inherit (and extend).
<whitequark>
eventually what I arrived at is making a "UserValue", which is a special Value you *can* inherit from, under the condition that it always lowers to some other nMigen value
<whitequark>
which is what Record does (it lowers to a concatenation of its fields)
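[A tiny illustration of the lowering whitequark describes; the layout is invented. A Record with a 1-bit and an 8-bit field behaves as a single 9-bit value, the concatenation of its fields.]

    from nmigen import Record

    rec = Record([("ready", 1), ("data", 8)])
    assert len(rec) == 9   # lowered width: ready in bit 0, data in bits 1..8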
<lkcl>
i remember we had quite a discussion about the implications of modifying inherited instances after they'd been used (once)
<whitequark>
this was insightful because there are clearly quite a few more use cases for this kind of thing
<Yehowshua42>
So if I understand correctly, UserValue is its own thing
<awygle>
yes, the ability to "plug in" custom data types to the nmigen infrastructure is very useful
<whitequark>
unfortunately UserValue has a fatal flaw
<whitequark>
which is to say, it inherits from Value, and Value has a ton of methods with various names, and is getting regularly expanded
<whitequark>
what this means in practice is that, unless we fix the flaw, we can never add methods to Value
<whitequark>
because it'll break user code that uses the same names e.g. in record fields
<whitequark>
I can definitely foresee someone using a field called "shift_left"
<lkcl>
... which pollutes the namespace into which you'd consider adding "things"
<whitequark>
yes
<Yehowshua42>
Is it possible to have something that doesn't inherit from value but always lowers to value? Maybe the user can define how it lowers?
<whitequark>
that is precisely what #355 is about
<lkcl>
i encountered this problem when creating RecordObject. i "fixed" it by overriding __getattr__
<Yehowshua42>
one thing I'd add to the list of things to discuss if we have time is bringing industry attention to nMigen
<lkcl>
by over-riding __setattr__ it becomes possible to check if the thing being added is already in use ("shift_left")
<awygle>
FL4SHK: the problem with that is we might break all your code because of a totally unrelated change in a new version of nmigen
<whitequark>
yep
<lkcl>
and to raise an exception
<whitequark>
lkcl: but it would be better if this problem didn't exist in first place
<lkcl>
whitequark: true :)
<whitequark>
anyway, Yehowshua42 has the right solution here: a separate class that doesn't inherit from Value, but which is *castable* to Value
<whitequark>
we didn't already have that implemented because... UserValue predates Value.cast IIRC
<lkcl>
oo intriguing
<lkcl>
i like it
<whitequark>
there are a few things that we should decide
Yehowshua42 has quit [Quit: Ping timeout (120 seconds)]
<whitequark>
(1) what should it be called? UserValue is a bad name because it'll also be used internally in nMigen
<whitequark>
already used in fact
<awygle>
trait To<Value>
<awygle>
i'm obviously joking about the specifics but i think that's the general tone we should shoot for. i don't _love_ CustomValue although i'd be OK with it
<whitequark>
yeah, same
<whitequark>
I'm open to better names
<whitequark>
ValueCastable?
<awygle>
ToValue, Castable, AsValue, Lowerable
<whitequark>
to go with Elaboratable?
* lkcl
chooses engineering-style names
Yehowshua has joined #nmigen
<jfng>
UserValue.as_value() seems redundant
<Lofty>
I think ValueCastable works quite well, actually
<Yehowshua>
I agree with Lofty
<lkcl>
oh wait... so the idea is similar to Elaboratable in concept?
<awygle>
it's a bit wordy but it 's probably the clearest
<Lofty>
I'm willing to take wordiness for clarity
<Yehowshua>
Same
<whitequark>
lkcl: yeah, kinda. Elaboratable is an interface that lowers to a Fragment (usually a Module when you write it); ValueCastable lowers to Value
<awygle>
i'd say yes. it's a marker class in the same way that Elaboratable is
<lkcl>
ValueCastable... yyeah.
<whitequark>
excellent, let's go with that name then
<whitequark>
the cast method should be .as_value I'm pretty sure, we already have the convention going
<Yehowshua>
Defining how something lowers - is this a new idea? Does some other HDL have that?
<whitequark>
.as_signed()
<awygle>
yeah strong agree with as_value
<Lofty>
Yep, as_value works nicely.
<cr1901_modern>
+1 to as_value
<lkcl>
is that to be implemented *by* each inheritor of ValueCastable?
<whitequark>
Yehowshua: not sure, but it seems straightforward
<whitequark>
lkcl: correct
<d1b2>
<TiltMeSenpai> .as_value() feels very rust-y, I like it
<Lofty>
I think there are a fair few people with Rust experience here
<whitequark>
TiltMeSenpai: nmigen is generally fairly rusty; not overwhelmingly so, but I do borrow ideas
<d1b2>
<TiltMeSenpai> yeah
<Yehowshua>
Yeah - it is straightforward. So much so that I'm wondering why someone else didn't think of this?
<awygle>
then it should be to_value :p
<Yehowshua>
Rust is a bootiful language
<awygle>
or into_value
<awygle>
rather
<lkcl>
mmmm then i can see that getting "old" quite quickly - enough so that people start putting it into classes that they then inherit from
<whitequark>
into_* is about ownership
<awygle>
i know, i'm joking
<whitequark>
right ok
<awygle>
(mostly joking, partially lamenting python's weak type system)
<awygle>
either way nothing productive
<lkcl>
one of the really nice things about Record (and RecordObject): there's one function (the constructor) and that's it
<whitequark>
lkcl: i don't expect there will be many direct subclasses of ValueCastable
<Yehowshua>
python is a snek. sneak's have no bones...
<whitequark>
but i don't see anything wrong with subclassing it further
<Yehowshua>
**snakes - commenting on weak types
<Lofty>
I mean, nMigen tries to compensate for the type system as much as possible
<whitequark>
it's completely your code, do whatever you want, nmigen will take it
<lkcl>
whitequark: oh this is "internal to nmigen" classes we're talking, rather than user-created ones?
<whitequark>
nope, ValueCastable is something downstream code would use
<whitequark>
your RecordObject would probably inherit from it
<whitequark>
eventually
<whitequark>
nmigen Record would inherit from it
<awygle>
somebody somewhere needs to implement the lowering, and we should expose that
<Lofty>
This really does sound like a trait at this point :P
<lkcl>
ok. right. and then RecordObject would be used (without having every instance that inherits from RecordObject have its own to_value)
<whitequark>
Lofty: it *is* a trait
<whitequark>
lkcl: yes
<awygle>
if you want Record lowering semantics you should be able to inherit from Record
<lkcl>
okaaay
<whitequark>
(Record is going away though)
<whitequark>
(but yeah)
<awygle>
well yes, but it's the only example we all have experience with currently
<whitequark>
yeah
<Yehowshua>
as in will be deprecated?
<whitequark>
yes
<awygle>
nice segue :p
<Lofty>
And eventually removed
<whitequark>
everything public goes through a deprecation cycle of at least one release
<Yehowshua>
Well - you have to do what you have to
<whitequark>
more if it turns out to be a major issue for downstream code
<Yehowshua>
Just own it like Apple killing CD drivers
<Yehowshua>
**drives
<whitequark>
we don't *just* break downstream code if we can do it at all
<lkcl>
Yehowshua: we may have to take a copy of Record and maintain it externally (in nmutil) at the crossover/deprecation point
<whitequark>
we have one more aspect on ValueCastable
<awygle>
oh sorry
<Yehowshua>
lkcl - and or eventually re-write to use valuecastable
<whitequark>
it is related to an edge case of (incorrect) user code returning different things when .as_value() is called multiple times
<Yehowshua>
In fact - I think re-writing codebases every once in a while is a good exercise
<Yehowshua>
Albeit painful
<lkcl>
Yehowshua: it depends on timing of the Oct 2020 tape-out
<lkcl>
whitequark: oh?
<Yehowshua>
Yeah - not for a while
<whitequark>
if you return different results from .as_value() when it is called (by nmigen, during casting) multiple times
<whitequark>
you can end up with wildly internally inconsistent ASTs
<whitequark>
and things will break in a confusing way well after the fact
<Lofty>
I want to just say "this is undefined behaviour", but obviously that's not a solution; how could one catch a situation like that?
<awygle>
is this something we can check for? we have the lazy lowering stuff in the current UserValue
<lkcl>
this is similar to the original discussion we had for RecordObject: what happens if someone adds things to the RecordObject *after* constructor time?
<awygle>
we could check for equality there
<whitequark>
awygle: we can't check for equality
<lkcl>
except the problem's now moved to to_value()
<whitequark>
Values override ==
<awygle>
it being python you can still override __eq__ but then it should be obvious you're making things worse
<Lofty>
This seems like a "murphy versus machiavelli" problem, almost.
<whitequark>
no no, .as_value() returns a Value, and Value does override __eq__
<cr1901_modern>
whitequark: Not saying returning different results is a good idea, but... why can't that be caught internally in nmigen via an isinstance() dance?
<whitequark>
uh, how would isinstance() help?
<Lofty>
They're all instances of Value, right?
<awygle>
mm i guess Value.cast is static so you can't really store previous results there, you'd have to do it internal to the ValueCastable, which means the user can still do Bad Things
* lkcl
wonders two things. (A) is it Officially Nmigen's Problem at all (B) if it is, can hashing of the AST be done, keeping a dictionary of first-usage and comparing it against subsequent uses?
<awygle>
or that
<whitequark>
(a) yes, it's a footgun, (b) that's not trivial, hence the discussion
<whitequark>
let me explain the options we have
<whitequark>
and why they're all bad
<cr1901_modern>
Oh, the "child" type is erased when is_value() is called
<whitequark>
awygle: i'm not being defensive against the user doing Bad Things on purpose; that is not possible in Python
<whitequark>
(or even in Rust really, though it is harder there)
<Lofty>
<Lofty> This seems like a "murphy versus machiavelli" problem, almost.
<lkcl>
Lofty: lol
<whitequark>
(you can already shell out to gdb and change private fields in your program without UB)
<whitequark>
can always*
<awygle>
i copy, i meant "without realizing, while thinking they're doing the right thing"
<whitequark>
what i'm defending against is *accidental misuse*
<whitequark>
yes
<whitequark>
so there are two general options we have for this case
<whitequark>
detect or prevent
<Lofty>
I'm wondering if there's some feasible way of evaluating as_value exactly once.
<whitequark>
and there are two places we can do it in
<whitequark>
as_value() itself, and Value.cast()
<whitequark>
detection would mean memorizing the value when it's first returned, and comparing them the next time the function is called
<whitequark>
this seems pointless: it's strictly more work than prevention
<whitequark>
okay, it's not actually pointless, it eagerly shows that there is a bug in the user code
<Yehowshua>
I'm scratching my head on how to implement prevention
<whitequark>
prevention in this case would mean memorizing the value when it's first returned, and then never calling the user function again
<lkcl>
agreed: if you're going to go to the trouble of detection, and you *know* it's going to cause problems in 100% of cases, logically that suggests prevention
<Lofty>
So prevention would be "evaluate exactly once"
<lkcl>
mmm except...
* lkcl
thinks
<whitequark>
yes. we currently do that. but it's not a very good implementation
<lkcl>
someone calling it twice would go "why doesn't it do what i want the second time??"
<awygle>
i actually think detection is better than prevention
<awygle>
yes, for that reason
<awygle>
detection will loudly tell you you've made a mistake. prevention will silently do _something_ which may or may not be right
<lkcl>
unless.. haha, you actually monkey-patch the module and **REMOVE** the to_value function after it's first called :)
<Lofty>
lkcl: machiavelli
<Yehowshua>
Why not combine detection and prevention
<Yehowshua>
That is prevent
<Yehowshua>
But then inform the user you prevented
<lkcl>
or, you monkey-patch it to replace it with "don't call this again, here's why"
<Lofty>
If a detection trips, it should be fatal
<lkcl>
*or*...
<whitequark>
it should be a hard error if we detect at all
<Lofty>
How do you know which one is the intended result?
<whitequark>
no monkey patches
<lkcl>
you use an over-ride on __getattr__ which checks if the thing being accessed is named "to_value"
<whitequark>
no overrides
<whitequark>
no metaclasses
<awygle>
what we need is linear types basically, but we can't have those. currently i like the memoization option the best.
<whitequark>
no weird junk people will get confused by
<cr1901_modern>
I don't see what's wrong with memoization
<whitequark>
yes, let me explain
<whitequark>
so the way memoization would be implemented is by adding a private field on the user class (it will end up being named _Value__casted or something) in Value.cast
<whitequark>
that's fine
<whitequark>
we can then either detect or prevent or whatever
<whitequark>
the problem is that .as_value() is a public function
* lkcl
thinks...
jeanthom has joined #nmigen
<lkcl>
oh. i wonder if, just like in Elaboratable detects "def elaborate", if it's possible to require that a base-class function be called
<whitequark>
and, suppose one has a PackedStruct and one wants to rotate it or shift it for whatever reason
<lkcl>
that base-class function will set the flag "to_value_has_been_called"
<whitequark>
so you'd write packed_struct.rotate_left(10) but that doesn't work cuz it's not a Value
<FL4SHK>
I have a concern
<FL4SHK>
will I have to change my existing code that uses `Record`?
<whitequark>
FL4SHK: eventually, yes, because Record will be deprecated and removed
<whitequark>
for now, no, there will be a compat shim
<FL4SHK>
How often do you plan on doing breaking changes?
<Yehowshua>
And FL4SHK, you can pull in Record directly
<whitequark>
every 0.x release I remove features deprecated in 0.(x-1)
<FL4SHK>
oh my
<FL4SHK>
will it ever become very stable?
<whitequark>
the release cadence is... I think jfng suggested 3 months ideally? right now it's more like 6 months
<Lofty>
It *is* 0.x software
<FL4SHK>
I see
<Yehowshua>
Well, its only been around a little over a year
<FL4SHK>
nMigen?
<whitequark>
FL4SHK: year, year and a half from now, we might have 1.0
<Lofty>
Yep
<FL4SHK>
I thought it was longer than that
<whitequark>
something like that
<whitequark>
nMigen is very young
<cr1901_modern>
December 2018
<FL4SHK>
I see
<awygle>
starting to think we should put the deprecation policy in the readme
<Lofty>
Honestly, I don't think we need to rush to 1.0 anyway, but that's kinda irrelevant
<whitequark>
awygle: it will be in the docs
<awygle>
we get that question a fair bit
<whitequark>
quite prominently
<Yehowshua>
Which leads me to my next question
<FL4SHK>
Breaking changes are scary because I have old code sometimes
<Yehowshua>
About the docs
<FL4SHK>
and I can't always update it
<cr1901_modern>
>so you'd write packed_struct.rotate_left(10) but that doesn't work cuz it's not a Value
<cr1901_modern>
Was this a finished thought?
<whitequark>
FL4SHK: such is the life with 0.x dependencies
<FL4SHK>
All right.
<Lofty>
FL4SHK: the old versions will still be on PyPI, so you can pin against them, I think
<whitequark>
i go to quite a bit of effort to make upgrades painless
<whitequark>
e.g. the deprecation errors tell you how to fix your code, typically
<whitequark>
*warnings
<Yehowshua>
Yup - noticed with nmigen.back.pysim -> nmigen.sim.pysim
<FL4SHK>
Any idea if very basic stuff might change?
<whitequark>
with Record specifically you could also extract it from the nmigen codebase and stuff it into your own codebase and use it indefinitely
<whitequark>
FL4SHK: not really
<FL4SHK>
That's about what I figured
<whitequark>
Record is one of the few major warts
<FL4SHK>
probably not going to have to deal with very many breaking changes on my end, then
<FL4SHK>
I bet PackedStruct will stick around
<whitequark>
the other is the build system DSL, but that's far off, and will probably have an automatic migration system
<whitequark>
PackedStruct should be the final design for that component
<FL4SHK>
I largely only need plain old Python classes and `PackedStruct`
<awygle>
whitequark: to try to finish the thought out, i suspect we could do _some_ kind of python shenanigans to ensure that as_value did memoization _itself_, thereby avoiding the problem. the question is, is that too much magic.
<FL4SHK>
What about Layout?
<FL4SHK>
will it be sticking around?
<Yehowshua>
@awg
<FL4SHK>
american wire gauge
<Yehowshua>
awgle is right
<whitequark>
we'll get to that discussion once we finish #355, ok?
<Yehowshua>
FL4SHK - ur funny
<whitequark>
let's stay on topic
<whitequark>
so
<whitequark>
19:19 < cr1901_modern> >so you'd write packed_struct.rotate_left(10) but that doesn't work cuz it's not a Value
<whitequark>
this was not a finished thought, we veered way off topic
* cr1901_modern
nods
<whitequark>
what you *should* write when you realize `packed_struct.rotate_left` doesn't work, is `Value.cast(packed_struct).rotate_left(10)`
<whitequark>
but what you might *want* to write is `packed_struct.as_value().rotate_left(10)`
<whitequark>
and that doesn't detect or protect against incorrect implementations of .as_value()
<lkcl>
urrr yuk
<cr1901_modern>
hrm... :(
<whitequark>
awygle is right: we'll need *some* python shenanigans there
<awygle>
whitequark: radical proposal for discussion - what if Value.cast didn't work for this?
<cr1901_modern>
import inspect
<cr1901_modern>
and inspect the call frame?
<awygle>
so that `as_value` is the canonical and only way to do this
<whitequark>
awygle: that makes the problem worse
<whitequark>
because we don't control as_value
<lkcl>
to explain that: if those are the implementations of.. say... the upgraded-Record-replacement's rotate_left function, fine
<awygle>
mm.. yes, fair.
<whitequark>
the other reason it makes the problem worse is that all of the nMigen guts use Value.cast
<whitequark>
anyway, let's see which our options to fix this are
<awygle>
i shoulda stopped talking once everybody was saying how right i was :p
<Yehowshua>
Well its either lots of shenanigans or lots of educating
<whitequark>
- we could do memoization in Value.cast and somehow detect if .as_value() is called directly
<whitequark>
- we could do memoization in .as_value() (using a decorator, probably) and then detect if .as_value() is not defined using this decorator
<cr1901_modern>
So now, if you enforce the decoration rule in Value.cast, that means that _everything_ that Value.cast takes must now be decorated? Is the current behavior for Value.cast to accept a superset of types beyond "stuff that implements as_value()"? >>
<cr1901_modern>
Basically "will the new behavior break compatibility with Value.cast as-is right now"?
<whitequark>
er, not at all
<whitequark>
it doesn't change the existing behavior in any way
<whitequark>
it adds new behavior
<whitequark>
well
<whitequark>
the way we have it worked out is actually simpler than that
<whitequark>
now, *instantiating* a ValueCastable is different
<whitequark>
say you have this code:
<whitequark>
class MyRecord(ValueCastable):
<whitequark>
def as_value(): ...
<whitequark>
if you do MyRecord() and as_value() is not decorated with @ValueCastable.memoize [preliminary name], an exception is thrown
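[A plain-Python sketch of the check-and-memoize mechanism whitequark outlines; ValueCastable.memoize is the preliminary name from this discussion, not a shipped nMigen API. The base class refuses to instantiate a subclass whose as_value() is not wrapped, and the wrapper caches the first result so repeated casts cannot return different ASTs.]

    import functools

    class ValueCastable:
        def __new__(cls, *args, **kwargs):
            if not getattr(cls.as_value, "_memoized", False):
                raise TypeError("{}.as_value() must be decorated with "
                                "@ValueCastable.memoize".format(cls.__name__))
            return super().__new__(cls)

        @staticmethod
        def memoize(func):
            @functools.wraps(func)
            def wrapper(self):
                if not hasattr(self, "_cached_value"):
                    self._cached_value = func(self)   # call user code exactly once
                return self._cached_value
            wrapper._memoized = True
            return wrapper

        def as_value(self):
            raise NotImplementedError

    class MyRecord(ValueCastable):
        @ValueCastable.memoize
        def as_value(self):
            return object()             # stands in for building an nMigen Value

    r = MyRecord()
    assert r.as_value() is r.as_value()  # same object every time, by construction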
* lkcl
apologies: need to rest. will be back (and checking irc logs)
<cr1901_modern>
ahhh
<d1b2>
<emeb> golly! hacked up a custom platform definition for my up5k board and tried the acm_serial LUNA example and it actually enumerated!
<whitequark>
okay, two hours is enough of a meeting
<whitequark>
I guess we're done for today
<jfng>
can we spend 5 minutes on nmigen-soc issues?
<cr1901_modern>
I see... ValueCastable has the memoization logic, so Value.cast() and as_value() do the same thing
<whitequark>
jfng: oh, yeah
* cr1901_modern
will go back to read-only
<_whitenotifier-b>
[nmigen] awygle commented on issue #355: [RFC] Redesign UserValue to avoid breaking code that inherits from it - https://git.io/JJC8h
* lkcl
would like to hear about nmigen-soc
<awygle>
wq lemme know if i mis-summarized or missed anything
<_whitenotifier-b>
[nmigen] whitequark commented on issue #355: [RFC] Redesign UserValue to avoid breaking code that inherits from it - https://git.io/JJC4I
<whitequark>
nope, all seems correct
<_whitenotifier-b>
[nmigen] awygle commented on issue #355: [RFC] Redesign UserValue to avoid breaking code that inherits from it - https://git.io/JJC4L
<jfng>
the question would be, can we consider it done ?
<_whitenotifier-b>
[nmigen] whitequark commented on issue #355: [RFC] Redesign UserValue to avoid breaking code that inherits from it - https://git.io/JJC4m
<jfng>
i believe most of the scaffolding needed for csr peripherals is done
<_whitenotifier-b>
[nmigen] whitequark commented on issue #355: [RFC] Redesign UserValue to avoid breaking code that inherits from it - https://git.io/JJC4c
<jfng>
one issue that has not been addressed is awygle's concern about compatibility between peripherals
<whitequark>
jfng: we decided to go for Approach A, plus a wrapper that puts the two together if you use a peripheral as a "black box"
<whitequark>
right?
<awygle>
my question on #10 would be, does this design preclude eventually having a "bus-agnostic" way to describe memories (as well as registers) which could be used to write bus-agnostic peripherals? i believe this is a desirable use case
<lkcl>
awygle: and access the CSRs directly?
<awygle>
lkcl: i want a way to say "i have these control registers and this memory-map" and be able to instantiate that with a Wishbone bus, or an AXI bus, or an Avalon bus, or a custom bus, without having to change anything about the peripheral
<awygle>
which makes me nervous about the proposed `wishbone.Peripheral` mixin
<lkcl>
awygle: cool. i can see how that would be useful / desirable
<awygle>
but it's not necessarily a problem, i just want to raise it as a use case i am interested in
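Purely illustrative sketch of that use case; none of these classes or methods exist in nmigen-soc, they only show the shape of "describe the registers and memory map once, then instantiate behind any bus":

    # Hypothetical bus-agnostic description (all names invented for illustration):
    class MySram(BusAgnosticPeripheral):
        def __init__(self):
            super().__init__()
            self.ctrl   = self.csr(width=8, access="rw")
            self.status = self.csr(width=8, access="r")
            self.buf    = self.memory(depth=1024, width=32)

    # The same description wrapped for a concrete bus without touching MySram:
    wb_sram  = WishboneAdapter(MySram())
    axi_sram = AXIAdapter(MySram())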
<whitequark>
awygle: is that actually possible, mechanically?
<whitequark>
unless your memory is something dumb, you're probably actually handling bursts yourself
<lkcl>
there's also a real-world use-case i can think of: Raptor Engineering is doing an LPC implementation.
<whitequark>
and i'm not sure if there is a way to abstract over WB and AXI bursts
<jfng>
sorry, i wasn't clear enough
<lkcl>
it's possible for LPC to "flip" - dynamically - into UART mode.
<jfng>
my question is for a peripheral with only CSRs, no memories, no WB
<awygle>
whitequark: that's a fair question, but 1) lots of memories are dumb 2) you can probably map to a lowest common denominator if performance isn't critical 3) if performance is critical you should still be able to write AXI-only peripherals
<whitequark>
awygle: do we need many kinds of dumb memory peripherals?
<jfng>
just two attributes: `csr_bus` and `periph_info` for metadata
<awygle>
i am open to learning that it's not possible or not useful, i just don't want to preclude it at this early stage
<lkcl>
as in: some CSRs *reprogram* the behaviour of the peripheral to be a completely different type of interface.
<lkcl>
sorry completely different type of peripheral
<awygle>
i don't want an AXISramPeripheral and a WBSramPeripheral
<whitequark>
why not?
<whitequark>
does it cause problems?
<awygle>
twice the verification effort, i guess?
<whitequark>
hmm
Kekskruemel has joined #nmigen
<whitequark>
but most of the verification of a memory mapped peripheral is verification of the bus, right?
<whitequark>
like, even for a DDR controller, presumably most of it would live in stdio
<whitequark>
and be verified there
<awygle>
i think if we do have AXISramPeriph and WBSramPeriph, then most of the code in any given peripheral will be mapping from AXI and/or WB to a bus-neutral control interface (in nmigen-stdio) anyway. but again, i could be wrong about that.
<lkcl>
it makes sense to me for AXIsramPeripheral and WBSramPeripheral to be created by way of mix-ins
<awygle>
but we're drifting away from jfng's request pretty harshly
<lkcl>
and likewise {AnyOtherBus}SramPeripheral
<whitequark>
yeah
<jfng>
memories are a whole other topic yes
<awygle>
in the absence of memories i don't have any real issue with the current proposal but i don't really see the value of the wishbone.Peripheral mixin, i guess
<jfng>
my question is: do we need a mixin csr.Peripheral class ?
<jfng>
which would validate two attributes: the csr bus interface, and the peripheral metadata
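A minimal sketch of what such a mixin could amount to, assuming it does nothing beyond checking that the two attributes are provided (attribute names taken from the discussion, everything else illustrative):

    class Peripheral:
        """Hypothetical csr.Peripheral mixin: validate `csr_bus` and `periph_info`."""

        @property
        def csr_bus(self):
            try:
                return self._csr_bus
            except AttributeError:
                raise NotImplementedError("peripheral must provide a CSR bus interface")

        @property
        def periph_info(self):
            try:
                return self._periph_info
            except AttributeError:
                raise NotImplementedError("peripheral must provide its metadata")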
<whitequark>
hmm
<jfng>
the alternative i see is pure naming conventions
<whitequark>
what would a peripheral with both CSR and Wishbone look like?
<whitequark>
inherit from wishbone.Peripheral alone? inherit from both wishbone.Peripheral and csr.Peripheral?
<awygle>
what is the use case for having both?
<lkcl>
and then because they inherit from both, they know to "talk" to each other?
<jfng>
a wrapper peripheral class, maybe with a decorator, that would implement the bridge
<jfng>
so it would inherit from wishbone.Peripheral alone
<whitequark>
jfng: can you remind me what the outcome was of the discussion of split CSR/Wishbone vs. unified CSR/Wishbone?
<whitequark>
ie do the peripherals with both CSR and Wishbone export a single Wishbone bus, or both CSR and Wishbone buses
<whitequark>
I recall we reached a decision but I can't remember which one it is
<whitequark>
and there was some really good reason for that decision too
<jfng>
oh no, i forgot
<awygle>
yknow what i'ma just shut up because i'm not very informed here. i've laid out my use case, i trust y'all to either support it or decide it's a bad idea.
<lkcl>
i have a vague recollection that AXI4 has CSRs separate somehow. it could just be a convention though
<lkcl>
we may actually have to use a modified version of Wishbone.
<lkcl>
(adding support for speculative read/writes)
<whitequark>
awygle: i'm going to defer that decision, i think nothing forces us to preclude it for now, so i'll keep the option open to have the kind of middleware you request
<whitequark>
but no promise that it would absolutely be the way we go
<lkcl>
anything that's "merged" would make that... difficult.
<awygle>
copy
<lkcl>
Wishbone is based on a "take-it-or-leave-it" type of contract.
<whitequark>
yeah, nmigen-soc will strictly stick to upstream Wishbone
<lkcl>
Out-of-Order designs need the "House Contract of Sale" contract. "offer, exchange, complete"
<whitequark>
jfng: okay, we need to figure that out (again)
<whitequark>
because i think it would be the key for making this decision
<whitequark>
maybe ask key2? iirc he was involved
<lkcl>
which would mean that, if it's not "separable" (so that we can mix in alternative buses), we'd have to hard-fork nmigen-soc. or write a replacement.
<lkcl>
which would be a lot of duplicated effort.
<whitequark>
jfng: iirc, i argued for split CSR/Wishbone buses in peripherals that have their own Wishbone bus because you can always turn it into a merged one, but not the other way around
<lkcl>
to explain: the Out-of-Order design that we're doing can have up to *eight* in-flight memory read/writes simultaneously outstanding
<whitequark>
and we could have a wrapper that turns the split one into a merged one if desired
<whitequark>
on the other hand, the split design can have somewhat lower resource consumption
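For the "wrapper" direction, a sketch assuming nmigen-soc's WishboneCSRBridge and a hypothetical split peripheral that exposes both `csr_bus` and `wb_bus` attributes:

    from nmigen_soc.csr.wishbone import WishboneCSRBridge

    # Put the CSR side behind Wishbone; both windows can then be added to the
    # SoC's wishbone decoder, so the outside world sees a single merged bus.
    csr_to_wb = WishboneCSRBridge(periph.csr_bus)
    # decoder.add(periph.wb_bus, addr=...)      # bulk memory window
    # decoder.add(csr_to_wb.wb_bus, addr=...)   # CSR window, behind the bridge
    # (the bridge is an Elaboratable and would also be added as a submodule)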
<lkcl>
where normal Wishbone expects one and only one bus read/write at a time, and for stalling to propagate back to the main core.
<lkcl>
whitequark: indeed (wrapper makes split -> merged but not possible the other way)
<whitequark>
jfng: on the other hand, i think key2's counterargument was that it is necessary to ensure synchronization between CSR writes and memory writes
<whitequark>
so the split design isn't actually entirely viable
<jfng>
i found the logs (22/03), and a split csr/wb interface + an easy to use wrapper was indeed the conclusion
<whitequark>
hmm
<lkcl>
is there a log somewhere of key2's counterargument?
<whitequark>
it was in private communication
<lkcl>
ahh ok
<lkcl>
whoops
<lkcl>
what was his concern?
<whitequark>
if you have different latencies on CSR and WB/AXI interfaces you may have a bad time
<whitequark>
eg if you flush a FIFO
<lkcl>
that if you split things, you have to make a synchronisation protocol (in effect, something pretty similar to wishbone stb/ack)?
<whitequark>
once you command this through CSR, you want to know that the FIFO is indeed flushed
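Sketched with nmigen-soc CSR elements to make the concern concrete; `ctrl` and `status` are csr.Element instances and `flush_req`/`flush_busy` are assumed signals inside the peripheral (all illustrative):

    # A write to bit 0 of the control CSR requests the flush; software then
    # polls the status CSR until the peripheral reports that it has finished.
    with m.If(ctrl.w_stb & ctrl.w_data[0]):
        m.d.sync += flush_req.eq(1)              # starts the multi-cycle flush;
                                                 # cleared elsewhere when done
    m.d.comb += status.r_data[0].eq(flush_busy)  # 1 while the flush is in flight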
<lkcl>
yes. this i call the "take-it-or-leave-it" protocol :)
<lkcl>
and a FIFO, interestingly, interferes with that... and requires the "Contract of Sale" style API.
<lkcl>
funny.
<jfng>
if we take a split bus approach, and use mixins for peripherals, then it would be very tempting to inherit from two (wb, csr) mixins
<jfng>
but that would not work, i think
<whitequark>
jfng: why not?
<jfng>
assuming each mixin must provide a `periph_info` attribute
<whitequark>
right, that was exactly my concern
<jfng>
there must be a single point of truth `periph_info` attribute for the whole peripheral, but this assumes that its memory layout is hierarchical
<lkcl>
hhmmm i "get" key2's concern about synchronisation. it really does mean that some sort of ready/valid/busy/ack signalling is needed on CSRs.
<lkcl>
the use of e.g. Wishbone (or AXI4) *masks* that need
<jfng>
this signaling you need can be done by bridging your csrs behind a WB4 bus
<lkcl>
because normally (i.e. in the merged design), the use *of* the Bus - which has that ready/valid/busy/ack protocol built-in - *provides* the very protocol needed so that delays can..
<lkcl>
jfng: i am kinda advocating that the protocol used to communicate between split buses *is* wishbone :)
<lkcl>
even when say AXI4 is used
<lkcl>
because it contains the exact ready/valid/busy/ack communications protocol needed for managing (say) FIFO-based CSRs.
<whitequark>
lkcl: i'm pretty sure you actually want AXI4
<whitequark>
because that has out-of-order transactions
<lkcl>
whitequark: well... *thinks*...
<whitequark>
the reason nmigen-soc bothers with Wishbone at all is that a lot of existing cores and designs use it, and people are familiar with it
<whitequark>
WB4 isn't all that good, and WB itself is essentially a legacy bus at this point
<lkcl>
yeah... it's not sophisticated, that's for sure.
<lkcl>
to clarify context: i'm referring to the protocol used to communicate between the buses in the split-peripheral option
<lkcl>
as an *internal* protocol
<whitequark>
jfng: so i think the periph_info issue is fixable, but we should decide something about synchronization first
<whitequark>
what do you think about this?
<lkcl>
in the "merged" design you don't see the problem because the Bus provides the very protocol needed to ensure that FIFO-based CSRs get correctly updated
<lkcl>
acknowledgement comes back a few cycles later (when the FIFO is flushed)
<lkcl>
if all CSRs were single-cycle update, there would not be a problem
<lkcl>
am i making sense? :)
<jfng>
could we provide some metadata about bus latency ?
<whitequark>
jfng: how would a CPU core use it?
<whitequark>
wait states?
* lkcl
yup. tired. leave you to it to discuss, will check the logs. thank you to you both (and everyone)
<whitequark>
lkcl: CSRs can't all be single-cycle update because their width is unlimited
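To make the width point concrete, a sketch assuming the nmigen-soc csr API (the widths are arbitrary):

    from nmigen_soc import csr

    # A 64-bit register on an 8-bit CSR bus: a full update spans 64/8 = 8 bus
    # transactions, so it cannot be a single-cycle update and needs holding
    # registers to stay atomic.
    wide = csr.Element(width=64, access="rw")
    mux  = csr.Multiplexer(addr_width=4, data_width=8)
    mux.add(wide)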
<whitequark>
jfng: tbh, i am tired too, what do you think about discussing this later this week, or having another meeting next monday?
<whitequark>
so on 27th rather than 3rd
<jfng>
np, i need to go home too
<whitequark>
we could spend the 1st and 3rd mondays on questions in general and the 2nd/4th mondays just between us implementers :)
<awygle>
oh real quick, happy to implement 355 but not sure what my schedule is like for the next <undetermined>, so don't let me hold up 0.3 if i don't get to it, is all
<whitequark>
mm okay
<whitequark>
it's not a huge change, so worst case I can just make it myself
<whitequark>
we'll have 0.3.rc1 first
<awygle>
mhm
Asu has quit [Remote host closed the connection]
Kekskruemel has quit [Quit: Leaving]
jeanthom has quit [Ping timeout: 264 seconds]
<Degi>
Is Record([("abc", 1),("def", 1)]) the wrong syntax for making a record with 2 subsignals? Since .def gives an invalid syntax error
<whitequark>
`def` is a Python keyword
<whitequark>
you can use getattr() to access that field
<whitequark>
it's just a bad placeholder name :)
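For reference, a tiny example of that workaround:

    from nmigen import Record

    rec = Record([("abc", 1), ("def", 1)])
    rec.abc                # normal attribute access
    getattr(rec, "def")    # `def` is a keyword, so rec.def is a SyntaxError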
<Degi>
Oh indeed...
<Degi>
Heh yeah I've noticed xD
<_whitenotifier-b>
[nmigen/nmigen-soc] whitequark pushed 1 commit to master [+0/-0/±6] https://git.io/JJCuo
<_whitenotifier-b>
[nmigen/nmigen-soc] rroohhh c754caf - test: make nmigen 0.3+ compatible