<lkcl>
whitequark: " either full or with no overlapping cases" yes agreed
<lkcl>
we have an auto-generated switch statement read from CSV files, for the POWER9 decoder.
<lkcl>
the case statements look like "0111000---110" in some cases
<lkcl>
these are placed early and we expect them to match first, in order
<lkcl>
with over 350 entries in total it would be absolute hell to "organise" them to not overlap
<whitequark>
instruction decoders are some of the most compelling use cases of the Switch() statement as it currently works, yep
<awygle>
I'm sure we talked about this quite a while ago but I once again find myself wishing for a gateware test coverage tool/metric
<whitequark>
i think those exist, in general
<whitequark>
i think yosys can't do coverage?
<whitequark>
oh, hm
<whitequark>
awygle: try the `supercover` pass
<whitequark>
that's super sketchy but it's a start i think
<lkcl>
oh, while i remember: "1---" "-1--" "--1-" "---1" does not get actually turned into a pmux, in the nmigen pmux example. i checked the yosys diagram. it was however... 8 months ago.
<lkcl>
i will check again in the morning with latest version.
<awygle>
whitequark: interesting, will try eventually
<whitequark>
lkcl: the example shows how to implement the high-level concept of a priority mux. it is not demonstrating translation to any specific technology primitives
<whitequark>
it cannot, since nmigen doesn't mandate a specific synthesizer, anyway
Degi has quit [Ping timeout: 240 seconds]
Degi has joined #nmigen
jaseg has quit [Ping timeout: 260 seconds]
jaseg has joined #nmigen
electronic_eel has quit [Ping timeout: 240 seconds]
electronic_eel has joined #nmigen
lkcl_ has joined #nmigen
PyroPeter_ has joined #nmigen
lkcl has quit [Ping timeout: 258 seconds]
PyroPeter has quit [Ping timeout: 256 seconds]
PyroPeter_ is now known as PyroPeter
lkcl__ has joined #nmigen
lkcl_ has quit [Ping timeout: 240 seconds]
jeanthom has joined #nmigen
_whitelogger has joined #nmigen
_whitelogger has joined #nmigen
<lkcl__>
whitequark: ahh ok.
jeanthom has quit [Ping timeout: 256 seconds]
<daveshah>
awygle: there is the mcy mutation coverage tool
<jeanthom>
whitequark, should I add more tests for RoundRobin?
<DaKnig>
shizoor: does it jump around in reality too or is that a camera artifact
<d1b2>
<Shiz> huh what
<DaKnig>
if it really does jump around that much you probably nned to fix the timing of the counters
<d1b2>
<Shiz> DaKnig: you might want to reconsider your bot highlight code :p
<DaKnig>
the hsync and vsync
<DaKnig>
@Shiz: bot highlight code?
<d1b2>
<Shiz> whatever bot you're using to connect to Discord
<DaKnig>
I tried to mention the person two messages above me
<d1b2>
<Shiz> it highlights based on shortest match instead of exact match
<DaKnig>
I am connected to IRC
<DaKnig>
you are the one using the bot :)
<DaKnig>
its a bridge
<d1b2>
<Shiz> so it mentioned me instead of shizoor
<d1b2>
<Shiz> cc'ing @esden for the bridge then :p
<DaKnig>
I am not the one who connected a Discord channel to the IRC channel
<d1b2>
<Shiz> neither am i
<DaKnig>
should imo only ping the discord side when prefixed with @
<DaKnig>
thats how it works in some other place
<DaKnig>
works great, btw.
kbeckmann has quit [Quit: WeeChat 2.7]
jeanthom has quit [Quit: Leaving]
jeanthom has joined #nmigen
<d1b2>
<Torsten> the issue is that one name matches the first letters of the other name and the bot decided to ping the shorter match
shizuo has joined #nmigen
emeb has joined #nmigen
<shizuo>
<DaKnig> shizoor: does it jump around in reality too or is that a camera artifact. Sorry for answering a question a few hours later. Yes it jumps around like that in reality, that was the idea. I had it produce a static image, decided that was a bit boring then started to add some weird effects.
shizuo is now known as shizoor
alexhw has quit [Remote host closed the connection]
alexhw has joined #nmigen
<shizoor>
<DaKnig> My friend wanted me to do "broken VCR" effect, I added horizontal synch slip as well as one dependent on what vertical line it's on and some noise to make it look like a freeze frame on a bad VHS recorder. The other one I just wanted it to zoom out, but when it zoomed past a certain point it started doing some interesting things so you'd see the same image overlap on itself so I just let it do its thing.
proteus-guy has joined #nmigen
<shizoor>
Thinking about what to do next, possibly a simple game of some sort, like "breakout".
kbeckmann has joined #nmigen
<awygle>
daveshah: that's awesome, thanks for the link!
<tpw_rules>
i did a pure logic pong game on my fpga for a class
<tpw_rules>
ironically it was for microprocessors class and the key point was it had no microprocessors
chipmuenk has quit [Ping timeout: 244 seconds]
chipmuenk has joined #nmigen
hitomi2505 has quit [Quit: Nettalk6 - www.ntalk.de]
<shizoor>
Yeah, that's why I was considering breakout, possible with no CPU. :D
<shizoor>
Or I could stop being a slacker and get cracking with the difficult stuff.
<tpw_rules>
i learned a lot doing mine
<shizoor>
I'm enjoying it, learning loads about FPGA and Python as well.
<shizoor>
I've got the DAC and ADC boards too, for a different board but I can wire it up manually, I'm thinking could one of these perform well doing DSP functions? Could it work well as a guitar pedal? It's around the right size. :)
<hell__>
tpw_rules: how did you "see" the pong game?
<tpw_rules>
VGA
<shizoor>
@tpw_rules Hats off man. Well done.
FFY00 has quit [Remote host closed the connection]
FFY00 has joined #nmigen
<DaKnig>
anythin is possible without a CPU :)
<DaKnig>
a CPU running an (offline) app is a state machine using too many states . just write the state machine directly, bam, you get the game :)
chipmuenk has quit [Quit: chipmuenk]
shizuo has joined #nmigen
shizoor has quit [Ping timeout: 256 seconds]
shizuo is now known as shizoor
<shizoor>
<Daknig>CPU free it is. :D
<DaKnig>
to be clear, programming games at some point becomes much easier in software even with shitty CPU
<shizoor>
<DaKnig> Don't worry, I get you, I did the bitmap scale thing so I could have graphics getting bigger and smaller, I'm planning on bringing in a CPU for later stuff. Doing something big without a CPU would be insanity.
<FL4SHK>
man, this kind of sucks
<FL4SHK>
I got into the Binutils mailing list
<FL4SHK>
because of these autoconf errors I was getting
<FL4SHK>
*autotools
<FL4SHK>
but it seems the person who answered me didn't really understand the problem I was having?
<FL4SHK>
guess I'd better just keep waiting
<FL4SHK>
can you do `array.eq`?
<DaKnig>
you cant
<FL4SHK>
er
<FL4SHK>
why can't you do it?
<FL4SHK>
that's the better question
<DaKnig>
eq is not implemented for Array
<FL4SHK>
that's not what I'm getting at
<FL4SHK>
I'm asking why it wasn't implemented for Array
<DaKnig>
I guess it would be hard to figure out for non homogenous arrays
<FL4SHK>
VHDL lets you do it
<DaKnig>
VHDL only ever has homogenous arrays
<DaKnig>
I mean an array where all members are the same type
<FL4SHK>
I get that
<FL4SHK>
but here's why I think it's not going to be a problem
<FL4SHK>
Array is useful if you want to use a Signal as an index into it
<FL4SHK>
i.e. it does stuff a Python list won't do
<FL4SHK>
if you can index into a heterogenerous Array, you *should* be able to use `.eq`
<FL4SHK>
it shouldn't be hard as long as both Array's are formatted the same
<FL4SHK>
`Array` isn't a block RAM
<DaKnig>
you can technically word_select
<DaKnig>
tho I agree that's not as nice
<FL4SHK>
honestly I think indexing into a heterogenerous array would be *harder* than an assignment
<DaKnig>
I ... dont think it matters
<DaKnig>
indexing is a mux anyways
<FL4SHK>
because, for indexing, you have to implement some kind of decoder
<FL4SHK>
this is more difficult than `.eq`
<FL4SHK>
slightly
<awygle>
you can do something like `map(lambda pair: pair[0].eq(pair[1]), zip(arr1, arr2))`
<awygle>
`Array`s are not really intended to be packed structs though
<FL4SHK>
packed structs don't really support indexing
<awygle>
(i asked a very similar question last night or the night before)
<FL4SHK>
a whole array assignment exists in VHDL
<vup>
as already mentioned you can do `Cat(*array).eq`
<FL4SHK>
oh
<awygle>
oh that's a good idea
<awygle>
much better than my complicated mess :p
shizoor has quit [Ping timeout: 256 seconds]
<lkcl__>
awygle: do make sure though that the sub-things you're assigning to on each side are the same size!
<lkcl__>
if you have a.... (non-homogenous?) Array then you better have them matching in size
<lkcl__>
Cat effectively flattens everything to "as if it was a sequence of bits"
<lkcl__>
i like Cat :)
<lkcl__>
i particularly like grabbing bits from a list of signals or a signal, in a loop, adding them to a list, and then at the *end* of the loop, use Cat(*accumulator_list).eq(something)
<vup>
yes the `Cat(onething).eq(otherthing)` pattern is something where a "assignment of different sizes" lint would be especially helpful I think
<awygle>
i like the `Cat(*arr).any()` solution to my problem from the other day
<awygle>
width irrelevant
<DaKnig>
yeah Cat is problematic here
<DaKnig>
for example if you wanna have implicit overflow
<DaKnig>
it wont overflow, it would just move whatever's next by a bit
<_whitenotifier-b>
[nmigen-boards] TiltMeSenpai opened pull request #91: Convert OrangeCrab r0.2 ADC sense to diffpair - https://git.io/JJ0pX
<DaKnig>
can I test verilog modules via nmigen pysim (or w/e it is called) by importing it as a sumbodule?
<Lofty>
DaKnig: no, because pysim works on nMigen fragments, and Verilog is not a fragment
<Lofty>
But you should be able to do it with cxxsim
<DaKnig>
I still dont get what a fragment is
<DaKnig>
ok
<lkcl__>
DaKnig: yes that's my point. there, a simple for-loop would solve that.
<lkcl__>
DaKnig: are you familiar with compilers, at all?
<lkcl__>
and what an "Abstract Syntax Tree" is?
<DaKnig>
so would it just refuse the simulation?
<DaKnig>
lkcl__: I actually started (and paused) a standard C compiler project
<DaKnig>
for the GB
<DaKnig>
I did get the AST stage :)
<DaKnig>
so yeah ik what this is
<lkcl__>
DaKnig: cool! ok, so a "fragment" is a nmigen AST instance-tree
<DaKnig>
and it cant get this from reading a netlist or whatever is produced by yosys from verilog?
<DaKnig>
ok. if thats the case...
<ktemkin>
nMigen places a number of constraints on what it can construct / represent internally in order to remove a lot of common gateware footguns
<ktemkin>
in particular, the way clocking is handled (e.g. statements belonging to single clock domains)
<ktemkin>
pysim can only operate on designs that explicitly follow those constraints
<lkcl__>
DaKnig: correct. it's a python data structure, created in python, by nmigen, that has nothing to do with verilog or yosys
<lkcl__>
whenever you do "comb += x.eq(5)"
<lkcl__>
that creates - in-memory - an Abstract Syntax Tree "Assign(x, 5)"
<_whitenotifier-b>
[nmigen-boards] TiltMeSenpai synchronize pull request #91: Convert OrangeCrab r0.2 ADC sense to diffpair - https://git.io/JJ0pX
<lkcl__>
as ktemkin points out, nmigen AST "fragments" are data structures specific to nmigen.
<DaKnig>
makes total sense.
<lkcl__>
you _can_ do Instance("verilog_filename.v")
<lkcl__>
but it's entirely opaque to pysim.
<DaKnig>
so pysim would refuse to run it?
<lkcl__>
because pysim does *not* know how to simulate verilog
<whitequark>
cxxsim is a simulator conceptually similar to pysim, but instead of translating nmigen fragments to python, it translates yosys internal representation to c++
<lkcl__>
annotate! that's it. cocotb compiles *annotated* verilog/vhdl where the output from the compiled program can be "understood" - and also interacted with - via a unix pipe, back in "python-land"
<lkcl__>
cxxsim is very cool.
<lkcl__>
daveshah i think mentioned he'd already tried compiling both verilog and vhdl into cxxrtl and done mixed language simulations that way
<DaKnig>
so cxxsim allows cosimulation with C++ instead of python?
<awygle>
ohaiyo
<awygle>
meeting o'clock
* lkcl__
waves to awygle
<lkcl__>
awygle: it is? oh, changed to every monday (irc title)
<awygle>
whitequark: does it make sense to add IL->python support to pysim? (probably not huh)
<_whitenotifier-b>
[nmigen-boards] TiltMeSenpai synchronize pull request #91: Convert OrangeCrab r0.2 ADC sense to diffpair - https://git.io/JJ0pX
<d1b2>
<TiltMeSenpai> crap, that's my bad
<d1b2>
<TiltMeSenpai> I was trying to see if the nmigen dep was the issue, ended up leaking into the pr by accident
<whitequark>
DaKnig: yep
<awygle>
you'd need a pyrtl which seems like a lot of work for marginal gains
<ktemkin>
ohey, meeting
<whitequark>
awygle: yeah, it'd add a lot of moving parts
<whitequark>
anyway, meeting!
<whitequark>
let me look at the notes from the previous one
<awygle>
for my part i'd like to get a judgement on the order of #355 (basically just answering my question on the PR) and discuss #381
<whitequark>
let's start with that
<_whitenotifier-b>
[nmigen] whitequark edited a comment on issue #355: [RFC] Redesign UserValue to avoid breaking code that inherits from it - https://git.io/JJC4c
<awygle>
ok so i wrote a PR (#449) which implements ValueCastable, but it _only_ does that
<awygle>
i thought it would make sense to review the three parts (impl ValueCastable, deprecate UserValue, migrate internal usage) separately
<whitequark>
first, it uses __memoized__, which is technically UB in python, since all of __dunder__ names are reserved
<whitequark>
you can't just claim them
<awygle>
oh, bummer
<whitequark>
I think you can actually look at the properties of the method object
<whitequark>
hmm
<whitequark>
let me check this, one sec
<awygle>
stackoverflow suggested i would need to mark it with a field like that, but ... who knows if stackoverflow is right :p
<whitequark>
that's one way to do it, i guess
<lkcl__>
awygle: if you ask it as an actual question (and explain why it's not a duplicate) it attracts surprisingly knowledgeable and altruistic people
<whitequark>
okay, so that's one problem, we can decide what to do here later, as it's just an implementation concern
<awygle>
yup
<whitequark>
a more important part is a design issue
<whitequark>
I'm not sure if I want to expose the name `memoize`
<whitequark>
I don't want people to use it generically, and I don't want it to be in __all__
<whitequark>
I was thinking that perhaps it could be a class method on ValueCastable, but that's also bad!
<whitequark>
since class methods pollute the namespace of whatever derives from ValueCastable
<awygle>
hm, ok. i did debate with myself about the name, figured i'd hear feedback if you didn't like it, so that's no surprise.
<awygle>
yeah exactly. it has to be available _somewhere_
<lkcl__>
we did discuss that this is "pretty similar" to Elaboratable
<awygle>
65322222222222222
<awygle>
.... thank you for your input, cat.
<whitequark>
hi awygle's cat
<lkcl__>
what's different that... haha
<lkcl__>
what's different that means the pattern used for Elaboratable cannot be followed?
<awygle>
that first link is exactly what i did, eys
<lkcl__>
we're looking to see if a function is decorated, is that right?
<whitequark>
no, that's a minor issue i'm not concerned about right now
<whitequark>
we're discussing where the decorator should live
<lkcl__>
whitequark: ah ok
<lkcl__>
ok. sorry, keep up at the back :)
<awygle>
whitequark: how about calling it `@lower_method` or something, and making it a class or static method on ValueCastable
<awygle>
i don't think taking two symbols from the ValueCastable namespace is the end of the world, and since it's mandatory it's not like it'll surprise the user
<whitequark>
awygle: it's not catastrophic, but it makes me wonder whether the previous decision was correct, after all
<awygle>
alternately, we keep it where it is and just change the name
<whitequark>
what i initially thought about is that it would be the lower friction option
<whitequark>
but maybe it isn't?
<awygle>
which decision?
<whitequark>
about making it a mandatory decorator
<whitequark>
let me think about it for a bit
<awygle>
oh. i still think that's a better idea than relying on a docstring, personally. but sure, consider and let me know. i'll change the double-underscore thing in the meantime.
<whitequark>
oh, i don't mean a docstring
<whitequark>
i am looking into other ways to do this automatically
<awygle>
ah
* lkcl__
has a feature-request to discuss (at an appropriate non-interrupting moment)
<whitequark>
awygle: okay, i see two options
<whitequark>
okay, three
<whitequark>
1. a custom metaclass that lets us wrap the creation of derived classes. we already rejected it, nothing to rehash
<whitequark>
i don't like (2) for many of the same reasons i don't like (1). it's complicated and magical, hard to understand, hard to use correctly
<awygle>
and conflicts with reasonable derived-class use cases
<whitequark>
plus objects deriving from ValueCastable might be something simple, int-like for example, so the performance hit of __getattribute__ is not that welcome
<whitequark>
conflicts how?
<whitequark>
it becomes a part of our public interface (you have to call super().__getattribute__ in your own subclasses), but i think it doesn't conflict per se
<whitequark>
ok. i agree with you then. we should just rename it to something better and deal
<awygle>
you don't want to move it into being a class or static method?
<lkcl__>
um... we're referring to a decorator, right? is there any reason why that decorator should not be a static member function of UserValue itself?
<lkcl__>
so it would be used as @UserValue.lower_method()
<whitequark>
awygle: sorry, unclear
<awygle>
because it pollutes the namespace of derived classes of s/UserValue/ValueCastable/
<whitequark>
we should rename it and move it into ValueCastable
<awygle>
whitequark: copy. name bikesheds?
<whitequark>
let's bikeshed!
<lkcl__>
ok got it.
<lkcl__>
:)
<awygle>
lowering
<awygle>
`@lowering` rather
<whitequark>
hmm@ValueCastable.
<whitequark>
er
<awygle>
... `@ValueCastable.lowering` rather.
<whitequark>
hmm, you have to prefix it with @ValueCastable
<whitequark>
yeah
<whitequark>
superclass members aren't in the class namespace when you're defining them
<whitequark>
(aside: class namespaces are goddamn weird)
<awygle>
hard agree
* lkcl__
has done mro walking... mwaaaaa
<whitequark>
(i think you can use a defaultdict as a class namespace? i wonder how that would work)
<whitequark>
aaanyway
<whitequark>
@ValueCastable.value_lowerer? or lkcl's .lower_method?
<whitequark>
.lower_method seems a bit more idiomatic
<whitequark>
(or should it be .lowermethod? damn you python)
<_whitenotifier-b>
[nmigen-boards] TiltMeSenpai synchronize pull request #91: Convert OrangeCrab r0.2 ADC sense to diffpair - https://git.io/JJ0pX
<awygle>
I think lowermethod
<whitequark>
it's inconsistent whichever way we go
<awygle>
It's wordier but that ship has sailed with ValueCastable already lol
<whitequark>
but i think we should go with .lowermethod to go with .abstractmethod and friends
<lkcl__>
oh: i simply used that because i couldn't remember what it was called :)
<awygle>
Yep, reasonable. Works for me
<whitequark>
cool. so for that PR: memoize->lower_method, put it into ValueCastable, get rid of __memoized__
<whitequark>
I'd probably just use a private attribute in that case
<whitequark>
it'll expand into _ValueCastable__memoized which is just fine
<d1b2>
<TiltMeSenpai> witequark: hold on the nmigen-boards PR for a second, I'm being braindead. Let me test and get it working then I'll ping you on the issue or something. Sorry
<_whitenotifier-b>
[nmigen-boards] whitequark closed pull request #91: Convert OrangeCrab r0.2 ADC sense to diffpair - https://git.io/JJ0pX
<_whitenotifier-b>
[nmigen/nmigen-boards] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/JJEei
<_whitenotifier-b>
[nmigen/nmigen-boards] TiltMeSenpai 1a23a4d - orangecrab_r0_2: convert sense to diffpair
<whitequark>
TiltMeSenpai: oops
<whitequark>
so, github ui has a [✔] Draft chekcbox for such cases
<whitequark>
i think i wouldn't even have the button then
<whitequark>
lkcl__: what was your feature request?
<d1b2>
<TiltMeSenpai> ah, didn't see that. I was fighting with the github UI in either case, as the CI commit leaking showed 😦
<whitequark>
yeah :/
<lkcl__>
oh. more of a "how do i do this and if not _then_ a feature-request"
<lkcl__>
i've run the ghdl simulator, and the fact that it outputted "reports" at the appropriate time was reaaally useful
<awygle>
whitequark: "i'd probably just use a private attribute" is that just "a name that starts with _"?
<lkcl__>
the only equivalent i've found with nmigen is a rather painful tree-walk, from the top level, in a simulation unit test.
<whitequark>
awygle: no, sorry, confusing terminology
<lkcl__>
so the question is: "is it possible to do the equivalent of VHDL report" in nmigen (pysim / cxxsim)
<whitequark>
so python has _attrs, which are just a hint to whoever is reading the code
<whitequark>
python also has __attrs (note underscores at the beginning), which add an additional level of namespacing
<awygle>
ah, ok. so double underscore prefix but not suffix (because reserved)
<lkcl__>
and if not, is there a compelling reason to add - or not add - the same capability?
<whitequark>
specifically, __attr expands into _Namespace__attr lexically (I think it's actually done in the lexer), where "Namespace" is the innermost class
<whitequark>
*innermost class name
<whitequark>
or something like that
<whitequark>
this still doesn't provide any actual *visibility* or *access* control
<whitequark>
it's called "private attribute" because python is bad at naming, it should be called something like "namespaced attribute"
<whitequark>
it's useful when you want to do something in e.g. a mixin that is just your implementation details that you want to not conflict with anything else in the class
<whitequark>
anyone can still read or write the expanded name if they want
jeanthom-mobile has quit [Read error: Connection reset by peer]
jeanthom has joined #nmigen
jeanthom-mobile has joined #nmigen
<whitequark>
lkcl__: what is a "VHDL report"?
<lkcl__>
b_i := unsigned(b);
<lkcl__>
report "a_mf_b a=" & to_hstring(a) &
<whitequark>
ohh, does that just print to the terminal?
<lkcl__>
there's an equivalent in verilog, i forget what it is. basically, at an appropriately-triggered point in a simulation, a... yes
<whitequark>
namely, the fact that there *is* a decent mental model for conversion operators that do not explicitly specify zero/sign extension
<whitequark>
i think that is an insight important enough that we should reconsider the entire feature in its light
<lkcl__>
jacob's pretty damn valuable.
<lkcl__>
ok so what he's saying is: <Shape> defines both the size and the sign/unsigned
<lkcl__>
Value you know the shape
<lkcl__>
therefore that is sufficient information to determine - unambiguously - with one function - what needs to be done?
<awygle>
i get what programmerjake is saying but personally i want to always request sign extension unambiguously in an HDL
<awygle>
my mental model is that all Values are just wires, and so unsigned extension is "normal" and sign extension is "abnormal"
<lkcl__>
awygle: well. that _is_ what Value.convert_to(signed(8)) would in effect be doing
<whitequark>
awygle: i'm also not entirely sure i want .as() even if we can unambiguously define it
<whitequark>
but that's not the assumption the previous discussion was built upon, which is to say, previously, i (and most others) thought that there *isn't* an unambiguous way to define .as()
<awygle>
i recognize that this model isn't universal, which is why i'm not saying "default to unsigned and make signed a method" or anything like it, but i would like to separate out the methods (As described in the link comment) regardless
<whitequark>
i think we should reopen the discussion with that in mind and do another round of comments
<whitequark>
it's entirely possible that we'll end up with .sign_extend() anyway, but i'd like us to make an informed choice
<awygle>
OK, fair enough. i was having the discussion from a totally different angle (i always assumed one could define .as() unambiguously, i just didn't like it), so it doesn't change my position at all, but by all means let's let everybody weigh in again
<lkcl__>
awygle, whitequark: would it be reasonable to create (one-liner) wrapper functions that call convert_to()?
<whitequark>
awygle: i started out thinking that we can define it unambiguously, but wasn't sure exactly how
<lkcl__>
nmigen.util (somewhere)
<lkcl__>
def as_signed(x, width):
<whitequark>
and then we tried to define it, and turned out basically no one agrees on what they expect it to do, because there was no clear, easily communicable model
<awygle>
whitequark: same, but because i didn't like it, i never did the shovel work which lead you to being uncertain.
<whitequark>
ah
<whitequark>
do you think you can do the RFC v2 issue?
<awygle>
sure. logistically how would you like to handle it? new issue or new comment?
<whitequark>
new issue, the old one is already hard to follow enough
<whitequark>
hence, the v2
<awygle>
yup, gotcha
<whitequark>
the options are "my comment", ".as(<Shape>) preserving value modulo output size" and ".extend(<Shape>)/.truncate(<Shape>) preserving value modulo output size"
<whitequark>
fwiw, .extend(signed(8)) seems like something you'd be happy with
<whitequark>
hmm
<whitequark>
ah but that sometimes does sign extend and sometimes zero extend
<whitequark>
still, let's discuss it. i have a feeling we'll end up with the former choice anyhow
<whitequark>
but it'd be good to hash out semantics for the latter two so we're really certain they're bad
<whitequark>
interesting edge case: s = Signal(unsigned(8)), s.extend(signed(8))
<whitequark>
what should this do?
<lkcl__>
indeed, it's one of the disadvantages of <Value>.convert_to(<Shape>) that it does in fact auto-detect based on the input parameter
<whitequark>
awygle: i think in the RFC we should make an emphasis on working out the precise semantics for the last two options. not necessarily something you have to do, just something to keep in mind when writing
<whitequark>
i will do it if others won't
<awygle>
sounds good
<lkcl__>
in that regard i tend to agree with awygle that explicit conversion (and the opportunity for raising exceptions) would help avoid silly mistakes
<whitequark>
there's a bunch of edge cases that every conversion operator design should handle
<awygle>
i will write it up before the next meeting
<whitequark>
awygle: thanks!
<whitequark>
mkay, anything else?
<whitequark>
jfng?
<lkcl__>
who was it last time we were discussing... nmigen-soc?
<lkcl__>
and it got right to the end of 2 hours
<jfng>
maybe we could continue where we left off wrt nmigen-soc #10 ?
<lkcl__>
ah :)
<lkcl__>
jeanthom: sorry, faulty-memory, it was jfng :)
<jfng>
apologies for not having having written what we started to conclude, so maybe i can sum it off quickly
<whitequark>
yes, let's. there was also PackedStruct discussed previous Monday, but that can be done at the next meeting (since it is very much not an implementer thing)
<jfng>
the problem was how to handle peripherals with separate bus interfaces
<jfng>
they come with two problems:
<jfng>
1. synchronization between transactions is not guaranteed, e.g. there could be a buffer on the upstream wb4 bus, and nothing on the upstream csr bus
<jfng>
and the peripheral needs a way to cope with that
<jfng>
2. peripheral metadata currently assumes a single hierarchical memory layout for the peripheral
<jfng>
for 1. i think i have a solution
<whitequark>
listening
<jfng>
we could add an optional `rdy` signal to the csr bus interface
<jfng>
the csr->wb bridge could use it to hold off incoming transactions
<awygle>
i'm gonna mostly shut up here, because i don't really understand nmigen-soc as it currently exists, but i did just want to mention that i don't see why #1 is the peripheral's problem as opposed to the interconnect's
<lkcl__>
awygle: the (long, when tired) insight i came up with last time was that if the CSR reading/writing is merged with the Wishbone management, you don't really notice that the CSRs do, sometimes, need multiple cycles to read/write
<d1b2>
<emeb> Is it safe to say that LUNA uses its own fork of nmigen that's not compatible with mainline?
<whitequark>
emeb: not that I know?
<ktemkin>
it just uses master
<whitequark>
I'd be pretty surprised if anyone actually used an incompatible fork of nmigen
<lkcl__>
and that if you split the Wishbone management (the interface) into a separate class from the CSR management, you *still* need that ready/valid signalling... on the *CSRs*
<whitequark>
jfng: I think the `rdy` signal won't help
<jfng>
awygle: the peripheral could have been designed to e.g. store some data in a memory, and then do something with it when a given CSR is written to
<whitequark>
specifically in the case that CSR latency is higher than WB latency
<d1b2>
<emeb> when I installed LUNA it put an .egg-link file in my site-packages to its own version. when I uninstalled that and installed master my luna stuff doesn't work anymore.
<lkcl__>
when the two roles are merged into one class, that's "masked"
<d1b2>
<emeb> error is ModuleNotFoundError: No module named 'nmigen._unused'
<awygle>
jfng: i do understand that, i just think that it's a requirement of the core that the interconnect should have to satisfy
<lkcl__>
whitequark: i would expect that in the joining of the two (we're assuming a split separate Wishbone-class and CSR-class?) the Wishbone class would need to "understand" the (proposed) CSR rdy capability
<lkcl__>
and wait for it, basically.
<ktemkin>
emeb: make sure you grabbed the right nmigen
<lkcl__>
however i could be misunderstanding
<jeanthom>
emeb: pip show nmigen?
<jfng>
whitequark: i assume that we can do the same thing in the other way, if the peripheral uses the `rty` WB signal
<lkcl__>
my feeling is - ironically - that the protocol actually needed *might need to be close to wishbone itself*
<lkcl__>
i.e. the full stb/ack/we/cyc shebang
<lkcl__>
rather than just "rdy"
<lkcl__>
we is definitely needed: it's necessary for the WB-Bus-Interface instance to tell the CSR instance "this is a read rather than a write request"
<emeb>
yes those. just did pip3 install --user --upgrade nmigen
<whitequark>
hm
<whitequark>
what does `pip3 show nmigen` show exactly?
<jfng>
whitequark: i don't like it too, because even if we can pull it off with WB4, i don't know if every bus we may be interested in has support for "retry later" signals
<lkcl__>
change of topic: we have had people run into difficulties when installing e.g. nmigen-boards and nmigen. if you get those in the wrong order, pip3 install nmigen-boards actually pulls in a random version of nmigen off the internet, installs *that*, and then you *can't* upgrade nmigen due to version conflicts
<whitequark>
lkcl__: yes. this will continue to happen until i release nmigen 0.3
<whitequark>
jfng: i feel like we're going down the wrong road
<awygle>
jfng: what does litex do here
<lkcl__>
jfng: my feeling is, it's not so much whether other buses *want* "retry" signals, it's that if they want beyond a single-cycle (or same-cycle) read/write of CSRs, they have to have... *something*
<jfng>
but i haven't found an alternative, so perhaps awygle has a point, and the peripheral should assume the interconnect is well behaved
<lkcl__>
whitequark: ah ok.
<lkcl__>
it really doesn't "feel right" that just to split into two (Wishbone Interface, CSR instance) you need pretty much the exact same protocol *as* Wishbone.
<awygle>
tbh i'm not sold on CSR as being useful in its current form, but LiteX seems to use it with success, so they seem like the right prior art to draw on here
<whitequark>
awygle: why do you think it's not useful?
<lkcl__>
just to manage the CSRs, that is. that's the bit that's weird, and conflicts with the requirement that people ask for alternative buses
<lkcl__>
it feels like it _should_ be a lot simpler... but unfortunately isn't.
<awygle>
i think i don't really understand the use case. it seems like the CSR bus only exists to save a very minimal amount of logic per peripheral while complicating the interconnect with a bunch of bridges
<awygle>
that's probably not true, which is why i say "i think i don't understand the use case"
<lkcl__>
awygle: hmmm you've got a point
<whitequark>
awygle: oh, the CSR *bus*
<whitequark>
not the rest of CSR infrastructure
<whitequark>
well
<awygle>
yeah. i mean "a CSR" is just... a register. right?
<lkcl__>
i find it very difficult to understand how to use the CSRs - how to relate them to the addresses, due to the abstraction.
<whitequark>
let me rephrase again
<lkcl__>
but that could just be down to lack of documentation
<lkcl__>
and suitable examples
<whitequark>
awygle: do I understand it right that the part of the CSR infra you don't see as very useful is the ability to use a single CSR bus that goes to the CPU, bypassing the normal wishbone stuff?
<whitequark>
because there are many other parts to it, such as tracking register addresses, making sure registers of arbitrary size can be accessed race free regardless of bus width, and so on
<lkcl__>
for me, i think it's beyond even that, whitequark. here's some really obvious code:
<whitequark>
lkcl__: a lot of CSR infra exists to automate BSP generation
<lkcl__>
the "registers" are simply merged into the object. the address decoding is...
<lkcl__>
ahh yeah
<whitequark>
that's why it looks so complicated
<whitequark>
to a degree, it is fairly complicated
<whitequark>
I also didn't document it nicely yet, that will be fixed
<lkcl__>
which is something that we need (one of the libre-soc goals)
<awygle>
whitequark: i suppose so? i originally thought that a CSR would just be a bus-agnostic register abstraction (which would seem useful to me), but then i found out there was a separate "CSR bus", which i don't understand the utility of very well
<whitequark>
awygle: oh
<lkcl__>
to be able to, like OpenPITON, auto-generate an entire interconnect *and* peripherals *and* linux kernel header GPIO files *and* device-tree files *and* Technical Reference Manual from a machine-readable file (OpenPITON uses json)
<whitequark>
awygle: the main reason the "CSR bus" exists is that the CSR infra does not limit the size of CSRs in any way, and yet guarantees atomic reads/writes
<whitequark>
the secondary reason the CSR bus exists is that cores that have a lot of sparse CSRs are quite inefficient to map to e.g. 32-bit wishbone
<whitequark>
the tertiary reason the CSR bus exists is so that an UART doesn't have to be written for WB, for AXI, for Avalon...
<whitequark>
it doesn't necessarily have to be user-visible. we are discussing it right now like it would be in some cases, but maybe that's a bad idea
<awygle>
the tertiary reason is _very_ compelling to me, but then it falls down because the same can't be true of e.g. SRAM
<lkcl__>
oh wait - so the idea is to *completely separate* the CSRs from the peripheral that they're associated with?
<whitequark>
awygle: I think that's fine though, because CSR accesses and mmapped accesses are very different
<whitequark>
CSR can be multi-cycle non-pipelined in basically every case and that's just fine
<awygle>
the primary reason assumes that arbitrary-width atomic-access registers are valuable, which.... i guess i'm not experienced enough with SoCs to say for sure is or isn't true. it doesn't _feel_ true to me
<whitequark>
well, in practice it's not completely arbitrary width, though it makes some things a bit simpler
<awygle>
the secondary reason feels like it could be solved with a smarter interconnect
<whitequark>
in practice what you discover is you need atomic reads of a 64 bit register on a 32 bit wishbone
<awygle>
... somehow i find myself arguing about this, which was not my intention at all. oops.
<whitequark>
ARTIQ had that
<whitequark>
it had a hack that did the same thing as the current CSR bus, badly
<lkcl__>
awygle: imagine that you have data which can only come in (or out) once. you absolutely cannot read it twice
<whitequark>
if we want nmigen-soc to be accessible to use cases like control plane applications, it becomes even more important
<lkcl__>
because that's the next character from the UART Rx
<awygle>
atomic transactions are good, no argument
<whitequark>
lkcl__: uh, that's unrelated
<lkcl__>
whitequark: oh ok
<whitequark>
awygle: the CSR bus *is* this smarter interconnect
* lkcl__
is trying to keep up (through a low-grade constant headache)
<daveshah>
TileLink has a mode where you end up with an ALU in the peripheral for extreme atomic memory capabilities from what I've seen
<whitequark>
it uses narrow internal buses to conserve resources, yet preserves atomicity even if you go down to 8 bits, while letting you use 64-bit registers transparently
<daveshah>
Massive overkill for a CSR of course but kind of interesting
<whitequark>
we'll need it even if we go with merged peripherals that only export a single Wishbone, etc, interface
<awygle>
you can do all of that with wishbone though
<whitequark>
that's sort of true
<whitequark>
and in fact that was one of the suggestions when I was adding the bus, "just use wishbone"
<emeb>
I think I see what happened - I installed nmigen as directed and it installed v0.2, but then I installed nmigen-soc and it requires v0.1 and downgraded me.
<whitequark>
emeb: ah yes, the nmigen-soc dependency was wrong until recently. I fixed it in master though
<lkcl__>
daveshah: ariane (and others) put a mini ALU for the amo* operations, just below (or part of) the L2 cache. however they used AXI4, there, not TileLink
<whitequark>
awygle: arguably, you could replace all the csr.* stuff with modules that have WB interfaces instad
<lkcl__>
rocket-chip definitely did a mini ALU with TileLink interconnect
<whitequark>
it's not clear what the point would be though
<emeb>
whitequark: OK - that change must not have propagated to PyPI - it's still downgrading.
<whitequark>
emeb: indeed
<whitequark>
install from master for now
<whitequark>
I'll sort it out
<whitequark>
awygle: basically, think of csr.* as a toolkit for implementing a clever WB target that efficiently does atomic transactions
<awygle>
whitequark: yeah that's not really what i'm suggesting either. it's not important though, at least not at the moment.
<whitequark>
(or not WB)
<whitequark>
actually, I think I'd like to understand what you mean here
<whitequark>
because maybe I'm missing something important
<lkcl__>
whitequark: i think that my point is, to get the kind of synchronisation needed, as well as the abstraction requested, you need a wishbone-*like* interface with similar capabilities.
jeanthom has quit [Ping timeout: 256 seconds]
<awygle>
whitequark: is it late-evening in your personal world? i was about to go eat lunch
<whitequark>
awygle: we can continue later, i'll be up for a few hours at least
<awygle>
cool, i'll ping tou in 30 or so
<lkcl__>
1hr45m eek apparently i have another call in 15m, alarm just went off :)
<jfng>
whitequark: iirc you said last time that you had an idea for the second problem (peripheral metadata)
<whitequark>
jfng: i feel like we really should decide something about the first one first
<whitequark>
e.g. if we end up with all merged peripherals, that would obviate the metadata issue entirely
<whitequark>
i'm convinced we can work out something nice here
<jfng>
i'm struggling to find something that doesn't involve adding backpressure to the csr bus interface
<whitequark>
jfng: would that even work?
<whitequark>
like, let's forget for a moment that we have to implement this. let's imagine a perfect solution that a genie would pull out of thin air
<whitequark>
what kind of design should we have then?
<lkcl__>
interesting question
<jfng>
uh, numbering of each transactions on each bus, and the user would have to match them ?
<jfng>
though that may be very unrealistic
<whitequark>
jfng: doesn't matter that it's unrealistic, remember? a genie implements it
<whitequark>
*but*
<whitequark>
what matters here is that it's perfect for peripheral imlementers
<whitequark>
while being the opposite of perfect for initiator implementers
<lkcl__>
oo. that reminds me. a wonderfully-named new bus called "Banana Memory Bus". part of SaxonSOC
<whitequark>
what i'm trying to think of is a design (no matter how complicated or expensive) that would be nice to use from both initiator and target side
<whitequark>
because i believe that coming up with this will expose the flawed assumptions we hold that prevent us from getting to a nice real design
<lkcl__>
one important feature-request, though: i believe it's unrealistic to expect literally every peripheral to be written in nmigen
<lkcl__>
consequently, being able to "plug in" verilog, vhdl pre-existing peripherals (opencores, richard herveille's excellent rgbttl module for which i have linux kernel driver source)
<lkcl__>
should i feel be a high priority.
<lkcl__>
otherwise, nmigen-soc becomes an "either / or" choice.
<lkcl__>
you either use nmigen-soc *or* you duplicate it / use something else
<whitequark>
sure, and the other way around too
<lkcl__>
and that does mean... yes
<whitequark>
this has already been requested and i decided that it should be possible
<jfng>
why would transaction IDs be the opposite of perfect for initiator implementers ?
<lkcl__>
it does mean that whatever is done needs to be able to take into account that the [opencores] external RTL has "integrated" CSRs
<lkcl__>
like that XICS interrupt controller in VHDL i posted an hour back.
jeanthom has joined #nmigen
<lkcl__>
some sort of "specification" is needed that recognises the peripheral's capabilities, or at the very least a wrapper of some type that conforms to the (ideal) design
* awygle
resists the urge to bring up xTEDS :p
<jfng>
lkcl__: after reading the spec, i like this "banana memory bus"
<whitequark>
jfng: right, i didn't put it very clearly
<whitequark>
let's try another approach
<whitequark>
what do we have? an initiator, maybe a CPU, that has a WB initiator interface, and a target, maybe a peripheral, that has a WB target interface and a CSR target interface
jeanthom has quit [Ping timeout: 240 seconds]
<whitequark>
where can reordering come from? it could happen inside the CPU or in the interconnect
<whitequark>
in the interconnect, it could be because we chose to implement CSRs in a way that makes CSR latency different from WB latency
<whitequark>
in the CPU, it could be because the CSR areas are always uncacheable (I think any sane implementation will be like that), but the memory-mapped areas could be write-combine or even cacheable
<whitequark>
in fact I think that it can be desirable to do write combining even with relatively simple CPUs
<whitequark>
if we consider something like a DRAM controller, a Flash controller, I think this becomes even clearer
<whitequark>
you *definitely* want your DRAM to be cached, and you almost certainly want your Flash memory to be cached, too
<whitequark>
video RAM has similar constraints, write combining is required for decent performance
<whitequark>
jfng: let's look at the flash controller, I think that's the simplest case that's clearly problematic
<whitequark>
how would you want to do writes? well, you write to the memory area served by the controller, then you poke the CSR to start a page write
<whitequark>
right?
<whitequark>
(barring some horrible "the CSRs expose SPI pins directly as GPIOs" solution)
<jfng>
yes
<whitequark>
this, i think, means two things
* awygle
is back now, tag me in whenever (reading backlog)
<whitequark>
first, you can't do what awygle suggests, or at least it's not remotely enough
<whitequark>
sure, annotating interconnect latency, or compensating for it, takes care of a part of the problem, at a significant cost
* awygle
has visions of poking his head through a door and taking a shoe to the head :p
<whitequark>
but... this doesn't handle reordering caused by write combining at all, and this doesn't necessarily handle reordering caused by cacheable/noncacheable accesses having different latency (as this is inside the CPU, not necessarily something we can fix in the interconnect)
jeanthom has joined #nmigen
<whitequark>
second, i'm fairly certain that even if our proposed solution is "the peripheral should expose an unified Wishbone interface rather than Wishbone/CSR", it should actually be two Wishbone interfaces
<whitequark>
precisely so that you could stuff one of them behind a cache, or in a part of the memory map that is cacheable, or in a distinct enough chunk of memory that you can configure it as cacheable in TLB, or stuff like that
<whitequark>
you can't just take a CSR memory map and glue it to the Flash memory map
<whitequark>
and call it a day
<whitequark>
awygle: coincidence, i swear :p
<jfng>
ah yes, i see your point
<whitequark>
something i'm thinking of, here, is for example that the Flash and DRAM controllers could each sit behind their own caches
<whitequark>
and then you could arbitrate the CPU and eg the DMA controller between those
<whitequark>
this way, you won't have cache coherency issues
<jfng>
you can't configure your cpu cache to say "cache this region, except some bytes here and there"
<whitequark>
well, sometimes you can
<awygle>
but not always
<whitequark>
intel has MTRRs for this
<whitequark>
yes
<whitequark>
it's complex
<awygle>
(also it's ugly)
<awygle>
yes
<whitequark>
and it prevents many useful cache hierarchies
<whitequark>
such as sharing a cache between CPU and DMA
<whitequark>
(sure, you don't want that if you want to maximize perf, but what if your CPU is just for control?)
<awygle>
well for one thing, i'm always right, so you should always do what i suggest :p
<whitequark>
lol
<awygle>
so your concern is, you have a flash controller, which is doing write combining, and exposes a memory-mapped interface, and also has a CSR interface you poke to initiate a write
<whitequark>
awygle: sort of but that's a bit too specific
<whitequark>
the general problem i'm highlighting is that reordering/latency comes from multiple sources
<whitequark>
we can theoretically handle the last three, perhaps at a high cost
<whitequark>
but there is nothing we can do about the first two
<whitequark>
so real user code will always have to do some sort of explicit synchronization
<whitequark>
it's just how memory models in native languages and CPUs work
<whitequark>
even x86 reorders writes with other writes
<awygle>
i _think_ the general solution i have rattling around in my head is that you shouldn't design your peripheral such that a CSR write is required to be ordered w.r.t. a memory write
<awygle>
i know "shouldn't" is not a great place to design from
<awygle>
so i'm trying to crisp it up
<whitequark>
this is part of what i'm talking about, actually
<whitequark>
i mean
<whitequark>
if you could tolerate any CSR and MM ordering at all, it's just two peripherals
<awygle>
kind of, but also not really. imagine that flash interface, but your CSR has two bits, "lock" and "program". you poke "lock" first, and after that any memory transactions are rejected until you un-poke "lock" or poke "program" and go through a program cycle.
<whitequark>
uh
<whitequark>
define "rejected"
<awygle>
NACK'd or not allowed to begin. wishbone "RTY" or "STALL"
<whitequark>
have you tried using those
<whitequark>
well, wishbone bus errors
<whitequark>
with real wishbone CPUs
<whitequark>
oh, wait, it *stalls* the transaction, right?
<awygle>
STALL does, yeah
<whitequark>
RTY is an error or?
<jfng>
rty ends the transaction
<awygle>
RTY has no defined semantics
<jfng>
but is not an error per se
<awygle>
just like ERR has no defined semantics
<awygle>
because wishbone is Bad
<whitequark>
ah right, that's why i can't remember what it does
<whitequark>
well uh
<whitequark>
for example, mor1kx just flat out ignores any wishbone errors on loads
<awygle>
rty means "try again later" basically. but it's unambiguous that it does not accept the transaction.
<whitequark>
errors on stores aren't super usable either for a few reasons
<whitequark>
they're not precise, so you can get an exception for a store instruction in completely different code that finished long ago
<awygle>
you're the one who invoked a genie earlier. it's not my fault wishbone is terrible lol
<whitequark>
and they cause exceptions, which is an excellent way to fuzz your driver
<whitequark>
yeah, sure! that's still useful input
<whitequark>
i'm mostly just frustrated about wishbone
<awygle>
SAME.
<jfng>
a lot of parts are underspecifed imo
<whitequark>
in general, i think bus error cycles are a non-starter
<whitequark>
jfng: that is just objectively true
<whitequark>
no need for "imo" :p the spec openly says it even
<jfng>
especially arbitration
<whitequark>
awygle: bus error cycles are a good way to do 2 things
<whitequark>
1. find bugs in your interconnect and CPU
<awygle>
whitequark: you can't have it both ways though. you have to be able to apply backpressure even if you build the mess into your notion of a CSR.
<whitequark>
2. fuzz your driver with exceptions
<whitequark>
awygle: yes, i'm trying to figure out a solution that will actually survive contact with reality
<awygle>
you don't need to guarantee ordering of transactions, you just need some kind of... preemption, that guarantees you get _a_ consistent ordering. just like in softwareland
<whitequark>
wait
<whitequark>
elaborate?
<awygle>
hence my "lock" bit above, it's saying "preempt the memory map's operations, i'm doing config stuff now"
<whitequark>
yes but how do you make sure your last writes reached the memory?
<awygle>
uh attempting to elaborate - you know how in multithreaded programs as long as _some_ sequence of preemptions could yield a given result it's a legal result? which doesn't mean "it can do whatever", there are still rules, but it does mean you have to be able to handle one of a set of results (or add additional constraints)
<hell__>
PCIe says: posted writes
<whitequark>
right you're basically talking about a memory model
<awygle>
god i would kill for us all to be sitting at the same whiteboard right now..... this is so hard to explain without a pen in my hand
<whitequark>
happens-before
<awygle>
"how do you make sure your last writes reached the memory?" if the transaction hasn't committed (an ACK to your STB in wishbone), it hasn't reached the memory (yet?)
<whitequark>
but your software doesn't talk to the WB interconnect
<jfng>
are we interpreting the WB4 spec to say that RTY doesn't commit the write ?
<awygle>
more importantly, it's still "your" (the initiator's) responsibility
<awygle>
it's like an ownership model
<awygle>
if i ACK your transaction i'm guaranteeing that i'll deal with it from here, and if i can't guarantee that i don't ACK it
<awygle>
does this make any sense at all?
<whitequark>
right, i see, you have only the interconnect/peripheral in scope, meanwhile i am trying to do this end-to-end
<whitequark>
what you're saying makes sense
jeanthom has quit [Ping timeout: 256 seconds]
<awygle>
right, but the end-to-end reduces to a series of interconnect/peripheral transactions
<whitequark>
no, because one "end" is my rust code
<awygle>
... okay i wasn't going _that_ end to end
<whitequark>
... i mean you could consider a CPU a kind of interconnect
<whitequark>
i think it's important to do that because otherwise we can end with interconnect/peripheral design that isn't amenable to using from firmware
<awygle>
wait why doesn't your Rust code talk to the Wishbone interconnect?
<whitequark>
as you no doubt have seen happen before
<awygle>
isn't that exactly what it's doing anytime it writes to the memory map?
<whitequark>
nope, for two reasons: because Rust has its own memory model we con't control, and the CPU has its own memory model we largely don't control
<jfng>
not necessarily, the write may end up in a write buffer, and not immediately reaching the wb bus
<whitequark>
what jfng says
<awygle>
ok it doesn't talk _directly_ to the wishbone interface, the interaction is mediated by the memory controller. fair.
<jfng>
or even a write-back L1
<whitequark>
so no matter what we do, the end user would have to (through the BSP, ideally) make sure rustc is constrained, and the memory controller is constrained
<whitequark>
because this is necessary anyway, what if we integrate *that* synchronization code with synchronization necessary for the peripheral?
<whitequark>
strawman example: after writing everything to the flash, you issue one read from the last flash byte, ensuring that (assuming a sane memory controller) by the time the read returns, all the writes reached the peripheral
<awygle>
i'm feeling a bit wrapped around the axle at this point. let me try and encompass this discussion with my mind and come back to you. i'm in that place of "i still feel that i'm right, but i don't feel that you're wrong, but i'm not sure we're talking about the same thing". i need a bit to internalize it. carry on without me.
<whitequark>
i'm largely done. my suggestion to jf (as the -soc maintainer, so by extension anyone involved in -soc) is that we go with the "split" proposal
<awygle>
oh, i unambiguously agree with that
<jfng>
yes, i think we can all agree to a split design by now
<whitequark>
then the peripheral info question is relatively easy to resolve
<whitequark>
peripheral info should have multiple memory maps per peripheral.
<whitequark>
that's it.
<whitequark>
eg for a flash peripheral one of those probably ends up at 0x0000000 in code space, and another in MMIO config space wherever else
<jfng>
whitequark: are you implying that the solution to the synchronization problem could be entirely done from the software end ?
<whitequark>
jfng: not quite
<whitequark>
i believe that the synchronization problem, in designs with a CPU, would unavoidably, in any correct design, include a significant software aspect
<whitequark>
which is an incredibly obnoxious thing to debug and is hard to get right
<whitequark>
because it is so arduous, i am proposing that whatever we do for peripherals would be done while taking into account the software code used for synchronization
<whitequark>
i.e. i am saying that we should abandon the idea that this should be completely transparent
<whitequark>
e.g. with the interconnect balancing latencies, or with a single upstream WB port that prevents reordering
<whitequark>
it can't be transparent, let's pick whichever non-transparent solution is the most convenient to use
<whitequark>
and lean into the non-transparent nature of the synchronization to make it simpler for implementers and easier to get right
<whitequark>
latency annotation would be a nightmare to get right
<whitequark>
some sort of ping-pong protocol like the one awygle is suggesting? much much nicer
<whitequark>
(well, i don't think that specific one works, but i think the general concept is workable)
<agg>
could it interact with specific fence instructions from the cpu instead of using reads as a finish-writing sidechannel? like issuing a dmb or dsb fence in arm to ensure store buffers are flushed
<agg>
maybe making it require cpu-specific details is untenable compared to using the always-present read/write interface though
<jfng>
do fence instructions also flush the L2 cache ? i believe they only do L1 but i may be wrong
<whitequark>
jfng is right, this doesn't work in general
<agg>
on cortex-m7 it's l1 only
<agg>
l2 cache management is explicit
<agg>
but a read doesn't do it either, and peripheral CSRs are all in noncacheable regions
<whitequark>
but nmigen-soc is not even CPU-specific
<whitequark>
(i'm quite enthusiastic about CPU-less SoCs!)
<agg>
sure, same
<agg>
but that doesn't mean it couldn't have a signal that coordinates with a fence instruction or other logic?
<whitequark>
that's true, but... I don't think WB4 has those for example?
<jfng>
you could technically, with tags extensions
<whitequark>
i guess
<whitequark>
i think we can't rely on it being present
<whitequark>
so we would need a more general solution anyhow
<whitequark>
we can optionally also handle barriers, of course
<agg>
doesn't get more generally supported than reading, I guess
<jfng>
i think tilelink was designed with that use case in minde
<whitequark>
but that doesn't help the current discussion
<agg>
just a shame to generate a spurious read transaction only to ensure the write is complete
<whitequark>
i specifically said "strawman"
<whitequark>
it's just an example of the general strategy i meant
<agg>
ack
<whitequark>
that said, it doesn't seem that bad to me?
<jfng>
the "read last byte after the write burst" approach ?
<whitequark>
virtually all CPUs guarantee that reading a location you just wrote introduces happens-before edges
<agg>
i guess it depends on how that's meant to interact with caches or external memory controllers or whatever
<whitequark>
yes, it's not actually enough
<awygle>
cr1901 will be sad when we eliminate the possibility of supporting the DEC Alpha
<whitequark>
how so?
<agg>
maybe you could write a magic location that blocks until all preceding csr writes go through or something
<awygle>
it's the only exception to your "virtually all" that i'm aware of
<whitequark>
... really?
<awygle>
yeah
<whitequark>
what
<whitequark>
how... does that even work
<whitequark>
i mean i knew DEC Alpha has *the* weakest memory model
<awygle>
on a more serious note - do we assume that all nmigen-soc peripherals will only be used with nmigen-soc interconnect?
<whitequark>
but i don't understand how that'd work at all
jeanthom has joined #nmigen
<jfng>
awygle: nope, they should be usable in a standalone fashion
<awygle>
i agree with that
<jfng>
or, at the very least, convertible to a standalone instance
<whitequark>
yep
<awygle>
which brings me back to "this is not the peripheral's problem"
<awygle>
although i do see how it could push you the other direction
<whitequark>
from my end-to-end point of view, this doesn't really resolve the discussion at hand
<whitequark>
i'm fine with "this isn't peripheral's problem"
<whitequark>
but we should do it *somewhere*
<whitequark>
and it should be nice
<awygle>
do we have enough "somewhere", currently, to really make it relevant?
<awygle>
we were discussing #10, which is the peripheral API. it doesn't seem like there's a ton of interconnect infrastructure in -soc, as of yet
<whitequark>
a flash peripheral would be one of the first
<whitequark>
you're also working on dram
<whitequark>
it's not yet a directly pressing concern
<whitequark>
but since we already got the context right now... why not figure it out, since it's obviously a problem we'll hit in future?
<jfng>
maybe the question should be "can we move forward without painting ourselves in too much of a corner ?"
<whitequark>
that's always a good baseline, and i'd be happy with that
<whitequark>
but i also feel like we might end up with an actual solution already
<awygle>
i thought that was "split" and "this is not the peripherals problem"
<whitequark>
the latter worries me
<awygle>
(meeting, back soonish)
<whitequark>
it's not obvious to me that this can be always a non-concern for the peripheral
<jfng>
so, since we agreed on a split bus design, and that the peripheral info may hold multiple memory maps, how would that play out with mixins ?
<jfng>
can they just all hold a reference to the same peripheral info instance ?
<jfng>
hmm, wait that doesn't make sence
<jfng>
or maybe if the csr mixin had a `csr_periph_info` attribute, etc. but its very ugly
<whitequark>
jfng: what if we start without mixins, just implementing peripherals with whatever logic the mixins *would* desugar into, and then extract them once we're confident in the API we want?
<jfng>
yep, that works
<lkcl__>
jfng: BMB - i thought you might like it, not just the name :)
<jfng>
the possibility to have stateless up/downconverters etc is nice
<_whitenotifier-b>
[nmigen-boards] whitequark commented on issue #89: [RFC] handling boards that are very similar - https://git.io/JJEIQ
<_whitenotifier-b>
[nmigen-boards] rroohhh commented on issue #89: [RFC] handling boards that are very similar - https://git.io/JJEIF
Asu has quit [Remote host closed the connection]
<sorear>
if you have a directory-ish cache coherence system, CPUs can cache memory exposed by devices and the devices can tell the CPUs when to write back + known when they have done so