ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at · logs at
Degi has quit [Ping timeout: 246 seconds]
Degi has joined #nmigen
Guest30583 has joined #nmigen
proteus-guy has quit [Ping timeout: 256 seconds]
thinknok has joined #nmigen
<whitequark> folks, i have a question regarding the design of cxxrtl
<whitequark> to recap, the idea is that cxxrtl would have debug info that allows introspecting the design at runtime given a signal name
<whitequark> and enumerating signals as well
<whitequark> one thing that is necessary is to represent hierarchical signals, eg so you get nice hierarchical VCDs
<whitequark> the problem is that it's kind of a pain in the ass
<whitequark> specifically, how do you model a signal name?
<Sarayan> badly?
<whitequark> std::vector<std::string>, but that's bad because it's not castable to a C array
<whitequark> std::vector<const char *>, but that's a shitton of allocations for deeply nested designs, with no real reason
<whitequark> and you lose the ability to generate names at runtime though the utility is debatable
<whitequark> and it's an awful lot of code to construct it, with lots of copying and pushing
<whitequark> it feels to me that the best low-level representation of a hierarchical name is a single string
<whitequark> one allocation; easy to FFI; easy to concatenate while constructing debug info
<whitequark> the problem is that yosys lacks a canonical way to serialize hierarchical names into strings
<Sarayan> why vector? There's nothing dynamic in there
<whitequark> there is
<whitequark> well
<whitequark> you can assemble a hierarchy of modules at runtime if you want
<Sarayan> then each will have its own signal tree
<whitequark> if you use verilog then those are deduplicated
<whitequark> if you use nmigen they are not, but cxxrtl is not just for nmigen
<Sarayan> for me "runtime" is well after hdl happened
<whitequark> yes
<whitequark> you use module A in module B and C
<whitequark> if there is a signal s in A, this should produce debug info for B.A.s and C.A.s
<whitequark> and there is only one class that corresponds to A if you don't flatten
<whitequark> so it gets the "parent path" as an argument
<Sarayan> Could you have A::introspection that gives you offsets/accessors based on A's object pointer?
<Sarayan> same for B and C
<whitequark> not really no
<Sarayan> should make the introspection stuff fully const
<whitequark> this isn't doable in standard C++ afaict
<whitequark> like, you can technically do it and it'll probably work, but i couldn't find a rule that explicitly says that the compiler can't break my code if i do that, so i work assuming that it can
<Sarayan> are your modules POD?
<whitequark> nope
<whitequark> virtual destructor
<Sarayan> ah
<whitequark> has to be
<whitequark> not standard layout either
<Sarayan> could you generate accessors?
<whitequark> that's what i'm doing
<whitequark> more or less
<Sarayan> because accessors are unbreakable
<whitequark> sigh
<Sarayan> standard-wise
<whitequark> what i'm generating is a map from names to values, wires, or memories
<Sarayan> especially given that a no-capture lambda decays to a normal function
<whitequark> the values are just the bare data chunks so that e.g. python code doesn't have to go through libffi each time it wants to twiddle a bit
<whitequark> this is completely irrelevant to the problem at hand, which is the fact that i have to give the signals some name
<whitequark> even if i generated extern "C" accessor functions, which is a bad idea, i would still have to give them names
<Sarayan> so { "signal_name", [](A*p) -> u32 { return p->xxx; }, [](A*p, u32 v) { p->xxx=v; }}, takes care of a signal, an array of that alpha-sorted gives you access in log2 time. Then you can have a signal/directory meta-structure
<Sarayan> and the lambdas are C-callable, since they decayt to functions
<Sarayan> the more annoying part being keeping the prototype constant
<Sarayan> and of course the names keep local to 'A'
<whitequark> no?
<Sarayan> I probably missed 1e6 niggly details of course
<whitequark> sigh. can you please focus on the actual question i asked
<Sarayan> I must have misunderstood the question
<whitequark> you've thought about this for about thirty seconds and i've thought about this for a month or two
<whitequark> accessors are a ridiculously bad idea because you take a cheap operation (assign one uint32_t, usually) and turn it into a very expensive operation (foreign function call)
<Sarayan> In fact, I'm pretty sure I misunderstood the question
<whitequark> you'll need accessors when you want to rematerialize a localized value but that's the exceptional case, not the rule
<Sarayan> you're probably overestimating the cost of a bare function call I suspect, but that doesn't matter, it wasn't the question
<whitequark> it's not a bare function call
<whitequark> take a look at what a ctypes function call does in python some day
<Sarayan> I thought we were in cxxrtl?
<whitequark> the primary consumer of debug info in cxxrtl is vcd (where small overheads add up very quickly). the secondary consumer is python (where FFI calls are quite expensive)
<Sarayan> ah
<Sarayan> ok, I was in the third consumer case, a C++ program linking in the generated code
<whitequark> i have no idea why are you so confident in your responses, so much so as to doubt my judgement, when you don't even have any clue what design we're discussing
<whitequark> and frankly i don't know why i bothered asking. this is a waste of time.
<Sarayan> I don't doubt your judgement at all, and I'm sorry if it came out that way
<Sarayan> it's me who's struggling to understand the real problem
<whitequark> okay, let's try again
<whitequark> we have a tree of modules in cxxrtl. this tree is assembled dynamically at runtime for two reasons
<whitequark> first, because a single module can be used many times, in many places in hierarchy
<whitequark> second, because black boxes are already a thing, and they permit (awkward) separate compilation, and maybe some day i'd like to get actual separate compilation
<whitequark> i.e. HDL module per C++ file, to reduce warm build times
<Sarayan> does that mean you have c++ classes which communicate, with possibly multiple objects of each class?
<whitequark> essentially yes
<whitequark> (this is already how cxxrtl works)
<Sarayan> ok good, go on
<Sarayan> it's a very natural c++ structure
<whitequark> yep, that's the point of cxxrtl. to do things in natural c++ ways, and hopefully not too horrible ones
<Sarayan> yeah, well, there's hdl at the start, so naturalness... :-)
<whitequark> to interact with this structure without using a C++ compiler we need a metaobject protocol. some way to describe its layout in an abstract way
<whitequark> there are basically two approaches to that that i see
<Sarayan> ok
<Sarayan> interact means read and write the state of signals
<whitequark> some signals are read only
<whitequark> (combinatorial wires)
<Sarayan> are some signal rebuilt on request?
<Sarayan> doesn't matter, that's not the question either, at least not at this point
<whitequark> in the final design for debug information yes, in the minimal one i am working towards at first no
<whitequark> the first approach is to describe each module with a static structure that would contain flat names of each wire and offsets to their values, plus of course sizes and such
<Sarayan> that would be what I would tend for too
<whitequark> the problem is that you cannot express "offset to data member" in standard C++ at all
<whitequark> hm
<Sarayan> is that certain?
<whitequark> you can use offsetof() since C++17, conditionally
<whitequark> but... cxxrtl supports C++11
<whitequark> so i think it's certain within cxxrtl's constraints
<whitequark> i also checked that you definitely can't do anything useful with a pointer to member. if you can get at the offset, it'll be through offsetof()
<Sarayan> I suspect you can sanely consider that what is usable w.r.t offsetof in c++17 is what is technically working in the previous versions
<whitequark> I would rather not invoke UB intentionally
<whitequark> so I see that as off-limits
<Sarayan> then you could use accessors iff the cost is not too high. These can't fail
<whitequark> on each cycle you dump the entire set of signals into vcd. vcd will use the same interface
<whitequark> do you really want a ~2x or so overhead for vcd dumps?
<whitequark> because the call is cheap, the read is cheap, doing both has nontrivial cost
<whitequark> over doing one
<Sarayan> could the accessors give you a pointer to the value? So you only call it once?
<Sarayan> pointer to a u32 in a class in not UB as long at you don't move the object around
<whitequark> sure
<Sarayan> I don't know, I may be missing things again
<whitequark> so now instead of directly using the debug info, you're actually building some other structure that is more useful for your purposes
<Sarayan> yeah
<whitequark> which is why I propose that debug information not use a fairly restrictive static descriptor approach, but rather provide a method that gnerates this structure for you
<whitequark> std::map<debug_name, debug_value>
proteus-guy has joined #nmigen
<Sarayan> Is that efficient?
<Sarayan> I mean, can you make creating a map efficient?
<whitequark> does it matter?
<whitequark> you do it once at the very beginning
<whitequark> whatever the cost is, it just has to be lower than a few simulation steps (or a few thousands even)
<Sarayan> ok, then I fail to see the difference with the static descriptor approach
<Sarayan> std::map is essentially a binary tree, e.g. you can have the same result with a sorted name list. Alternatively, you can precompute a hash and sort in that order
<whitequark> well, for one, it works with c++11 without the overhead of accessors
<Sarayan> what is debug_value's type?
<Sarayan> so essentially the only non-static part is chunk_t **
<whitequark> sure
<Sarayan> so if you had there a static function that fills a chunk_t *** instead, it would be fully static and possibly C-accessible
<whitequark> it can't be fully static because of blackboxes
<Sarayan> would you ask the blackboxes to generate that information too?
<whitequark> yeah, sure, that's fundamentally what separate compilation is
<whitequark> you treat a bunch of modules as mutual black boxes
<Sarayan> then you can ask the blackbox to provide the filler static functions
<Sarayan> and the structure
<Sarayan> but yeah, you way works too, it's just that the startup cost is significantly higher
<Sarayan> but probably not noticeable anyway
<whitequark> it's just a method on the C++ class, like eval and commit
<Sarayan> python/vcd will have to "parse" the map to fill internal structures in any case
<Sarayan> having everything done with methods has its charm that's for sure
<Sarayan> huhu, people burned by boost are starting to pipe up
<whitequark> the kind of data structure debug information needs can be determined by looking at which operations will be common
<whitequark> and it's two operations
<whitequark> 1. iterating through all signals (or perhaps a subset, but that's just iterate with a condition
<whitequark> 2. looking at specific signal
<whitequark> so while i could make nested descriptors it doesn't seem like a good fit, since an application that uses the debug info would make an equivalent of such a map anyhow
<Sarayan> wouldn't writing to vcd want to have linear access on all the signals?
<whitequark> linear?
<Sarayan> as in packed vector, so that the cache use is minimal
<Sarayan> go over them, find which ones have changed
<Sarayan> it's iterating, but as fast as possible because you do a full iteration after each clock
<Sarayan> dunno if vcd is delta-packed
<whitequark> it is
<whitequark> it's been an hour and we still didn't get to the actual issue i had
<Sarayan> sorry
* Sarayan shut ups and listens
chipmuenk has joined #nmigen
<whitequark> so whatever approach we have, we need to have some way to go from "a tuple of strings representing a hierarchy path" to a debug_item
<whitequark> this could be nested lookup in descriptor tables, but that only really works for fully static descriptor tables
<whitequark> well
<whitequark> not quite that, let me approach it another way
<Sarayan> ok
<whitequark> given just a C FFI, there should be a way to enumerate every signal in a given toplevel module recursively
<Sarayan> agreed
<whitequark> and a way to go from a hierarchical, multipart name to a signal
<whitequark> (signal being struct debug_item in this case)
<Sarayan> yeah
<whitequark> the former is easy and uninteresting
<whitequark> the latter is interesting because it's not clear how to represent names
<Sarayan> feels like a filesystem
<whitequark> yes
<whitequark> the two options i see is "an array of name parts" and "a single mangled name"
<Sarayan> / vs. \ vs. : oh joy
<whitequark> originally i was leaning to an array, but it seems like a PITA to actually use
<whitequark> especially if you are using the FFI from pure C
<Sarayan> yeah, it's not natural
<Sarayan> we're all very used to paths
<whitequark> it's not very good in python either
<Sarayan> as onw string
<whitequark> you'd have to do a lot of allocations
<Sarayan> one
<whitequark> if we're using dynamic descriptor tables, then that is even more allocations, a quite ridiculous amount for deeply nested designs
<whitequark> for a flat table that is
<Sarayan> yep
<whitequark> ok, so you agree that paths are the way to go. I also think they are
<whitequark> now we have to figure out a name mangling scheme
<Sarayan> are there characters that are not allowed in signal names?
<whitequark> whitespace.
<whitequark> that's it
<Sarayan> well, isn't whitespace good enough?
<Sarayan> It sure is unusual, but heh
<Sarayan> which would mean no mangligng, which is nice
<Sarayan> -g
<whitequark> that's a mangling technically
<whitequark> since the spaces weren't there originally
<whitequark> but yeah
<whitequark> this has one obnoxious property
Asu has joined #nmigen
<whitequark> if you decide to flatten your design, which you probably often will as it produces faster binaries, you'll need to replace spaces with dots
<whitequark> but... dots are actually valid as a part of a name in verilog or nmigen
<Sarayan> that's unfixable though
<Sarayan> ambiguous is ambiguous
<Sarayan> unless you say "if you put a dot in your name, it will be considered a hierarchy", turn all dots to spaces and call it a feature
<whitequark> the ambiguity is not inherent here, it's a part of yosys flatten pass
<Sarayan> sure, but post-flatten it's ambiguous unless you change yosys
<whitequark> i think yosys needs to change here, yeah
<Sarayan> I'm kinda sorta serious about the call it a feature part though
<Sarayan> it can be nice to add structure where it isn't explicitely in the hdl
<whitequark> i don't want to intentionally introduce ambiguity where i am not forced to
<whitequark> well, verilog explicitly allows you to use dots.
<whitequark> in nmigen, in theory, i could have forbidden them
<whitequark> (though it would be nice to use dots as a part of records, maybe?)
<whitequark> but verilog stays as it is
<Sarayan> sure, but what you say is "if you call a signal hello.a and hello.b they're appear as a and b in a hello virtual module"
<whitequark> yeah and that's broken
<Sarayan> it is?
<whitequark> because there's no such module in the input
<Sarayan> hence why I called it virtual, you give some control to the programmer on the vcd structure
<whitequark> the goal of a simulator is to give you 1:1 correspondence with input HDL, not to invent coding conventions on its own and force it on you
<whitequark> if i wanted that i'd have used verilator
<Sarayan> huhu
<Sarayan> well then you have to change the yosys flatten pass to keep the path somehow
<Sarayan> attribute values can have spaces can't they?
<whitequark> yeah
<whitequark> an attribute would work
<Sarayan> so yosys-original-path
<whitequark> ok, i guess spaces it is
<whitequark> one regular space ' ' specifically
<whitequark> plus a small helper to make it easier to construct those, maybe
<Sarayan> ' '.join(path) weee
<Sarayan> will be easy in python
<whitequark> you won't use it directly in python, most likely
Bernhard2 has joined #nmigen
chipmuenk1 has joined #nmigen
chipmuenk has quit [Ping timeout: 260 seconds]
chipmuenk1 is now known as chipmuenk
proteus-guy has quit [Ping timeout: 272 seconds]
Bernhard2 has quit [Remote host closed the connection]
<anuejn> awygle: re the bikeshedding of the ila:
<anuejn> from my perspective, the ila shouldnt be part of the nmigen platform source (or better the TemplatedPlatform class)
<anuejn> i would rather vote for a design where we have something like an IlaPlatformWrapper, that takes an existing platform and makes it have some specific ila functions (like platform.probe)
<anuejn> i want this because i have the impression that it is kind of ugly to special case the ila in any way
chipmuenk has quit [Quit: chipmuenk]
<anuejn> i imagine something like this:
<anuejn> that would allow us to have the ila more out of tree and behave more like a library
proteus-guy has joined #nmigen
Asu has quit [Remote host closed the connection]
Asu has joined #nmigen
Asu has quit [Remote host closed the connection]
Asu has joined #nmigen
thinknok has quit [Remote host closed the connection]
thinknok has joined #nmigen
proteus-guy has quit [Ping timeout: 246 seconds]
proteus-guy has joined #nmigen
<_whitenotifier-c> [nmigen] rroohhh opened issue #391: Weird simulation behaviour with clock's and Instances -
<whitequark> anuejn: i don't agree with that, sorry. an ILA is as much a part of the platform code as IO buffer insertion is
<whitequark> the reason for that is that it relies on invariants that are not currently a part of the nmigen public API, such as the ability to run some code that inserts more code after all of the user code was elaborated
<whitequark> the only contract i am ready to add to the public API is the probe/sink interface of the ILA; the internals absolutely have to stay within nmigen so that they can change in lockstep with it
<_whitenotifier-c> [nmigen] whitequark commented on issue #391: Weird simulation behaviour with clock's and Instances -
chipmuenk has joined #nmigen
david-sawatzke[m has left #nmigen ["User left"]
<awygle> Good morning
<_whitenotifier-c> [nmigen] rroohhh commented on issue #391: Weird simulation behaviour with clock's and Instances -
<vup> morning
<awygle> hm, the "ILAInserter" idea is somewhat interesting actually, because of one of my other concerns, which is "how do you configure the ILA for things like delta compression, queue depth, etc". by which i mean, there's more than one reasonable design for an ILA and it would be nice to not lock one in, but also the complexity of being configurable has a cost as well
<awygle> not an endorsement, just a promise to think about it a bit more, and an appreciation of expanding my field of view
<awygle> anuejn, whitequark ^
<_whitenotifier-c> [nmigen] whitequark commented on issue #391: Weird simulation behaviour with clock's and Instances -
<_whitenotifier-c> [nmigen] whitequark edited a comment on issue #391: Weird simulation behaviour with clock's and Instances -
<_whitenotifier-c> [nmigen] rroohhh commented on issue #391: Weird simulation behaviour with clock's and Instances -
<vup> whitequark: while it might not be possible to change it now, what is the reason for having ResourceManager.add_clock_constraint take a frequency and Simulator.add_clock take a period?
<whitequark> vup: argh. oversight
samlittlewood has quit [Quit: samlittlewood]
guan has quit [Read error: Connection reset by peer]
levi has quit [Ping timeout: 260 seconds]
guan has joined #nmigen
mithro has quit [Read error: Connection reset by peer]
bubble_buster has quit [Ping timeout: 240 seconds]
<_whitenotifier-c> [nmigen] rroohhh commented on issue #391: Weird simulation behaviour with clock's and Instances -
<vup> whitequark: I see, is it too late to change this now, or should I open a issue for it?
bubble_buster has joined #nmigen
mithro has joined #nmigen
bubble_buster has quit [Ping timeout: 260 seconds]
guan has quit [Ping timeout: 264 seconds]
<whitequark> I don't know how to change this in a nice way :/
mithro has quit [Ping timeout: 260 seconds]
levi has joined #nmigen
bubble_buster has joined #nmigen
guan has joined #nmigen
mithro has joined #nmigen
<whitequark> we're going to rejig the simulator interface one last time
<whitequark> maybe comment in that issue so it doesn't get lost
<whitequark> since that could be an opportunity to fix things by e.g. renaming a method and deprecating the old name
<vup> sure, sounds good
<_whitenotifier-c> [nmigen] rroohhh commented on issue #228: Reconsider simulator interface -
<_whitenotifier-c> [nmigen] whitequark commented on issue #391: Weird simulation behaviour with clock's and Instances -
levi has quit [Quit: Connection closed for inactivity]
<_whitenotifier-c> [nmigen] whitequark commented on issue #391: Weird simulation behaviour with clock's and Instances -
Guest30583 has quit [Quit: Nettalk6 -]
<whitequark> awygle: regarding ILA configurability, I think this is very much achievable
<awygle> i agree
<whitequark> so the same way that inserted probes would be stored in the platform when you do `platform.add_analyzer_probe`, I think you could provide an elaboratable to `platform.add_analyzer_sink`
<whitequark> that should conform to some existing interface we provide as a part of nmigen.lib
<whitequark> this pretty much completely decouples the platform code from the actual ILA impl
chipmuenk has quit [Quit: chipmuenk]
<whitequark> Sarayan: I discussed it and the flatten pass can very much be fixed
<whitequark> so that the hdlname attribute will contain a properly space-separated path
<whitequark> thank you for the discussion
<whitequark> *discussed it with yosys developers
<awygle> I think that's workable. We should include a default implementation in nmigen.lib tho, probably.
<whitequark> yep
<awygle> I half expected you to say it should be in stdio, glad to be on the same page
<awygle> Ugh I keep going back and forth on whether I should be working on the nmigen ILA or the Glasgow ELA applet
<whitequark> my (weak) preference is for the former
<whitequark> once I'm done with cxxrtl I might just implement the ELA
<whitequark> it'd be easy for me because I wrote all the glasgow ILA code
<Sarayan> wq: You're welcome, and sorry if you felt like I wasn't listening at the start
<Sarayan> irc is hard, let's go shopping
<whitequark> that's ok, communication is hard
<awygle> I am guessing cxxrtl isn't a "next few days" thing
chipmuenk has joined #nmigen
<whitequark> it is
<whitequark> cxxsim isn't quite, but cxxrtl parts are muche asier
<awygle> hmm
<awygle> well i'm busy as hell today anyway (hence my somewhat fractured responses), we'll see how i'm feeling this evening
chipmuenk1 has joined #nmigen
chipmuenk has quit [Ping timeout: 260 seconds]
chipmuenk1 is now known as chipmuenk
thinknok has quit [Ping timeout: 246 seconds]
chipmuenk has quit [Quit: chipmuenk]
Asu has quit [Ping timeout: 256 seconds]
<whitequark> okay i have a draft of cxxrtl debug info working
<whitequark> need to fix flatten now, i think
<awygle> Nice