pepijndevos changed the topic of #apicula to: Project Apicula: bitstream documentation and tooling for Gowin FPGAs https://github.com/YosysHQ/apicula -- logs https://freenode.irclog.whitequark.org/apicula
_whitelogger has joined #apicula
FabM has joined #apicula
<pepijndevos> I want STA to be merged...
<pepijndevos> I think we *can* model either. Yosys has a write_spice command, so we can just take analog 74HC models and simulate the thing.
<pepijndevos> I think of traces more as inductance than capacitance, but of course they have both.
<pepijndevos> you'd have to manually model trace parasitics, or do parasitic extraction after layout. A good rule of thumb is something like 2nH per mm IIRC... might be off by a few orders of magnitude.
<pepijndevos> And then just hope the 74HC models you downloaded from a random Yahoo Groups zip file actually model parasitics and are not just a digital model with a slope inside.
<pepijndevos> daveshah, if I want to feed nextpnr pre-routed json, do all of the routing attributes of a net have to be on a single wire, or can I just have a bazillion wires corresponding to each pip with their own routing attribute?
<daveshah> I have never tested prerouted json and I expect this to fail in all kinds of nasty ways
<pepijndevos> ouch
<daveshah> Loading and restoring stuff written by nextpnr should work
<pepijndevos> so I can feed it NEXTPNR_BEL to pre-place a bel, but I can't give it pre-routed stuff?
<daveshah> But the attributes were never really intended to be used by anything else
<daveshah> You can but if the routing path for a net isn't complete or is otherwise unpopular it will probably silently throw away the routing
<pepijndevos> Hmmm, so what I'm trying to do is the following...
<pepijndevos> I have a working vendor bitstream and a broken Apicula bitstream. I want to unpack the vendor bitstream, and then feed it all the way through the flow to see where it breaks
<pepijndevos> I wrote a "repacker" that just unpacks the bels and pips and packs them again, but that did not uncover the bug
<pepijndevos> So I need to take a larger detour
<pepijndevos> I'm looking for a way to start from a state where the vendor bitstream and apicula bitstream have exactly the same pnr, and then go from there.
<daveshah> So if all the routing is associated with a net and forms a complete path, then it should work
<daveshah> But you would have to do things like reinsert alias pips where needed
<pepijndevos> gahh, okay maybe that's not such a great idea anyway...
<daveshah> Yeah that's why I wasn't recommending it, unfortunately
<pepijndevos> Claire suggested doing a brainstorm call later today to help me get unstuck from this bug... I've been staring at it since friday
<daveshah> Happy to have a glance over the working and non-working ones if that helps?
<pepijndevos> Maybe? Just thinking what would be the best way to do that.
<daveshah> the unpacked designs for both would be useful
<daveshah> as JSON or FASM or whatever
<pepijndevos> My unpacker generates a metric ton of verilog instead
<pepijndevos> Right, hold on, I'll make a gist with all the things.
<daveshah> Something I note is that your design doesn't seem to have any SPINE signals used?
<pepijndevos> wait... this one seems very wrong indeed
<pepijndevos> CLK_0 is just tied to VCC, that can't be good...
<pepijndevos> With previous ones I was able to trace the CLK signals all the way to the clock pin...
<pepijndevos> Well, at least it's *something* to fix...
<daveshah> The vendor design or the nextpnr one?
<pepijndevos> The nextpnr one
<daveshah> Ah, I was looking at the JSON rather than the unpack.v - the only obvious issue in the JSON was the lack of a SPINE
<daveshah> which CLK0 is the problem in unpack.v?
<pepijndevos> well, go to any DFF, and ctrl-f whatever it has as the clock input
<pepijndevos> R4C45_CLK0 for example
<pepijndevos> assign R4C45_CLK0 = VCC;
<pepijndevos> If you do the same on vendorunpack.v it traces all the way back to the IOB, and that also used to work on nextpnr.
<pepijndevos> So... seems like a regression but I'm honestly glad I have something tangible to track down.
<daveshah> oh, interesting
<daveshah> R4C45_CLK0 doesn't actually appear in the JSON at all
<daveshah> I think that DFF might actually be unused?
<daveshah> R4C45_SLICE1 in the JSON is $PACKER_GND, so a LUT only
<pepijndevos> oh...
<pepijndevos> that would explain something wouldn't it...
<pepijndevos> yea R3C10_CLK0 does work
<pepijndevos> assign R3C10_CLK0 = R3C10_GB60;
<pepijndevos> assign R3C10_GB60 = R3C10_GBO1;
<pepijndevos> assign R3C10_GBO1 = R3C10_GT10;
<pepijndevos> assign R3C10_GT10 = R2C10_GT10;
<pepijndevos> assign R2C10_GT10 = R2C10_UNK56;
<pepijndevos> assign R2C10_UNK56 = R10C27_UNK56;
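The ctrl-f tracing through unpack.v described above can be sketched as a small script. This is only an illustration, assuming the unpacker output consists of simple `assign A = B;` lines like the ones pasted here; the helper names `parse_assigns` and `trace` are hypothetical and not part of Apicula.

```python
import re

# Matches simple continuous assignments such as "assign R3C10_CLK0 = R3C10_GB60;"
ASSIGN_RE = re.compile(r"^\s*assign\s+(\w+)\s*=\s*(\w+)\s*;")


def parse_assigns(verilog_text):
    """Map each assigned wire to the wire that drives it."""
    drivers = {}
    for line in verilog_text.splitlines():
        match = ASSIGN_RE.match(line)
        if match:
            drivers[match.group(1)] = match.group(2)
    return drivers


def trace(drivers, wire):
    """Follow a wire back through its drivers until no assign is found."""
    path = [wire]
    seen = {wire}
    while path[-1] in drivers:
        nxt = drivers[path[-1]]
        if nxt in seen:  # guard against loops in the assignments
            break
        path.append(nxt)
        seen.add(nxt)
    return path


# The assign lines quoted in the log, used as sample input.
SAMPLE = """
assign R4C45_CLK0 = VCC;
assign R3C10_CLK0 = R3C10_GB60;
assign R3C10_GB60 = R3C10_GBO1;
assign R3C10_GBO1 = R3C10_GT10;
assign R3C10_GT10 = R2C10_GT10;
assign R2C10_GT10 = R2C10_UNK56;
assign R2C10_UNK56 = R10C27_UNK56;
"""

drivers = parse_assigns(SAMPLE)
path = trace(drivers, "R3C10_CLK0")
# Flag clock paths that never pass through a wire with SPINE in its name.
uses_spine = any("SPINE" in wire for wire in path)
print(" <- ".join(path))
print("uses a SPINE wire:", uses_spine)
```

Running the same trace over both unpack.v and vendorunpack.v and comparing the paths would be one way to spot where the two clock routes diverge.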
<daveshah> The most notable divergence is that 'UNK56' in the nextpnr design is a SPINE signal in the vendor design
<pepijndevos> yea there should not be any UNK into GT as far as I know...
<pepijndevos> that could be something...
<daveshah> both the vendor and nextpnr designs use a 'UNK124' signal immediately after the clock pin
<daveshah> but the vendor design uses R10C28_UNK124 and the nextpnr design R10C27_UNK124
<pepijndevos> right, so what I expect is something like F6 (IOB) -> UNK (center) -> SPINE (center) -> SPINE (alias) -> GT (on spine) -> GT (alias) -> GBO (branch) -> GB -> CLK
<daveshah> yeah it seems like those two SPINEs are UNKs in the nextpnr design
<pepijndevos> I do not know the meaning of the R2C10_UNK56 wires, but clearly they are not spines.
<daveshah> might be worth removing UNK56s from the routing graph and seeing if nextpnr uses a proper SPINE signal?
<pepijndevos> right
<pepijndevos> thaaaaaank you
<pepijndevos> I'm pretty sure that's the problem
<pepijndevos> So... I have a wirenames table, but these clock wires have... different names. The SPINE wires would be the LUT inputs under normal wire numbering, while the UNK wires would be inter-tile wires.
<pepijndevos> Maybe this is actually how you can route generic interconnect INTO the clock routing?
<daveshah> Yes, sounds quite plausible, although I'm slightly surprised that it still doesn't go through a SPINE
<pepijndevos> I'll leave that mystery for another day...
<pepijndevos> yea, found this bunch of weird-looking aliases that my code generated https://bpa.st/XOIQ
<pepijndevos> veeeery peculiar
<daveshah> where do the aliases come from?
<pepijndevos> WOOOORRKKIIIIPNGGGGG!!!!!eleven
<pepijndevos> They come from a combination of fuzzing and vendor data. In this case, I used fuzzing to figure out which center tile corresponds to which quadrant, which mux corresponds to which spine, and which mux input corresponds to which clock pin, but I used the vendor data for the mux bits themselves.
<pepijndevos> So the vendor data had mux info for these UNK wires, and my code just mapped them along happily with the rest of the mux wires
<pepijndevos> It's a bit puzzling...
<pepijndevos> daveshah, next order of business: nextpnr target!! I saw you talked in some issue that you're working on a new target yourself that's reconsidering a few ecp5 decisions. So I'm wondering what's a good starting point for my Gowin target?
<daveshah> There isn't really much to work on at the moment, I'm afraid
<daveshah> nexus has a deduplication scheme that is quite complicated and probably overkill for gowin
<daveshah> I would probably largely ignore the ECP5 architecture and start from iCE40 as a rough base
<pepijndevos> hmmm, maybe? The highest Gowin devices go to 55k at the moment, that's kinda ECP5 territory in size, right?
<daveshah> Oh, I didn't realise they got that big
<daveshah> Then they do need deduplication probably
<pepijndevos> Yea the ones I support only do 9k at the moment, but later I definitely want to support GW2A which come in 18k and 55k iirc
<pepijndevos> And there is some vague info about a 100k device, but only short announcements on sketchy sites, no info on the Gowin website.
<daveshah> I should probably say there might be some API changes to nextpnr next year - I don't even know exactly what yet, but it might involve a standard framework for deduplication
<daveshah> I do want to deal with some of the current annoyances, so if you hit anything that you don't like, please tell me or make a note of it so I can take account of that in the design process
<pepijndevos> I briefly glanced at the nexus branch... and there isn't as much clutter as in the ice40/ecp5 folder. Is that just because all the stuff is missing?
<daveshah> Yes, partly, but it is also designed to be simpler
<daveshah> the lack of BRAM, DSPs or complex IO functionality is a big reason for being less cluttered
<daveshah> the code to generate the deduplicated database for nexus is https://github.com/daveshah1/prjoxide/tree/master/libprjoxide/prjoxide/src/bba
<pepijndevos> A loooong time ago I copied the ice40 as a base, but it had all this extra stuff that made it hard to see where to begin
<daveshah> Yeah
<daveshah> the entire nextpnr codebase is a bit of a mess, quite frankly, and mostly my own fault
<pepijndevos> Since I need deduplication and don't have BRAM and stuff, maybe starting from the nexus folder as a base would be not such a bad plan?
<daveshah> yeah, go for it
<pepijndevos> Then I can steal BRAM from ECP5 later or something haha
<pepijndevos> I just hope that if the nexus stuff is your current best idea about deduplication, the eventual API would be not too dissimilar, but we'll see...
<daveshah> It will probably be almost identical
<daveshah> just a bit more generic, because it might end up used for Xilinx, too
<pepijndevos> If you think a lot of major changes are coming, maybe I could also focus on fuzzing some more things and do nextpnr later.... ah okay, seems fine then
<daveshah> no, I'd start on nextpnr
<daveshah> the major changes might be the best part of a year away, it depends on a lot of variables still
<trabucayre> daveshah: maybe currently useless but I'm working on a PLL generator for nexus
<pepijndevos> oh, the bba stuff is in rust? I thought that was all python scripts
<daveshah> there is no python in the build path for nextpnr-nexus any more
<daveshah> the only python is for the fuzzers, and the nextpnr python scripting if you enable it
<pepijndevos> for speed reasons?
<daveshah> yes, and complexity
<daveshah> making python link to stuff correctly is a bit of a dark art
<pepijndevos> oh yea
<daveshah> the database build can be done in about 10 seconds for nexus, rather than almost 5 minutes for ECP5
<pepijndevos> does the bba need to link to nextpnr?
<daveshah> that's more because of the more efficient algorithm to do the dedup though
<daveshah> no, it can be mmap'd instead if you want to
<daveshah> (not the bba text file but the resulting binary from bbasm)
<pepijndevos> no i mean does the rust bba generator link to nextpnr? It just outputs a bba text file, right?
<pepijndevos> I'm fine with Rust, but my chipdb is currently just a pickle of the internal class. So if writing it in Python is not a giant pita, that would be simpler. Else I first have to find a different database format that Rust can parse.
<pepijndevos> my internal db is already deduped, depending on what that means in nextpnr terminology. There is only a single tile object per tile type.
<pepijndevos> That's basically the reason I'm having all these aliases for clock routing, because it kinda ignores tile types and just does whatever on top.
<daveshah> Yeah it just creates a text file, totally separate
<daveshah> Sounds like your database isn't far off what the dedup code needs anyway
<pepijndevos> So if I'm using Python and an already deduped db, what I need to understand is more what the final bba textfile looks like than how your rust program works I guess.
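The "single tile object per tile type" idea mentioned above can be sketched as follows. This is only an illustration of deduplication in general, not the scheme used by nextpnr, prjoxide, or Apicula; the function name `dedup_tiles` and the representation of a tile as a frozenset of (dest, src) pips are assumptions made for the example.

```python
def dedup_tiles(grid):
    """Collapse identical tiles into shared tile types.

    grid maps (row, col) to a hashable description of the tile contents.
    Returns (tile_types, locations): a list of unique descriptions, and a
    dict mapping each (row, col) to an index into that list.
    """
    type_index = {}
    tile_types = []
    locations = {}
    for loc, contents in sorted(grid.items()):
        if contents not in type_index:
            type_index[contents] = len(tile_types)
            tile_types.append(contents)
        locations[loc] = type_index[contents]
    return tile_types, locations


# Hypothetical 3-tile grid: two identical logic tiles and one different tile.
logic = frozenset({("F0", "A0"), ("F0", "B0")})
io = frozenset({("PAD", "F0")})
grid = {(0, 0): logic, (0, 1): logic, (1, 0): io}

tile_types, locations = dedup_tiles(grid)
print(len(tile_types), "tile types for", len(grid), "tiles")
print(locations)
```

Routing that ignores tile types, like the clock aliases discussed earlier, would not fit in the per-type data and would need a separate per-location table on top.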
<omnitechnomancer> Does the ice40 target still do a custom packing step? or has it switched to just placing BELs directly with validity checks?
<omnitechnomancer> I think you need to deal with things like the edge reflection for routing connections
<daveshah> It does packing
<daveshah> There's no point separating LUT and FF bels because they can't be used independently
<daveshah> The FF input can only be the LUT output
<omnitechnomancer> Ah, I see
<omnitechnomancer> But most other FPGAs do not share this property so this is not the case?
<omnitechnomancer> Ah but gowin shares this property doesn't it
<pepijndevos> yea, DFF input is LUT output. But LUT output can of course bypass DFF.
<omnitechnomancer> Does it have any ripple carry logic?
<Lofty> Yes, Gowin has ripple carry adders
<omnitechnomancer> and distributed RAM modes as well?
<omnitechnomancer> Oh, I realise why I missed that it was you who did 74-series logic synth: I was watching the video in the background at work and didn't look at the slide with the tweet on it...
<Lofty> Welp
<Lofty> omnitechnomancer: also, yes, distributed RAM
<Lofty> Which I tend to refer to as LUT RAM
<omnitechnomancer> So sorry for missing your involvement :(
<omnitechnomancer> An apt name
<Lofty> You can make it up to me by following me on Twitter :P
<omnitechnomancer> Perhaps in time I may atone
<pepijndevos> lol
<pepijndevos> Gowin adders are a bit weird though. mwk said usually they are kinda separate from LUTs, but in Gowin it appears they are just a LUT with an "express" connection to the next LUT.
<pepijndevos> Would actually be funny to see if clever techmapping can use them as... other things.
<Claude> Is this the carry chain ? Asking stupid :)
<pepijndevos> Yea, it's the carry chain
<pepijndevos> But it seems the full adder logic is just encoded in the LUT, where the normal LUT output is the sum and the carry is hardwired to the next LUT.
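A generic full adder expressed as lookup tables shows what "encoded in the LUT" could mean. This is a sketch of textbook full-adder logic only: the input ordering, the table sizes, and the names `lut_init`, `SUM_INIT`, and `CARRY_INIT` are assumptions for illustration, and the actual Gowin ALU encoding is not documented here.

```python
def lut_init(func, n_inputs):
    """Pack a truth table into an integer: bit i holds func for input index i."""
    init = 0
    for i in range(1 << n_inputs):
        bits = [(i >> k) & 1 for k in range(n_inputs)]
        if func(*bits):
            init |= 1 << i
    return init


def full_adder_sum(a, b, cin):
    return a ^ b ^ cin


def full_adder_carry(a, b, cin):
    return (a & b) | (cin & (a ^ b))


# Truth tables indexed by a | (b << 1) | (cin << 2).
SUM_INIT = lut_init(full_adder_sum, 3)
CARRY_INIT = lut_init(full_adder_carry, 3)


def ripple_add(x, y, width):
    """Add two numbers bit by bit, passing the carry to the next position."""
    carry = 0
    result = 0
    for k in range(width):
        idx = ((x >> k) & 1) | (((y >> k) & 1) << 1) | (carry << 2)
        result |= ((SUM_INIT >> idx) & 1) << k  # the normal LUT output
        carry = (CARRY_INIT >> idx) & 1  # the link to the next stage
    return result, carry


print(hex(SUM_INIT), hex(CARRY_INIT))
print(ripple_add(5, 3, 4))
```

In this model the sum table plays the role of the normal LUT output and the carry table stands in for the hardwired link to the next LUT.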
<Claude> Time digital converter comes to mind
<pepijndevos> So you could say... Gowin LUTs are frangible
<pepijndevos> ^Lofty
<pepijndevos> a what?
<pepijndevos> https://en.wikipedia.org/wiki/Time-to-digital_converter I assume, but how does that relate to a carry chain?
<Claude> As a (predictable) delay element
<pepijndevos> ah I see
<pepijndevos> hmmm yea could work maybe
<pepijndevos> Well, if anyone is looking for a fun documentation project... I kind of know how ALU works, but have not written any code for it or figured out the details. So it's something that's not too complicated and that I can kind of give guidance about. Effort so far: https://www.youtube.com/watch?v=Vt7FyOXfkZA
<Lofty> pepijndevos: that didn't ping me; try a space next time :P
<pepijndevos> Would be fun if it turns out you can use a LUT4 as two LUT3 with one going to the carry out.
<Lofty> In other news I've been working on a LUT mapper.
<Lofty> It's...pretty neat, honestly.
<pepijndevos> ^ another reason for omni to follow Lofty haha
<pepijndevos> pfff need to read the nextpnr documentation again with a non-fried brain. Now just browsing the code and getting distracted hehe
<Lofty> The problem with writing a LUT mapper is that I don't have many people to talk to about it :(
FabM has quit [Quit: Leaving]
<pepijndevos> Yea once you get into all these really obscure things you're very quickly in a very small group of people.
<pepijndevos> Like... how many people in the world document FPGAs?
<Lofty> More people than write LUT mappers, pepijndevos
<pepijndevos> haha maybe
<pepijndevos> Maybe there are a few in academia
<pepijndevos> todo: write lut mapper to talk about it with lofty. (actually want to do this but for ASIC)
<omnitechnomancer> pepijndevos: did you ever get around to building your bit serial CPU in 74-series, did the boards come in?