<hackerfoo>
ZirconiumX: Currently non-existent source code is also undocumented and without examples :)
<hackerfoo>
It might be easier to add documentation and examples to LiteDRAM.
<ZirconiumX>
First I need to understand LiteDRAM. Writing the controller gives me an excellent understanding of it that I wouldn't have as an outsider.
<hackerfoo>
We (Symbiflow) are using it to build an SoC.
lutsabound has quit [Quit: Connection closed for inactivity]
lopsided98 has quit [Quit: Disconnected]
lopsided98 has joined ##openfpga
dh73 has quit [Quit: Leaving.]
dh73 has joined ##openfpga
dh73 has quit [Client Quit]
OmniMancer has joined ##openfpga
nrossi has joined ##openfpga
Jybz has joined ##openfpga
Jybz has quit [Quit: Konversation terminated!]
rohitksingh has joined ##openfpga
<_florent_>
ZirconiumX: it's true that LiteDRAM is not heavily documented and probably not that easy to start with; that's something i'm aware of and that i want to improve
<_florent_>
Most of the designs use it directly integrated with LiteX (so using the Migen code, as in the example provided by hackerfoo)
<_florent_>
but it's also possible to use the generator to configure the core and create a standalone Verilog core that you can just reuse like any other Verilog core in your design
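For the standalone route, the generated Verilog can be pulled into an nMigen design as a black box. A minimal sketch, assuming nMigen's Instance API; the module name and port names here are purely hypothetical, the real ones come from the Verilog the generator emits for your configuration:

    # Hypothetical wrapper around a generated standalone LiteDRAM core.
    from nmigen import Elaboratable, Module, Signal, Instance, ClockSignal, ResetSignal

    class SDRAMWrapper(Elaboratable):
        def __init__(self):
            self.adr   = Signal(30)
            self.dat_w = Signal(32)
            self.dat_r = Signal(32)
            self.we    = Signal()
            self.cyc   = Signal()
            self.stb   = Signal()
            self.ack   = Signal()

        def elaborate(self, platform):
            m = Module()
            m.submodules.litedram = Instance(
                "litedram_core",                          # hypothetical module name
                i_clk=ClockSignal(), i_rst=ResetSignal(),
                i_user_port_wishbone_0_adr=self.adr,      # hypothetical port names:
                i_user_port_wishbone_0_dat_w=self.dat_w,  # check the generated .v
                o_user_port_wishbone_0_dat_r=self.dat_r,
                i_user_port_wishbone_0_we=self.we,
                i_user_port_wishbone_0_cyc=self.cyc,
                i_user_port_wishbone_0_stb=self.stb,
                o_user_port_wishbone_0_ack=self.ack,
            )
            return m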
<_florent_>
i'm happy to help you get started if you want to use it, and to explain how it works.
<_florent_>
But i can also understand you want to try to write your own :) (but that's not something that can be done in a few days: just getting the 7-Series PHY working, doing the DDR3 initialization and read/write leveling was already a few weeks of effort, and that's not even the "controller" itself)
<hackerfoo>
It's also easier to write your own when you have a working example to compare against.
<kc8apf>
_florent_: I'm trying to comprehend LitePCIe and the lack of docs and comments is painful
<_florent_>
kc8apf: yes i know, sorry for that, that's really something i want to improve on the different cores (any help is also welcome :) or even just feedback on the difficulties understanding/using them)
<kc8apf>
I _think_ I understand LitePCIe's DMA but there is a lot going on
<kc8apf>
any notes on that would be very helpful when I get back to this project next week
<_florent_>
kc8apf: ok, i'll add documentation to it this week
<kc8apf>
thank you
<kc8apf>
I'll try to write things up as I figure them out and send PRs as well
<kc8apf>
i'm off for the night
<_florent_>
ok thanks, good night
<ZirconiumX>
_florent_: my goal here is to use nMigen and pysim, so serialising a LiteDRAM core to Verilog is unhelpful
<_florent_>
ZirconiumX: so you want to be able to simulate your DDR3 controller in nMigen?
<ZirconiumX>
Or alternatively to use yours
<ZirconiumX>
But yeah
<_florent_>
the difficulties with simulation are: 1) you need a model of your DDR3 (i generally use Micron's), 2) you need models or simulation libraries of the FPGA primitives you are using
<keesj>
kc8apf may i suggest you start documenting the thing yourself and ask questions when things are not clear? I am also not a pro but in some cases I might be able to help
<_florent_>
so pure nMigen simulation is difficult for that (you could create models in nMigen, but then you are checking things against your own understanding of how it should work)
<ZirconiumX>
Perhaps, but my target here is the ECP5, which as I understand it already has a good simulation library
<mwk>
ZirconiumX: no, you need a model of the *RAM*
<ZirconiumX>
I can read, mek
<ZirconiumX>
*mwk
<mwk>
ugh
<mwk>
it seems I can't :p
<mwk>
sorry
<ZirconiumX>
Go get some coffee :P
<mwk>
that's a good idea
<_florent_>
ZirconiumX: you could probably use Verilator for your simulation, but i don't know if nMigen already supports it natively
<ZirconiumX>
_florent_: it does not.
<mwk>
on the bright side, I finally managed to get up at a reasonable-ish combination of (time, time spent sleeping) today, for the first time in several weeks
<ZirconiumX>
I could use Verilator, but then that involves dumping the nMigen/oMigen code to Verilog and then turning it to C++, and it gets increasingly indirect.
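The indirection looks roughly like this, assuming the nmigen.back.verilog backend; `Top` and its ports are placeholders:

    # Dump the nMigen design to Verilog, which Verilator then compiles to C++.
    from nmigen.back import verilog
    from my_design import Top   # hypothetical user module with rx/tx ports

    top = Top()
    with open("top.v", "w") as f:
        f.write(verilog.convert(top, ports=[top.rx, top.tx]))

    # then, outside Python (illustrative invocation):
    #   verilator --cc top.v --exe sim_main.cpp
    #   make -C obj_dir -f Vtop.mk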
<mwk>
let's hope this lasts
<ZirconiumX>
Congrats, mwk
<_florent_>
ZirconiumX: in your case, if you want to use LiteDRAM, i would just do the simulation of your system with an nMigen memory model that has a Wishbone/AXI interface, and only use the LiteDRAM core when you are targeting hardware, but the behavior will not be exactly what you'll have on hardware (bandwidth, latency, ...)
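A minimal sketch of such a stand-in, assuming nMigen's Memory API and a simplified classic-Wishbone-style handshake (the signal names are illustrative, not a real bus Record); it models none of LiteDRAM's bandwidth or latency behaviour:

    from nmigen import Elaboratable, Module, Signal, Memory

    class SimRAM(Elaboratable):
        """Behavioural memory usable in pysim in place of the real DRAM controller."""
        def __init__(self, adr_width=10, data_width=32):
            self.adr   = Signal(adr_width)
            self.dat_w = Signal(data_width)
            self.dat_r = Signal(data_width)
            self.we    = Signal()
            self.cyc   = Signal()
            self.stb   = Signal()
            self.ack   = Signal()
            self.mem   = Memory(width=data_width, depth=2**adr_width)

        def elaborate(self, platform):
            m = Module()
            m.submodules.rd = rd = self.mem.read_port()
            m.submodules.wr = wr = self.mem.write_port()
            m.d.comb += [
                rd.addr.eq(self.adr),
                self.dat_r.eq(rd.data),
                wr.addr.eq(self.adr),
                wr.data.eq(self.dat_w),
                wr.en.eq(self.cyc & self.stb & self.we & ~self.ack),
            ]
            # single-cycle ack, asserted the cycle after a request is seen
            m.d.sync += self.ack.eq(self.cyc & self.stb & ~self.ack)
            return m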
<ZirconiumX>
That's probably going to be the mother of all simulation/synthesis mismatches.
<mwk>
*thinking* speaking of simulation models, I want a verilog sim model of human sleep, so that I can plug in the sleep deprivation numbers for the last week and yesterday's sleep binge and see if I'm actually close to getting it right
<mwk>
ah well, probably going to screw up soon anyway
<mwk>
[end of mwk's pre-coffee morning thoughts]
<_florent_>
ZirconiumX: it seems you are on your own then :) (since i don't have anything better to suggest)
<daveshah>
FYI, the FOSS simulation library for ECP5 is not very good
<daveshah>
I need to work on it, right now none of the DDR3 primitives have models
<daveshah>
The vendor models aren't really Verilator compatible afaik, but this might be unavoidable for some of the DDR3 stuff due to delays etc
<OmniMancer>
Are the yosys equivalence checking commands for showing that one design is equivalent to another?
juri__ has quit [Ping timeout: 240 seconds]
<daveshah>
OmniMancer: yes, although don't expect them to work miracles
<daveshah>
You can do it either with miter and sat or the equiv_ passes
juri_ has joined ##openfpga
<daveshah>
If you want to test a Yosys pass then there is equiv_opt
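A hedged sketch of the miter-and-sat route, driven from Python for convenience. The commands follow my reading of the Yosys miter/sat documentation (check `help miter` and `help sat`); the file and module names are placeholders, and the equiv_make/equiv_simple/equiv_induct/equiv_status passes are the other option mentioned above:

    import subprocess, textwrap

    # spec.v must contain module `spec`, impl.v module `impl`, with identical ports
    script = textwrap.dedent("""\
        read_verilog spec.v
        read_verilog impl.v
        prep
        miter -equiv -flatten -make_outputs spec impl miter
        sat -verify -prove trigger 0 -show-inputs -show-outputs miter
    """)

    with open("equiv.ys", "w") as f:
        f.write(script)
    # yosys exits non-zero if the SAT solver finds a counterexample
    subprocess.run(["yosys", "-s", "equiv.ys"], check=True)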
<OmniMancer>
In that context, what does structural equivalence mean?
scream has quit [Write error: Broken pipe]
jfng has quit [Read error: Connection reset by peer]
swedishhat[m] has quit [Remote host closed the connection]
henriknj has quit [Remote host closed the connection]
xobs has quit [Read error: Connection reset by peer]
<OmniMancer>
daveshah: how does nextpnr deal with global clock meshes?
pepijndevos[m] has quit [Write error: Connection reset by peer]
synaption[m] has quit [Remote host closed the connection]
<daveshah>
OmniMancer: just as wires and pips like anything else
<daveshah>
You might need a custom pass to promote them, and possibly route them correctly in some cases
<daveshah>
But it's usually not much code
<OmniMancer>
so does custom code try to put clock like nets on them?
<daveshah>
That was nommu. Not sure how using the mmu affects memory footprint
<OmniMancer>
Oh interesting
<OmniMancer>
do the FPGA SoCs that run linux typically have an MMU?
<daveshah>
VexRiscv does, yes
<daveshah>
In fact you may have to use the mmu for VexRiscv because I don't know if 32 bit nommu is working yet
<daveshah>
I have always used the mmu
<OmniMancer>
I think the only reason the k210 doesn't use an mmu is that the core is using an earlier version of the RISC-V spec or something so it doesn't match current requirements
<pepijndevos>
And then for small packages you're not going to use all the IOB, fine.
<pepijndevos>
But wait... what if you want to cram a really tiny FPGA in a package with waaay too many pins?
<pepijndevos>
Oh, I know, I'll make an IOB that has TEN pins....
<daveshah>
lol
<whitequark>
um, what
<mwk>
whitequark: there's a special kind of IO tile in small gowin devices that has 10 pins crammed in it
<daveshah>
I guess without any IOLOGIC?
<mwk>
it has no usual serdes/flops
<mwk>
only bare input/output/oe
<whitequark>
gowin has serdes in io tiles?
<mwk>
yes
<mwk>
well, it has the thing that xilinx calls serdes
<mwk>
i.e. parallel/serial converters
<pepijndevos>
If you look at the picture I posted, the red IOB are not used at all. So like... we put all these IOB with sweet differential pairs and whatnot, but we'll disable half of them and use MEGA IOB
<mwk>
not the gigabit stuff
<mwk>
*sigh* can't we all agree at least on terminology
<pepijndevos>
hah no
<whitequark>
right, i figured
<whitequark>
i call those blocks "XDR"
<whitequark>
and actual high speed stuff with CDR "SERDES"
<whitequark>
but that's just me
<mwk>
XDR?
<whitequark>
you know, like DDR, but generalized for gearbox ratio more than 2
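As a rough illustration of the gearbox idea, a behavioural 4:1 output gearbox in nMigen, everything in one fast clock domain (a real XDR/ODDR-style I/O primitive also handles the fast/slow domain crossing and the pad itself):

    from nmigen import Elaboratable, Module, Signal

    class OutGearbox4(Elaboratable):
        """Shifts a 4-bit parallel word out one bit per fast-clock cycle, LSB first."""
        def __init__(self):
            self.par = Signal(4)   # parallel word, sampled once every 4 fast cycles
            self.out = Signal()    # serial output

        def elaborate(self, platform):
            m = Module()
            phase = Signal(2)
            shreg = Signal(4)
            m.d.sync += phase.eq(phase + 1)
            with m.If(phase == 0):
                m.d.sync += shreg.eq(self.par)    # load a new word
            with m.Else():
                m.d.sync += shreg.eq(shreg >> 1)  # shift the previous one out
            m.d.comb += self.out.eq(shreg[0])
            return m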
<mwk>
huh
<mwk>
and CDR?
<pepijndevos>
aka eXtra data rate
<whitequark>
mwk: clock and data recovery
<mwk>
oh, that's a nice term
<whitequark>
the part of a SERDES block that extracts the clock embedded in data
<OmniMancer>
so things that do stuff like 8b10b?
<mwk>
pepijndevos: any clue on how many of these red io tiles are actually attachment points for one-off blocks?
<whitequark>
yes
<OmniMancer>
or tmds
<daveshah>
Just to confuse things, I think the (non-existent) ECP4 was going to have hard CDR blocks to combine with regular IO pins for SGMII
<pepijndevos>
whoa, there are FPGAs with embedded hardware clock recovery?
<mwk>
I don't think I've seen any FPGA with tmds support
<whitequark>
pepijndevos: sure, ECP5
<OmniMancer>
pepijndevos: for serial channels yes
<mwk>
pepijndevos: uhhh, like all non-low-end ones
<whitequark>
yeah
* pepijndevos
goes off building radio receivers
<mwk>
xilinx calls these "gigabit transceivers"
<mwk>
they're meant for serial protocols like PCIe and SATA, and you cannot really use them for anything else
<daveshah>
Usually with a significant minimum frequency, btw
<daveshah>
ECP5 is something like 200Mbit/s min
<mwk>
notably, TMDS is a shitshow on FPGAs
<OmniMancer>
alas
<pepijndevos>
mwk, I think on the bigger Gowins, the special attachment points are colored blue, but I might be wrong.
<mwk>
pepijndevos: but the -1 part has some singleton hw as well, doesn't it?
<pepijndevos>
eh, does it? Maybe the S or Z variants have some funky stuff onboard.
* mwk
consults notes
<mwk>
pepijndevos: well there's this Ufb thing, whatever it is
<mwk>
it's stuffed in the corners (upper left and upper right) and in the PLL tile though
<mwk>
so doesn't take up any IO tiles
<pepijndevos>
... user feedback?
<mwk>
so uh... I got nothing
<mwk>
NFI
<OmniMancer>
daveshah: how do I work out why nextpnr-generic doesn't like my pip?
<daveshah>
What do you mean, doesn't like?
ZipCPU|Laptop has quit [Ping timeout: 276 seconds]
<pepijndevos>
OmniMancer, what I did is add code.interact(local=locals()) and poke at the ctx
<OmniMancer>
daveshah: this kind of doesn't like: "IndexError: _Map_base::at"
<daveshah>
Ah, you need to create the wires first
<daveshah>
This could probably be a better error
<OmniMancer>
AFAIK the wires exist
<pepijndevos>
From my experience: no, they don't, quite, really, exist in the way I thought haha
<OmniMancer>
Oh no I know why
<OmniMancer>
I haven't prefixed the names with the tile location
<pepijndevos>
hehehe oops
pie_ has quit [Ping timeout: 276 seconds]
<OmniMancer>
so it's trying to connect ce0 to e1beg2 but that isn't what the wires are named
<daveshah>
Aha
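A hedged sketch of the two lessons for a generic-arch script; the ctx.addWire argument names follow the generic examples shipped with nextpnr as I remember them, so check them against your checkout:

    def wire_name(x, y, local):
        # "ce0" on its own collides across tiles; "X3Y5_ce0" does not
        return "X%dY%d_%s" % (x, y, local)

    def add_tile_wires(ctx, x, y, local_names):
        # every wire has to be registered before any pip refers to it,
        # otherwise addPip dies with the "_Map_base::at" IndexError above
        for n in local_names:
            ctx.addWire(name=wire_name(x, y, n), type=n, x=x, y=y)

    # pips are then added with ctx.addPip(...), giving srcWire/dstWire the same
    # tile-prefixed names; for poking at a half-built context interactively,
    # pepijndevos' trick works anywhere in the script:
    #   import code; code.interact(local=locals())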
<OmniMancer>
Is it going to yell at me if it cannot place any IOBs btw?
<OmniMancer>
So far it seems to have mostly just consumed all CPU time
rohitksingh has quit [Ping timeout: 246 seconds]
<OmniMancer>
Ah I see it does
<OmniMancer>
daveshah: are the IOBs inserted by nextpnr?
<daveshah>
Well, the rapidwright flow would support it with minimal changes
<daveshah>
It's just I have no hardware and no interest in making those changes
<daveshah>
The whole flow is a proof of concept anyway
<sorear>
oh, there’s no self-contained us+ or 7 yet?
<daveshah>
There is self contained for xc7 using xray
<daveshah>
But there is no public bitstream documentation for xcup yet
<daveshah>
Some people at Manchester also did something with nextpnr and Ultrascale+
<daveshah>
But they didn't publish the bitstream docs in a useful format
<OmniMancer>
daveshah: hmmm I am now getting another pip addition failure but I need to sleep
<pepijndevos>
Oh hey, imagine you're an EDA software dev that needs to design a file format for FPGA bitgen. Cool, so you need a way to map some abstract features to bit locations, right?
<pepijndevos>
Hmmm, I know, so for every tile I'll make tables that map from features to bits :))
<OmniMancer>
daveshah: is there any reason for the map error if both wires do exist?
<OmniMancer>
oh no I see it
<pepijndevos>
But instead of storing the bit location in the table, I'll make another table, and pack the XY position in a single integer in decimal notation.
<OmniMancer>
there is a ,
<pepijndevos>
And haha, you thought those features would be consistent between FPGAs? Nope, I'll just make them different just because, so there are 3 layers of meaningless mapping before reaching a bit location https://bpaste.net/show/E2YPA
<OmniMancer>
pepijndevos: how wonderful :/
<pepijndevos>
Note how the tile locations are exactly the same in GW1N-1 and GW1NR-9, but for some obscure reason most of them are off by one between them.
genii has joined ##openfpga
Asu has joined ##openfpga
<OmniMancer>
nextpnr sure likes eating memory
<daveshah>
That is unfortunately due to how the generic arch works atm
<daveshah>
Each pip needing a name doesn't scale very well
<OmniMancer>
nearly 2 GB
<OmniMancer>
and it isn't done with the interconnect
<daveshah>
Yeah, all I can suggest is shorter pip names
<daveshah>
(for comparison, a proper arch like iCE40 or ECP5 needs about 128MB minimum RAM)
<daveshah>
Probably the long term solution is to make generic pip/wire identifiers a (location, name idstring) tuple
<daveshah>
this way, where a name is used in multiple tiles, the name string is only stored once; the identifier is then a total of 12 bytes
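The idea, sketched in Python for illustration (nextpnr itself is C++): local names get interned once, and each wire/pip identifier is just a location plus an index into the string pool:

    class NamePool:
        """Interns local wire/pip names so each distinct string is stored once."""
        def __init__(self):
            self._index = {}
            self._strings = []

        def intern(self, s):
            if s not in self._index:
                self._index[s] = len(self._strings)
                self._strings.append(s)
            return self._index[s]

        def lookup(self, idx):
            return self._strings[idx]

    pool = NamePool()

    def wire_id(x, y, local_name):
        # "E1BEG2" is stored once no matter how many tiles contain a wire with
        # that name; each identifier is just three small integers
        return (x, y, pool.intern(local_name))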
<OmniMancer>
yes having the location be part of the id would help with that immensely
<OmniMancer>
I got a successful place and route, but hadn't changed one of the later scripts, so it errored there
m4ssi has quit [Remote host closed the connection]
emeb has joined ##openfpga
pie_ has joined ##openfpga
pie_ has quit [Remote host closed the connection]
pie_ has joined ##openfpga
kernlbob has quit [Ping timeout: 250 seconds]
craigjb has quit [Ping timeout: 245 seconds]
craigjb has joined ##openfpga
brizzo has quit [Ping timeout: 250 seconds]
grantsmith has quit [Ping timeout: 250 seconds]
brizzo has joined ##openfpga
grantsmith has joined ##openfpga
pie_ has quit [Ping timeout: 246 seconds]
pie_ has joined ##openfpga
<azonenberg>
daveshah: that isn't the case already?
<azonenberg>
it would not surprise me to know that one or more of the reasons the xilinx chip databases are so massive is that they have some equally inefficient duplication of resources :p
<mwk>
you'd be correct
<mwk>
I don't know how Vivado stores its database, but ISE is... not great at deduplicating
<daveshah>
It's a shame given how repetitive Xilinx arches are really
pie_ has quit [Ping timeout: 250 seconds]
Dolu has quit [Ping timeout: 240 seconds]
<azonenberg>
daveshah: i know
<azonenberg>
like, thinking about how i'd build the architecture files for a typical part
<azonenberg>
i honestly think i could get the arch spec for at least the core fabric of a smaller xilinx part to fit 100% in L3 cache
<daveshah>
Yeah, it's definitely possible
<azonenberg>
And adding new devices would be a few kB each
<azonenberg>
basically just describing the size of a clock region, the number of clock regions and their arrangement
<daveshah>
I haven't personally tried something like this yet
<azonenberg>
and the set of columns
<daveshah>
What makes it slightly tricky is the connections between different kinds of tiles
<azonenberg>
still, couldn't you just create one copy of the config for "clb adjacent to dsp" or similar
<azonenberg>
and just have pointers to that?
<azonenberg>
i'd assume based on my experience with coolrunner that you have a few basic tile types and then just flip/mirror variants of them
<daveshah>
Oh yeah, definitely
<daveshah>
It's mostly a case of making sure that a doubly linked routing graph isn't too slow to traverse
<azonenberg>
if it fits in cache, i'd suspect that the benefits from lower latency more than outweigh the extra fetches
<daveshah>
Yes, almost certainly
<daveshah>
It's mostly avoiding any actual n^2 type situations where data isn't available without a full search
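Sketching that deduplication in Python for illustration: each distinct tile flavour ("clb adjacent to dsp" and so on) is described once, and the per-device data is little more than a grid of references (the fabric dimensions and column pattern below are made up):

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TileType:
        name: str
        wires: tuple   # local wire names, shared by every instance of this type
        pips: tuple    # (src, dst) pairs over those local names

    # one copy of each flavour...
    CLB        = TileType("CLB", ("A", "B", "O"), (("A", "O"), ("B", "O")))
    CLB_BY_DSP = TileType("CLB_BY_DSP", ("A", "B", "O", "TO_DSP"),
                          (("A", "O"), ("B", "O"), ("O", "TO_DSP")))

    # ...and the device itself is mostly pointers to them
    grid = {(x, y): (CLB_BY_DSP if x % 10 == 9 else CLB)
            for x in range(100) for y in range(100)}

    def wire(x, y, local_name):
        # a concrete wire is identified by location plus local name,
        # the same trick as the (location, name idstring) tuple above
        assert local_name in grid[(x, y)].wires
        return (x, y, local_name)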
pie_ has joined ##openfpga
rohitksingh has joined ##openfpga
Bob_Dole has quit [Ping timeout: 250 seconds]
Dolu has joined ##openfpga
<tpw_rules>
so i've been thinking about a boneless register allocator too but i don't really understand graph coloring allocation
<tpw_rules>
like assigning registers without control flow is trivial. but idk how to really do it with control flow. i was reading briggs' thesis and it dismisses some ideas which seem to work, maybe for performance reasons?
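For what it's worth, the core simplify/select loop of Chaitin/Briggs-style colouring is short once the interference graph exists; a minimal sketch, with no coalescing, a naive spill heuristic, and the actual spill rewriting left out:

    def color(graph, k):
        """graph: {var: set of interfering vars}; k: number of registers."""
        stack = []
        remaining = set(graph)
        while remaining:
            # simplify: prefer a node with fewer than k live neighbours;
            # otherwise optimistically push a high-degree spill candidate
            def degree(v):
                return len(graph[v] & remaining)
            low = [v for v in remaining if degree(v) < k]
            node = low[0] if low else max(remaining, key=degree)
            stack.append(node)
            remaining.remove(node)

        coloring, spills = {}, []
        while stack:
            node = stack.pop()          # select: assign in reverse removal order
            used = {coloring[n] for n in graph[node] if n in coloring}
            free = [c for c in range(k) if c not in used]
            if free:
                coloring[node] = free[0]
            else:
                spills.append(node)     # would be rewritten with loads/stores and retried
        return coloring, spills

    # toy example: three mutually interfering variables and only two registers
    print(color({"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}, k=2))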
nrossi has quit [Quit: Connection closed for inactivity]
AndrevS has quit [Quit: umount /dev/irc]
Bob_Dole has joined ##openfpga
mumptai has joined ##openfpga
ZipCPU|Laptop has joined ##openfpga
ZipCPU|Laptop has quit [Remote host closed the connection]