<zignig>
tpw_rules: also if you use most of the spram you could have a banked frame buffer and draw the led matrix in gateware.
<tpw_rules>
what do you mean?
<zignig>
that way the cpu would be free to do fancy stuff ;)
<tpw_rules>
it already is
<tpw_rules>
those LED panels require very fast drivers on account of they don't have PWM built in
<zignig>
have two blocks of spram that are banked, write the image into one and then flip to the other.
<tpw_rules>
i need to be able to read 6 words at once
<zignig>
one can be written by the cpu while the other gets written out to the screen, write flip repeat.
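The write/flip/repeat scheme described here can be modelled behaviourally in a few lines of Python. This is only a sketch: the class name, method names and bank size are made up for illustration, and it says nothing about how many read ports real SPRAM would offer.

```python
class PingPongFrameBuffer:
    """Behavioural model of two banked buffers (hypothetical names)."""

    def __init__(self, words_per_bank):
        # Two independent banks of the same size.
        self.banks = [[0] * words_per_bank, [0] * words_per_bank]
        self.front = 0  # index of the bank currently scanned out to the display

    @property
    def back(self):
        # The bank that is NOT being displayed; the CPU draws into this one.
        return self.front ^ 1

    def cpu_write(self, addr, value):
        self.banks[self.back][addr] = value

    def display_read(self, addr):
        return self.banks[self.front][addr]

    def flip(self):
        # Swap roles: the freshly drawn bank becomes visible.
        self.front ^= 1


fb = PingPongFrameBuffer(words_per_bank=4)
fb.cpu_write(0, 0x3FF)         # draw into the hidden bank
before = fb.display_read(0)    # display still sees the old contents
fb.flip()
after = fb.display_read(0)     # now the new pixel is visible
```

The point of the arrangement is that the CPU never writes into the bank that is being scanned out, so a half-drawn frame is never shown.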
<tpw_rules>
i wish i could do that but i need six read ports at 40MHz
<tpw_rules>
or a ghastly tangle of logic to interleave everything correctly
<zignig>
ah , still cool , just floating ideas ....
<tpw_rules>
yeah i wish i could have done that
<tpw_rules>
fortunately it's not too bad. i have six brams per 32x16 panel so i could add one more without a problem
<tpw_rules>
i would just have to sacrifice one bit of color resolution
Asu` has quit [Remote host closed the connection]
<zignig>
only 10 bit color, oh noes !
<tpw_rules>
it would be 9 at that point
<tpw_rules>
and the problem is that it needs serious gamma correction
<tpw_rules>
so if you don't have lots of bits of color it gets real splotchy at the low end
<zignig>
ah, I like the fact that you can create the gamma curve when you create the instance rather than an external load with verilog.
* zignig
has to deliver the progeny to school
<tpw_rules>
yeah i'm very happy memory init works
<tpw_rules>
also i just want you to know i thought i was extremely clever for putting the gamma correction on the write side instead of the read side
<zignig>
having the two phase delay that does gamma just before you write it. yes ... very clever :)
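A gamma table of the kind discussed above could be generated along these lines and then used as the initial contents of a memory. This is a sketch under assumptions: the 8-bit input width, the 10-bit and 9-bit output widths and the gamma value of 2.2 are illustrative choices, not figures taken from the actual LED driver.

```python
def make_gamma_table(in_bits=8, out_bits=10, gamma=2.2):
    """Map linear input codes to gamma-corrected output codes (assumed widths)."""
    in_max = (1 << in_bits) - 1
    out_max = (1 << out_bits) - 1
    return [round(out_max * (i / in_max) ** gamma) for i in range(in_max + 1)]


def distinct_levels(table, first_n):
    """Count how many different output codes the darkest first_n inputs get."""
    return len(set(table[:first_n]))


table10 = make_gamma_table(out_bits=10)
table9 = make_gamma_table(out_bits=9)

# Fewer output bits means the darkest inputs collapse onto fewer levels,
# which is the "splotchy at the low end" effect.
dark10 = distinct_levels(table10, 32)
dark9 = distinct_levels(table9, 32)
```

Applying the table on the write side means each pixel is corrected once when the CPU stores it, rather than on every refresh of the panel.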
<tpw_rules>
wq: just tried synthesizing boneless with 14 bit W and it only lost me 600kHz of fmax. it must not be the critical path
<tpw_rules>
or maybe i have very fast ram attached
pointfree has quit [*.net *.split]
kmehall has quit [*.net *.split]
_whitenotifier has quit [*.net *.split]
pinoaffe has quit [*.net *.split]
jhol has quit [*.net *.split]
sensille has quit [*.net *.split]
awygle has quit [*.net *.split]
<whitequark>
tpw_rules: uh
<whitequark>
are you sure that actually works
<whitequark>
because it's implemented as W|Rx
<whitequark>
or rather W||Rx
<tpw_rules>
yes i replaced it with w<<2 + Rx
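The difference between the two register addressing schemes can be sketched as follows. The 3-bit register index is an assumption made for illustration, and the function names are hypothetical; only the expressions W||Rx and w<<2 + Rx come from the discussion.

```python
REG_BITS = 3  # assumption: 8 registers per window, so a 3-bit register index


def reg_addr_concat(w, rx):
    """W||Rx: concatenate window and register index. No carry chain needed."""
    return (w << REG_BITS) | rx


def reg_addr_add(w, rx, shift=2):
    """w<<2 + Rx: windows may overlap, but this needs a real adder."""
    return (w << shift) + rx


# With concatenation every (w, rx) pair maps to a unique address.
a = reg_addr_concat(1, 7)

# With the shifted add, neighbouring windows overlap:
# window 1, register 7 and window 2, register 3 are the same word.
b = reg_addr_add(1, 7)
c = reg_addr_add(2, 3)
```

Concatenation is free in hardware because it is just wiring, whereas the shifted add costs an adder on the address path.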
<whitequark>
alright
<whitequark>
so it's great we kept our options open, right
<tpw_rules>
i thought the critical path was in the ALU anyway
<tpw_rules>
that's at least what i can glean from the timing report
<whitequark>
hm. that's true
<whitequark>
tbh i find the boneless fmax very disappointing
<whitequark>
it might be worth adding a decode stage
<whitequark>
hm
<tpw_rules>
anyway should i submit all my things as different issues? off the top of my head i can think of increasing W, renaming all the jumps, assembler R0 == 0, assembler label stuff
kmehall has joined ##openfpga
pinoaffe has joined ##openfpga
_whitenotifier has joined ##openfpga
sensille has joined ##openfpga
jhol has joined ##openfpga
awygle has joined ##openfpga
pointfree has joined ##openfpga
<whitequark>
feel free to rename the jumps in the manual PR
<whitequark>
adding an adder to W isn't something i'm comfortable doing
<tpw_rules>
ok. i will make a table as a comment so we can decide
<whitequark>
remember how the goal of boneless was to be a 300 LUT CPU?
<whitequark>
an adder there is 14 more LUTs
<tpw_rules>
no i never heard that
<whitequark>
oh ok
<whitequark>
well yeah it kinda was
<tpw_rules>
also it's still uh got a long ways
<whitequark>
well sure
<whitequark>
but we can avoid making it worse
<tpw_rules>
fair. how good is the toolchain at optimizing e.g. common subexpression elimination? i did it the naïve way with four more adders and got 58 more LUTs
<whitequark>
it sucks
<whitequark>
abc9 sucks somewhat less
<tpw_rules>
good to know
<whitequark>
you're using abc9 right
<tpw_rules>
i don't know? i'm using master yosys and nextpnr and nmigen and boneless, as of like five days ago
<whitequark>
so no
<whitequark>
use synth_ice40 -abc9
<whitequark>
uh, sec
<whitequark>
NMIGEN_synth_opts=-abc9
<tpw_rules>
as an environment variable?
<tpw_rules>
fpga toolchains will always be the same :P
<whitequark>
hm?
<whitequark>
you can specify it in your code as well, .build(synth_opts="-abc9")
<tpw_rules>
random environment variables controlling essential settings
<tpw_rules>
oh
<whitequark>
the environment var is just for easy testing
<whitequark>
esp someone else's designs
<cr1901_modern>
And users with wacky environments can set them to suit their wacky environments.
<tpw_rules>
well it added 5 LUTs but got me 3MHz more fmax in the LED driver
<whitequark>
hm, interesting
<whitequark>
that seems about right, but usually the results are more impressive
<whitequark>
and yes, abc9 optimizes for delay, not LUT count at all
<tpw_rules>
and 1 in boneless
<tpw_rules>
yeah it went from 34 to 37. technically it needs to be at 40 but it works anyway. i hear fmax is pessimistic
<whitequark>
yes, it's worst case over all PVT
<whitequark>
*PVT variation
* tpw_rules
misread V as volume
<tpw_rules>
wait what's P then
<whitequark>
process voltage temperature
<tpw_rules>
yeah i knew what you meant but i kept reading it as pressure volume temperature
<tpw_rules>
constraining the clock still makes fmax worse though
<sorear>
I’m sure delay parameters will change at a few 100 MPa
<tpw_rules>
anyway i need to stop thinking and have some dinner...
<tpw_rules>
and they get better as the process shrinks and die volume lowers
<sorear>
I think generally bandgaps shrink at high
<sorear>
pressure, and eventually Si becomes a metal and nothing works at all, but this is a mofh question
<tpw_rules>
you can un-dope silicon at high pressure?
kmehall has quit [*.net *.split]
_whitenotifier has quit [*.net *.split]
pinoaffe has quit [*.net *.split]
jhol has quit [*.net *.split]
pointfree has quit [*.net *.split]
awygle has quit [*.net *.split]
sensille has quit [*.net *.split]
<whitequark>
ZirconiumX: thanks for your work on improving flowmap btw
<whitequark>
even if I haven't had time to fix those issues yet, I'm definitely going to do it sometime later
<ZirconiumX>
whitequark: Honestly I find it interesting to have multiple approaches to something (in this case LUT mapping), each with advantages/disadvantages
<ZirconiumX>
On some of the benchmarks I have it seemed like `flowmap -relax` tended to pessimise area over even pure `flowmap`, but I would have to build some hard stats first
azonenberg_work has quit [Ping timeout: 268 seconds]
sensille has joined ##openfpga
kmehall has joined ##openfpga
pinoaffe has joined ##openfpga
_whitenotifier has joined ##openfpga
awygle has joined ##openfpga
pointfree has joined ##openfpga
jhol has joined ##openfpga
<whitequark>
ZirconiumX: relaxation is fundamentally heuristic in flowmap
<whitequark>
and those heuristics haven't really been tuned
<ZirconiumX>
So, given abc9 uses delay statistics for optimisation, are there plans for anything similar for flowmap?
<whitequark>
yes, and in fact flowmap is much less limited than abc, in theory
<whitequark>
currently flowmap uses unit delay model
<ZirconiumX>
That's useful to note
<whitequark>
since it predates the work in yosys that exposes delay to abc
<whitequark>
daveshah: wild idea: we have stochastic and analytical PAR, but only analytical synthesis. would there be any benefit from stochastic synthesis?
<ZirconiumX>
Somebody's probably already written a paper about it
<ZirconiumX>
I have a delay model for the Cyclone V, but it's pretty difficult to tell if the numbers I have are sane, especially given that Quartus seems to give different delay statistics depending on the phase of the moon.
<ZirconiumX>
I wrote a script to take the top thousand critical paths, break them down by cells, and then iterate through the different pins of the LUTs. I get two or three different values for even the same exact LUT.
<pie_>
my guesses are heating but im not equipped to do any back of the envelope calculations
<pie_>
cant be the photon pressure right? :P
<pie_>
i guess since its mems, photoelectric effect also comes into question? should be pretty easy to test that one
<pie_>
or whatever it is that makes semiconductors exposed to light gooo
<whitequark>
tpw_rules: oh btw do you actually want the code that populates SPRAM? or did you arrive at some other solution?
<tpw_rules>
whitequark: i arrived in the sense that i was scheming. i didn't start implementing anything yet. so yes, please
<tpw_rules>
where does it load it from? the jtag chain?
kmehall has quit [*.net *.split]
_whitenotifier has quit [*.net *.split]
pinoaffe has quit [*.net *.split]
jhol has quit [*.net *.split]
pointfree has quit [*.net *.split]
awygle has quit [*.net *.split]
sensille has quit [*.net *.split]
<whitequark>
tpw_rules: anywhere you want. there would be a signal that you assert, which causes the CPU to read a word from external bus and write it to PC, then increment PC
<whitequark>
I think we could even go one step further and make it read from a fixed address on the external bus, presuming it is an MMIO register
<whitequark>
or maybe it's easier to just ignore addresses at that step, not sure
<tpw_rules>
oh
<whitequark>
but something along those lines
<tpw_rules>
ok so this is just an enhancement such that the cpu wouldn't need a boot rom
<whitequark>
yeah
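The proposed load mechanism can be modelled roughly like this. It is a behavioural sketch only: it reads "write it to PC" as "write it to the memory word addressed by PC", assumes 16-bit words, and uses hypothetical names throughout.

```python
def boot_load(memory, ext_bus_words, start_pc=0):
    """While the (hypothetical) load signal is asserted, copy one word per
    step from the external bus into memory at PC, then increment PC."""
    pc = start_pc
    for word in ext_bus_words:
        memory[pc] = word & 0xFFFF  # assumption: 16-bit words
        pc += 1
    return pc  # PC after the last word has been stored


memory = [0] * 8
program = [0x1234, 0xABCD, 0x0001]   # whatever supplies the external bus
end_pc = boot_load(memory, program)
```

Where the words come from (SPI flash, UART, something else) is left to whatever drives the external bus, which is the part that still has to be built.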
pinoaffe has joined ##openfpga
kmehall has joined ##openfpga
jhol has joined ##openfpga
_whitenotifier has joined ##openfpga
awygle has joined ##openfpga
pointfree has joined ##openfpga
sensille has joined ##openfpga
<tpw_rules>
it's still my job to get code words to the external bus
<whitequark>
yep
<tpw_rules>
well that bit there is the hard part :P
<whitequark>
ah.
<tpw_rules>
like it's either do an spi or uart loader fully in gateware, or just make a really dumb peripheral and burn a bram on a bootloader
<whitequark>
well um, you want an UART already, right?
<whitequark>
so it'd be on the external bus somewhere
<tpw_rules>
but to use your interface i would have to make gateware that implements a uart receiver
<tpw_rules>
and checksum and etc etc. but that's alright, i have a plan. i don't think that interface would be immediately useful for it
<tpw_rules>
what i would want is some way to stall at least the external bus
<whitequark>
checksum?
<whitequark>
ok right, you'd have to add gateware that packs bytes from UART into words, that's true
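The byte-to-word packing mentioned here amounts to something like the following. The 16-bit word size and the little-endian default are assumptions for illustration, and the function name is hypothetical.

```python
def pack_bytes_to_words(data, little_endian=True):
    """Pack a byte stream into 16-bit words, two bytes per word."""
    if len(data) % 2:
        raise ValueError("need an even number of bytes to form 16-bit words")
    words = []
    for i in range(0, len(data), 2):
        first, second = data[i], data[i + 1]
        if little_endian:
            lo, hi = first, second   # low byte arrives first
        else:
            hi, lo = first, second   # high byte arrives first
        words.append((hi << 8) | lo)
    return words


words_le = pack_bytes_to_words(b"\x34\x12\xcd\xab")
words_be = pack_bytes_to_words(b"\x12\x34", little_endian=False)
```

In gateware the same thing would be a small state machine that holds the first byte until the second one arrives.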
<whitequark>
it would be much easier to load from SPI flash
<whitequark>
as for stalling, yeah
<tpw_rules>
by the way i haven't investigated or reproduced yet but yesterday during one of my debugging sessions i think i caught the assembler putting in extraneous EXTIs. have you seen that?
<tpw_rules>
like EXTI 0
<_whitenotifier>
[Boneless-CPU] tpwrules opened issue #6: Assembler improvement: Distinguish between Rn and n. - https://git.io/Je2Mr
egg has quit [Read error: Connection reset by peer]
<tpw_rules>
a) the pull request and 2 issues are all that i planned to say, so please review them at your convenience
<tpw_rules>
b) the rest of that, idk either
<whitequark>
oh I meant the adding logic with no output now
<tpw_rules>
ok i have no idea. that link should be all you need to reproduce it. it's my bedtime now but i can explain more tomorrow or something. when the conditions are met, 716 LUTs and 49MHz fmax. when they're not, 760 LUTs and ~41MHz fmax
<tpw_rules>
(by the way the appropriate main file is boneless_led.py if you are going to poke at it)
<whitequark>
tpw_rules: could be abc weirdness
<whitequark>
tpw_rules: i really don't like privileging signed comparison over unsigned or vice versa
<whitequark>
those should be symmetric
<tpw_rules>
i wonder if it's a bug? two things having influence like that shouldn't really be possible. or there's a hashmap somewhere
<tpw_rules>
noted re comparisons. but i really gotta go to bed. thanks for all your help today
<daveshah>
Stochastic behaviour in Yosys often comes from things like IdStrings
<daveshah>
Just using a string somewhere can change the ordering of that string if it is used later
<daveshah>
Several Yosys passes, as well as abc, can have significantly varying results depending on the order in which they see cells
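The effect described above can be illustrated with a toy string interner. This is a simplified model, not the actual Yosys IdString implementation: the Interner class and cell_order function are hypothetical, and the only idea borrowed is that ids are handed out in order of first use and that later passes iterate in id order.

```python
class Interner:
    """Toy string interner: each new name gets the next integer id."""

    def __init__(self):
        self.ids = {}

    def intern(self, name):
        return self.ids.setdefault(name, len(self.ids))


def cell_order(cells, pre_interned=()):
    """Return the cells sorted by interned id, as an id-ordered pass would see them."""
    interner = Interner()
    for name in pre_interned:
        interner.intern(name)      # some unrelated earlier use of the string
    return sorted(cells, key=interner.intern)


order_a = cell_order(["b", "a"])                      # 'b' is seen first
order_b = cell_order(["b", "a"], pre_interned=["a"])  # 'a' was used earlier
```

The two calls process the same cells but visit them in a different order, which is enough to change the result of an order-sensitive optimisation.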
Jybz has joined ##openfpga
X-Scale` has joined ##openfpga
X-Scale has quit [Ping timeout: 252 seconds]
X-Scale` is now known as X-Scale
freeemint has joined ##openfpga
freeemint has quit [Ping timeout: 245 seconds]
OmniMancer has joined ##openfpga
freeemint has joined ##openfpga
freeemint has quit [Remote host closed the connection]
freeemint has joined ##openfpga
freeemint has quit [Ping timeout: 246 seconds]
sgstair has quit [Ping timeout: 268 seconds]
sgstair has joined ##openfpga
Jybz has quit [Quit: Konversation terminated!]
Jybz has joined ##openfpga
Jybz has quit [Client Quit]
Jybz has joined ##openfpga
freeemint has joined ##openfpga
<pepijndevos>
daveshah, "Centre muxes also contain CLKTEST, a component with unknown function used in Lattice’s testing." How is the existence of these things known? If they are not used in user designs, they won't show up in fuzzing results, right?
<daveshah>
They show up in the Diamond physical view and tcl api
egg has joined ##openfpga
oeuf has joined ##openfpga
egg has quit [Ping timeout: 240 seconds]
egg has joined ##openfpga
oeuf has quit [Ping timeout: 268 seconds]
<pepijndevos>
ah ok
<pepijndevos>
What is a "group" in Nextpnr?
<daveshah>
It's a collection of objects for UI purposes
<daveshah>
normally used for tiles, switchboxes, etc that have some graphics not associated with a smaller object (i.e. tile/switchbox boundaries)
soylentyellow has joined ##openfpga
<pepijndevos>
hmmmm
<pepijndevos>
Do I need to implement all that stuff to make a bare minimum working flow?
<pepijndevos>
IIRC this presentation recommended just copying an existing architecture, so then the question becomes which parts to copy and which to change hmmmmm
<pepijndevos>
and also which architecture
<daveshah>
No, you don't need to implement any of that if you don't care about the GUI
<daveshah>
I'd start with iCE40 as a base
<daveshah>
The deduplication that ECP5 does adds a lot of unnecessary complexity for smaller FPGAs and could always be added later once everything is working
<pepijndevos>
Okay, that's useful hehe
<pepijndevos>
So if I wanted to make the most basic of basic flows, do I need anything else besides a copy of archdefs.h, arch.h and a stripped down arch.cc?
<daveshah>
Basically
<daveshah>
I think some of the placement validity functions in arch_place will also be needed
<daveshah>
although these could probably just be "return true" if you ignore clock enables, set/resets and multiple clocks for now...
<pepijndevos>
Oh, what does arch_place do? Did not see it in the archapi.md, which is the full extent of my Nextpnr experience so far.
<daveshah>
It's not in Arch API because the Arch API only defines functions and the names of header files; the names of the implementation source files are up to arches, so they don't have to have a multi-thousand-line arch.cc
<daveshah>
it implements isValidBelForCell and isBelLocationValid
<daveshah>
but these could go in arch.cc or any other file equally
<pepijndevos>
ahhh, I see. Put differently: I need to define implementations for all of arch.h, for which I can copy arch*.cc files.
<daveshah>
Yes
<daveshah>
Some small implementations are idiomatically kept in arch.h to allow inlining
<pepijndevos>
Pfew... it's a bit... overwhelming to get started by copying thousands of lines of code
unkraut has quit [Remote host closed the connection]
mearon has left ##openfpga ["WeeChat 1.7.1"]
unkraut has joined ##openfpga
<ZirconiumX>
Looking at the IWLS2005 synthesis benchmark suite. One of them has a logic loop. I should probably discard that, shouldn't I?
<tnt>
Huh ... I have "Info: [ -6527, -4257) |+" in the histogram.
<daveshah>
pepijndevos: no, they can also just have sram programmed like an ice40
<daveshah>
yeah, they are roughly GAL type devices with some mixed-signal bits
<pepijndevos>
phew
emeb has joined ##openfpga
keesj has quit [Ping timeout: 276 seconds]
keesj has joined ##openfpga
pie_ has joined ##openfpga
freeemint has quit [Remote host closed the connection]
freeemint has joined ##openfpga
mkru has joined ##openfpga
zng_ has joined ##openfpga
bsilvereagle has quit [Ping timeout: 264 seconds]
bsilvereagle has joined ##openfpga
zng has quit [Ping timeout: 264 seconds]
<OmniMancer>
I have a pnl to json converter now \o/
<daveshah>
Very nice!
<OmniMancer>
scannerless parsers are nice
<OmniMancer>
it is somewhat not fast though
<OmniMancer>
I think from this I can form a reasonable idea of what the routing pips look like
davidthings has joined ##openfpga
Hamilton has quit [Quit: Leaving]
<OmniMancer>
Is it common for the routing fabric to contain wires in the n s w e directions that go some number of tiles?
<daveshah>
That sounds like pretty much every FPGA fabric
<OmniMancer>
the naming suggests that the eagle parts have such wires that go 1, 2 or 6 tiles, with 4 each of the 1 and 8 each of the others
<OmniMancer>
is it common to have intermediate connections on these wires?
<daveshah>
That sounds suspiciously the same as ECP5
<daveshah>
Sometimes they have all intermediate connections (iCE40), one in the middle (ECP5) or mostly none (xcup)
<OmniMancer>
yea, the 2 and 6 versions have a mid as well
<daveshah>
That sounds more or less identical to ECP5 then...
<ZirconiumX>
Though it's majorly out of date, WP01003 puts the Stratix II in the same category as the iCE40.
<ZirconiumX>
"Routing is organized as wires in a number of rows and columns. The Stratix and Stratix II families use a three-sided routing architecture as shown in Figure 13. This means that a LAB can drive or listen to all of the wires on one horizontal (H) channel above it and two vertical (V) channels to the left and right side of it. The channels contain wires of length 4, 8, 16, and 24, and signals can get off at any LAB along the length of the wire."
<ZirconiumX>
That's how I interpret this at least.
<OmniMancer>
daveshah: is there a diagram of the ecp5 routing somewhere it would be interesting to look at later
freemint has joined ##openfpga
freeemint has quit [Read error: Connection reset by peer]
OmniMancer has quit [Quit: Leaving.]
Asu has joined ##openfpga
fsasm has joined ##openfpga
mkru has quit [Quit: Leaving]
keesj has quit [Ping timeout: 268 seconds]
keesj has joined ##openfpga
emeb_mac has joined ##openfpga
<tnt>
Does anyone know if, when using the ECP5 distributed RAM elements, the read register can be used to register the read port output in the same slice?
<daveshah>
It can, although I think there may be some caveats with clock/reset packing
<tnt>
Ok. yeah it's a single clock design and no resets on that at all nor CE so I guess it should be fine.
<tnt>
(first part is from yosys, second is from nextpnr).
<daveshah>
A TRELLIS_DPR16X4 is 3 SLICEs, two 2-bit memories and the write port
<daveshah>
plus one SLICE for gnd/vcc
<tnt>
Oh, right, tx for the details.
<tnt>
I'm not quite familiar yet with all the ecp5 details.
Maya-sama has joined ##openfpga
hackkitten has quit [Ping timeout: 246 seconds]
Maya-sama is now known as hackkitten
Asu` has joined ##openfpga
Asu has quit [Ping timeout: 264 seconds]
azonenberg_work has joined ##openfpga
Jybz has quit [Quit: Konversation terminated!]
Maya-sama has joined ##openfpga
hackkitten has quit [Ping timeout: 246 seconds]
hackkitten has joined ##openfpga
Maya-sama has quit [Ping timeout: 252 seconds]
Maya-sama has joined ##openfpga
hackkitten has quit [Ping timeout: 252 seconds]
Maya-sama has quit [Ping timeout: 240 seconds]
davidthings has quit [Read error: Connection reset by peer]
davidthings has joined ##openfpga
<tpw_rules>
zignig: saw your PR, you're definitely right. i thought i put the link but it must have got lost. i think i'm going to just change it myself and not accept the PR though, it feels better to me to have just one contributor for now. is this a problem for you?
davidthings has quit [Read error: Connection reset by peer]
davidthings has joined ##openfpga
Asu` has quit [Quit: Konversation terminated!]
<tnt>
How's the DSP support for ECP5 ? Pre-Adder / ALU / ... ?