kuldeep has quit [Read error: Connection reset by peer]
kuldeep has joined ##openfpga
azonenberg_work has joined ##openfpga
rohitksingh_work has joined ##openfpga
<wbraun>
I have been playing with the icestorm setup and the picorv32 core. There is a build for the iCE40 ultraplus 5k (on the icebreaker board)
<wbraun>
It's actually failing timing for me at the default clock frequency (something like 12 MHz)
<wbraun>
and then passes if I disable a few things on the core. It's at about 80% utilization for the FPGA.
<wbraun>
Given the low max clock frequency, are the iCE40 FPGAs really derpy, or is there some limitation with the icestorm toolchain?
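For reference, the icestorm flow estimates timing with icetime, so a check of the kind wbraun describes might look like the sketch below; the device, clock target and file name here are illustrative assumptions, so check the options your icetime build actually accepts.

    # rough timing estimate for an UltraPlus 5K build, checking a 12 MHz clock target
    icetime -d up5k -c 12 -t example.asc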
<Bob_Dole>
ice40UP5K are really slow
<wbraun>
The picorv32 repo claims at least a few hundred MHz on a 7 series FPGA
<Bob_Dole>
ice40HX8K are faster speed grades, but they lack a few features.
Bike has quit [Quit: Lost terminal]
<wbraun>
The ultraplus are the newer iCE40 FPGAs, right?
<Bob_Dole>
I think the HX ones should get upwards of 50mhz
<wbraun>
I passed timing at whatever the default timing was for the HX dev board build.
<Bob_Dole>
yeah, ultra plus are newer, but they're slower apparently. they add some features though
<wbraun>
How can they be an order of magnitude slower than a 7 series FPGA....
<wbraun>
I guess the high utilization (~80%) may have also played a role.
<wbraun>
Apparently the HX4k and the HX8k are the same die? Can I generate 8K sized bitstreams to target the 4k devices with the icestorm toolchain?
<wbraun>
I am trying to avoid the BGA devices…
<Bob_Dole>
from my understanding it's the "fabric" that interconnects the LUTS, 7-series xilinx chips are just faster there.
<Bob_Dole>
and up5k models are exceptionally slow at it
<Bob_Dole>
(I am not a specialist here, at all.)
<wbraun>
Yah, it's the fabric that is slower. Just surprising that it's that much slower.
<wbraun>
according to wikipedia the iCE40 is on a 40nm process node. The artix-7 is on something like a 28nm process node. Those two nodes should not have an order of magnitude speed difference.
<Bob_Dole>
there's more than node-size for speed
<wbraun>
It's a pretty good figure of merit though.
<wbraun>
So is it possible to generate 8k-sized bitstreams for the 4k devices?
<Bob_Dole>
I don't know, will need to wait for a response from someone more familiar.
<wbraun>
Or is that not built into the tools to avoid the wrath of lattice?
<wbraun>
I have heard multiple FPGAs from multiple vendors employ the same product differentiation (only JTAG ID) but there is a conspicuous lack of tools that take advantage of that.
ayjay_t has quit [Remote host closed the connection]
ayjay_t has joined ##openfpga
<kc8apf>
wbraun: yes, it's common that a single die design is fused for different SKUs
<kc8apf>
it's unclear if the fused parts are binned (tested and bad sections mapped out) or just sold as lower end parts for market segmentation
<Bob_Dole>
are they actually fused off, is his question
<kc8apf>
generally, no
<kc8apf>
the fusing is to set the JTAG ID
<Bob_Dole>
so possible to put an 8k bitstream on the 4k parts? maybe useful
<kc8apf>
No one wants to have a tool end-user find their part has occasional failures
<kc8apf>
but yes, it's entirely possible to do
<kc8apf>
ice40 4k and 8k
<kc8apf>
many of the artix7 line
<kc8apf>
max-v too
Xark has quit [Ping timeout: 252 seconds]
<rqou>
max-v (and other altera parts) seem to be literally controlled by an "if" statement in the software
<rqou>
max-v at least doesn't even differ in jtag idcode
* kc8apf
suspects rqou has an alert on max-v
<rqou>
nor is it "geometrically" restricted; literally "if (LEs in design > limit) raise an error"
<rqou>
not e.g. "can only use the top half"
<rqou>
and no, just happened to notice the conversation
<TD-Linux>
has anyone done anything with mach 4
<kc8apf>
i haven't heard anything
<wbraun>
They are not binned because you can typically route to any LUT on the device. You are just limited to some number of LUTs by the software.
<kc8apf>
I have a MachXO2 on my desk to poke at at some point
<wbraun>
So I guess the answer is that no one has bothered adding support for doing that with the iCE40 devices then?
<rqou>
uh, it works for the ice40
<rqou>
you can generate a bitstream with 8k worth of LUTs for a 4k device
<kc8apf>
just need to patch the bitstream with the 4k device ID
Xark has joined ##openfpga
<rqou>
i think it does that automatically
<kc8apf>
oh, I thought it didn't
<wbraun>
oh cool.
<wbraun>
So I can use the ICE40HX4K-TQ144 (in an easy to deal with non BGA LQFP package) as an 8k device then? And not have to deal with BGA packages while getting to play with the biggest device?
ym has joined ##openfpga
<kc8apf>
wbraun: should be able to
<rqou>
yeah, there's already some board that does this
<rqou>
iirc it's a shield thing for some "maker" board form factor and has sdram
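Putting the above together, a sketch of that 8k-die-in-a-4k-package flow with the open tools might look like this; the "tq144:4k" package name and the file names are assumptions, so check what your yosys/nextpnr/icestorm versions actually list, and note (per kc8apf and rqou above) that the device ID in the bitstream may or may not be patched for you depending on tool version.

    # synthesize for iCE40 with yosys
    yosys -p 'synth_ice40 -json top.json' top.v
    # place and route for the 8k die, but in the HX4K's TQ144 package
    nextpnr-ice40 --hx8k --package tq144:4k --pcf top.pcf --json top.json --asc top.asc
    # pack the ASCII output into a binary bitstream
    icepack top.asc top.bin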
<Bob_Dole>
me and solra are going with the BGAs because honestly, since I have a spare toaster oven, I think BGAs sound easier. they'll pull themselves into place if you get close enough.
<rqou>
can confirm, BGAs (at least at 1mm pitch on ENIG) work much more reliably than fine-pitch QFPs
<wbraun>
The iCE40 BGAs have a largest pitch of 0.8mm
<rqou>
probably works, but ymmv
<wbraun>
1mm pitch is within cheap PCB design limits, 0.8mm typically requires you to push / break some of the limits
<wbraun>
I have done 1mm in the past, it's not that bad.
<Bob_Dole>
why did I only just now realize what ymmv is?
<wbraun>
What project are you working on Bob_Dole?
<SolraBizna>
tricking me into making cryptocurrency mining devices
<SolraBizna>
all I wanted was a 65816 and some memory, but noooo :P
<wbraun>
With what FPGAs?
<wbraun>
Is there anything that can be profitably mined even with FPGAs nowadays?
<wbraun>
What does not have an asic?
<wbraun>
I was working on a sia coin miner last year but then I got bored with it. A few months later they announced an asic...
<Bob_Dole>
wbraun, there's a few that need FPGAs, monero specifically. They're mostly going for high-end xilinx parts for that, and mid-tier xilinx parts for accelerating gpu mining for various algos.
<Bob_Dole>
I'm trying to get solra to make a fully FOSS gpu architecture, and if possible, include the ability to mine cryptocurrency stuff. first part is most important.
<SolraBizna>
he's also working on tricking me into fully open hardware he can actually have
<SolraBizna>
which is much easier because I also want that
ayjay_t has quit [Quit: leaving]
<Bob_Dole>
being able to pair said gpu with a fully open cpu and fully open everything-else is desired
<wbraun>
My attempted edge was to use AWS FPGA instances. They have hefty FPGAs
<wbraun>
I wonder if I would have actually been profitable if I finished it. Probably not.
<wbraun>
I wonder if I would have been more cost effective than the current best AWS credit >> crypto >> USD laundering method. Possibly. But still not likely.
<wbraun>
Also, turns out that they restrict the FPGA instances quite heavily.
<Bob_Dole>
I don't know much about the AWS fpgas.
<wbraun>
It's a top end xilinx part.
<wbraun>
The toolchain for loading stuff on it is a bit convoluted though.
<galv[m]>
How do they restrict the FPGA instances? I was able to reserve one just fine with some credits. Do they put restrictions on the total number?
ayjay_t has joined ##openfpga
<wbraun>
They won’t let you just run any bitstream, and you have to use their wrapper for the interface
<galv[m]>
The toolchain is a gory mess :(
<Bob_Dole>
they're working on a custom VCU15... something, based card with some tweaks to cooling and power delivery, changing the V to a B, and then some artix-7 parts to put into M.2 slots called Acorn
<kc8apf>
wbraun: that's so they can regain control, etc
<Bob_Dole>
not aws. but SQRL/Mineority
<wbraun>
They restrict the number. I was trying to determine the price elasticity of the spot instances and I had to beg them for the ability to have 10 instances
<wbraun>
Probably mostly so you can’t dick with interfaces you are not supposed to or fry the device with a dud bitstream
<Bob_Dole>
the acorns look interesting, M.2 Artix 7 cards, they're being focussed at accelerating parts of mining algos that can get GPUs more competitive again
<wbraun>
And single FPGA instances mind you, not the 10x of the full box with multiple cards
<kc8apf>
a few of their security people told me about the lengths they go to ensure the fpgas are clean before transferring between users
<wbraun>
the price elasticity was actually measurable with only 10 spot instances, so I guess they did not have that many at the time.
<wbraun>
I had to pretend to be working on some research project at my university to even get 10
<wbraun>
Clean? What stored state is there?
<kc8apf>
I recall them having a bunch of RAM attached
<wbraun>
Yah. Should be pretty easy to flush volatile ram though
<wbraun>
So they load some “cleaning” bitstream to initialize everything to a known state?
<wbraun>
What are the latches you are talking about though? Latches would be created with LUTs, no?
<kc8apf>
in a 7-series device, bitstreams don't always include the storage bits in a LUT
<kc8apf>
or BRAMs
<kc8apf>
think of partial reconfig situations
<kc8apf>
I didn't get concrete details on what they do. I'm guessing a bit based on what I know of 7-series bitstream format
<wbraun>
Oh yah, each slice contains some distributed ram / shift registers
<wbraun>
Now I think about it, don’t they always use partial reconfiguration and always keep their interface wrapper configured or something?
<kc8apf>
I think so
<wbraun>
I don’t remember, it's been about a year and a half since I read the docs
<wbraun>
It was fun to learn about. Not very accessible though.
<wbraun>
But it is cool to theoretically have access to such a big FPGA at relatively low cost.
<kc8apf>
it would avoid having to reset the PCIe block
<Bob_Dole>
my thought is: cache is good. sram is "cheap enough" to get 8MB of fast-ish SRAM. cryptonight needs ~2MB per thread. ECP5s don't Go Faaast, but have room to get a decent number of threads. cache is good for CPU so even if it can't mine well that'll probably make a cpu more Tolerable for general use on it. >.>
<Bob_Dole>
DDR3 isn't working in the open toolchain right now, right? how's ddr/ddr2 doing?
<SolraBizna>
iCE40s can do DDR
jevinski_ has joined ##openfpga
jevinskie has quit [Ping timeout: 252 seconds]
Zorix has quit [Ping timeout: 264 seconds]
jevinskie has joined ##openfpga
Zorix has joined ##openfpga
jevinski_ has quit [Ping timeout: 245 seconds]
<sorear>
DDR means two things
<emily>
dance dance revolution 3
jevinski_ has joined ##openfpga
jevinskie has quit [Ping timeout: 250 seconds]
<wbraun>
So Arachne-pnr and nextpnr are both place and route tools?
<wbraun>
I am looking through the example builds on picorv32 and it looks like one of the builds (for the iCE40 UltraPlus 5K) uses nextpnr and the other (for the iCE40 hx8k) uses arachne-pnr
<wbraun>
It seems that both were building yesterday but something is not working today. I was poking around in the makefile to debug and noticed that there was a difference between the two builds.
<wbraun>
Not that it's causing my problem. Just curious. Both repos seem to be similarly updated / active.
<Bob_Dole>
arachne-pnr is the old pnr tool, replaced by nextpnr
<wbraun>
ok. Cool.
<wbraun>
is the command interface the same?
<wbraun>
Ooh, looks like it's not. I will figure it out though.
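The two command interfaces do indeed differ: arachne-pnr consumes a BLIF netlist while nextpnr consumes yosys JSON. Roughly, and with the device, package and file names as assumptions, the old and new invocations look like this.

    # old flow: arachne-pnr
    yosys -p 'synth_ice40 -blif top.blif' top.v
    arachne-pnr -d 8k -P ct256 -p top.pcf top.blif -o top.asc
    # new flow: nextpnr
    yosys -p 'synth_ice40 -json top.json' top.v
    nextpnr-ice40 --hx8k --package ct256 --pcf top.pcf --json top.json --asc top.asc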
<wbraun>
So nextpnr is the tool to use for the near future? Are there other alternatives?
<kc8apf>
VPR but it doesn't have working support for any FPGAs yet
<wbraun>
cool! Thanks for answering the questions! Hopefully I will have something building soon.
lovepon has quit [Ping timeout: 250 seconds]
<gnufan>
Bob_Dole: with regard to "FOSS GPU", let me point you to http://libre-riscv.org/3d_gpu/ ; LKCL, the guy backing it, maybe doesn't have such a great success story behind him, but some of the ideas could be worth a look..
<Bob_Dole>
gnufan, that the eoma68 guy?
<gnufan>
indeed..
<gnufan>
i subscribed for the microdesktop... still waiting for it! :-)
<gnufan>
but things are "slowly" moving there.. it looks like..
<Bob_Dole>
One of his ideas is basically what I was proposing to solra. take a risc-v core, modify it some, modify llvmpipe to use those modifications, call it a day.
<Bob_Dole>
and same, I ordered a compute card and enough to use it... has it been a year yet? I think it's been a year+ now.
<Bob_Dole>
he was doing great at keeping updates about progress and hindrances and then went silent for a while, it looks like he's gotten back to giving updates.
<whitequark>
Bob_Dole: llvmpipe is really slow...
<whitequark>
well, it's impressively fast for what it is
<whitequark>
but you pretty much have to back it with like an i7 to get anything meaningful
<whitequark>
and even then
<Bob_Dole>
whitequark, yes. the point is for desktop environments to run smoothly, nothing more.
<whitequark>
well
<whitequark>
macOS up to 10.8.5 works well on llvmpipe
<whitequark>
their variant of
<whitequark>
10.10+? nope, very slow
<Bob_Dole>
since everything requires 2.5D acceleration now
<whitequark>
just barely enough to not be completely unusable
<Bob_Dole>
I haven't touched Mac OS since 10.6
<whitequark>
i run it in a vm, which.... HANG ON
<whitequark>
i can run it in a VM with GPU passthrough now
<whitequark>
let's try it
m4ssi has joined ##openfpga
GuzTech has joined ##openfpga
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
<Bob_Dole>
I wonder how well the Nyuzi core ports over to ECP5.. since it looks like Cyclone IVs are also LUT4?
<Bob_Dole>
but there's more to it than that
Miyu has joined ##openfpga
Miyu has quit [Ping timeout: 268 seconds]
thehurley3[m] has quit [Remote host closed the connection]
galv[m] has quit [Remote host closed the connection]
AlexDaniel-old[m has quit [Remote host closed the connection]
pointfree[m] has quit [Remote host closed the connection]
indefini[m] has quit [Remote host closed the connection]
jfng has quit [Remote host closed the connection]
edmund20[m] has quit [Remote host closed the connection]
Wallbraker[m] has quit [Remote host closed the connection]
nrossi has quit [Remote host closed the connection]
Xark has quit [Ping timeout: 252 seconds]
<daveshah>
Bob_Dole: I think ECP5 and Cyclone IV are quite comparable resource-wise
<daveshah>
Although their low level architectures are different; similar LUT, RAM and multiplier widths
<daveshah>
Timing should be similar too
Xark has joined ##openfpga
<Bob_Dole>
nyuzi was using 74k of its LE/LUT4s it looked like, so unless it's using DSP blocks..
<daveshah>
Should just about fit in an 85k then
<daveshah>
85k has plenty of DSPs too
<Bob_Dole>
wasn't there only one operation being supported on them presently?
AlexDaniel-old[m has joined ##openfpga
<daveshah>
Well the FOSS tools won't build Nyuzi until late next year at the earliest due to its heavy use of SystemVerilog
<daveshah>
So DSP feature set in ECP5 would be the least of my worries
<Bob_Dole>
Ah, well, there's plenty enough to do in the mean time.
rohitksingh_work has quit [Read error: Connection reset by peer]
<openfpga-github>
[Glasgow] whitequark commented on issue #70: Should be A1. First, it's equivalent to shorting pins 1 and 2 (already easy, but you have to remember the pinout). Second, it gives the EEPROM an unique address, unlike shorting A0 to Vcc. https://github.com/whitequark/Glasgow/issues/70#issuecomment-434674257
<whitequark>
i assume cameras do something like that in reverse
<whitequark>
so you can feed the lcd from it or something
<whitequark>
or because 7:1 SERDES were already a thing
<sorear>
why would you spend 1/7 of your bits on sync data
<whitequark>
because of vga
<whitequark>
it's for converting vga transmitters and vga displays directly to lvds
<whitequark>
like
<whitequark>
it is literally made for feeding lcd controllers that have the parallel lcd bus
* awygle
writes "learn about VGA" on TODO list, thinks about it for a second, and then crosses it off again
<whitequark>
awygle: vga is
<whitequark>
digital hsync, digital vsync, and analog r+g+b
<whitequark>
that's the entire thing
<sorear>
I assume it's also wasting a stupid fraction of the clock cycles on blanking intervals
<Bob_Dole>
awygle, but you can make a DAC out of just a bunch of resistors you can drive off the pins of an MCU
<whitequark>
vsync is a frame start strobe, hsync is a line start strobe
<whitequark>
yes
<whitequark>
you now understand vga
<Bob_Dole>
and have vga, just like that
<sorear>
despite the fact that LCDs fundamentally don't have a retrace period
<whitequark>
sorear: but when are you going to generate music if not during vsync intervals?
<whitequark>
and play it if not during hsync intervals? :P
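sorear's point about blanking overhead can be put in numbers; taking the usual published 640x480@60 timings (800x525 total, 640x480 active), a quick shell calculation gives:

    # pixel clock and active fraction for 640x480@60 (800 x 525 total)
    echo "$((800 * 525 * 60)) Hz"                      # 25200000, i.e. ~25.175 MHz nominal
    echo "$((640 * 480 * 100 / (800 * 525)))% active"  # ~73%, the rest is blanking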
<awygle>
how will my mouse tear if there's no vsync
<awygle>
(i might have that backwards)
<whitequark>
lol
<qu1j0t3>
awygle: is that a trick or rhetorical question
<whitequark>
awygle: did you know that
<qu1j0t3>
awygle: or are you looking for an answer
<whitequark>
your gpu includes support for drawing the cursor
<whitequark>
specifically to avoid cursor tearing
<whitequark>
it's called "silken mouse" by nvidia iirc
<sorear>
tearing is a kind of aliasing artifact, and will be relevant as long as time is sampled discretely
<awygle>
qu1j0t3: it's a question demonstrating my total lack of comprehension of how any of this works or, in fact, what "mouse tearing" actually is
<qu1j0t3>
and just "mouse" by everyone else, who uses vertical retrace interrupts since forever ;-)
<whitequark>
i assume so that hardware that can't into double buffering still has usable cursor
<sorear>
I knew that basically every GPU has a cursor sprite but not that it was about tearing
<sorear>
thought it was just another "we added this when computers were slow and never revisited the decision"
<awygle>
... did seeed make their website spooooky for halloween? everything is orange
<qu1j0t3>
awygle: If you change the (single buffered) vram during the raster, you can see a partially updated mouse "sprite". This is CRT terminology because the problem is very old, and was solved originally by using a vertical retrace interrupt and doing the update during CRT blanking
<qu1j0t3>
(doesn't need double buffering, but double buffering also needs that interrupt anyway)
<whitequark>
double buffering means double ram
<awygle>
i guess "seeed" is also kind of spoooky by itself
<qu1j0t3>
yes.
<whitequark>
and i assume that wasn't available back then
<awygle>
qu1j0t3: okay that makes sense, kind of
<awygle>
concurrency problem
<qu1j0t3>
whitequark: yeah, but it just wasn't needed for such a small problem
<whitequark>
right
<awygle>
seems like a big-hammer fix for it
<awygle>
but i guess with CRTs maybe not
<qu1j0t3>
whitequark: double buffering gets used when you have a full screen of sprites, like Dark Castle on Mac, then of course you flip buffers during blanking
<qu1j0t3>
but you know that
* qu1j0t3
shuts up
<whitequark>
well i've never heard of that specific application of double buffering
<qu1j0t3>
awygle: not really, the interrupt had many uses
<whitequark>
im not much of a crt person
<whitequark>
i'm vaguely aware they exist
<whitequark>
i used an actual crt for a few years i think
<whitequark>
but that was sooooo long ago
<qu1j0t3>
heh, i'm still a crt person, but electrostatic
<qu1j0t3>
doing vector stuff for fun
<whitequark>
nice
<awygle>
qu1j0t3: "don't concurrently access these bits" seems like a solveable problem at higher granularity to me, but i guess it's not worth it
<qu1j0t3>
i think with LCDs you still have to fake the retrace interrupt in principle, no doubt GPUs do that
<openfpga-github>
Glasgow/revC 78bfffc Hector Martin: Update SOT563 footprints to Glasgow version and more DRC fixes
<openfpga-github>
Glasgow/revC 5a50dd6 Hector Martin: revC: Update SOT563 footprints to Glasgow version and more DRC fixes
<whitequark>
awygle: i automatically assume the implementation of bus wait cycle insertion would be fucked up somehow
ZipCPU_ has joined ##openfpga
<whitequark>
because it's intel
<awygle>
lol, fair
<awygle>
i was thinking of software level locking
<whitequark>
oh
<awygle>
rather than doing it in hardware
<whitequark>
... wouldn't that involve a vsync interrupt *anyway*
<whitequark>
assuming you don't want to poll a bit
<whitequark>
so you'd need a vsync interrupt and then either block there (ew) or longjmp out of the display update code (double ew)
ZipCPU has quit [Ping timeout: 250 seconds]
<sorear>
awygle: the video signal does block writes, but moving the cursor needs to be atomic relative to that
<sorear>
the fine-granularity but traditional approach here would be "read a current-scanline register, poll if it's too close to where you want to write"
<sorear>
some old systems didn't have enough memory for *single* buffering - ancient video systems are a trip
<sorear>
would the NES PPU be considered a CRTC
<awygle>
okay i'm convinced
<azonenberg_work>
sorear: well if you want to do really low memory stuff
<azonenberg_work>
you have a bunch of sprites and coordinates
<azonenberg_work>
and synthesize pixels in real time off that :p
<sorear>
but if you don't have a hardware sprite for the cursor, the window system needs to save the content under the cursor somewhere
<sorear>
which would then significantly complicate all drawing routines if you aren't already double-buffering everything
<sorear>
i guess you could not bother and just fire expose events on every mouse move but ughh
<travis-ci>
whitequark/Glasgow#120 (revC - 5a50dd6 : Hector Martin): The build has errored.
<azonenberg_work>
sorear: well if you're doing that kind of low time rendering
<azonenberg_work>
you never store the framebuffer
<azonenberg_work>
you spit out pixel data in real time and always re-render at the frame rate
<sorear>
sorry, I'm talking about two things at once
<azonenberg_work>
i.e. your cpu clock is the pixel clock
<sorear>
the NES PPU doesn't store a framebuffer anywhere
<sorear>
but we've also been discussing cursor rendering in Toolbox, and, uh, it's been too long since I read Inside Macintosh
<sorear>
iirc classic mac os, X11, and windows pre-2000 or so all use the "there is one framebuffer, 'windows' exist only to drive the event loop, moving a window results in expose events" approach
<sorear>
but if you have a framebuffer, and the framebuffer contains the cursor, drawing is tricky
<azonenberg_work>
Yeah
<azonenberg_work>
if i ever get around to making any embedded gizmos with a UI
<azonenberg_work>
i'm thinking of having a hardware compositor in the fpga :p
<azonenberg_work>
each app renders using a combination of hw and sw to its own private framebuffer
<azonenberg_work>
then as you finish updating you push the current framebuffer pointer to the compositing block
<azonenberg_work>
which knows the position of each window and generates the final framebuffer from that
wbraun has joined ##openfpga
<sorear>
why would you generate a final framebuffer instead of just compositing during scanout
<sorear>
unless you want to do weird transforms
<awygle>
wait so in a 7-series
<awygle>
does every signal that a bufg drives end up on one of the 12 clock nets in each region?
<awygle>
that was not a super comprehensible phrasing of that question i guess
<azonenberg_work>
sorear: because the latency of memory reads is hard to predict if they're coming from all different sources
<azonenberg_work>
compositing during scanout is easier if you have linear reads so you can prefetch etc
<azonenberg_work>
awygle: Yes, that is my understanding
<awygle>
azonenberg_work: then why are there 32 BUFGs if you can never use more than 12
<azonenberg_work>
I don't think...
<azonenberg_work>
sec
<awygle>
(this is specifically a zynq 7020, exact numbers may vary)
<azonenberg_work>
awygle: so, if i'm reading UG472 right
<azonenberg_work>
There are 32 vertical clock lines down the center spine of the device
<azonenberg_work>
one BUFG drives each one
<azonenberg_work>
Each clock region has 12 horizontal clock lines, which can each be fed by some or all (docs are unclear on exact routability here) of the 32 global clock lines
<azonenberg_work>
then the HCLKs drive the per-region clock tree
<azonenberg_work>
So basically, you have 32 global clocks but any one clock region can only use 12 of the 32
<azonenberg_work>
but they can be different subsets
<azonenberg_work>
you can also use a BUFH to drive a HCLK directly without using one of the 32 global lines
<awygle>
hm okay
<awygle>
this design _might_ fit then
<azonenberg_work>
This is one of many situations where floorplanning is handy
<awygle>
what counts as a clock region?
<azonenberg_work>
have you ever looked at a 7 series chip in the floorplanner? :)
<azonenberg_work>
The boxes with color-coded outlines
<awygle>
no, because we're using some kind of horrible mess of mostly-EDK
<azonenberg_work>
Roughly speaking a clock region is half the chip wide and 50 CLBs high
<azonenberg_work>
So all 7-series parts are 2xN clock regions
<azonenberg_work>
Height of a clock region is constant, width varies by size of the part
<azonenberg_work>
(ultrascale changes this, iirc all ultrascale clock regions are the same size and you can have >2 columns of them)
<awygle>
where is the planahead button
<azonenberg_work>
um... i know how to get to it from projnav but not edk
<azonenberg_work>
But you can launch it standalone if you just want to look at the design
<awygle>
what is projnav
<azonenberg_work>
_pn
<azonenberg_work>
the normal ISE ide
<azonenberg_work>
source $XILINX/settings64.sh
<azonenberg_work>
planAhead
<azonenberg_work>
is how you'd launch it in the CLI
<azonenberg_work>
Then once you get it up, create a new dummy project in /tmp or something, select the device, and import the ngc (synthesized netlist) and ncd (par'd netlists) from your build
<azonenberg_work>
it should default to opening up the device floorplan
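Pulling azonenberg_work's steps together, the standalone route is roughly the following; the netlist paths are placeholders for whatever the build produced.

    source $XILINX/settings64.sh
    planAhead &
    # then create a dummy project in /tmp, select the device,
    # and import the synthesized .ngc and par'd .ncd netlists to view the floorplan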
<awygle>
can i make this show me the actual clock nets? i see the tiles but not the nets...
<SolraBizna>
blast, there was a whole discussion of video generation and I was asleep
<SolraBizna>
the original Mac had a single 16x16 hardware "cursor sprite" and a single hardware framebuffer
<SolraBizna>
it updated the cursor sprite's position in the vertical-blank handler, and applications that wanted to do smooth animations would render to the heap and start to copy that rendered data onto the screen after the vertical blank had occurred
<SolraBizna>
I would recommend an architecture like that for new embedded systems with mostly-static graphics requirements
<azonenberg_work>
awygle: yes those are clock regions
<azonenberg_work>
Planahead has a bug where each time the window refreshes, the box gets a little smaller
<azonenberg_work>
this is fixed in vivado and i reported it in ISE but they wontfix'd it since ISE was basically EOL by that point
<awygle>
that is an amusing bug
<balrog>
azonenberg_work: they did FINALLY put out a Win10 compatible ISE build
<balrog>
but it only supports Spartan 6, wtf
<azonenberg_work>
balrog: because basically all the older parts are "even more EOL" than s6
<azonenberg_work>
and for 7 series they want everyone using vivado
<balrog>
Spartan 3A is still too common
<azonenberg_work>
o_O
<balrog>
due to 5V tolerant I/Os
<azonenberg_work>
wait what? s3a is 5v tolerant?
<azonenberg_work>
i thought you had to use cplds for that
<Bob_Dole>
:o
<balrog>
hm maybe I'm confusing myself
<azonenberg_work>
xc9500, the original series, is 5V
<azonenberg_work>
9500xl might be 5v tolerant but is 3.3v core, i dont remember
<Bob_Dole>
I thought even the 5V CPLDs were EOL
<azonenberg_work>
coolrunner and beyond are 3.3 max on io
<balrog>
azonenberg_work: yeah, nope, I was wrong
<azonenberg_work>
balrog: my guess? the support engineers are tired of dealing with ise and want it to just die already
<awygle>
verdict - this design will fit but i'll probably have to hand-optimize what buffers are used (instead of just slapping them all on BUFGs)
<awygle>
i'm so sick of this design lol
<balrog>
will they add CPLD support to Vivado?
<balrog>
coolrunner is still supported I thought
<azonenberg_work>
yeah, you have to use old ise on pre-win10 or linux
<azonenberg_work>
vivado will never support cplds
<azonenberg_work>
they consider cplds a dead end
<azonenberg_work>
they still sell the chips but arent really wanting people to do new designs with them
<azonenberg_work>
awygle: i generally use bufh's when i can for regional clocks that are only used for a small module or something
<azonenberg_work>
It helps placement considerably in ISE with BUFH's if you floorplan the module to one clock region though
<azonenberg_work>
so it doesnt have to spend a lot of time fighting unroutability to make it work
<azonenberg_work>
awygle: btw if you upgrade this design to ultrascale at some point it will get a lot easier since you have 24 clocks per clock region (and the clock regions are smaller)
<awygle>
azonenberg_work: pins from the same logical bus are in different banks, clocked by clocks which are not on CC pins
<azonenberg_work>
plus 24 more "routing" clocks that are used for feed-through to adjacent regions
<awygle>
floor planning is unlikely to help
<awygle>
at this late date
<awygle>
so ise doesn't use timing for placement /routing, does it?
<azonenberg_work>
it should be timing driven
<awygle>
oh OK, I thought it wasn't for some reason
<azonenberg_work>
the really old parts did not, like spartan3 with some par options
<azonenberg_work>
awygle: side note, i almost never use BUFGs in 7 series designs
<azonenberg_work>
Since PLLs can do funny things during boot and i normally have my clocks come from a PLL
<azonenberg_work>
So what i do instead is i feed each PLL output to a BUFGCE gated by the PLL lock signal
<awygle>
azonenberg_work: i would love to do all kinds of cool shit but this design came pre-fucked
<awygle>
these are non-free-running clocks coming in on non-clock-capable inputs
<awygle>
no PLLs, no BUFIOs, only pain
<azonenberg_work>
lol
<azonenberg_work>
enjoooooooy
<azonenberg_work>
oh, and get the pcb engineer fired :p
<whitequark>
wtf
<balrog>
whitequark: your logging server is down?
<azonenberg_work>
awygle: at least if they were CC inputs you'd be ok-ish
<azonenberg_work>
and that could totally have been fixed at layout time
<azonenberg_work>
Also if at all possible try and change policy so the FPGA guy(s) get inserted into the design flow before the PCB tapes out to fab
<azonenberg_work>
you can avoid so much pain by having a less siloed design flow
<azonenberg_work>
(not sure if this design predates you or what though)
<whitequark>
balrog: is it? lemme see
<whitequark>
balrog: seems up to me?
<balrog>
it was a hiccup
<balrog>
got a "could not load data from this location"
<azonenberg_work>
same thing with asic work, having the RTL and layout guys/teams talking to each other from day one avoids lots of issues before they become big problems
<azonenberg_work>
(looking at you, big-cpu-company-that-isnt-amd)
<balrog>
azonenberg_work: looooooooool
<whitequark>
pffffffff
<azonenberg_work>
seriously, siloed workflows like that are asking for trouble
<whitequark>
yeah intel's culture is royally fucked
<whitequark>
every time someone talks about it i'm just "welp"
<whitequark>
"how do they produce anything at all functional"
<qu1j0t3>
:)
<azonenberg_work>
whitequark: to give you an idea of how bad it is
<azonenberg_work>
one time i had an intel engineer ask me for recommendations on a 10G NIC
<azonenberg_work>
I suggested an intel chipset
<azonenberg_work>
he was like "wait, we make those?"
<rqou>
wtf
<azonenberg_work>
he literally had no idea an entire line of their business existed
<azonenberg_work>
its one thing to not have commit access to the repo or something, but not knowing the product exists??/
<rqou>
azonenberg_work: although to be fair i have very little idea what the rest of my employer is doing
<rqou>
because we also do a ton of random shit
<azonenberg_work>
rqou: yeah but like, these arent classified government contracts or something
<azonenberg_work>
these things are sold openly on amazon
<rqou>
yeah, but considering my $WORK, the red team really has very little visibility into e.g. how the huffpost writers get paid (which was a thing that came up a while back)
<sorear>
the "three xeons in a trenchcoat" thing caught me by surprise, but it was at least chips I knew about
<rqou>
it's not even in the same "legacy half of the company"
<azonenberg_work>
yeah but if you have a huge holding company that has many different sub-brands its understandable
<azonenberg_work>
as they're effectively their own companies who just share profit at the highest levels
<azonenberg_work>
but intel nic and intel cpu?
<zkms>
also theres intel buttbands
<rqou>
wat
<sorear>
*mumble* run lspci on a typical intel laptop and count the things *not* made by intel
<sorear>
i guess at this point they could expand into batteries or screens
<SolraBizna>
2
<SolraBizna>
(out of 21)
<SolraBizna>
(one is an ExpressCard I've inserted, the other is a FireGL[?!!?!] GPU)
<SolraBizna>
for three months, it was stable, so I assumed I had fixed it
<SolraBizna>
Three days ago, I unplugged the serial connection
<SolraBizna>
It has hung twice
<rqou>
solution: update the deployment requirements to require the serial connection be plugged in
<SolraBizna>
I can't think of any reason for it to fail like this except that having the ground hooked up was stabilizing it somehow
<whitequark>
that's actually possible
<rqou>
[14:16] (rqou) solution: update the deployment requirements to require the serial connection be plugged in
<whitequark>
try hooking up just the ground?
<rqou>
am I doing the cursed vendor thing correctly?
<SolraBizna>
that's a good idea, I'll do that if it's stable with the serial cable connected for a week or so
<reportingsjr>
whitequark: what is your IRC channel?
<whitequark>
reportingsjr: ##whitequark
<whitequark>
rqou: yes.
<whitequark>
unfortunately
<SolraBizna>
so glad I don't have to deal with real cursed vendors
<SolraBizna>
on the other hand, I have to deal with being poor...
<awygle>
would it be faster to route pin->BUFR->BUFG than pin->BUFG?
<azonenberg_work>
no i dont think so
<azonenberg_work>
probably a lot slower in fact?
<awygle>
hm why? my reasoning is "BUFR is close to pin, clock fabric is faster than general routing"
<awygle>
i guess that second thing may not be true
<azonenberg_work>
its not faster than general routing per se, it's controlled for skew
<azonenberg_work>
its a balanced tree with the same number of loads on each leaf etc
<awygle>
right
<awygle>
MMCMs aren't guaranteed to free-run if their input clock goes away...
<azonenberg_work>
Why would you want a pll/mmcm to free run anyway? if it loses lock the output frequency is completely unpredictable
<azonenberg_work>
and might go too fast for your constraints or something
<awygle>
mhm
<azonenberg_work>
if anything i'd want it to insta-stop and gate all outputs
<azonenberg_work>
after only a couple of vco cycles, before it could drift out of the safe range
<awygle>
i can't find anything about "bufr O to bufg I" timing in the datasheet....
<sorear>
the problem with a stopped clock is that a glitchless clock mux can't change away from a stopped clock
<sorear>
a PLL that free-runs at the lowest possible frequency could be useful, idk
<azonenberg_work>
awygle: that's routing fabric
<azonenberg_work>
routing fabric numbers arent in the datasheets
<azonenberg_work>
heck, past 7 series they don't even give you CLB timing
<awygle>
yeah but it's dedicated routing, i was hoping it was
<azonenberg_work>
sorear: yes but the VCO isn't stopped
<azonenberg_work>
the idea is that you detect loss of input clock, vco is still running, then glitchlessly gate the output between cycles before the vco period drifts significantly
<sorear>
azonenberg_work: replying to "I'd want it to insta-stop and gate all outputs"
<azonenberg_work>
That's my point, you can glitchlessly stop the outputs because the pll outputs are generated from the VCO which is free-running
<sorear>
yes but if you have a clock mux downstream of the PLL you're hosed
<awygle>
you spend like a nanosecond in the bufr itself
<azonenberg_work>
depends on what you're muxing and how its configured
<awygle>
but it takes >3ns to get to bufg
<azonenberg_work>
all i can say is, my approach has always been that lack of a stable clock = lack of a clock
<awygle>
so who knows
<azonenberg_work>
shut down until you get a clock back
<awygle>
find out when this build finishes i guess
<azonenberg_work>
awygle: why do you care so much about this delay?
m4ssi has joined ##openfpga
<azonenberg_work>
what are you trying to do with the input clock?
<awygle>
sample the incoming data
<azonenberg_work>
a bufh is the best thing to do with a fabric-sourced clock (not that there are GOOD things to do with fabric clocks)
<awygle>
hm, wonder if BUFH->BUFG is faster actually
<azonenberg_work>
Why bufg at all??
<awygle>
i can't BUFH the whole way because the clock regions don't work out that way
<azonenberg_work>
thats my question
<azonenberg_work>
Sample the data in the BUFH'd domain
<azonenberg_work>
then feed into a dual clock fifo
<awygle>
i _can't_. the clock comes in in (say) bank 34, along with 2 out of 8 bits
<azonenberg_work>
...
<awygle>
the other 6 are in (say) bank 13, one clock region over and one down
<azonenberg_work>
waaait
<azonenberg_work>
the bits arent even coming in on the same side of the chip, much less the same bank?
<awygle>
nope
<azonenberg_work>
my previous recommendation to fire the pcb guy just got a lot stronger
<awygle>
not necessarily at least
<awygle>
like i said this board wasn't, strictly speaking, _designed_ to do this
<awygle>
oh the BUFHs are in the center spine, so they're not going to be any faster than routing through fabric to the BUFG, probably
<azonenberg_work>
My understanding is that a BUFH is purely a software construct, they have no existence in the actual chip
<azonenberg_work>
so is a BUFG
<azonenberg_work>
ish
<azonenberg_work>
Basically, you have routes from various sources into the 32 global clock lines
<azonenberg_work>
and the 12 regional clocks
<azonenberg_work>
Then you can route from the global clock lines into the regional clocks
<azonenberg_work>
a BUFH is just a way of saying, drive this regional clock but don't backfeed into the global clock
<azonenberg_work>
i.e. the 12 BUFH's per clock region are literally the horizontal branches of the global clock tree within that clock region
<azonenberg_work>
there are not two separate sets of routes
<azonenberg_work>
So a BUFH is just a pip going from fabric routing to the horizontal clock wire
<azonenberg_work>
And it likely uses the same high-fanout buffer as the redriver between the global clock spine and the horizontal clock row
<sorear>
What would happen if a clock tree had loops?
<awygle>
right
<awygle>
but the redriver is still on the center spine
<azonenberg_work>
near, not in
<daveshah>
sorear: you mean going back through a global buffer?
<awygle>
whatever, close enough
<daveshah>
You'd make a delay line memory
<daveshah>
Most arches don't have bidirectional switches off the clock tree so you couldn't form a loop within it
<azonenberg_work>
a very power hungry and slow one :p
<azonenberg_work>
and yes
<azonenberg_work>
that too
<azonenberg_work>
you could loop one bufg into another and another
<azonenberg_work>
but not full feedback
<sorear>
I mean if there were actual metal loops in clock distribution
<azonenberg_work>
you'd have to close the loop in fabric
<awygle>
bufg i-to-o is much faster than bufh i-to-o
<sorear>
naively, a clock plane would not have any skew problems (because nearby-in-2D flops will always get the pulse at nearly the same time), and a clock plane-with-holes would seem to have the same advantage while using not much more metal than a tree
<sorear>
i'm wondering if (a) they don't do this to save metal (reduce clock capacitance, reduce dynamic power) (b) this doesn't actually work
<daveshah>
I suspect capacitance comes into it
<daveshah>
Most clock tree structures have buffers in the tree structure
<daveshah>
At least in ice40 and ecp5 they can be turned off to save power too
m4ssi has quit [Remote host closed the connection]
<travis-ci>
whitequark/Glasgow#123 (revC - 8da369c : Hector Martin): The build has errored.
<azonenberg_work>
sorear: metal density issues w fab too
<azonenberg_work>
there are min and max percent cover allowed
<azonenberg_work>
and yes cap is an issue
<azonenberg_work>
An ideal clock setup is typically a fractal of H shapes
<azonenberg_work>
So each buffer only needs a fanout of ~4
<sorear>
I guess my question is, "do they use fractal trees because that's the most efficient use of metal, or because there are voodoo RF reasons to studiously avoid loops?"
<azonenberg_work>
Both
<azonenberg_work>
I did see a square grid on a 350nm part once