<kc8apf>
whitequark: azonenberg: i've been trying to find midspan interposers similar to what you've been discussing. Work has a LeCroy T2-16 PCIe analyzer but no cables or interposers. https://cdn.teledynelecroy.com/files/pdf/gen2_passive_interposer_datasheet.pdf is the LeCroy interposer but it seems to be unavailable.
<whitequark>
yep, I've found that one actually, which to a degree inspired my whole idea
<whitequark>
I think a midspan interposer with SFF-8643 and COTS adapters I listed before could be very very cheap
<kc8apf>
i'm very interested in finding a compatible part
<whitequark>
think $200 range
<kc8apf>
analyzers are running $500US on ebay
<whitequark>
oh, that's cheaper than i expected
<kc8apf>
T2-16 does up to gen2
<whitequark>
what connectors are those?
<kc8apf>
Molex iPass
<whitequark>
think i should do the SFF-8643 tee with a molex ipass connector to TE instead of another SFF-8643?
<whitequark>
wait
<whitequark>
are those iPass things *yet another SAS connector*?
<GenTooMan>
hmmm it's a 68 conductor PCIx*8 cable with 2 ends on it that's 3meters long. That's a bit exotic.
emeb_mac has quit [Quit: Leaving.]
X-Scale` has joined ##openfpga
<whitequark>
oh
<whitequark>
oh I missed that lmao
X-Scale has quit [Ping timeout: 258 seconds]
X-Scale` is now known as X-Scale
<swetland>
Are any of the various ECP5 boards in development shipping or approach shipping? I'm interested in poking at bigger/faster stuff than ICE40UP5K, but not excited about Lattice's idea of a dev board or fighting with vivado to target 7 series...
<kc8apf>
Until their crowd supply campaign gets started, ULX3S are occasionally available if you email them.
<azonenberg>
whitequark: yeah sata seems to be best bang for the buck for low pin count high speed serial, with sas/sff-8643 as a decent multi lane option
<kc8apf>
agree
<kc8apf>
quite a few BMC reference boards use two SATA for PCIe
lutsabound has quit [Quit: Connection closed for inactivity]
emeb_mac has joined ##openfpga
mumptai_ has joined ##openfpga
mumptai has quit [Ping timeout: 268 seconds]
<TD-Linux>
kc8apf, is that SATA Express or some other configuration
<whitequark>
kc8apf: thinking I'll proceed with my original plan to use SFF-8643, but if you get those lecroy analyzers to work (maybe deriving from the board I'll make?), please let me know
<kc8apf>
TD-Linux: something of their own concoction
<TD-Linux>
oh wow they made a 4 lane sata express. SFF-8639
Bike has quit [Quit: Lost terminal]
<kc8apf>
whitequark: afaik if I can get a passive interposer attached it will just work
<kc8apf>
Problem is that I don't have any
<kc8apf>
May have to buy one just to RE it
<whitequark>
ack
<whitequark>
oh right, I can't actually make an adapter because we don't know the pinout
emeb_mac has quit [Quit: Leaving.]
Asu has joined ##openfpga
OmniMancer has joined ##openfpga
Asu has quit [Read error: Connection reset by peer]
Asu has joined ##openfpga
Jybz has joined ##openfpga
Asuu has joined ##openfpga
Asu has quit [Ping timeout: 265 seconds]
<gruetzkopf>
the ipass stuff is old, early external SAS x4 used it a lot, but of course they don't carry any sidebands there, and the much more uncommon pcie external cabling spec cables are expensive
<gruetzkopf>
also the pcie external cabling spec defines a completely different connector for x1 links (from the same family as the DVI connector)
mumptai_ has quit [Remote host closed the connection]
mumptai has quit [Remote host closed the connection]
mumptai has joined ##openfpga
cr1901_modern has quit [Read error: Connection reset by peer]
Asuu has joined ##openfpga
Asu has quit [Ping timeout: 240 seconds]
cr1901_modern has joined ##openfpga
Jybz has quit [Ping timeout: 265 seconds]
Jybz has joined ##openfpga
Asuu has quit [Remote host closed the connection]
Asu has joined ##openfpga
edmund has joined ##openfpga
<GenTooMan>
no interest in SCSI express? I would think that might be an inexpensive route. SAS 4.0 is a bit complex, I noticed, with the FEC added in.
<hackerfoo>
These things seem to go for $7.5k, and you can use the WebPACK license.
<TD-Linux>
GenTooMan, I think the only interest here was using the cables for pcie
<GenTooMan>
TD-Linux oh, I see, I had seen people using PCIx with SCSI so that seemed to be the direction people were musing toward.
<swetland>
I dunno, is 1.341M LUTs enough?
<gruetzkopf>
GenTooMan: PCIe?
<gruetzkopf>
PCI-X is like, old
<gruetzkopf>
and 64bit wide
Asu has left ##openfpga ["Konversation terminated!"]
<GenTooMan>
don't mind me I got e mixed with x it happens when you don't sleep well.
<hackerfoo>
swetland: That's why I didn't buy it, because I figured I should wait until my little xc7a200t isn't enough. I wish it was an UltraScale, though.
<gruetzkopf>
i mean: i am currently hacking around with PCI-64 soo
<GenTooMan>
I think my mb has a PCI slot on it.... yeah it does ... huh. one of course sort of like PCI systems used to have 1 ISA slot I guess. :D
<daveshah>
hackerfoo: I thought alveo was UltraScale+ (unless you specifically want non-+ for some reason)
<hackerfoo>
And there's 64GB of RAM on the U250, and 16x PCIe gen3.
<hackerfoo>
daveshah: I know. That's even better. I'd be happy with a non+, though. It's just that 7-series (what I have) is old and slow.
<daveshah>
Ah, I misread
<hackerfoo>
But I don't need 1.3M luts. I'm more interested in having the latest architecture.
<daveshah>
What about the Ultra96?
<daveshah>
I have the zcu104 for UltraScale stuff which is nice but quite a bit more pricy
<hackerfoo>
I want PCIe so I can just plug it into my dev PC and pump data through it.
<hackerfoo>
It just seems easier than having to work through another CPU.
<hackerfoo>
And it's fun to have massive bandwidth.
<daveshah>
On the plus side, don't need to worry about crashing your PC
<hackerfoo>
I haven't had that problem yet. I'm trying to figure out partial configuration so I don't even need to reboot.
<hackerfoo>
Which is another reason I'd like an UltraScale.
<hackerfoo>
Is a misbehaving PCIe device still a problem when each lane is a dedicated link?
<hackerfoo>
I guess it could corrupt the CPU's RAM or something.
<gruetzkopf>
it is, but you can make it less of a problem using iommu
<gruetzkopf>
it'll still spam your kernel log with address translation errors, but that's better than overwriting the kernel
<hackerfoo>
gruetzkopf (IRC): Thanks. I'll have to check into that. I have VT-x.
<gruetzkopf>
that's VT-d in marketing speech iirc?
<gruetzkopf>
are you on linux?
<gruetzkopf>
intel_iommu=on
<hackerfoo>
I have "vmx" listed in /proc/cpuinfo. Do I need to add the kernel flag?
<mwk>
hackerfoo: vmx is irrelevant, what you want is VT-d
<mwk>
and that needs the kernel option
<mwk>
(vmx / VT-x is about virtualizing CPU, which you're not doing)
<gruetzkopf>
can always recommend enabling it, also good against crashing GPUs trashing all over memory
<hackerfoo>
Thanks
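A minimal sketch of the checks discussed above, assuming a Linux system where /proc/cmdline and the /proc/cpuinfo flags line are readable; the helper names are made up for illustration:

```python
# Sketch: check whether the IOMMU was requested on the kernel command line,
# and why the "vmx" cpuinfo flag is the wrong thing to look at (vmx is VT-x,
# i.e. CPU virtualization; DMA remapping is VT-d). Helper names are invented.

def iommu_requested(cmdline: str) -> bool:
    """True if intel_iommu=on appears among the kernel boot parameters."""
    return "intel_iommu=on" in cmdline.split()

def has_vmx(flags_line: str) -> bool:
    """True if the CPU advertises VT-x -- irrelevant for DMA protection."""
    return "vmx" in flags_line.split()

if __name__ == "__main__":
    # On a real system these strings come from /proc/cmdline and the
    # "flags" line of /proc/cpuinfo.
    print(iommu_requested("BOOT_IMAGE=/vmlinuz root=/dev/sda1 intel_iommu=on"))
    print(has_vmx("fpu vme de pse tsc msr vmx smx est"))
```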
<gruetzkopf>
(of course x86 is decades behind other platforms in implementing it - 32bit sparcs' "sbus" DMA has per-slot virtual addresses, the silicon graphics origin 2000 (designed around 1996!) and all later platforms effectively have an IOMMU per XIO<->PCI bridge..)
<hackerfoo>
I guess the moral of the story is, prioritize what your customers care about, then steal all the ideas of your dead competitors later :)
<tnt>
I have a GPU here that reliably crashes the whole chipset pcie root ... takes out all devices on it and it needs a cold boot (power off / on, reboot is not enough) to bring them back.
<hackerfoo>
I wonder if features are implemented decades later due to patents.
<hackerfoo>
tnt: What CPU?
<tnt>
hackerfoo: It's a Asrock RX560 GPU connected to a x8 port off an X570 chipset (i.e. not the pcie lanes from the CPU directly, it's on the chipset lanes) and system has a Ryzen 3700X.
<hackerfoo>
Huh. I thought the root was always in the CPU now. I have a lot to learn about PCIe.
<gruetzkopf>
hackerfoo: guess who sank SGI
<gruetzkopf>
mostly intel with itanic
<hackerfoo>
There can be multiple roots?
<gruetzkopf>
yup
<gruetzkopf>
(and it's implementation dependent if you can DMA between devices hanging off different ones)
<hackerfoo>
Why are there only 4x lanes connecting the X570 to the CPU? That seems like a bottleneck. I guess it's gen 4, though.
<tnt>
The CPU has 24 PCIe lanes. 16 of them go to a Nvidia GPU, 4 of them to an NVME and the 4 other go to the X570 chipset which includes a PCIe switch that splits them into more lanes.
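The lane budget above can be put in rough bandwidth terms. A back-of-the-envelope sketch, using the per-generation line rates and encodings from the PCIe specs; this ignores packet/protocol overhead, so real throughput is somewhat lower:

```python
# Back-of-the-envelope PCIe throughput, to see how much of a bottleneck the
# x4 chipset uplink really is. Raw rates and line codings are per generation;
# TLP header and flow-control overhead are ignored.

# generation -> (GT/s per lane, line-coding efficiency)
GEN = {
    1: (2.5, 8 / 10),     # 8b/10b
    2: (5.0, 8 / 10),     # 8b/10b
    3: (8.0, 128 / 130),  # 128b/130b
    4: (16.0, 128 / 130),
}

def gbytes_per_s(gen: int, lanes: int) -> float:
    gt, eff = GEN[gen]
    return gt * eff / 8 * lanes  # GT/s -> GB/s after line coding

print(f"x16 gen3: {gbytes_per_s(3, 16):.1f} GB/s")  # e.g. the GPU slot
print(f"x4  gen4: {gbytes_per_s(4, 4):.1f} GB/s")   # CPU <-> X570 uplink
```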
<hackerfoo>
So each "root" is more of a switch, then.
<hackerfoo>
Hopefully these things will be more robust in the future due to external interfaces.
<mwk>
taking a bus designed for internal use by trusted always-connected devices and making an exposed external interface out of that? sign me up, no way it could go wrong
<hackerfoo>
I look forward to my PCIe powered desk fan. I guess that's just USB C.
<hackerfoo>
Or USB 3? Something like that.
<mwk>
nonono, you need one with a stepper motor where every step is triggered by a MMIO write
<pie_[bnc]>
data power
<hackerfoo>
You'll need a faster CPU to run the fan faster, to keep that new CPU cool.
<hackerfoo>
Rotational reasoning.
<gruetzkopf>
hackerfoo: even my old powermac g5 has two pcie roots
<gruetzkopf>
both hanging off of hypertransport
<swetland>
daveshah: how rough is the nextpnr/xray toolpath at the moment? if somebody had some S7 and A7 dev boards and a bit of spare time would it be worth taking a look at?
<daveshah>
It's full of bugs but reports of those bugs are always welcome :)
<daveshah>
It has mainly been tested with an Arty A35, but other boards with a 35/50T chip should be easy enough to use with too
Jybz has quit [Quit: Konversation terminated!]
<swetland>
I've got an Arty A7 35 and S7 25 and a Nexus4 and 2nd gen zybo sitting around
<daveshah>
Arty is probably the best place to start with it then
<swetland>
is there an example project/workflow somewhere?
<TD-Linux>
it's kind of funny to me that all the arguments about pci-e security are based around the iommu, when linux disables it by default
<gruetzkopf>
iommu=off is a very strange choice, yes
<TD-Linux>
bonus points for iommu not being per device but per group, grouping determined by trustworthy and competent motherboard firmware vendors
<gruetzkopf>
the dmabuffer <-> iommupage alignment also _really_ needs cleanup
emeb_mac has joined ##openfpga
<gruetzkopf>
no use if driver .data or .text are still overwriteable..
<swetland>
daveshah: cool! will poke around. any particular areas/features worth looking for issues with or exercising?
<daveshah>
Not really, if you hit bad router performance then please try the router2 branch and see if that makes things better or worse
<swetland>
do I need all this crazy python stuff to just download and use the existing xray database?
<daveshah>
What Python stuff?
<daveshah>
xray bitgen does require various deps
<daveshah>
the nextpnr import doesn't have any external deps, iirc, but pypy3 is very much recommended over regular CPython for that for speed reasons (although less important if you only plan on doing it once and not hacking on it)
oeuf has joined ##openfpga
<swetland>
the xray stuff has a list of various python things and instructions for local / global installation, etc. similarly, is vivado 2017.2 needed for the "just use it" path? (just installed 2019.2 to have the vendor tool to check designs against and am already reminded how much I loooove vivado x.x)
<daveshah>
No Vivado at all is needed, although you might need to patch out the check in Xray's environment.sh
<swetland>
yay. also, wow, vivado seems to have gotten even slower and clunkier in the past year or so (*and* I've upgraded my workstation since I last had the misfortune of using it)
<swetland>
but, hey, trivial design builds and runs on the metal, so there's a point of reference
<tnt>
Oh, so turns out the crosslink nx ALU implements some custom operations, rotate left/right and some fixed-point-math-optimized MULH / MULHSU / MULHU variants.
<daveshah>
Yup
<daveshah>
I wonder if this is the first time a feature from bitmanip has been taped out (rotates), even if the encoding isn't correct
<daveshah>
prjxray-db in xilinx/external should be the exact git submodule commit, not master
<daveshah>
There have been some upstream changes I haven't caught up with yet
<swetland>
ah it does seem to like this one more
<swetland>
getting closer. fasm2frames.py can't import fasm ... some path/env issue I expect
<daveshah>
Perhaps because the "source" in the script is failing
<daveshah>
You might need to set XRAY_DIR and or patch out the Vivado check in Xray utils/environment.sh
<daveshah>
Also might need an xray git submodule init/update if you haven't already
<swetland>
there isn't a source in blinky.sh or attosoc.sh -- maybe I need to do something so the virtualenv stuff knows where to find its junk when I run the tools/utils from a different working directory?
<daveshah>
No you don't need any virtualenv stufd
<daveshah>
*stuff
<swetland>
I set XRAY_{TOOLS,UTILS}_DIR so the scripts can find fasm2frames.py and xc7frames2bit
<daveshah>
The source is in attosoc.sh at least, it was added recently
<swetland>
oh there it is
<daveshah>
If it doesn't work you can try running it before the attosoc.sh separately
<daveshah>
That script should set up pythonpath so it finds everything
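A rough picture of what that environment setup amounts to for the Python tools. The XRAY_DIR / XRAY_UTILS_DIR / XRAY_TOOLS_DIR names come from the discussion above, but the directory layout and the /opt/prjxray prefix here are placeholders, not the real checkout structure:

```python
# Sketch of what sourcing the Xray environment does for fasm2frames.py:
# export the XRAY_* directories and prepend the utils dir to PYTHONPATH so
# `import fasm` resolves against the checkout. Paths are illustrative only.
import os

def xray_env(xray_dir: str) -> dict[str, str]:
    env = dict(os.environ)
    env["XRAY_DIR"] = xray_dir
    env["XRAY_UTILS_DIR"] = os.path.join(xray_dir, "utils")
    # "build/tools" is an assumed location for the compiled tools.
    env["XRAY_TOOLS_DIR"] = os.path.join(xray_dir, "build", "tools")
    env["PYTHONPATH"] = env["XRAY_UTILS_DIR"] + os.pathsep + env.get("PYTHONPATH", "")
    return env

print(xray_env("/opt/prjxray")["XRAY_UTILS_DIR"])
```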
<swetland>
ye gods these tricolor LEDs are blinding
<q3k>
heh
<daveshah>
Haha
<daveshah>
Glad it works
<daveshah>
Perhaps some PWM is needed...
<swetland>
very cool.
<swetland>
does attosoc do anything beyond blinkenlights? looks like the switches affect the patterns somewhat
<daveshah>
The blinkenlights should be prime numbers in base 4
<daveshah>
something like red=1, green=2, white=3
<daveshah>
one of the switches is reset, iirc
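The decoding daveshah describes can be sketched as follows; note the off=0 digit and the digit order are assumptions layered on his "something like" color mapping:

```python
# Sketch of reading the attosoc LED pattern as described above: each prime is
# shown in base 4, one digit per tricolor LED. The mapping (off=0, red=1,
# green=2, white=3) follows the guessed color scheme, not verified hardware.

COLORS = {0: "off", 1: "red", 2: "green", 3: "white"}

def to_base4(n: int) -> list[int]:
    """Base-4 digits of n, most significant first."""
    digits = []
    while n:
        digits.append(n % 4)
        n //= 4
    return digits[::-1] or [0]

def led_pattern(n: int) -> list[str]:
    """LED colors for the number n, one per base-4 digit."""
    return [COLORS[d] for d in to_base4(n)]

# e.g. the prime 13 is 31 in base 4 -> white, red
print(led_pattern(13))
```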
<swetland>
I don't think I can decode them without protective eyewear. looking at 'em directly makes my eyes water ^^
<swetland>
thanks for the help getting stuff setup. this is exciting
<swetland>
is it expected that the resource counts for the 35 look more like those for the 50 (though they don't seem to exactly match the datasheet)?
<daveshah>
;)
<swetland>
the attosoc project uses a hilariously tiny fraction of the available resources on this part
<daveshah>
Note that the nextpnr LUT count is LUT5s not LUT6s because of how it deals with fractured LUTs
<swetland>
I'm guessing the register file ended up in distributed ram as only one BRAM seems to be used
<daveshah>
But the 35 is just a rebranded 50, people have been hacking them for a while
<daveshah>
The reg file might be in RAM and the tiny program in LUTs
<daveshah>
It doesn't have any system RAM
<daveshah>
just the reg file
<daveshah>
hence attosoc instead of picosoc
<swetland>
ah so that's why it doesn't mesh up with the slice counts
* swetland
nods
<daveshah>
The FF count should be the real FF count though
<swetland>
okay SLICE_FFX total of 65200 makes sense against 8150 slices of 8 FFs
<daveshah>
Yup
* swetland
declares 2020 The Year of The Linux Desktop^W^WOpensource FPGA Toolchain
<gruetzkopf>
the land of "opensource fpga toolchain on linux open source cpu built with open source fpga toolchain" was already reached :D
<swetland>
I'm less excited about running linux on fpgas. 10ish years of android got all the embedded linux out of my system. it is so huge and clunky. (but cool that it can self host like that)
<daveshah>
It's less huge and clunky without Android :p (but still not very practical outside of techdemos)
<swetland>
oh to be sure. but 8MB or so as the price of entry for a relatively minimal arm kernel is just kinda nuts. and the surface area of the kernel is insane. but the modern distros make android look positively slim by comparison.
<swetland>
where in the world is the *code* in this fasm thing?
<daveshah>
The attosoc demo?
<tnt>
Has there been a single instance of where running linux on a fpga soft core was more than a tech demo ? ATM given the speed of fpga softcores it seems like a huge overhead with little benefit ?
<swetland>
daveshah: no, the prjxray "fasm" library thing that fasm2frames.py uses (I'm curious if I can depython the hot path ^^)
<swetland>
third_party/fasm seems like barely the shell of a library
<daveshah>
I think it must come from there
Bike has joined ##openfpga
<daveshah>
I don't know the structure of Xray that well and it has changed a bit over time
<daveshah>
tnt: some asic people might use it to test Linux on their design without silicon faster than a simulator?
<swetland>
looks like prjxray/fasm_assembler.py, bitstream.py, etc
<daveshah>
not sure if that still counts as a techdemo
<tnt>
daveshah: I guess it sits somewhere in between techdemo and "real application" since it's really just a temporary stepping stone :)
<swetland>
I know the qualcomm folks who did early linux boot verification on their bignormous fpga emulators for their socs talked about hours-to-days boottimes ^^
<tnt>
But I was more thinking of things like ECP5 rather than gigantic ultrascale simulating a full asic :p
<daveshah>
Yeah, I think for at least 99% of practical applications for Linux+FPGA a Zynq is a better bet
<swetland>
I think there's a ton of potential for softcore stuff on medium sized fpgas, but a full linux build (even stripped down as "embedded") is pretty huge and certainly overkill
<dh73>
I booted Linux/Windows/MacOS on HAPS-70 and ZeBu. Windows takes around a week and a half to boot in ZeBu. In HAPS it is much faster. Booting an OS in FPGA/emulators is slow, because the main purpose is to have traces of specific IPs. When the OS is booting, you have already fixed most of the bugs in the bring-up stage.
<tnt>
swetland: oh yeah for sure, I very much like softcores :)
<swetland>
also while riscv is cute, I expect one could do things more tailored for fpgas architecturally.
<swetland>
but I suppose riscv is today's mips -- it's the pipelined cpu everyone builds in that processors course ^^
<daveshah>
I fear most university processor courses are still using MIPS
<daveshah>
At least based on the odd glance at Verilog stackoverflow etc
<swetland>
textbooks have been shifting
<GenTooMan>
MIPS is well known and has had a lot of "in place" running since the late 1980s
<swetland>
the selfie folks (have you seen that project? s/mips/riscv/ subset emulator/hypervisor/c-subset-compiler/etc) moved to riscv
<gruetzkopf>
the mips courses aren't even on the fun mips variants :(
<sorear>
Interesting
<sorear>
Gives me an excuse to finish my BSV compiler now
<tnt>
daveshah: any clue why BB_CDR has an INADC pin ?
<hackerfoo>
sorear: link?
<daveshah>
tnt: I think internally it is a similar concept
<daveshah>
Both are "analog" uses that avoid the regular IO buffer
<sorear>
hackerfoo: I never published it, and I didn’t get interestingly far, I have 1/20 of a parser
<tnt>
but you can't connect it to the CDR block ... it throws an error ...
<daveshah>
Hmm, don't know then
<daveshah>
Looking at the routing indeed it doesn't seem like they use the ADC path
<tnt>
yeah weird ... maybe you're supposed to use that when you want the ADC path on CDR capable pins instead of BB_ADC but ... why ...
<daveshah>
Or if you want to use both the ADC and the CDR?
<daveshah>
idek why
<tnt>
Still haven't got the CDR working btw. Opened a TSR with lattice see if they can provide any insight ... (like maybe "oh yeah, it's broken on ES" or something to that effect)
<daveshah>
Will be interesting to see what they say
<tnt>
Yup. In the mean time I'm looking at DELAYA atm and I'm wondering what this "Edge Monitor" feature is ? Do you have any clue ?
<tnt>
I can't seem to find much info about it at all.
<daveshah>
I think it means it updates the delay value internally when it sees an edge on the signal
<daveshah>
I don't know why you want that. One of the docs somewhere mentioned it being for SPI4.2
<tnt>
yeah, WAIT_FOR_EDGE is "Used for SPI4.2 implementation."
<tnt>
There is also EDGE_MONITOR "To enable edge monitor when in a IDDR X2, X7to1, X4, or X5 mode."
<daveshah>
Oh, I didn't realise they are two different things
<daveshah>
The ECP5 only has WAIT_FOR_EDGE iirc
<daveshah>
I also noticed some bitstream weirdness around the DELAYA blocks when I was fuzzing them (I haven't finished as I need to go back and work out what is really going on)
<daveshah>
it looks like the 800ps coarse delay mode might be always on for some reason