<tpw_rules>
whitequark: it's not to boneless, but i guess that's the magic of ABC??
<tpw_rules>
also relatedly i just got my first MAC block working :D
<whitequark>
nice!
emeb has quit [Quit: Leaving.]
<tpw_rules>
anyway uh, no barrel shifter: 1508 LUTs 15.33MHz, with barrel shifter: 1496 LUTs 14.06MHz.
<whitequark>
oh
<whitequark>
interesting
<tpw_rules>
i also did check that it works and it didn't like just simplify it out
<tpw_rules>
but yeah now my boneless can do 16x16->32 multiply with the result shifted right by any amount in 4 cycles. great for fixed point math, which is what i'm about to do a lot of
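The operation tpw_rules describes (16x16->32 multiply, then shift the product right by a chosen amount) is the usual fixed-point multiply. Below is a minimal Python sketch of the arithmetic only, assuming signed 16-bit operands and a truncating (floor) arithmetic shift; `mul_shift` is a hypothetical name, not the boneless instruction.

```python
def mul_shift(a: int, b: int, shift: int) -> int:
    """Signed 16x16 -> 32-bit multiply, then arithmetic right shift by `shift`.

    Assumes two's-complement 16-bit inputs; rounding is plain truncation
    toward minus infinity (Python's >> on negative ints is arithmetic).
    """
    def to_signed16(x: int) -> int:
        x &= 0xFFFF
        return x - 0x10000 if x & 0x8000 else x

    product = to_signed16(a) * to_signed16(b)  # always fits in 32 signed bits
    return product >> shift


# Q8.8 example: 1.5 (384) * 2.25 (576) = 3.375 (864), shift by the 8 fraction bits
print(mul_shift(384, 576, 8))
```

For Qm.n operands the shift is normally n, which brings the 2n fraction bits of the product back to n.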
<OK_b00m3r>
:D
<tpw_rules>
(it would be 2 cycles if i could figure out how to do a custom instruction)
<implr>
am i insane or is it normal for xilinx's bootgen to segfault when run under strace
<implr>
also why, is this some cursed drm/license scheme
<mwk>
it's xilinx
<mwk>
why wouldn't it segfault?
<mwk>
also, try running it under valgrind
<whitequark>
implr: they don't even bother stripping half the symbols on linux
<mwk>
maybe you'll find some dumb memory corruption (again)
<whitequark>
there's no drm
<whitequark>
i mean
<whitequark>
there is but they didn't fuck up the binary
<mwk>
umm
<mwk>
a friend told me you can fix the drm by patching like 3 bytes in the binary
<mwk>
or something
<kc8apf>
yep
<tpw_rules>
three whole bytes????
<tpw_rules>
i once fixed drm by patching 1 bit
<tpw_rules>
:P
<kc8apf>
"we put in FlexLM so we're all good"
<whitequark>
it's more like 11 bytes
<whitequark>
but yes
<whitequark>
maybe there's a more elegant way in 3.
<kc8apf>
*leaves in tons of symbols and 50% of the app is TCL scripts*
<whitequark>
i think they know and it's on purpose.
<mwk>
*shrug* I don't recall
<whitequark>
they definitely knew for ISE.
<mwk>
it's been a while since I... errr, my friend looked at it
<whitequark>
like. they replied as much to the researchers who broke the crypto in *that* flexlm.
<whitequark>
"if someone wants to have it that badly they aren't paying anyway", paraphrased
<kc8apf>
maybe they don't realize you can use a symbol-rich version to look at crash reports
<whitequark>
you can literally just break on the error message
<whitequark>
it's not like it's any harder to do with any other toolchain.
<whitequark>
other than the massive amount of inlining some of them do.
* mwk
recalls it was simpler to crack the gowin toolchain than actually set up the license server, despite actually having a license for it
<whitequark>
that's definitely the case for diamond.
<whitequark>
and *possibly* ISE these days, i recall the licensing crashes on recent linux or something
<mwk>
huh, interesting
<mwk>
worksforme, and I run arch
<kc8apf>
the whole looking for eth0 thing
<mwk>
oh that
* mwk
just keeps a dummy ethernet device as eth0
<whitequark>
it also did something stupid, uh
<whitequark>
it looked for eth0 via dbus or something
<whitequark>
unless you ran it as root
<whitequark>
and the dbus version didn't match
<whitequark>
i don't recall *exactly* but it was idiotic
<implr>
they did remove the license checks for webpack completely though
<implr>
so you don't have to do any of that shit if you don't have a license
<implr>
anyway I tried to gdb that bootgen and got defeated by their massive spaghetti of launcher bash scripts
<vup>
vivado also segfaults when run under strace, so it seems like it's a common pattern
<implr>
mwk: btw do you perhaps know something about byte-swapped(?) series 7 bitstreams?
<mwk>
?
<mwk>
what do you mean?
<mwk>
I mean, you do need to bitswap the bytes when stuffing them via eg. JTAG
<mwk>
because they're supposed to be loaded MSB-first, and JTAG shifts LSB-first by its nature
<mwk>
but it's pretty well-known
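mwk's point is that the bitstream is meant to be shifted in MSB-first while JTAG shifts LSB-first, so every byte has to be bit-reversed before it goes out over JTAG. A minimal Python sketch, assuming the input is configuration data with any file header already removed; `bitswap_bytes` is a hypothetical name. Precomputing a 256-entry table and using `bytes.translate` keeps the per-byte loop out of Python, which may help with the "90% of its time swapping" complaint further down.

```python
# Entry i holds i with its 8 bits in reverse order (bit 0 <-> bit 7, etc.).
_BITREV = bytes(int(f"{i:08b}"[::-1], 2) for i in range(256))


def bitswap_bytes(data: bytes) -> bytes:
    """Reverse the bit order inside every byte (MSB-first <-> LSB-first)."""
    return data.translate(_BITREV)


# The 7-series sync word aa 99 55 66 as it would be shifted out over JTAG
print(bitswap_bytes(b"\xaa\x99\x55\x66").hex())
```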
<implr>
so the way you load a bitstream on a zynq from linux is that you DMA into this special device
<mwk>
right, PCAP
<implr>
and that supposedly just accepts the whole file raw and figures out the headers and stuff on its own, except not really
<mwk>
but I think that takes parallel data, right?
<implr>
because a .bit file is 'byte-swapped', whatever that means but that's what the internets say
<mwk>
*sigh*
<implr>
so, on old linux-xilinx, you could cat foo.bit > /dev/xdevcfg
<mwk>
what idiot even uses words such as "byte-swapped"
<mwk>
it's like calling little endian "swapped bytes"
<mwk>
they're not swapped ffs, they're just in a different order than you assumed to be
<implr>
but then they decided to integrate with the fpga-manager mainline framework instead of a special snowflake out of tree driver
<mwk>
at least say "byte-swapped related to <something>"
<implr>
but the thing is, their old driver parsed it and if it detected a raw .bit it did the swapping in the kernel
<mwk>
</rant>
<mwk>
raw .bit as opposed to what?
<mwk>
because .bit is *not* raw
<mwk>
it has a header
X-Scale` has joined ##openfpga
<implr>
well, raw in the sense 'straight out of vivado, no processing'
<mwk>
and while (I think) it has the correct bit-within-bytes order to be stuffed into PCAP directly, the header needs to be stripped
<mwk>
because (for PCAP at least), you need the correct 32-bit alignment, and header is not necessarily a multiple of 4 bytes
X-Scale has quit [Ping timeout: 240 seconds]
<kc8apf>
PCAP doesn't wait for the sync word?
<mwk>
kc8apf: it does
<mwk>
but it needs the words to be aligned
<implr>
...so when they mainlined it, the kernel people told them to fuck off and fix their userland to generate the correct file, instead of doing bit mangling on a big buffer in the kernel
<kc8apf>
ah, right
<mwk>
and if you put a non-multiple-of-4 number of bytes before it, that doesn't happen
<mwk>
implr: alright, first bit of advice
<implr>
so on sufficiently recent kernels if you want to load a .bit, you first have to run it through bootgen, which does something to it
<mwk>
try generating an actual raw bitstream instead
<mwk>
there's a global option you can set in vivado before write_bitstream, I think
<mwk>
you'll get a .bin file instead of .bit
X-Scale` is now known as X-Scale
<implr>
well the bootgen method works and is the official way of doing that
<implr>
ah, I think I saw that somewhere
<mwk>
that or just take the .bit and strip everything before the "ff ff ff ff 00 00 00 bb" sequence
<mwk>
well, maybe a few more ffs
<mwk>
hm, just stripping everything before the first 0xff byte should work
<mwk>
I don't think the framing can have any ff bytes in it
<mwk>
I can definitely tell you that a .bit file won't work
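Following mwk's suggestion to drop everything before the `ff ff ff ff 00 00 00 bb` sequence, here is a minimal Python sketch. Instead of cutting at the first 0xff byte, it steps back over whole 0xffffffff words only, so the result stays 32-bit aligned relative to the marker even if the last header byte happens to be 0xff. Both the word-sized padding and the marker never appearing inside the header are assumptions, not checked against Xilinx documentation; `strip_bit_header` is a hypothetical name.

```python
MARKER = b"\xff\xff\xff\xff\x00\x00\x00\xbb"  # padding word + bus-width word


def strip_bit_header(bit: bytes) -> bytes:
    """Return the configuration data of a .bit file with its header removed."""
    start = bit.find(MARKER)
    if start < 0:
        raise ValueError("no 'ff ff ff ff 00 00 00 bb' marker; not a .bit file?")
    # Keep any further 0xffffffff padding words, stepping back 4 bytes at a
    # time so we never swallow a stray 0xff byte belonging to the header.
    while start >= 4 and bit[start - 4:start] == b"\xff" * 4:
        start -= 4
    return bit[start:]
```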
<implr>
heh I suspected it being due to the soc being little endian and them overlooking that fact
<mwk>
and TRM is vague on the concept of what is a word here
<mwk>
and that script you linked does sound reasonable
<implr>
for now I got this to work already with a cursed python wrapper around bootgen, but pulling in vivado to effectively call htonl(3) in a loop seems a bit excessive
<implr>
so yeah I'll probably use that script, thanks for the investigation
* mwk
concludes, from the fact that the script exists at all, that the bytes indeed need to be swapped between .bit and whatever DMA needs
<mwk>
I suppose that's the "fun" of having a big-endian configuration interface hooked to a little-endian cpu
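The conclusion above is that each 32-bit word needs its bytes reversed between the .bit data and what the PCAP DMA expects, i.e. implr's "htonl(3) in a loop" on a little-endian host. A minimal Python sketch, assuming the header has already been stripped and the data is a whole number of 32-bit words; whether the output matches what bootgen emits byte for byte is not verified here, and `swap32` is a hypothetical name.

```python
import struct


def swap32(data: bytes) -> bytes:
    """Reverse the byte order of every 32-bit word (big-endian <-> little-endian)."""
    if len(data) % 4:
        raise ValueError("length must be a multiple of 4 bytes")
    n = len(data) // 4
    # Read the words as big-endian, write them back as little-endian.
    return struct.pack(f"<{n}I", *struct.unpack(f">{n}I", data))
```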
<implr>
how likely is it that they forgot that this would be a problem and had to fix it in software
<mwk>
forgot or just didn't care
<mwk>
I mean, eh
<mwk>
if you program via JTAG, you get to *bit*-swap the bytes instead
* mwk
wrote a python script for that, and it spends like 90% of its time swapping the damn bytes
<mwk>
I mean, a programmer script
<OmniMancer>
In the yosys nextpnr flow, where does the mapping step occur?
<mwk>
"mapping"?
<mwk>
what do you mean by that?
<OmniMancer>
technology mapping might be the other name?
<OmniMancer>
though looking at it more I think yosys does this
<whitequark>
OmniMancer: in abc
<whitequark>
well, mostly
<whitequark>
there are also a bunch of `techmap` passes that do things abc can't
<OmniMancer>
And then you get out a thing made out of cells which roughly match what is in the FPGA?
<mwk>
kind of yes
<mwk>
nextpnr also does a bit of packing
<mwk>
but most of the heavy lifting is done by techmap (which converts generic-ish primitives inside yosys to the target library) and abc (which converts logic into LUTs)
freemint has joined ##openfpga
<OmniMancer>
nextpnr needs to pack the LUTs and other cells into actual sites in slices?
<daveshah>
tbh it doesn't
<daveshah>
This was what I thought was best at the time of doing ECP5
<daveshah>
before nextpnr had any kind of reasonably fast placer
<daveshah>
These days I'm more interested by flows that put more work into legality checking during placement rather than packing
<OmniMancer>
so it could just place them on sites?
<daveshah>
Yes, so you would have separate LUT and FF bels for example
<OmniMancer>
Ah
<mwk>
well you also need to do fairly complex constraint checking to make sure it all fits together
<mwk>
but yes
<daveshah>
I found for UltraScale that wasn't anywhere near as bad as I thought
<mwk>
(eg. check all FFs in slice have compatible controls sets, etc)
<daveshah>
It's quite a bit of code but it's very simple logic so it runs very quickly
<OmniMancer>
I think I realise why the Anlogic has the a0-7 b0-7 c0-7 d0-7 for each location and then also specific slice names for them, it allows the routing part to be uniform and then just have a mapping per location type to map the generic names to the specific functions
<daveshah>
You can also be clever and only check parts of a slice that changed
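mwk's example of a slice constraint ("check all FFs in slice have compatible control sets") and daveshah's point about only rechecking what changed can be sketched in a few lines of Python. This is a simplification, assuming for illustration that every FF in a slice must share exactly the same clock, clock-enable and set/reset nets; real Xilinx rules are more nuanced. All names are hypothetical and not taken from nextpnr.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FF:
    name: str
    clk: str
    ce: Optional[str] = None  # None: clock enable unused
    sr: Optional[str] = None  # None: set/reset unused


def slice_ffs_legal(ffs: list[FF]) -> bool:
    """True if all FFs in one slice share a single (clk, ce, sr) control set."""
    return len({(f.clk, f.ce, f.sr) for f in ffs}) <= 1


def move_is_legal(slices: dict[str, list[FF]], ff: FF, dest: str) -> bool:
    """Check a proposed placer move by re-examining only the destination slice."""
    return slice_ffs_legal(slices.get(dest, []) + [ff])
```

Because each check touches only the slice being modified, it stays cheap enough to run on every move the placer proposes.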
<OmniMancer>
I suspect those routing bels are the same everywhere but what the signals mean depends where they are, but will have to see when I do some poking at it
freemint has quit [Ping timeout: 250 seconds]
Jybz has joined ##openfpga
freemint has joined ##openfpga
<mwk>
"bels" are *not* routing
<mwk>
what you're thinking of is pips
<mwk>
and wires
freemint has quit [Ping timeout: 276 seconds]
mumptai has quit [Remote host closed the connection]
freemint has joined ##openfpga
<OmniMancer>
oh sorry, pips then
<OmniMancer>
bels are basic logic elements?
<mwk>
yes
<daveshah>
For some reason Xilinx groups pips inside slices into "routing bels"
<daveshah>
But they aren't really bels in any sensible meaning
<OmniMancer>
ah, how helpful
<mwk>
huh, I thought they were called "site pips"
<daveshah>
They are
<daveshah>
But the groupings like AOUTMUX are called routing bels
<daveshah>
At least in Vivado
<daveshah>
Maybe not in ISE
<mwk>
on ISE it's all bels
<OmniMancer>
everything is a bel according to xilinx?
<mwk>
well, everything within a site
<mwk>
neocad basically requires one config option == one bel
<mwk>
(and you can also have non-configurable bels)
<mwk>
so you have a bel that sets an FF's initial value, for example
<mwk>
or a bel that sets sync vs async reset for all FFs in a slice
<mwk>
some bels are explicitly muxes, where the config option values directly correspond to the mux inputs
<OmniMancer>
ah yep
<mwk>
the inversion select bels (eg. CLKINV) also count as muxes btw, they mux between CLK and CLK_B
<OmniMancer>
indeed, wouldn't want uneven delay
<daveshah>
Xilinx doesn't have an F output per se
<daveshah>
Closest is probably the MUX one which is selected by a routing bel
<mwk>
it used to have
<daveshah>
Hah
<mwk>
around spartan 3 era
<daveshah>
I almost thought F was original lattice naming
<mwk>
and yes, that was a mux bel (FMUX)
<daveshah>
_sigh_
<daveshah>
Was F selected between LUT or LUT MUX by any chance?
<mwk>
daveshah: spartan 3 had two LUTFFs per slice, one LUT had F output, the other had G output
<daveshah>
Oh right
<mwk>
LUT, LUT MUX, or LUT XOR
<daveshah>
So that's a different meaning to F in ECP5
<mwk>
hmm, wait
<mwk>
I messed up
<mwk>
F is always the LUT output
<daveshah>
Oh never mind it is quite similar
<mwk>
X is the slice output
<daveshah>
Aaaaa
<daveshah>
What is the now called X input called then?
<mwk>
and X can be muxed from F, F5, or FXOR
<mwk>
you mean the bypass input?
<daveshah>
Yeah
<mwk>
BX and BY
<daveshah>
I see
<mwk>
not to be confused with XB and YB, which are bypass *outputs*
<daveshah>
So the F output in ECP5 is muxed between the LUT (or XOR in carry mode) or MUX2
<mwk>
that would correspond to X
<daveshah>
But ECP5 doesn't really have site pips so that's selected outside the slice
<daveshah>
Meanwhile the bypass/mux select input is called M (miscellaneous)
mkru has joined ##openfpga
freemint has quit [Ping timeout: 240 seconds]
<OmniMancer>
is there also an FX?
<daveshah>
Yes, that's the mux output
<daveshah>
mux2 output
<OmniMancer>
ah okay
<OmniMancer>
seems like Anlogic have similar naming for all of that
<daveshah>
The actual F signal is selected between F from the slice primitive (LUT output) or FX from the slice primitive (MUX2 output)
<daveshah>
For MachXO2 it's a bit different because FX can also be used separately
<OmniMancer>
Oh, I think there are two separate outputs, but unsure
<daveshah>
On ECP5 there are two separate outputs but they are immediately followed by pips to select one of them to actually use as F
<daveshah>
You can't route both of them to fabric at the same time
<OmniMancer>
ah
<OmniMancer>
I am not sure yet about Eagle
<OmniMancer>
but the slice has a Q and F and an FX output
<daveshah>
Right that sounds the same as Lattice
freemint has joined ##openfpga
<OmniMancer>
Alas the logic that dictates how F and FX are derived is just a big box of "output combine" logic so not very enlightening in the datasheet
<daveshah>
I see the LUT5 thing makes it a bit more complicated
freemint has quit [Ping timeout: 265 seconds]
freemint has joined ##openfpga
<OmniMancer>
The LUT5 is a bit strange
<OmniMancer>
I am reasonably certain it's actually 2 LUT4s with an extra MUX
<OmniMancer>
you can use both of the LUT4 outputs in some cases AFAIK
<daveshah>
Well that is how LUTs are always built
<daveshah>
All the FPGA LUT designs I know of are just 1 bit memory cells and a tree of muxes
<OmniMancer>
well yea, but I mean the signals from the two LUT4s are available independently
<daveshah>
That is why different LUT inputs tend to have significantly different delays
<daveshah>
Right
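daveshah's description of a LUT as 1-bit memory cells feeding a tree of muxes, and OmniMancer's guess that the Anlogic LUT5 is two LUT4s plus an extra MUX, can both be modelled functionally. A minimal Python sketch: it models logic only, not delays, and which physical pin drives the root mux (usually the fastest) is an assumption here; `lut` and `lut5` are hypothetical names.

```python
from __future__ import annotations


def lut(init: int, inputs: list[int]) -> int:
    """Evaluate a k-input LUT as a binary mux tree over its 2**k config bits.

    inputs[0] drives the first mux level (next to the memory cells) and
    inputs[-1] drives the root mux; INIT bit i is selected when
    sum(inputs[k] << k) == i.
    """
    bits = [(init >> i) & 1 for i in range(1 << len(inputs))]
    for sel in inputs:  # one mux level per input, halving the candidates
        bits = [bits[2 * j + sel] for j in range(len(bits) // 2)]
    return bits[0]


def lut5(init: int, a: list[int]) -> int:
    """A LUT5 built from two LUT4s sharing a[0:4], plus a MUX on a[4]."""
    lo = lut(init & 0xFFFF, a[:4])          # LUT4 holding the a[4] == 0 half
    hi = lut((init >> 16) & 0xFFFF, a[:4])  # LUT4 holding the a[4] == 1 half
    return hi if a[4] else lo               # the extra mux
```

In this model `lo` and `hi` exist as separate signals before the final mux, which is what would let both LUT4 outputs be used independently.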
<OmniMancer>
I am not certain how much that is true though, the fuzzing for some of those settings is a bit resistant to providing interesting outcomes
<OmniMancer>
It also doesn't seem to help that the tools' reaction to incorrect inputs is to produce no bits
<OmniMancer>
so anything I have said might be wrong as it's mostly inferred from the translated data sheet and the behaviour of the tools
<OmniMancer>
The muxes closer to the end of the tree had lower delay?
rohitksingh has quit [Remote host closed the connection]
<mwk>
daveshah: I messed up with the reset inversion, sorry for the confusion :(
<mwk>
so if I understand it correctly, on Ultrascale you can invert every single general fabric input for free?
rohitksingh has joined ##openfpga
<OmniMancer>
does it just have the inverted input on the mux as well?
<daveshah>
mwk: no, just it adds an SR inversion mux
<daveshah>
No worries btw
<daveshah>
What's confusing is that the primitive has supported inversion for a while longer
<daveshah>
I think this is because IO flipflops had data and SR inversion, but not slice ones
<mwk>
according to my data, virtex 6 has data inversion, but no SR inversion (?!?)
<mwk>
hm
<mwk>
wait, this actually makes sense
<daveshah>
That might be for IO FFs too
<mwk>
I'm talking specifically about IO FFs
<mwk>
normal slice FFs cannot do data inversion
rohitksingh_ has joined ##openfpga
<mwk>
and I think it's supported for IO FFs because you need inversion to do faux-differential output
<daveshah>
But there's also the special inverter in the IOB for that
rohitksingh has quit [Ping timeout: 250 seconds]
<mwk>
what inverter?
<daveshah>
From xc7 onwards anyway, one of the IOBs has an inverter for pseudo differential outputs
<mwk>
it's just how they represent true differential output vs faux differential
<daveshah>
Well, it behaves like one
<daveshah>
Oh, I see
<mwk>
so the "inverter" is really half of the differential output driver
<daveshah>
Anyway, I guess the inversion of FF input is still useful if you don't want to use the official diff pairings for a pseudo output
<mwk>
if you're doing faux differential output, you cannot use it
<mwk>
and have to use FF input inversion
<azonenberg>
daveshah: speaking of fpga input delays changing
<azonenberg>
Does yosys/nextpnr support timing driven swapping of lut inputs to optimize critical path delays?
<daveshah>
Yes, in several possible ways
<azonenberg>
Awesome
<daveshah>
Either as making LUT permutation appear as pips with delay set to the difference in delays in front of the LUT (I've done this for Xilinx but currently for routeability rather than timing)
<daveshah>
or doing a fixed permutation pass between placement and routing based on criticality (this is what I did for ECP5 where LUT permutation didn't seem to benefit criticality so much)
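The second approach daveshah mentions, a fixed LUT-permutation pass between placement and routing, boils down to two steps: put the most critical input nets on the fastest LUT pins, then rewrite the LUT's INIT so it still computes the same function. A minimal Python sketch, assuming per-pin delays and per-net criticalities are already known (the numbers in the test are made up); function names are hypothetical and this is not nextpnr's code.

```python
from __future__ import annotations


def assign_pins(criticality: list[float], pin_delay: list[float]) -> list[int]:
    """Map logical input i -> physical pin, most critical input on fastest pin."""
    by_crit = sorted(range(len(criticality)), key=lambda i: -criticality[i])
    by_speed = sorted(range(len(pin_delay)), key=lambda p: pin_delay[p])
    perm = [0] * len(criticality)
    for i, p in zip(by_crit, by_speed):
        perm[i] = p
    return perm


def permute_lut_init(init: int, k: int, perm: list[int]) -> int:
    """INIT for the same function after moving logical input i to pin perm[i]."""
    new = 0
    for phys_idx in range(1 << k):
        # Recover the logical input vector seen when the pins hold phys_idx.
        log_idx = 0
        for i in range(k):
            if (phys_idx >> perm[i]) & 1:
                log_idx |= 1 << i
        if (init >> log_idx) & 1:
            new |= 1 << phys_idx
    return new
```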
<azonenberg>
Also is the nextpnr placer/router still annealing based?
<ZirconiumX>
It's either SA or HeAP
<azonenberg>
my recollection was that there was talk of doing an analytic placer
<daveshah>
The placer is analytical with an annealing based refinement step
<azonenberg>
ah ok
<azonenberg>
so it's a hybrid
<daveshah>
The current router is mostly A* with ripup and some heuristics
<daveshah>
I am working on a negotiated congestion router with parallelism and possibly SAT integration at the moment
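To make "mostly A*" concrete, here is a minimal Python sketch of routing one connection on a toy routing graph, where nodes are wires and edges are pips with a delay. It leaves out everything that makes a real router work at scale (rip-up, congestion costs, multi-sink nets); the graph layout and `astar_route` are hypothetical.

```python
from __future__ import annotations

import heapq
from typing import Callable, Optional


def astar_route(graph: dict[str, list[tuple[str, float]]], src: str, dst: str,
                heuristic: Callable[[str], float]) -> Optional[tuple[float, list[str]]]:
    """Lowest-delay wire path from src to dst, or None if unreachable.

    graph maps a wire to (next_wire, pip_delay) pairs; heuristic must never
    overestimate the remaining delay to dst for the result to be optimal.
    """
    heap = [(heuristic(src), 0.0, src)]
    best = {src: 0.0}
    prev: dict[str, str] = {}
    while heap:
        _, cost, wire = heapq.heappop(heap)
        if wire == dst:
            path = [wire]
            while wire in prev:
                wire = prev[wire]
                path.append(wire)
            return cost, path[::-1]
        if cost > best.get(wire, float("inf")):
            continue  # stale entry, a cheaper route to this wire was found
        for nxt, delay in graph.get(wire, []):
            new_cost = cost + delay
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = wire
                heapq.heappush(heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None
```

A negotiated congestion router would call something like this repeatedly, raising the cost of overused wires each iteration until no wire is shared.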
<azonenberg>
That was gonna be my next question, what the status of parallel P&R was
<daveshah>
The only parallel part at the moment is the analytical placement
<azonenberg>
also parallel synthesis, can you read_verilog multiple files in parallel into the AST yet or anything like that?
<daveshah>
(both solving for X and Y in parallel and using an OpenMP sparse matrix solver)
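The reason X and Y can be solved in parallel is that a quadratic wirelength objective splits into two independent problems, one per axis. A minimal Python sketch of one axis for two-pin connections: at the optimum every movable cell sits at the mean of its neighbours, and here that is reached by Gauss-Seidel iteration rather than the OpenMP sparse matrix solver daveshah mentions. It omits the spreading step that analytical placers such as HeAP rely on, so it is an illustration, not a usable placer; all names are hypothetical.

```python
from __future__ import annotations


def quadratic_place_1d(movable: list[str], fixed: dict[str, float],
                       edges: list[tuple[str, str]], iters: int = 200) -> dict[str, float]:
    """Minimise the sum of squared edge lengths along one axis."""
    pos = dict(fixed)
    pos.update({c: 0.0 for c in movable})  # arbitrary starting coordinates
    nbrs: dict[str, list[str]] = {c: [] for c in movable}
    for a, b in edges:
        if a in nbrs:
            nbrs[a].append(b)
        if b in nbrs:
            nbrs[b].append(a)
    for _ in range(iters):
        for c in movable:
            if nbrs[c]:
                # Optimal position given the neighbours: their mean coordinate.
                pos[c] = sum(pos[n] for n in nbrs[c]) / len(nbrs[c])
    return {c: pos[c] for c in movable}


# Chain io0 (x=0) - a - b - io1 (x=9): a and b settle at x=3 and x=6
print(quadratic_place_1d(["a", "b"], {"io0": 0.0, "io1": 9.0},
                         [("io0", "a"), ("a", "b"), ("b", "io1")]))
```

Calling the same function a second time with the fixed cells' y coordinates gives the Y placement, and the two calls share no state.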
<daveshah>
Yosys has no parallelism
<azonenberg>
I wanna say i might have tried using OMP in a few of my passes at one point or another? could be wrong
<azonenberg>
But yeah certainly no support yet for running multiple parallel commands or anything like that
<azonenberg>
i feel like at least basic parsing and elaboration for multiple files should be parallel capable
<daveshah>
I think something more interesting is ccache for Yosys
<azonenberg>
whether that buys any performance would be more interesting
<azonenberg>
My #1 wishlist item for yosys remains enough systemverilog support that it can compile my vivado designs
<daveshah>
talk to mithro!