<edbordin> I was wondering why there was obnoxiously loud background music on the podcast....turns out I forgot to stop the music I had playing -_-
Yehowshua has quit [Remote host closed the connection]
Yehowshua has joined #nmigen
Yehowshua has quit [Ping timeout: 245 seconds]
Degi has quit [Ping timeout: 256 seconds]
Degi has joined #nmigen
lkcl_ has joined #nmigen
lkcl__ has quit [Ping timeout: 240 seconds]
jaseg has quit [Ping timeout: 260 seconds]
jaseg has joined #nmigen
electronic_eel has quit [Ping timeout: 240 seconds]
electronic_eel has joined #nmigen
PyroPeter_ has joined #nmigen
PyroPeter has quit [Ping timeout: 264 seconds]
PyroPeter_ is now known as PyroPeter
mwk has quit [Ping timeout: 244 seconds]
Yehowshua has joined #nmigen
So I've got the TinyFPGA using USB->Serial as a UART. USB of course is fixed at 48MHz, which means I need an AsyncFIFO to get data into faster clock domains. Been browsing for an example of how to use nMigen's `AsyncFIFO` from `nmigen.lib.fifo`. Any pointers?
<286Tech> If you have a stimulus process, then just a `yield` will advance the clock by one cycle, right? Assigning to a signal with `yield dut.a.eq(1)` will not, AFAIK.
<286Tech> Sorry, I meant to say that pysim seems to work correctly, but the VCD file looks off.
<286Tech> I set a signal to 1, do a yield, then set it to 0, and yield again, but the VCD file only shows the 1, but not the 0.
<286Tech> After looking at it more closely, the outputs of the DUT seem correct in the VCD file. It's just that the inputs that I assign in the stimulus process are wrong.
286Tech: could you upload your stimulus process somewhere?
<286Tech> I just started building an instruction fetch module, and now I've run into this.
<286Tech> For example, i_stall remains 1 in the VCD file (even though the outputs behave correctly), and so does i_load, i_data, and i_load_addr.
<286Tech> They keep the value that they were assigned first in the VCD file.
FL4SHK: wait - are you writing an assembler in Python?
[nmigen-soc] rroohhh opened pull request #23: test: make nmigen 0.3+ compatible - https://git.io/JJZod
[nmigen-soc] codecov[bot] commented on pull request #23: test: make nmigen 0.3+ compatible - https://git.io/JJZoN
[nmigen-soc] rroohhh commented on pull request #23: test: make nmigen 0.3+ compatible - https://git.io/JJZop
proteus-guy has quit [Ping timeout: 240 seconds]
Yehowshua: I'm writing it in this style: I pass a Python list to the assembler
FL4SHK: Sounds like you're doing some DSL? Using `with` is kinda neat. I noticed that nMigen's RTLIL emitter uses `with` statements so that scopes within the AST are tab-aligned.
yes, doing a DSL of sorts
that's how the assembler will be working
What ISA does it target?
custom one
I find the process fun
Haha - I bet
It's MIPS like in some ways
My friend did a haskell assembler once
Was good for catching errors cuz haskell is functional
I've built stereotypical assemblers with lexers and parsers before
this is better because it allows me to include higher level stuff in it
I always included scopes in my assemblers
that way you get local labels
That's a smart idea now that I think about it
Do you mean you want a GPU with FPGA fabric?
I think the place and route problem, an NP hard problem, would benefit from being implemented as an island model genetic algorithm
I mean I want to look at nextpnr
and make it run on a host's GPU
So nextpnr does have CUDA annealing support
oh it does?
this is good
what about OpenCL?
I think... My friend wrote one - I don't know if he did a pull request
I mean, I've got an nvidia GPU in my main machine
Lemme ask him
Never seen that
And nextpnr doesn't use annealing for the bulk of placement now anyway
island model GA is another solution
I think that's what I'd use for running it on the GPU
Hypothetically you could run the sparse matrix solver on the GPU but the benefit would be non-existent
I'll need to take a look at this at some point
because the place and route stage takes forever
NP hard search problems are good to place in an island model GA
The problem with all these things is not the algorithm but actually dealing with all the real world constraints
What do you mean?
I suspect that there are other faster heuristics, particularly for routing which is the pain point
something like GA would be my choice when I cared about QoR over runtime
island model GAs are faster than regular GAs
What's "QoR" stand for?
all the complexities like fracturable LUTs, placement rules, etc
quality of results
OK - just asked him - he said he added CuBlas support for analytical
But he didn't get much speedup
Yes, that's what I'd expect tbh
He also did not pull request
well, all right.
The main problem is route time, but this is more down to the placer needing more routeability heuristics
daveshah, did you know there are some companies using nextpnr in production?
I've never used nextpnr
which is what I've been working on, although taking even a fairly detailed paper and example code (RippleFPGA) and actually turning it into a useful, generic, real-world implementation is quite tricky
That's interesting - for an existing arch or for their own FPGA arch?
My main computer is going to be doing this thing where you work with cache lines instead of registers
He also wrote a shader for a custom architecture I think
I've built a machine like this before, in SV
daveshah, has Lattice ever reached out about a collaboration on nextpnr?
Yes, but things move slowly so I can't really say much yet
OK. That's fantastic
Yehowshua: so do you think having only full products for multiplies is a problem?
also, is it possible to use an unsigned full product to implement a signed full product?
Can you expand a bit? I haven't heard of the term full product before
multiply two 32-bit numbers, producing a 64-bit result
I don't think it's a problem
In fact, I think it's quite useful!
Especially with fixed point
for this video game console, there's no floating point
Oh - I see. You don't have a mvfhi instruction?
this isn't quite MIPS
multiplies encode both hi and low into the instruction
`mulu rA, rB, rC, rD`
I think I might just include signed full products
That's clever
Have you factored into this the cost of two write ports?
two write ports of what?
Or are your multiplies multicycle?
Wherever your two destination registers are
oh, for the register file
I didn't think about it, no
Probably going to just make multiplies multi-cycle
That's a fascinating thought. I'm actually testing multi-port writes on the ice40 this afternoon
See how it affects BRAM
since the register file is rather small, I can probably just implement registers out of logic
**BRAM count
but BRAM is probably faster
Unfortunately, Yosys doesn't do any tricks to map multiple write ports
So it will end up as FFs
I could just use two memories
Yeah - that's what I suspected
There are various ways round it, either using a XOR trick
My way seems to be "just use more block RAM!!!1one"
or a small FF based memory to track which was last written
honestly, though
I might just expand the pipeline
You don't just need more block RAM, you also need one of those tricks
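A minimal behavioural sketch of the XOR trick mentioned above, in plain Python rather than nMigen. It assumes two banks with one write port each, and that the two ports never write the same address in the same cycle; the class and method names are made up for illustration. Each port stores its data XORed with the other bank's current word, so XORing both banks on a read returns whichever value was written last.

```python
class XorTwoWriteMemory:
    """Behavioural model of a 2-write-port memory built from two 1-write banks."""

    def __init__(self, depth):
        # One bank per write port; both start out zeroed.
        self.banks = [[0] * depth, [0] * depth]

    def read(self, addr):
        # The logical value is the XOR of both banks at that address.
        return self.banks[0][addr] ^ self.banks[1][addr]

    def write_cycle(self, writes):
        """writes: {port (0 or 1): (addr, data)}; addresses must differ."""
        # Sample the *other* bank first, as hardware would before the clock edge.
        updates = []
        for port, (addr, data) in writes.items():
            other = self.banks[1 - port][addr]
            updates.append((port, addr, data ^ other))
        # Then commit all writes at once.
        for port, addr, encoded in updates:
            self.banks[port][addr] = encoded


mem = XorTwoWriteMemory(depth=8)
mem.write_cycle({0: (3, 0xAA), 1: (5, 0x55)})  # both ports write in one cycle
mem.write_cycle({1: (3, 0x11)})                # port 1 overwrites address 3
print(hex(mem.read(3)), hex(mem.read(5)))
```

In real hardware each bank also has to be read by the other write port and by every read port, which is why "just more block RAM" alone is not enough.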
probably make it so there are two write stages
Also, daveshah, if you want to show validity of nextpnr in industry, I know a couple of places where it's used, in case that comes up in future convos with Lattice
Yes, that would definitely be helpful.
daveshah, does having two write stages cause much of a problem?
You mean you only do one write per cycle?
well, yeah, and it solves the race condition when you encode the destination registers as the same
I guess that will create a bubble when you multiply
there's already going to be a bubble
because it's multi cycle
I don't know enough about CPU arch to know if that's a major issue
If your multiply is already multicycle then it's probably going to be cheaper than a two write ported file
If that's the only thing you need two write ports for
I believe it's the only thing I need two write ports for
If you dual issued other instructions, then the cost of two write ports would be much more worthwhile
this machine isn't intended for maximum performance
CPU dev was my main draw into hardware
For a game console, other acceleration is probably more interesting anyway
Lots of fun to be had with video and audio
this is going to be kind of like a GBA
got the VGA signal generation done
but it'll be running at a high enough clock rate for software 3D
if the GBA can do software 3D, so can this thing
What company do you do AERO for - if you can say...
I was at GTRI
I had nothing to do with the incidents last year though
Yeah - no shame man
so for this assembler I'm making
I've even got stack frame stuff set up...
decided upon this structure
fp + 0: register save area
fp + register_save_area_size: local variables
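A small sketch of how that frame layout could be computed, assuming 4-byte registers and word-sized locals. The function name, register names and variable names are hypothetical, not taken from the actual assembler.

```python
WORD_BYTES = 4  # assumption: 32-bit registers and word-sized locals


def frame_layout(saved_registers, local_variables):
    """Return (offsets, frame_size); offsets maps each name to its byte offset from fp."""
    offsets = {}
    # fp + 0: register save area
    for index, reg in enumerate(saved_registers):
        offsets[reg] = index * WORD_BYTES
    register_save_area_size = len(saved_registers) * WORD_BYTES
    # fp + register_save_area_size: local variables
    for index, name in enumerate(local_variables):
        offsets[name] = register_save_area_size + index * WORD_BYTES
    frame_size = register_save_area_size + len(local_variables) * WORD_BYTES
    return offsets, frame_size


offsets, frame_size = frame_layout(["r4", "r5", "lr"], ["i", "total"])
print(offsets, frame_size)
```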
I'm kinda impressed actually
How long did that take you?
to do what now?
the assembler?
It's not done!
stack frame structure
oh, that was pretty easy
I've done this before, but it's been a while
it's very simple
look at the description
that's all there is to it
I'm writing a compiler for my thesis - and still haven't finished the stack - but my ISA is intended for CNNs - go figure
I don't think I'll ever go back to SV outside of work
nMigen covers my needs far, far too well
<emeb> then on the output pick R & C so the corner is at or below nyquist of the sample rate.
<emeb> for my system the sample rate is << clock rate
yes, nMigen is quite good. I tried to switch some courses at my school that I help teach to nMigen
The students totally freaked
they freaked out?
But I think if nMigen were to become heavily used, it would have to be taught early on
Yeah - they had never seen git or python before
oh, this must be low level coursework, then
They also didn't know how to install things with package managers
<emeb> that's a lot to take in all at once
Yeah, like sophomore year
I studied a lot of computer science stuff on my own
don't have a degree in it
I have a mechanical engineering undergrad degree and an electrical engineering master's degree
I'm computer engineering - although I didn't really learn any of my compiler RTL skills in school
I have an iMac G3, and only FOSS FPGA tools work on it haha
So that's how I got into this stuff
I got into CPU dev because I really wanted to make CPUs, haha
<emeb> iMac G3 - now that's historic hardware!
Yup! I tried running yosys on my MacSE 1987 too!
But doesn't have enough memory.
<emeb> heh
I don't believe that you should upgrade hardware just because something better exists
Good software should run on old hardware
<emeb> I was proud of myself for getting icestorm/yosys/nextpnr running on an RPi Zero. You win.
Yehowshua has quit [Remote host closed the connection]
[nmigen-soc] whitequark commented on pull request #23: test: make nmigen 0.3+ compatible - https://git.io/JJZ18
hitomi2504 has quit [Quit: Nettalk6 - www.ntalk.de]
emeb, regarding picking R and C "so the corner is at or below nyquist of the sample rate", what's the "corner"?
<emeb> the corner frequency of an RC low-pass filter is F = 1/(2*pi*R*C)
I don't know my analog foo very well
<emeb> so pick your R and C values so that F is at about 1/2 the sample rate.
<emeb> ie - I wanted about a 20kHz corner frequency so I used a 100 ohm R and a 0.1uF C
Is F the highest frequency that'll get passed through?
bear in mind if you are just using a single RC filter, it rolls off very slowly, so you might actually want to put the -3dB frequency a lot lower than half your sample rate
<emeb> Well, a single-pole RC lowpass doesn't roll off fast. That's where it starts to ramp down at 6dB/oct
I don't know what "rolling off" means
Yehowshua has joined #nmigen
Well is signal processing, you can delete frequencies - they are attenuated smoothly
**in not is
<emeb> If you plot a curve of attenuation vs frequency - roll off is how it slopes downward with increasing freq
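To put numbers on the corner frequency and the roll-off, here is a short Python sketch using the standard single-pole RC low-pass response, |H(f)| = 1/sqrt(1 + (f/F)^2). The function names are made up for illustration. With the 100 ohm and 0.1uF values quoted above, the formula gives a corner of roughly 15.9 kHz.

```python
import math


def corner_frequency(r_ohms, c_farads):
    """-3 dB corner of a single-pole RC low-pass: F = 1/(2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def gain_db(freq_hz, corner_hz):
    """Magnitude response of a single-pole RC low-pass, in dB."""
    return -10.0 * math.log10(1.0 + (freq_hz / corner_hz) ** 2)


fc = corner_frequency(100, 0.1e-6)  # 100 ohm, 0.1 uF
print(f"corner frequency: {fc:.0f} Hz")
for multiple in (1, 2, 4, 8):
    # Well above the corner, each doubling of frequency costs about 6 dB.
    print(f"{multiple} x fc: {gain_db(multiple * fc, fc):6.2f} dB")
```

The attenuation is only about 3 dB at the corner itself and approaches 6 dB per octave above it, which is why a single RC stage "rolls off slowly".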
I suppose I don't really need to know much about advanced filtering for what I'm trying to do
<emeb> not really. this is pretty fool-proof
going to be really cool to hear this working
I want to make, like, Gameboy type audio hardware
high -= (a >> (BITS - 1)) * b + (b >> (BITS - 1)) * a
that's all you need to translate an unsigned full product to a signed one
FL4SHK: cool!
I got this from a friend, though
a*b + b*a ...
*curious*... why are there two multiplies?
a[0] * b + b[0] * a
wait no
a[len(a) - 1] * b + b[len(b) - 1] * a
that's what you use ^
it's not really a multiply
it's a mux
ok yes top bit, yep.
high.eq(high - (Mux(a[len(a) - 1], b, 0) + Mux(b[len(b) - 1], a, 0)))
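The correction can be checked in plain Python. This sketch assumes BITS = 32 and treats `a` and `b` as unsigned bit patterns; the helper names are hypothetical. Only the high word needs adjusting, because the low word of the product is the same for signed and unsigned operands.

```python
BITS = 32
MASK = (1 << BITS) - 1


def to_signed(x, bits=BITS):
    """Interpret a bits-wide unsigned pattern as two's complement."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def signed_full_product(a, b):
    """Take BITS-wide bit patterns a, b; return (high, low) of the signed product."""
    product = a * b                      # unsigned full product
    low = product & MASK
    high = product >> BITS
    # Subtract b if a's top bit is set, and a if b's top bit is set.
    high -= (a >> (BITS - 1)) * b + (b >> (BITS - 1)) * a
    return high & MASK, low


a, b = 0xFFFFFFFE, 0x00000003            # -2 and 3 as 32-bit patterns
high, low = signed_full_product(a, b)
print(hex(high), hex(low))
```

The top bit of each operand is 0 or 1, so the two "multiplies" in the correction are really just selects, as noted above.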
FL4SHK: if you're going to do 2-port regfile write, you could consider doing "LD/ST-with-update" just like in PowerISA.
currently thinking of going with a simpler method
(the address-generation part gets written back into the 1st register)
will just implement the register file out of logic
lkcl_ whats that
FL4SHK: we have a mix of register files. one of them is only 8 entries but it needs 6R and 5W. it's maasssiiive.
DaKnig: LD/ST-with-update instructions? when you do a LD/ST you usually compute the address from 2 operands: one register and either an immediate or another register
LD/ST-with-update will write that calculated address into an outgoing register.
basically using the LD/ST as an ADD unit, saving one instruction in computationally-intensive loops.
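A toy Python model of a load-with-update, to show why it needs two register writes and how it saves the separate ADD in a loop. The register names, memory contents and function name are made up for illustration; this is a sketch of the idea, not of any particular ISA encoding.

```python
def load_with_update(regs, mem, rt, ra, imm):
    """Load mem[regs[ra] + imm] into rt, and write the computed address back to ra."""
    ea = regs[ra] + imm      # effective address from register + immediate
    regs[rt] = mem[ea]       # first register write: the loaded data
    regs[ra] = ea            # second register write: the updated pointer


regs = {"r1": 0, "r2": 0x100 - 4}          # r2 points one word before the array
mem = {0x100: 10, 0x104: 20, 0x108: 30}    # three words to sum
total = 0
for _ in range(3):
    # One instruction both loads the next word and advances the pointer.
    load_with_update(regs, mem, "r1", "r2", 4)
    total += regs["r1"]
print(total, hex(regs["r2"]))
```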
5 watts?
FL4SHK: 6R5W - 6 read 5 write
I see
sorry, convention when discussing regfiles: N (read) M (write), e.g. 4R1W porting. sorry, i assumed you'd be familiar with the terminology.
jeanthom has quit [Ping timeout: 260 seconds]
lkcl__ has joined #nmigen
lkcl_ has quit [Ping timeout: 240 seconds]
Yehowshua has quit [Ping timeout: 245 seconds]
phire has quit [Remote host closed the connection]
phire has joined #nmigen
jeanthom has joined #nmigen
lkcl__ is now known as lkcl
jeanthom has quit [Ping timeout: 260 seconds]
is there a tutorial for Xilinx stuff that goes from writing code to actually putting the bitstream on the board?
<Benny> Look at the digilent learn site
digilent has articles about nmigen?
my search engine doesn't bring up anything
<Benny> No but a couple on how to upload the bitstream
nmigen-boards does most of the heavy lifting for you
then is there a tutorial for that?
chipmuenk has quit [Quit: chipmuenk]
Not really, but it's simple enough that I can give you one here
What board do you have?
I have a zynq arty z7
from nmigen_boards.arty_z7 import ArtyZ720Platform # at the top of your code
Asu has quit [Read error: Connection reset by peer]
Asuu has joined #nmigen
I guess I'd have to pass on using nmigen-boards
too much stuff to recompile
I get many compilation errors and I am not sure if this is the right place to ask for help about them
how can I get the verilog files from nmigen code?
I might still be able to use the Vivado GUI and work like that
DaKnig: `from nmigen.back import verilog` then `foo = YourTopLevelModule(); ports = [ foo.bar, ... ]; with open("out.v", "w") as f: f.write(verilog.convert(foo, ports=ports))`
But I want you to think about this: compiling Yosys is pretty easy. So too is xc3sprog.
Asuu has quit [Quit: Konversation terminated!]
I tried compiling xc3sprog. it says `package 'libftdi' not found`
the thing is, I installed that package a moment earlier
as it turns out, my package manager has an old version of this
then the same for libftd2xx. but there the package manager didnt have this, and this error persisted after I installed this according to the official tutorial (just downloading, extracting the tar and putting stuff in the right place, linking some files etc)
Yehowshua has joined #nmigen
General rule of thumb - avoid Vivado haha
jeanthom has joined #nmigen
I've never gotten Vivado to work
Even with GUI standalone
the vivado gui works
it just works
proteus-guy has joined #nmigen
I didnt have to mess with much stuff
I'd really rather not use that GUI though, thats one reason I wanted to move to nmigen or something that has CLI tools for this
Well, I used Vivado to program my zedboard two years ago with a simple blinky program. Merely programming the board bricked it. It never worked again. That's my only experience with Vivado
Maybe the board was bad
lol that kinda happens sometimes
what are you doing now then
Yes, nMigen has support for Vivado CLI
I use Lattice
with xilinx boards
thats one solution
Never had a single issue with OpenOCD or TinyProg
<Yes, nMigen has support for Vivado CLI> as long as you can get that to work.. which is what I am struggling with rn
Do you know if your Xilinx board supports another programmer besides xc3sprog?
I just was able to install xc3sprog in Ubuntu
It's in Ubuntu's apt as well as the Arch AUR
I am using centos 7
with a few more repos
OK. Grab docker and do a wrapper
installed that because vivado officially supports that and nothing newer
is there a docker for xc3sprog?
I never used docker...
Nah - it won't take two seconds to make one
In fact I can make one real quick
I probably have an old version of docker... lemme see... 1.13.1
DaKnig: you only need xc3sprog if you want to program your board using nmigen-boards. If you have some other way to program it, you can just program it yourself, the bitstream is located in the `build` folder.
so make that `sudo docker build -t xc3sprog .`
Or make a user group for docker
notice my username
when running that
Oh smh
what's that RUN thing?
does it run this inside the docker
if so, does it really matter if its with sudo or not
yes - it runs it inside the container, in its own isolated Linux namespace
It shouldn't. I've never had that problem...
jock-tanner has joined #nmigen
Well, one thing I can tell you is that if you can install a recent version of Ubuntu or Arch box, all these problems will magically disappear...
you see- moving the vivado install would require a new licence as mine only worked for 5 machines, and I already spent all 5
You have the commercial license?
or using the webpack version that tracks what Im doing
I have a commercial license yes
got a voucher for 10$
Does your particular FPGA need that commercial?
no but I prefer that because that allows me to disable the data it sends over to xilinx
Makes sense. Well I don't see any easy solutions.
not using nmigen-boards looks like the only one
Well, Xilinx still uses SOME command to program your FPGA
You can change that command in nMigen boards
I just looked at the link from your last message; yeah I know how subprocesses work in python, I think I could change this.
So you could just place that file in the same directory as blinky.py, and then modify `def toolchain_program(self, products, name)` to do what you want, or like miek said, just change `do_program` in blinky.py to `False`