<d1b2>
<edbordin> I was wondering why there was obnoxiously loud background music on the podcast....turns out I forgot to stop the music I had playing -_-
Yehowshua has quit [Remote host closed the connection]
Yehowshua has joined #nmigen
Yehowshua has quit [Ping timeout: 245 seconds]
Degi has quit [Ping timeout: 256 seconds]
Degi has joined #nmigen
lkcl_ has joined #nmigen
lkcl__ has quit [Ping timeout: 240 seconds]
jaseg has quit [Ping timeout: 260 seconds]
jaseg has joined #nmigen
electronic_eel has quit [Ping timeout: 240 seconds]
electronic_eel has joined #nmigen
PyroPeter_ has joined #nmigen
PyroPeter has quit [Ping timeout: 264 seconds]
PyroPeter_ is now known as PyroPeter
mwk has quit [Ping timeout: 244 seconds]
Yehowshua has joined #nmigen
<Yehowshua>
So I've got the TinyFPGA using USB->Serial as a UART. USB of course is fixed at 48MHz, which means I need an AsyncFIFO to get data into faster clock domains. Been browsing for an example of how to use nMigen's AsyncFIFO from `nmigen.lib.fifo`. Any pointers?
<d1b2>
<286Tech> If you have a stimulus process, then just a `yield` will advance the clock by one cycle, right? Assigning to a signal with `yield dut.a.eq(1)` will not, AFAIK.
<d1b2>
<286Tech> Sorry, I meant to say that pysim seems to work correctly, but the VCD file looks off.
<d1b2>
<286Tech> I set a signal to 1, do a yield, then set it to 0, and yield again, but the VCD file only shows the 1, but not the 0.
<d1b2>
<286Tech> After looking at it more closely, the outputs of the DUT seem correct in the VCD file. It's just the inputs that I assign in the stimulus process that are wrong.
<jeanthom>
286Tech: could you upload your stimulus process somewhere?
<d1b2>
<286Tech> I just started building an instruction fetch module, and now I've run into this.
<d1b2>
<286Tech> For example, i_stall remains 1 in the VCD file (even though the outputs behave correctly), and so does i_load, i_data, and i_load_addr.
<d1b2>
<286Tech> They keep the value that they were assigned first in the VCD file.
<Yehowshua>
FL4SHK: wait - are you writing an assembler in Python?
<_whitenotifier-b>
[nmigen-soc] rroohhh opened pull request #23: test: make nmigen 0.3+ compatible - https://git.io/JJZod
<_whitenotifier-b>
[nmigen-soc] codecov[bot] commented on pull request #23: test: make nmigen 0.3+ compatible - https://git.io/JJZoN
<_whitenotifier-b>
[nmigen-soc] rroohhh commented on pull request #23: test: make nmigen 0.3+ compatible - https://git.io/JJZop
proteus-guy has quit [Ping timeout: 240 seconds]
<FL4SHK>
Yehowshua: I'm writing it in this style: I pass a Python list to the assembler
<Yehowshua>
FL4SHK: Sounds like you're doing some DSL? Using `with` is kinda neat. I noticed that nMigen's RTLIL emitter has with statements so that scopes within the AST are tab aligned.
<FL4SHK>
yes, doing a DSL of sorts
<FL4SHK>
that's how the assembler will be working
<Yehowshua>
What ISA does it target?
<FL4SHK>
custom one
<FL4SHK>
I find the process fun
<Yehowshua>
Haha - I bet
<FL4SHK>
It's MIPS like in some ways
<Yehowshua>
My friend did a haskell assembler once
<Yehowshua>
Was good for catching errors cuz haskell is functional
<FL4SHK>
I've built stereotypical assemblers with lexers and parsers before
<FL4SHK>
this is better because it allows me to include higher level stuff in it
<FL4SHK>
I always included scopes in my assemblers
<FL4SHK>
that way you get local labels
<Yehowshua>
That's a smart idea now that I think about it
<Yehowshua>
Do you mean you want a GPU with FPGA fabric?
<FL4SHK>
I think the place and route problem, an NP hard problem, would benefit from being implemented as an island model genetic algorithm
<FL4SHK>
no
<FL4SHK>
I mean I want to look at nextpnr
<FL4SHK>
and make it run on a host's GPU
<Yehowshua>
So nextpnr does have CUDA annealing support
<FL4SHK>
oh it does?
<FL4SHK>
this is good
<FL4SHK>
what about OpenCL?
<Yehowshua>
I think... My friend wrote one - I don't know if he did a pull request
<FL4SHK>
I mean, I've got an nvidia GPU in my main machine
<Yehowshua>
Lemme ask him
<daveshah>
Never seen that
<daveshah>
And nextpnr doesn't use annealing for the bulk of placement now anyway
<FL4SHK>
island model GA is another solution
<FL4SHK>
I think that's what I'd use for running it on the GPU
<daveshah>
Hypothetically you could run the sparse matrix solver on the GPU but the benefit would be non-existent
<FL4SHK>
I'll need to take a look at this at some point
<FL4SHK>
because the place and route stage takes forever
<FL4SHK>
NP hard search problems are good to place in an island model GA
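For readers unfamiliar with the term, an island model GA runs several small populations independently and occasionally migrates good individuals between them, which is what makes it map well onto parallel hardware. The sketch below is a toy illustration only: it uses a made-up bit-counting objective rather than anything from nextpnr, and the function names, population sizes, ring migration scheme and parameters are all assumptions chosen for brevity.

```python
import random

def fitness(bits):
    # Toy objective (OneMax): number of 1 bits. A real placer would score
    # wirelength, timing, legality and so on instead.
    return sum(bits)

def evolve(pop, rng, mutation_rate=0.05):
    """Run one generation on a single island."""
    children = [max(pop, key=fitness)[:]]          # elitism: keep the island's best
    while len(children) < len(pop):
        p1 = max(rng.sample(pop, 2), key=fitness)  # 2-way tournament selection
        p2 = max(rng.sample(pop, 2), key=fitness)
        cut = rng.randrange(1, len(p1))            # one-point crossover
        child = p1[:cut] + p2[cut:]
        # Bit-flip mutation with a small per-bit probability.
        child = [b ^ 1 if rng.random() < mutation_rate else b for b in child]
        children.append(child)
    return children

def island_ga(n_islands=4, pop_size=20, n_bits=32,
              generations=40, migrate_every=5, seed=0):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)]
                for _ in range(pop_size)]
               for _ in range(n_islands)]
    history = []  # best fitness across all islands after each generation
    for gen in range(1, generations + 1):
        # Each island evolves on its own; this is the part that could run
        # in parallel (one island per thread block, core, machine, ...).
        islands = [evolve(pop, rng) for pop in islands]
        if gen % migrate_every == 0:
            # Ring migration: a copy of each island's best individual
            # replaces the worst individual of the next island.
            bests = [max(pop, key=fitness)[:] for pop in islands]
            for i, pop in enumerate(islands):
                worst = min(range(len(pop)), key=lambda j: fitness(pop[j]))
                pop[worst] = bests[(i - 1) % n_islands]
        history.append(max(fitness(ind) for pop in islands for ind in pop))
    best = max((ind for pop in islands for ind in pop), key=fitness)
    return best, history

best, history = island_ga()
print(fitness(best), history[:5])
```

Because of the elitism step, the best fitness never decreases from one generation to the next. Whether this kind of search would actually beat the existing heuristics on a real place and route problem is a separate question that the sketch says nothing about.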
<daveshah>
The problem with all these things is not the algorithm but actually dealing with all the real world constraints
<FL4SHK>
What do you mean?
<daveshah>
I suspect that there are other faster heuristics, particularly for routing which is the pain point
<daveshah>
something like GA would be my choice when I cared about QoR over runtime
<FL4SHK>
island model GAs are faster than regular GAs
<FL4SHK>
What's "QoR" stand for?
<daveshah>
all the complexities like fracturable LUTs, placement rules, etc
<daveshah>
quality of results
<FL4SHK>
ah
<Yehowshua>
OK - just asked him - he said he added CuBlas support for analytical
<Yehowshua>
But he didn't get much speedup
<daveshah>
Yes, that's what I'd expect tbh
<Yehowshua>
He also did not pull request
<FL4SHK>
well, all right.
<daveshah>
The main problem is route time, but this is more down to the placer needing more routeability heuristics
<Yehowshua>
daveshah, did you know there are some companies using nextpnr in production?
<FL4SHK>
I've never used nextpnr
<daveshah>
which is what I've been working on, although taking even a fairly detailed paper and example code (RippleFPGA) and actually turning it into a useful, generic, real-world implementation is quite tricky
<daveshah>
That's interesting - for an existing arch or for their own FPGA arch?
<FL4SHK>
My main computer is going to be doing this thing where you work with cache lines instead of registers
<Yehowshua>
He also wrote a shader for a custom architecture I think
<FL4SHK>
I've built a machine like this before, in SV
<Yehowshua>
daveshah, has Lattice ever reached out about a collaboration on nextpnr?
<daveshah>
Yes, but things move slowly so I can't really say much yet
<Yehowshua>
OK. That's fantastic
<FL4SHK>
Yehowshua: so do you think having only full products for multiplies is a problem?
<FL4SHK>
also, is it possible to use an unsigned full product to implement a signed full product?
<Yehowshua>
Can you expand a bit? I haven't heard of the term full product before
<FL4SHK>
multiply two 32-bit numbers, producing a 64-bit result
<Yehowshua>
I don't think it's a problem
<Yehowshua>
In fact, I think it's quite useful!
<Yehowshua>
Especially with fixed point
<FL4SHK>
for this video game console, there's no floating point
<Yehowshua>
Oh - I see. You don't have an mfhi instruction?
<FL4SHK>
this isn't quite MIPS
<FL4SHK>
multiplies encode both hi and low into the instruction
<FL4SHK>
`mulu rA, rB, rC, rD`
<FL4SHK>
I think I might just include signed full products
<Yehowshua>
That's clever
<daveshah>
Have you factored into this the cost of two write ports?
<FL4SHK>
two write ports of what?
<daveshah>
Or are your multiplies multicycle?
<daveshah>
Wherever your two destination registers are
<FL4SHK>
oh, for the register file
<FL4SHK>
I didn't think about it, no
<FL4SHK>
Probably going to just make multiplies multi-cycle
<Yehowshua>
That's a fascinating thought. I'm actually testing multi port writes on the ice40 this afternoon
<Yehowshua>
See how it affects BRAM
<FL4SHK>
since the register file is rather small, I can probably just implement registers out of logic
<Yehowshua>
**BRAM count
<FL4SHK>
but BRAM is probably faster
<daveshah>
Unfortunately, Yosys doesn't do any tricks to map multiple write ports
<daveshah>
So it will end up as FFs
<FL4SHK>
I could just use two memories
<Yehowshua>
Yeah - that's what I suspected
<daveshah>
There are various ways round it, either using a XOR trick
<FL4SHK>
My way seems to be "just use more block RAM!!!1one"
<daveshah>
or a small FF based memory to track which was last written
<FL4SHK>
honestly, though
<FL4SHK>
I might just expand the pipeline
<daveshah>
You don't just need more block RAM, you also need one of those tricks
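The two tricks mentioned above can be modelled behaviourally in a few lines. This is a sketch under assumptions, not how Yosys or any particular core does it: the class names are made up, each "bank" stands in for a single-write-port block RAM, and a simultaneous write to the same address from both ports is not handled.

```python
class XorTwoWriteMemory:
    """Two write ports built from two single-write banks (XOR trick)."""

    def __init__(self, depth):
        self.bank = [[0] * depth, [0] * depth]

    def write(self, port, addr, data):
        # Port p only ever writes bank p. It stores data XOR the other
        # bank's current word, so XOR-ing both banks gives back data.
        # In hardware this needs an extra read port on the other bank.
        other = self.bank[1 - port][addr]
        self.bank[port][addr] = data ^ other

    def read(self, addr):
        return self.bank[0][addr] ^ self.bank[1][addr]


class LvtTwoWriteMemory:
    """Two write ports using a small table that records the last writer."""

    def __init__(self, depth):
        self.bank = [[0] * depth, [0] * depth]
        # One bit per address; this is the small FF-based memory, which
        # itself needs two write ports but is only one bit wide.
        self.last_writer = [0] * depth

    def write(self, port, addr, data):
        self.bank[port][addr] = data
        self.last_writer[addr] = port

    def read(self, addr):
        # Mux between the banks based on who wrote this address last.
        return self.bank[self.last_writer[addr]][addr]


for cls in (XorTwoWriteMemory, LvtTwoWriteMemory):
    mem = cls(depth=8)
    mem.write(0, 3, 0xAB)   # port 0 writes address 3
    mem.write(1, 3, 0x12)   # port 1 overwrites the same address later
    print(cls.__name__, hex(mem.read(3)))
```

Both variants spend two banks per read port, which is why "just use more block RAM" is only half of the answer: something still has to decide which bank holds the live value.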
<FL4SHK>
probably make it so there are two write stages
<Yehowshua>
Also, daveshah, if you want to show validity of nextpnr in industry, I know a couple of places where it's used, in case that comes up in future convos with Lattice
<daveshah>
Yes, that would definitely be helpful.
<FL4SHK>
daveshah, does having two write stages cause much of a problem?
<daveshah>
You mean you only do one write per cycle?
<FL4SHK>
well, yeah, and it solves the race condition when you encode the destination registers as the same
<daveshah>
I guess that will create a bubble when you multiply
<FL4SHK>
there's already going to be a bubble
<FL4SHK>
because it's multi cycle
<daveshah>
I don't know enough about CPU arch to know if that's a major issue
<daveshah>
Right
<daveshah>
If your multiply is already multicycle then it's probably going to be cheaper than a two write ported file
<daveshah>
If that's the only thing you need two write ports for
<FL4SHK>
I believe it's the only thing I need two write ports for
<daveshah>
If you dual issued other instructions, then the cost of two write ports would be much more worthwhile
<FL4SHK>
this machine isn't intended for maximum performance
<FL4SHK>
CPU dev was my main draw into hardware
<daveshah>
For a game console, other acceleration is probably more interesting anyway
<daveshah>
Lots of fun to be had with video and audio
<FL4SHK>
this is going to be kind of like a GBA
<FL4SHK>
got the VGA signal generation done
<FL4SHK>
but it'll be running at a high enough clock rate for software 3D
<FL4SHK>
if the GBA can do software 3D, so can this thing
<Yehowshua>
What company do you do AERO for - if you can say...
<daveshah>
Yeah
<Yehowshua>
I was at GTRI
<FL4SHK>
Boeing
<FL4SHK>
I had nothing to do with the incidents last year though
<Yehowshua>
Yeah - no shame man
<FL4SHK>
so for this assembler I'm making
<FL4SHK>
I've even got stack frame stuff set up...
<FL4SHK>
decided upon this structure
<FL4SHK>
fp + 0: register save area
<FL4SHK>
fp + register_save_area_size: local variables
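A frame laid out this way is easy to compute mechanically. The following is a minimal sketch, assuming 4-byte registers and made-up register and variable names; it is not taken from FL4SHK's assembler.

```python
def frame_layout(saved_regs, local_sizes, word_size=4):
    """Compute fp-relative offsets for a frame shaped like:

        fp + 0                       : register save area
        fp + register_save_area_size : local variables

    saved_regs  -- list of register names to save (hypothetical names)
    local_sizes -- dict mapping local variable name -> size in bytes
    word_size   -- bytes per saved register (4 is an assumption)
    """
    offsets = {}
    offset = 0
    for reg in saved_regs:
        offsets[reg] = offset
        offset += word_size
    register_save_area_size = offset
    # Locals start right after the register save area.
    for name, size in local_sizes.items():
        offsets[name] = offset
        offset += size
    return offsets, register_save_area_size, offset


offsets, save_size, frame_size = frame_layout(
    ["r1", "r2", "lr"], {"x": 4, "buf": 16})
print(offsets, save_size, frame_size)
```

With three saved registers the locals begin at `fp + 12`, and the total frame size is what the prologue would subtract from the stack pointer.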
<Yehowshua>
I'm kinda impressed actually
<Yehowshua>
How long did that take you?
<FL4SHK>
to do what now?
<FL4SHK>
the assembler?
<FL4SHK>
It's not done!
<Yehowshua>
stack frame structure
<FL4SHK>
oh, that was pretty easy
<FL4SHK>
I've done this before, but it's been a while
<FL4SHK>
it's very simple
<FL4SHK>
look at the description
<FL4SHK>
that's all there is to it
<Yehowshua>
I'm writing a compiler for my thesis - and still haven't finished the stack - but my ISA is intended for CNNs - go figure
<FL4SHK>
I don't think I'll ever go back to SV outside of work
<FL4SHK>
nMigen covers my needs far, far too well
<d1b2>
<emeb> then on the output pick R & C so the corner is at or below nyquist of the sample rate.
<d1b2>
<emeb> for my system the sample rate is << clock rate
<Yehowshua>
yes, nMigen is quite good. I tried to switch some courses at my school that I help teach to nMigen
<Yehowshua>
The students totally freaked
<FL4SHK>
they freaked out?
<Yehowshua>
But I think if nMigen were to become heavily used, it would have to be taught early on
<Yehowshua>
Yeah - they had never seen git or python before
<FL4SHK>
oh, this must be low level coursework, then
<Yehowshua>
They also didn't know how to install things with package managers
<d1b2>
<emeb> that's a lot to take in all at once
<Yehowshua>
Yeah, like sophomore year
<FL4SHK>
I studied a lot of computer science stuff on my own
<FL4SHK>
don't have a degree in it
<FL4SHK>
I have a mechanical engineering undergrad degree and an electrical engineering master's degree
<Yehowshua>
I'm computer engineering - although I didn't really learn any of my compiler RTL skills in school
<Yehowshua>
I have an iMac G3, and only FOSS FPGA tools work on it haha
<Yehowshua>
So that's how I got into this stuff
<FL4SHK>
I got into CPU dev because I really wanted to make CPUs, haha
<d1b2>
<emeb> iMac G3 - now that's historic hardware!
<Yehowshua>
Yup! I tried running yosys on my MacSE 1987 too!
<Yehowshua>
But doesn't have enough memory.
<d1b2>
<emeb> heh
<Yehowshua>
I don't believe that you should upgrade hardware just because something better exists
<Yehowshua>
Good software should run on old hardware
<d1b2>
<emeb> I was proud of myself for getting icestorm/yosys/nextpnr running on an RPi Zero. You win.
<Yehowshua>
lolz
Yehowshua has quit [Remote host closed the connection]
<_whitenotifier-b>
[nmigen-soc] whitequark commented on pull request #23: test: make nmigen 0.3+ compatible - https://git.io/JJZ18
hitomi2504 has quit [Quit: Nettalk6 - www.ntalk.de]
<FL4SHK>
emeb, regarding picking R and C "so the corner is at or below nyquist of the sample rate", what's the "corner"?
<d1b2>
<emeb> the corner frequency of an RC low-pass filter is F = 1/(2*pi*R*C)
<FL4SHK>
I don't know my analog foo very well
<d1b2>
<emeb> so pick your R and C values so that F is at about 1/2 the sample rate.
<d1b2>
<emeb> ie - I wanted about a 20kHz corner frequency so I used a 100 ohm R and a 0.1uF C
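The corner frequency formula is quick to check numerically. The helper names below are made up for illustration; the component values are the ones quoted above.

```python
import math

def rc_corner_hz(r_ohms, c_farads):
    """Corner (-3 dB) frequency of a single-pole RC low-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def cap_for_corner(r_ohms, f_hz):
    """Capacitance that puts the corner at f_hz for a given resistor."""
    return 1.0 / (2.0 * math.pi * r_ohms * f_hz)

f = rc_corner_hz(100, 0.1e-6)      # 100 ohm, 0.1 uF
print(round(f))                    # about 15915 Hz
print(cap_for_corner(100, 20e3))   # roughly 8e-8 F, i.e. about 0.08 uF
```

So 100 ohm and 0.1uF land at roughly 15.9kHz, in the same ballpark as the 20kHz target and reachable with common part values.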
<FL4SHK>
Is F the highest frequency that'll get passed through?
<agg>
bear in mind if you are just using a single RC filter, it rolls off very slowly, so you might actually want to put the -3dB frequency a lot lower than half your sample rate
<d1b2>
<emeb> Well, a single-pole RC lowpass doesn't roll off fast. That's where it starts to ramp down at 6dB/oct
<FL4SHK>
I don't know what "rolling off" means
Yehowshua has joined #nmigen
<Yehowshua>
Well is signal processing, you can delete frequencies - they are attenuated smoothly
<Yehowshua>
**can't
<Yehowshua>
**in not is
<d1b2>
<emeb> If you plot a curve of attenuation vs frequency - roll off is how it slopes downward with increasing freq
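To put numbers on "roll off": for an ideal single-pole RC low-pass, the gain at frequency f relative to the corner fc is 1/sqrt(1 + (f/fc)^2). The sketch below tabulates that in dB; the function name and the 16kHz corner are assumptions picked only for the example.

```python
import math

def rc_gain_db(f_hz, corner_hz):
    """Gain in dB of an ideal single-pole RC low-pass at f_hz."""
    ratio = f_hz / corner_hz
    return -10.0 * math.log10(1.0 + ratio * ratio)

corner = 16e3  # assumed corner frequency for the example
for octaves in range(0, 6):
    f = corner * 2 ** octaves
    print(f"{f / 1e3:7.0f} kHz  {rc_gain_db(f, corner):7.2f} dB")
```

At the corner itself the signal is only about 3 dB down, and each doubling of frequency after that costs close to another 6 dB. That gentle slope is why agg suggests placing the corner well below half the sample rate when a single RC stage is all the filtering there is.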
<FL4SHK>
I suppose I don't really need to know much about advanced filtering for what I'm trying to do
<d1b2>
<emeb> not really. this is pretty fool-proof
<FL4SHK>
going to be really cool to hear this working
<FL4SHK>
I want to make, like, Gameboy type audio hardware
<FL4SHK>
high -= (a >> (BITS - 1)) * b + (b >> (BITS - 1)) * a
<FL4SHK>
that's all you need to translate an unsigned full product to a signed one
<lkcl_>
FL4SHK: cool!
<FL4SHK>
I got this from a friend, though
<lkcl_>
a*b + b*a ...
<lkcl_>
*curious*... why are there two multiplies?
<FL4SHK>
a[0] * b + b[0] * a
<FL4SHK>
wait no
<FL4SHK>
a[len(a) - 1] * b + b[len(b) - 1] * a
<FL4SHK>
that's what you use ^
<FL4SHK>
it's not really a multiply
<FL4SHK>
it's a mux
<lkcl_>
ok yes top bit, yep.
<FL4SHK>
high.eq(high - (a[len(a) - 1] ? b : 0) - (b[len(b) - 1] ? a : 0))
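The correction can be checked in plain Python before committing it to hardware. This is a sketch with assumed names and a fixed 32-bit width: both selected terms are subtracted from the high word of the unsigned product, and the result is compared against Python's own signed multiply.

```python
BITS = 32
MASK = (1 << BITS) - 1

def to_signed(x, bits=BITS):
    """Interpret a raw bit pattern as a two's complement integer."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x

def signed_full_product(a, b):
    """Signed 2*BITS-wide product from an unsigned multiplier.

    a and b are raw BITS-wide bit patterns. Returns (high, low) words.
    """
    product = a * b               # unsigned full product
    low = product & MASK
    high = product >> BITS
    # If a's top bit is set, subtract b from the high word, and vice
    # versa. Each "multiply" by a single bit is really just a mux.
    high -= (a >> (BITS - 1)) * b + (b >> (BITS - 1)) * a
    return high & MASK, low

# -2 * 3 = -6, which is 0xFFFFFFFF_FFFFFFFA as a 64-bit pattern.
high, low = signed_full_product((-2) & MASK, 3)
print(hex(high), hex(low))
```

The low word is identical for signed and unsigned multiplication, so only the high word needs the fix-up.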
<lkcl_>
FL4SHK: if you're going to do 2-port regfile write, you could consider doing "LD/ST-with-update" just like in PowerISA.
<FL4SHK>
currently thinking of going with a simpler method
<lkcl_>
(the address-generation part gets written back into the 1st register)
<FL4SHK>
will just implement the register file out of logic
<DaKnig>
lkcl_ whats that
<lkcl_>
FL4SHK: we have a mix of register files. one of them is only 8 entries but it needs 6R and 5W. it's maasssiiive.
<lkcl_>
DaKnig: LD/ST-with-update instructions? when you do a LD/ST you usually compute the address from 2 operands: one register and either an immediate or another register
<lkcl_>
LD/ST-with-update will write that calculated address into an outgoing register.
<lkcl_>
basically using the LD/ST as an ADD unit, saving one instruction in computationally-intensive loops.
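A small behavioural model shows where the saved instruction comes from. This is a simplified sketch, not the exact Power ISA semantics: the function names, the register dictionary and the word-addressed memory list are all assumptions.

```python
def load_with_update(regs, mem, rt, ra, offset):
    """Model of a load-with-update:

        rt <- mem[ra + offset]
        ra <- ra + offset      (the computed address is written back)

    Note that this writes two registers in one instruction, which is
    why it pairs naturally with a 2-write-port register file.
    """
    ea = regs[ra] + offset   # effective address
    regs[rt] = mem[ea]
    regs[ra] = ea

def sum_array(mem, base, count):
    """Sum count words starting at base using load-with-update."""
    # Start one element before the array so the first update lands on it.
    regs = {"ptr": base - 1, "val": 0, "acc": 0}
    for _ in range(count):
        # One instruction both loads the element and bumps the pointer,
        # so the loop body needs no separate pointer ADD.
        load_with_update(regs, mem, "val", "ptr", 1)
        regs["acc"] += regs["val"]
    return regs["acc"]

mem = [0, 0, 5, 6, 7]
print(sum_array(mem, base=2, count=3))  # 18
```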
<FL4SHK>
5 watts?
<lkcl_>
FL4SHK: 6R5W - 6 read 5 write
<FL4SHK>
oh
<FL4SHK>
I see
<lkcl_>
sorry, convention when discussing regfiles: N (read) M (write) e.g 4R1W porting, sorry i assumed you'd be familiar with the terminology.
<FL4SHK>
nah
jeanthom has quit [Ping timeout: 260 seconds]
lkcl__ has joined #nmigen
lkcl_ has quit [Ping timeout: 240 seconds]
Yehowshua has quit [Ping timeout: 245 seconds]
phire has quit [Remote host closed the connection]
phire has joined #nmigen
jeanthom has joined #nmigen
lkcl__ is now known as lkcl
jeanthom has quit [Ping timeout: 260 seconds]
<DaKnig>
is there a tutorial for Xilinx stuff that goes from writing code to actually putting the bitstream on the board?
<d1b2>
<Benny> Look at the digilent learn site
<DaKnig>
digilent has articles about nmigen?
<DaKnig>
my search engine doesn't bring up anything
<d1b2>
<Benny> No, but a couple on how to upload the bitstream
<Lofty>
nmigen-boards does most of the heavy lifting for you
<DaKnig>
then is there a tutorial for that?
chipmuenk has quit [Quit: chipmuenk]
<Lofty>
Not really, but it's simple enough that I can give you one here
<Lofty>
What board do you have?
<DaKnig>
I have a zynq arty z7
<Lofty>
from nmigen_boards.arty_z7 import ArtyZ720Platform # at the top of your code
Asu has quit [Read error: Connection reset by peer]
Asuu has joined #nmigen
<DaKnig>
I guess I'd have to pass on using nmigen-boards
<DaKnig>
too much stuff to recompile
<DaKnig>
I get many compilation errors and I am not sure if this is the right place to ask for help about them
<DaKnig>
how can I get the verilog files from nmigen code?
<DaKnig>
I might still be able to use the Vivado GUI and work like that
<Lofty>
DaKnig: `from nmigen.back import verilog` then `foo = YourTopLevelModule(); ports = [ foo.bar, ... ]; with open("out.v", "w") as f: f.write(verilog.convert(foo, ports=ports))`
<Lofty>
But I want you to think about this: compiling Yosys is pretty easy. So too is xc3sprog.
Asuu has quit [Quit: Konversation terminated!]
<DaKnig>
I tried compiling xc3sprog. it says `package 'libftdi' not found`
<DaKnig>
the thing is, I installed that package a moment earlier
<DaKnig>
as it turns out, my package manager has an old version of this
<DaKnig>
then the same for libftd2xx. but there the package manager didnt have this, and this error persisted after I installed this according to the official tutorial (just downloading, extracting the tar and putting stuff in the right place, linking some files etc)
Yehowshua has joined #nmigen
<Yehowshua>
General rule of thumb - avoid Vivado haha
jeanthom has joined #nmigen
<Yehowshua>
I've never gotten Vivado to work
<Yehowshua>
Even with GUI standalone
<DaKnig>
the vivado gui works
<DaKnig>
it just works
proteus-guy has joined #nmigen
<DaKnig>
I didnt have to mess with much stuff
<DaKnig>
I'd really rather not use that GUI though, thats one reason I wanted to move to nmigen or something that has CLI tools for this
<Yehowshua>
Well, I used Vivado to program my zedboard two years ago with a simple blinky program. Merely programming the board bricked it. It never worked again. That's my only experience with Vivado
<Yehowshua>
Maybe the board was bad
<DaKnig>
lol that kinda happens sometimes
<DaKnig>
what are you doing now then
<Yehowshua>
Yes, nMigen has support for Vivado CLI
<Yehowshua>
I use Lattice
<DaKnig>
with xilinx boards
<DaKnig>
ah
<Yehowshua>
ECP5
<DaKnig>
thats one solution
<Yehowshua>
Never had a single issue with OpenOCD or TinyProg
<DaKnig>
<Yes, nMigen has support for Vivado CLI> as long as you can get that to work.. which is what I am struggling with rn
<Yehowshua>
Do you know if your Xilinx board supports another programmer besides xc3sprog?
<DaKnig>
idk
<Yehowshua>
I was just able to install xc3sprog in Ubuntu
<Yehowshua>
It's in the Ubuntu apt repos as well as the Arch AUR
<DaKnig>
I am using centos 7
<DaKnig>
with a few more repos
<Yehowshua>
OK. Grab docker and do a wrapper
<DaKnig>
installed that because vivado officially supports that and nothing newer
<DaKnig>
is there a docker for xc3sprog?
<DaKnig>
I never used docker...
<Yehowshua>
Nah - it won't take two seconds to make one
<Yehowshua>
In fact I can make one real quick
<DaKnig>
I probably have an old version of docker... lemme see... 1.13.1
<vup>
DaKnig: you only need xc3sprog if you want to program your board using nmigen-boards. If you have some other way to program it, you can just program it yourself, the bitstream is located in the `build` folder.
<Yehowshua>
so make that `sudo docker build -t xc3sprog .`
<Yehowshua>
Or make a user group for docker
<DaKnig>
notice my username
<DaKnig>
when running that
<Yehowshua>
Oh smh
<DaKnig>
what's that RUN thing?
<DaKnig>
does it run this inside the docker
<DaKnig>
if so, does it really matter if its with sudo or not
<Yehowshua>
yes - it runs it inside the container, in its own special Linux namespace
<Yehowshua>
It shouldn't. I've never had that problem...
jock-tanner has joined #nmigen
<Yehowshua>
Well, one thing I can tell you is that if you can install a recent version of Ubuntu or Arch box, all these problems will magically disappear...
<DaKnig>
you see- moving the vivado install would require a new licence as mine only worked for 5 machines, and I already spent all 5
<Yehowshua>
You have the commercial license?
<DaKnig>
or using the webpack version that tracks what Im doing
<DaKnig>
I have a commercial license yes
<DaKnig>
got a voucher for 10$
<Yehowshua>
Does your particular FPGA need that commercial?
<DaKnig>
no but I prefer that because that allows me to disable the data it sends over to xilinx
<Yehowshua>
Makes sense. Well I don't see any easy solutions.
<DaKnig>
not using nmigen-boards looks like the only one
<Yehowshua>
Well, Xilinx still uses SOME command to program your FPGA
<Yehowshua>
You can change that command in nMigen boards
<DaKnig>
I just looked at the link from your last message; yeah I know how subprocesses work in python, I think I could change this.
<Yehowshua>
So you could just place that file in the same directory as blinky.py, and then modify the ` def toolchain_program(self, products, name)` to what you want, or like miek said, just change `do_program` in blinky.py to false