danieljabailey has quit [Quit: ZNC 1.6.4+deb1 - http://znc.in]
danieljabailey has joined #yosys
sklv has quit [Quit: quit]
pie_ has joined #yosys
pie__ has quit [Remote host closed the connection]
nrossi has joined #yosys
pie_ has quit [Ping timeout: 240 seconds]
AlexDaniel has quit [Ping timeout: 240 seconds]
_whitelogger has joined #yosys
leviathanch has joined #yosys
leviathanch has quit [Remote host closed the connection]
leviathanch has joined #yosys
pie_ has joined #yosys
proteus-guy has quit [Remote host closed the connection]
mbuf has joined #yosys
pie_ has quit [Ping timeout: 240 seconds]
pie_ has joined #yosys
pie_ has quit [Ping timeout: 248 seconds]
promach_ has joined #yosys
aw- has joined #yosys
FabM has joined #yosys
pie_ has joined #yosys
aw-1 has joined #yosys
aw-2 has joined #yosys
aw-1 has quit [Read error: Connection reset by peer]
aw- has quit [Ping timeout: 268 seconds]
promach_ has quit [Quit: Leaving]
aw-2 is now known as aw-
dys has joined #yosys
qu1j0t3 has quit [Ping timeout: 248 seconds]
digshadow1 has quit [Ping timeout: 248 seconds]
digshadow has joined #yosys
qu1j0t3 has joined #yosys
sklv has joined #yosys
adj__ has quit [Ping timeout: 240 seconds]
adj__ has joined #yosys
aw- has quit [Quit: Leaving.]
AlexDaniel has joined #yosys
proteus-guy has joined #yosys
sklv has quit [Quit: quit]
sklv has joined #yosys
pie__ has joined #yosys
pie_ has quit [Remote host closed the connection]
eduardo_ has joined #yosys
eduardo__ has quit [Ping timeout: 260 seconds]
pie__ has quit [Ping timeout: 248 seconds]
pie_ has joined #yosys
mbuf has quit [Quit: Leaving]
<shapr>
Does yosys support the ice40 LP as well as the HX?
<cr1901_modern>
shapr: Yes
<shapr>
oh good
<shapr>
lattice semi has a usb3 dev board that uses an ice40 LP, I'm wondering if I could build a usb3 keyboard around the ice40 lp
<shapr>
context: I know nearly nothing about any of this :-P
<cr1901_modern>
usb*3* ?! usb1.1 is doable on ice40 yes (check out TinyFPGA's bootloader)
<shapr>
I want to build an ergodox style keyboard, based on an ice40 chip
<shapr>
so I'm investigating the options
<cr1901_modern>
You're not gonna need usb3 for a keyboard
<shapr>
not unless I put 100 watts of LEDs on it, right? :-D
<shapr>
Would usb3 have lower latency than usb1.1 ?
<cr1901_modern>
I have no idea, tbh
<shapr>
I hadn't heard of the TinyFPGA bootloader, that's really useful, thanks!
<shapr>
I wasn't sure if any USB functionality was available
<ZipCPU>
I'm told that Altera support is just about to be added to yosys as well ... ;)
<shapr>
oh wow!
<cr1901_modern>
ZipCPU: I have a stupid question about ZipCPU (the CPU)
<ZipCPU>
Sure, go ahead.
<cr1901_modern>
For an instruction at any given stage in the pipeline, does the program counter pipeline reg hold the value of the _current_ instruction in that given stage
<cr1901_modern>
or the next insn?
<ZipCPU>
While it's working through the various pipeline stages, it holds the PC of the next instruction.
<cr1901_modern>
I just realized today that while "next instruction" is the convention, during insn fetch the current insn and current PC actually have to match
<cr1901_modern>
(current PC for the insn being fetched*)
<ZipCPU>
Yes.
<ZipCPU>
Well ...
<ZipCPU>
I'm going to take that back.
<ZipCPU>
I have several different concepts of what the PC is.
<cr1901_modern>
Cool, np. Been a while since I looked at a CPU's internals
<ZipCPU>
There's the PC that the prefetch sees.
<ZipCPU>
When it reports an instruction out, it reports that PC to the instruction decoder.
<ZipCPU>
The decode stage adds one to that PC, but keeps the PC of the instruction with the instruction as it flows through the pipeline.
<ZipCPU>
When the instruction is finally returned, the "actual" PC, whether the user PC or the supervisor PC, is updated based upon the PC contained within the instruction that was just executed.
<ZipCPU>
s/returned/retired/
<cr1901_modern>
Two things:
<cr1901_modern>
but keeps the PC plus one* of the instruction...?
<cr1901_modern>
pc == r15 IIRC. So in practice, is all PC-relative stuff done via forwarding?
<ZipCPU>
The "PC plus one" that I referenced isn't written to the register set until the instruction is retired.
<ZipCPU>
So imagine this ...
<ZipCPU>
there's a PC register associated with every step of the pipeline.
<cr1901_modern>
right
<ZipCPU>
That value contains the current instruction PC+1.
<cr1901_modern>
Right, that's the "conventional" definition of PC
<ZipCPU>
At the end of the instruction, during write back, that value is written to the register file.
<ZipCPU>
(Well ... it's actually its own register, but you get my point.)
<cr1901_modern>
yes
<ZipCPU>
The PC attached to any instruction doesn't affect anything but the registers working their way through the pipeline--at least not until writeback.
<cr1901_modern>
Yup I follow so far
<ZipCPU>
So ... after instruction decode, the PC is the instruction's PC + 1 ... this is the value kept in the pipeline registers together with the instruction.
<ZipCPU>
In the read-ops stage, this value is used as part of the B input to the ALU --- if the instruction so requests it.
<ZipCPU>
(In ZipCPU, there's read-ops after instruction decode. In read-ops, the value is read from the register file and a (possible) constant is added to it)
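A minimal sketch of the read-ops step described above follows. It assumes the PC is register 15, as recalled in the chat. The function name `read_operand_b` and its parameters are hypothetical and do not come from the ZipCPU source. The key point is that when an instruction names the PC, the B operand comes from the PC travelling with the instruction (already advanced past it), not from the register file.

```python
# Hypothetical model of a ZipCPU-style read-operands stage (illustrative names only).
PC_REG = 15  # assumption: the PC is r15, as recalled in the chat


def read_operand_b(regfile, reg_b, imm, dcd_pc):
    """Return the ALU's B input: (register or pipelined PC) + optional constant."""
    if reg_b is None:          # immediate-only form: no register read
        base = 0
    elif reg_b == PC_REG:      # PC-relative: use the PC carried with the instruction
        base = dcd_pc
    else:                      # ordinary register read from the register file
        base = regfile[reg_b]
    return (base + imm) & 0xFFFFFFFF  # wrap to a 32-bit datapath
```

Because `dcd_pc` already points past the instruction, a PC-relative offset here is relative to the next instruction, not the current one.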
<cr1901_modern>
Ohh, I think it makes sense now
<ZipCPU>
The PC may also, thus, be part of the data path going into the ALU.
<ZipCPU>
If the instruction "writes" to the PC, that would be the output of the ALU.
<ZipCPU>
If writing to the PC, the instruction pipeline is cleared, and the prefetch is given a new PC value to load from.
<ZipCPU>
That's one of the few times that the prefetch PC and the actual PC register are the same.
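As a rough sketch of the PC-write case just described: when the ALU result goes to the PC, younger instructions already in flight are discarded and the prefetch is redirected to the result. The function `alu_writeback` and its arguments are hypothetical, chosen for illustration. They are not ZipCPU signal names.

```python
# Hypothetical sketch: effect of an ALU result on the pipeline (illustrative only).

def alu_writeback(younger_stages, pf_pc, writes_pc, alu_result):
    """Return (younger_stages, pf_pc) after the ALU stage completes.

    younger_stages: instructions behind the ALU (e.g. decode, read-ops);
    None marks an empty slot (a bubble).
    """
    if not writes_pc:
        return younger_stages, pf_pc           # normal flow: nothing changes
    flushed = [None] * len(younger_stages)     # clear the instruction pipeline
    return flushed, alu_result                 # prefetch restarts at the new PC
```

At this moment the prefetch PC (`pf_pc`) and the architectural PC hold the same value, which matches the point made above.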
<cr1901_modern>
So in read-ops, the PC the ALU sees is in fact "PC", not "PC + 1"?
<cr1901_modern>
oh wait, sorry
<cr1901_modern>
you just answered that lol
<cr1901_modern>
I think I get it now, after the current insn is fetched, as far as the CPU is concerned the current insn is being executed. So it makes sense to update PC to PC+1 and keep that value around
<cr1901_modern>
as we go through the pipeline
<cr1901_modern>
(b/c PC that an instruction "sees" should point to the "next insn to execute")
<ZipCPU>
Well ... sort of ... but you have to think of the CPU in terms of the pipeline.
<cr1901_modern>
right
<ZipCPU>
After the current instruction is "fetched", that instruction contains a packet of information that works its way through the pipeline.
<ZipCPU>
It's sort of hard to say what the "current" instruction is, therefore, unless you specify a pipeline stage.
<ZipCPU>
Or ... reference retired instructions.
<cr1901_modern>
I understand
<cr1901_modern>
"So ... after instruction decode, the PC is the instruction's PC + 1" After _decode_? Wouldn't the PC be equal to PC + 1 while decode was in progress?
<ZipCPU>
You know ... I've wanted many times over to get rid of the parallel "PC+1" path that works its way through the CPU.
<cr1901_modern>
sorry, bad question
<ZipCPU>
Bad question?
<cr1901_modern>
I need to reword it I think
<ZipCPU>
Some definitions: pf_pc is the PC requested of the prefetch. r_upc and ipc are the PC's as seen by retired user and supervisor instructions.
<ZipCPU>
dcd_pc is the PC coming out of the instruction decoder.
<ZipCPU>
op_pc is the PC coming out of the read-operands stage.
<ZipCPU>
alu_pc is the PC associated with the result of an ALU instruction.
<cr1901_modern>
Why is there no f_pc; the PC coming out of the fetch stage?
<ZipCPU>
Cause I couldn't remember what I called it just now ;)
<ZipCPU>
Ahh, ok, that one is called pf_instruction_pc
<ZipCPU>
So, in order, they are:
<ZipCPU>
pf_pc -> pf_instruction_pc -> dcd_pc -> op_pc -> alu_pc -> { one of ipc or r_upc }
<cr1901_modern>
Okay I can frame my q now:
<cr1901_modern>
I would've figured that the PC + 1 carried around would be the output of pf_instruction_pc
<ZipCPU>
Well, okay, let's dive into one more detail.
<ZipCPU>
The ZipCPU has two sizes of instructions: 16-bits and 32-bits.
<ZipCPU>
So, it's not really PC+1, it's either PC+1 or PC+2 (with an implied zero at the end)
<ZipCPU>
Only at the end of the instruction decode do you know which of the two it will be.
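Putting the stage names and the "PC+1 or PC+2" detail together, here is a hedged sketch of the PC that decode attaches to each instruction. It makes three assumptions:

- PCs count 16-bit half-words, so byte address = pc << 1 (the "implied zero").
- A full 32-bit instruction must start on an even half-word.
- There are no stalls, branches, or flushes.

`decode_pc` and `trace` are hypothetical helpers, not ZipCPU code.

```python
# Hypothetical trace of pf_instruction_pc -> dcd_pc (assumptions in the lead-in).

def decode_pc(pc, compressed):
    """Return dcd_pc, the address of the next instruction, as known after decode."""
    if not compressed and pc % 2:
        raise ValueError("32-bit instruction must be word aligned")
    return pc + (1 if compressed else 2)


def trace(start_pc, kinds):
    """kinds: sequence of booleans, True for a compressed (half-word) insn.

    Returns (pf_instruction_pc, dcd_pc) pairs. In this model dcd_pc then
    rides along as op_pc and alu_pc (unless the instruction writes the PC)
    and is written to ipc or r_upc when the instruction retires.
    """
    out, pc = [], start_pc
    for compressed in kinds:
        nxt = decode_pc(pc, compressed)
        out.append((pc, nxt))
        pc = nxt
    return out
```

For example, `trace(0, [False, True, True, False])` yields `[(0, 2), (2, 3), (3, 4), (4, 6)]`. The increment is only known once decode has seen whether the instruction is compressed.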
<cr1901_modern>
Ohhh, okay
<cr1901_modern>
That's clever
<cr1901_modern>
and now it makes sense
<ZipCPU>
s/clever/painful/
<ZipCPU>
;)
<cr1901_modern>
painful? Meaning to figure out that clever solution?
<ZipCPU>
Roughly, yes.
<cr1901_modern>
Or does variable insn width really bring that much pain?
<ZipCPU>
YES!
<ZipCPU>
*much* pain.
<ZipCPU>
For example, the prefetch only understands 32-bit words.
<cr1901_modern>
Ahhh, so you can end up w/ a case where a 32-bit word is split
<ZipCPU>
The ZipCPU, as currently built, can execute a pair of 15-bit instructions fitting within a 32-bit word.
<cr1901_modern>
between two fetches
<ZipCPU>
No, never.
<ZipCPU>
Neither can the PF start with a second half of an instruction.
<cr1901_modern>
Ahhh
<ZipCPU>
Hence, if the first of the two instructions faults, you will struggle to restart the instruction.
<cr1901_modern>
I see
<ZipCPU>
Likewise, between the two instructions there's an implied interrupt disable.
<ZipCPU>
Perhaps it might be simpler to just teach the PC/instruction decoder to handle starting on a second half instruction ... I'll add that to my to-do list.
<cr1901_modern>
Oh wait... 32-bit insn must be 4-byte aligned
<cr1901_modern>
correct?
<ZipCPU>
It'd certainly make a lot more sense to the programmer.
<ZipCPU>
Yes.
<cr1901_modern>
Okay, that explains why the prefetch can't split
<cr1901_modern>
which I presume means if you swap between 16/32-bit mode you need a nop to align or some other useful insn
<ZipCPU>
No, not at all.
<ZipCPU>
The 16-bit mode (really 15-bit mode) consists of two instructions packed into one 32-bit word.
<cr1901_modern>
Ahhh
* cr1901_modern is like 0 for 190 today
<ZipCPU>
Switching modes takes place on a 32-bit word by 32-bit word basis.
<ZipCPU>
16-bit instructions can take place at any time, but only in pairs within the instruction stream.
<ZipCPU>
Further, the top bit of every 32-bit instruction word is stolen to indicate if the word is a single instruction, or a pair of two instructions.
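A minimal sketch of the "top bit selects one or two instructions" scheme just described. The positions of the two 15-bit halves are an assumption for illustration: bits 30..16 for the first, bits 14..0 for the second, with bit 15 unused. The real encoding should be checked against the ZipCPU ISA document. `split_instruction_word` is a hypothetical name.

```python
# Hypothetical decoder front end; field positions are assumed, not from the ZipCPU spec.

def split_instruction_word(word):
    """Split a 32-bit instruction word into the instruction(s) it holds.

    Returns ("full", insn) for a single instruction, or
    ("pair", first15, second15) for two packed compressed instructions.
    """
    word &= 0xFFFFFFFF
    if (word >> 31) & 1 == 0:
        return ("full", word)                  # top bit clear: one 32-bit instruction
    first = (word >> 16) & 0x7FFF              # assumed: bits 30..16
    second = word & 0x7FFF                     # assumed: bits 14..0 (bit 15 ignored)
    return ("pair", first, second)
```

Because the split happens per 32-bit word, the prefetch never needs to handle an instruction that straddles two words.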
<cr1901_modern>
Hence the check I saw in your code snippet
<cr1901_modern>
Well, I can't imagine unaligned 32-bit would be any less painful
<cr1901_modern>
(It's nice that RISC-V in theory allows variable width ISA, but we all know in practice nobody's willingly going to try it)
<ZipCPU>
*Really*??
<cr1901_modern>
You sound incredulous :P
<ZipCPU>
All I've read has said they were very proud of their variable width ISA ... ?
<ZipCPU>
Yes, very much so.
<cr1901_modern>
Well, there's a few things I based that on, I could be wrong
<cr1901_modern>
1. RISC-V has explicitly said custom ISA extensions won't be supported in the compiler. Not even their own.
<ZipCPU>
Hmm ... maybe that's a good question for the #riscv forum?
<ZipCPU>
Oh, yeah, the tool-chain and the 16-bit instructions created a ... challenge.
<cr1901_modern>
2. variable width is painful :D. I knew it was before you mentioned your issues implementing it ;)
<ZipCPU>
Go on.
<cr1901_modern>
Oh I don't remember the specifics, just that working through an example of instruction flight of an old x86 core, I was like
<cr1901_modern>
"screw this!"
<cr1901_modern>
This was back when I was still in Uni
<ZipCPU>
So ... here's the problem I ran into ...
<ZipCPU>
The goal of the ZipCPU, as you may recall, is to be a low logic CPU.
<ZipCPU>
Should the ZipCPU libraries include the 16-bit mode by default, or not?
<ZipCPU>
16-bit mode takes about 80 LUTs or so to implement.
<ZipCPU>
On really small FPGA's, this can be a problem.
<ZipCPU>
However, if your compiled libraries are intended to support both ...
<cr1901_modern>
My opinion is that in general, savings in hardware will be dwarfed by extra cache usage
<cr1901_modern>
see: delay slots
<ZipCPU>
you need to either have only 32-bit libraries, or insist that all of your hardware supports 16-bit mode.
<cr1901_modern>
And the fact that a branch predictor is low burden
<ZipCPU>
Ok, re: the cache ...
<cr1901_modern>
There is only an icache by default, right?
<ZipCPU>
(The ZipCPU has no branch predictor, only early branching)
<cr1901_modern>
I see
<ZipCPU>
Some implementations do not have enough logic for an icache.
<ZipCPU>
Now, imagine running from flash ...
<ZipCPU>
It takes 24 clocks to get one word from flash, 28 clocks to get two words.
<ZipCPU>
If you could pack four instructions into those two words ... that's the benefit.
<ZipCPU>
(It used to take me 48 clocks to read one instruction word from flash. That was with the older flash controller, and the single instruction prefetch.v routine).
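The flash numbers above can be turned into clocks per instruction. This is a back-of-the-envelope sketch that uses only the clock counts quoted in the chat. It treats a burst as the fetch unit and ignores branches and stalls, which is a simplification. `clocks_per_insn` is a hypothetical helper.

```python
# Rough cost per instruction when executing straight from QSPI flash with no icache.
from fractions import Fraction


def clocks_per_insn(burst_clocks, words, insns_per_word):
    """Clocks spent per instruction for one flash burst of `words` words."""
    return Fraction(burst_clocks, words * insns_per_word)


cases = {
    "older controller, 1 word": clocks_per_insn(48, 1, 1),
    "1 word": clocks_per_insn(24, 1, 1),
    "2 words, 32-bit insns": clocks_per_insn(28, 2, 1),
    "2 words, packed pairs": clocks_per_insn(28, 2, 2),
}
for name, cpi in cases.items():
    print(f"{name}: {cpi} clocks/insn")
```

This prints 48, 24, 14 and 7 clocks per instruction. Packing four instructions into the same two-word burst halves the fetch cost again, which is the benefit being argued for.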
<cr1901_modern>
doesn't 28 assume quadmode?
<cr1901_modern>
24-28*
<ZipCPU>
Yes.
<ZipCPU>
My small FPGA example is a Spartan 6/LX4 with QSPI flash.
<cr1901_modern>
Alright funny you should mention that b/c I have a relevant problem that ZipCPU may be able to solve. But finish your thought
<ZipCPU>
No, go ahead ... I've finished my thought (I think)
<cr1901_modern>
mithro sent me a TinyFPGA B2 board and asked me to determine whether a LiteX/MiSoC SoC can fit on one (an ice40lp8k). I'm using lm32 right now.
<ZipCPU>
Ok, go on.
<cr1901_modern>
I've concluded that the only way I'm going to do that practically is using the SPI flash as the instruction store. After all resources are accounted for
<cr1901_modern>
I have about 8kB of block RAM left for RAM (4kB for cache, 4kB for logic)
<cr1901_modern>
I would like 10-12kB of RAM or even more. I wonder if ZipCPU would be a better fit
<ZipCPU>
This sounds like the S6Soc project so far ...
<lok[m]>
Is this different from the iCEstick on Lattice's site?
<cr1901_modern>
lok[m]: Yes, different board, same FPGA
<adj__>
lok[m]: it's quite a different board: the Olimex one has more IO pins exposed, and it doesn't have a serial to USB chip (you need to program it through the UEXT header)
<adj__>
not having a serial to USB chip is both good and bad, it's difficult to program from a regular desktop, laptop, tablet..., but you don't need to worry about FTDI putting malware in your drivers
<adj__>
and it also reduces the board cost
dys has joined #yosys
<shapr>
sounds like I need to buy the ice40 LP usb3 dev board
<awygle>
This is an interesting board
<awygle>
No usb 3 connection to the ice40, which makes way more sense
<shapr>
oh, I thought it did have that
<awygle>
The ice40 controls the switch (HD3SS460) afaict
<awygle>
There are two onboard ice series FPGAs. Weird little board.
<shapr>
too bad, I was interested in hooking an ice40 to usb3
<awygle>
Waste of money imo. And to answer your earlier question I bet 3.0 is actually higher latency.
<awygle>
(with no evidence)
<shapr>
sounds like I need evidence one way or the other.
dys has quit [Ping timeout: 240 seconds]
dys has joined #yosys
<awygle>
shapr: you should measure HID driver stack latency too
<shapr>
yeah, that's a good point
<shapr>
I'd like to think Linux does a good job there, but who knows?
AlexDaniel has joined #yosys
<awygle>
I doubt it lol. But maybe
leviathanch has quit [Remote host closed the connection]
pie_ has quit [Ping timeout: 248 seconds]
danieljabailey has quit [Ping timeout: 260 seconds]
danieljabailey has joined #yosys
aynah[m] has quit [Ping timeout: 240 seconds]
lok[m] has quit [Ping timeout: 240 seconds]
Guest41846 has quit [Ping timeout: 252 seconds]
pointfree1 has quit [Ping timeout: 248 seconds]
marbler has quit [Ping timeout: 255 seconds]
swick has quit [Ping timeout: 255 seconds]
aynah[m] has joined #yosys
pie_ has joined #yosys
dys has quit [Ping timeout: 248 seconds]
pointfree1 has joined #yosys
Guest26074 has joined #yosys
marbler has joined #yosys
swick has joined #yosys
lok[m] has joined #yosys
<mithro>
cr1901_modern: You will need to use the spiflash for memory