<cr1901_modern>
whitequark: I want to play with Boneless on tinyfpga A and up5k and use the internal hard IP (for fun). Is there a way to give boneless wait states for e.g. wishbone buses? Maybe wrap the whole FSM in a CEInserter?
<whitequark>
cr1901_modern: that will work in a pinch, yes
<whitequark>
i'll add explicit support for wait states a bit later
<cr1901_modern>
(Btw, _can_ you wrap the whole FSM in a CEInserter in nmigen?)
<whitequark>
of course
<whitequark>
well, you wrap the entire core.
<whitequark>
(and it's EnableInserter now)
<cr1901_modern>
I thought maybe one should wrap just the FSM, just in case. But no problem if wrapping the entire core makes no difference. This may no longer be supported in nmigen tho (in omigen it was possible to wrap an FSM object in an enable inserter).
<cr1901_modern>
>i'll add explicit support for wait states a bit later <-- take your time. Clearly I'm dragging my feet even getting started w/ this :P
<whitequark>
well yes, an FSM is no longer a separate module
<whitequark>
the thing is that it's what you actually want
<whitequark>
because otherwise you get hierarchy flattening, issues with multiple drivers, etc
<cr1901_modern>
it's not a big deal I don't think. If I want just FSM functionality of a larger component to have EnableInserter, I just need to make an Elaboratable whose sole contents are the FSM. And then combine the FSM and non-FSM parts in a parent Elaboratable
<cr1901_modern>
(I think)
<whitequark>
I *think* it's usually easier to insert with m.If(...): m.next = ... everywhere
<whitequark>
because
<whitequark>
in practice it turns out you don't want them *everywhere* after all
<whitequark>
just my experience
<cr1901_modern>
oh... yea, that'll also work. Oops...
<whitequark>
I've tried using EnableInserter a few times on an FSM and basically regretted it each time
<whitequark>
often FSMs have transient states that drive some other comb enable signal for example
<whitequark>
this tends to produce obscure bugs
<cr1901_modern>
The "CEInserter" trick was used in the misoc SPI core (well one of them anyway) to implement clock division. It's the only place I remember seeing it used, but I figured it might be useful for wait states.
<whitequark>
yes. it needs to be used with great care for clock division.
<whitequark>
if your FSM has any transient states...
<whitequark>
e.g. say
<whitequark>
if your FSM strobes fifo.we
<whitequark>
and it gets stopped with fifo.we high
<whitequark>
you see what will happen.
<cr1901_modern>
Congrats, you turned a pulse signal into a logic high signal :P
<whitequark>
yep
<whitequark>
this is why I think not being able to EnableInserter()(FSM()) in nmigen will probably lead to a net reduction of bugs, if anything
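The stuck-strobe hazard described above can be reproduced in a few lines of plain Python. This is a toy model under stated assumptions, not nmigen code: the three states, the `we` strobe, and the `run` helper are all made up for illustration. With `guard_transitions=False` the state register is frozen by the enable while the combinational `we` output keeps following the current state (the "wrap the whole FSM" case). With `guard_transitions=True` the strobe is qualified by the enable as well, which is roughly what writing the `m.If(...)` guards by hand amounts to.

```python
# Toy model, not nmigen: a 3-state FSM that pulses `we` for one cycle in WRITE.
NEXT_STATE = {"IDLE": "WRITE", "WRITE": "DONE", "DONE": "DONE"}


def run(enables, guard_transitions):
    """Return the value of `we` observed on each cycle.

    enables: per-cycle enable flag (False models a wait state).
    guard_transitions: False freezes only the state register;
    True additionally qualifies the strobe with the enable.
    """
    state = "IDLE"
    trace = []
    for en in enables:
        # Combinational output: a function of the current state.
        we = state == "WRITE"
        if guard_transitions:
            we = we and en
        trace.append(int(we))
        # The state register only advances on enabled cycles.
        if en:
            state = NEXT_STATE[state]
    return trace


# Two wait states land exactly while the FSM sits in WRITE.
ENABLES = [True, False, False, True, True]
stalled = run(ENABLES, guard_transitions=False)
guarded = run(ENABLES, guard_transitions=True)
print(stalled)  # the one-cycle pulse is stretched over the stall
print(guarded)  # the pulse stays one cycle wide
```

In the unguarded run the FIFO in the example would see three writes instead of one, which is the pulse-turned-level bug from the conversation.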
<cr1901_modern>
Sure that's very reasonable.
<TD-Linux>
I'm sad that the set-counters-equal-on-reset thing for asyncfifo didn't work :(
<whitequark>
yeah, it looked cute
<kc8apf>
mwk: on 7 series, CRC is only used for verifying the bitstream got transferred to the config hardware without error.
<mwk>
like all xilinx fpgas, yes
<kc8apf>
I think that check happens in a central location on the chip. Frame data has to be clocked out from that to each tile.
<mwk>
of course it does
<kc8apf>
So the zeros are to generate config cycles to flush that pipeline
<kc8apf>
Not sure why it's zeros instead of type1 nops
<mwk>
yeah, zeroes are to flush the pipeline and switch to the next row, that's reasonable too
<mwk>
but my question still is
<mwk>
why on earth do these 0s still count as FDRI payload for CRC purposes?
<kc8apf>
They don't
<mwk>
they do
<kc8apf>
They count as type 0 frames
<mwk>
I get checksum mismatch with ISE bitstreams if I don't count them
<kc8apf>
CRC is over the whole command stream
<mwk>
no, it's not
<mwk>
NOP packets don't count to CRC
<mwk>
only register writes count towards CRC, and some registers are even exempt from that, like LOUT
<mwk>
and the register address is part of the word going to the CRC, so it wouldn't even make sense to count NOP packets, they don't have a register address
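The accounting rule mwk describes (only register writes are checksummed, some registers such as LOUT are exempt, and the register address is folded in with each data word) can be sketched as below. Everything concrete here is an assumption for illustration: `zlib.crc32` is only a stand-in and is not the device's CRC, the packet tuples are a made-up representation, and the register numbers are placeholders.

```python
import zlib

# Named in the discussion as exempt; the full list is not given there.
CRC_EXEMPT = {"LOUT"}

# Placeholder register numbers, for illustration only.
FDRI, CMD, LOUT = 2, 4, 8


def stream_checksum(packets):
    """Checksum a list of (kind, reg_name, reg_addr, words) tuples.

    Stand-in model: NOPs and exempt registers are skipped, and every
    checksummed unit is (register address, data word), not the word alone.
    """
    crc = 0
    for kind, reg_name, reg_addr, words in packets:
        if kind != "write" or reg_name in CRC_EXEMPT:
            continue
        for word in words:
            unit = bytes([reg_addr]) + word.to_bytes(4, "big")
            crc = zlib.crc32(unit, crc)
    return crc


nop = ("nop", None, None, [])
frames = ("write", "FDRI", FDRI, [0x12345678, 0, 0])

plain = stream_checksum([frames])
with_nops = stream_checksum([nop, frames, nop])
with_lout = stream_checksum([frames, ("write", "LOUT", LOUT, [0xCAFEF00D])])
print(plain == with_nops, plain == with_lout)
```

Under this model, zero words only affect the result when they sit inside a register write's payload, as the trailing zeros in `frames` do. Zeros stuffed between packets have no register address to be combined with, which is why counting them as FDRI data only makes sense if they really are FDRI payload.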
<kc8apf>
trying to find my notes from years ago
<mwk>
and yet... ISE seems to count the null words as if they were going into FDRI address
<mwk>
which is why I'm suspecting ISE has a major bug here and generates broken bitstreams, missing the FDRI packet header before all these NULLs
<kc8apf>
no, 7series is like that too
<kc8apf>
the nulls are just immediately after the FDRI
<kc8apf>
the prjxray code I sent before parses and generates working bitstreams
<kc8apf>
what I can't recall is how null insertion differed between PERFRAMECRC, DEBUG_BITSTREAM, and normal bitstreams
<mwk>
in normal bitstream, they are part of FDRI payload
<mwk>
which is reasonable
<mwk>
they get written to FDRI address, so they count towards CRC with FDRI address
<mwk>
in debug bitstream, they are just stuffed between packets
<mwk>
which doesn't make much sense
<kc8apf>
I'd have to build the prjxray tools and look at a few bitstreams
<mwk>
and what is a perframecrc bitstream
<kc8apf>
I recall them being stuffed between packets in debug bitstream
<mwk>
another undocumented bitgen option?
<kc8apf>
yup
* mwk
fires blindly
<kc8apf>
it writes each frame individually and checks the CRC after each frame
<kc8apf>
since it doesn't use LOUT, it will work on programming interfaces that don't support LOUT
<mwk>
alright, seems it's called '-g PerFrameCrc:Yes' for ISE...
<kc8apf>
I thought they reset the CRC just before the
<kc8apf>
gah. multiple devices strikes again
<mwk>
hrm
<mwk>
I don't see the nulls at all in perframecrc
<mwk>
instead, row switches are accompanied by extra "CMD = WCFG" writes
<whitequark>
ZirconiumX: ok, so, let me try to do the ALU thing i wanted.
<ZirconiumX>
Sure thing
<ZirconiumX>
The ic_count.py script is essentially the equivalent of optimising for area above all else
<tpw_rules>
ZirconiumX: i've never heard of 5 digit 74 chips
<ZirconiumX>
tpw_rules: The 74AC series are vulnerable to ground bounce; 74AC11 has a different pinout to rectify this
<whitequark>
ZirconiumX: i'm wondering if you could try synthesizing by subsystem
<whitequark>
it's hierarchical now, after all
<ZirconiumX>
Unfortunately they couldn't do much to save the reputation of 74AC
<tpw_rules>
why did you pick them then
<tpw_rules>
do you have piles because nobody wants them?
<ZirconiumX>
Because when you want speed, you pick 74AC logic
<tpw_rules>
oh
<tpw_rules>
how fast are you planning to clock this monster
<ZirconiumX>
I've seen a 74AC 6502 replica hit 20MHz on about 200 chips
<tpw_rules>
that's a lot higher than i would have put money on
<ZirconiumX>
Which outpaced the actual chip
<tpw_rules>
yeah
<tpw_rules>
i guess it had a reasonable board design too?
<ZirconiumX>
74HC/74AHC is 4-5MHz for the same design
<ZirconiumX>
Depends on your definition of reasonable. Double-sided SOIC chips
<tpw_rules>
exactly, there's no way it could have done 20 on a breadboard
<tpw_rules>
or wire-wrapped
<ZirconiumX>
PCB, yeah
<ZirconiumX>
Unfortunately I don't have any way of measuring timing on a PCB
<ZirconiumX>
s/(any) (way)/\\1 automated \\2/
<tpw_rules>
\\?
<ZirconiumX>
regex :P
<tpw_rules>
yes but doesn't that mean you'd get \1 automated \2 instead of any automated way
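tpw_rules's reading can be checked with Python's `re` module, used here only as one concrete regex engine. The answer for sed depends on how the shell quotes the expression, so this is a sketch of the general point rather than a statement about any particular shell. The sample sentence is taken from the message being corrected.

```python
import re

text = "I don't have any way of measuring timing"
pattern = r"(any) (way)"

# Single backslash in the replacement: \1 and \2 are group references.
expanded = re.sub(pattern, r"\1 automated \2", text)

# Doubled backslash: \\ is an escaped literal backslash, so the digits
# that follow are left as ordinary text.
literal = re.sub(pattern, r"\\1 automated \\2", text)

print(expanded)
print(literal)
```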
<tpw_rules>
how does one convince yosys to synthesize for 74series logic? it looks like you give it a model of each chip and it picks the best ones? i'm not super familiar with how yosys works
<ZirconiumX>
Depends on your shell, I suppose
<ZirconiumX>
tpw_rules: it's a bastardisation of the ASIC flow
<tpw_rules>
like do you get to define what yosys cells are? or just what they do IRL
<ZirconiumX>
You can feed ABC a list of cells, and then it'll create the design out of those
<tpw_rules>
like a single cell being four independent gates?
<ZirconiumX>
Yep
<sorear>
this isn’t a 74-specific problem, asic libraries generically have multiple output cells
<tpw_rules>
hug
<tpw_rules>
huh
<ZirconiumX>
<sorear> incredibly cursed name for a file format <--- Stallman would be proud
<tpw_rules>
libiberty
<whitequark>
gaaah
<ZirconiumX>
Anyway, I realise this, but ABC don't care, just use a single output cell, it's not their problem
<ZirconiumX>
>.>
<tpw_rules>
you should map it to potato semiconductor's catalog
<ZirconiumX>
Sure, but the collection is sufficiently limited that I think trace lengths would begin to be a problem
<tpw_rules>
also i know it's rude to think about concerns of practicality, but why the boneless architecture in particular
<ZirconiumX>
Simpler CPUs result in fewer gates, and I find it reasonably elegant honestly
<whitequark>
:D
<sorear>
I keep forgetting that “potato” is their actual name
<whitequark>
ZirconiumX: i'm curious which parts you find *in*elegant btw
<whitequark>
it's unlikely i'll change it much, but feedback is still valuable
<tpw_rules>
idk i just wouldn't want to combine the experimentation of not a common cpu architecture with the experimentation of doing it out of 74 logic
<whitequark>
(how many extremely tiny synthesizable CPUs are out there? navre is huge, for example)
<tpw_rules>
he already said there was a 6502 out of somewhat less
<tpw_rules>
but like with a 6502 you could add a few more 74 and have an apple 2
<ZirconiumX>
tpw_rules: with all due respect, if you have to target 7400 logic, conventionality goes out the window
<tpw_rules>
maybe i misunderstood
<tpw_rules>
i thought this was just for fun
<ZirconiumX>
For example, the chips were designed with tristate and open-collector logic in mind
<whitequark>
tpw_rules: lmao are you saying my 16-bit RISC CPU is only slightly larger than a darn 6502
<ZirconiumX>
It is, yes
<ZirconiumX>
whitequark: No, I'm saying it's *smaller* than a darn 6502
<whitequark>
*what*
<tpw_rules>
you said boneless was 230 but someone did a 6502 with only 200
<whitequark>
ZirconiumX: i wonder how many chips you can cut from it if you replace the decoder with a ROM
<tpw_rules>
whitequark: is there a reason that logic ops set C and V to undefined?
<tpw_rules>
as opposed to preserving or eg bit 16/15 like 6502
<whitequark>
tpw_rules: everything that i didn't explicitly design to function in a specific way is left undefined so it doesn't constrain me later
<whitequark>
so, you said exactly the reason
<whitequark>
i'm not sure what the best behavior is!
<tpw_rules>
both led to some pretty elegant tricks on 6502
<whitequark>
you don't need bit 16 in flags because um
<whitequark>
0x8000 is a 3-bit encoded immediate
<ZirconiumX>
Removing the flatten pass makes it very slightly less efficient
<ZirconiumX>
I probably *could* make the decoder a ROM, but I'd have to investigate
<ZirconiumX>
Also the 16374 does a lot of lifting
<tpw_rules>
i don't think the manual defines decode_imm_al/sr
<whitequark>
tpw_rules: indeed it doesn't, it's in a section i didn't write
<tpw_rules>
oh
<whitequark>
the synopsis in the design spreadsheet shows the encoding
<tpw_rules>
do you need help
<tpw_rules>
with writing docs
<whitequark>
mm, maybe!
<whitequark>
i've been focusing on toolchain support and smolness for now
<tpw_rules>
fetishizing over weird assembly tricks is one thing i like
<whitequark>
though a baseline level of docs is required
<whitequark>
lol
<ZirconiumX>
whitequark: was the "gah" about the ALU thing?
<tpw_rules>
also the logo needs way more visibility
<whitequark>
ZirconiumX: did i say that
<whitequark>
i'm confused
<tpw_rules>
also i guess all the load instructions are word addressed?
<tpw_rules>
i thought it was about stallman
<ZirconiumX>
<whitequark> gaaah
<whitequark>
ZirconiumX: about stallman
<whitequark>
tpw_rules: yes
<ZirconiumX>
Fair
<whitequark>
that's maybe one part of the design i'm not sure about
<whitequark>
i mean. cons: it makes porting C to Boneless painful
<whitequark>
pros: it makes porting C to Boneless painful
<ZirconiumX>
It's not like this is a unique thing to Boneless
<tpw_rules>
cursed idea: i/o space?
<cr1901_modern>
Without delay states, I can't boneless with the internal user ROM area on MachXO2 (yes, PR for support in nmigen coming soon)
<ZirconiumX>
MIPS, and SPARC I think both require being word aligned
<tpw_rules>
yes
<tpw_rules>
many RISCs per se require it
<cr1901_modern>
ZirconiumX: boneless currently can't store anything but 16 bit words right now
<cr1901_modern>
or wq did you get rid of that
<ZirconiumX>
Ah, I see
<whitequark>
tpw_rules: ehhhhh
<cr1901_modern>
tpw_rules: In practice most RISC CPUs eventually got unaligned stores and loads. There were just too many problems on real life packed formats to keep that constraint
<emily>
whitequark: hey it's not your fault nobody can handle CHAR_BIT > 8
<tpw_rules>
also of course there is no docs on how the register windows work
<tpw_rules>
cr1901_modern: i mean yeah but to my understanding it was late, required OS intervention, slower, etc
<whitequark>
tpw_rules: are you sure?
<whitequark>
i mean
<whitequark>
there's no explicit doc, but i believe all semantics is constrained
<whitequark>
since each instruction specifies the exact function of W for it
<tpw_rules>
yes
<whitequark>
it could be better, certainly
<cr1901_modern>
It made sense at the time, but doesn't anymore. Kinda like ARM's PC pointing 8 past the actual currently executing insn
<cr1901_modern>
(or 4 in thumb you pedants)
<tpw_rules>
oh, i guess you're using | to mean concatenate
<ZirconiumX>
emily: that was a good laugh
<sorear>
RISC is an art history term
<tpw_rules>
that seems funky wrt the register window. how do you pass parameters?
<emily>
ZirconiumX: Cray's crack team of lawyers is preparing their case against you as we speak.
<cr1901_modern>
emily: I unironically want a new arch w/ CHAR_BIT % 8 != 0. Well, Clemency exists
<cr1901_modern>
but no C compiler
<emily>
CHAR_BIT is one of the things that makes a Turing-complete implementation of C impossible :'(
<emily>
should clearly be removed
<tpw_rules>
yeah i think either the docs for ADJW are wrong or mem[W|Ra] is wrong or ext13|imm3 is wrong
<cr1901_modern>
?
<tpw_rules>
ADJW maintains that W is a multiple of 8, but it's concatenated with the register number.
<whitequark>
hm
<ZirconiumX>
emily: Cray? Surely you jest; Unisys will have their legal department ready
<tpw_rules>
also should it be a multiple of 8? maybe you could have sliding registers like itanium for passing parameters
<whitequark>
tpw_rules: re parameters: via LDW and loads with offset
<emily>
i was reading Cray's wikipedia article recently and i was really amused at just how obsessed he was with making computers go fast
<emily>
like he'd come out with the fastest computer in the universe, super successful, everyone is happy
<emily>
and then immediately stomp his feet and go "ok now I want to make one TEN TIMES FASTER"
<cr1901_modern>
yea he did maxwell's equations for individual wires
<tpw_rules>
whitequark: oh, i missed that instruction
<cr1901_modern>
to figure out prop delay and other fun shit
<emily>
even when there was no commercial demand
<emily>
and refuse to work on anything that anyone actually wants
<tpw_rules>
still i wonder if multiples of 8 for the register window is unnecessarily constraining
<whitequark>
tpw_rules: re docs for ADJW: yes, it's missing the part where the low bits of W are unimplemented
<emily>
and also he just kept doing this until he died
<whitequark>
needs to be fixed
<whitequark>
tpw_rules: re sliding windows: it removes one adder from the implementation
<whitequark>
I suspect it will have a significant impact on performance
<whitequark>
but!
<whitequark>
we can always relax it later.
<whitequark>
that's why LDW is encoded like it is
<tpw_rules>
okay i'm wondering how to convey that the low 3 bits of W are unimplemented
<tpw_rules>
like maybe it should be W <- W + imm>>2
<tpw_rules>
because if there are 3 unimplemented bits in W, mem[W|Ra] doesn't make sense
<tpw_rules>
valid point on the sliding windows
<whitequark>
tpw_rules: wait, why?
<whitequark>
Ra is 3 bits long
<tpw_rules>
how long is W?
<whitequark>
13 bits
<tpw_rules>
okay, so adjw should say W <- W + imm>>2
<whitequark>
why >>2?
<tpw_rules>
because the imm always has the low 3 bits zero
<tpw_rules>
oh i meant to write >>3
<tpw_rules>
but still. if the imm must have the low 3 bits zero and W is only 13 bits long, then there's only 10 effective bits of W
<whitequark>
oh I see
<whitequark>
so | in W|Rb is logical OR
<whitequark>
and the spec says the behavior is UNPREDICTABLE if you ever put anything into the low W bits
<whitequark>
(the ones that can be unimplemented)
<tpw_rules>
yeah so it's effectively 13
<whitequark>
it's technically correct i think
<tpw_rules>
okay that makes sense
<whitequark>
but certainly confusing
<whitequark>
it definitely needs an informative section
<tpw_rules>
i interpreted | to mean concatenation because all over the place you say ext13|x
<whitequark>
ah shit
<whitequark>
you're totally right
<whitequark>
we should fix it
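The two readings of `W|Ra` agree exactly when the low 3 bits of W are zero, which is why the ambiguity went unnoticed. A minimal sketch, assuming a 16-bit address space, a 3-bit register number, and a window base whose low 3 bits are unimplemented; both function names are made up for illustration:

```python
def window_reg_addr(w: int, ra: int) -> int:
    """Address of register `ra` in the window based at `w`, reading | as OR."""
    assert 0 <= ra < 8, "register numbers are 3 bits"
    return (w | ra) & 0xFFFF


def concat_reading(w13: int, ra: int) -> int:
    """The other reading: 13 implemented bits of W followed by 3 bits of Ra."""
    assert 0 <= ra < 8, "register numbers are 3 bits"
    return ((w13 << 3) | ra) & 0xFFFF


# With an aligned window base, OR, concatenation and addition all coincide.
aligned = window_reg_addr(0x0100, 3)

# With low bits set in W, OR no longer means "base plus index"; the spec
# calls this case UNPREDICTABLE.
misaligned_or = window_reg_addr(0x0105, 3)
misaligned_sum = 0x0105 + 3
print(hex(aligned), hex(misaligned_or), hex(misaligned_sum))
```

Since the two readings only coincide for aligned bases, spelling the operator as `or` or as explicit concatenation in the manual would remove the doubt.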
<ZirconiumX>
And this all started from me wanting to synthesise Boneless for 7400 logic
<ZirconiumX>
My work here is done /s
<tpw_rules>
and afair | is math for concatenation. at least ||
<whitequark>
yeah
<whitequark>
doc bug.
<emily>
|| for concatenation is also pretty confusing notation...
<whitequark>
I use `or` for logical op I think
<whitequark>
yeah
<emily>
maybe do {verilog,style} or haskell ++ style
<sorear>
can we get the rest of the ANSI SQL operators added
<whitequark>
lol
<ZirconiumX>
'); DROP TABLE instruction_set; --
<tpw_rules>
i mean this began life as an excel doc, sql is only natural
<tpw_rules>
anyway have you had a chance to apply any of your idea of what a portable assembler might look like to this?
<whitequark>
not yet
<whitequark>
need to improve more low-level parts of it first
<whitequark>
the Fmax is ridiculously low
<emily>
also help i didn't know about potato semiconductor
<emily>
"Why called Potatosemi as Brand?
<emily>
We are the IC design house making chips. Potato chips are the most popular chips in the world. They are high volume, low price & taste good. All of the people like to eat them. All of the people are happy with them. This is exactly our goals. We will like to make our chips as popular as potato chips, as high volume as potato chips, as low price as potato chips. All of the computers & electronics devices like our chips'
<emily>
taste. All of the people like to use them because they are easy to use & all of the people are happy with potato chips."
<whitequark>
amazing
<tpw_rules>
there's that but they only produce bonkers shit.
<emily>
yes i also saw the pentium 4-ass clock ttl logic
<whitequark>
timecube logic
<tpw_rules>
the 4 sides of the pentium
<tpw_rules>
(but it should have 5??)
<Ultrasauce>
who called it qdr and not 4 corners simultaneous time clock
<tpw_rules>
last q for now: how many clocks per instruction is this thing? there's a lot of memory access
<whitequark>
4 cpi
<whitequark>
uh
<whitequark>
for most instructions
<whitequark>
shifts are 4+n, complex jumps are 5 (i think)
<tpw_rules>
btw you use | as concatenate in the shifts too
<whitequark>
yeah
<whitequark>
I forgot
<whitequark>
sorry
<tpw_rules>
which don't appear to have their operands specified correctly anyway? as written it's not possible to shift left by 1, only 2 or greater
<tpw_rules>
i feel rude commenting all these doc problems but docs are important to me. please let me know if i can fix them
<whitequark>
tpw_rules: sure, just send a PR
<whitequark>
by no means i feel a lot of attachment to these docs
<tpw_rules>
okay. i'll have to brush up on my tex :P
<whitequark>
i try to not identify with my designs or code or doc too much :p
<tpw_rules>
i did recently get an icebreaker in the mail. i need to throw boneless on there
<tpw_rules>
so i gather you're using roughly Verilog syntax in the Operation parts
<sorear>
left shift by 1 is redundant if you have addition, though
<tpw_rules>
it affects shift right too
<whitequark>
let me see
<whitequark>
can you explain why a shift left by 1 isn't possible?
<tpw_rules>
according to the decode_sr table in the excel, imm3 is decoded to 1-8. which makes sense since shift by zero is a nop. but then you add 1 to imm3 before shifting, so you can only shift 2-9
<tpw_rules>
you would have to encode it as an EXTI
<tpw_rules>
to get 1
<whitequark>
uh, where do I add 1 to imm3?
<tpw_rules>
(also imm3 should be opB on the res <- line anyway)
<whitequark>
oh shit
<tpw_rules>
in the operation of SLLI on page 58
<whitequark>
I forgot to update that part
<whitequark>
it's trying to do what decode_imm_sr is already doing
<tpw_rules>
exactly
<whitequark>
because imm3=0 is actually shift by 8
<whitequark>
aka byte swap
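The off-by-one tpw_rules found can be shown directly. The mapping below is inferred from this conversation (imm3 values 1 to 7 encode themselves and 0 encodes 8) and is an assumption, not a copy of the spreadsheet's decode_imm_sr table. `rotl16` is a hypothetical helper showing why a rotate by 8 on a 16-bit word is a byte swap.

```python
def decode_imm_sr(imm3: int) -> int:
    """Assumed mapping: 1..7 encode themselves, 0 encodes an amount of 8."""
    assert 0 <= imm3 < 8
    return imm3 if imm3 != 0 else 8


def rotl16(value: int, amount: int) -> int:
    """Rotate a 16-bit value left by `amount` bits (1..16)."""
    value &= 0xFFFF
    return ((value << amount) | (value >> (16 - amount))) & 0xFFFF


# Amounts reachable when the decoded immediate is used as-is.
decoded = sorted(decode_imm_sr(i) for i in range(8))

# Amounts reachable if the Operation text adds 1 on top of the decode,
# which is the documentation bug discussed above.
double_adjusted = sorted(decode_imm_sr(i) + 1 for i in range(8))

swapped = rotl16(0x12AB, decode_imm_sr(0))
print(decoded, double_adjusted, hex(swapped))
```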
<tpw_rules>
you're aping 6502 again: no ROR :P (yes i know you can do it with ROL)
<tpw_rules>
anyway. it looks to me like you're using vaguely verilog syntax in the docs. i'll rewrite it all to fix these mistakes and maybe enhance some of the other parts and submit a PR. sound good?
<whitequark>
absolutely
<whitequark>
something that would help is an explicit section defining the syntax
<whitequark>
I was planning that but only got around to the bare minimum
<whitequark>
decode_imm_* should be described using tables as well
<tpw_rules>
like that it's rd, ra, rb? what about labels and stuff? i am particular and have hacked several assemblers to have my label style :P
<whitequark>
nonono
<whitequark>
the syntax for the stuff in "Operation"
<tpw_rules>
oh okay
<tpw_rules>
yeah sure
<whitequark>
assembler is separate
<whitequark>
and yeah the syntax is super simple
<whitequark>
the labels are "label:" :p
<tpw_rules>
the true distinguisher: what is the comment character
<whitequark>
; i think
<tpw_rules>
maybe i can throw like 16 of these on my icebreaker and have some fun with the led matrices
<tpw_rules>
approved
<whitequark>
:D
<whitequark>
the one problem is
<whitequark>
it doesn't have any directives right now
<whitequark>
well
<tpw_rules>
i like + and - labels too, and @local labels
<whitequark>
there's .word
<tpw_rules>
. is my preferred directive character
<whitequark>
it needs more directives
<tpw_rules>
.work
<whitequark>
I just couldn't find enough time to really add them properly
<whitequark>
will try soon
<whitequark>
oh btw
<whitequark>
if you could open an issue and suggest a principled set of directives i'm all ears.
<whitequark>
we need at least uhmm
<tpw_rules>
for the assembler? yeah i can think of my favs
<whitequark>
a directive for the jump tables
<whitequark>
cuz it changes how the addresses are calculated
<tpw_rules>
i assume you're going to be grossed out at people doing more than like 200 lines
<whitequark>
not really, zignig seems to really enjoy writing boneless assembly so why not
<whitequark>
i mean
<whitequark>
it used to not have a text assembler
<tpw_rules>
why does it? is it just easier to process? there's no optimization like on arm where you can encode the offsets in bytes
<whitequark>
but i looked at how much fun zignig was having and decided to add one cuz why not
<tpw_rules>
i don't understand why JVT is relative
<tpw_rules>
to the table
<whitequark>
because all boneless code is PC-relative
<whitequark>
it's inherently relocatable
<whitequark>
at no additional instruction cost
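Why PC-relative code is relocatable for free can be seen with a toy machine. This is a made-up three-instruction interpreter, not the Boneless encoding: `inc`, `jmp` and `halt` are hypothetical, and measuring jump offsets from the following instruction is an assumption chosen for the sketch.

```python
def run(memory, pc, max_steps=16):
    """Execute until 'halt' and return how many 'inc' instructions ran.

    ('jmp', off) is relative to the instruction after the jump, so the
    program never mentions an absolute address.
    """
    count = 0
    for _ in range(max_steps):
        op = memory[pc]
        pc += 1
        if op[0] == "inc":
            count += 1
        elif op[0] == "jmp":
            pc += op[1]
        elif op[0] == "halt":
            return count
    raise RuntimeError("step limit reached")


def load(program, base):
    """Place the same instruction list at an arbitrary base address."""
    return {base + i: op for i, op in enumerate(program)}


# The jump skips exactly one 'inc', wherever the code is loaded.
PROGRAM = [("inc",), ("jmp", 1), ("inc",), ("inc",), ("halt",)]
at_zero = run(load(PROGRAM, 0), 0)
at_100 = run(load(PROGRAM, 100), 100)
print(at_zero, at_100)
```

The same instruction list behaves identically at both load addresses with no fix-ups. A table-relative instruction like JVT keeps that property as long as the table moves together with the code.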
<whitequark>
this has created a major implementation nightmare with the orthogonal instruction set
<tpw_rules>
but it's relative to the table, not the PC
<whitequark>
but i managed it
<whitequark>
er
<whitequark>
yes
<tpw_rules>
i guess that makes sense
<whitequark>
because it's a vtable instruction
<tpw_rules>
yeah i didn't think enough there
<whitequark>
now J*S*T is relative to PC
<whitequark>
ironically, JST is harder to implement than JVT
<tpw_rules>
other thought: is it possible to randomly permute the encodings and see if they save resources
<whitequark>
it would be very easy to do so bc the entire thing is generated from exactly 1 source of truth
<whitequark>
arch.opcode
<whitequark>
i already did permute them to hopefully simplify the decoder a bit
<tpw_rules>
oh ok
<whitequark>
but not extensively
* tpw_rules
only likes optimizations when they take hours and only net like a 1% improvement
<whitequark>
lol
<whitequark>
there's some low hanging fruit there still
<whitequark>
for example, the unencoded instructions could be added to the decoder such that they reuse other paths
<whitequark>
well
<whitequark>
you could also use 'x...
<whitequark>
... but i don't like it.
<tpw_rules>
are there any facilities for interrupts?
<whitequark>
not currently :D
<whitequark>
i never figured out what to do with the flags
<tpw_rules>
i wonder if you could have a port on it that like loads the high 8 bits of pc and W simultaneously
<whitequark>
or the return pc
<tpw_rules>
oh yeah that would be weird too
<whitequark>
no, you don't need to care about W
<tpw_rules>
but if the interrupt manager loaded W, you could just stick them in r0 and r1
<whitequark>
oh wait
<whitequark>
i think i know what can be done
<tpw_rules>
of the interrupting window
<whitequark>
yeah
<whitequark>
we could make it push the window
<whitequark>
there's no way to restore the flags tho
<tpw_rules>
flagless arch :D
<whitequark>
lol
<tpw_rules>
exti already stores weird state between insns and anything that doesn't write to the flags destroys them
<tpw_rules>
(and that involves the ALU)
<tpw_rules>
hm that would be gross though
<whitequark>
oh shit
<whitequark>
what happens if an interrupt arrives during exti
<tpw_rules>
make it not?
<whitequark>
variable interrupt latency is gross
<whitequark>
i think exti should be restartable
<tpw_rules>
your instructions have variable cycle counts
<whitequark>
shit
<whitequark>
point
<tpw_rules>
and exti is like 1 cycle anyway
<whitequark>
yeah, it should be short-circuited in the FSM but currently isn't
<TD-Linux>
lol this is the first time I actually looked at potato semi's catalog. amazing
<tpw_rules>
yeah i can't see a nice place to fit in a restore flags instruction. but i think having an interrupt push the window and store old state like pc and flags in registers is okay. it might be a lot of memory writes though
<whitequark>
tpw_rules: tbh i was thinking of having an interrupt controller as a peripheral
<whitequark>
(that peeks into CPU state)
<whitequark>
but... i don't know
<tpw_rules>
yeah but i imagine it would be nice to have a port that's like 16 bits of new pc and a bit to cause a transfer to it
<tpw_rules>
that an interrupt controller can hook into. or a simple system can assert a constant value on pc and flip the interrupt bit to activate it
<whitequark>
or just make it a soft reset
<whitequark>
you can make an interrupt table with 2 instructions
<whitequark>
LDX, JST
<tpw_rules>
you still lose flags and EXTI state though
<whitequark>
hence soft
<whitequark>
so it will activate only in FETCH stage without EXTI active
<tpw_rules>
arm cortex has the weird hack of having a magic return value that unstacks the exception stack frame
<whitequark>
yes
<whitequark>
i kinda want to make that
<whitequark>
it might make sense to do something like um
<tpw_rules>
i don't understand what you mean by soft reset as compared to that idea i had
<whitequark>
i mean
<whitequark>
reset just for the PC
<whitequark>
mhm
<whitequark>
actually
<whitequark>
tpw_rules: so here's what i'm thinking
<whitequark>
maybe an interrupt should hijack the decoder logic to drive LDW onto the internal buses and then a JALR
<whitequark>
this gives you PC
<whitequark>
not sure about flags :/
<whitequark>
as for "restore flags" instruction
<emily>
15:20 <TD-Linux> lol this is the first time I actually looked at potato semi's catalog. amazing
<emily>
like seriously is this some kind of practical joke or
<tpw_rules>
i think a magic return value would have a high cost
<whitequark>
we can reserve the encoding of x: JALR Rn, x which is basically totally useless
<whitequark>
to mean RETI
<whitequark>
and the Rn could be repurposed to mean the actual flags
<whitequark>
so an interrupt handler would use self-modifying code
<tpw_rules>
what is x
<whitequark>
label
<tpw_rules>
also you don't have a jalr instruction
<whitequark>
tpw_rules: oh wait
<whitequark>
JRAL
<whitequark>
same difference
<whitequark>
oh
<whitequark>
wait
<whitequark>
i meant JAL here
<whitequark>
oh that has an assembly bug
<whitequark>
missing Rd
<tpw_rules>
oh i missed the label, that makes sense
<tpw_rules>
so you just wrote a funky infinite loop
<tpw_rules>
okay
<whitequark>
tpw_rules: we also have a "JN" instruction
<whitequark>
which can probably be repurposed if necessary
<whitequark>
but it has the wrong encoding for this
<tpw_rules>
also you have four flag bits so you can't encode them in the register number
<whitequark>
yes, just realized
<tpw_rules>
what's wrong with xchf though
<tpw_rules>
can that be easily encoded?
<whitequark>
xchf Rd.
<whitequark>
I ... guess?
<tpw_rules>
i think that could be a nice general purpose instruction too.
<whitequark>
it'd need to be stuffed into some obscure corner
<tpw_rules>
wouldn't that make it more complex to implement
_whitenotifier has joined ##openfpga
<_whitenotifier>
[whitequark/Boneless-CPU] whitequark pushed 2 commits to master [+0/-0/±5] https://git.io/fjNhn