flaviusb has joined ##openfpga
<mwk> um
<mwk> you sent me a glasgow?
<cr1901_modern> whitequark: I want to play with Boneless on tinyfpga A and up5k and use the internal hard IP (for fun). Is there a way to give boneless wait states for e.g. wishbone buses? Maybe wrap the whole FSM in a CEInserter?
<whitequark> cr1901_modern: that will work in a pinch, yes
<whitequark> i'll add explicit support for wait states a bit later
<cr1901_modern> (Btw, _can_ you wrap the whole FSM in a CEInserter in nmigen?)
<whitequark> of course
<whitequark> well, you wrap the entire core.
<whitequark> (and it's EnableInserter now)
<cr1901_modern> I thought maybe one should wrap just the FSM, just in case. But no problem if wrapping the entire core makes no difference. This may no longer be supported in nmigen tho (in omigen it was possible to wrap an FSM object in an enable inserter).
<cr1901_modern> >i'll add explicit support for wait states a bit later <-- take your time. Clearly I'm dragging my feet even getting started w/ this :P
<whitequark> well yes, an FSM is no longer a separate module
<whitequark> the thing is that it's what you actually want
<whitequark> because otherwise you get hierarchy flattening, issues with multiple drivers, etc
<cr1901_modern> it's not a big deal I don't think. If I want just FSM functionality of a larger component to have EnableInserter, I just need to make an Elaboratable whose sole contents are the FSM. And then combine the FSM and non-FSM parts in a parent Elaboratable
<cr1901_modern> (I think)
<whitequark> I *think* it's usually easier to insert with m.If(...): m.next = ... everywhere
<whitequark> because
<whitequark> in practice it turns out you don't want them *everywhere* after all
<whitequark> just my experience
<cr1901_modern> oh... yea, that'll also work. Oops...
<whitequark> I've tried using EnableInserter a few times on an FSM and basically regretted it each time
<whitequark> often FSMs have transient states that drive some other comb enable signal for example
<whitequark> this tends to produce obscure bugs
<cr1901_modern> The "CEInserter" trick was used in the misoc SPI core (well one of them anyway) to implement clock division. It's the only place I remember seeing it used, but I figured it might be useful for wait states.
<whitequark> yes. it needs to be used with great care for clock division.
<whitequark> if your FSM has any transient states...
<whitequark> e.g. say
<whitequark> if your FSM strobes fifo.we
<whitequark> and it gets stopped with fifo.we high
<whitequark> you see what will happen.
<cr1901_modern> Congrats, you turned a pulse signal into a logic high signal :P
<whitequark> yep
<whitequark> this is why I think not being able to EnableInserter()(FSM()) in nmigen will probably lead to a net reduction of bugs, if anything
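The fifo.we failure described above can be sketched in plain Python rather than nMigen. This is a minimal model, not Boneless or nMigen code: the three-state FSM, its state names, and `run_fsm` are all hypothetical. It assumes the enable only gates the state register, while the combinational `we` output keeps following the current state.

```python
# Hypothetical 3-state FSM: IDLE -> WRITE -> DONE.
# fifo_we is a combinational output that is high only in WRITE.
NEXT_STATE = {"IDLE": "WRITE", "WRITE": "DONE", "DONE": "DONE"}

def run_fsm(enable_trace):
    """Return fifo_we for each cycle, given a per-cycle clock-enable trace."""
    state = "IDLE"
    we_trace = []
    for enable in enable_trace:
        # Comb output: depends on the state only, the enable does not gate it.
        we_trace.append(1 if state == "WRITE" else 0)
        # Sync update: the state register only advances when enabled.
        if enable:
            state = NEXT_STATE[state]
    return we_trace

free_running = run_fsm([1, 1, 1, 1, 1])  # enable always high
stalled = run_fsm([1, 0, 0, 0, 1])       # stalled right after entering WRITE

print(free_running)  # one-cycle strobe
print(stalled)       # strobe stretched for as long as the stall lasts
```

In the stalled run the FIFO would see four write cycles instead of one, which is the pulse-to-level conversion being joked about.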
<cr1901_modern> Sure that's very reasonable.
<TD-Linux> I'm sad that the set-counters-equal-on-reset thing for asyncfifo didn't work :(
<whitequark> yeah, it looked cute
emeb_mac has quit [Ping timeout: 268 seconds]
emeb_mac has joined ##openfpga
mmicko has quit [Quit: leaving]
mmicko has joined ##openfpga
<kc8apf> mwk: on 7 series, CRC is only used for verifying the bitstream got transferred to the config hardware without error.
<mwk> like all xilinx fpgas, yes
<kc8apf> I think that check happens in a central location on the chip. Frame data has to be clocked out from that to each tile.
<mwk> of course it does
<kc8apf> So the zeros are to generate config cycles to flush that pipeline
<kc8apf> Not sure why it's zeros instead of type1 nops
<mwk> yeah, zeroes are to flush the pipeline and switch to the next row, that's reasonable too
<mwk> but my question still is
<mwk> why on earth do these 0s still count as FDRI payload for CRC purposes?
<kc8apf> They don't
<mwk> they do
<kc8apf> They count as type 0 frames
<mwk> I get checksum mismatch with ISE bitstreams if I don't count them
<kc8apf> CRC is over the whole command stream
<mwk> no, it's not
<mwk> NOP packets don't count to CRC
<mwk> only register writes count towards CRC, and some registers are even exempt from that, like LOUT
<mwk> and the register address is part of the word going to the CRC, so it wouldn't even make sense to count NOP packets, they don't have a register address
<kc8apf> trying to find my notes from years ago
<mwk> and yet... ISE seems to count the null words as if they were going into FDRI address
<mwk> which is why I'm suspecting ISE has a major bug here and generates broken bitstreams, missing the FDRI packet header before all these NULLs
<kc8apf> no, 7series is like that too
<kc8apf> the nulls are just immediately after the FDRI
<kc8apf> the prjxray code I sent before parses and generates working bitstreams
<kc8apf> what I can't recall is how null insertion differed between PERFRAMECRC, DEBUG_BITSTREAM, and normal bitstreams
<mwk> in normal bitstream, they are part of FDRI payload
<mwk> which is reasonable
<mwk> they get written to FDRI address, so they count towards CRC with FDRI address
<mwk> in debug bitstream, they are just stuffed between packets
<mwk> which doesn't make much sense
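A rough sketch of mwk's argument that the register address is part of every word fed to the CRC, so a zero word written to FDRI still changes the checksum while a NOP (no register address) has nothing to feed in. The polynomial, the 5-bit address plus 32-bit data layout, and the FDRI address value are assumptions modelled on 7-series-style configuration logic; none of them are confirmed in this log, and the ISE-era devices being discussed may differ.

```python
CRC_POLY = 0x82F63B78  # assumption: reflected CRC-32C polynomial
FDRI_ADDR = 2          # assumption: register address of FDRI

def crc_update(crc, reg_addr, data):
    """Fold one register write into the running CRC.

    The input is the register address (5 bits) placed above the
    32-bit data word, processed LSB first.
    """
    val = ((reg_addr & 0x1F) << 32) | (data & 0xFFFFFFFF)
    for _ in range(37):
        if (val ^ crc) & 1:
            crc = (crc >> 1) ^ CRC_POLY
        else:
            crc >>= 1
        val >>= 1
    return crc & 0xFFFFFFFF

# A zero word written to FDRI is not a no-op for the CRC,
# because the address bits are non-zero.
crc_after_null_fdri = crc_update(0, FDRI_ADDR, 0x00000000)
# The same zero word with address 0 leaves a zero CRC untouched.
crc_after_null_addr0 = crc_update(0, 0, 0x00000000)
print(hex(crc_after_null_fdri), hex(crc_after_null_addr0))
```

Under this model, counting or skipping the padding zeros gives different final checksums, which would match the mismatch mwk reports when the nulls are not counted.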
<kc8apf> I'd have to build the prjxray tools and look at a few bitstreams
<mwk> and what is a perframecrc bitstream
<kc8apf> I recall them being stuffed between packets in debug bitstream
<mwk> another undocumented bitgen option?
<kc8apf> yup
* mwk fires blindly
<kc8apf> it writes each frame individually and checks the CRC after each frame
<kc8apf> since it doesn't use LOUT, it will work on programming interfaces that don't support LOUT
<mwk> alright, seems it's called '-g PerFrameCrc:Yes' for ISE...
<kc8apf> I thought they reset the CRC just before the
<kc8apf> gah. multiple devices strikes again
<mwk> hrm
<mwk> I don't see the nulls at all in perframecrc
<mwk> instead, row switches are accompanied by extra "CMD = WCFG" writes
<mwk> wait, wtf
<mwk> it seems every first frame of a row is uploaded twice?
<kc8apf> no. go read that comment
<kc8apf> then be even more upset
<kc8apf> gotta go. partner is feeling ignored.
<mwk> what
<mwk> thanks, I hate it
mumptai has joined ##openfpga
tlwoerner has quit [Ping timeout: 246 seconds]
tlwoerner has joined ##openfpga
emeb_mac has quit [Ping timeout: 244 seconds]
Jybz has joined ##openfpga
Asu has joined ##openfpga
pie_ has joined ##openfpga
zkms has quit [Quit: zkms]
zkms has joined ##openfpga
unixb0y has quit [Ping timeout: 272 seconds]
unixb0y has joined ##openfpga
Prf_Jakob has quit [Quit: Spoon!]
Prf_Jakob has joined ##openfpga
* zignig would like to thank her quarkyness for all the thankless work on stuff various.
* zignig burrows into a bootloader that does not work _at_all_ about 9600 bps.
<zignig> *above
pie_ has quit [Remote host closed the connection]
pie_ has joined ##openfpga
pie__ has joined ##openfpga
pie_ has quit [Client Quit]
pie__ has quit [Ping timeout: 250 seconds]
pie_ has joined ##openfpga
mumptai has quit [Quit: Verlassend]
pie_ has quit [Ping timeout: 250 seconds]
pie_ has joined ##openfpga
Richard_Simmons has joined ##openfpga
Bob_Dole has quit [Ping timeout: 276 seconds]
pie_ has quit [Ping timeout: 250 seconds]
emeb_mac has joined ##openfpga
Richard_Simmons3 has joined ##openfpga
pie_ has joined ##openfpga
Richard_Simmons has quit [Ping timeout: 276 seconds]
Richard_Simmons has joined ##openfpga
Richard_Simmons3 has quit [Ping timeout: 276 seconds]
Richard_Simmons3 has joined ##openfpga
Richard_Simmons has quit [Ping timeout: 276 seconds]
Richard_Simmons has joined ##openfpga
Richard_Simmons3 has quit [Ping timeout: 264 seconds]
emeb_mac has quit [Ping timeout: 268 seconds]
pie_ has quit [Ping timeout: 250 seconds]
azonenberg has quit [Ping timeout: 246 seconds]
azonenberg has joined ##openfpga
Richard_Simmons3 has joined ##openfpga
Richard_Simmons has quit [Ping timeout: 276 seconds]
Richard_Simmons has joined ##openfpga
Richard_Simmons3 has quit [Ping timeout: 276 seconds]
Richard_Simmons3 has joined ##openfpga
Richard_Simmons has quit [Ping timeout: 250 seconds]
pie_ has joined ##openfpga
Richard_Simmons has joined ##openfpga
Richard_Simmons3 has quit [Ping timeout: 276 seconds]
Richard_Simmons3 has joined ##openfpga
Richard_Simmons has quit [Ping timeout: 276 seconds]
Richard_Simmons has joined ##openfpga
Bob_Dole has joined ##openfpga
Richard_Simmons3 has quit [Ping timeout: 276 seconds]
Richard_Simmons3 has joined ##openfpga
Richard_Simmons has quit [Ping timeout: 276 seconds]
Bob_Dole has quit [Ping timeout: 276 seconds]
Richard_Simmons has joined ##openfpga
Richard_Simmons3 has quit [Ping timeout: 276 seconds]
s_frit has quit [Remote host closed the connection]
s_frit has joined ##openfpga
pie_ has quit [Ping timeout: 250 seconds]
pie_ has joined ##openfpga
pie_ has quit [Excess Flood]
pie_ has joined ##openfpga
mumptai has joined ##openfpga
pie_ has quit [Ping timeout: 250 seconds]
pie_ has joined ##openfpga
AndrevS has joined ##openfpga
Jybz has quit [Quit: Konversation terminated!]
<ZirconiumX> whitequark: How do I get the RTL for the Boneless CPU?
rohitksingh has joined ##openfpga
<whitequark> ZirconiumX: let's see
<whitequark> ZirconiumX: you essentially want to use it outside of nMigen right?
<ZirconiumX> Yeah
<ZirconiumX> nmigen.cli had a generate/simulate script argument, right?
<whitequark> yes, but it's a bit more tricky with boneless because it's configurable
<whitequark> let me add something
<whitequark> ZirconiumX: ok, right, do you want to provide your own main memory, or just the CPU?
<ZirconiumX> Just the CPU for now, I want to see how it synthesises at the moment
<whitequark> oh yeah
<whitequark> python3 -m boneless.gateware.core core-fsm generate boneless.v
<ZirconiumX> Git pull?
<whitequark> nope, already works
<whitequark> looks like with current yosys and -abc9 it synthesizes to 458 LUT.
<whitequark> which is certainly too many.
<ZirconiumX> Thanks WQ
<mwk> sync
<mwk> sync
<mwk> sync
<mwk> sudo reboot
<whitequark> the decoder is about 76 LUT, but more worryingly, the toplevel FSM is 200 LUT
<whitequark> mwk: um
<Ultrasauce> 3 syncs just to be sure
<ZirconiumX> mwk: :wq!
<whitequark> ZirconiumX: HEY
<whitequark> i have a highlight on that
* tpw_rules issues ACPI S5 request
<ZirconiumX> Sorry :P
<ZirconiumX> ...230 74xx chips
<ZirconiumX> That's...within the realm of feasibility
<whitequark> ZirconiumX: if you want to actually proceed with it, I'd be glad to optimize it some more
<whitequark> I think I came up with a nice way to simplify the ALU recently
<whitequark> might as well apply that and try again
<ZirconiumX> I'm sure you're aware of how rough the 74xx flow is
<whitequark> I am not
<whitequark> I know nothing about the 74xx flow
<ZirconiumX> The idealised model of the world Yosys has and the available 74xx chips are at right angles to each other
<mwk> whoops, sorry, thought I had a terminal there
* mwk curses shit display drivers locking up
<whitequark> mwk: magic sysrq?
<mwk> whitequark: disabled by default on arch
<whitequark> yeah i hate that shit
<whitequark> on debian too
<ZirconiumX> https://pastebin.com/amgt61hZ if anybody is curious
<whitequark> ZirconiumX: ok, so, let me try to do the ALU thing i wanted.
<ZirconiumX> Sure thing
<ZirconiumX> The ic_count.py script is essentially the equivalent of optimising for area above all else
<tpw_rules> ZirconiumX: i've never heard of 5 digit 74 chips
<ZirconiumX> tpw_rules: The 74AC series are vulnerable to ground bounce; 74AC11 has a different pinout to rectify this
<whitequark> ZirconiumX: i'm wondering if you could try synthesizing by subsystem
<whitequark> it's hierarchical now, after all
<ZirconiumX> Unfortunately they couldn't do much to save the reputation of 74AC
<tpw_rules> why did you pick them then
<tpw_rules> do you have piles because nobody wants them?
<ZirconiumX> Because when you want speed, you pick 74AC logic
<tpw_rules> oh
<tpw_rules> how fast are you planning to clock this monster
<ZirconiumX> I've seen a 74AC 6502 replica hit 20MHz on about 200 chips
<tpw_rules> that's a lot higher than i would have put money on
<ZirconiumX> Which outpaced the actual chip
<tpw_rules> yeah
<tpw_rules> i guess it had a reasonable board design too?
<ZirconiumX> 74HC/74AHC is 4-5MHz for the same design
<ZirconiumX> Depends on your definition of reasonable. Double-sided SOIC chips
<tpw_rules> exactly, there's no way it could have done 20 on a breadboard
<tpw_rules> or wire-wrapped
<ZirconiumX> PCB, yeah
<ZirconiumX> Unfortunately I don't have any way of measuring timing on a PCB
<ZirconiumX> s/(any) (way)/\\1 automated \\2/
<tpw_rules> \\?
<ZirconiumX> regex :P
<tpw_rules> yes but doesn't that mean you'd get \1 automated \2 instead of any automated way
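tpw_rules's point about the doubled backslashes can be checked with Python's `re.sub` as a stand-in for sed (an assumption; ZirconiumX never says which tool the substitution was written for). In a Python replacement template, `\1` is a backreference and `\\1` is a literal backslash followed by `1`.

```python
import re

text = "Unfortunately I don't have any way of measuring timing on a PCB"

# Single backslash: \1 and \2 are backreferences to the two groups.
single = re.sub(r"(any) (way)", r"\1 automated \2", text)

# Doubled backslash: each \\ is an escaped literal backslash,
# so the groups are not substituted back in.
double = re.sub(r"(any) (way)", r"\\1 automated \\2", text)

print(single)
print(double)
```

Whether the doubled form is needed depends on how many layers of quoting sit between the keyboard and the regex engine, which is what "depends on your shell" is getting at.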
<tpw_rules> how does one convince yosys to synthesize for 74series logic? it looks like you give it a model of each chip and it picks the best ones? i'm not super familiar with how yosys works
<ZirconiumX> Depends on your shell, I suppose
<ZirconiumX> tpw_rules: it's a bastardisation of the ASIC flow
<tpw_rules> like do you get to define what yosys cells are? or just what they do IRL
<ZirconiumX> You can feed ABC a list of cells, and then it'll create the design out of those
<ZirconiumX> Yeah, it's called the Liberty format
<ZirconiumX> Unfortunately it's highly undocumented
<ZirconiumX> So the flow is unaware of timing
<tpw_rules> so yosys has an arbitrary set of cells and you use abc to transform them to another set of cells which contains 74 series chips
<ZirconiumX> Sufficiently Arbitrary
<tpw_rules> i guess a cell is something like "one and gate"?
<ZirconiumX> If you note, I named the cells using a specific convention
<sorear> incredibly cursed name for a file format
<tpw_rules> yeah but that's your cells
<ZirconiumX> The part name, and then what it does
<tpw_rules> how does yosys pack the bundle of logic for those
<ZirconiumX> <tpw_rules> i guess a cell is something like "one and gate"? <--- "two signals ANDed together" is a $and
<ZirconiumX> *two arbitrary-width signals
<tpw_rules> that's a yosys internal cell, the $and
<ZirconiumX> Yep
<ZirconiumX> Which then becomes $_AND_
<tpw_rules> then you define that an 008 is four 2 input and gates. what maps between all the $ands and that?
<ZirconiumX> tpw_rules: There's a hack to this
<tpw_rules> i never would have guessed
<ZirconiumX> ABC doesn't understand multiple-output gates
<tpw_rules> like a single cell being four independent gates?
<ZirconiumX> Yep
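Since ABC only sees single-output cells, turning a gate count into a chip count has to happen as a separate step. A minimal sketch of that step, in the spirit of the ic_count.py script mentioned earlier but not taken from it: the cell names and the `chip_count` helper are hypothetical, and the gates-per-package numbers are the usual quad/hex/dual arrangements for these functions.

```python
import math

# Hypothetical cell names following the "part name, then what it does" idea.
# Values are how many independent gates one package provides.
GATES_PER_CHIP = {
    "74AC08_AND2": 4,  # quad 2-input AND
    "74AC32_OR2": 4,   # quad 2-input OR
    "74AC04_NOT": 6,   # hex inverter
    "74AC74_DFF": 2,   # dual D flip-flop
}

def chip_count(gate_counts):
    """Map per-cell gate counts to the number of packages needed."""
    return {
        cell: math.ceil(count / GATES_PER_CHIP[cell])
        for cell, count in gate_counts.items()
    }

# Example: 5 AND gates need two packages, with 3 gates left unused.
usage = chip_count({"74AC08_AND2": 5, "74AC04_NOT": 6, "74AC74_DFF": 3})
print(usage, sum(usage.values()))
```

Rounding up per cell type means partially used packages are wasted, which is one reason the result behaves like optimising for area above all else.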
<sorear> this isn’t a 74-specific problem, asic libraries generically have multiple output cells
<tpw_rules> hug
<tpw_rules> huh
<ZirconiumX> <sorear> incredibly cursed name for a file format <--- Stallman would be proud
<tpw_rules> libiberty
<whitequark> gaaah
<ZirconiumX> Anyway, I realise this, but ABC don't care, just use a single output cell, it's not their problem
<ZirconiumX> >.>
<tpw_rules> you should map it to potato semiconductor's catalog
<ZirconiumX> Sure, but the collection is sufficiently limited that I think trace lengths would begin to be a problem
<tpw_rules> also i know it's rude to think about concerns of practicality, but why the boneless architecture in particular
<ZirconiumX> Simpler CPUs result in fewer gates, and I find it reasonably elegant honestly
<whitequark> :D
<sorear> I keep forgetting that “potato” is their actual name
<whitequark> ZirconiumX: i'm curious which parts you find *in*elegant btw
<whitequark> it's unlikely i'll change it much, but feedback is still valuable
<tpw_rules> idk i just wouldn't want to combine the experimentation of not a common cpu architecture with the experimentation of doing it out of 74 logic
<whitequark> (how many extremely tiny synthesizable CPUs are out there? navre is huge, for example)
<tpw_rules> he already said there was a 6502 out of somewhat less
<tpw_rules> but like with a 6502 you could add a few more 74 and have an apple 2
<ZirconiumX> tpw_rules: with all due respect, if you have to target 7400 logic, conventionality goes out the window
<tpw_rules> maybe i misunderstood
<tpw_rules> i thought this was just for fun
<ZirconiumX> For example, the chips were designed with tristate and open-collector logic in mind
<whitequark> tpw_rules: lmao are you saying my 16-bit RISC CPU is only slightly larger than a darn 6502
<ZirconiumX> It is, yes
<ZirconiumX> whitequark: No, I'm saying it's *smaller* than a darn 6502
<whitequark> *what*
<tpw_rules> you said boneless was 230 but someone did a 6502 with only 200
<whitequark> ZirconiumX: i wonder how many chips you can cut from it if you replace the decoder with a ROM
pie_ has quit [Ping timeout: 250 seconds]
<tpw_rules> whitequark: is there a reason that logic ops set C and V to undefined?
<tpw_rules> as opposed to preserving or eg bit 16/15 like 6502
<whitequark> tpw_rules: everything that i didn't explicitly design to function in a specific way is left undefined so it doesn't constrain me later
<whitequark> so, you said exactly the reason
<whitequark> i'm not sure what the best behavior is!
<tpw_rules> both led to some pretty elegant tricks on 6502
<whitequark> you don't need bit 16 in flags because um
<whitequark> 0x8000 is a 3-bit encoded immediate
<ZirconiumX> Removing the flatten pass makes it very slightly less efficient
<ZirconiumX> But here you go, whitequark: https://pastebin.com/wE4EsjbY
<whitequark> interesting
<whitequark> the decoder is actually not large
<whitequark> the toplevel is *huge*
<whitequark> i need to fix that
<ZirconiumX> I probably *could* make the decoder a ROM, but I'd have to investigate
<ZirconiumX> Also the 16374 does a lot of lifting
<tpw_rules> i don't think the manual defines decode_imm_al/sr
<whitequark> tpw_rules: indeed it doesn't, it's in a section i didn't write
<tpw_rules> oh
<whitequark> the synopsis in the design spreadsheet shows the encoding
<tpw_rules> do you need help
<tpw_rules> with writing docs
<whitequark> mm, maybe!
<whitequark> i've been focusing on toolchain support and smolness for now
<tpw_rules> fetishizing over weird assembly tricks is one thing i like
<whitequark> though a baseline level of docs is required
<whitequark> lol
<ZirconiumX> whitequark: was the "gah" about the ALU thing?
<tpw_rules> also the logo needs way more visibility
<whitequark> ZirconiumX: did i say that
<whitequark> i'm confused
<tpw_rules> also i guess all the load instructions are word addressed?
<tpw_rules> i thought it was about stallman
<ZirconiumX> <whitequark> gaaah
<whitequark> ZirconiumX: about stallman
<whitequark> tpw_rules: yes
<ZirconiumX> Fair
<whitequark> that's maybe one part of the design i'm not sure about
<whitequark> i mean. cons: it makes porting C to Boneless painful
<whitequark> pros: it makes porting C to Boneless painful
<ZirconiumX> It's not like this is a unique thing to Boneless
<tpw_rules> cursed idea: i/o space?
<cr1901_modern> Without delay states, I can't boneless with the internal user ROM area on MachXO2 (yes, PR for support in nmigen coming soon)
<ZirconiumX> MIPS, and SPARC I think both require being word aligned
<tpw_rules> yes
<tpw_rules> many RISCs per se require it
<cr1901_modern> ZirconiumX: boneless currently can't store anything but 16 bit words right now
<cr1901_modern> or wq did you get rid of that
<ZirconiumX> Ah, I see
<whitequark> tpw_rules: ehhhhh
<cr1901_modern> tpw_rules: In practice most RISC CPUs eventually got unaligned stores and loads. There were just too many problems on real life packed formats to keep that constraint
<emily> whitequark: hey it's not your fault nobody can handle CHAR_BIT > 8
<tpw_rules> also of course there is no docs on how the register windows work
<tpw_rules> cr1901_modern: i mean yeah but to my understanding it was late, required OS intervention, slower, etc
<whitequark> tpw_rules: are you sure?
<whitequark> i mean
<whitequark> there's no explicit doc, but i believe all semantics is constrained
<whitequark> since each instruction specifies the exact function of W for it
<tpw_rules> yes
<whitequark> it could be better, certainly
<cr1901_modern> It made sense at the time, but doesn't anymore. Kinda like ARM's PC pointing 8 past the actual currently executing insn
<cr1901_modern> (or 4 in thumb you pedants)
<tpw_rules> oh, i guess you're using | to mean concatenate
<ZirconiumX> emily: that was a good laugh
<sorear> RISC is an art history term
<tpw_rules> that seems funky wrt the register window. how do you pass parameters?
<emily> ZirconiumX: Cray's crack team of lawyers is preparing their case against you as we speak.
<cr1901_modern> emily: I unironically want a new arch w/ CHAR_BIT % 8 != 0. Well, Clemency exists
<cr1901_modern> but no C compiler
<emily> CHAR_BIT is one of the things that makes a Turing-complete implementation of C impossible :'(
<emily> should clearly be removed
<tpw_rules> yeah i think either the docs for ADJW are wrong or mem[W|Ra] is wrong or ext13|imm3 is wrong
<cr1901_modern> ?
<tpw_rules> ADJW maintains that W is a multiple of 8, but it's concatenated with the register number.
<whitequark> hm
<ZirconiumX> emily: Cray? Surely you jest; Unisys will have their legal department ready
<tpw_rules> also should it be a multiple of 8? maybe you could have sliding registers like itanium for passing parameters
<whitequark> tpw_rules: re parameters: via LDW and loads with offset
<emily> i was reading Cray's wikipedia article recently and i was really amused at just how obsessed he was with making computers go fast
<emily> like he'd come out with the fastest computer in the universe, super successful, everyone is happy
<emily> and then immediately stomp his feet and go "ok now I want to make one TEN TIMES FASTER"
<cr1901_modern> yea he did maxwell's equations for individual wires
<tpw_rules> whitequark: oh, i missed that instruction
<cr1901_modern> to figure out prop delay and other fun shit
<emily> even when there was no commercial demand
<emily> and refuse to work on anything that anyone actually wants
<tpw_rules> still i wonder if multiples of 8 for the register window is unnecessarily constraining
<whitequark> tpw_rules: re docs for ADJW: yes, it's missing the part where the low bits of W are unimplemented
<emily> and also he just kept doing this until he died
<whitequark> needs to be fixed
<whitequark> tpw_rules: re sliding windows: it removes one adder from the implementation
<whitequark> I suspect it will have a significant impact on performance
<whitequark> but!
<whitequark> we can always relax it later.
<whitequark> that's why LDW is encoded like it is
<tpw_rules> okay i'm wondering how to convey that the low 3 bits of W are unimplemented
<tpw_rules> like maybe it should be W <- W + imm>>2
<tpw_rules> because if there are 3 unimplemented bits in W, mem[W|Ra] doesn't make sense
<tpw_rules> valid point on the sliding windows
<whitequark> tpw_rules: wait, why?
<whitequark> Ra is 3 bits long
<tpw_rules> how long is W?
<whitequark> 13 bits
<tpw_rules> okay, so adjw should say W <- W + imm>>2
<whitequark> why >>2?
<tpw_rules> because the imm always has the low 3 bits zero
<tpw_rules> oh i meant to write >>3
<tpw_rules> but still. if the imm must have the low 3 bits zero and W is only 13 bits long, then there's only 10 effective bits of W
<whitequark> oh I see
<whitequark> so | in W|Rb is logical OR
<whitequark> and the spec says the behavior is UNPREDICTABLE if you ever put anything into the low W bits
<whitequark> (the ones that can be unimplemented)
<tpw_rules> yeah so it's effectively 13
<whitequark> it's technically correct i think
<tpw_rules> okay that makes sense
<whitequark> but certainly confusing
<whitequark> it definitely needs an informative section
<tpw_rules> i interpreted | to mean concatenation because all over the place you say ext13|x
<whitequark> ah shit
<whitequark> you're totally right
<whitequark> we should fix it
<ZirconiumX> And this all started from me wanting to synthesise Boneless for 7400 logic
<ZirconiumX> My work here is done /s
<tpw_rules> and afair | is math for concatenation. at least ||
<whitequark> yeah
<whitequark> doc bug.
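The two readings of `W|Ra` can be put side by side. This is only one way to reconcile them, with assumed widths: in the OR reading W is treated as a full-width word address whose low 3 bits must be zero, and in the concatenation reading W is a 13-bit field glued onto the 3-bit register number. Both helper names are hypothetical.

```python
def reg_addr_or(w, reg):
    """Logical-OR reading: w is a word address with its low 3 bits clear."""
    if w & 0b111:
        # The spec calls this case UNPREDICTABLE; the sketch just rejects it.
        raise ValueError("low bits of W must be zero")
    return (w | (reg & 0b111)) & 0xFFFF

def reg_addr_concat(w13, reg):
    """Concatenation reading: {W[12:0], R[2:0]}."""
    return ((w13 & 0x1FFF) << 3) | (reg & 0b111)

# With the low bits of W clear, both readings name the same memory word.
addr_a = reg_addr_or(0x0018, 5)
addr_b = reg_addr_concat(0x0003, 5)
print(hex(addr_a), hex(addr_b))
```

The two only diverge when something non-zero lands in the low bits of W, which is the case the spec leaves undefined.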
<emily> || for concatenation is also pretty confusing notation...
<whitequark> I use `or` for logical op I think
<whitequark> yeah
<emily> maybe do {verilog,style} or haskell ++ style
<sorear> can we get the rest of the ANSI SQL operators added
<whitequark> lol
<ZirconiumX> '); DROP TABLE instruction_set; --
<tpw_rules> i mean this began life as an excel doc, sql is only natural
<tpw_rules> anyway have you had a chance to apply any of your idea of what a portable assembler might look like to this?
<whitequark> not yet
<whitequark> need to improve more low-level parts of it first
<whitequark> the Fmax is ridiculously low
<emily> also help i didn't know about potato semiconductor
<emily> "Why called Potatosemi as Brand?
<emily> We are the IC design house making chips. Potato chips are the most popular chips in the world. They are high volume, low price & taste good. All of the people like to eat them. All of the people are happy with them. This is exactly our goals. We will like to make our chips as popular as potato chips, as high volume as potato chips, as low price as potato chips. All of the computers & electronics devices like our chips'
<emily> taste. All of the people like to use them because they are easy to use & all of the people are happy with potato chips."
<whitequark> amazing
<tpw_rules> there's that but they only produce bonkers shit.
<emily> yes i also saw the pentium 4-ass clock ttl logic
<whitequark> timecube logic
<tpw_rules> the 4 sides of the pentium
<tpw_rules> (but it should have 5??)
<Ultrasauce> who called it qdr and not 4 corners simultaneous time clock
<tpw_rules> last q for now: how many clocks per instruction is this thing? there's a lot of memory access
<whitequark> 4 cpi
<whitequark> uh
<whitequark> for most instructions
<whitequark> shifts are 4+n, complex jumps are 5 (i think)
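Those figures are enough for a back-of-the-envelope cycle estimate. A small sketch using only the numbers quoted here (4 cycles for most instructions, 4+n for a shift by n, 5 for complex jumps, the last being the "i think" figure); the instruction kinds and the `cycles` helper are hypothetical labels, not Boneless mnemonics.

```python
def cycles(kind, shift_amount=0):
    """Estimated cycles for one instruction, per the figures quoted in chat."""
    if kind == "shift":
        return 4 + shift_amount
    if kind == "complex_jump":
        return 5  # quoted with an "(i think)"
    return 4

# Hypothetical instruction mix: two ALU ops, a shift by 3, one complex jump.
program = [("alu", 0), ("alu", 0), ("shift", 3), ("complex_jump", 0)]
total = sum(cycles(kind, amount) for kind, amount in program)
print(total)
```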
<tpw_rules> btw you use | as concatenate in the shifts too
<whitequark> yeah
<whitequark> I forgot
<whitequark> sorry
<tpw_rules> which don't appear to have their operands specified correctly anyway? as written it's not possible to shift left by 1, only 2 or greater
<tpw_rules> i feel rude commenting all these doc problems but docs are important to me. please let me know if i can fix them
<whitequark> tpw_rules: sure, just send a PR
<whitequark> by no means i feel a lot of attachment to these docs
<tpw_rules> okay. i'll have to brush up on my tex :P
<whitequark> i try to not identify with my designs or code or doc too much :p
<tpw_rules> i did recently get an icebreaker in the mail. i need to throw boneless on there
<tpw_rules> so i gather you're using roughly Verilog syntax in the Operation parts
pie_ has joined ##openfpga
<sorear> left shift by 1 is redundant if you have addition, though
<tpw_rules> it affects shift right too
<whitequark> let me see
<whitequark> can you explain why a shift left by 1 isn't possible?
<tpw_rules> according to the decode_sr table in the excel, imm3 is decoded to 1-8. which makes sense since shift by zero is a nop. but then you add 1 to imm3 before shifting, so you can only shift 2-9
<tpw_rules> you would have to encode it as an EXTI
<tpw_rules> to get 1
<whitequark> uh, where do I add 1 to imm3?
<tpw_rules> (also imm3 should be opB on the res <- line anyway)
<whitequark> oh shit
<tpw_rules> in the operation of SLLI on page 58
<whitequark> I forgot to update that part
<whitequark> it's trying to do what decode_imm_sr is already doing
<tpw_rules> exactly
<whitequark> because imm3=0 is actually shift by 8
<whitequark> aka byte swap
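The doc bug is easy to see with the decode written out. The decode table itself is not quoted in this log, so the mapping below is an assumption pieced together from the two statements here: imm3 decodes to 1..8, and imm3=0 means 8. The rotate helper is only there to show why a 16-bit rotate by 8 is a byte swap.

```python
def decode_imm_sr(imm3):
    """Assumed shift-amount decode: 0 encodes 8, 1..7 encode themselves."""
    return 8 if imm3 == 0 else imm3

def rotl16(value, amount):
    """Rotate a 16-bit value left by `amount` bits."""
    amount %= 16
    return ((value << amount) | (value >> (16 - amount))) & 0xFFFF

correct_amounts = sorted(decode_imm_sr(i) for i in range(8))
# The documented operation adds 1 on top of the decode, which shifts the
# whole range up and makes a shift by 1 unreachable.
buggy_amounts = sorted(decode_imm_sr(i) + 1 for i in range(8))

print(correct_amounts)
print(buggy_amounts)
print(hex(rotl16(0x1234, decode_imm_sr(0))))  # rotate by 8 swaps the bytes
```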
<tpw_rules> you're aping 6502 again: no ROR :P (yes i know you can do it with ROL)
<tpw_rules> anyway. it looks to me like you're using vaguely verilog syntax in the docs. i'll rewrite it all to fix these mistakes and maybe enhance some of the other parts and submit a PR. sound good?
<whitequark> absolutely
<whitequark> something that would help is an explicit section defining the syntax
<whitequark> I was planning that but only got around to the bare minimum
<whitequark> decode_imm_* should be described using tables as well
<tpw_rules> like that it's rd, ra, rb? what about labels and stuff? i am particular and have hacked several assemblers to have my label style :P
<whitequark> nonono
<whitequark> the syntax for the stuff in "Operation"
<tpw_rules> oh okay
<tpw_rules> yeah sure
<whitequark> assembler is separate
<whitequark> and yeah the syntax is super simple
<whitequark> the labels are "label:" :p
<tpw_rules> the true distinguisher: what is the comment character
<whitequark> ; i think
<tpw_rules> maybe i can throw like 16 of these on my icebreaker and have some fun with the led matrices
<tpw_rules> approved
<whitequark> :D
<whitequark> the one problem is
<whitequark> it doesn't have any directives right now
<whitequark> well
<tpw_rules> i like + and - labels too, and @local labels
<whitequark> there's .word
<tpw_rules> . is my preferred directive character
<whitequark> it needs more directives
<tpw_rules> .work
<whitequark> I just couldn't find enough time to really add them properly
<whitequark> will try soon
<whitequark> oh btw
<whitequark> if you could open an issue and suggest a principled set of directives i'm all ears.
<whitequark> we need at least uhmm
<tpw_rules> for the assembler? yeah i can think of my favs
<whitequark> a directive for the jump tables
<whitequark> cuz it changes how the addresses are calculated
<tpw_rules> i assume you're going to be grossed out at people doing more than like 200 lines
<whitequark> not really, zignig seems to really enjoy writing boneless assembly so why not
<whitequark> i mean
<whitequark> it used to not have a text assembler
<tpw_rules> why does it? is it just easier to process? there's no optimization like on arm where you can encode the offsets in bytes
<whitequark> but i looked at how much fun zignig was having and decided to add one cuz why not
<tpw_rules> i don't understand why JVT is relative
<tpw_rules> to the table
<whitequark> because all boneless code is PC-relative
<whitequark> it's inherently relocatable
<whitequark> at no additional instruction cost
<whitequark> this has created a major implementation nightmare with the orthogonal instruction set
<tpw_rules> but it's relative to the table, not the PC
<whitequark> but i managed it
<whitequark> er
<whitequark> yes
<tpw_rules> i guess that makes sense
<whitequark> because it's a vtable instruction
<tpw_rules> yeah i didn't think enough there
<whitequark> now J*S*T is relative to PC
<whitequark> ironically, JST is harder to implement than JVT
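A loose illustration of why relative jump tables keep code relocatable. The exact JVT and JST semantics are not spelled out in this log, so this is just one reading: table entries are offsets, added either to the table's own address or to the PC. All names and addresses are hypothetical.

```python
MASK16 = 0xFFFF

def table_relative_target(table_addr, entry):
    """Target computed from the table's address (the JVT-like reading)."""
    return (table_addr + entry) & MASK16

def pc_relative_target(pc, entry):
    """Target computed from the current PC (the JST-like reading)."""
    return (pc + entry) & MASK16

def relocate(addr, delta):
    """Move an address by delta words, wrapping at 16 bits."""
    return (addr + delta) & MASK16

# A table at 0x0100 with an entry pointing at a handler at 0x0140.
table_addr, handler, entry = 0x0100, 0x0140, 0x0040
assert table_relative_target(table_addr, entry) == handler

# Move table and handler together by 0x2000: the stored entry is unchanged,
# yet the computed target follows the code.
moved_target = table_relative_target(relocate(table_addr, 0x2000), entry)
print(hex(moved_target))
```

Because only offsets are stored, moving the whole image needs no fix-ups, which is the "no additional instruction cost" property being claimed.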
pie_ has quit [Ping timeout: 250 seconds]
<tpw_rules> other thought: is it possible to randomly permute the encodings and see if they save resources
<whitequark> it would be very easy to do so bc the entire thing is generated from exactly 1 source of truth
<whitequark> arch.opcode
<whitequark> i already did permute them to hopefully simplify the decoder a bit
<tpw_rules> oh ok
<whitequark> but not extensively
* tpw_rules only likes optimizations when they take hours and only net like a 1% improvement
<whitequark> lol
<whitequark> there's some low hanging fruit there still
<whitequark> for example, the unencoded instructions could be added to the decoder such that they reuse other paths
<whitequark> well
<whitequark> you could also use 'x...
<whitequark> ... but i don't like it.
<tpw_rules> are there any facilities for interrupts?
<whitequark> not currently :D
<whitequark> i never figured out what to do with the flags
<tpw_rules> i wonder if you could have a port on it that like loads the high 8 bits of pc and W simultaneously
<whitequark> or the return pc
<tpw_rules> oh yeah that would be weird too
<whitequark> no, you don't need to care about W
<tpw_rules> but if the interrupt manager loaded W, you could just stick them in r0 and r1
<whitequark> oh wait
<whitequark> i think i know what can be done
<tpw_rules> of the interrupting window
<whitequark> yeah
<whitequark> we could make it push the window
<whitequark> there's no way to restore the flags tho
<tpw_rules> flagless arch :D
<whitequark> lol
<tpw_rules> exti already stores weird state between insns and anything that doesn't write to the flags destroys them
<tpw_rules> (and that involves the ALU)
<tpw_rules> hm that would be gross though
<whitequark> oh shit
<whitequark> what happens if an interrupt arrives during exti
<tpw_rules> make it not?
<whitequark> variable interrupt latency is gross
<whitequark> i think exti should be restartable
<tpw_rules> your instructions have variable cycle counts
<whitequark> shit
<whitequark> point
<tpw_rules> and exti is like 1 cycle anyway
<whitequark> yeah, it should be short-circuited in the FSM but currently isn't
<TD-Linux> lol this is the first time I actually looked at potato semi's catalog. amazing
<tpw_rules> yeah i can't see a nice place to fit in a restore flags instruction. but i think having an interrupt push the window and store old state like pc and flags in registers is okay. it might be a lot of memory writes though
<whitequark> tpw_rules: tbh i was thinking of having an interrupt controller as a peripheral
<whitequark> (that peeks into CPU state)
<whitequark> but... i don't know
<tpw_rules> yeah but i imagine it would be nice to have a port that's like 16 bits of new pc and a bit to cause a transfer to it
<tpw_rules> that an interrupt controller can hook into. or a simple system can assert a constant value on pc and flip the interrupt bit to activate it
<whitequark> or just make it a soft reset
Asu` has joined ##openfpga
<whitequark> you can make an interrupt table with 2 instructions
Asu has quit [Ping timeout: 248 seconds]
<whitequark> LDX, JST
<tpw_rules> you still lose flags and EXTI state though
<whitequark> hence soft
<whitequark> so it will activate only in FETCH stage without EXTI active
<tpw_rules> arm cortex has the weird hack of having a magic return value that unstacks the exception stack frame
<whitequark> yes
<whitequark> i kinda want to make that
<whitequark> it might make sense to do something like um
<tpw_rules> i don't understand what you mean by soft reset as compared to that idea i had
<whitequark> i mean
<whitequark> reset just for the PC
<whitequark> mhm
<whitequark> actually
<whitequark> tpw_rules: so here's what i'm thinking
<whitequark> maybe an interrupt should hijack the decoder logic to drive LDW onto the internal buses and then a JALR
<whitequark> this gives you PC
<whitequark> not sure about flags :/
<whitequark> as for "restore flags" instruction
<emily> 15:20 <TD-Linux> lol this is the first time I actually looked at potato semi's catalog. amazing
<emily> like seriously is this some kind of practical joke or
<tpw_rules> i think a magic return value would have a high cost
<whitequark> we can reserve the encoding of x: JALR Rn, x which is basically totally useless
<whitequark> to mean RETI
<whitequark> and the Rn could be repurposed to mean the actual flags
<whitequark> so an interrupt handler would use self-modifying code
<tpw_rules> what is x
<whitequark> label
<tpw_rules> also you don't have a jalr instruction
<whitequark> tpw_rules: oh wait
<whitequark> JRAL
<whitequark> same difference
<whitequark> oh
<whitequark> wait
<whitequark> i meant JAL here
<whitequark> oh that has an assembly bug
<whitequark> missing Rd
<tpw_rules> oh i missed the label, that makes sense
<tpw_rules> so you just wrote a funky infinite loop
<tpw_rules> okay
<whitequark> tpw_rules: we also have a "JN" instruction
<whitequark> which can probably be repurposed if necessary
<whitequark> but it has the wrong encoding for this
<tpw_rules> also you have four flag bits so you can't encode them in the register number
<whitequark> yes, just realized
<tpw_rules> what's wrong with xchf though
<tpw_rules> can that be easily encoded?
<whitequark> xchf Rd.
<whitequark> I ... guess?
<tpw_rules> i think that could be a nice general purpose instruction too.
<whitequark> it'd need to be stuffed into some obscure corner
<tpw_rules> wouldn't that make it more complex to implement
_whitenotifier has joined ##openfpga
<_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 2 commits to master [+0/-0/±5] https://git.io/fjNhn
<_whitenotifier> [whitequark/Boneless-CPU] whitequark 0fe269b - gateware.core: remove debug statement.
<_whitenotifier> [whitequark/Boneless-CPU] whitequark d2dbf91 - doc: fix some assembly mismatches.
<whitequark> tpw_rules: well
<whitequark> i like conserving opcode space?
<whitequark> right now it's like, super dense.
<tpw_rules> maybe it can be right after SRAI. idk
<whitequark> that part is where MULDIV should live
<tpw_rules> ok
<whitequark> probably UMUL SMUL UDIV SDIV
<whitequark> and there's still four slots left
<tpw_rules> what about upper mul
<tpw_rules> and mod
<whitequark> register pairs
<tpw_rules> ok
<tpw_rules> addi and subi are almost symmetric
<tpw_rules> you could replace one of them
<whitequark> nop
<tpw_rules> ?
<whitequark> different behavior wrt imm3
AndrevS has quit [Remote host closed the connection]
<tpw_rules> that's why i said almost
<whitequark> for one
<whitequark> hm
<whitequark> also it'd make the encoder more complex for no very good reason
<whitequark> decoder*
<tpw_rules> i wondered if it would make it simpler because xchgf would be intimately connected with the ALU
<tpw_rules> they would be symmetric flags wise if you used carry as it should be used :P
<whitequark> oh?
<whitequark> what's wrong with my carry?
<tpw_rules> i'm partial to the 6502 way where C is 1 if +1 for add, but 0 if -1 for subtract
<whitequark> mhm
<tpw_rules> anyway that's another very minor holy war. but maybe it could save an xor or two
<whitequark> i'm open to it
<whitequark> i never really thought about it
<whitequark> you can open an issue so it doesn't get lost
<_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 1 commit to master [+0/-0/±3] https://git.io/fjNhz
<_whitenotifier> [whitequark/Boneless-CPU] whitequark 8ba7240 - doc: fix some assembly mismatches.
<_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 1 commit to master [+0/-0/±3] https://git.io/fjNhg
<_whitenotifier> [whitequark/Boneless-CPU] whitequark c6f00b2 - doc: fix some assembly mismatches.
<tpw_rules> hm maybe it's a wash
<whitequark> oh?
<tpw_rules> or the operation is incorrect again
<whitequark> ;w;
<tpw_rules> i'm consulting my own 6502 emulator
<tpw_rules> you've proved that all the instructions work correctly?
<whitequark> no, only testcases for now
<whitequark> the riscv-formal approach would not work
<whitequark> i did figure out an approach that will work
<whitequark> but haven't yet been able to implement it
<whitequark> basically, recording an execution trace
<whitequark> and looking up values in it
<tpw_rules> okay
<tpw_rules> yeah i'm not extra sure you got carry right
<whitequark> but there's functional tests for it?
<tpw_rules> then the operation might be wrong
<whitequark> yes, could be
<tpw_rules> or i misinterpreted the whole thing
<whitequark> sounds like the doc is broken either way.
<whitequark> even if it's technically correct.
<whitequark> which i'm not sure of.
uovo has joined ##openfpga
<whitequark> hm
<tpw_rules> okay riddle me this: suppose R0 is 1 and C is set. what's R1 and C after SBBI R1, R0, 1
<whitequark> so the improved ALU synthesizes to... more LUTs
<whitequark> uhhh let's see
<whitequark> that translates to R1=R0+~1+C. so. R1=0x0001+0xffff+1. so R1=0x(1)0001. C=1 R1=0x0001
<whitequark> in theory.
<whitequark> let me actually run it
<_whitenotifier> [Boneless-CPU] whitequark created branch alsru2 - https://git.io/fhUTh
<_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 1 commit to alsru2 [+0/-0/±1] https://git.io/fjNh5
<_whitenotifier> [whitequark/Boneless-CPU] whitequark 1f7e2fc - WIP ALSRU
<whitequark> tpw_rules: hm. that doesn't match reality, oops
<whitequark> like at all
<tpw_rules> what did you get?
<whitequark> R1=0 F=ZC
<tpw_rules> that's 6502 convention
<tpw_rules> C should just be set to bit 16 of the result, not its inverse
<tpw_rules> wait hm
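[note] The SBBI question above can be checked by hand with a small model. This is only a sketch: it assumes 16-bit registers and the R1 = R0 + ~imm + C formula quoted in the discussion, and `sbb16` is a hypothetical helper name, not anything from the Boneless sources. Under those assumptions ~1 is 0xfffe in 16 bits, so the sum is 0x10000, which agrees with the observed R1=0 F=ZC.

```python
MASK = 0xFFFF  # assumed 16-bit data path

def sbb16(a, b, c_in):
    """Model subtract-with-carry as a + ~b + c_in (hypothetical helper)."""
    total = (a & MASK) + (~b & MASK) + (c_in & 1)
    result = total & MASK
    carry = (total >> 16) & 1      # bit 16 of the sum, taken as-is (not inverted)
    zero = int(result == 0)
    return result, carry, zero

# R0 = 1, immediate = 1, C set on entry
r1, c, z = sbb16(1, 1, 1)
print(hex(r1), c, z)
```

With C clear on entry the same model gives 0xffff with carry 0, i.e. a borrow.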
<whitequark> tpw_rules: something i just realized is... maybe i shouldn't have like
<whitequark> separate CI/CO SI/SO
<tpw_rules> why? they're different things
emeb_mac has joined ##openfpga
<whitequark> fewer LUTs
<whitequark> or... maybe not?
<whitequark> lemme check
<tpw_rules> i mean knowing the high bit is nice
<tpw_rules> but you yourself said you can test that easily
<whitequark> oh no not in the ISA
<whitequark> in the impl
<whitequark> WTF
<whitequark> i made it simpler and now it's way more luts
<whitequark> fucking piece of shit synthesizer
<whitequark> grrrrr
<whitequark> i should just like. add icecube support to nmigen
<whitequark> i resent abc so much
<_whitenotifier> [Boneless-CPU] whitequark created branch merge-ci-si - https://git.io/fhUTh
<_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 1 commit to merge-ci-si [+0/-0/±3] https://git.io/fjNjU
<_whitenotifier> [whitequark/Boneless-CPU] whitequark 0253454 - WIP merge CI+SI in ALSRU
Asu` has quit [Quit: Konversation terminated!]
<tpw_rules> how does nmigen Cat work? is it low bits first?
<whitequark> tpw_rules: nmigen signals work like Python arrays
<whitequark> Cat(x,y,z) is basically [*x,*y,*z]
<tpw_rules> ok
<tpw_rules> opposite to verilog
<whitequark> yep
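[note] The "Cat is basically list concatenation" description can be modelled in plain Python without nmigen. This is a sketch under that description only; `to_bits`, `from_bits` and `cat` are hypothetical helpers, with bit 0 stored first in each list.

```python
def to_bits(value, width):
    """Return the bits of value as a list, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]

def from_bits(bits):
    """Inverse of to_bits: element i is bit i of the result."""
    return sum(bit << i for i, bit in enumerate(bits))

def cat(*parts):
    """Concatenate bit lists; the first argument ends up in the low bits."""
    out = []
    for part in parts:
        out.extend(part)
    return out

x = to_bits(0b01, 2)
y = to_bits(0b1, 1)
print(bin(from_bits(cat(x, y))))   # y lands above x

# Splitting a 17-bit sum the way Cat(p, co) would receive it:
bits = to_bits(0xFFFF + 1 + 0, 17)
p, co = from_bits(bits[:16]), bits[16]
print(hex(p), co)
```

Read this way, an assignment to Cat(p, co) puts the low 16 bits of the sum in p and bit 16 in co.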
<tpw_rules> then i'm fairly certain the subtraction is incorrect
<tpw_rules> m.d.comb += Cat(p, self.co).eq(x + y + self.ci)
<tpw_rules> you have to invert carry on either the way in, or the way out, and i don't see it being done in either place
<sorear> 6502 doesn’t invert in or out
<sorear> litmus test: can you do a 32b add/sub without manipulating flags in the middle
<tpw_rules> oh that is true.
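[note] The litmus test above (a 32-bit add/sub with no flag manipulation between the two halves) can be sketched in Python. It assumes the 6502-style convention discussed here, where subtraction is a + ~b + C and C=1 means "no borrow"; all function names are hypothetical.

```python
MASK = 0xFFFF  # assumed 16-bit word

def adc16(a, b, c_in):
    """16-bit add with carry in; returns (result, carry_out)."""
    total = (a & MASK) + (b & MASK) + (c_in & 1)
    return total & MASK, total >> 16

def sbc16(a, b, c_in):
    """16-bit subtract as a + ~b + c_in; carry_out = 1 means no borrow."""
    return adc16(a, ~b & MASK, c_in)

def add32(a, b):
    lo, c = adc16(a & MASK, b & MASK, 0)   # clear carry once, up front
    hi, _ = adc16(a >> 16, b >> 16, c)     # carry flows straight through
    return (hi << 16) | lo

def sub32(a, b):
    lo, c = sbc16(a & MASK, b & MASK, 1)   # set carry once, up front
    hi, _ = sbc16(a >> 16, b >> 16, c)     # no inversion between halves
    return (hi << 16) | lo

print(hex(add32(0x0000FFFF, 1)), hex(sub32(0x00010000, 1)))
```

The carry out of the low half is fed unchanged into the high half in both directions, which is the property the litmus test asks for.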
<tpw_rules> sorry, the gateware is correct. it uses the 6502 convention. but operation is not
<tpw_rules> i'm not going to touch overflow but it should probably be touched too :P
<sorear> hmm, do you have enough non-flag-clobbering data moves to implement bignum addition with a loop
<tpw_rules> no, mov destroys C and V
<whitequark> shit
<tpw_rules> you could store to memory :P
<whitequark> we could define ADD to preserve CV
<whitequark> hm
<whitequark> er
<whitequark> AND*
<tpw_rules> i think that's what i said to start this whole conversation :P
<whitequark> i mean
<whitequark> i didn't realize it'd break bignums.
<whitequark> breaking bignums is very bad.
<whitequark> wait
<whitequark> that won't help
<whitequark> because you have to SUB to check the loop condition
<whitequark> unless you have like a duff's device
<tpw_rules> or xchf
<tpw_rules> anyway what happens if you're at 0xFFFF and J +1. is that UNPREDICTABLE? same with the relative loads
<tpw_rules> i'm going to define it as that
<whitequark> tpw_rules: defined to wrap
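[note] "Defined to wrap" can be read as address arithmetic modulo 2^16. A minimal sketch, assuming a 16-bit PC; the log does not say whether the offset is applied to the address of the jump itself or of the following instruction, so the base is left as a parameter, and `jump_target` is a hypothetical name.

```python
ADDR_MASK = 0xFFFF  # assumed 16-bit address space

def jump_target(base, offset):
    """Add a signed offset to a base address, wrapping at 16 bits."""
    return (base + offset) & ADDR_MASK

print(hex(jump_target(0xFFFF, 1)))    # wraps past the top of memory
print(hex(jump_target(0x0000, -1)))   # wraps below address zero
```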
<whitequark> well
<whitequark> hm
<whitequark> tpw_rules: the thing is that boneless is really not suited for banking
<whitequark> so i don't see how you could ever extend it to be like 20-bit or something
<tpw_rules> that's never stopped anybody before
<whitequark> no i mean you can't put a window there
<whitequark> you can't JAL from there
<whitequark> like
<whitequark> you can bank the top half of address space for example
<whitequark> but you can't *extend* the address width
<tpw_rules> ok
<whitequark> it's *too* orthogonal for this, i can't imagine any remotely usable way to do it
<whitequark> even with hax
<tpw_rules> EXTA :D
<whitequark> won't work
<whitequark> you can't address those things without like
<mwk> clearly you need to introduce segment registers
<whitequark> far pointers??
* whitequark stabs mwk
<tpw_rules> that's what i mean
<tpw_rules> exta can take an immediate, or a register
<mwk> ow ow ow
<whitequark> tpw_rules: wtf
<whitequark> that's horrible
<tpw_rules> anyway yes that would be stupid
<whitequark> just use risc-v or something
<whitequark> hell, i'm not sure who would even instantiate all 64K of RAM for boneless
<whitequark> it's not an 8051 where you need fifteen instructions to do anything
<tpw_rules> sell it as a programmable state machine
<whitequark> i mean yes?
<whitequark> that's literally its origin story
<whitequark> it's modeled after KCPSM too
<whitequark> among other things
pie_ has joined ##openfpga
<tpw_rules> whitequark: https://i.imgur.com/MbwCZ3p.png does this sound good
<sorear> ya don’t like page registers?
<tpw_rules> in vague terms of formatting
<whitequark> yeah seems reasonable
<whitequark> i'll just edit it later if i hate it, but it's good to get the ball rolling regardless
<whitequark> tpw_rules: thank you for helping by the way, i'm pretty overloaded lately
<whitequark> ... lately meaning the last five years ...
<tpw_rules> you're welcome. i'm glad i'm able to improve it instead of whining
<tpw_rules> it's ok i start graduate school on monday
<whitequark> oh no / oh yes
<whitequark> should i not ping you after that? :p
<tpw_rules> lol almost nothing will change
<whitequark> ah
<tpw_rules> just what budget pays me to work in the lab
<tpw_rules> and i'll have to write a paper ;_;
* tpw_rules i'm assuming the external bus is 16 bits data and address too
<tpw_rules> s/\/me//
<whitequark> yes
<whitequark> in fact the address bus is shared even
<whitequark> so one thing i want to do is to fit it on LP384
<whitequark> that's... challenging
<sorear> Do you have a specific external memory in mind
<whitequark> sorear: it'd be like
<whitequark> a 6502-style "bare CPU" package
<whitequark> that comes on a DIP-like board
<whitequark> basically a toy but fun