##openfpga on 2019-08-24 — irc logs at freenode.irclog.whitequark.org

00:02 flaviusb has joined ##openfpga

00:24 <mwk> um

00:24 <mwk> you sent me a glasgow?

00:34 <cr1901_modern> whitequark: I want to play with Boneless on tinyfpga A and up5k and use the internal hard IP (for fun). Is there a way to give boneless wait states for e.g. wishbone buses? Maybe wrap the whole FSM in a CEInserter?

00:35 <whitequark> cr1901_modern: that will work in a pinch, yes

00:35 <whitequark> i'll add explicit support for wait states a bit later

00:35 <cr1901_modern> (Btw, _can_ you wrap the whole FSM in a CEInserter in nmigen?)

00:37 <whitequark> of course

00:37 <whitequark> well, you wrap the entire core.

00:37 <whitequark> (and it's EnableInserter now)

00:38 <cr1901_modern> I thought maybe one should wrap just the FSM, just in case. But no problem if wrapping the entire core makes no difference. This may no longer be supported in nmigen tho (in omigen it was possible to wrap an FSM object in an enable inserter).

00:39 <cr1901_modern> >i'll add explicit support for wait states a bit later <-- take your time. Clearly I'm dragging my feet even getting started w/ this :P

00:41 <whitequark> well yes, an FSM is no longer a separate module

00:41 <whitequark> the thing is that it's what you actually want

00:41 <whitequark> because otherwise you get hierarchy flattening, issues with multiple drivers, etc

00:42 <cr1901_modern> it's not a big deal I don't think. If I want just FSM functionality of a larger component to have EnableInserter, I just need to make an Elaboratable whose sole contents are the FSM. And then combine the FSM and non-FSM parts in a parent Elaboratable

00:43 <cr1901_modern> (I think)

00:44 <whitequark> I *think* it's usually easier to insert with m.If(...): m.next = ... everywhere

00:44 <whitequark> because

00:44 <whitequark> in practice it turns out you don't want them *everywhere* after all

00:44 <whitequark> just my experience

00:44 <cr1901_modern> oh... yea, that'll also work. Oops...

00:45 <whitequark> I've tried using EnableInserter a few times on an FSM and basically regretted it each time

00:45 <whitequark> often FSMs have transient states that drive some other comb enable signal for example

00:45 <whitequark> this tends to produce obscure bugs

00:46 <cr1901_modern> The "CEInserter" trick was used in the misoc SPI core (well one of them anyway) to implement clock division. It's the only place I remember seeing it used, but I figured it might be useful for wait states.

00:51 <whitequark> yes. it needs to be used with great care for clock division.

00:51 <whitequark> if your FSM has any transient states...

00:51 <whitequark> e.g. say

00:51 <whitequark> if your FSM strobes fifo.we

00:51 <whitequark> and it gets stopped with fifo.we high

00:51 <whitequark> you see what will happen.

00:51 <cr1901_modern> Congrats, you turned a pulse signal into a logic high signal :P

00:51 <whitequark> yep

00:52 <whitequark> this is why I think not being able to EnableInserter()(FSM()) in nmigen will probably lead to a net reduction of bugs, if anything

00:53 <cr1901_modern> Sure that's very reasonable.
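
The fifo.we hazard discussed above can be reproduced with a toy cycle-level model. This is plain Python rather than nMigen, and the state names and the `run` helper are hypothetical; it only assumes that the enable gates the state register while the combinational `we` output is derived from the current state.

```python
# Toy model: the state register is clock-enable gated, but the comb output
# `we` depends only on the current state, so it is not gated.
NEXT_STATE = {"IDLE": "WRITE", "WRITE": "DONE", "DONE": "DONE"}

def run(enables):
    """Return the per-cycle value of `we` for a given enable trace."""
    state = "IDLE"
    trace = []
    for en in enables:
        # Combinational output, visible every cycle regardless of `en`.
        trace.append(1 if state == "WRITE" else 0)
        # Sequential update, only when the enable is high.
        if en:
            state = NEXT_STATE[state]
    return trace

free_running = run([1, 1, 1, 1, 1])  # one-cycle strobe
stalled = run([1, 0, 0, 1, 1])       # stalled for two cycles while in WRITE

print(free_running)  # [0, 1, 0, 0, 0]
print(stalled)       # [0, 1, 1, 1, 0]
```

With the stall, the single intended FIFO write turns into three, which is the "pulse into a logic high" failure mode.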

00:55 <TD-Linux> I'm sad that the set-counters-equal-on-reset thing for asyncfifo didn't work :(

00:55 <whitequark> yeah, it looked cute

01:27 emeb_mac has quit [Ping timeout: 268 seconds]

01:30 emeb_mac has joined ##openfpga

01:47 mmicko has quit [Quit: leaving]

01:47 mmicko has joined ##openfpga

03:15 <kc8apf> mwk: on 7 series, CRC is only used for verifying the bitstream got transferred to the config hardware without error.

03:17 <mwk> like all xilinx fpgas, yes

03:17 <kc8apf> I think that check happens in a central location on the chip. Frame data has to be clocked out from that to each tile.

03:17 <mwk> of course it does

03:17 <kc8apf> So the zeros are to generate config cycles to flush that pipeline

03:17 <kc8apf> Not sure why it's zeros instead of type1 nops

03:18 <mwk> yeah, zeroes are to flush the pipeline and switch to the next row, that's reasonable too

03:18 <mwk> but my question still is

03:18 <mwk> why on earth do these 0s still count as FDRI payload for CRC purposes?

03:18 <kc8apf> They don't

03:19 <mwk> they do

03:19 <kc8apf> They count as type 0 frames

03:19 <mwk> I get checksum mismatch with ISE bitstreams if I don't count them

03:19 <kc8apf> CRC is over the whole command stream

03:19 <mwk> no, it's not

03:19 <mwk> NOP packets don't count to CRC

03:22 <mwk> only register writes count towards CRC, and some registers are even exempt from that, like LOUT

03:23 <mwk> and the register address is part of the word going to the CRC, so it wouldn't even make sense to count NOP packets, they don't have a register address

03:24 <kc8apf> trying to find my notes from years ago

03:24 <mwk> and yet... ISE seems to count the null words as if they were going into FDRI address

03:25 <mwk> which is why I'm suspecting ISE has a major bug here and generates broken bitstreams, missing the FDRI packet header before all these NULLs

03:25 <kc8apf> no, 7series is like that too

03:25 <kc8apf> the nulls are just immediately after the FDRI

03:25 <kc8apf> the prjxray code I sent before parses and generates working bitstreams

03:27 <kc8apf> what I can't recall is how null insertion differed between PERFRAMECRC, DEBUG_BITSTREAM, and normal bitstreams

03:28 <mwk> in normal bitstream, they are part of FDRI payload

03:28 <mwk> which is reasonable

03:29 <mwk> they get written to FDRI address, so they count towards CRC with FDRI address
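
A rough sketch of the accounting mwk describes, where only register writes feed the CRC and the register address is part of each hashed word. The polynomial, the bit order, the 5-bit address width, and the FAR/FDRI register numbers are assumptions recalled from public 7-series reverse-engineering work; none of them are stated in this log, and the packet tuples are a made-up representation.

```python
# Assumed constants, not verified against hardware here.
CRC32C_POLY_REFLECTED = 0x82F63B78  # assumed: reflected Castagnoli polynomial
ADDR_BITS, DATA_BITS = 5, 32        # assumed field widths
REG_FAR, REG_FDRI = 0b00001, 0b00010  # assumed register numbers

def crc_update(crc, addr, data):
    """Fold one (address, data) register write into the running CRC, LSB first."""
    word = (addr << DATA_BITS) | data
    for _ in range(ADDR_BITS + DATA_BITS):
        bit = (word ^ crc) & 1
        crc >>= 1
        if bit:
            crc ^= CRC32C_POLY_REFLECTED
        word >>= 1
    return crc

def crc_of_stream(packets):
    """packets: list of (kind, addr, words); only 'write' packets touch the CRC."""
    crc = 0
    for kind, addr, words in packets:
        if kind != "write":
            continue  # NOPs carry no register address, so they are skipped
        for w in words:
            crc = crc_update(crc, addr, w)
    return crc

zero_as_fdri = crc_of_stream([("write", REG_FDRI, [0])])
zero_as_nop = crc_of_stream([("nop", None, [0])])
print(hex(zero_as_fdri), hex(zero_as_nop))
```

Under these assumptions a zero word written to FDRI still changes the CRC because of the address bits, while the same zero word treated as a NOP does not, which matches the mismatch mwk sees when the padding is not counted.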

03:29 <mwk> in debug bitstream, they are just stuffed between packets

03:29 <mwk> which doesn't make much sense

03:30 <kc8apf> I'd have to build the prjxray tools and look at a few bitstreams

03:30 <mwk> and what is a perframecrc bitstream

03:30 <kc8apf> I recall them being stuffed between packets in debug bitstream

03:31 <mwk> another undocumented bitgen option?

03:31 <kc8apf> yup

03:31 * mwk fires blindly

03:31 <kc8apf> it writes each frame individually and checks the CRC after each frame

03:32 <kc8apf> since it doesn't use LOUT, it will work on programming interfaces that don't support LOUT

03:33 <mwk> alright, seems it's called '-g PerFrameCrc:Yes' for ISE...

03:33 <kc8apf> I thought they reset the CRC just before the

03:35 <kc8apf> gah. multiple devices strikes again

03:36 <mwk> hrm

03:36 <mwk> I don't see the nulls at all in perframecrc

03:37 <mwk> instead, row switches are accompanied by extra "CMD = WCFG" writes

03:37 <kc8apf> yeah, see https://github.com/SymbiFlow/prjxray/blob/master/lib/include/prjxray/xilinx/xc7series/configuration.h#L108

03:37 <mwk> wait, wtf

03:38 <mwk> it seems every first frame of a row is uploaded twice?

03:38 <kc8apf> no. go read that comment

03:38 <kc8apf> then be even more upset

03:38 <kc8apf> gotta go. partner is feeling ignored.

03:38 <mwk> what

03:39 <mwk> thanks, I hate it

05:20 mumptai has joined ##openfpga

06:19 tlwoerner has quit [Ping timeout: 246 seconds]

06:26 tlwoerner has joined ##openfpga

07:03 emeb_mac has quit [Ping timeout: 244 seconds]

07:09 Jybz has joined ##openfpga

09:01 Asu has joined ##openfpga

09:39 pie_ has joined ##openfpga

09:41 zkms has quit [Quit: zkms]

09:41 zkms has joined ##openfpga

09:56 unixb0y has quit [Ping timeout: 272 seconds]

09:58 unixb0y has joined ##openfpga

09:58 Prf_Jakob has quit [Quit: Spoon!]

10:01 Prf_Jakob has joined ##openfpga

10:51 * zignig would like to thank her quarkyness for all the thankless work on stuff various.

10:52 * zignig burrows into a bootloader that does not work _at_all_ about 9600 bps.

10:52 <zignig> *above

11:50 pie_ has quit [Remote host closed the connection]

11:52 pie_ has joined ##openfpga

11:52 pie__ has joined ##openfpga

11:52 pie_ has quit [Client Quit]

12:44 pie__ has quit [Ping timeout: 250 seconds]

12:51 pie_ has joined ##openfpga

13:21 mumptai has quit [Quit: Verlassend]

14:29 pie_ has quit [Ping timeout: 250 seconds]

14:43 pie_ has joined ##openfpga

15:26 Richard_Simmons has joined ##openfpga

15:30 Bob_Dole has quit [Ping timeout: 276 seconds]

15:35 pie_ has quit [Ping timeout: 250 seconds]

15:40 emeb_mac has joined ##openfpga

15:42 Richard_Simmons3 has joined ##openfpga

15:44 pie_ has joined ##openfpga

15:46 Richard_Simmons has quit [Ping timeout: 276 seconds]

15:48 Richard_Simmons has joined ##openfpga

15:52 Richard_Simmons3 has quit [Ping timeout: 276 seconds]

16:02 Richard_Simmons3 has joined ##openfpga

16:07 Richard_Simmons has quit [Ping timeout: 276 seconds]

16:29 Richard_Simmons has joined ##openfpga

16:33 Richard_Simmons3 has quit [Ping timeout: 264 seconds]

16:37 emeb_mac has quit [Ping timeout: 268 seconds]

16:37 pie_ has quit [Ping timeout: 250 seconds]

16:40 azonenberg has quit [Ping timeout: 246 seconds]

16:41 azonenberg has joined ##openfpga

16:43 Richard_Simmons3 has joined ##openfpga

16:47 Richard_Simmons has quit [Ping timeout: 276 seconds]

17:01 Richard_Simmons has joined ##openfpga

17:05 Richard_Simmons3 has quit [Ping timeout: 276 seconds]

17:16 Richard_Simmons3 has joined ##openfpga

17:20 Richard_Simmons has quit [Ping timeout: 250 seconds]

17:26 pie_ has joined ##openfpga

17:26 Richard_Simmons has joined ##openfpga

17:30 Richard_Simmons3 has quit [Ping timeout: 276 seconds]

17:33 Richard_Simmons3 has joined ##openfpga

17:38 Richard_Simmons has quit [Ping timeout: 276 seconds]

17:40 Richard_Simmons has joined ##openfpga

17:42 Bob_Dole has joined ##openfpga

17:45 Richard_Simmons3 has quit [Ping timeout: 276 seconds]

17:45 Richard_Simmons3 has joined ##openfpga

17:46 Richard_Simmons has quit [Ping timeout: 276 seconds]

17:49 Bob_Dole has quit [Ping timeout: 276 seconds]

17:51 Richard_Simmons has joined ##openfpga

17:54 Richard_Simmons3 has quit [Ping timeout: 276 seconds]

18:01 s_frit has quit [Remote host closed the connection]

18:02 s_frit has joined ##openfpga

18:14 pie_ has quit [Ping timeout: 250 seconds]

18:19 pie_ has joined ##openfpga

18:26 pie_ has quit [Excess Flood]

18:26 pie_ has joined ##openfpga

18:38 mumptai has joined ##openfpga

18:57 pie_ has quit [Ping timeout: 250 seconds]

19:13 pie_ has joined ##openfpga

19:15 AndrevS has joined ##openfpga

20:00 Jybz has quit [Quit: Konversation terminated!]

20:05 <ZirconiumX> whitequark: How do I get the RTL for the Boneless CPU?

20:12 rohitksingh has joined ##openfpga

20:13 <whitequark> ZirconiumX: let's see

20:13 <whitequark> ZirconiumX: you essentially want to use it outside of nMigen right?

20:13 <ZirconiumX> Yeah

20:14 <ZirconiumX> nmigen.cli had a generate/simulate script argument, right?

20:14 <whitequark> yes, but it's a bit more tricky with boneless because it's configurable

20:14 <whitequark> let me add something

20:18 <whitequark> ZirconiumX: ok, right, do you want to provide your own main memory, or just the CPU?

20:18 <ZirconiumX> Just the CPU for now, I want to see how it synthesises at the moment

20:19 <whitequark> oh yeah

20:20 <whitequark> python3 -m boneless.gateware.core core-fsm generate boneless.v

20:20 <ZirconiumX> Git pull?

20:21 <whitequark> nope, already works

20:21 <whitequark> looks like with current yosys and -abc9 it synthesizes to 458 LUT.

20:21 <whitequark> which is certainly too many.

20:22 <ZirconiumX> Thanks WQ

20:23 <mwk> sync

20:23 <mwk> sudo reboot

20:23 <whitequark> the decoder is about 76 LUT, but more worryingly, the toplevel FSM is 200 LUT

20:23 <whitequark> mwk: um

20:23 <Ultrasauce> 3 syncs just to be sure

20:23 <ZirconiumX> mwk: :wq!

20:23 <whitequark> ZirconiumX: HEY

20:23 <whitequark> i have a highlight on that

20:23 * tpw_rules issues ACPI S5 request

20:24 <ZirconiumX> Sorry :P

20:25 <ZirconiumX> ...230 74xx chips

20:25 <ZirconiumX> That's...within the realm of feasibility

20:25 <whitequark> ZirconiumX: if you want to actually proceed with it, I'd be glad to optimize it some more

20:26 <whitequark> I think I came up with a nice way to simplify the ALU recently

20:26 <whitequark> might as well apply that and try again

20:26 <ZirconiumX> I'm sure you're aware of how rough the 74xx flow is

20:26 <whitequark> I am not

20:26 <whitequark> I know nothing about the 74xx flow

20:27 <ZirconiumX> The idealised model of the world Yosys has and the available 74xx chips are at right angles to each other

20:27 <mwk> whoops, sorry, thought I had a terminal there

20:27 * mwk curses shit display drivers locking up

20:27 <whitequark> mwk: magic sysrq?

20:28 <mwk> whitequark: disabled by default on arch

20:28 <whitequark> yeah i hate that shit

20:28 <whitequark> on debian too

20:28 <ZirconiumX> https://pastebin.com/amgt61hZ if anybody is curious

20:28 <whitequark> ZirconiumX: ok, so, let me try to do the ALU thing i wanted.

20:29 <ZirconiumX> Sure thing

20:29 <ZirconiumX> The ic_count.py script is essentially the equivalent of optimising for area above all else

20:30 <tpw_rules> ZirconiumX: i've never heard of 5 digit 74 chips

20:30 <ZirconiumX> tpw_rules: The 74AC series are vulnerable to ground bounce; 74AC11 has a different pinout to rectify this

20:31 <whitequark> ZirconiumX: i'm wondering if you could try synthesizing by subsystem

20:31 <whitequark> it's hierarchical now, after all

20:31 <ZirconiumX> Unfortunately they couldn't do much to save the reputation of 74AC

20:31 <tpw_rules> why did you pick them then

20:32 <tpw_rules> do you have piles because nobody wants them?

20:32 <ZirconiumX> Because when you want speed, you pick 74AC logic

20:32 <tpw_rules> oh

20:32 <tpw_rules> how fast are you planning to clock this monster

20:32 <ZirconiumX> I've seen a 74AC 6502 replica hit 20MHz on about 200 chips

20:33 <tpw_rules> that's a lot higher than i would have put money on

20:33 <ZirconiumX> Which outpaced the actual chip

20:33 <tpw_rules> yeah

20:33 <tpw_rules> i guess it had a reasonable board design too?

20:34 <ZirconiumX> 74HC/74AHC is 4-5MHz for the same design

20:34 <ZirconiumX> Depends on your definition of reasonable. Double-sided SOIC chips

20:34 <tpw_rules> exactly, there's no way it could have done 20 on a breadboard

20:34 <tpw_rules> or wire-wrapped

20:34 <ZirconiumX> PCB, yeah

20:35 <ZirconiumX> Unfortunately I don't have any way of measuring timing on a PCB

20:36 <ZirconiumX> s/(any) (way)/\\1 automated \\2/

20:36 <tpw_rules> \\?

20:36 <ZirconiumX> regex :P

20:37 <tpw_rules> yes but doesn't that mean you'd get \1 automated \2 instead of any automated way

20:37 <tpw_rules> how does one convince yosys to synthesize for 74series logic? it looks like you give it a model of each chip and it picks the best ones? i'm not super familiar with how yosys works

20:37 <ZirconiumX> Depends on your shell, I suppose
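
The `\1` versus `\\1` question can be checked with Python's `re` module, used here only as one example of a regex engine; other tools and shells may unescape differently, as ZirconiumX says.

```python
import re

text = "I don't have any way of measuring timing on a PCB"

# Single backslash in a raw string: \1 and \2 are backreferences.
fixed = re.sub(r"(any) (way)", r"\1 automated \2", text)

# Doubled backslash in a raw string: the replacement template sees an
# escaped backslash followed by a digit, so a literal \1 is emitted.
literal = re.sub(r"(any) (way)", r"\\1 automated \\2", text)

print(fixed)
print(literal)
```

In this engine the doubled form does produce the literal `\1 automated \2` that tpw_rules expected, unless an outer layer (a shell or a non-raw string literal) strips one level of backslashes first.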

20:37 <ZirconiumX> tpw_rules: it's a bastardisation of the ASIC flow

20:38 <tpw_rules> like do you get to define what yosys cells are? or just what they do IRL

20:38 <ZirconiumX> You can feed ABC a list of cells, and then it'll create the design out of those

20:38 <ZirconiumX> Yeah, it's called the Liberty format

20:38 <ZirconiumX> Unfortunately it's highly undocumented

20:38 <ZirconiumX> So the flow is unaware of timing

20:39 <tpw_rules> so yosys has an arbitrary set of cells and you use abc to transform them to another set of cells which contains 74 series chips

20:39 <ZirconiumX> Sufficiently Arbitrary

20:39 <tpw_rules> i guess a cell is something like "one and gate"?

20:40 <ZirconiumX> https://pastebin.com/amgt61hZ

20:40 <ZirconiumX> If you note, I named the cells using a specific convention

20:40 <sorear> incredibly cursed name for a file format

20:40 <tpw_rules> yeah but that's your cells

20:40 <ZirconiumX> The part name, and then what it does

20:40 <tpw_rules> how does yosys pack the bundle of logic for those

20:40 <ZirconiumX> <tpw_rules> i guess a cell is something like "one and gate"? <--- "two signals ANDed together" is a $and

20:41 <ZirconiumX> *two arbitrary-width signals

20:42 <tpw_rules> that's a yosys internal cell, the $and

20:42 <ZirconiumX> Yep

20:42 <ZirconiumX> Which then becomes $_AND_

20:42 <tpw_rules> then you define that an 008 is four 2 input and gates. what maps between all the $ands and that?

20:43 <ZirconiumX> tpw_rules: There's a hack to this

20:43 <tpw_rules> i never would have guessed

20:43 <ZirconiumX> ABC doesn't understand multiple-output gates

20:43 <tpw_rules> like a single cell being four independent gates?

20:43 <ZirconiumX> Yep

20:43 <sorear> this isn’t a 74-specific problem, asic libraries generically have multiple output cells

20:44 <tpw_rules> hug

20:44 <tpw_rules> huh

20:44 <ZirconiumX> <sorear> incredibly cursed name for a file format <--- Stallman would be proud

20:44 <tpw_rules> libiberty

20:44 <whitequark> gaaah

20:45 <ZirconiumX> Anyway, I realise this, but ABC don't care, just use a single output cell, it's not their problem
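
Since ABC only sees single-output cells, turning a gate count into a chip count has to happen afterwards. The sketch below is a guess at what such a step could look like, not the actual ic_count.py; the cell names and the example counts are hypothetical, and only the gates-per-package figures are standard for these 74xx functions.

```python
import math

# Hypothetical single-gate cell names mapped to gates per physical package.
GATES_PER_PACKAGE = {
    "74AC11008_1x2AND": 4,  # quad 2-input AND
    "74AC11032_1x2OR": 4,   # quad 2-input OR
    "74AC11004_1xNOT": 6,   # hex inverter
    "74AC11074_1xDFF": 2,   # dual D flip-flop
}

def ic_count(cell_counts):
    """Number of packages needed if gates of the same type share chips."""
    return sum(
        math.ceil(n / GATES_PER_PACKAGE[cell])
        for cell, n in cell_counts.items()
    )

# Made-up synthesis result: 9 ANDs, 7 inverters, 3 flip-flops.
example = {"74AC11008_1x2AND": 9, "74AC11004_1xNOT": 7, "74AC11074_1xDFF": 3}
print(ic_count(example))  # 3 + 2 + 2 = 7
```

Rounding up per cell type is why a design can lose chips by trading a few gates of one type for another that still has spare slots in its package.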

20:45 <ZirconiumX> >.>

20:45 <tpw_rules> you should map it to potato semiconductor's catalog

20:46 <ZirconiumX> Sure, but the collection is sufficiently limited that I think trace lengths would begin to be a problem

20:47 <tpw_rules> also i know it's rude to think about concerns of practicality, but why the boneless architecture in particular

20:48 <ZirconiumX> Simpler CPUs result in fewer gates, and I find it reasonably elegant honestly

20:48 <whitequark> :D

20:48 <sorear> I keep forgetting that “potato” is their actual name

20:48 <whitequark> ZirconiumX: i'm curious which parts you find *in*elegant btw

20:48 <whitequark> it's unlikely i'll change it much, but feedback is still valuable

20:48 <tpw_rules> idk i just wouldn't want to combine the experimentation of not a common cpu architecture with the experimentation of doing it out of 74 logic

20:49 <whitequark> (how many extremely tiny synthesizable CPUs are out there? navre is huge, for example)

20:49 <tpw_rules> he already said there was a 6502 out of somewhat less

20:50 <tpw_rules> but like with a 6502 you could add a few more 74 and have an apple 2

20:50 <ZirconiumX> tpw_rules: with all due respect, if you have to target 7400 logic, conventionality goes out the window

20:50 <tpw_rules> maybe i misunderstood

20:50 <tpw_rules> i thought this was just for fun

20:50 <ZirconiumX> For example, the chips were designed with tristate and open-collector logic in mind

20:50 <whitequark> tpw_rules: lmao are you saying my 16-bit RISC CPU is only slightly larger than a darn 6502

20:50 <ZirconiumX> It is, yes

20:51 <ZirconiumX> whitequark: No, I'm saying it's *smaller* than a darn 6502

20:51 <whitequark> *what*

20:51 <tpw_rules> you said boneless was 230 but someone did a 6502 with only 200

20:52 <whitequark> ZirconiumX: i wonder how many chips you can cut from it if you replace the decoder with a ROM

20:53 pie_ has quit [Ping timeout: 250 seconds]

20:53 <tpw_rules> whitequark: is there a reason that logic ops set C and V to undefined?

20:53 <tpw_rules> as opposed to preserving or eg bit 16/15 like 6502

20:53 <whitequark> tpw_rules: everything that i didn't explicitly design to function in a specific way is left undefined so it doesn't constrain me later

20:53 <whitequark> so, you said exactly the reason

20:54 <whitequark> i'm not sure what the best behavior is!

20:54 <tpw_rules> both led to some pretty elegant tricks on 6502

20:54 <whitequark> you don't need bit 16 in flags because um

20:54 <whitequark> 0x8000 is a 3-bit encoded immediate

20:54 <ZirconiumX> Removing the flatten pass makes it very slightly less efficient

20:55 <ZirconiumX> But here you go, whitequark: https://pastebin.com/wE4EsjbY

20:56 <whitequark> interesting

20:56 <whitequark> the decoder is actually not large

20:57 <whitequark> the toplevel is *huge*

20:57 <whitequark> i need to fix that

20:57 <ZirconiumX> I probably *could* make the decoder a ROM, but I'd have to investigate

20:58 <ZirconiumX> Also the 16374 does a lot of lifting

20:58 <tpw_rules> i don't think the manual defines decode_imm_al/sr

20:58 <whitequark> tpw_rules: indeed it doesn't, it's in a section i didn't write

20:58 <tpw_rules> oh

20:58 <whitequark> the synopsis in the design spreadsheet shows the encoding

20:58 <tpw_rules> do you need help

20:58 <tpw_rules> with writing docs

20:59 <whitequark> mm, maybe!

20:59 <whitequark> i've been focusing on toolchain support and smolness for now

20:59 <tpw_rules> fetishizing over weird assembly tricks is one thing i like

20:59 <whitequark> though a baseline level of docs is required

20:59 <whitequark> lol

20:59 <ZirconiumX> whitequark: was the "gah" about the ALU thing?

21:00 <tpw_rules> also the logo needs way more visibility

21:01 <whitequark> ZirconiumX: did i say that

21:01 <whitequark> i'm confused

21:01 <tpw_rules> also i guess all the load instructions are word addressed?

21:01 <tpw_rules> i thought it was about stallman

21:01 <ZirconiumX> <whitequark> gaaah

21:02 <whitequark> ZirconiumX: about stallman

21:02 <whitequark> tpw_rules: yes

21:02 <ZirconiumX> Fair

21:03 <whitequark> that's maybe one part of the design i'm not sure about

21:03 <whitequark> i mean. cons: it makes porting C to Boneless painful

21:03 <whitequark> pros: it makes porting C to Boneless painful

21:03 <ZirconiumX> It's not like this is a unique thing to Boneless

21:03 <tpw_rules> cursed idea: i/o space?

21:04 <cr1901_modern> Without delay states, I can't boneless with the internal user ROM area on MachXO2 (yes, PR for support in nmigen coming soon)

21:04 <ZirconiumX> MIPS, and SPARC I think both require being word aligned

21:04 <tpw_rules> yes

21:04 <tpw_rules> many RISCs per se require it

21:04 <cr1901_modern> ZirconiumX: boneless currently can't store anything but 16 bit words right now

21:04 <cr1901_modern> or wq did you get rid of that

21:04 <ZirconiumX> Ah, I see

21:05 <whitequark> tpw_rules: ehhhhh

21:06 <cr1901_modern> tpw_rules: In practice most RISC CPUs eventually got unaligned stores and loads. There were just too many problems on real life packed formats to keep that constraint

21:06 <emily> whitequark: hey it's not your fault nobody can handle CHAR_BIT > 8

21:06 <tpw_rules> also of course there is no docs on how the register windows work

21:06 <tpw_rules> cr1901_modern: i mean yeah but to my understanding it was late, required OS intervention, slower, etc

21:07 <whitequark> tpw_rules: are you sure?

21:07 <whitequark> i mean

21:07 <whitequark> there's no explicit doc, but i believe all semantics is constrained

21:07 <whitequark> since each instruction specifies the exact function of W for it

21:07 <tpw_rules> yes

21:07 <whitequark> it could be better, certainly

21:07 <cr1901_modern> It made sense at the time, but doesn't anymore. Kinda like ARM's PC pointing 8 past the actual currently executing insn

21:07 <cr1901_modern> (or 4 in thumb you pedants)

21:08 <tpw_rules> oh, i guess you're using | to mean concatenate

21:08 <ZirconiumX> emily: that was a good laugh

21:08 <sorear> RISC is an art history term

21:08 <tpw_rules> that seems funky wrt the register window. how do you pass parameters?

21:08 <emily> ZirconiumX: Cray's crack team of lawyers is preparing their case against you as we speak.

21:08 <cr1901_modern> emily: I unironically want a new arch w/ CHAR_BIT % 8 != 0. Well, Clemency exists

21:08 <cr1901_modern> but no C compiler

21:09 <emily> CHAR_BIT is one of the things that makes a Turing-complete implementation of C impossible :'(

21:09 <emily> should clearly be removed

21:10 <tpw_rules> yeah i think either the docs for ADJW are wrong or mem[W|Ra] is wrong or ext13|imm3 is wrong

21:10 <cr1901_modern> ?

21:11 <tpw_rules> ADJW maintains that W is a multiple of 8, but it's concatenated with the register number.

21:11 <whitequark> hm

21:11 <ZirconiumX> emily: Cray? Surely you jest; Unisys will have their legal department ready

21:12 <tpw_rules> also should it be a multiple of 8? maybe you could have sliding registers like itanium for passing parameters

21:13 <whitequark> tpw_rules: re parameters: via LDW and loads with offset

21:13 <emily> i was reading Cray's wikipedia article recently and i was really amused at just how obsessed he was with making computers go fast

21:13 <emily> like he'd come out with the fastest computer in the universe, super successful, everyone is happy

21:13 <emily> and then immediately stomp his feet and go "ok now I want to make one TEN TIMES FASTER"

21:13 <cr1901_modern> yea he did maxwell's equations for individual wires

21:13 <tpw_rules> whitequark: oh, i missed that instruction

21:13 <cr1901_modern> to figure out prop delay and other fun shit

21:13 <emily> even when there was no commercial demand

21:13 <emily> and refuse to work on anything that anyone actually wants

21:13 <tpw_rules> still i wonder if multiples of 8 for the register window is unnecessarily constraining

21:13 <whitequark> tpw_rules: re docs for ADJW: yes, it's missing the part where the low bits of W are unimplemented

21:13 <emily> and also he just kept doing this until he died

21:13 <whitequark> needs to be fixed

21:14 <whitequark> tpw_rules: re sliding windows: it removes one adder from the implementation

21:14 <whitequark> I suspect it will have a significant impact on performance

21:14 <whitequark> but!

21:14 <whitequark> we can always relax it later.

21:14 <whitequark> that's why LDW is encoded like it is

21:15 <tpw_rules> okay i'm wondering how to convey that the low 3 bits of W are unimplemented

21:15 <tpw_rules> like maybe it should be W <- W + imm>>2

21:16 <tpw_rules> because if there are 3 unimplemented bits in W, mem[W|Ra] doesn't make sense

21:16 <tpw_rules> valid point on the sliding windows

21:20 <whitequark> tpw_rules: wait, why?

21:20 <whitequark> Ra is 3 bits long

21:20 <tpw_rules> how long is W?

21:20 <whitequark> 13 bits

21:20 <tpw_rules> okay, so adjw should say W <- W + imm>>2

21:20 <whitequark> why >>2?

21:21 <tpw_rules> because the imm always has the low 3 bits zero

21:21 <tpw_rules> oh i meant to write >>3

21:21 <tpw_rules> but still. if the imm must have the low 3 bits zero and W is only 13 bits long, then there's only 10 effective bits of W

21:23 <whitequark> oh I see

21:23 <whitequark> so | in W|Rb is logical OR

21:24 <whitequark> and the spec says the behavior is UNPREDICTABLE if you ever put anything into the low W bits

21:24 <whitequark> (the ones that can be unimplemented)

21:24 <tpw_rules> yeah so it's effectively 13

21:24 <whitequark> it's technically correct i think

21:24 <tpw_rules> okay that makes sense

21:24 <whitequark> but certainly confusing

21:24 <whitequark> it definitely needs an informative section

21:24 <tpw_rules> i interpreted | to mean concatenation because all over the place you say ext13|x
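
The OR versus concatenation confusion is easier to see with numbers. This sketch assumes 16-bit addresses, eight registers per window (so Ra is 3 bits), and W viewed as a full address whose low three bits are supposed to be zero; the function names are made up.

```python
def addr_or(w, ra):
    return w | ra                  # reading | as bitwise OR

def addr_add(w, ra):
    return (w + ra) & 0xFFFF       # what a sliding window would need (an adder)

def addr_concat(w, ra):
    return ((w >> 3) << 3) | ra    # {W[15:3], Ra}: reading | as concatenation

aligned = 0x0128    # low three bits zero: a legal window base
unaligned = 0x012B  # low three bits nonzero: UNPREDICTABLE per the log

same_when_aligned = all(
    addr_or(aligned, ra) == addr_add(aligned, ra) == addr_concat(aligned, ra)
    for ra in range(8)
)
print(same_when_aligned)
print(hex(addr_or(unaligned, 5)), hex(addr_add(unaligned, 5)),
      hex(addr_concat(unaligned, 5)))
```

For an aligned W all three readings agree, which is why the multiple-of-8 rule lets the implementation drop an adder; they only diverge once the low bits of W are nonzero.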

21:25 <whitequark> ah shit

21:25 <whitequark> you're totally right

21:25 <whitequark> we should fix it

21:25 <ZirconiumX> And this all started from me wanting to synthesise Boneless for 7400 logic

21:25 <ZirconiumX> My work here is done /s

21:25 <tpw_rules> and afair | is math for concatenation. at least ||

21:26 <whitequark> yeah

21:26 <whitequark> doc bug.

21:26 <emily> || for concatenation is also pretty confusing notation...

21:26 <whitequark> I use `or` for logical op I think

21:26 <whitequark> yeah

21:26 <emily> maybe do {verilog,style} or haskell ++ style

21:27 <sorear> can we get the rest of the ANSI SQL operators added

21:27 <whitequark> lol

21:27 <ZirconiumX> '); DROP TABLE instruction_set; --

21:31 <tpw_rules> i mean this began life as an excel doc, sql is only natural

21:32 <tpw_rules> anyway have you had a chance to apply any of your idea of what a portable assembler might look like to this?

21:34 <whitequark> not yet

21:34 <whitequark> need to improve more low-level parts of it first

21:34 <whitequark> the Fmax is ridiculously low

21:34 <emily> also help i didn't know about potato semiconductor

21:35 <emily> "Why called Potatosemi as Brand?

21:35 <emily> We are the IC design house making chips. Potato chips are the most popular chips in the world. They are high volume, low price & taste good. All of the people like to eat them. All of the people are happy with them. This is exactly our goals. We will like to make our chips as popular as potato chips, as high volume as potato chips, as low price as potato chips. All of the computers & electronics devices like our chips'

21:35 <emily> taste. All of the people like to use them because they are easy to use & all of the people are happy with potato chips."

21:35 <whitequark> amazing

21:35 <tpw_rules> there's that but they only produce bonkers shit.

21:36 <emily> yes i also saw the pentium 4-ass clock ttl logic

21:36 <whitequark> timecube logic

21:36 <tpw_rules> the 4 sides of the pentium

21:37 <tpw_rules> (but it should have 5??)

21:37 <Ultrasauce> who called it qdr and not 4 corners simultaneous time clock

21:38 <tpw_rules> last q for now: how many clocks per instruction is this thing? there's a lot of memory access

21:40 <whitequark> 4 cpi

21:40 <whitequark> uh

21:40 <whitequark> for most instructions

21:40 <whitequark> shifts are 4+n, complex jumps are 5 (i think)
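
Taking the numbers quoted above at face value (4 cycles for most instructions, 4+n for a shift by n, 5 for complex jumps, the last one marked "i think"), a back-of-envelope cycle estimate could look like this. The instruction kinds and the example mix are hypothetical.

```python
def cycles(kind, shift_amount=0):
    """Cycle count per instruction, per the figures quoted in the log."""
    if kind == "shift":
        return 4 + shift_amount
    if kind == "complex_jump":
        return 5  # quoted with an "i think", so treat as approximate
    return 4

# Made-up mix: ten ordinary instructions, one shift by 3, one complex jump.
program = [("alu", 0)] * 10 + [("shift", 3), ("complex_jump", 0)]
total = sum(cycles(kind, n) for kind, n in program)
print(total)  # 40 + 7 + 5 = 52
```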

21:42 <tpw_rules> btw you use | as concatenate in the shifts too

21:42 <whitequark> yeah

21:42 <whitequark> I forgot

21:42 <whitequark> sorry

21:43 <tpw_rules> which don't appear to have their operands specified correctly anyway? as written it's not possible to shift left by 1, only 2 or greater

21:44 <tpw_rules> i feel rude commenting all these doc problems but docs are important to me. please let me know if i can fix them

21:44 <whitequark> tpw_rules: sure, just send a PR

21:44 <whitequark> by no means i feel a lot of attachment to these docs

21:44 <tpw_rules> okay. i'll have to brush up on my tex :P

21:44 <whitequark> i try to not identify with my designs or code or doc too much :p

21:45 <tpw_rules> i did recently get an icebreaker in the mail. i need to throw boneless on there

21:47 <tpw_rules> so i gather you're using roughly Verilog syntax in the Operation parts

21:47 pie_ has joined ##openfpga

21:47 <sorear> left shift by 1 is redundant if you have addition, though

21:48 <tpw_rules> it affects shift right too

21:48 <whitequark> let me see

21:49 <whitequark> can you explain why a shift left by 1 isn't possible?

21:50 <tpw_rules> according to the decode_sr table in the excel, imm3 is decoded to 1-8. which makes sense since shift by zero is a nop. but then you add 1 to imm3 before shifting, so you can only shift 2-9

21:50 <tpw_rules> you would have to encode it as an EXTI

21:50 <tpw_rules> to get 1

21:51 <whitequark> uh, where do I add 1 to imm3?

21:51 <tpw_rules> (also imm3 should be opB on the res <- line anyway)

21:51 <whitequark> oh shit

21:51 <tpw_rules> in the operation of SLLI on page 58

21:52 <whitequark> I forgot to update that part

21:52 <whitequark> it's trying to do what decode_imm_sr is already doing

21:52 <tpw_rules> exactly

21:52 <whitequark> because imm3=0 is actually shift by 8

21:52 <whitequark> aka byte swap
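
The doc bug amounts to applying the same adjustment twice. A minimal sketch, assuming the decode works as described in this exchange (field value 0 means a shift by 8, values 1 to 7 mean themselves); `decode_imm_sr` here is a stand-in written from that description, not the real table.

```python
def decode_imm_sr(imm3):
    """Shift amount for a 3-bit field: 0 encodes 8, 1..7 encode themselves."""
    return 8 if imm3 == 0 else imm3

# What the decoder alone yields for all eight encodings.
decoded = sorted(decode_imm_sr(i) for i in range(8))

# What the SLLI operation text describes: decode, then add 1 again.
double_adjusted = sorted(decode_imm_sr(i) + 1 for i in range(8))

print(decoded)          # [1, 2, 3, 4, 5, 6, 7, 8]
print(double_adjusted)  # [2, 3, 4, 5, 6, 7, 8, 9]
```

With the extra `+ 1` a shift by 1 is unreachable from the short encoding, which is the problem tpw_rules points out.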

21:54 <tpw_rules> you're aping 6502 again: no ROR :P (yes i know you can do it with ROL)

21:57 <tpw_rules> anyway. it looks to me like you're using vaguely verilog syntax in the docs. i'll rewrite it all to fix these mistakes and maybe enhance some of the other parts and submit a PR. sound good?

21:57 <whitequark> absolutely

21:57 <whitequark> something that would help is an explicit section defining the syntax

21:57 <whitequark> I was planning that but only got around to the bare minimum

21:57 <whitequark> define_imm_* should be described using tables as well

21:58 <tpw_rules> like that it's rd, ra, rb? what about labels and stuff? i am particular and have hacked several assemblers to have my label style :P

21:58 <whitequark> nonono

21:58 <whitequark> the syntax for the stuff in "Operation"

21:58 <tpw_rules> oh okay

21:58 <tpw_rules> yeah sure

21:58 <whitequark> assembler is separate

21:58 <whitequark> and yeah the syntax is super simple

21:58 <whitequark> the labels are "label:" :p

21:59 <tpw_rules> the true distinguisher: what is the comment character

21:59 <whitequark> ; i think

21:59 <tpw_rules> maybe i can throw like 16 of these on my icebreaker and have some fun with the led matrices

21:59 <tpw_rules> approved

21:59 <whitequark> :D

21:59 <whitequark> the one problem is

21:59 <whitequark> it doesn't have any directives right now

21:59 <whitequark> well

21:59 <tpw_rules> i like + and - labels too, and @local labels

21:59 <whitequark> there's .word

21:59 <tpw_rules> . is my preferred directive character

22:00 <whitequark> it needs more directives

22:00 <tpw_rules> .work

22:00 <whitequark> I just couldn't find enough time to really add them properly

22:00 <whitequark> will try soon

22:00 <whitequark> oh btw

22:00 <whitequark> if you could open an issue and suggest a principled set of directives i'm all ears.

22:00 <whitequark> we need at least uhmm

22:00 <tpw_rules> for the assembler? yeah i can think of my favs

22:01 <whitequark> a directive for the jump tables

22:01 <whitequark> cuz it changes how the addresses are calculated

22:01 <tpw_rules> i assume you're going to be grossed out at people doing more than like 200 lines

22:01 <whitequark> not really, zignig seems to really enjoy writing boneless assembly so why not

22:01 <whitequark> i mean

22:01 <whitequark> it used to not have a text assembler

22:01 <tpw_rules> why does it? is it just easier to process? there's no optimization like on arm where you can encode the offsets in bytes

22:01 <whitequark> but i looked at how much fun zignig was having and decided to add one cuz why not

22:01 <tpw_rules> i don't understand why JVT is relative

22:02 <tpw_rules> to the table

22:02 <whitequark> because all boneless code is PC-relative

22:02 <whitequark> it's inherently relocatable

22:02 <whitequark> at no additional instruction cost

22:02 <whitequark> this has created a major implementation nightmare with the orthogonal instruction set

22:02 <tpw_rules> but it's relative to the table, not the PC

22:02 <whitequark> but i managed it

22:02 <whitequark> er

22:02 <whitequark> yes

22:02 <tpw_rules> i guess that makes sense

22:02 <whitequark> because it's a vtable instruction

22:02 <tpw_rules> yeah i didn't think enough there

22:02 <whitequark> now J*S*T is relative to PC

22:03 <whitequark> ironically, JST is harder to implement than JVT

22:03 pie_ has quit [Ping timeout: 250 seconds]

22:04 <tpw_rules> other thought: is it possible to randomly permute the encodings and see if they save resources

22:05 <whitequark> it would be very easy to do so bc the entire thing is generated from exactly 1 source of truth

22:05 <whitequark> arch.opcode

22:05 <whitequark> i already did permute them to hopefully simplify the decoder a bit

22:06 <tpw_rules> oh ok

22:06 <whitequark> but not extensively

22:06 * tpw_rules only likes optimizations when they take hours and only net like a 1% improvement

22:06 <whitequark> lol

22:06 <whitequark> there's some low hanging fruit there still

22:06 <whitequark> for example, the unencoded instructions could be added to the decoder such that they reuse other paths

22:07 <whitequark> well

22:07 <whitequark> you could also use 'x...

22:07 <whitequark> ... but i don't like it.

22:07 <tpw_rules> are there any facilities for interrupts?

22:08 <whitequark> not currently :D

22:08 <whitequark> i never figured out what to do with the flags

22:08 <tpw_rules> i wonder if you could have a port on it that like loads the high 8 bits of pc and W simultaneously

22:08 <whitequark> or the return pc

22:08 <tpw_rules> oh yeah that would be weird too

22:08 <whitequark> no, you don't need to care about W

22:08 <tpw_rules> but if the interrupt manager loaded W, you could just stick them in r0 and r1

22:08 <whitequark> oh wait

22:09 <whitequark> i think i know what can be done

22:09 <tpw_rules> of the interrupting window

22:09 <whitequark> yeah

22:09 <whitequark> we could make it push the window

22:09 <whitequark> there's no way to restore the flags tho

22:09 <tpw_rules> flagless arch :D

22:09 <whitequark> lol

22:09 <tpw_rules> exti already stores weird state between insns and anything that doesn't write to the flags destroys them

22:09 <tpw_rules> (and that involves the ALU)

22:10 <tpw_rules> hm that would be gross though

22:10 <whitequark> oh shit

22:10 <whitequark> what happens if an interrupt arrives during exti

22:10 <tpw_rules> make it not?

22:10 <whitequark> variable interrupt latency is gross

22:11 <whitequark> i think exti should be restartable

22:11 <tpw_rules> your instructions have variable cycle counts

22:11 <whitequark> shit

22:11 <whitequark> point

22:11 <tpw_rules> and exti is like 1 cycle anyway

22:11 <whitequark> yeah, it should be short-circuited in the FSM but currently isn't

22:20 <TD-Linux> lol this is the first time I actually looked at potato semi's catalog. amazing

22:21 <tpw_rules> yeah i can't see a nice place to fit in a restore flags instruction. but i think having an interrupt push the window and store old state like pc and flags in registers is okay. it might be a lot of memory writes though

22:22 <whitequark> tpw_rules: tbh i was thinking of having an interrupt controller as a peripheral

22:22 <whitequark> (that peeks into CPU state)

22:23 <whitequark> but... i don't know

22:23 <tpw_rules> yeah but i imagine it would be nice to have a port that's like 16 bits of new pc and a bit to cause a transfer to it

22:23 <tpw_rules> that an interrupt controller can hook into. or a simple system can assert a constant value on pc and flip the interrupt bit to activate it

22:24 <whitequark> or just make it a soft reset

22:24 Asu` has joined ##openfpga

22:24 <whitequark> you can make an interrupt table with 2 instructions

22:24 Asu has quit [Ping timeout: 248 seconds]

22:24 <whitequark> LDX, JST

22:25 <tpw_rules> you still lose flags and EXTI state though

22:25 <whitequark> hence soft

22:25 <whitequark> so it will activate only in FETCH stage without EXTI active

22:25 <tpw_rules> arm cortex has the weird hack of having a magic return value that unstacks the exception stack frame

22:25 <whitequark> yes

22:25 <whitequark> i kinda want to make that

22:25 <whitequark> it might make sense to do something like um

22:26 <tpw_rules> i don't understand what you mean by soft reset as compared to that idea i had

22:26 <whitequark> i mean

22:26 <whitequark> reset just for the PC

22:26 <whitequark> mhm

22:26 <whitequark> actually

22:26 <whitequark> tpw_rules: so here's what i'm thinking

22:27 <whitequark> maybe an interrupt should hijack the decoder logic to drive LDW onto the internal buses and then a JALR

22:27 <whitequark> this gives you PC

22:27 <whitequark> not sure about flags :/

22:27 <whitequark> as for "restore flags" instruction

22:27 <emily> 15:20 <TD-Linux> lol this is the first time I actually looked at potato semi's catalog. amazing

22:27 <emily> like seriously is this some kind of practical joke or

22:27 <tpw_rules> i think a magic return value would have a high cost

22:27 <whitequark> we can reserve the encoding of x: JALR Rn, x which is basically totally useless

22:28 <whitequark> to mean RETI

22:28 <whitequark> and the Rn could be repurposed to mean the actual flags

22:28 <whitequark> so an interrupt handler would use self-modifying code

22:28 <tpw_rules> what is x

22:28 <whitequark> label

22:28 <tpw_rules> also you don't have a jalr instruction

22:28 <whitequark> tpw_rules: oh wait

22:28 <whitequark> JRAL

22:28 <whitequark> same difference

22:29 <whitequark> oh

22:29 <whitequark> wait

22:29 <whitequark> i meant JAL here

22:29 <whitequark> oh that has an assembly bug

22:29 <whitequark> missing Rd

22:29 <tpw_rules> oh i missed the label, that makes sense

22:30 <tpw_rules> so you just wrote a funky infinite loop

22:30 <tpw_rules> okay

22:30 <whitequark> tpw_rules: we also have a "JN" instruction

22:30 <whitequark> which can probably be repurposed if necessary

22:30 <whitequark> but it has the wrong encoding for this

22:31 <tpw_rules> also you have four flag bits so you can't encode them in the register number

22:31 <whitequark> yes, just realized

22:31 <tpw_rules> what's wrong with xchf though

22:31 <tpw_rules> can that be easily encoded?

22:32 <whitequark> xchf Rd.

22:32 <whitequark> I ... guess?

22:32 <tpw_rules> i think that could be a nice general purpose instruction too.

22:32 <whitequark> it'd need to be stuffed into some obscure corner

22:33 <tpw_rules> wouldn't that make it more complex to implement

22:34 _whitenotifier has joined ##openfpga

22:34 <_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 2 commits to master [+0/-0/±5] https://git.io/fjNhn

22:34 <_whitenotifier> [whitequark/Boneless-CPU] whitequark 0fe269b - gateware.core: remove debug statement.

22:34 <_whitenotifier> [whitequark/Boneless-CPU] whitequark d2dbf91 - doc: fix some assembly mismatches.

22:34 <whitequark> tpw_rules: well

22:35 <whitequark> i like conserving opcode space?

22:35 <whitequark> right now it's like, super dense.

22:36 <tpw_rules> maybe it can be right after SRAI. idk

22:36 <whitequark> that part is where MULDIV should live

22:36 <tpw_rules> ok

22:36 <whitequark> probably UMUL SMUL UDIV SDIV

22:37 <whitequark> and there's still four slots left

22:37 <tpw_rules> what about upper mul

22:37 <tpw_rules> and mod

22:37 <whitequark> register pairs

22:37 <tpw_rules> ok

22:37 <tpw_rules> addi and subi are almost symmetric

22:38 <tpw_rules> you could replace one of them

22:38 <whitequark> nop

22:38 <tpw_rules> ?

22:38 <whitequark> different behavior wrt imm3

22:38 AndrevS has quit [Remote host closed the connection]

22:38 <tpw_rules> that's why i said almost

22:38 <whitequark> for one

22:38 <whitequark> hm

22:39 <whitequark> also it'd make the encoder more complex for no very good reason

22:39 <whitequark> decoder*

22:39 <tpw_rules> i wondered if it would make it simpler because xchgf would be intimately connected with the ALU

22:39 <tpw_rules> they would be symmetric flags wise if you used carry as it should be used :P

22:40 <whitequark> oh?

22:40 <whitequark> what's wrong with my carry?

22:41 <tpw_rules> i'm partial to the 6502 way where C is 1 if +1 for add, but 0 if -1 for subtract

22:41 <whitequark> mhm

22:41 <tpw_rules> anyway that's another very minor holy war. but maybe it could save an xor or two

22:41 <whitequark> i'm open to it

22:41 <whitequark> i never really thought about it

22:41 <whitequark> you can open an issue so it doesn't get lost

22:44 <_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 1 commit to master [+0/-0/±3] https://git.io/fjNhz

22:44 <_whitenotifier> [whitequark/Boneless-CPU] whitequark 8ba7240 - doc: fix some assembly mismatches.

22:45 <_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 1 commit to master [+0/-0/±3] https://git.io/fjNhg

22:45 <_whitenotifier> [whitequark/Boneless-CPU] whitequark c6f00b2 - doc: fix some assembly mismatches.

22:45 <tpw_rules> hm maybe it's a wash

22:45 <whitequark> oh?

22:46 <tpw_rules> or the operation is incorrect again

22:47 <whitequark> ;w;

22:48 <tpw_rules> i'm consulting my own 6502 emulator

22:50 <tpw_rules> you've proved that all the instructions work correctly?

22:51 <whitequark> no, only testcases for now

22:51 <whitequark> the riscv-formal approach would not work

22:51 <whitequark> i did figure out an approach that will work

22:51 <whitequark> but haven't yet been able to implement it

22:52 <whitequark> basically, recording an execution trace

22:52 <whitequark> and looking up values in it

22:54 <tpw_rules> okay

22:55 <tpw_rules> yeah i'm not extra sure you got carry right

22:55 <whitequark> but there's functional tests for it?

22:56 <tpw_rules> then the operation might be wrong

22:56 <whitequark> yes, could be

22:56 <tpw_rules> or i misinterpreted the whole thing

22:57 <whitequark> sounds like the doc is broken either way.

22:57 <whitequark> even if it's technically correct.

22:57 <whitequark> which i'm not sure of.

22:57 uovo has joined ##openfpga

22:57 <whitequark> hm

22:57 <tpw_rules> okay riddle me this: suppose R0 is 1 and C is set. what's R1 and C after SBBI R1, R0, 1

22:57 <whitequark> so the improved ALU synthesizes to... more LUTs

22:58 <whitequark> uhhh let's see

22:59 <whitequark> that translates to R1=R0+~1+C. so. R1=0x0001+0xffff+1. so R1=0x(1)0001. C=1 R1=0x0001

22:59 <whitequark> in theory.

22:59 <whitequark> let me actually run it

23:00 <_whitenotifier> [Boneless-CPU] whitequark created branch alsru2 - https://git.io/fhUTh

23:00 <_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 1 commit to alsru2 [+0/-0/±1] https://git.io/fjNh5

23:00 <_whitenotifier> [whitequark/Boneless-CPU] whitequark 1f7e2fc - WIP ALSRU

23:02 <whitequark> tpw_rules: hm. that doesn't match reality, oops

23:03 <whitequark> like at all

23:03 <tpw_rules> what did you get?

23:03 <whitequark> R1=0 F=ZC

23:03 <tpw_rules> that's 6502 convention
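
The SBBI riddle can be checked with a small 16-bit model, assuming the R1 = R0 + ~imm + C formula quoted in the log. On 16 bits ~1 is 0xFFFE, so the sum is exactly 0x10000, which agrees with the observed R1=0 F=ZC. The `sbb` helper is a hypothetical model, not the actual ALU code:

```python
MASK = 0xFFFF


def sbb(a, b, carry_in):
    """Subtract with borrow modelled as a + ~b + C on 16 bits."""
    total = a + (~b & MASK) + carry_in
    result = total & MASK
    carry_out = total >> 16    # bit 16 of the sum, not inverted
    zero = int(result == 0)
    return result, carry_out, zero


# R0 = 1, C = 1, immediate = 1
print(sbb(1, 1, 1))  # (0, 1, 1)
```

Under this convention C=1 after a subtraction means "no borrow".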

23:03 <tpw_rules> C should just be set to bit 16 of the result, not its inverse

23:05 <tpw_rules> wait hm

23:06 <whitequark> tpw_rules: something i just realized is... maybe i shouldn't have like

23:06 <whitequark> separate CI/CO SI/SO

23:07 <tpw_rules> why? they're different things

23:07 emeb_mac has joined ##openfpga

23:07 <whitequark> fewer LUTs

23:07 <whitequark> or... maybe not?

23:07 <whitequark> lemme check

23:07 <tpw_rules> i mean knowing the high bit is nice

23:08 <tpw_rules> but you yourself said you can test that easily

23:08 <whitequark> oh no not in the ISA

23:08 <whitequark> in the impl

23:11 <whitequark> WTF

23:12 <whitequark> i made it simpler and now it's way more luts

23:12 <whitequark> fucking piece of shit synthesizer

23:12 <whitequark> grrrrr

23:12 <whitequark> i should just like. add icecube support to nmigen

23:12 <whitequark> i resent abc so much

23:13 <_whitenotifier> [Boneless-CPU] whitequark created branch merge-ci-si - https://git.io/fhUTh

23:13 <_whitenotifier> [whitequark/Boneless-CPU] whitequark pushed 1 commit to merge-ci-si [+0/-0/±3] https://git.io/fjNjU

23:13 <_whitenotifier> [whitequark/Boneless-CPU] whitequark 0253454 - WIP merge CI+SI in ALSRU

23:13 Asu` has quit [Quit: Konversation terminated!]

23:15 <tpw_rules> how does nmigen Cat work? is it low bits first?

23:16 <whitequark> tpw_rules: nmigen signals work like Python arrays

23:16 <whitequark> Cat(x,y,z) is basically [*x,*y,*z]

23:16 <tpw_rules> ok

23:16 <tpw_rules> opposite to verilog

23:16 <whitequark> yep
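
A pure-Python sketch of the ordering described above, modelling a signal as a list of bits with the least significant bit first. The helpers `to_bits`, `from_bits` and `cat` are hypothetical stand-ins and do not use nmigen itself:

```python
def to_bits(value, width):
    """Split an integer into a list of bits, least significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an integer from an LSB-first list of bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def cat(*parts):
    """Model of Cat(x, y, z) as [*x, *y, *z]: the first argument is the low bits."""
    out = []
    for part in parts:
        out.extend(part)
    return out


low = to_bits(0xA, 4)
high = to_bits(0xB, 4)
print(hex(from_bits(cat(low, high))))  # 0xba
```

In Verilog `{4'hA, 4'hB}` is 8'hAB, so the argument order is reversed. Under this model `Cat(p, self.co)` places `co` in bit 16, directly above a 16-bit `p`.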

23:18 <tpw_rules> then i'm fairly certain the subtraction is incorrect

23:18 <tpw_rules> m.d.comb += Cat(p, self.co).eq(x + y + self.ci)

23:19 <tpw_rules> you have to invert carry on either the way in, or the way out, and i don't see it being done in either place

23:19 <tpw_rules> https://en.wikipedia.org/wiki/Carry_flag

23:22 <sorear> 6502 doesn’t invert in or out

23:23 <sorear> litmus test: can you do a 32b add/sub without manipulating flags in the middle

23:23 <tpw_rules> oh that is true.

23:24 <tpw_rules> sorry, the gateware is correct. it uses the 6502 convention. but operation is ot

23:24 <tpw_rules> not
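
sorear's litmus test can be sketched as follows, assuming 16-bit words and the convention where subtraction is a + ~b + C with no inversion of the carry on the way in or out. The helper names are hypothetical:

```python
import random

MASK = 0xFFFF


def adc(a, b, carry):
    """16-bit add with carry; returns (result, carry_out)."""
    total = a + b + carry
    return total & MASK, total >> 16


def sbc(a, b, carry):
    """16-bit subtract as a + ~b + C, where C=1 means no borrow."""
    return adc(a, ~b & MASK, carry)


def add32(a, b):
    lo, c = adc(a & MASK, b & MASK, 0)   # start with carry clear
    hi, _ = adc(a >> 16, b >> 16, c)     # carry passes through untouched
    return (hi << 16) | lo


def sub32(a, b):
    lo, c = sbc(a & MASK, b & MASK, 1)   # start with carry set (no borrow)
    hi, _ = sbc(a >> 16, b >> 16, c)     # carry passes through untouched
    return (hi << 16) | lo


# Compare against plain 32-bit arithmetic on random operands.
for _ in range(1000):
    x, y = random.getrandbits(32), random.getrandbits(32)
    assert add32(x, y) == (x + y) & 0xFFFFFFFF
    assert sub32(x, y) == (x - y) & 0xFFFFFFFF
```

The carry from the low word feeds the high word directly in both directions, which is the property the litmus test asks for.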

23:25 <tpw_rules> i'm not going to touch overflow but it should probably be touched too :P

23:25 <sorear> hmm, do you have enough non-flag-clobbering data moves to implement bignum addition with a loop

23:25 <tpw_rules> no, mov destroys C and V

23:26 <whitequark> shit

23:26 <tpw_rules> you could store to memory :P

23:27 <whitequark> we could define ADD to preserve CV

23:27 <whitequark> hm

23:27 <whitequark> er

23:27 <whitequark> AND*

23:27 <tpw_rules> i think that's what i said to start this whole conversation :P

23:27 <whitequark> i mean

23:28 <whitequark> i didn't realize it'd break bignums.

23:28 <whitequark> breaking bignums is very bad.

23:28 <whitequark> wait

23:28 <whitequark> that won't help

23:28 <whitequark> because you have to SUB to check the loop condition

23:28 <whitequark> unless you have like a duff's device

23:29 <tpw_rules> or xchf

23:29 <tpw_rules> anyway what happens if you're at 0xFFFF and J +1. is that UNPREDICTABLE? same with the relative loads

23:32 <tpw_rules> i'm going to define it as that

23:35 <whitequark> tpw_rules: defined to wrap
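
A minimal sketch of wrapping address arithmetic, assuming a 16-bit address space. Whether the real base is the PC or the address of the following instruction is not stated in the log, so `jump_target` is a hypothetical helper that takes the base as given:

```python
ADDR_MASK = 0xFFFF


def jump_target(base, offset):
    """Add a signed offset to a 16-bit base address, wrapping modulo 2**16."""
    return (base + offset) & ADDR_MASK


print(hex(jump_target(0xFFFF, 1)))   # 0x0
print(hex(jump_target(0x0000, -1)))  # 0xffff
```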

23:35 <whitequark> well

23:35 <whitequark> hm

23:35 <whitequark> tpw_rules: the thing is that boneless is really not suited for banking

23:35 <whitequark> so i don't see how you could ever extend it to be like 20-bit or something

23:35 <tpw_rules> that's never stopped anybody before

23:35 <whitequark> no i mean you can't put a window there

23:35 <whitequark> you can't JAL from there

23:36 <whitequark> like

23:36 <whitequark> you can bank the top half of address space for example

23:36 <whitequark> but you can't *extend* the address width

23:36 <tpw_rules> ok

23:36 <whitequark> it's *too* orthogonal for this, i can't imagine any remotely usable way to do it

23:36 <whitequark> even with hax

23:36 <tpw_rules> EXTA :D

23:37 <whitequark> won't work

23:37 <whitequark> you can't address those things without like

23:37 <mwk> clearly you need to introduce segment registers

23:37 <whitequark> far pointers??

23:37 * whitequark stabs mwk

23:37 <tpw_rules> that's what i mean

23:37 <tpw_rules> exta can take an immediate, or a register

23:37 <mwk> ow ow ow

23:37 <whitequark> tpw_rules: wtf

23:37 <whitequark> that's horrible

23:37 <tpw_rules> anyway yes that would be stupid

23:37 <whitequark> just use risc-v or something

23:38 <whitequark> hell, i'm not sure who would even instantiate all 64K of RAM for boneless

23:38 <whitequark> it's not a 8051 where you need fifteen instructions to do anything

23:39 <tpw_rules> sell it as a programmable state machine

23:39 <whitequark> i mean yes?

23:39 <whitequark> that's literally its origin story

23:39 <whitequark> it's modeled after KCPSM too

23:39 <whitequark> among other things

23:39 pie_ has joined ##openfpga

23:51 <tpw_rules> whitequark: https://i.imgur.com/MbwCZ3p.png does this sound good

23:52 <sorear> ya don’t like page registers?

23:52 <tpw_rules> in vague terms of formatting

23:52 <whitequark> yeah seems reasonable

23:53 <whitequark> i'll just edit it later if i hate it, but it's good to get the ball rolling regardless

23:54 <whitequark> tpw_rules: thank you for helping by the way, i'm pretty overloaded lately

23:55 <whitequark> ... lately meaning the last five years ...

23:55 <tpw_rules> you're welcome. i'm glad i'm able to improve it instead of whining

23:55 <tpw_rules> it's ok i start graduate school on monday

23:55 <whitequark> oh no / oh yes

23:55 <whitequark> should i not ping you after that? :p

23:56 <tpw_rules> lol almost nothing will change

23:56 <whitequark> ah

23:56 <tpw_rules> just what budget pays me to work in the lab

23:56 <tpw_rules> and i'll have to write a paper ;_;

23:56 * tpw_rules i'm assuming the external bus is 16 bits data and address too

23:56 <tpw_rules> s/\/me//

23:57 <whitequark> yes

23:57 <whitequark> in fact the address bus is shared even

23:58 <whitequark> so one thing i want to do is to fit it on LP384

23:58 <whitequark> that's... challenging

23:58 <sorear> Do you have a specific external memory in mind

23:59 <whitequark> sorear: it'd be like

23:59 <whitequark> a 6502-style "bare CPU" package

23:59 <whitequark> that comes on a DIP-like board

23:59 <whitequark> basically a toy but fun