#nmigen on 2020-04-22 — irc logs at freenode.irclog.whitequark.org

2020-01-27 18:31 ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at https://github.com/nmigen · logs at https://freenode.irclog.whitequark.org/nmigen

00:11 Degi_ has joined #nmigen

00:13 Degi has quit [Ping timeout: 240 seconds]

00:13 Degi_ is now known as Degi

00:53 danfoster has joined #nmigen

01:23 danfoster has quit [Remote host closed the connection]

01:31 <whitequark> awygle: https://github.com/vmware/cascade

01:32 <awygle> That's... Pretty awesome

01:34 <TD-Linux> >supports ulx3s

01:34 <awygle> thank you twitter

01:35 <TD-Linux> neat

01:35 <awygle> "the code is moved into hardware" is an interesting description

01:36 <awygle> i get what they're saying but still

01:36 <awygle> it looks like you might be able to slot cxxrtl into place in this flow where Verilator currently lives?

01:36 <awygle> i guess that wouldn't be hugely useful tho

01:39 <whitequark> why not?

01:40 <awygle> in the context of nmigen it seems like what you'd want is an nmigen frontend, not a cxxrtl middle-end

01:40 <whitequark> ohh, i misunderstood what you wanted

01:44 <awygle> what did you think i wanted? i might want that too :p

01:45 <awygle> hm i'd never heard of "ujprog" either

01:46 <whitequark> awygle: using cxxrtl as a jit backend

01:46 <whitequark> given that it *almost* supports proper separate compilation, not inconceivable

01:46 <awygle> oh, yes, sure

01:46 <awygle> we should do that lol

01:47 <awygle> the more i read this the more interesting cascade is

01:47 <awygle> >> Cascade ... can target [the ULX3S'] reprogrammable fabric to improve virtual clock frequency for most applications.

01:48 futarisIRCcloud has joined #nmigen

01:49 <whitequark> at which point does it stop being a JIT compiler and become a synthesizer with an ILA?

01:49 <whitequark> i don't quite get it

01:50 <awygle> i _think_ they mean "jit compiler" as "to a bitstream"

01:50 <awygle> i also misunderstood at first

01:50 <whitequark> hrm

01:50 <awygle> but then why verilator

01:50 <sorear> I think they mean it in the sense of "tiered compilation"

01:50 <awygle> *confused*

01:50 <whitequark> okay that *is* interesting

01:51 <whitequark> but also confusing

01:51 <sorear> using a SW sim as tier 1, and PnR as tier 2

01:51 <awygle> actually i think it's

01:51 <awygle> pure SW sim -> verilated compiled -> PnR

01:51 <whitequark> right so they have deopt support, right?

01:51 <whitequark> that's a lot of fun

01:52 <sorear> except that there's no profiling and no reasonable way to split the design anyway, so you just migrate the whole thing when the background compile finishes

01:52 <awygle> "deopt"?

01:52 <whitequark> deoptimization

01:52 <sorear> yes, if I'm reading the readme right they deopt for $printf etc

02:01 <awygle> the "virtualization tasks" section looks like it would be quite nice for my interactive simulator dream

02:02 <awygle> like, it's not quite that, but it's quite similar

02:13 Stary has quit [Ping timeout: 246 seconds]

02:22 Stary has joined #nmigen

02:32 <TD-Linux> awygle, ujprog is the tool used to program the ulx3s via the ft2232h on board

02:33 <TD-Linux> (it is somewhat difficult to make it work correctly also)

02:37 <awygle> interesting

02:40 <awygle> Why are all jtag api things bad

02:42 <whitequark> awygle we have an entire channel literally dedicated to that

02:54 <awygle> ... we do?

02:56 <whitequark> #glasgow ;p

02:56 <awygle> ah :p

02:56 <awygle> i thought you might mean that

02:56 <awygle> different problem tho no?

02:58 <whitequark> eh

02:58 <whitequark> not very serious here

02:58 <awygle> mhm

03:11 <awygle> oh speaking of

03:12 <awygle> i saw a comment on the glasgow (?) issue tracker that said you weren't interested in using libjtaghal and that the glasgow native support was strictly superior (i think)

03:12 <awygle> i was curious about that

03:13 <awygle> libjtaghal seems excessively complex to me, but i am interested in what you found objectionable (or if you did)

03:13 <whitequark> awygle: mh, i might have worded that poorly

03:14 <whitequark> there were a few realizations mixed up there

03:14 <whitequark> first, it turns out i did not really need libjtaghal for... well, jtag. at the time i did not understand jtag very well. i do now. it is beautiful and not really hard to use

03:14 <whitequark> second, interacting with glasgow from foreign c++ code is hard because glasgow, the USB device, doesn't (yet?) have a "stable ABI"

03:15 <TD-Linux> I mean, I use the glasgow as my ecp5 jtag adapter of choice...

03:15 <whitequark> third, it turned out that heavy vertical integration in glasgow gives almost exponential benefits

03:16 <awygle> i see

03:20 <TD-Linux> I actually find ice40 spi flashing more obnoxious because you have to hold reset and not all spi programmers support doing that

04:21 * cr1901_modern has a use for stable glasgow USB interface in the mid-future (few months from now?)

05:22 ____ has joined #nmigen

06:06 _whitelogger has joined #nmigen

06:29 XgF has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

06:32 Vinalon has joined #nmigen

06:43 thinknok has joined #nmigen

08:10 thinknok has quit [Ping timeout: 272 seconds]

08:15 chipmuenk has joined #nmigen

08:28 Asu has joined #nmigen

08:46 thinknok has joined #nmigen

09:21 ____2 has joined #nmigen

09:23 ____ has quit [Ping timeout: 256 seconds]

09:40 <Sarayan> How insanely big would the nmigen code for an fpu be? And would it map to fpga hardware?

09:40 <Sarayan> I'm thinking about the feasibility of something like a NeXT on mister

09:56 futarisIRCcloud has quit [Quit: Connection closed for inactivity]

10:31 Ekho has quit [Quit: An alternate universe was just created where I didn't leave. But here, I left you. I'm sorry.]

10:36 Vinalon has quit [Ping timeout: 264 seconds]

10:57 Ekho has joined #nmigen

11:18 <_whitenotifier-9> [nmigen] hofstee opened issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkM3

11:20 <whitequark> Sarayan: shouldn't be that huge

11:20 <whitequark> i mean, depends on what kind of fpu, but they aren't inherently massive

11:20 <_whitenotifier-9> [nmigen] whitequark commented on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkMZ

11:21 <_whitenotifier-9> [nmigen] whitequark commented on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkMc

11:22 <_whitenotifier-9> [nmigen] whitequark edited a comment on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkMc

11:23 <_whitenotifier-9> [nmigen] hofstee opened pull request #364: Fix `_yosys_version()` - https://git.io/JfkMl

11:35 <_whitenotifier-9> [nmigen] hofstee commented on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkMM

11:37 <Sarayan> wq: 68040 or so, so trying a perfect cycle-exact simulation, just an equivalent behaviour, to make a full-speed NeXT for instance

11:37 <_whitenotifier-9> [nmigen] hofstee edited a comment on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkMM

11:37 <Sarayan> s/so trying/not trying/

11:37 <Sarayan> I think it has standard fp ops, possibly dropped the functions, for fp32, 64 and 80 iirc

11:38 <_whitenotifier-9> [nmigen] whitequark commented on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkMS

11:39 <_whitenotifier-9> [nmigen] codecov[bot] commented on pull request #364: Fix `_yosys_version()` - https://git.io/JfkMQ

11:39 <_whitenotifier-9> [nmigen] codecov[bot] edited a comment on pull request #364: Fix `_yosys_version()` - https://git.io/JfkMQ

11:40 <whitequark> Sarayan: that has "68000 gates", right? or more like 2/3 that amount

11:40 <_whitenotifier-9> [nmigen] codecov[bot] edited a comment on pull request #364: Fix `_yosys_version()` - https://git.io/JfkMQ

11:40 <whitequark> mh no, 1.2 mil

11:41 <whitequark> it's really hard to make any sound prediction, but i'd expect you to be able to fit it into a larger FPGA

11:41 <whitequark> not sure about the mister specifically

11:41 <Sarayan> yeah

11:41 <Sarayan> cyclone V, the one with a dual-core arm in

11:41 <daveshah> I think what mostly affects the size of an FPU is how microcoded/multicycle it is

11:42 <daveshah> The Rocket FPU is pretty large (needing pretty much an Artix-7 100T for SoC+FPU, whereas SoC on its own is fine in an ECP5 45k)

11:42 <daveshah> but I think that their implementation is fairly inefficient

11:43 <Sarayan> if the fpu has no sin() and friends, is there anything to microcode in the first place?

11:43 * whitequark . o O ( bit-serial FPU )

11:43 <Sarayan> oh damn, I nerd-sniped wq, sorry

11:43 <daveshah> Division might well benefit from some kind of microcoding

11:44 <whitequark> oh no, i'm not olofk :p

11:44 <MadHacker> There are plenty of FP emulators on 8 bit micros, so that sets an upper bound for how bad it can be. You can always implement it as a tiny 8-bit micro.

11:44 <daveshah> Yeah, a picorv32 or VexRiscv would be even easier and sets an upper bound for a "microcoded" FPU (~2k LUTs)

11:44 <Sarayan> true. Note that 8bit micros are not ieee usually

11:44 <_whitenotifier-9> [nmigen] hofstee commented on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkMx

11:45 <Sarayan> the 68k itself is microcoded for the "normal" instructions

11:46 <Sarayan> I really wonder how small one can make a 68040-equivalent while keeping similar performance

11:46 <Sarayan> I guess I'll start on the integer instructions when I'm bored

11:47 <_whitenotifier-9> [nmigen] whitequark commented on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkDv

11:47 <_whitenotifier-9> [nmigen] whitequark edited a comment on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkDv

11:47 <daveshah> I think it is usually about 6-10 ASIC gates to the LUT used in an ASIC emulation context

11:48 <whitequark> so 120-200k LUT?

11:48 <daveshah> Perhaps less, as it is 1.2M transistors not gates

11:48 <whitequark> not that big, but not ecp5 sized either

11:48 <whitequark> ah, right

11:48 <daveshah> and memories will be more efficient than that

11:48 <daveshah> ditto DSPs if there are any multiplies in there

11:48 <Sarayan> note that a large part of these transistors are just the caches

11:48 <daveshah> Yeah, those become BRAM

11:48 <daveshah> so probably it's a mid ECP5 type design
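
The estimate in this exchange can be redone as a small calculation. This is a rough sketch only: the 6-10 gates per LUT ratio and the 1.2M figure come from the discussion, while the 4 transistors per gate conversion (a 2-input CMOS NAND) and the function name `lut_range` are assumptions made for illustration. Caches and multipliers, which would map to BRAM and DSP blocks instead of LUTs, are not modelled.

```python
def lut_range(gate_count, gates_per_lut=(6, 10)):
    """Return (low, high) LUT estimates for a given ASIC gate count.

    gates_per_lut is the (min, max) number of ASIC gates one LUT replaces;
    more gates per LUT means fewer LUTs, so the bounds swap.
    """
    min_ratio, max_ratio = gates_per_lut
    return gate_count // max_ratio, gate_count // min_ratio

# Reading 1.2M as a gate count, as in the first guess:
print(lut_range(1_200_000))

# Reading 1.2M as a transistor count, assuming ~4 transistors per gate
# (hypothetical conversion factor, not from the discussion):
TRANSISTORS_PER_GATE = 4
print(lut_range(1_200_000 // TRANSISTORS_PER_GATE))
```

Under the transistor reading the range drops well below the first guess, which is consistent with the later remark that the design is probably mid-ECP5 sized.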

11:50 <_whitenotifier-9> [nmigen] whitequark commented on pull request #364: Fix `_yosys_version()` - https://git.io/JfkDm

11:50 <Sarayan> I have a feeling it would be fun to reimplement old workstations on fpga, and one only needs external ram, there aren't the bw issues of distributed roms of arcade games

11:51 <_whitenotifier-9> [nmigen] whitequark reviewed pull request #364 commit - https://git.io/JfkDY

11:52 <daveshah> Yeah, you could use DDR3 without worrying about latency issues too

11:52 <daveshah> 68040 computer with 1GiB RAM...

11:53 <whitequark> and PCIe?

11:53 <MadHacker> I've friends who tried to emulate various machines on modern hardware and found out the hard way that RAM latency really isn't that much better. :/

11:53 <MadHacker> Meanwhile I'm sticking an HX4K on a BBC Master ROM cartridge for fun and USB.

11:53 <MadHacker> I wish the ECP5 was easier to place, I'd prefer to give it PCIe for a laugh. :)

11:53 <daveshah> Depending on what machine, the whole thing should fit in cache given a decent CPU!

11:54 <MadHacker> True that.

12:02 <_whitenotifier-9> [nmigen] hofstee commented on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkDr

12:03 <Sarayan> daveshah: with mister you can have 128M SDRAM easily nowadays

12:06 <_whitenotifier-9> [nmigen] hofstee commented on issue #185: ASIC support tracking issue - https://git.io/JfkDy

12:14 <_whitenotifier-9> [nmigen] whitequark commented on issue #363: Can I create an active-low (asynchronous) reset? - https://git.io/JfkDx

12:17 rohitksingh has quit [Quit: No Ping reply in 180 seconds.]

12:18 rohitksingh has joined #nmigen

12:21 <_whitenotifier-9> [nmigen] hofstee synchronize pull request #364: Fix `_yosys_version()` - https://git.io/JfkMl

12:21 <_whitenotifier-9> [nmigen] codecov[bot] edited a comment on pull request #364: Fix `_yosys_version()` - https://git.io/JfkMQ

12:23 <_whitenotifier-9> [nmigen] whitequark closed pull request #364: Fix `_yosys_version()` - https://git.io/JfkMl

12:23 <_whitenotifier-9> [nmigen/nmigen] whitequark pushed 1 commit to master [+0/-0/±1] https://git.io/Jfkyc

12:23 <_whitenotifier-9> [nmigen/nmigen] hofstee 875579e - back.verilog: make Yosys version check compatible with Verific.

12:23 <_whitenotifier-9> [nmigen] whitequark commented on pull request #364: Fix `_yosys_version()` - https://git.io/JfkyC

12:26 <_whitenotifier-9> [nmigen] whitequark commented on issue #185: ASIC support tracking issue - https://git.io/Jfkyz

12:28 <_whitenotifier-9> [nmigen] hofstee commented on pull request #364: Fix `_yosys_version()` - https://git.io/JfkyM

12:29 <_whitenotifier-9> [nmigen] whitequark commented on pull request #364: Fix `_yosys_version()` - https://git.io/JfkyS

12:30 <_whitenotifier-9> [nmigen] whitequark commented on pull request #364: Fix `_yosys_version()` - https://git.io/JfkyN

12:30 <sorear> there's a big difference between something like rocket's FPU, which has a 52x52 multiplier and several barrel shifters as (retimed) combinatorial logic and can complete double-precision FMAs at 1/cycle, and a 8080-era FPU which just has a couple of 80-bit registers, shift left/right one, an adder, and finite state logic

12:31 <sorear> *8087

12:31 <_whitenotifier-9> [nmigen] whitequark commented on pull request #364: Fix `_yosys_version()` - https://git.io/JfkSe

12:35 <sorear> https://web.archive.org/web/20090530001724/packetstormsecurity.nl/programming-tutorials/Assembly/fpuopcode.html <- 50+ cycles for multiply up through the 387, which postdates 68881

12:40 <sorear> https://ia801900.us.archive.org/19/items/bitsavers_motorola68anual1ed1987_15689294/68881_68882_Users_Manual_1ed_1987.pdf#page=300 71 cycle FMUL, 51 cycle FADD for 68881

12:42 <sorear> so rocket's FPU is huge (in ASIC processes it takes up half of the tile, the other half being "rest of core + I1$ + D1$"), but it's 122 times the throughput of what you're simulating
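
The 122x figure appears to come from adding the two 68881 latencies quoted above (71 cycles for FMUL plus 51 for FADD) and comparing that with one fused multiply-add per cycle. A sketch of that arithmetic, assuming equal clock rates and back-to-back, unpipelined operations on the 68881 (both simplifying assumptions; the constant names are illustrative):

```python
# Cycle counts quoted from the 68881 manual in the discussion above.
FMUL_CYCLES_68881 = 71
FADD_CYCLES_68881 = 51

# Rocket's FPU completes one double-precision FMA per cycle (per the discussion).
ROCKET_CYCLES_PER_FMA = 1

# One FMA does the work of a multiply followed by an add.
cycles_per_mul_add_68881 = FMUL_CYCLES_68881 + FADD_CYCLES_68881
throughput_ratio = cycles_per_mul_add_68881 // ROCKET_CYCLES_PER_FMA

print(throughput_ratio)
```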

12:51 <Sarayan> so it can be made quite small by sacrificing performance that doesn't need to be there anyway

12:52 <daveshah> Given that multipliers are cheap on FPGAs I suspect you could make it quite a bit faster without costing that much more area

12:53 <Sarayan> yeah, the cyclone v has a bunch of wide multipliers

12:53 <Sarayan> the shift is probably costlier

12:54 <daveshah> A fixed shift wouldn't be

12:54 <Sarayan> not sure if nmigen/yosys can actually use the multipliers though

12:54 <daveshah> ZirconiumX: ^

12:54 <Sarayan> fadd requires a very not fixed shift

12:54 <daveshah> Yeah

12:54 <daveshah> I think there are tricks to use the multipliers for shifting, too

12:55 <ZirconiumX> Yeah, you can't presently use the multipliers

12:55 <Sarayan> you need a 2**n then

12:55 <whitequark> isn't shift-by-mul just a mul by one hot?

12:55 <daveshah> 2**n is cheap, just a decoder
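
The shift-by-multiply idea can be checked with plain integers: a decoder turns the shift amount n into the one-hot value 2**n, and the multiplier does the rest. The sketch below is a software model, not nMigen code, and all function names are made up for illustration. The right-shift variant (multiply by 2**(width - n) and keep the upper half of the product) is a standard identity added here as an assumption; it was not spelled out in the discussion.

```python
def one_hot(n):
    """Decoder output: a value with only bit n set, i.e. 2**n."""
    return 1 << n

def shift_left_via_mul(x, n, width):
    """Left shift of a width-bit value, done as a multiply by 2**n."""
    mask = (1 << width) - 1
    return (x * one_hot(n)) & mask  # keep the low half of the product

def shift_right_via_mul(x, n, width):
    """Right shift by n (1 <= n < width): multiply by 2**(width - n)
    and keep the upper half of the double-width product."""
    return (x * one_hot(width - n)) >> width

# Exhaustive check for 8-bit operands.
WIDTH = 8
for x in range(1 << WIDTH):
    for n in range(WIDTH):
        assert shift_left_via_mul(x, n, WIDTH) == (x << n) & 0xFF
    for n in range(1, WIDTH):
        assert shift_right_via_mul(x, n, WIDTH) == x >> n
```

Whether this beats a plain barrel shifter in area depends on how wide the hard multipliers are compared with the operand, which is the trade-off raised in the replies.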

12:55 <ZirconiumX> Even worse, there doesn't appear to be a Quartus IP core for this

12:55 <Sarayan> ZX: for the multipliers?

12:56 <daveshah> The tricks come in when the thing you are shifting is larger than the multiply but I can't remember the details

12:56 <daveshah> There was an old Xilinx app note that I saw about it

12:57 <Sarayan> wq: yeah, but between the size of the one hot and the muxing of the multiplier input and output I kinda wonder if directly barrel-shifting isn't better

12:58 <ZirconiumX> Sarayan: yeah

13:01 <ZirconiumX> Well

13:01 <ZirconiumX> There's lpm_mult

13:02 <ZirconiumX> Or altera_mult_add.

13:03 <ZirconiumX> The Intel FPGA Multiply Adder (Intel Stratix 10, Intel Arria 10, and Intel Cyclone 10 GX devices) or ALTERA_MULT_ADD (Arria V, Stratix V, and Cyclone V devices) IP core allows you to implement a multiplier-adder.

13:03 <ZirconiumX> This isn't going to be horrendously cursed at all

13:04 <ZirconiumX> The alternative is direct cell instantiation

13:04 <ZirconiumX> Which, uh

13:04 <ZirconiumX> https://github.com/YosysHQ/yosys/pull/1957

13:04 <ZirconiumX> Hasn't gone well so far

13:04 <Sarayan> it's interesting though, how do you map a multiplier you write without thinking to whatever a fpga offers?

13:05 <whitequark> you dont

13:05 <ZirconiumX> You use `*` and hope for the best

13:05 <Sarayan> ok, then how do you hit fpga-specific resources?

13:05 <whitequark> you use an instance

13:05 <Sarayan> is there a generic way to describe/use them?

13:05 <sorear> given that your 680x0 core necessarily already has microcode, it probably doesn't make sense to have a fully separate FPU if you're not going for cycle accuracy

13:06 <Sarayan> sorear: No intention to have it fully separate, but it's visible in the isa that it runs separately, as in the main program waits for the results

13:06 <Sarayan> (iirc, I never had a 68k with a fpu)

13:07 <sorear> x86 has FWAIT too but it's a no-op on everything recent

13:07 <Sarayan> well, I need to do the integer part for a start, it's going to be a large enough work :-)

13:07 <Sarayan> caches, mmu, fun

13:08 <Sarayan> can an instance "polyfill" for sim or for other fpgas that don't have the function?

13:09 <ZirconiumX> No, but you can write a module to wrap around the instance

13:09 <ZirconiumX> Essentially Instance is nMigen's FFI

13:09 <whitequark> instance polyfills are very much planned

13:10 <Sarayan> sorear: So you use the fabric capabilities to have a single-cycle fpu or so, then forget about the async?

13:12 <ZirconiumX> I know there's Intel IP for FPU functions

13:12 <sorear> if you have a FPGA that will fit a single-cycle FPU, then yes

13:25 <ZirconiumX> 58 arguments to altera_mult_add, 227 parameters

13:25 * ZirconiumX cries

13:25 <Sarayan> mwahahahhaa nice

13:26 <ZirconiumX> Why, Intel? I don't need saturating arithmetic

13:26 <ZirconiumX> I don't need you to rotate the input

13:27 <ZirconiumX> I don't need you to register the inputs and outputs either

13:27 <ZirconiumX> daveshah: how bad is the ECP5 MULT18X18 cell? I'll admit I haven't looked at it.

13:28 <daveshah> In its simple form not too bad

13:28 <daveshah> The only real weirdness are the various undocumented cascade modes

13:28 <daveshah> and the DDR registers and associated /2 clock dividers

13:33 pinknok has joined #nmigen

13:36 thinknok has quit [Ping timeout: 265 seconds]

13:56 <ZirconiumX> Good news, at least

13:57 <ZirconiumX> The cyclonev_mac primitive has *only* 22 arguments and 44 parameters

13:58 <ZirconiumX> On the other hand, it has an encrypted simulation model, so I have no clue how it works other than cargo-culting

14:10 <whitequark> i can probably decrypt it if you give me a testbench that uses it

14:14 <ZirconiumX> Sure, just need to do a bit of error-driven development

14:24 <ZirconiumX> wq: https://tasossah.com/uploader/files/cv-mac-test.tgz

14:25 <ZirconiumX> Honestly I'm surprised this synthesises

14:50 <whitequark> ZirconiumX: oh, it's just cyclonev_atoms_ncrypt.v?

14:50 <ZirconiumX> Probably

14:50 <whitequark> ... why is it only CV and 55nm?

14:50 <whitequark> (what was 55nm again?)

14:52 <ZirconiumX> I think 55nm was like C III

14:53 <ZirconiumX> It's apparently also MAX 10

14:54 <whitequark> looks like the mentor models are encrypted, the rest aren't?

14:54 <whitequark> i have no idea. doesn't matter anyway

14:55 <ZirconiumX> Yeah, googling 55nm Altera parts brings up the MAX 10 as using a TSMC 55nm process

14:57 <ZirconiumX> <whitequark> looks like the mentor models are encrypted, the rest aren't? <-- the unencrypted sim model library makes reference to some encrypted models, so

14:57 <ZirconiumX> e.g. cyclonev_clkena is also apparently in here somewhere

14:58 <whitequark> ahh

14:58 <ZirconiumX> There's gotta be some irony in discussing encrypted vendor models while writing coursework on encryption and how to break it

14:59 <tpw_rules> isn't that just not irony?

14:59 <ZirconiumX> Maybe my sense of humour is broken then

15:00 <tpw_rules> "oohoohoo i'm talking about breaking encryption while breaking encryption"

15:00 <tpw_rules> not ironic

15:01 <whitequark> circumventing, not breaking

15:15 <_whitenotifier-9> [nmigen] whitequark edited a comment on pull request #364: Fix `_yosys_version()` - https://git.io/JfkSe

15:16 <_whitenotifier-9> [nmigen] whitequark commented on pull request #364: Fix `_yosys_version()` - https://git.io/JfkF0

15:55 Vinalon has joined #nmigen

15:57 cr1901_modern has quit [Read error: Connection reset by peer]

16:23 <ronyrus> wq: I used the debug ring log + uart example from your Yumewatari project. It's extremely useful and the state decode trick is awesome!!!

16:23 <ronyrus> Is there a resource teaching these kinds of tricks somewhere? Are there more?

16:25 <whitequark> ronyrus: i'm afraid that one was made after working a lot with migen (and patching it too)

16:26 <whitequark> actually i had to implement .decoding[]

16:26 <ronyrus> :) it's very useful :)

16:51 Vinalon has quit [Remote host closed the connection]

16:52 Vinalon has joined #nmigen

17:05 <_whitenotifier-9> [nmigen] Fatsie commented on issue #185: ASIC support tracking issue - https://git.io/JfkA7

17:05 <_whitenotifier-9> [nmigen] Fatsie edited a comment on issue #185: ASIC support tracking issue - https://git.io/JfkA7

17:05 pinknok has quit [Remote host closed the connection]

17:06 pinknok has joined #nmigen

17:06 <awygle> Guessing yumewatari is fairly far down on your priority list at this point?

17:08 <whitequark> awygle: no, actually

17:08 <whitequark> it's more that i have to make the universe to get some apple pie

17:09 <whitequark> depth first bugfixing

17:12 <Sarayan> yumewatari?

17:13 <whitequark> my PCIe stack

17:14 <awygle> Right

17:15 <awygle> Which particular bits of the universe are missing?

17:15 <whitequark> FSM stuff, parser stuff

17:15 <whitequark> (parser stuff likely dependent on good FSM stuff)

17:17 <awygle> Makes sense

17:18 <awygle> Is there a use case for yumewatari in particular?

17:23 <whitequark> would be the first OSS PCIe PHY

17:23 <whitequark> well... upper-PHY

17:24 <whitequark> technically it already is, depending on how conformant you want it to be

17:27 <whitequark> it has an LTSSM, it's buggy and doesn't implement a bunch of PM features, but so are lots of devices that silicon vendors actually ship. a question of magnitude, really :p

17:29 <Sarayan> target is one of the numerous fpga-on-a-pcie card?

17:29 <Sarayan> or glawgow with a rusty wire connector?

17:29 <Sarayan> s/w/s/

17:29 <whitequark> versa ecp5 5g

17:35 <Sarayan> E226 on mouser, not insane

17:38 <Sarayan> not sure what I could use it for, at least for the mister I have some ideas :-)

17:42 <daveshah> It's a nice board

17:42 <daveshah> I designed a hat with SDRAM and VGA (albeit I only ever assembled and tested the RAM)

17:42 <daveshah> https://github.com/daveshah1/ecp5-hat

17:43 <daveshah> Which might be useful for Mister devel, if you didn't want to use the DDR3

17:44 <Sarayan> what I'd love is a hat for that, or for a mister, with which I can plonk and torture yamaha sound chips

17:46 <Sarayan> glasgow looks nice but is a little short for the ones that read pcm from rom

17:49 <awygle> What about litepcie? Doesn't cover that layer?

17:49 <daveshah> No, it relies on Xilinx hard IP for the LTSSM etc, at least last I looked

17:50 <daveshah> Xilinx don't just have a SERDES, they have a much bigger part of the PCIe stack as hard IP too

17:51 <daveshah> (Lattice have this with CrossLink NX, too, now, in fact I think that might provide even more than Xilinx does)

17:53 <awygle> Ah

17:53 <awygle> Lame

18:02 cr1901_modern has joined #nmigen

18:10 futarisIRCcloud has joined #nmigen

19:29 <Vinalon> hey, I just wanted to say thanks again to ZirconiumX / MadHacker / sorear and the rest of y'all for the advice on how to shrink a CPU design last week; I managed to drop ~1000 cells by following your advice.

19:29 <ZirconiumX> Wow, damn

19:29 <ZirconiumX> Can you resend your source link?

19:29 <Vinalon> removing extraneous CSRs, combining some ALU operations, and reducing the decoder's dependence on CPU state each dropped a few hundred

19:30 <Vinalon> it's here, but I'm still cleaning up the ALU changes and haven't committed them yet: https://github.com/WRansohoff/nmigen_rv32i_min

19:31 <Vinalon> so now the ALU/CSR/CPU logic looks like it's a little less than 2000 cells, and it can fit 4 'neopixel' peripherals. I appreciate the help! :)

19:31 <ZirconiumX> Glad to hear

19:36 <sorear> "The spec does not define behavior when an unspecified opcode is encountered." illegal instruction exceptions are specified as mcause=2

19:36 <sorear> all opcodes and bit patterns which are not specified are illegal

19:38 <ZirconiumX> I don't think it's too burdensome to say "only execute legal instructions"

19:38 <ZirconiumX> e.g. SERV requires this in the pursuit of absolute minimalism

19:38 <whitequark> if you don't define every opcode, it's kind of implied that you can only ever use the defined ones as a software developer, no?

19:39 <sorear> yes, but if you're going to do that it makes more sense to rip out the entire CSR system like picorv32 did

19:39 <Vinalon> yeah, I can probably just add a default 'with m.Case()' to the end of the decoder to trigger a trap.

19:40 <Vinalon> I would remove all of the CSRs, but the tests use 'minstret' to figure out if the program is still running and I want to add configurable interrupts eventually, like when a neopixel peripheral finishes sending its colors

19:44 <sorear> you don't need 31 bits of mcause.ecode, it's a WLRL field so only valid values need to be representable

19:47 <Vinalon> oh, that's a good point, thanks. I guess I could get away with just the first few bits.

19:50 <sorear> re. default, the annoying part is that this applies to everything, not just the 7-bit opcode, so slli can trigger an exception in some cases that addi can't because slli has must-be-zero bits

19:50 <sorear> etc
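
The point about must-be-zero bits can be illustrated with a small software model of the OP-IMM decode path. This is a sketch under stated assumptions, not the decoder from the linked repository: the function names are hypothetical, only the OP-IMM major opcode is handled, and anything else is treated as illegal purely to keep the example short. The encodings (opcode 0010011, funct3 001 for SLLI, funct3 101 for SRLI/SRAI) and the illegal-instruction cause value 2 follow the RISC-V specifications.

```python
OP_IMM = 0b0010011          # major opcode shared by ADDI, SLLI, SRLI, SRAI, ...
ILLEGAL_INSTRUCTION = 2     # mcause value for an illegal instruction

def encode_i(imm12, rs1, funct3, rd, opcode=OP_IMM):
    """Assemble an I-type instruction word (helper for the examples)."""
    return ((imm12 & 0xFFF) << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode

def op_imm_trap_cause(instr):
    """Return None if instr is a legal RV32I OP-IMM instruction,
    otherwise the mcause value to trap with."""
    opcode = instr & 0x7F
    funct3 = (instr >> 12) & 0x7
    upper7 = (instr >> 25) & 0x7F   # imm[11:5]
    if opcode != OP_IMM:
        return ILLEGAL_INSTRUCTION  # sketch only models OP-IMM
    if funct3 == 0b001:             # SLLI: imm[11:5] must be zero
        return None if upper7 == 0 else ILLEGAL_INSTRUCTION
    if funct3 == 0b101:             # SRLI / SRAI: two legal patterns
        return None if upper7 in (0b0000000, 0b0100000) else ILLEGAL_INSTRUCTION
    return None                     # ADDI etc.: all 12 immediate bits are data

# ADDI accepts any 12-bit immediate...
assert op_imm_trap_cause(encode_i(0xFFF, 1, 0b000, 1)) is None
# ...but the same upper bits on SLLI make the instruction illegal.
assert op_imm_trap_cause(encode_i(0x01F, 1, 0b001, 1)) is None
assert op_imm_trap_cause(encode_i(0x023, 1, 0b001, 1)) == ILLEGAL_INSTRUCTION
```

A default case on the 7-bit opcode alone would let the last instruction through, which is why full illegal-instruction detection costs more decode logic than a single fallback branch.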

19:51 <Vinalon> ah - yeah, I guess it'll never strictly comply with the specification...but I'm happy if it works with GCC using the '-march=rv32i' flag.

19:52 <sorear> for a microcontroller it doesn't really matter but once you get into OSes with multiple privilege levels "undocumented instructions" become rather problematic

19:54 lkcl__ has quit [Ping timeout: 265 seconds]

19:54 <Vinalon> oh, that's good to know. But on the bright side, multiple privilege levels probably wouldn't fit easily in the target chip's 5000 logic cells :P

19:55 <Vinalon> anyways, it was really nice of y'all to take a look and offer advice, and it definitely helped my learning.

20:05 ____2 has quit [Quit: Nettalk6 - www.ntalk.de]

20:23 chipmuenk has quit [Quit: chipmuenk]

20:49 pinknok has quit [Ping timeout: 272 seconds]

21:45 Asu has quit [Quit: Konversation terminated!]

22:44 <whitequark> ZirconiumX: so, how do i actually simulate your example?

22:45 <ZirconiumX> Since I don't have modelsim installed (for hopefully obvious reasons) I'm not actually sure

22:45 <whitequark> hrm, okay

22:46 <whitequark> i'm going to do this some other day then, sorry

22:46 <ZirconiumX> Sure

22:55 lkcl has joined #nmigen