##openfpga on 2019-09-11 — irc logs at freenode.irclog.whitequark.org

00:13 emeb_mac has joined ##openfpga

00:29 _whitelogger has joined ##openfpga

00:29 Shiz has joined ##openfpga

01:14 emeb_mac has quit [Ping timeout: 240 seconds]

01:17 emeb_mac has joined ##openfpga

02:00 rohitksingh has joined ##openfpga

02:23 zng has quit [Quit: ZNC 1.7.2 - https://znc.in]

02:24 zng has joined ##openfpga

03:15 mumptai_ has joined ##openfpga

03:18 mumptai has quit [Ping timeout: 240 seconds]

03:40 edmoore has quit [Ping timeout: 250 seconds]

03:40 esden has quit [Read error: Connection reset by peer]

03:42 eddyb has quit [Ping timeout: 264 seconds]

03:44 esden has joined ##openfpga

03:44 elms has quit [Ping timeout: 252 seconds]

03:44 swetland has quit [Ping timeout: 252 seconds]

03:44 eddyb has joined ##openfpga

03:45 edmoore has joined ##openfpga

03:46 elms has joined ##openfpga

03:46 Bob_Dole has quit [Ping timeout: 276 seconds]

03:47 swetland has joined ##openfpga

04:12 cr1901_modern has quit [Read error: Connection reset by peer]

04:28 egg has quit [Disconnected by services]

04:28 oeuf has joined ##openfpga

04:34 Bike has quit [Quit: Lost terminal]

05:41 ZipCPU has quit [Ping timeout: 276 seconds]

05:48 ZipCPU has joined ##openfpga

06:21 emeb_mac has quit [Ping timeout: 246 seconds]

06:40 rohitksingh has quit [Ping timeout: 264 seconds]

07:14 eddyb is now known as eddyb[legacy]

07:25 Asu has joined ##openfpga

09:23 azonenberg_work has quit [Ping timeout: 258 seconds]

09:35 cr1901_modern has joined ##openfpga

09:53 freeemint has joined ##openfpga

09:53 OmniMancer has joined ##openfpga

09:58 mumptai_ has quit [Quit: Verlassend]

10:35 Mimoja has quit [Ping timeout: 258 seconds]

10:50 Mimoja has joined ##openfpga

11:00 Bob_Dole has joined ##openfpga

12:16 pie_ has quit [Ping timeout: 245 seconds]

12:20 pie_ has joined ##openfpga

12:47 <pepijndevos> ZirconiumX, what is your line algorithm thing?

12:48 <ZirconiumX> pepijndevos: it's a hardware implementation of Bresenham's line algorithm as a finite state machine

12:48 <pepijndevos> ahhh ok

12:48 <pepijndevos> ... why?

12:48 <ZirconiumX> By itself kind of useless, but a vital building block of a GPU
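
[Editor's sketch] For readers unfamiliar with the idea, here is a minimal behavioural model in Python of Bresenham's line algorithm written as a state machine that emits one pixel per step. It is not ZirconiumX's hardware design (which is not shown in the log); the class name `BresenhamFSM` and the helper `draw_line` are hypothetical, and each call to `step()` merely stands in for one clock cycle.

```python
class BresenhamFSM:
    """Behavioural model of a line-drawing state machine: one pixel per step()."""

    def __init__(self, x0, y0, x1, y1):
        # "Setup" state: load the registers. Only subtractions and compares.
        self.x, self.y = x0, y0
        self.x1, self.y1 = x1, y1
        self.dx = abs(x1 - x0)
        self.dy = -abs(y1 - y0)
        self.sx = 1 if x0 < x1 else -1
        self.sy = 1 if y0 < y1 else -1
        self.err = self.dx + self.dy
        self.done = False

    def step(self):
        """Emit the current pixel, then advance the registers by one 'clock'."""
        if self.done:
            return None
        pixel = (self.x, self.y)
        if self.x == self.x1 and self.y == self.y1:
            self.done = True
            return pixel
        e2 = 2 * self.err
        if e2 >= self.dy:      # step along x
            self.err += self.dy
            self.x += self.sx
        if e2 <= self.dx:      # step along y
            self.err += self.dx
            self.y += self.sy
        return pixel


def draw_line(x0, y0, x1, y1):
    """Run the state machine to completion and collect the pixels."""
    fsm = BresenhamFSM(x0, y0, x1, y1)
    pixels = []
    while not fsm.done:
        pixels.append(fsm.step())
    return pixels
```

The per-step work is two additions and two comparisons on integers, with no multiplier or divider, which is what makes the algorithm a natural fit for a small FSM.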

12:48 <whitequark> modern GPUs don't actually draw lines

12:48 <whitequark> it's kind of a pain

12:49 <pepijndevos> A *proper* GPU, if we would have stayed with nice vector displays.

12:49 <ZirconiumX> Sure, but I don't plan to do stuff the modern way at the moment

12:50 <pepijndevos> Although... if you had a vector display, you would not need to rasterize lines anyway

12:50 <pepijndevos> Ahhh, I had some fun drawing lines once. Trying to make Adafruit's GFX library remotely usable.

12:51 <pepijndevos> https://www.youtube.com/watch?v=QvVO6pY9WNY

12:52 <pepijndevos> (drawing openstreetmap data from an SD card with an Arduino on an SPI display)

12:53 <pepijndevos> https://github.com/adafruit/Adafruit-GFX-Library/pull/36

12:55 <ZirconiumX> pepijndevos: but the dumb state machine can hit ~170 MHz on a Cyclone V, which I'm proud of

12:55 <pepijndevos> Nice nice

12:56 <ZirconiumX> Even if the Quartus Timing Analyser tried its best to make things worse :P

13:02 <pepijndevos> huh?

13:10 OmniMancer has quit [Quit: Leaving.]

13:15 freeemint has quit [Ping timeout: 250 seconds]

13:28 carl0s has joined ##openfpga

13:48 genii has joined ##openfpga

14:05 pie__ has joined ##openfpga

14:05 pie_ has quit [Ping timeout: 276 seconds]

14:14 emeb has joined ##openfpga

14:18 pie__ has quit [Ping timeout: 246 seconds]

14:18 pie_ has joined ##openfpga

14:25 OmniMancer has joined ##openfpga

15:10 OmniMancer has quit [Read error: Connection reset by peer]

16:33 Asu` has joined ##openfpga

16:33 Asu has quit [Ping timeout: 276 seconds]

16:44 dh73 has joined ##openfpga

16:49 dh73 has quit [Remote host closed the connection]

16:51 dh73 has joined ##openfpga

16:57 <ZirconiumX> pepijndevos: since I apparently missed that, the Timing Analyser tries to helpfully suggest improvements to your source, such as disabling an optimisation on a critical path to make the critical path latency worse

16:57 <pepijndevos> Nice

16:58 <whitequark> w-why

16:59 <mwk> quality toolchain

16:59 <ZirconiumX> I have no idea, but when I added the third setup stage to make the combinational logic shorter I re-enabled that optimisation for an extra 15MHz Fmax

17:04 <ZirconiumX> I was wondering how the GS worked and where it got its performance from

17:04 <ZirconiumX> So I looked in the data sheet

17:04 <ZirconiumX> I'm still clueless, but now my eyes are bleeding

17:04 <mwk> GS?

17:05 <ZirconiumX> Graphics Synthesizer

17:05 <ZirconiumX> PS2 GPU

17:05 <mwk> oh

17:06 <mwk> you've got... data sheets... for a GPU

17:06 <mwk> quite a cognitive dissonance for me :(

17:06 <ZirconiumX> https://twitter.com/ZirconiumX/status/1171804455026733060

17:06 <mwk> yeah, just saw that

17:07 <ZirconiumX> I've already subjected emily to it

17:08 <mwk> it's not looking good

17:08 <ZirconiumX> mwk: since it was for a console where people had to control the GPU directly rather than via an API, it's *kind of* well documented

17:09 <ZirconiumX> Even if the console has like three A4 pages of bugs that it tells you to work around

17:10 <ZirconiumX> Such as short loops in the CPU *not* looping, the "disable interrupt" instruction sometimes not disabling interrupts when you get an interrupt after said "disable interrupt" instruction, etc

17:10 <mwk> ... what do you mean, loops not looping

17:11 <whitequark> did they fuck up the pipeline

17:12 <ZirconiumX> If you have a loop that's less than 6 instructions long some of the branch instructions don't calculate the condition properly and return false

17:12 <ZirconiumX> So they don't loop

17:12 <mwk> but but how

17:12 <whitequark> forgot an interlock or a forward?

17:12 <ZirconiumX> They don't say, sadly, only that the work around is to pad loops to six instructions or more

17:13 <mwk> it's the distance from loop beginning to end that counts?

17:13 <ZirconiumX> Yep

17:13 <mwk> huh

17:13 <ZirconiumX> It's a 7-stage pipeline

17:13 <ZirconiumX> They mention that if there's a cache stall inside the loop it's okay

17:14 <ZirconiumX> But if you're cache stalling on every single loop iteration, what are you doing

17:14 <sorear> this is on the mips core??

17:14 <ZirconiumX> Oh yeah

17:14 <ZirconiumX> The GPU has bugs too but they're less interesting to talk about

17:15 <ZirconiumX> Things like truncating your textures if you write them to misaligned addresses

17:16 <ZirconiumX> And the Z test disable bit not working

17:17 <ZirconiumX> Amusingly though the PS2 officially has no "bugs" just "inconveniences"

17:17 <mwk> bwahahah

17:17 <mwk> I have to remember that one

17:20 <ZirconiumX> "Return from the interrupt handler after executing the EI instruction. If this restriction is not followed, an inconvenience may happen when an interrupt occurs immediately after executing the DI instruction."

17:21 <ZirconiumX> "Instructions that operate the cache or TLB must be directly preceded by and followed by a SYNC instruction. Detailed information is given to the respective instructions. If this restriction is not followed, an inconvenience may be caused when a COP0 Unusable exception occurs."

17:22 <ZirconiumX> "The TLBR instruction must not be immediately followed by a jump/branch instruction. Four instructions or more are required between them, excluding the SYNC.P instruction next to the TLBR instruction. Also, the TLBR instruction must not be placed at the end of a page. Six instructions or more from the end of the page are required for the TLBR instruction. If this restriction is not followed, an inconvenience may be caused when an ITLB miss occurs immediately after the TLBR instruction cancellation, due to the occurrence of an exception, etc."

17:23 <ZirconiumX> It's pretty funny

17:33 azonenberg_work has joined ##openfpga

17:33 pie_ has quit [Ping timeout: 245 seconds]

17:34 <Finde> you have to wonder how much extra performance you could get from correct codegen if it weren't so buggy

17:36 <emily> lol @ "an inconvenience may be caused"

17:41 <mwk> The Enrichment Center apologizes for the inconvenience and wishes you the best of luck.

17:41 <whitequark> lmao

17:48 <pepijndevos> ZirconiumX, are you making a PS2 GPU now?

17:50 <ZirconiumX> pepijndevos: Truthfully I have no idea :P

17:50 <ZirconiumX> It just started because I was curious how the PS2 GPU worked

17:50 <pepijndevos> That's how most things start, tbqh

17:52 <ZirconiumX> I'm still a little confused as to how they got the setup times so short

17:52 <ZirconiumX> They mention using DDA, and yet that requires you to calculate dy/dx

17:52 <ZirconiumX> Which is Not Cheap

17:54 <ZirconiumX> Yet they can do Gouraud shading with 4 cycles of setup time

17:57 <ZirconiumX> My hunch is they're doing Bresenham instead

17:58 <ZirconiumX> Because DDA would require a lot of division

18:00 pie_ has joined ##openfpga

18:03 <ZirconiumX> Though lerp also requires a lot of division, so

18:04 <ZirconiumX> It must be one hell of an FSM though.

18:12 * sorear wonders if ZirconiumX has seen https://01.org/linuxgraphics/hardware-specification-prms/2016-intelr-processors-based-kaby-lake-platform or another version

18:12 <ZirconiumX> I have not

18:15 <ZirconiumX> 8-bit division in 4 cycles implies, what, radix-4 SRT?

18:16 <ZirconiumX> Actually it's going to be bigger than that, hmm.

18:18 <sorear> 8-bit division?

18:18 <sorear> is that a gpu thing

18:20 <ZirconiumX> sorear: (most) GPUs work on 8 bit per channel colour

18:20 <ZirconiumX> To do Gouraud shading, you need to linearly interpolate between colours

18:21 <ZirconiumX> For that you need the slope of the line between them, which is where the 8-bit division comes into play
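
[Editor's sketch] To make the division question concrete, the Python below interpolates one 8-bit colour channel across a span in two ways. `lerp_dda` pays for one division during setup and then does one add per pixel; `lerp_bresenham` uses an error accumulator and never divides. Both function names, the 16.16 fixed-point format and the parameter `n` (number of steps in the span) are assumptions made for illustration; nothing here is taken from the GS documentation.

```python
def lerp_dda(c0, c1, n):
    """DDA style: one division in setup, then one add per pixel (16.16 fixed point)."""
    if n == 0:
        return [c0]
    step = ((c1 - c0) << 16) // n   # the setup division discussed above
    acc = c0 << 16
    out = []
    for _ in range(n + 1):
        out.append(acc >> 16)
        acc += step
    return out


def lerp_bresenham(c0, c1, n):
    """Error-accumulator style: adds and compares only, no division."""
    if n == 0:
        return [c0]
    dc = abs(c1 - c0)
    sc = 1 if c1 >= c0 else -1
    c, err = c0, 0
    out = []
    for _ in range(n + 1):
        out.append(c)
        err += dc
        # May run several times per pixel when the colour delta exceeds n.
        while err >= n:
            c += sc
            err -= n
    return out
```

Note the catch in the second version: when the colour delta is larger than the span length, the inner loop runs more than once per pixel, so a single-cycle hardware version would need the integer part of the slope up front. That is consistent with the doubt raised later in the log that interpolation still seems to want a divider somewhere.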

18:23 azonenberg_work has quit [Ping timeout: 250 seconds]

18:24 azonenberg_work has joined ##openfpga

18:27 <kernlbob_> Could be a big lookup table.

18:29 <ZirconiumX> I initially thought it might be a flavour of SRT division if it's 8-bit, but if it's 16-bit you couldn't do SRT in 4 cycles
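
[Editor's sketch] The cycle counts in this exchange follow from how many quotient bits a digit-recurrence divider retires per iteration: a radix-2^k divider produces k bits each time. The helper below, with the hypothetical name `srt_iterations`, only counts iterations under that assumption and ignores any setup, normalisation or final-correction cycles a real divider would add.

```python
def srt_iterations(quotient_bits, radix):
    """Iterations needed by a radix-2^k digit-recurrence divider (k bits per iteration)."""
    if radix < 2 or radix & (radix - 1):
        raise ValueError("radix must be a power of two")
    bits_per_iteration = radix.bit_length() - 1
    # Ceiling division without floating point.
    return -(-quotient_bits // bits_per_iteration)


print(srt_iterations(8, 4))    # 8-bit quotient, radix 4
print(srt_iterations(16, 4))   # 16-bit quotient, radix 4
```

Under this simple count, radix-4 gives 4 iterations for an 8-bit quotient and 8 for a 16-bit one, which matches the remark that 16-bit radix-4 SRT would not fit in 4 cycles.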

18:57 <kernlbob_> Is SRT anything like Goldschmidt division?

18:58 <kernlbob_> Never mind -- I too can read Wikipedia. (-:

19:05 <GenTooMan> Well, this (https://pastebin.com/cdqjWL6s) isn't working correctly in nmigen (not a surprise): the simulation gets stuck in an infinite loop (likely adding to vcount) at the "m.d.comb += self.vcount.eq(vcount + 1)" line. Suggestions and thoughts welcome; I'm out of thoughts.

19:05 <ZirconiumX> GenTooMan: you have a combinational loop

19:06 <ZirconiumX> self.vcount.eq(self.vcount + 1) means "whenever vcount is updated, set vcount to vcount + 1"

19:06 <kernlbob_> You can't run a counter in combinatorial logic. You probably want `m.d.sync += self.vcount.eq...`.

19:06 <ZirconiumX> And setting vcount to vcount + 1 is an update of vcount

19:07 <ZirconiumX> Thus the logic is infinite

19:07 <ZirconiumX> You can do this synchronously as kernlbob_ suggests, but I don't know your goal here

19:15 <GenTooMan> Hmm yes I am semi aware of that. What I desire is vcount to increment each time hcount rolls over (only). I thought I could create a second clock and toggle that clock each time hcount rolled over but creating another clock didn't seem that simple.

19:16 <tnt> don't create clocks ...

19:16 <tnt> really, if you don't know why you should avoid creating clocks, you really shouldn't be doing it. Stick to single clock domain logic for now.

19:17 <tnt> Use enables.
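
[Editor's sketch] The advice above can be illustrated with a plain-Python behavioural model (not nmigen code, and not GenTooMan's pastebin, whose contents are not in the log). Both counters are registers updated on the same clock edge; `vcount` simply has an enable that is true only when `hcount` rolls over. The class name `VideoCounters` and the parameters `h_total` and `v_total` are hypothetical.

```python
class VideoCounters:
    """Single clock domain: vcount advances only when the hcount rollover enable is high."""

    def __init__(self, h_total, v_total):
        self.h_total = h_total
        self.v_total = v_total
        self.hcount = 0
        self.vcount = 0

    def tick(self):
        """Model one rising clock edge."""
        # Combinational signal derived from the *current* register values.
        h_rollover = self.hcount == self.h_total - 1

        # Next-state logic for hcount: runs every cycle.
        next_h = 0 if h_rollover else self.hcount + 1

        # Next-state logic for vcount: gated by the enable.
        if h_rollover:
            next_v = 0 if self.vcount == self.v_total - 1 else self.vcount + 1
        else:
            next_v = self.vcount

        # Registers update together, as on a real clock edge.
        self.hcount, self.vcount = next_h, next_v


counters = VideoCounters(h_total=4, v_total=3)
for _ in range(4):
    counters.tick()
print(counters.hcount, counters.vcount)
```

Because the next value is computed from the old register contents and committed once per tick, there is no feedback within a cycle, unlike the combinational assignment that hung the simulation. In nmigen terms this roughly corresponds to assigning `vcount` in the sync domain under a condition, as kernlbob_ suggested.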

19:24 <GenTooMan> Time to dig through enables, and yes, I still have a headache from trying to create a second clock domain.

19:26 <ZirconiumX> This brings up a question. Say you've got a 300 MHz main clock, but parts of the chip only need to run at half that. Do those bits have to meet 300 MHz timing still?

19:27 <tnt> ZirconiumX: technically no, if you can ensure you won't use the result anywhere else that doesn't have an enable.

19:28 <tnt> nextpnr however doesn't support multi-cycle clock constraints.

19:29 <ZirconiumX> Fair. I guess the alternative is to run the 150MHz bit at 300MHz but pipeline bits over to meet timing?

19:30 <tnt> yes, that's one option.

19:54 <davidc__> ZirconiumX: or create two clock domains and use FIFOs for the domain crossing if that makes sense in your design

20:00 <ZirconiumX> Some of the bits of the GS are FIFO-insulated, some aren't

20:01 rohitksingh has joined ##openfpga

20:39 emeb has quit [Ping timeout: 276 seconds]

20:42 emeb has joined ##openfpga

20:52 emeb_mac has joined ##openfpga

20:53 Asu` has quit [Ping timeout: 244 seconds]

20:56 s_frit has quit [Remote host closed the connection]

20:57 s_frit has joined ##openfpga

21:01 Asu` has joined ##openfpga

21:09 rohitksingh has quit [Ping timeout: 244 seconds]

22:05 freeemint has joined ##openfpga

22:12 rohitksingh has joined ##openfpga

22:27 Asu` has quit [Quit: Konversation terminated!]

22:41 danilonc has quit [Quit: WeeChat 1.9.1]

22:42 Bike has joined ##openfpga

23:15 rohitksingh has quit [Ping timeout: 245 seconds]

23:51 carl0s has quit [Remote host closed the connection]