#nmigen on 2020-05-13 — irc logs at freenode.irclog.whitequark.org

2020-01-27 18:31 ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at https://github.com/nmigen · logs at https://freenode.irclog.whitequark.org/nmigen

02:17 proteus-guy has quit [Remote host closed the connection]

02:18 futarisIRCcloud has quit [Quit: Connection closed for inactivity]

03:03 futarisIRCcloud has joined #nmigen

03:29 Degi has quit [Ping timeout: 258 seconds]

03:30 Degi has joined #nmigen

05:15 ____ has joined #nmigen

05:34 rohitksingh has joined #nmigen

05:43 futarisIRCcloud has quit [Quit: Connection closed for inactivity]

06:09 proteus-guy has joined #nmigen

06:10 proteus-dude has joined #nmigen

06:24 futarisIRCcloud has joined #nmigen

06:43 jeanthom has joined #nmigen

06:52 chipmuenk has joined #nmigen

07:29 <Sarayan> I'm looking for a recommendation on how to do something in hardware/on an fpga

07:30 <Sarayan> but I'm not sure if anybody is awake :-)

07:31 <sorear> o/

07:31 <Sarayan> yay

07:31 <Sarayan> so, it's about sprite drawing

07:32 <Sarayan> I want to draw a line of a zoomed sprite into a line buffer (double-buffered)

07:32 <Sarayan> the line buffer has one entry (24 bits) per horizontal pixel

07:32 <Sarayan> the hardware passes over all the active sprites and draws the line of each sprite in the buffer

07:33 <zignig> Sarayan: unce unce unce unce.

07:33 <Sarayan> an interesting characteristic is that the peak drawing speed is 8 pixels per pixel clock

07:33 <Sarayan> and the rom read capability is also 8 pixels at a time

07:34 <zignig> Sarayan: nice work, sauce ?

07:34 <Sarayan> so, I have the position of the horizontal center of the sprite, and the 10-bit 4.6 zoom factor

07:35 <Sarayan> the zoom factor adds to the position in the original image, e.g. 1 is zoom by 64x, 0x3ff is divide by almost 16
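Reading the zoom description above: the 10-bit factor is a 4.6 fixed-point source step per screen pixel, so a step of 1 (1/64) magnifies 64x, 0x40 is 1:1, and 0x3ff (1023/64, about 15.98) shrinks by almost 16. A minimal Python model of nearest-neighbour sampling under that reading; `zoom_step_to_scale` and `sample_positions` are hypothetical helpers, and truncating the fraction (rather than rounding) is an assumption about how the hardware picks the nearest pixel.

```python
FRAC_BITS = 6  # 4.6 fixed point: 4 integer bits, 6 fractional bits

def zoom_step_to_scale(step):
    """Magnification (screen pixels per source pixel) for a 10-bit 4.6 step."""
    assert 1 <= step <= 0x3FF
    return (1 << FRAC_BITS) / step

def sample_positions(step, count, start=0):
    """Source pixel index sampled by each of `count` screen pixels.
    `start` is the source position of the first screen pixel, in 4.6."""
    pos = start
    out = []
    for _ in range(count):
        out.append(pos >> FRAC_BITS)  # assumption: truncate the fraction
        pos += step                   # advance `step`/64 source pixels
    return out

print(zoom_step_to_scale(0x20), sample_positions(0x20, 6))  # 2.0 [0, 0, 1, 1, 2, 2]
```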

07:35 <sorear> 8 pixels = 24 bytes?

07:36 <sorear> are the sprites in the same format as the line buffer?

07:36 <Sarayan> sorear: yeah, it's 24 bytes, the sprite pixels are a little smaller, there is per-sprite information added that generates the extra bits. Not an issue

07:36 <sorear> what does "peak drawing speed" mean? how much already exists and how much needs to exist?

07:37 <sorear> this sounds like emulating an old console gpu?

07:37 <Sarayan> I mean that the hardware I'm trying to kinda-imitate can draw 3072 pixels in a 384 pixel clock line

07:37 <Sarayan> konami arcade sprite engine, gx generation

07:37 <Sarayan> and 4096 in a 512 clocks line

07:38 <Sarayan> So I guess I have to make the linebuffer 24 bytes-wide and do pretty much write coalescing
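Both quoted fill rates work out to the same peak of 8 post-scaled pixels per pixel clock, which is what sizes a single wide write port: 8 pixels of 24 bits is 192 bits, the 24 bytes discussed here. A trivial check in Python (names are illustrative only):

```python
PIXEL_BITS = 24           # one line-buffer entry per pixel
PEAK_PIXELS_PER_CLOCK = 8

def line_budget(line_clocks, pixels_per_clock=PEAK_PIXELS_PER_CLOCK):
    """Peak number of post-scaled pixels that can be drawn in one line."""
    return line_clocks * pixels_per_clock

# width of a write port that accepts one full 8-pixel fragment per clock
WRITE_PORT_BITS = PEAK_PIXELS_PER_CLOCK * PIXEL_BITS

print(line_budget(384), line_budget(512), WRITE_PORT_BITS // 8)  # 3072 4096 24
```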

07:38 <sorear> is that 3072 pre-scaled or post-scaled pixels

07:38 <Sarayan> post-scaled

07:39 <sorear> is scaling always up or down, and is it nearest-neighbor or more complicated

07:39 <Sarayan> in fact it's peak, so being slower when reducing wouldn't be an issue

07:39 <Sarayan> nearest-neighbor

07:39 <Sarayan> scaling is 10 bits 4.6, goes from 64x to /16

07:40 <Sarayan> rom read access is 8 aligned pixels per clock

07:40 <Sarayan> I'm not sure at all how to efficiently go from the unscaled pixels to the scaled ones, seems awfully muxy

07:41 <sorear> I'm wondering if it would make more sense to render multiple sprites simultaneously at 1 pixel/sprite/clock

07:42 <sorear> are the sprites always 8 px wide

07:43 <Sarayan> nope, 8 to 128

07:43 <Sarayan> sorry, 16 to 128

07:45 <Sarayan> linebuffer is 512 pixels wide fwiw

07:46 <Sarayan> I suspect pipelining will be needed to meet decent timings too

07:46 <sorear> wondering if the original hardware ran at a significant multiple of the pixel clock

07:49 <Sarayan> No, it didn't

07:49 <sorear> you could have a decoupled pipeline that reads up to 8 pixels per clock from the sprite memory, feeds it into a scaler-black-box, then reads 8-pixel scaled "fragments" from the scaler's output FIFO and writes them to the line buffer at 8 pixels per clock

07:49 <sorear> the scaler-black-box doesn't need to access any memory, so it could be replicated as needed

07:50 <Sarayan> yeah, I guess something like that is needed

07:51 <Sarayan> you just don't want to halve your draw rate just because your sprite horizontal position is not on a multiple of 8

07:52 <sorear> if your memory is banked by pixel%8 you can do misaligned reads at full speed

07:52 <sorear> although that does need a barrel shifter
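Banking by pixel % 8 works because any 8 consecutive pixels touch each bank exactly once: bank b needs either address start // 8 or the one after it, and a rotation by start % 8 (the barrel shifter) puts the pixels back in order. A Python sketch; the bank layout and the names `store_banked` and `read_misaligned` are assumptions for illustration.

```python
BANKS = 8  # one bank per pixel % 8

def store_banked(row):
    """Bank b holds pixels b, b+8, b+16, ... (address = pixel // 8)."""
    banks = [[] for _ in range(BANKS)]
    for p, pix in enumerate(row):
        banks[p % BANKS].append(pix)
    return banks

def read_misaligned(banks, start):
    """Fetch pixels start..start+7 with a single access per bank."""
    base, rot = divmod(start, BANKS)
    # banks below `rot` hold pixels that wrap into the next address row
    raw = [banks[b][base + (1 if b < rot else 0)] for b in range(BANKS)]
    # barrel shifter: rotate so output lane 0 carries pixel `start`
    return raw[rot:] + raw[:rot]

row = [f"p{i}" for i in range(16)]
print(read_misaligned(store_banked(row), 3))  # p3 .. p10 in order
```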

07:53 <sorear> the output position may be more of a problem

07:53 <sorear> wait, you're talking about screen-space position, not sprite-space

07:54 <sorear> is the 3072 cycles a "best case" or "guaranteed" number

07:54 <Sarayan> probably best case

07:56 <Sarayan> Each OBJ has 10(hex) attributes, you can define max 256 OBJs. The OBJ is line buffer system max 4096 dot (dot clock

07:56 <Sarayan> 8MHz) or 3084 dot (dot clock 6 or 12 MHz). If the OBJs exceeds this limit, the last OBJ to be written will disappear

07:56 <Sarayan> (translated by a japanese native speaker, please excuse the slightly broken english)
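The documented rule can be read as a per-line dot budget. A hypothetical Python model: it assumes each OBJ costs its post-scaled width in dots and that, once the budget would be overrun, the overrunning OBJ and everything after it are not drawn. The budgets used are 4096 (8MHz) and 3072 = 384 * 8 (6MHz); the translated "3084" is presumably that same 3072.

```python
def objs_drawn(widths, budget=4096):
    """Indices of the OBJs drawn on one line, in processing order.
    Assumption: each OBJ costs its post-scaled width in dots, and the first
    OBJ that would overrun the budget (and any after it) is dropped."""
    used = 0
    drawn = []
    for i, width in enumerate(widths):
        if used + width > budget:
            break
        used += width
        drawn.append(i)
    return drawn

print(len(objs_drawn([128] * 40)))          # 8MHz budget: 32 full-width OBJs fit
print(objs_drawn([1000] * 4, budget=3072))  # 6MHz budget: [0, 1, 2]
```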

07:58 <Sarayan> dot clock 8MHz has 512 pixels/line, 6MHz has 384, 12MHz has 384*2 but I suspect it just means the hardware can only do rendering at 6MHz and not 12

07:58 <Sarayan> only the line read can reach the 12MHz

07:59 <Sarayan> ok, the renderer does have a clock that's two times the dot clock

07:59 <Sarayan> (forget the 12MHz case, it's not used anyway)

08:05 <sorear> anyway I'm thinking that the scaler produces on its output fifo 8-aligned blocks of 8 pixels (transparent)

08:06 <sorear> when scaling up you need to look at pairs of adjacent input blocks, that's a 16x8 crossbar

08:07 <sorear> scaling down (at full speed allowed by the fifos) would be an 8x16 or 16x8 crossbar mapping incoming blocks into a work register that's periodically discharged

08:07 <sorear> well I guess you don't need a work register, you could send a bunch of partial fragments down the pipe since that side isn't limiting
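Why a pair of input blocks is enough when scaling up: with a step of at most 1.0 (0x40 in 4.6), eight consecutive screen pixels sample at most eight consecutive source pixels, and those can straddle at most two 8-aligned source blocks, so every output lane is a 16:1 mux. A Python model of the select computation and the crossbar, assuming the 4.6 step format described earlier; `upscale_selects` and `crossbar` are hypothetical names.

```python
FRAC_BITS = 6  # 4.6 fixed point
LANES = 8

def upscale_selects(pos, step):
    """For 8 screen pixels starting at source position `pos` (4.6 fixed point),
    return the aligned source block index and per-lane selects into the
    16-pixel window formed by that block and the next one."""
    assert 1 <= step <= (1 << FRAC_BITS), "upscaling or 1:1 only"
    base_block = (pos >> FRAC_BITS) // LANES
    base = base_block * LANES
    sel = [((pos + i * step) >> FRAC_BITS) - base for i in range(LANES)]
    assert 0 <= min(sel) and max(sel) < 2 * LANES  # always fits the window
    return base_block, sel

def crossbar(window, sel):
    """16x8 crossbar: output lane i takes window[sel[i]]."""
    assert len(window) == 2 * LANES
    return [window[s] for s in sel]

src = list(range(100, 132))                        # a 32-pixel sprite row
blk, sel = upscale_selects(6 << FRAC_BITS, 0x20)   # start at pixel 6, 2x zoom
window = src[blk * LANES:(blk + 2) * LANES]
print(sel, crossbar(window, sel))
```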

08:12 <Sarayan> maybe it's two writes per pixel clock, and then you don't need coalescing

08:13 * zignig is glad people are noodling with nmigen.

08:13 <Sarayan> I love nmigen

08:13 <Sarayan> only sane hdl I've ever seen

08:14 <zignig> yep same here, there have been other python tries, but this is cogent for me.

08:14 <sorear> "two writes" raises a lot of cans of worms re. conflicting writes, you can do it with banking and multiport and both are terrible

08:14 <sorear> if you can do a single wide write port the system will be much easier to reason about
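A single wide write port with a per-pixel enable is one way to get that structure: the 512-pixel line buffer becomes 64 words of 8 pixels, and each screen-aligned 8-pixel fragment is one masked write. A behavioural Python sketch; the class and method names are hypothetical, the mask stands for "pixel is opaque and inside the sprite", and double-buffering is left out.

```python
LANES = 8    # pixels per line-buffer word / per write
WIDTH = 512  # pixels per line

class LineBuffer:
    """One half of a double-buffered line buffer, modelled as 64 words of
    8 pixels with a single masked write port."""

    def __init__(self):
        self.words = [[0] * LANES for _ in range(WIDTH // LANES)]

    def write(self, x, pixels, mask):
        """One wide write of a screen-aligned 8-pixel fragment; lanes whose
        mask bit is clear (transparent or outside the sprite) are untouched."""
        assert x % LANES == 0, "fragments must be screen-aligned"
        word = self.words[x // LANES]
        for i in range(LANES):
            if mask[i]:
                word[i] = pixels[i]

    def read(self, x):
        return self.words[x // LANES][x % LANES]

lb = LineBuffer()
lb.write(8, [0xAA0000 + i for i in range(LANES)], [i % 2 == 0 for i in range(LANES)])
print([hex(lb.read(x)) for x in range(8, 16)])
```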

08:15 <Sarayan> sorear: I just mean using the clock that's 2x the pixel clock

08:15 <sorear> ah

08:15 <sorear> well if you can run the whole thing at 2x pixel clock, you can shrink the entire datapath by 2x, which cuts your crossbars by 4x
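The 4x comes from the crossbar being quadratic in datapath width: halving the lanes halves both the number of outputs and the size of the candidate window each output chooses from. A back-of-the-envelope count in Python, assuming the upscaler crossbar described above (2N inputs to N outputs) and ignoring select logic and pipelining:

```python
def crossbar_mux_inputs(lanes):
    """Rough size of the upscaler crossbar: each of `lanes` outputs is a
    mux over a window of two input blocks (2 * lanes candidates)."""
    return lanes * (2 * lanes)

full = crossbar_mux_inputs(8)  # 16x8 crossbar at the pixel clock
half = crossbar_mux_inputs(4)  # 8x4 crossbar at twice the pixel clock
print(full, half, full // half)  # 128 32 4
```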

08:16 <Sarayan> yeah, I can run the whole thing at 2x, but I don't see how it reduces the crossbars

08:19 <Sarayan> so, let's see

08:19 <Sarayan> you get the 8 pixels in, which go into a scaler

08:20 <Sarayan> the scaler outputs (up to) 8 scaled pixels, and shifts out as much as needed

08:20 <Sarayan> behind the scaler, another mux splits the up to 8 pixels into two writes depending on the screen alignment

08:20 <sorear> 4 pixels in

08:21 <sorear> since you doubled the clock, the number of pixels you need to handle at once is halved

08:21 <Sarayan> the input datapath is 64 bits wide, for 8x(up to)8bpp pixels

08:21 <Sarayan> the rom reacts at pixel clock speed

08:21 <Sarayan> don't want to have to accelerate it if I don't have to

08:22 <sorear> in that case you can put a gearbox before the scaler to break 8-pixel input fragments into 4-pixel fragments at twice the clock
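A gearbox here is a width/rate converter: each 64-bit, 8-pixel ROM word arriving at the pixel clock is emitted as two 4-pixel fragments on consecutive cycles of the 2x clock, so the ROM side keeps its pixel-clock rate. A behavioural Python sketch; `gearbox_8_to_4` is a hypothetical name and emitting the low four pixels first is an assumption about lane order.

```python
def gearbox_8_to_4(blocks):
    """Split each 8-pixel block (one per pixel clock) into two 4-pixel
    fragments, yielded on consecutive cycles of the 2x clock."""
    for block in blocks:
        assert len(block) == 8
        yield block[:4]  # first fast cycle: lanes 0-3 (assumed order)
        yield block[4:]  # second fast cycle: lanes 4-7

rom_words = [list(range(8)), list(range(8, 16))]
print(list(gearbox_8_to_4(rom_words)))
```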

08:22 <Sarayan> because if it ever goes on a real fpga that's going to route to an external sdram interface shared with other "roms"

08:22 <sorear> my intent was that the scaler would only output already-aligned blocks

08:23 <Sarayan> screen-aligned you mean?

08:23 <sorear> yes

08:23 <Sarayan> that's hard

08:23 <Sarayan> because the zoom in is sprite-space, not screen-space

08:24 <Sarayan> well, specifically it's "advance by this distance in the source for each pixel on the screen"

08:24 <sorear> ok, I got that backward, I don't think it changes much but I need to update my model

08:25 <Sarayan> and the zero is in sprite-space

08:25 <Sarayan> technically, the information I have is "position 8/16/32/64 in the 16/32/64/128-wide sprite is at exact position x on the screen"
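Given that anchor, finding where the sprite's first screen pixel lands is one division: step back from the anchor in whole source steps until the next step would leave the sprite. A Python sketch assuming the anchor pixel is sampled exactly and pixels are picked by truncating the 4.6 position; `sprite_span` and its parameters are hypothetical, and in hardware the division would likely be done iteratively or precomputed rather than as shown.

```python
FRAC_BITS = 6  # 4.6 fixed point

def sprite_span(anchor_x, anchor_src, width, step):
    """Screen extent of one zoomed sprite line.

    anchor_x   -- screen x where source pixel `anchor_src` lands exactly
    anchor_src -- e.g. width // 2 (position 8 of a 16-wide sprite)
    step       -- 4.6 source advance per screen pixel
    Returns (first_x, start_pos, count): the leftmost screen x drawn, the
    4.6 source position sampled there, and the number of pixels drawn."""
    a = anchor_src << FRAC_BITS
    left = a // step             # whole steps back to the sprite's left edge
    start_pos = a - left * step  # in [0, step), so still inside the sprite
    end = width << FRAC_BITS
    count = -(-(end - start_pos) // step)  # ceil: samples with pos < end
    return anchor_x - left, start_pos, count

print(sprite_span(100, 8, 16, 0x40))  # 1:1              -> (92, 0, 16)
print(sprite_span(100, 8, 16, 0x20))  # 2x zoom          -> (84, 0, 32)
print(sprite_span(100, 8, 16, 0x30))  # non-integer zoom -> (90, 32, 21)
```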

08:28 <sorear> so when scaling up, the scaler's state variables are (a) current (leftmost) screen position (b) corresponding sprite position (fraction mod 4) (c) scale factor (less than 1 per assumption) (d,e) the sprite pixel blocks corresponding to the current and next sprite block (8 pixels total)

08:30 <sorear> on each clock (a) compute sprite positions (relative to the sliding base) for the 4 screen pixels and the 5th next position (b) 8->4 crossbar mux to generate the screen pixel values (c) if next > 4, shift the current and next sprite-blocks and read a new next block from the input FIFO (d) update the screen and sprite position
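The per-clock steps (a) to (d) can be modelled directly. The sketch below is a behavioural Python model, not nMigen, and rests on several assumptions: the 2x clock (4 screen pixels per clock), upscaling only (step at most 0x40 in 4.6), a sprite row already in memory rather than streamed from a FIFO, and left/right boundaries handled with a per-lane mask. All names are hypothetical. The window slides by one block when the 5th position reaches 4 relative to the base, i.e. once it falls in the next block.

```python
FRAC_BITS = 6  # 4.6 fixed point
LANES = 4      # screen pixels per clock at 2x the pixel clock

def upscale_line(row, first_x, start_pos, count, step):
    """Yield (block_x, pixels, mask) once per clock: a screen-aligned
    4-pixel fragment plus a mask of which lanes belong to the sprite.
    State: x (aligned screen position), pos (4.6 sprite position at x),
    base (sprite index of the current 4-pixel block in the window)."""
    assert 1 <= step <= (1 << FRAC_BITS), "upscaling or 1:1 only"
    padded = list(row) + [0] * (2 * LANES)  # keep the window in range
    x = first_x - first_x % LANES           # align to a 4-pixel screen block
    pos = start_pos - (first_x - x) * step  # negative left of the sprite
    end_x = first_x + count
    base = 0
    while x < end_x:
        window = padded[base:base + 2 * LANES]  # current + next sprite block
        pixels, mask = [], []
        for i in range(LANES):
            valid = first_x <= x + i < end_x
            # (a) sprite position relative to the sliding base
            sel = ((pos + i * step) >> FRAC_BITS) - base if valid else 0
            assert 0 <= sel < 2 * LANES
            # (b) 8->4 crossbar
            pixels.append(window[sel])
            mask.append(valid)
        yield x, pixels, mask
        nxt = pos + LANES * step  # the "5th" position
        # (c) slide the window by one block once the 5th pixel leaves it
        if nxt >= 0 and (nxt >> FRAC_BITS) - base >= LANES:
            base += LANES
        # (d) advance screen and sprite position
        pos, x = nxt, x + LANES

frags = list(upscale_line(list(range(16)), 5, 0, 32, 0x20))
print(frags[0])  # (4, [0, 0, 0, 1], [False, True, True, True]); x=4 is masked off
```

Because the step is at most 1.0, the 5th position is never more than 4 pixels past the first, so sliding by one block per clock is always enough; scaling down breaks that assumption.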

08:32 <sorear> when scaling _down_ you have a different set of problems because the screen position isn't always %4 but similar logic applies

08:32 <sorear> at least half the work will be handling left/right boundary cases :x

08:34 <Sarayan> yeah

08:35 <Sarayan> I don't exactly see how to make the scaler spout screen-aligned blocks though

08:37 <sorear> it's like a funnel shift, but scaling

08:41 <Sarayan> I feel like the initial shift may be complicated :-)

09:05 Asu has joined #nmigen

10:01 proteus-dude has quit [Ping timeout: 256 seconds]

10:13 proteus-dude has joined #nmigen

10:25 proteus-guy has quit [Read error: Connection reset by peer]

10:25 proteus-dude has quit [Remote host closed the connection]

11:38 chipmuenk1 has joined #nmigen

11:40 chipmuenk has quit [Ping timeout: 240 seconds]

11:42 chipmuenk has joined #nmigen

11:42 chipmuenk1 has quit [Ping timeout: 260 seconds]

12:13 proteus-guy has joined #nmigen

12:13 proteus-dude has joined #nmigen

12:14 chipmuenk has quit [Quit: chipmuenk]

12:15 proteus-dude has quit [Client Quit]

14:00 chipmuenk has joined #nmigen

14:02 chipmuenk has quit [Client Quit]

15:12 chipmuenk has joined #nmigen

15:46 Asuu has joined #nmigen

15:46 Asu has quit [Ping timeout: 240 seconds]

19:39 ____ has quit [Quit: Nettalk6 - www.ntalk.de]

19:50 Asuu has quit [Read error: Connection reset by peer]

19:51 Asu has joined #nmigen

20:48 Asu has quit [Read error: Connection reset by peer]

20:53 Asu has joined #nmigen

21:08 chipmuenk has quit [Quit: chipmuenk]

21:09 jeanthom has quit [Ping timeout: 240 seconds]

22:03 peeps[zen] has quit [Read error: Connection reset by peer]

22:04 peeps has joined #nmigen

22:12 lkcl_ has joined #nmigen

22:16 lkcl has quit [Ping timeout: 246 seconds]

22:49 Asu has quit [Quit: Konversation terminated!]