ChanServ changed the topic of #nmigen to: nMigen hardware description language · code at https://github.com/nmigen · logs at https://freenode.irclog.whitequark.org/nmigen
proteus-guy has quit [Remote host closed the connection]
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
futarisIRCcloud has joined #nmigen
Degi has quit [Ping timeout: 258 seconds]
Degi has joined #nmigen
____ has joined #nmigen
rohitksingh has joined #nmigen
futarisIRCcloud has quit [Quit: Connection closed for inactivity]
proteus-guy has joined #nmigen
proteus-dude has joined #nmigen
futarisIRCcloud has joined #nmigen
jeanthom has joined #nmigen
chipmuenk has joined #nmigen
<Sarayan> I'm looking for a recommendation on how to do something in hardware/on a fpga
<Sarayan> but I'm not sure if anybody is awake :-)
<sorear> o/
<Sarayan> yay
<Sarayan> so, it's about sprite drawing
<Sarayan> I want to draw a line of a zoomed sprite into a line buffer (double-buffered)
<Sarayan> the line buffer has one entry (24 bits) per horizontal pixel
<Sarayan> the hardware passes over all the active sprites and draws the line of each sprite in the buffer
<zignig> Sarayan: unce unce unce unce.
<Sarayan> an interesting characteristic is that the peak drawing speed is 8 pixels per pixel clock
<Sarayan> and the rom read capability is also 8 pixels at a time
<zignig> Sarayan: nice work, sauce ?
<Sarayan> so, I have the position of the horizontal center of the sprite, and the 10-bit 4.6 zoom factor
<Sarayan> the zoom factor adds to the position in the original image, e.g. 1 is zoom by 64x, 0x3ff is divide by almost 16
<sorear> 8 pixels = 24 bytes?
<sorear> are the sprites in the same format as the line buffer?
<Sarayan> sorear: yeah, it's 24 bytes, the sprite pixels are a little smaller, there is per-sprite information added that generates the extra bits. Not an issue
<sorear> what does "peak drawing speed" mean? how much already exists and how much needs to exist?
<sorear> this sounds like emulating an old console gpu?
<Sarayan> I mean that the hardware I'm trying to kinda-imitate can draw 3072 pixels in a 384 pixel clock line
<Sarayan> konami arcade sprite engine, gx generation
<Sarayan> and 4096 in a 512 clocks line
<Sarayan> So I guess I have to make the linebuffer 24 bytes-wide and do pretty much write coalescing
<sorear> is that 3072 pre-scaled or post-scaled pixels
<Sarayan> post-scaled
<sorear> is scaling always up or down, and is it nearest-neighbor or more complicated
<Sarayan> in fact it's peak, so being slower when reducing wouldn't be an issue
<Sarayan> nearest-neighbor
<Sarayan> scaling is 10 bits 4.6, goes from 64x to /16
<Sarayan> rom read access is 8 aligned pixels per clock
<Sarayan> I'm not sure at all how to efficiently go from the unscaled pixels to the scaled ones, seems awfully muxy
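A minimal behavioural sketch of the 4.6 stepping described above, in plain Python rather than nMigen. It assumes the position is truncated (not rounded) to pick the source pixel and that stepping starts at the sprite's left edge; the real hardware anchors on the sprite centre, which is not modelled here. The names `FRAC_BITS` and `scaled_line` are hypothetical.

```python
FRAC_BITS = 6  # 4.6 fixed point: 4 integer bits, 6 fractional bits


def scaled_line(src, zoom):
    """Nearest-neighbor resample of one sprite line.

    zoom is the 10-bit 4.6 value added to the source position for every
    screen pixel: 0x40 is 1:1, 1 magnifies 64x, 0x3ff shrinks by almost 16.
    """
    assert 1 <= zoom <= 0x3FF
    out = []
    pos = 0  # source position in 1/64 pixel units (assumed to start at 0)
    while (pos >> FRAC_BITS) < len(src):
        out.append(src[pos >> FRAC_BITS])  # truncation is an assumption
        pos += zoom
    return out
```

For instance, `scaled_line(list(range(16)), 0x20)` doubles every pixel, and `scaled_line(list(range(16)), 0x80)` keeps every second one.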
<sorear> I'm wondering if it would make more sense to render multiple sprites simultaneously at 1 pixel/sprite/clock
<sorear> are the sprites always 8 px wide
<Sarayan> nope, 8 to 128
<Sarayan> sorry, 16 to 128
<Sarayan> linebuffer is 512 pixels wide fwiw
<Sarayan> I suspect pipelining will be needed to meet decent timings too
<sorear> wondering if the original hardware ran at a significant multiple of the pixel clock
<Sarayan> No, it didn't
<sorear> you could have a decoupled pipeline that reads up to 8 pixels per clock from the sprite memory, feeds it into a scaler-black-box, then reads 8-pixel scaled "fragments" from the scaler's output FIFO and writes them to the line buffer at 8 pixels per clock
<sorear> the scaler-black-box doesn't need to access any memory, so it could be replicated as needed
<Sarayan> yeah, I guess something like that is needed
<Sarayan> you just don't want to halve your draw rate just because your sprite horizontal position is not on a multiple of 8
<sorear> if your memory is banked by pixel%8 you can do misaligned reads at full speed
<sorear> although that does need a barrel shifter
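A small Python model of the banking idea, as a sketch only: pixel i lives in bank i % 8, each bank is addressed independently, and a rotate (the barrel shifter) puts the requested pixel in lane 0. `make_banks` and `read_unaligned` are hypothetical names, and the model does not handle reads that run past the end of the memory.

```python
BANKS = 8


def make_banks(pixels):
    """Split a flat pixel list into 8 banks; bank b holds pixels with index % 8 == b."""
    assert len(pixels) % BANKS == 0
    return [pixels[b::BANKS] for b in range(BANKS)]


def read_unaligned(banks, start):
    """Read 8 consecutive pixels from any start index with one access per bank."""
    base, offset = divmod(start, BANKS)
    # Banks below the offset hold pixels that wrapped into the next row,
    # so they get a row address one higher than the others.
    raw = [banks[b][base + (1 if b < offset else 0)] for b in range(BANKS)]
    # Barrel shifter: rotate so the pixel at `start` ends up in lane 0.
    return raw[offset:] + raw[:offset]
```

Every bank is read exactly once per access, which is why a misaligned read costs the same single cycle as an aligned one in this model.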
<sorear> the output position may be more of a problem
<sorear> wait, you're talking about screen-space position, not sprite-space
<sorear> is the 3072 pixels a "best case" or "guaranteed" number
<Sarayan> probably best case
<Sarayan> Each OBJ has 10(hex) attributes, you can define max 256 OBJs. The OBJ is line buffer system max 4096 dot (dot clock
<Sarayan> 8MHz) or 3084 dot (dot clock 6 or 12 MHz). If the OBJs exceeds this limit, the last OBJ to be written will disappear
<Sarayan> (translated by a japanese native speaker, please excuse the slightly broken english)
<Sarayan> dot clock 8MHz has 512 pixels/line, 6MHz has 384, 12MHz has 384*2 but I suspect it just means the hardware can only do rendering at 6MHz and not 12
<Sarayan> only the line read can reach the 12MHz
<Sarayan> ok, the renderer does have a clock that's two times the dot clock
<Sarayan> (forget the 12MHz case, it's not used anyway)
<sorear> anyway I'm thinking that the scaler produces on its output fifo 8-aligned blocks of 8 pixels (transparent)
<sorear> when scaling up you need to look at pairs of adjacent input blocks, that's a 16x8 crossbar
<sorear> scaling down (at full speed allowed by the fifos) would be an 8x16 or 16x8 crossbar mapping incoming blocks into a work register that's periodically discharged
<sorear> well I guess you don't need a work register, you could send a bunch of partial fragments down the pipe since that side isn't limiting
<Sarayan> maybe it's two writes per pixel clock, and then you don't need coalescing
* zignig is glad people are noodling with nmigen.
<Sarayan> I love nmigen
<Sarayan> only sane hdl I've ever seen
<zignig> yep, same here. there have been other python tries, but this is cogent for me.
<sorear> "two writes" raises a lot of cans of worms re. conflicting writes, you can do it with banking and multiport and both are terrible
<sorear> if you can do a single wide write port the system will be much easier to reason about
<Sarayan> sorear: I just mean using the clock that's 2x the pixel clock
<sorear> ah
<sorear> well if you can run the whole thing at 2x pixel clock, you can shrink the entire datapath by 2x, which cuts your crossbars by 4x
<Sarayan> yeah, I can run the whole thing at 2x, but I don't see how it reduces the crossbars
<Sarayan> so, let's see
<Sarayan> you get the 8 pixels in, which go into a scaler
<Sarayan> the scaler outputs (up to) 8 scaled pixels, and shifts out as much as needed
<Sarayan> behind the scaler, another mux splits the up to 8 pixels into two writes depending on the screen alignment
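The split into two writes can be sketched in Python as below. This is only a behavioural model under the assumption that the line buffer is organised as 8-pixel rows with a per-lane write mask; `split_writes` and `apply_writes` are hypothetical names.

```python
WIDTH = 8  # pixels per line buffer row (assumed)


def split_writes(x, pixels):
    """Turn up to 8 pixels starting at screen position x into aligned writes.

    Returns a list of (row, lanes, mask) tuples; at most two rows are touched.
    """
    assert 0 < len(pixels) <= WIDTH
    first_row = x // WIDTH
    writes = []
    for row in (first_row, first_row + 1):
        lanes = [None] * WIDTH
        mask = [False] * WIDTH
        for i, p in enumerate(pixels):
            r, lane = divmod(x + i, WIDTH)
            if r == row:
                lanes[lane] = p
                mask[lane] = True
        if any(mask):  # skip the second write when nothing spills over
            writes.append((row, lanes, mask))
    return writes


def apply_writes(linebuffer, writes):
    """Apply masked row writes to a flat line buffer list."""
    for row, lanes, mask in writes:
        for lane in range(WIDTH):
            if mask[lane]:
                linebuffer[row * WIDTH + lane] = lanes[lane]
```

An aligned position yields a single write, while any other position yields two, which matches the two writes per pixel clock available when the renderer runs at twice the dot clock.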
<sorear> 4 pixels in
<sorear> since you doubled the clock, the number of pixels you need to handle at once is halved
<Sarayan> the input datapath is 64 bits wide, for 8x(up to)8bpp pixels
<Sarayan> the rom reacts at pixel clock speed
<Sarayan> don't want to have to accelerate it if I don't have to
<sorear> in that case you can put a gearbox before the scaler to break 8-pixel input fragments into 4-pixel fragments at twice the clock
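A gearbox of that kind reduces to very little in a behavioural model. The sketch below assumes the low-numbered pixels of each 8-pixel block go out first; `gearbox` is a hypothetical name and clocking is not modelled, only the ordering of fragments.

```python
def gearbox(blocks, wide=8, narrow=4):
    """Yield narrow fragments from wide blocks, low-numbered pixels first."""
    assert wide % narrow == 0
    for block in blocks:
        assert len(block) == wide
        # One wide block in per slow clock, wide // narrow fragments out.
        for i in range(0, wide, narrow):
            yield block[i:i + narrow]
```

With the defaults, each input block produces two output fragments, so the output side has to run at twice the input rate to keep up.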
<Sarayan> because if it ever goes on a real fpga that's going to route to an external sdram interface shared with other "roms"
<sorear> my intent was that the scaler would only output already-aligned blocks
<Sarayan> screen-aligned you mean?
<sorear> yes
<Sarayan> that's hard
<Sarayan> because the zoom in is sprite-space, not screen-space
<Sarayan> well, specifically it's "advance by this distance in the source for each pixel on the screen"
<sorear> ok, I got that backward, I don't think it changes much but I need to update my model
<Sarayan> and the zero is in sprite-space
<Sarayan> technically, the information I have is "position 8/16/32/64 in the 16/32/64/128-wide sprite is at exact position x on the screen"
<sorear> so when scaling up, the scaler's state variables are (a) current (leftmost) screen position (b) corresponding sprite position (fraction mod 4) (c) scale factor (less than 1 per assumption) (d,e) the sprite pixel blocks corresponding to the current and next sprite block (8 pixels total)
<sorear> on each clock (a) compute sprite positions (relative to the sliding base) for the 4 screen pixels and the 5th next position (b) 8->4 crossbar mux to generate the screen pixel values (c) if next > 4, shift the current and next sprite-blocks and read a new next block from the input FIFO (d) update the screen and sprite position
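The scale-up loop described above can be modelled in Python roughly as follows. This is a sketch under several assumptions: the step is the 4.6 value (at most 0x40 when scaling up), positions are truncated, the sprite's left edge sits on a 4-pixel screen boundary, and pixels past the end of the sprite come out as `None` (transparent). The name `upscale` is hypothetical, and boundary handling is deliberately left out.

```python
from collections import deque

FRAC = 6   # fractional bits of the 4.6 step
BLOCK = 4  # pixels handled per clock


def upscale(blocks, step):
    """Yield 4-pixel screen blocks from 4-pixel sprite blocks (scale up only)."""
    assert 0 < step <= 1 << FRAC
    fifo = deque(blocks)
    cur = fifo.popleft() if fifo else None
    nxt = fifo.popleft() if fifo else None
    pos = 0  # sprite position in 1/64 pixels, relative to the start of cur
    while cur is not None:
        # Current and next sprite block form the 8-pixel window.
        window = cur + (nxt if nxt is not None else [None] * BLOCK)
        # 8->4 crossbar: each screen pixel selects one window entry.
        yield [window[(pos + i * step) >> FRAC] for i in range(BLOCK)]
        pos += BLOCK * step
        if (pos >> FRAC) >= BLOCK:
            # Slide the window by one block and fetch a new next block.
            pos -= BLOCK << FRAC
            cur, nxt = nxt, (fifo.popleft() if fifo else None)
```

Because the step never exceeds one source pixel per screen pixel, the four selected indices always fall inside the 8-pixel window and the window slides by at most one block per clock.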
<sorear> when scaling _down_ you have a different set of problems because the screen position isn't always %4 but similar logic applies
<sorear> at least half the work will be handling left/right boundary cases :x
<Sarayan> yeah
<Sarayan> I don't exactly see how to make the scaler spout screen-aligned blocks though
<sorear> it's like a funnel shift, but scaling
<Sarayan> I feel like the initial shift may be complicated :-)
Asu has joined #nmigen
proteus-dude has quit [Ping timeout: 256 seconds]
proteus-dude has joined #nmigen
proteus-guy has quit [Read error: Connection reset by peer]
proteus-dude has quit [Remote host closed the connection]
chipmuenk1 has joined #nmigen
chipmuenk has quit [Ping timeout: 240 seconds]
chipmuenk has joined #nmigen
chipmuenk1 has quit [Ping timeout: 260 seconds]
proteus-guy has joined #nmigen
proteus-dude has joined #nmigen
chipmuenk has quit [Quit: chipmuenk]
proteus-dude has quit [Client Quit]
chipmuenk has joined #nmigen
chipmuenk has quit [Client Quit]
chipmuenk has joined #nmigen
Asuu has joined #nmigen
Asu has quit [Ping timeout: 240 seconds]
____ has quit [Quit: Nettalk6 - www.ntalk.de]
Asuu has quit [Read error: Connection reset by peer]
Asu has joined #nmigen
Asu has quit [Read error: Connection reset by peer]
Asu has joined #nmigen
chipmuenk has quit [Quit: chipmuenk]
jeanthom has quit [Ping timeout: 240 seconds]
peeps[zen] has quit [Read error: Connection reset by peer]
peeps has joined #nmigen
lkcl_ has joined #nmigen
lkcl has quit [Ping timeout: 246 seconds]
Asu has quit [Quit: Konversation terminated!]