#milkymist on 2011-10-20 — irc logs at freenode.irclog.whitequark.org

09:41 <lekernel> lekernel productions presents... an almost decent M1 unboxing video

09:41 <lekernel> http://milkymist.org/m1unbox.mp4

09:42 <lekernel> or http://www.youtube.com/watch?v=0k080nzA_z4

12:06 <lekernel> wpwrak, what do you think about a video input "color transformation matrix" controllable from the patch?

12:07 <lekernel> also, I used LED lighting... doesn't help

12:14 <wpwrak> yeah, video input modification could be very interesting. maybe even go one step further, and have a few FPUs that just work on that ? easy to parallelize, shouldn't need a lot of instructions/registers, and the existing FPU design is already perfect for SIMD

12:19 <lekernel> what kind of transforms would you like to be able to do?

12:19 <lekernel> or, more particularly, what data?

12:19 <lekernel> one pixel? squares of 2x2 pixels? more? overlapping/non-overlapping?

12:21 <wpwrak> i was thinking of just pixels. dunno if that's too technical a view, though

12:22 <wpwrak> if you have >1 pixel, you could of course to edge detection and such

12:22 <wpwrak> s/to/do/

12:22 <lekernel> do you have a color transform in mind that would look significantly better than with using a matrix?

12:23 <wpwrak> no. i'm just thinking of what could give us a maximum of flexibility with the available technology

12:23 <lekernel> also, the color transform could also generate an alpha channel

12:24 <lekernel> there's already alpha support down the pipeline, but it's only a global alpha

12:24 <lekernel> that could be nice to make parts of the included images transparent

12:25 <wpwrak> oh yes, that would be cool

12:26 <lekernel> we'll need to overhaul the PFPU register allocator a bit... 4x3 matrices will eat registers like crazy

12:27 <wpwrak> 3x3 ? 27 registers if you have one for r, g, b each

12:27 <lekernel> mh?

12:28 <lekernel> the matrix does RGB->RGBA

12:28 <lekernel> it's 4x3, 12 coefficients
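
[Editor's sketch] The 4x3 RGB->RGBA matrix discussed above can be illustrated in a few lines of Python. This is only a minimal sketch and not the TMU or PFPU implementation: the function name `apply_color_matrix`, the [0, 1] float range, the clamping, and the example luma-keyed alpha row are all assumptions made for illustration.

```python
def apply_color_matrix(rgb, m):
    """Apply a 4x3 matrix (4 rows of 3 coefficients) to one RGB pixel.

    rgb: (r, g, b) floats, assumed to be in [0, 1].
    Returns (r, g, b, a), each clamped to [0, 1] (clamping is an assumption).
    """
    if len(m) != 4 or any(len(row) != 3 for row in m):
        raise ValueError("expected 4 rows of 3 coefficients")
    out = []
    for row in m:
        # One output channel is a weighted sum of the three input channels.
        value = sum(c * x for c, x in zip(row, rgb))
        out.append(min(1.0, max(0.0, value)))
    return tuple(out)


# Hypothetical example matrix: colors pass through unchanged, and alpha
# follows brightness, so dark areas of the picture become transparent.
LUMA_KEY = [
    [1.0, 0.0, 0.0],        # R' = R
    [0.0, 1.0, 0.0],        # G' = G
    [0.0, 0.0, 1.0],        # B' = B
    [0.299, 0.587, 0.114],  # A  = approximate luma
]
```

With this matrix a black pixel comes out fully transparent and a white pixel fully opaque, which is the kind of per-pixel alpha (as opposed to the global alpha) mentioned above.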

12:29 <wpwrak> aah, i was thinking of an FPU (array) that works on each pixel.

12:30 <wpwrak> e.g., with a 3x3 RGB matrix on input, one RGBA output

12:31 <lekernel> processing multiple pixels at once is more difficult... you can't just simply insert a stage in the TMU pipeline

12:32 <wpwrak> just in the video in path, before it hits anything else ?

12:34 <lekernel> the rendering system uses the TMU to blit the camera picture into the texture buffer

12:35 <lekernel> if we put this matrix thing in the TMU, we have more flexibility and we factor things (e.g. supersedes decay, makes it possible to apply the transform on still pictures)

12:36 <wpwrak> so you'd put it in the frame buffer feedback loop ?

12:36 <lekernel> all camera pictures go through the TMU atm

12:37 <lekernel> except for video-in preview in the GUI (and that's why it's slow)

12:40 <wpwrak> i haven't looked at that part in detail yet, but is the camera->TMU interface a pixel stream ? or something more complicated ?

12:40 <wpwrak> my FPU idea was to put an array of FPUs in the camera pixel stream, where they transform each pixel before it goes to the TMU for further processing

12:41 <lekernel> the video-in core receives the signal, does the YUV->RGB transform and just DMA's the result

12:41 <lekernel> you get raw interlaced framebuffers at the output

12:41 <wpwrak> eek

12:42 <wpwrak> interlaced is like the Terminator. damn hard to kill.

12:42 <lekernel> there are two framebuffers, one for each field. you can also tell the video-in core that you want only one field (and this is what it does atm)

12:42 <lekernel> well, we simply discard one field and scale it up vertically with the TMU :-)
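
[Editor's sketch] The "discard one field and scale it up" approach can be pictured as follows. This is a rough software model only: the function name `single_field_upscale` is hypothetical, the input is assumed to be a full interlaced frame stored as a list of lines, and plain line doubling stands in for whatever filtering the TMU really applies when it scales.

```python
def single_field_upscale(frame):
    """Keep only the even lines (one field) and double each of them.

    frame: list of lines, each line a list of pixel values.
    Returns a frame with the same number of lines as the input.
    """
    field = frame[0::2]  # drop the other field entirely
    out = []
    for line in field:
        out.append(list(line))  # original field line
        out.append(list(line))  # repeated to fill the missing line
    return out[:len(frame)]     # trim the extra line for odd heights
```

Half of the vertical detail is simply thrown away here, which matches the low resolution noted in the discussion.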

12:43 <wpwrak> aah, hence the low resolution ! :)

12:45 <lekernel> yes, but deinterlacing without major losses of vertical resolution is a massive pain

12:45 <wpwrak> okay, so at least with the current half-frame approach, such a pixel processor array ought to work. for a NxN matrix input, you'd need a (N-1)/2 lines + (N-1)/2 pixels buffer before the pixel FPU array
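
[Editor's sketch] The "(N-1)/2 lines + (N-1)/2 pixels" figure can be checked with a little arithmetic. The sketch assumes pixels arrive in raster order, N is odd, and the window is centered on the pixel being processed; the function names and the 720-pixel line width are assumptions for illustration.

```python
def window_delay_pixels(n, line_width):
    """Pixels to wait after the center pixel arrives until the whole
    centered NxN window has been received (raster order, odd N)."""
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    half = (n - 1) // 2
    return half * line_width + half


def raster_index(x, y, line_width):
    """Position of pixel (x, y) in a raster-order stream."""
    return y * line_width + x


# The last pixel needed for the window centered at (x, y) is its
# bottom-right corner (x + half, y + half); the distance between the two
# in the stream is exactly the delay computed above.
WIDTH = 720  # assumed line width
N = 3
HALF = (N - 1) // 2
DELAY = raster_index(10 + HALF, 5 + HALF, WIDTH) - raster_index(10, 5, WIDTH)
```

Note that this is only the delay in front of the array. Holding every line that the window touches would take roughly N-1 full lines of storage.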

12:45 <lekernel> difficult, easy to get wrong, and a memory bandwidth pig

12:45 <wpwrak> oh yes, deinterlacing sucks :)

12:45 <wpwrak> why a pig ?

12:46 <lekernel> because deinterlacing algos can make a lot of memory accesses for things like motion estimation

12:46 <wpwrak> aah, deinterlacing. okay. thought you were talking about the FPU array :)

12:49 <lekernel> if you implement this processor array in the TMU, you can't choose the pixel access order

12:49 <lekernel> you can implement it as a separate unit that operates in framebuffers, and then you are free to use any access order

12:50 <lekernel> but it increases memory bandwidth consumption

12:50 <lekernel> if you do it in the video-in core, you lose flexibility

12:50 <wpwrak> and if it's between camera YUV->RGB and TMU ? or maybe even replacing YUV->RGB ?

12:51 <lekernel> or, you can break away from the DMA paradigm and implement "pixel stream buses" :)

12:51 <wpwrak> that sounds nice :)

12:52 <lekernel> you have this pixel processing unit sitting somewhere on the chip, and you can route it between the video-in core and its DMA write unit

12:52 <lekernel> or between a DMA-read and a DMA-write unit

12:52 <lekernel> dynamically

12:52 <wpwrak> that sounds very nice, yes

12:53 <wpwrak> or maybe even have two of them. not sure how fat they'd get. the DMA-to-DMA one would have to be bigger than a camera-only unit, due to the larger amount of pixels.

12:54 <lekernel> well, there are a lot of details. here's one :)

12:54 <lekernel> and you'll need flow control on the buses anyway

12:57 <wpwrak> for the DMA-to-DMA path, yes, sounds reasonable

13:04 <lekernel> even at the video-in output or at the DMA-write input

13:04 <lekernel> you cannot predict the memory access latency

13:04 <lekernel> because of DRAM refreshes, switching DRAM rows, and bus sharing

13:04 <wpwrak> hmm, but you'd already have to have a mechanism of this sort in place

13:05 <wpwrak> the FPU array would just add a fixed delay to the camera pixel pipeline

13:06 <wpwrak> "fixed" as in either worst-case (i.e., size of program space) or as in always the same for the same program, but variable across programs
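
[Editor's sketch] Why unpredictable memory latency matters on a pixel stream can be shown with a toy cycle-by-cycle model. Everything here is an assumption for illustration: the name `simulate_stream`, a source that produces one pixel per cycle and cannot be paused (like a camera), and a sink that is only ready on some cycles (like a DMA writer waiting on DRAM).

```python
from collections import deque


def simulate_stream(pixels, sink_ready, fifo_depth):
    """Model a source that cannot be paused feeding a sink that stalls.

    pixels: values produced, one per cycle, starting at cycle 0.
    sink_ready: one boolean per cycle, True if the sink can accept a pixel.
    fifo_depth: number of pixels the buffer in between can hold.
    Returns (delivered, dropped).
    """
    fifo = deque()
    delivered, dropped = [], 0
    source = iter(pixels)
    for ready in sink_ready:
        # Sink side: take at most one pixel per cycle, only when ready.
        if ready and fifo:
            delivered.append(fifo.popleft())
        # Source side: the pixel arrives whether or not there is room.
        px = next(source, None)
        if px is not None:
            if len(fifo) < fifo_depth:
                fifo.append(px)
            else:
                dropped += 1
    return delivered, dropped
```

With a two-cycle stall, a depth of 2 loses a pixel while a depth of 3 does not. A fixed processing delay in the pipeline only shifts the stream in time; it is the variable stall on the memory side that the buffer (or real back-pressure, where the source can be paused) has to absorb.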

15:28 <zumbi> hi guys! do you have a pointer for the copyright license for milkymist hardware?

15:31 <lekernel> what hardware?

15:31 <lekernel> FPGA? PCB? box?

15:40 <zumbi> lekernel: uhm.. PCB

15:40 <lekernel> cc-by-sa

18:20 <lekernel_> http://slashdot.org/submission/1824054/open-source-cpus-coming-to-a-club-near-you

18:46 <kristianpaul> first time i read cpu and club in the same title :)

18:48 <lekernel_> yes, I'm trying to write noticeable headlines ;)

19:28 <lekernel_> ok it should be in soon :)

20:06 <lekernel_> there! http://hardware.slashdot.org/story/11/10/20/193245/open-source-cpus-coming-to-a-club-near-you

20:06 <lekernel_> is getting better at PR

20:09 <kristianpaul> place and route? :-)

20:10 <wpwrak> place the news and route the readers ;-)

20:33 <rigid> just saw the project. nice! (i love the transparent case :)

20:33 <rigid> what s/w are you using? is this somehow related to milkdrop?

20:33 <rigid> crawls the webpage

20:34 <rigid> ah.. yep

22:12 <lekernel_> hi rigid

22:13 <lekernel_> own software, and yes, lots of milkdrop inspirations

22:25 <rigid> lekernel_: looks very nice... and great it's open

22:26 <rigid> i'm looking for open audiovisualization software to use them with LEDs

23:41 <juliusb> nice slashdot article

23:44 <juliusb> lekernel_: I like your presentation about OS FPGA toolchains

23:45 <juliusb> are you doing anything like that in the UK any time soon? I've moved to Cambridge, btw, and am planning on doing things with the OSHUG here - if you're not too far away you might like to come and get some of those guys interested