lekernel changed the topic of #m-labs to: Mixxeo, Migen, MiSoC & other M-Labs projects :: fka #milkymist :: Logs http://irclog.whitequark.org/m-labs
rjo_ is now known as rjo
mumptai has quit [Ping timeout: 240 seconds]
xiangfu has joined #m-labs
xiangfu has quit [Ping timeout: 265 seconds]
sh4rm4 has quit [Remote host closed the connection]
sh4rm4 has joined #m-labs
sh[4]rm4 has joined #m-labs
sh4rm4 has quit [Ping timeout: 252 seconds]
sh[4]rm4 is now known as sh4rm4
sb0 has quit [Quit: Leaving]
Alain has joined #m-labs
Alain has quit [Read error: Connection reset by peer]
Alain has joined #m-labs
kflux has joined #m-labs
mumptai has joined #m-labs
xiangfu has joined #m-labs
kiritan has joined #m-labs
kflux has quit [Ping timeout: 244 seconds]
xiangfu has quit [Ping timeout: 265 seconds]
Alain has quit [Remote host closed the connection]
Alain_ has joined #m-labs
sh4rm4 has quit [Ping timeout: 252 seconds]
Gurty has quit [Ping timeout: 240 seconds]
Gurty has joined #m-labs
xiangfu has joined #m-labs
kiritan has quit [Read error: Operation timed out]
sh4rm4 has joined #m-labs
xiangfu has quit [Quit: leaving]
Alain_ has quit [Remote host closed the connection]
Alain has joined #m-labs
awallin_ has joined #m-labs
<awallin_> hi all, was wondering if anyone has worked with the tdc-core? in particular it would be interesting to put it on a Papilio Pro spartan 6 dev-board. I think I saw a github repo with something like that..
sb0 has joined #m-labs
<sb0> awallin_, which TDC core? SERDES or delay line based?
kflux has joined #m-labs
<awallin_> sb0: hi, I did not know there were two!? I just read about the one on ohwr.org
<awallin_> how do they differ in architecture and performance?
<sb0> that's the delay line based then
<sb0> delay line has ~25ps resolution but it'll be difficult to fit in lx9 (though possible if you multiply the clock and use a shorter delay line, which might also improve resolution, but will take work)
<sb0> serdes is easier to use and much simpler and smaller, but has a bit less than 1ns resolution
<sb0> rjo has a migen version of the serdes tdc
<awallin_> that is all python code which is converted to vhdl?
<sb0> the serdes tdc I wrote is vhdl
<sb0> rjo wrote a similar one with migen
<awallin_> hm, I am looking at http://www.xilinx.com/publications/prod_mktg/Spartan6_Product_Table.pdf it seems the SPEC has an LX45 chip with maybe 5x more resources than LX9? is that roughly correct?
<sb0> with the delay line tdc, the problem is the height of the device column where you'll have to fit the delay line
<sb0> I don't think it'll fit as-is in anything smaller than lx45
<sb0> not because the device is full, but because the carry chains are not long enough
<sb0> so you'd need to shorten it, and multiply the clock
<awallin_> ok. is there a description of the serdes approach somewhere? is it in principle possible to get a resolution similar to delay-line with serdes?
<sb0> delay line will always be more precise than serdes by orders of magnitude
<awallin_> what about the xilinx tools for large FPGAs? I heard some large devices require an expensive paid version of xilinx ISE?
<mumptai> only for ones bigger than LX75 for spartan6
<awallin_> ah, ok, so LX45 is still ok. I guess ohwr/SPEC users would not be happy otherwise :)
<awallin_> so maybe I need an LX45 dev-board then if I want to play with the delay-line tdc..
<sb0> serdes is just sampling the incoming signal at ~1GHz and detecting edges
<sb0> awallin_, what's your application by the way?
<sb0> the delay line might fit in lx9, if you multiply the clock
<sb0> and note that the slowtan6 PLLs have a lot of jitter (>100ps) so for high resolution you'll need to add an external, better PLL chip and deal yourself with the phase-alignment of the clocks
<awallin_> just think it would be nice to build an open-hardware/software time-interval/frequency counter..
<sb0> though you might still have luck with the internal PLLs...
<awallin_> PLLs: you mean you generate a 100/200 MHz external clock and feed it as input on some FPGA pin to clock parts of the fpga-circuit?
<sb0> in lx9, you'll need a short carry chain
* awallin_ away for a while..
<sb0> which you will sample at a high frequency, typically all that the slowtan6 clock network will give you
<sb0> but your whole circuit will not run at that frequency, so you'll need to deserialize that sampled data with a lower frequency phase-aligned clock
<sb0> and you'll need a PLL for those clocks
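A rough plain-Python illustration of the SERDES idea sb0 describes (not rjo's migen core or sb0's VHDL one; the function name and the 8:1 ratio are made up): the deserialized word of ~1 GHz samples arrives once per system clock, and the index of the first 0->1 transition inside it is the fine part of the timestamp.

    def find_rising_edge(samples, last_bit):
        """samples: list of 0/1 from the deserializer, oldest first;
        last_bit: final sample of the previous word, so edges on the
        word boundary are not missed. Returns the index of the first
        rising edge (fine timestamp in sample periods), or None."""
        prev = last_bit
        for i, s in enumerate(samples):
            if s == 1 and prev == 0:
                return i
            prev = s
        return None

    # e.g. with an 8:1 deserialization ratio and a ~1 GHz sample clock,
    # index 3 means the edge landed ~3 ns into this system-clock cycle
    assert find_rising_edge([0, 0, 0, 1, 1, 1, 1, 1], last_bit=0) == 3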
<sb0> you can also not use a carry chain and implement the delays with LUTs/routing
<sb0> but you'll need a lot of difficult low-level work
<sb0> (in spartan6, the carry chains need to be vertically stacked and are relatively fast, so the device has to be high enough to accommodate one with a total delay longer than the system clock period)
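For contrast, the delay-line readout amounts to seeing how far the hit propagated along the carry chain before the sampling clock edge. A toy sketch of that thermometer-code idea (not the actual ohwr tdc-core, which also has to handle bubbles, calibration and so on):

    def thermometer_to_fine(taps):
        """taps: registered carry-chain tap outputs, element 0 = first tap.
        Returns how many taps the signal reached, i.e. the fine time in
        units of one carry-element delay (a few tens of ps)."""
        count = 0
        for t in taps:
            if not t:
                break          # real cores must also tolerate bubbles here
            count += 1
        return count

    assert thermometer_to_fine([1, 1, 1, 1, 0, 0, 0, 0]) == 4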
<awallin_> that sounds complicated :)
<awallin_> maybe best to get started with LX45 and the current code as-is then?
<sb0> yeah, maybe it's better for you to use the serdes-tdc first
<sb0> the delay line tdc code is also not all that simple to use, even on lx45
<awallin_> have you tried it on anything else than a SPEC ?
<sb0> hmm, maybe the synchrotron soleil people did
<awallin_> so is there a well-supported cheap LX45 dev-board around?
<sb0> sp601
<sb0> or mixxeo ;)
<sb0> though the latter won't be "cheap"
<awallin_> the sp601 I find in digikey is listed as having an LX16
<sb0> sp605 sorry
<awallin_> hmh that costs as much as a SPEC :)
<awallin_> sb0: what's the MIXXEO going to be used for primarily? real-time mixing on live tv?
<sb0> more for events
kflux has quit [Ping timeout: 244 seconds]
kflux has joined #m-labs
littlebab has joined #m-labs
Alain has quit [Quit: ChatZilla 0.9.90.1 [Firefox 27.0.1/20140212131424]]
furan has joined #m-labs
<furan> hi any fpga people around?
<sb0> yes
<furan> do you know much about the hardware graphics (scaling/etc) code?
kflux has quit [Ping timeout: 240 seconds]
<sb0> in milkymist soc? since I wrote most of it, yes
<furan> cool
<furan> I'm self taught in FPGA stuff, have made several things, but graphics hardware logic kind of stumps me.
<furan> like for the scaling filter, I would expect there to be a module that takes access to the bus with some parameters and walks through doing its thing, but instead you seem to have modules for memory traversal that call into simpler modules that do the kernel.
<furan> can you tell me why it was implemented that way?
<sb0> huh?
<sb0> are you talking about milkymist soc, from http://m-labs.hk/m1.html ?
<furan> yeah
<sb0> there's no scaling filter, there's a texture mapping unit
<sb0> there's a FSM that fetches vertex data from the memory and pushes it into the pipeline
<furan> I thought there was some filter that came with a cool VPI for testbench
<sb0> you're probably talking about the TMU test bench, yes
<sb0> you can do scaling with the TMU
<sb0> but it's not a 'kernel', just texture coordinates
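In other words, a scale is just a texture-mapped rectangle: the destination corners carry source texture coordinates and the TMU interpolates everything in between. A toy illustration of the corner mapping (hypothetical helper, not the TMU's actual vertex or register format):

    def scale_corner_coords(src_w, src_h, dst_w, dst_h):
        """Return (dst_x, dst_y) -> (tex_x, tex_y) pairs for the four
        corners of a full-frame scale; the hardware fills in the rest."""
        corners = [(0, 0), (dst_w, 0), (0, dst_h), (dst_w, dst_h)]
        return [((x, y), (x * src_w // dst_w, y * src_h // dst_h))
                for x, y in corners]

    # scale a 640x480 source up to 1024x768
    print(scale_corner_coords(640, 480, 1024, 768))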
<furan> gotcha
<furan> can you explain to me your graphics pipeline flow?
<sb0> have you read this? http://m-labs.hk/thesis/thesis.pdf
<sb0> most of the TMU is explained there
<furan> thanks
<furan> is tmu texture management unit?
<sb0> texture mapping unit
<furan> I agree with you about open research papers; they're how I've learned my whole adult life.
<furan> I've made my own organic light emitting diodes and I think that says a lot for the value of open research papers.
ramzes has quit [Ping timeout: 264 seconds]
ramzes has joined #m-labs
<sb0> furan, do you have a web page about that?
<furan> yeah
<furan> your sdram explanation is really good
<furan> and is making me think I could optimize read-modify-write operations by keeping a row open for the duration
<furan> that's another thing, I see a lot of fifo usage with memory controllers, where the controller itself is not so close to the modules which manipulate it
<furan> whereas doing the kind of optimization above would block other modules on the bus from accessing memory for the duration
<furan> sorry if I'm bugging you I just don't know many people who do this stuff
<sb0> hpdmc is already keeping rows open
<sb0> lasmicon still does
<furan> oh so that is a thing
<sb0> the problem with back-to-back rmw is that the read and write operations have different latencies and there's also a IO bus turnaround time
<sb0> plus a write recovery time
<furan> recovery time = time to precharge state?
<sb0> so you typically want to read a lot, then write a lot
<sb0> yes
<furan> ah, hence the streaming/fifo designs
<sb0> or to read
<sb0> you cannot read data you've just written
<furan> yeah it would require a flush
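A toy cost model of why grouping wins (the penalty value is made up, and this is not hpdmc's or lasmicon's actual scheduler): every read/write direction switch pays the bus turnaround and write-recovery penalty, so N reads followed by N writes wastes far fewer cycles than alternating them.

    TURNAROUND = 6    # made-up penalty, in cycles, for a read<->write switch

    def dead_cycles(commands):
        """commands: sequence of 'R'/'W' accesses to an already-open row."""
        penalty = 0
        for prev, cur in zip(commands, commands[1:]):
            if prev != cur:
                penalty += TURNAROUND
        return penalty

    interleaved = ['R', 'W'] * 8          # read-modify-write one word at a time
    grouped     = ['R'] * 8 + ['W'] * 8   # read a lot, then write a lot
    assert dead_cycles(interleaved) > dead_cycles(grouped)
    # 15 switches * 6 = 90 dead cycles vs. 1 switch * 6 = 6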
<furan> so with sdram it really makes sense to do a sort of tiling thing where you read into a tile in fpga bram, do the operations, and then copy that tile back to sdram
<sb0> yeah, caching basically
<sb0> when the access pattern isn't fully predictable
<sb0> or you read/write the data multiple times in a short interval
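A sketch of that tiling pattern in plain Python, with list slices standing in for SDRAM bursts and the local copy standing in for a BRAM tile buffer (purely illustrative):

    def process_tiled(framebuffer, width, tile, operation):
        """One long read burst and one long write burst per tile,
        instead of interleaved single-word read-modify-writes."""
        for start in range(0, width, tile):
            local = framebuffer[start:start + tile]     # burst read into "BRAM"
            local = [operation(px) for px in local]     # work on the local copy
            framebuffer[start:start + tile] = local     # burst write back
        return framebuffer

    fb = list(range(16))
    assert process_tiled(fb, 16, 4, lambda px: px * 2) == [2 * i for i in range(16)]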
<sb0> nice oled hack. maybe you'll like http://ehsm.eu :)
<furan> does the milkymist soc's sdram controller optimize keeping a row open for multiple operations when the word size used is larger than the word size in the sdram?
<furan> thanks :)
<furan> wow that looks cool
<sb0> sdram rows are big - much bigger than the words
<furan> I (somehow?) got invited to a conference called "Hackers" that was started by the folks written about in Steven Levy's Hackers book
<furan> super high bandwidth conversations
<furan> but I had to speak my first year and I made a fool of myself by talking about how I was going to make OLEDs but not knowing enough yet in front of people like Donald Knuth, the guy who invented the Furby (can't remember his name), and the woot.com founder :P
<sb0> you should bring some OLEDs to EHSM :)
<sb0> now that they work
<sb0> last year Ben Krasnow was one of the speakers - he's doing ITO deposition in his home lab now
<furan> yeah I think I like sdram now. for a while I've been thinking about it as slow and taking many clocks but now I'm thinking it enforces good design, and a lot of optimization can be done with the rows.
<sb0> he might come again this year
<furan> yeah I've told him I'll send him some Alq3 to do resistive oled emitter deposition but I haven't sent it yet (oops)
<furan> I hate shipping stuff and he's in the same city I am every work week.
<sb0> yeah, making SDRAM go fast is learning the hard way :-)
<furan> lol
<sb0> if you want to make your open source GPU, the Milkymist SoC TMU would be a good starting point
<sb0> though I really recommend you use migen for that, because a lot of structures are recurring, and you could copy them with a few lines of python code instead of massive verilog/vhdl copy and paste
<sb0> you'd need to use triangle interpolation instead of the squares it uses atm, add perspective correction, mipmapping, etc.
<sb0> blending of colors (not only texture coordinates)
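A small illustration of the migen point above (a made-up pipeline, not the TMU): a recurring structure is just a Python loop, so changing the number of stages or their width is a one-line edit instead of a verilog copy-and-paste session.

    from migen import *

    class ToyPipeline(Module):
        def __init__(self, width=16, stages=4):
            self.i = Signal(width)
            self.o = Signal(width)
            prev = self.i
            for _ in range(stages):            # generate N identical register stages
                stage = Signal(width)
                self.sync += stage.eq(prev + 1)
                prev = stage
            self.comb += self.o.eq(prev)

    if __name__ == "__main__":
        from migen.fhdl.verilog import convert
        p = ToyPipeline()
        print(convert(p, ios={p.i, p.o}))      # elaborate to Verilog to check it builds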
<furan> well I have made some kind of retarded ones so far, that just do arbitrary walks and 2d/rop operations based on a bunch of parameter registers. I used it to control an LED matrix core I wrote.
<furan> I'd like to rebuild that and get it working, and then graduate to a 3d pipeline
<sb0> a Z-buffer too
<sb0> all those things are built on the same principles as the current TMU
<furan> nods
<sb0> the most painful thing is the triangle interpolation, and maybe mipmapping, as you'll need to process 8 cache accesses per cycle
<sb0> for trilinear filtering
<furan> well I like to build things from scratch but I'll take a lot of knowledge out of looking at your designs
<sb0> current cache is only 4 accesses
<sb0> now that I think of it, upgrading to 8 wouldn't be too much of a hassle, just use more BRAM
<furan> does that mean you have an 8-way ported memory?
<sb0> yeah
<sb0> but there's a trick
<sb0> since you need multiport for read only
<sb0> you can have several memories and duplicate the contents
<furan> that's nuts
<sb0> well, the current TMU design already does that, since I need a 4-port memory
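A minimal migen sketch of that replication trick (not the TMU's actual texel cache): one BRAM copy per read port, with every write broadcast to all copies so the contents stay identical and each port reads its own private memory.

    from migen import *

    class ReplicatedMemory(Module):
        def __init__(self, width=32, depth=512, nports=4):
            self.we = Signal()
            self.wadr = Signal(max=depth)
            self.wdat = Signal(width)
            self.radr = [Signal(max=depth) for _ in range(nports)]
            self.rdat = [Signal(width) for _ in range(nports)]

            for i in range(nports):
                mem = Memory(width, depth)      # one BRAM copy per read port
                wp = mem.get_port(write_capable=True)
                rp = mem.get_port()
                self.specials += mem, wp, rp
                self.comb += [
                    wp.we.eq(self.we),          # every write goes to all copies
                    wp.adr.eq(self.wadr),
                    wp.dat_w.eq(self.wdat),
                    rp.adr.eq(self.radr[i]),    # each port reads its own copy
                    self.rdat[i].eq(rp.dat_r),
                ]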
<furan> I was thinking if your walks were aligned towards row/column you could have 8 horizontal or vertical scan buffers
<furan> instead of duplication across brams
<sb0> hmm, the mipmap levels might change during the scanning of a single triangle
<furan> what would that cause?
<sb0> only a bit of further annoyance :) you already cannot escape the 4-port memory for bilinear filtering
<sb0> mipmapping is about fetching 2 textures instead of one
<sb0> so you'd have two of those 4-port caches operating in parallel, or a single 8-port cache
<furan> nods
<sb0> both will use the same amount of BRAM at roughly the same performance level. the single cache might give slightly better performance when the mipmap levels change and what used to be the mipmap level of one way becomes that of the other.
<sb0> unless, of course, the cache data of one mipmap way gets trashed by the other way, which you need to be careful to avoid
<furan> if you're talking about mipmap level changing that means you keep these caches for a long time, not just for the duration of a frame
<sb0> (if possible at all. I never actually implemented mipmapping. maybe 2 separate caches is the best option for this reason)
<sb0> as I said mipmap levels can change during the texturing of a single triangle
<furan> because part of it could be 'nearer'?
<sb0> yes
<furan> I didn't realize mip mapping had that granularity, got it
<sb0> you're also averaging the output of the two mipmap levels
<furan> alright I am gonna get back to the reading about the sdram controller before I page too much out
<sb0> you do an average weighted by the distance between the ideal (fractional) mipmap level and the two discrete mipmap levels you have
<sb0> that's why it's called trilinear filtering
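In pseudocode the per-pixel blend is simply the following, with bilinear() standing in for the existing 4-tap filter (hypothetical names, not the TMU's actual datapath):

    import math

    def trilinear(bilinear, u, v, level):
        """level is the ideal (fractional) mipmap level; blend the bilinear
        results of the two surrounding discrete levels by its fraction."""
        lo = int(math.floor(level))
        hi = lo + 1
        frac = level - lo
        return (1.0 - frac) * bilinear(u, v, lo) + frac * bilinear(u, v, hi)

    # e.g. level 2.25 -> 75% of mipmap 2, 25% of mipmap 3
    fake_bilinear = lambda u, v, l: float(l)      # stand-in for the 4-tap filter
    assert trilinear(fake_bilinear, 0.5, 0.5, 2.25) == 0.75 * 2 + 0.25 * 3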