#milkymist on 2013-05-08 — irc logs at freenode.irclog.whitequark.org

2013-02-02 10:11 lekernel changed the topic of #milkymist to: Milkymist One, Migen, Milkymist SoC & Flickernoise :: Logs: http://en.qi-hardware.com/mmlogs

01:32 antgreen has quit [Ping timeout: 248 seconds]

01:45 antgreen has joined #milkymist

06:55 bhamilton has joined #milkymist

07:03 <stekern> I've been playing with running openrisc (mor1kx) on M1 again, this time using milkymist-ng. I've made the stuff available here: https://github.com/skristiansson/milkymist-ng-mor1kx/ if someone might be interested.

07:05 <stekern> it was actually pleasantly painless to drop it into milkymist-ng, most changes are of sw nature

07:09 bhamilton has quit [Quit: Leaving.]

07:43 bhamilton has joined #milkymist

08:21 bhamilton has quit [Quit: Leaving.]

08:23 bhamilton has joined #milkymist

08:30 lekernel has joined #milkymist

08:39 <lekernel> stekern, cool

08:39 <lekernel> still able to run at 83MHz?

08:40 <lekernel> seems so... :)

08:43 <lekernel> Number of Slice LUTs: 6,766 out of 27,288 24%

08:43 <stekern> yup, it's not very small (yet)

08:44 <stekern> a couple of features can be omitted (like the internal timer, overflow exceptions, add with carry etc)

08:45 <lekernel> that's for the whole SoC - doesn't look bad...

08:45 <lekernel> let me get the precise number for the lm32 version

08:45 <stekern> I got ~1700 slices for mor1kx and ~700 for lm32 I think

08:47 <lekernel> Number of Slice LUTs: 4,700 out of 27,288 17%

08:53 <lekernel> so, 2K LUTs to go ... :p
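Side note: the "2K LUTs to go" figure follows directly from the two synthesis reports quoted above (6,766 vs 4,700 of 27,288 slice LUTs). A quick check in Python; the percentages printed by the tools appear to be truncated, which `int()` reproduces:

```python
# Slice LUT counts from the two full-SoC synthesis reports quoted above
TOTAL_LUTS = 27_288
soc_luts = {"mor1kx": 6_766, "lm32": 4_700}

def utilization(luts: int, total: int = TOTAL_LUTS) -> float:
    """Share of the device's slice LUTs used, in percent."""
    return 100.0 * luts / total

for cpu, luts in soc_luts.items():
    print(f"{cpu}: {luts} LUTs = {utilization(luts):.1f}%")

delta = soc_luts["mor1kx"] - soc_luts["lm32"]
print(f"difference: {delta} LUTs")  # difference: 2066 LUTs
```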

08:53 <stekern> yep ;)

08:54 <stekern> I'll update the configuration so it matches lm32 more closely, but I think lm32 still beats it

08:55 <lekernel> you can try this too

08:55 <lekernel> http://www.eecs.umich.edu/mibench/

08:56 <stekern> the or1k architecture has a lot of special registers that need quite some space

08:56 <lekernel> can't push them into BRAM?

08:57 <lekernel> if they are seldom used, multi-cycle access may be acceptable

08:57 <stekern> I've been meaning to run those benchmarks, up until now I've only run coremark and dhrystone

08:58 <lekernel> btw one thing I want to do with migen is automatic virtual BRAM ports using multiplied + phase aligned clocks

08:58 <stekern> yeah, pushing them into BRAM is something that could be done to some of them, but the whole address space is a bit annoying (basically they are divided into groups)

08:58 <lekernel> BRAM is fast, several hundred MHz, while the rest of the fabric is the slowness pig we know

08:59 <lekernel> so you could easily have a 4-port BRAM out of a 2-port BRAM with 2x clock multiplication for many designs
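Side note: the virtual-port idea can be modeled behaviorally. The sketch below assumes a true dual-port RAM clocked at twice the system clock, with physical ports A/B serving logical ports 0/1 in the first fast cycle and 2/3 in the second. The class names and the read-first collision behavior are illustrative assumptions, not Migen code; a real design also needs the phase-aligned clocks and capture registers lekernel mentions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Access:
    addr: int
    we: bool = False  # write enable
    data: int = 0     # write data (ignored for reads)

class DualPortRAM:
    """Behavioral true dual-port RAM: one access per port per fast clock edge."""
    def __init__(self, depth: int):
        self.mem = [0] * depth

    def fast_cycle(self, a: Optional[Access], b: Optional[Access]) -> List[Optional[int]]:
        # Simplified read-first model: both ports read old contents, then writes land.
        reads = [self.mem[x.addr] if x is not None else None for x in (a, b)]
        for x in (a, b):
            if x is not None and x.we:
                self.mem[x.addr] = x.data
        return reads

class QuadPortRAM:
    """Four logical ports from one dual-port RAM run at 2x the system clock."""
    def __init__(self, depth: int):
        self.ram = DualPortRAM(depth)

    def system_cycle(self, ports: List[Optional[Access]]) -> List[Optional[int]]:
        assert len(ports) == 4
        r0, r1 = self.ram.fast_cycle(ports[0], ports[1])  # phase 0
        r2, r3 = self.ram.fast_cycle(ports[2], ports[3])  # phase 1
        return [r0, r1, r2, r3]

q = QuadPortRAM(16)
first = q.system_cycle([Access(1, True, 0xAA), Access(2, True, 0xBB), None, None])
second = q.system_cycle([Access(1), Access(2), Access(1), Access(3)])
print(second)  # [170, 187, 170, 0]
```

With this ordering, a write issued on a phase-0 port is already visible to a phase-1 read in the same system cycle. A hardware version would have to pick and document such a convention explicitly.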

09:10 stekern_ has joined #milkymist

09:14 balrog_ has joined #milkymist

09:14 stekern has quit [*.net *.split]

09:14 balrog has quit [*.net *.split]

09:16 balrog_ is now known as balrog

09:16 stekern_ is now known as stekern

09:19 bhamilton1 has joined #milkymist

09:19 bhamilton has quit [Read error: Connection reset by peer]

09:27 <stekern> hmm, aren't bram outputs usually slower than register outputs?

09:29 <lekernel> there's some clock-to-output delay, yes

09:31 <lekernel> on slowtan6 it's 2.10ns, or 1.75ns if you enable the output register (ie reads take 2 cycles, pipelined)

09:31 <lekernel> setup/hold are all under 1ns

09:32 <lekernel> and you can clock at 280MHz max

09:32 <lekernel> most designs aren't that fast

09:33 <lekernel> this output register seems pretty useless, if all you get is 0.35ns of extra time at the output at the expense of one more cycle of latency ...
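Side note: the arithmetic behind "0.35ns of extra time" uses the Spartan-6 BRAM figures quoted above. Relating the saving to the 83 MHz system clock mentioned earlier in the log (that pairing is an editorial assumption) shows how small the gain is:

```python
# Spartan-6 BRAM figures quoted above, in ns
T_CO_NO_REG = 2.10  # clock-to-output, output register disabled
T_CO_REG = 1.75     # clock-to-output, output register enabled (1 extra cycle of latency)
F_MAX_BRAM = 280e6  # maximum BRAM clock, Hz

def period_ns(freq_hz: float) -> float:
    return 1e9 / freq_hz

saving = T_CO_NO_REG - T_CO_REG
system_period = period_ns(83e6)  # assumed: the 83 MHz SoC clock mentioned earlier

print(f"output-register saving: {saving:.2f} ns")
print(f"83 MHz period: {system_period:.2f} ns, BRAM limit: {period_ns(F_MAX_BRAM):.2f} ns")
print(f"saving as share of an 83 MHz cycle: {100 * saving / system_period:.1f}%")
```

The saving comes to about 2.9% of a 12.05 ns cycle, paid for with a full cycle of read latency.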

09:37 <lekernel> you certainly get a better deal by registering outside the BRAM

09:39 <lekernel> another crippled S6 feature it seems...

09:40 <stekern> it's only for marketing, "with built-in registers" ;)

09:43 <stekern> of course, if you need the result registered and the output delay isn't a problem, you'd benefit from them, but the use cases sound a bit restricted, yes

10:13 fpgaminer has quit [Read error: Connection reset by peer]

10:14 fpgaminer has joined #milkymist

12:18 antgreen has quit [Ping timeout: 248 seconds]

13:14 bhamilton1 has quit [Quit: Leaving.]

13:16 bhamilton has joined #milkymist

13:57 <Fallenou_> lekernel: I don't remember the reason why Milkymist(-ng or not) SoC uses a lm32 core configured with 512 B of I and D cache, knowing that caches can go up to 32 kB on lm32

14:00 <Fallenou_> keeping the caches below or equal to 4 kB is indeed helping for the MMU part (no cache alias problem)

14:04 <Fallenou_> in some of your slides there is a graph of cache hit probability, you seemed to have chosen 32 kB at that time in order to get a 95% cache hit rate

14:05 <Fallenou_> but I remember you had synthesis issues with big caches as well ...

14:05 <Fallenou_> on the other hand 512 B seems small, when you know you can go up to 4 kB without risking any cache aliasing (caused by VIPT cache)

14:10 Fallenou_ is now known as Fallenou

14:16 bhamilton has quit [Quit: Leaving.]

14:19 bhamilton has joined #milkymist

14:54 <lekernel> it's not 512 bytes, it's 256*16 bytes

14:54 <lekernel> so 4K
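Side note: the two points above can be checked together. One cache way covers sets × line size bytes (256 × 16 = 4096, versus the mistaken 256 × 2 = 512). A VIPT cache can only alias when one way is larger than a page, because only then do index bits come from above the page offset. The sketch assumes power-of-two sizes; the function names are illustrative.

```python
def cache_way_bytes(sets: int, line_bytes: int) -> int:
    """Bytes covered by one way of a (set-associative) cache."""
    return sets * line_bytes

def vipt_may_alias(way_bytes: int, page_bytes: int) -> bool:
    """A VIPT cache can alias when index+offset bits reach above the page offset,
    i.e. when a single way is larger than one page."""
    return way_bytes > page_bytes

way = cache_way_bytes(sets=256, line_bytes=16)
print(way)                        # 4096
print(vipt_may_alias(way, 4096))  # False: no aliasing with 4 kB pages
print(cache_way_bytes(256, 2))    # 512: the 256*2 mix-up
```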

14:54 <lekernel> big caches cause timing problems on slowtan6

14:57 <Fallenou> right, I mixed up things, I did 256*2 instead of *16 ... (habit of converting 16 bits to 2 bytes ...)

14:57 <Fallenou> ok so 4K, perfect :)

14:59 <lekernel> I might want larger caches when moving to a less slow FPGA...

14:59 <Fallenou> it would be cool to allow software to read cache size then

15:00 <Fallenou> for the OS to adapt and handle cache aliasing issues when they are possible

15:00 <lekernel> send a patch :)

15:00 <Fallenou> the SH4 CPU allows reading the cache size, for instance

15:00 <Fallenou> I won't hesitate ;)

15:00 <Fallenou> it may end up in CFG2 or maybe in a CFG3

15:01 <Fallenou> for now, I will only handle the current cache configuration hard coded in NetBSD kernel

15:01 <Fallenou> first things first :)

15:04 <GitHub97> [NetBSD] fallen pushed 2 new commits to master: http://git.io/rq36JA

15:04 <GitHub97> NetBSD/master ed27bd4 Yann Sionneau: Move TLB helpers into cpu.h

15:04 <GitHub97> NetBSD/master 7be2287 Yann Sionneau: Update TODO

15:06 <GitHub46> [NetBSD] fallen pushed 1 new commit to master: http://git.io/ZPUoqA

15:06 <GitHub46> NetBSD/master 81a01e8 Yann Sionneau: Add implementation of pmap.9 MD functions...

15:09 <lekernel> the figure you are talking about is about the TMU cache, not the CPU cache

15:10 <Fallenou> oh, ok

15:10 <Fallenou> 95% seemed a bit high :)

15:35 <stekern> that's interesting, I would have expected the opposite, larger cache, fewer tag bits to compare against, fewer timing problems.
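Side note: stekern's intuition can be quantified. Assuming 32-bit addresses, the tag width is the address width minus the index and offset bits that one way covers. Each doubling of the way size removes only one comparator bit, while the BRAM count keeps growing. That growth is the effect lekernel points to next.

```python
ADDR_BITS = 32  # assumption: 32-bit addresses

def tag_bits(way_bytes: int, addr_bits: int = ADDR_BITS) -> int:
    """Tag width = address bits minus log2(bytes per way) (index + offset bits).
    Assumes way_bytes is a power of two."""
    return addr_bits - (way_bytes.bit_length() - 1)

for kb in (4, 8, 16, 32):
    print(f"{kb:2d} kB per way: {tag_bits(kb * 1024)} tag bits")
```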

15:37 <stekern> Fallenou: on or1k, you can read the cache size out of an spr, but then we are 2000 luts larger than lm32 too ;)

15:38 <lekernel> you need more BRAMs too

15:39 <lekernel> so they spread over more area on the chip, and then the particularly slow S6 routing does the rest ...

15:40 antgreen has joined #milkymist

15:42 <Fallenou> stekern: that's convenient :)

15:46 <stekern> lekernel: ah, yeah, that of course makes sense, several brams might slow things down

15:50 <stekern> (aliasing) that's another thing that's slightly annoying in or1k, you max out at 16kB with a 2-way cache if you don't want to worry about it

15:51 <stekern> what page size did you decide on in the end?

15:53 <Fallenou> for now I'm going for 4 kB pages

15:53 <Fallenou> it seems to be the size used almost everywhere (except "big pages" options and such)

16:06 <stekern> oh, so you're actually worse off in that regard ;)
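Side note: here is why 4 kB pages leave you "worse off". A VIPT cache avoids aliasing only if each way fits in one page, so the largest alias-free cache is ways × page size. The function name is illustrative; the 2-way, 8 kB-page case is the or1k/mor1kx configuration stekern describes.

```python
KB = 1024

def max_alias_free_bytes(ways: int, page_bytes: int) -> int:
    """Largest VIPT cache that cannot alias: each way may be at most one page."""
    return ways * page_bytes

print(max_alias_free_bytes(2, 8 * KB) // KB)  # 16: 2-way, 8 kB pages (mor1kx)
print(max_alias_free_bytes(2, 4 * KB) // KB)  # 8: same 2-way cache, 4 kB pages
print(max_alias_free_bytes(1, 4 * KB) // KB)  # 4: direct-mapped, 4 kB pages
```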

16:08 <Fallenou> when you say 16 kB, is it total cache size ? taking into account the associativity ?

16:08 <stekern> what are the benefits of having a smaller cache size, for me it was already predefined to 8kB when I did the mmus for mor1kx, so I haven't given it much thought

16:09 <stekern> err, smaller page size

16:09 <Fallenou> you get finer-grained management of your virtual memory

16:09 <Fallenou> so less fragmentation I would say

16:10 <Fallenou> I mean, in a few places the kernel allocates in multiples of the page size

16:10 <Fallenou> but it does not need that much memory usually (8 kB or 16 kB)
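Side note: the fragmentation argument comes down to internal waste. Rounding an allocation up to whole pages wastes up to one page minus one byte, so larger pages waste more on small or odd-sized requests. The request sizes below are hypothetical examples, not figures from NetBSD.

```python
KB = 1024

def pages_needed(nbytes: int, page_bytes: int) -> int:
    return -(-nbytes // page_bytes)  # ceiling division

def wasted_bytes(nbytes: int, page_bytes: int) -> int:
    """Internal fragmentation when an allocation is rounded up to whole pages."""
    return pages_needed(nbytes, page_bytes) * page_bytes - nbytes

for request in (1 * KB, 9 * KB, 12 * KB):  # hypothetical request sizes
    # wasted kB with 4 kB pages vs. 8 kB pages
    print(request // KB, "kB:", [wasted_bytes(request, p * KB) // KB for p in (4, 8)])
```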

16:11 <Fallenou> But personally I didn't give the page size a big thought

16:12 <stekern> yeah, that's obvious of course, but is there something else? perhaps that's reason enough though

16:12 <Fallenou> I took 4 kB for granted because it's almost everywhere in the literature

16:12 <Fallenou> for instance, do recent Linux kernels for x86 use bigger pages? (for Ubuntu, Debian etc)

16:13 <Fallenou> I know there is an option for that, but I don't know if it's checked or not

16:13 <stekern> (16kB) yes, total cache, 2*8kB

16:13 <GitHub177> [milkymist-ng] sbourdeauducq pushed 1 new commit to master: http://git.io/b65uyw

16:13 <GitHub177> milkymist-ng/master 8e76c96 Sebastien Bourdeauducq: timer, uart: EventSourceLevel -> EventSourceProcess

16:14 <GitHub42> [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/f9HVcQ

16:14 <GitHub42> migen/master b9b6df6 Sebastien Bourdeauducq: bank/eventmanager: refactor, rename EventSourceLevel -> EventSourceProcess, add fully externally controlled event source

16:14 <stekern> you could of course do more ways, but then the replacement logic becomes a lot more complicated

16:15 <Fallenou> well not that much, if you use round robin for instance

16:15 <stekern> at least if you use lru

16:15 <Fallenou> but it's still the same issue of routing more block rams etc

16:16 <Fallenou> I wonder if lru has a big impact on performance

16:16 <Fallenou> when you have 2 ways for instance

16:18 <stekern> lru for 2-way is dead simple

16:19 <stekern> just 1 bit to check against

16:19 antgreen has quit [Ping timeout: 245 seconds]

16:20 <stekern> but, I agree, round robin for 4-way wouldn't be that complex

16:20 <Fallenou> I mean, is it really better than rr ?

16:21 <stekern> maybe I should do that, keep lru for 2-ways, and do rr for 4-ways
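Side note: the two replacement policies can be sketched behaviorally. True LRU for a 2-way set needs a single bit per set naming the least recently used way. Round robin only needs a counter that advances on each refill. The per-set counter is an assumption; a single global counter is an even cheaper variant.

```python
class TwoWayLRU:
    """True LRU for a 2-way set: one bit per set names the least recently used way."""
    def __init__(self, sets: int):
        self.lru = [0] * sets

    def touch(self, s: int, way: int) -> None:
        # On a hit or refill of `way`, the other way becomes least recently used.
        self.lru[s] = 1 - way

    def victim(self, s: int) -> int:
        return self.lru[s]

class RoundRobin:
    """Round-robin replacement: a per-set counter cycling through the ways."""
    def __init__(self, sets: int, ways: int):
        self.ways = ways
        self.next = [0] * sets

    def pick_victim(self, s: int) -> int:
        # Call once per refill: returns the way to replace and advances the counter.
        way = self.next[s]
        self.next[s] = (way + 1) % self.ways
        return way

lru = TwoWayLRU(sets=4)
lru.touch(0, 0)  # way 0 just used, so way 1 is the LRU victim
rr = RoundRobin(sets=4, ways=4)
order = [rr.pick_victim(0) for _ in range(6)]
print(lru.victim(0), order)  # 1 [0, 1, 2, 3, 0, 1]
```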

16:22 <stekern> it probably depends on the application, I haven't done any comparisons

16:23 <stekern> but I've got the impression that it would be better

16:26 <Fallenou> I would think that when you increase the number of ways, indeed lru can start to get interesting, because you really have a "bigger" choice to make

16:27 <Fallenou> 1 among 4 (or 8 or more)

16:27 <Fallenou> but 1 among 2 seems a poor choice anyway

16:27 <Fallenou> using rr or lru

16:28 <Fallenou> I think only a very precise software benchmark could really give better performance with lru than rr when the associativity is 2

16:28 <Fallenou> but it's just feeling, I don't really know :)

16:31 <stekern> yeah, it probably doesn't make a big difference

17:00 <GitHub17> [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/5GmfyA

17:00 <GitHub17> migen/master 10212e8 Sebastien Bourdeauducq: dma_asmi: cleanup

17:40 bhamilton has quit [Quit: Leaving.]

18:06 bhamilton has joined #milkymist

18:12 bhamilton has quit [Quit: Leaving.]

19:00 <GitHub59> [milkymist-ng] sbourdeauducq pushed 1 new commit to master: http://git.io/-OfjOA

19:00 <GitHub59> milkymist-ng/master 89dbc37 Sebastien Bourdeauducq: cif: do not generate write function for CSRStatus

19:01 <GitHub126> [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/8HoOow

19:01 <GitHub126> migen/master c82b53f Sebastien Bourdeauducq: bank/description/AutoCSR: add autocsr_exclude

20:00 bhamilton has joined #milkymist

20:04 bhamilton has quit [Client Quit]

20:05 sh4rm4 has quit [Quit: sh4rm4]

20:10 antgreen has joined #milkymist

20:14 sh4rm4 has joined #milkymist

20:33 <GitHub48> [milkymist-ng] sbourdeauducq pushed 3 new commits to master: http://git.io/T96SmA

20:33 <GitHub48> milkymist-ng/master 66b4bae Sebastien Bourdeauducq: top: connect dvisampler DMA IRQs

20:33 <GitHub48> milkymist-ng/master b3d87e1 Sebastien Bourdeauducq: software/videomixer: use new DMA engine

20:33 <GitHub48> milkymist-ng/master 29efa85 Sebastien Bourdeauducq: dvisampler: new DMA engine (buggy)

20:52 <GitHub82> [milkymist-ng] sbourdeauducq pushed 1 new commit to master: http://git.io/lTVgFQ

20:52 <GitHub82> milkymist-ng/master d685ed2 Sebastien Bourdeauducq: dvisampler/dma: bugfixes

21:00 <lekernel> yay! all works now.

21:00 <lekernel> there is some noise on the picture that I suspect is due to poor SI

21:00 <wpwrak> you get clean frames ?

21:00 <lekernel> yes

21:00 <wpwrak> congratulations !

21:00 <lekernel> on the VGA framebuffer, and in color :)

21:00 <wpwrak> kewl

21:01 <lekernel> with just a couple random pixels

21:01 <lekernel> probably SI, it gets worse when the pixel clock increases

21:04 <lekernel> and 800x600 is pure noise (even sync fails)
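Side note: this is why SI margins shrink with resolution. Each TMDS data channel carries one 10-bit symbol per pixel clock. The log does not state the refresh rate, so the standard VESA 60 Hz pixel clocks below are an assumption.

```python
# Assumed standard VESA 60 Hz pixel clocks (the log does not give the refresh rate)
PIXEL_CLOCK_MHZ = {"640x480@60": 25.175, "800x600@60": 40.0}
TMDS_BITS_PER_PIXEL_CLOCK = 10  # one 10-bit TMDS symbol per channel per pixel clock

rates = {}
for mode, f_pix in PIXEL_CLOCK_MHZ.items():
    rates[mode] = f_pix * TMDS_BITS_PER_PIXEL_CLOCK
    print(f"{mode}: {f_pix} MHz pixel clock -> {rates[mode]:.2f} Mbit/s per channel")
```

Going from about 252 to 400 Mbit/s per lane is roughly a 1.6x jump in bit rate over the expansion header.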

21:04 <lekernel> well I hope the direct TMDS board will fix this issue

21:05 <lekernel> I think I can consider myself lucky that at least 640x480 works :) it's pretty much on the brink of failure, and debugging SI would waste days...

21:05 <wpwrak> yeah, the expansion header isn't a great place for high-speed signals

21:06 <wpwrak> indeed. and now you can also do the mixing and fading :)

21:07 <lekernel> if I can get my chB to work... parts for assembling two extra boards are with fedex atm...

21:09 <Fallenou> congratz :)

21:10 <wpwrak> always order a generous number of spares :)

21:10 <Fallenou> there is no error-correcting code on DVI to fix SI-caused errors?

21:15 <larsc> nope

21:16 <larsc> HDMI has BCH for data islands though

21:16 <larsc> DVI is really just VGA in digital

21:17 <Fallenou> ok

22:19 bhamilton has joined #milkymist

23:05 lekernel has quit [Quit: Leaving]

23:05 bhamilton has quit [Quit: Leaving.]