lekernel changed the topic of #milkymist to: Milkymist One, Migen, Milkymist SoC & Flickernoise :: Logs: http://en.qi-hardware.com/mmlogs
antgreen has quit [Ping timeout: 248 seconds]
antgreen has joined #milkymist
bhamilton has joined #milkymist
<stekern> I've been playing with running openrisc (mor1kx) on M1 again, this time using milkymist-ng. I've made the stuff available here: https://github.com/skristiansson/milkymist-ng-mor1kx/ if someone might be interested.
<stekern> it was actually pleasantly painless to drop it in to milkymist-ng, most changes are of sw nature
bhamilton has quit [Quit: Leaving.]
bhamilton has joined #milkymist
bhamilton has quit [Quit: Leaving.]
bhamilton has joined #milkymist
lekernel has joined #milkymist
<lekernel> stekern, cool
<lekernel> still able to run at 83MHz?
<lekernel> seems so... :)
<lekernel> Number of Slice LUTs: 6,766 out of 27,288 24%
<stekern> yup, it's not very small (yet)
<stekern> a couple of features can be omitted (like the internal timer, overflow exceptions, add with carry etc)
<lekernel> that's for the whole SoC - doesn't look bad...
<lekernel> let me get the precise number for the lm32 version
<stekern> I got ~1700 slices for mor1kx and ~700 for lm32 I think
<lekernel> Number of Slice LUTs: 4,700 out of 27,288 17%
<lekernel> so, 2K LUTs to go ... :p
<stekern> yep ;)
<stekern> I'll update the configuration so it's closer to lm32's, but I think lm32 still beats it
<lekernel> you can try this too
<stekern> the or1k architecture has a lot of special registers that need quite some space
<lekernel> can't push them into BRAM?
<lekernel> if they are seldom used, multi-cycle access may be acceptable
<stekern> I've been meaning to run those benchmarks, up until now I've only run coremark and dhrystone
<lekernel> btw one thing I want to do with migen is automatic virtual BRAM ports using multiplied + phase aligned clocks
<stekern> yeah, pushing them into bram is something that could be done to some of them, but the whole address space is a bit annoying (basically they are divided into groups)
<lekernel> BRAM is fast, several hundred MHz, while the rest of the fabric is the slowness pig we know
<lekernel> so you could easily have a 4-port BRAM out of a 2-port BRAM with 2x clock multiplication for many designs
stekern_ has joined #milkymist
balrog_ has joined #milkymist
stekern has quit [*.net *.split]
balrog has quit [*.net *.split]
balrog_ is now known as balrog
stekern_ is now known as stekern
bhamilton1 has joined #milkymist
bhamilton has quit [Read error: Connection reset by peer]
<stekern> hmm, aren't bram outputs usually slower than register outputs?
<lekernel> there's some clock-to-output delay, yes
<lekernel> on slowtan6 it's 2.10ns, or 1.75ns if you enable the output register (ie reads take 2 cycles, pipelined)
<lekernel> setup/hold are all under 1ns
<lekernel> and you can clock at 280MHz max
<lekernel> most designs aren't that fast
<lekernel> this output register seems pretty useless, if all you get is 0.35ns of extra time at the output at the expense of one more cycle of latency ...
<lekernel> you certainly get a better deal by registering outside the BRAM
<lekernel> another crippled S6 feature it seems...
<stekern> it's only for marketing, "with built-in registers" ;)
<stekern> of course, if you need the result registered and the output delay isn't a problem, you'd benefit from them, but the use cases sound a bit restricted, yes
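The BRAM figures quoted above can be turned into a quick feasibility check for the "virtual ports" idea. This is only a sketch built from the numbers in the conversation (280 MHz maximum BRAM clock, 2.10 ns and 1.75 ns clock-to-output, 83 MHz system clock); the function names are hypothetical, and the model ignores the multiplexing logic and clock-domain phase alignment that a real design would need.

```python
# Figures quoted in the discussion (Spartan-6 block RAM).
BRAM_FMAX_MHZ = 280.0        # maximum BRAM clock
CLK_TO_OUT_NS = 2.10         # clock-to-output, output register disabled
CLK_TO_OUT_REG_NS = 1.75     # clock-to-output, output register enabled


def virtual_ports(physical_ports: int, clock_multiplier: int) -> int:
    """Ports seen by the fabric when the BRAM is clocked at a multiple of the fabric clock."""
    return physical_ports * clock_multiplier


def bram_clock_ok(fabric_mhz: float, clock_multiplier: int) -> bool:
    """True if the multiplied clock stays within the quoted BRAM maximum."""
    return fabric_mhz * clock_multiplier <= BRAM_FMAX_MHZ


def output_register_gain_ns() -> float:
    """Clock-to-output time saved by enabling the BRAM output register."""
    return round(CLK_TO_OUT_NS - CLK_TO_OUT_REG_NS, 2)


if __name__ == "__main__":
    print(virtual_ports(2, 2))        # 2-port BRAM at 2x clock
    print(bram_clock_ok(83.0, 2))     # 166 MHz BRAM clock
    print(bram_clock_ok(83.0, 4))     # 332 MHz BRAM clock
    print(output_register_gain_ns())
```

Under these assumptions a 2x multiplier on an 83 MHz design stays well inside the BRAM limit, while 4x would not.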
fpgaminer has quit [Read error: Connection reset by peer]
fpgaminer has joined #milkymist
antgreen has quit [Ping timeout: 248 seconds]
bhamilton1 has quit [Quit: Leaving.]
bhamilton has joined #milkymist
<Fallenou_> lekernel: I don't remember the reason why Milkymist(-ng or not) SoC uses a lm32 core configured with 512 B of I and D cache, knowing that caches can go up to 32 kB on lm32
<Fallenou_> keeping the caches below or equal to 4 kB is indeed helping for the MMU part (no cache alias problem)
<Fallenou_> in some of your slides there is a graph about cache hit probability, you seem to have chosen 32 kB at that time in order to get a cache hit 95% of the time
<Fallenou_> but I remember you had synthesis issues with big caches as well ...
<Fallenou_> on the other hand 512 B seems small, when you know you can go up to 4 kB without risking any cache aliasing (caused by VIPT cache)
Fallenou_ is now known as Fallenou
bhamilton has quit [Quit: Leaving.]
bhamilton has joined #milkymist
<lekernel> it's not 512 bytes, it's 256*16 bytes
<lekernel> so 4K
<lekernel> big caches cause timing problems on slowtan6
<Fallenou> right, I mixed up things, I did 256*2 instead of *16 ... (habit of converting 16 bits to 2 bytes ...)
<Fallenou> ok so 4K, perfect :)
<lekernel> I might want larger caches when moving to a less slow FPGA...
<Fallenou> it would be cool to allow software to read cache size then
<Fallenou> for the OS to adapt and handle cache aliasing issues when they are possible
<lekernel> send a patch :)
<Fallenou> the SH4 CPU lets you read the cache size, for instance
<Fallenou> I won't hesitate ;)
<Fallenou> it may end up in CFG2 or maybe in a CFG3
<Fallenou> for now, I will only handle the current cache configuration hard coded in NetBSD kernel
<Fallenou> first things first :)
<GitHub97> [NetBSD] fallen pushed 2 new commits to master: http://git.io/rq36JA
<GitHub97> NetBSD/master ed27bd4 Yann Sionneau: Move TLB helpers into cpu.h
<GitHub97> NetBSD/master 7be2287 Yann Sionneau: Update TODO
<GitHub46> [NetBSD] fallen pushed 1 new commit to master: http://git.io/ZPUoqA
<GitHub46> NetBSD/master 81a01e8 Yann Sionneau: Add implementation of pmap.9 MD functions...
<lekernel> the figure you are talking about is about the TMU cache, not the CPU cache
<Fallenou> oh, ok
<Fallenou> 95% seemed a bit high :)
<stekern> that's interesting, I would have expected the opposite: larger cache, fewer tag bits to compare against, fewer timing problems.
<stekern> Fallenou: on or1k, you can read the cache size out of an spr, but then we are 2000 luts larger than lm32 too ;)
<lekernel> you need more BRAMs too
<lekernel> so they spread over more area on the chip, and then the particularly slow S6 routing does the rest ...
antgreen has joined #milkymist
<Fallenou> stekern: that's convenient :)
<stekern> lekernel: ah, yeah, that of course makes sense, several brams might slow things down
<stekern> (aliasing) that's another thing that's slightly annoying in or1k, you max out at 16 kB with a 2-way cache if you don't want to worry about it
<stekern> what page size did you decide on in the end?
<Fallenou> for now I'm going for 4 kB pages
<Fallenou> it seems to be the size used almost everywhere (except "big pages" options and such)
<stekern> oh, so you're actually worse off in that regard ;)
<Fallenou> when you say 16 kB, is it total cache size ? taking into account the associativity ?
<stekern> what are the benefits of having a smaller cache size? for me it was already predefined to 8 kB when I did the mmus for mor1kx, so I haven't given it much thought
<stekern> err, smaller page size
<Fallenou> you get finer-grained management of your virtual memory
<Fallenou> so less fragmentation I would say
<Fallenou> I mean, the kernel in a few places allocates in multiples of the page size
<Fallenou> but it does not need that much memory usually (8 kB or 16 kB)
<Fallenou> But personally I didn't give the page size a big thought
<stekern> yeah, that's obvious of course, but is there something else? perhaps that's reason enough though
<Fallenou> I took 4 kB for granted because it's almost everywhere in the literature
<Fallenou> for instance, on recent linux kernel for x86, do they use bigger pages? (for Ubuntu, debian etc)
<Fallenou> I know there is an option for that, but I don't know if it's checked or not
<stekern> (16 kB) yes, total cache, 2*8 kB
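The aliasing limits mentioned in this exchange follow from one rule for a virtually indexed, physically tagged cache: no aliasing is possible as long as one way of the cache is no larger than a page. A minimal sketch of that arithmetic follows. The helper names are hypothetical, the lm32 cache is assumed to be direct-mapped (one way), and "256*16" is read as 256 lines of 16 bytes.

```python
def way_size(total_bytes: int, ways: int) -> int:
    """Bytes covered by a single way of a set-associative cache."""
    return total_bytes // ways


def is_alias_free(total_bytes: int, ways: int, page_bytes: int) -> bool:
    """A VIPT cache cannot alias if the index bits fit inside the page offset."""
    return way_size(total_bytes, ways) <= page_bytes


def max_alias_free_cache(page_bytes: int, ways: int) -> int:
    """Largest total cache size that still avoids aliasing."""
    return page_bytes * ways


if __name__ == "__main__":
    lm32_cache = 256 * 16                          # 256 lines of 16 bytes (assumed layout)
    print(lm32_cache)                              # total bytes
    print(is_alias_free(lm32_cache, 1, 4096))      # lm32, 4 kB pages, assumed direct-mapped
    print(is_alias_free(32 * 1024, 1, 4096))       # a 32 kB cache with the same pages
    print(max_alias_free_cache(8192, 2))           # or1k: 8 kB pages, 2 ways
```

With these inputs the or1k limit comes out at 16 kB for a 2-way cache and the lm32 limit at 4 kB, which matches the figures quoted in the conversation.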
<GitHub177> [milkymist-ng] sbourdeauducq pushed 1 new commit to master: http://git.io/b65uyw
<GitHub177> milkymist-ng/master 8e76c96 Sebastien Bourdeauducq: timer, uart: EventSourceLevel -> EventSourceProcess
<GitHub42> [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/f9HVcQ
<GitHub42> migen/master b9b6df6 Sebastien Bourdeauducq: bank/eventmanager: refactor, rename EventSourceLevel -> EventSourceProcess, add fully externally controlled event source
<stekern> you could of course do more ways, but then the replacement logic becomes a lot more complicated
<Fallenou> well not that much, if you use round robin for instance
<stekern> at least if you use lru
<Fallenou> but it's still the same issue of routing more block rams etc
<Fallenou> I wonder if lru has a big impact on performance
<Fallenou> when you have 2 ways for instance
<stekern> lru for 2-way is dead simple
<stekern> just 1 bit to check against
antgreen has quit [Ping timeout: 245 seconds]
<stekern> but, I agree, round robin for 4-way wouldn't be that complex
<Fallenou> I mean, is it really better than rr ?
<stekern> maybe I should do that, keep lru for 2-ways, and do rr for 4-ways
<stekern> it probably depends on the application, I haven't done any comparisons
<stekern> but i've got the impression that it would be better
<Fallenou> I would think that when you increase the number of ways, indeed lru can start to get interesting, because you really have a "bigger" choice to make
<Fallenou> 1 among 4 (or 8 or more)
<Fallenou> but 1 among 2 seems a poor choice anyway
<Fallenou> using rr or lru
<Fallenou> I think only a very precise software benchmark could really give better performance with lru than rr when the associativity is 2
<Fallenou> but it's just a feeling, I don't really know :)
<stekern> yeah, it probably doesn't make a big difference
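The LRU versus round-robin question can be explored with a small behavioural model of a single cache set. This is only a sketch for experimenting with access traces: the class and function names are hypothetical, it models behaviour rather than hardware (a real 2-way LRU needs just one bit per set, as noted above), and a single hand-picked trace says nothing about real workloads.

```python
from collections import OrderedDict


class LruSet:
    """One cache set with least-recently-used replacement."""

    def __init__(self, ways: int):
        self.ways = ways
        self.lines = OrderedDict()  # tags ordered from least to most recently used

    def access(self, tag) -> bool:
        if tag in self.lines:
            self.lines.move_to_end(tag)      # mark as most recently used
            return True
        if len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict the least recently used tag
        self.lines[tag] = None
        return False


class RoundRobinSet:
    """One cache set whose victim pointer advances on every miss."""

    def __init__(self, ways: int):
        self.slots = [None] * ways
        self.next_victim = 0

    def access(self, tag) -> bool:
        if tag in self.slots:
            return True
        self.slots[self.next_victim] = tag
        self.next_victim = (self.next_victim + 1) % len(self.slots)
        return False


def count_hits(cache_set, trace) -> int:
    """Replay a sequence of tags and return the number of hits."""
    return sum(cache_set.access(tag) for tag in trace)


if __name__ == "__main__":
    trace = list("ABACAB")
    print(count_hits(LruSet(2), trace))
    print(count_hits(RoundRobinSet(2), trace))
```

On the trace A B A C A B with two ways, the LRU model scores 2 hits and round robin 1; other traces may show no difference at all, so a realistic comparison would need traces from actual software.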
<GitHub17> [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/5GmfyA
<GitHub17> migen/master 10212e8 Sebastien Bourdeauducq: dma_asmi: cleanup
bhamilton has quit [Quit: Leaving.]
bhamilton has joined #milkymist
bhamilton has quit [Quit: Leaving.]
<GitHub59> [milkymist-ng] sbourdeauducq pushed 1 new commit to master: http://git.io/-OfjOA
<GitHub59> milkymist-ng/master 89dbc37 Sebastien Bourdeauducq: cif: do not generate write function for CSRStatus
<GitHub126> [migen] sbourdeauducq pushed 1 new commit to master: http://git.io/8HoOow
<GitHub126> migen/master c82b53f Sebastien Bourdeauducq: bank/description/AutoCSR: add autocsr_exclude
bhamilton has joined #milkymist
bhamilton has quit [Client Quit]
sh4rm4 has quit [Quit: sh4rm4]
antgreen has joined #milkymist
sh4rm4 has joined #milkymist
<GitHub48> [milkymist-ng] sbourdeauducq pushed 3 new commits to master: http://git.io/T96SmA
<GitHub48> milkymist-ng/master 66b4bae Sebastien Bourdeauducq: top: connect dvisampler DMA IRQs
<GitHub48> milkymist-ng/master b3d87e1 Sebastien Bourdeauducq: software/videomixer: use new DMA engine
<GitHub48> milkymist-ng/master 29efa85 Sebastien Bourdeauducq: dvisampler: new DMA engine (buggy)
<GitHub82> [milkymist-ng] sbourdeauducq pushed 1 new commit to master: http://git.io/lTVgFQ
<GitHub82> milkymist-ng/master d685ed2 Sebastien Bourdeauducq: dvisampler/dma: bugfixes
<lekernel> yay! all works now.
<lekernel> there is some noise on the picture that I suspect is due to poor SI
<wpwrak> you get clean frames ?
<lekernel> yes
<wpwrak> congratulations !
<lekernel> on the VGA framebuffer, and in color :)
<wpwrak> kewl
<lekernel> with just a couple random pixels
<lekernel> probably SI, it gets worse when the pixel clock increases
<lekernel> and 800x600 is pure noise (even sync fails)
<lekernel> well I hope the direct TMDS board will fix this issue
<lekernel> I think I can consider myself lucky that at least 640x480 works :) it's pretty much on the brink of failure, and debugging SI would waste days...
<wpwrak> yeah, the expansion header isn't a great place for high-speed signals
<wpwrak> indeed. and now you can also do the mixing and fading :)
<lekernel> if I can get my chB to work... parts for assembling two extra boards are with fedex atm...
<Fallenou> congratz :)
<wpwrak> always order a generous number of spares :)
<Fallenou> there is no correction code on dvi to fix SI caused errors?
<larsc> nope
<larsc> HDMI has BCH for data islands though
<larsc> DVI is really just VGA in digital
<Fallenou> ok
bhamilton has joined #milkymist
lekernel has quit [Quit: Leaving]
bhamilton has quit [Quit: Leaving.]