#milkymist on 2012-07-09 — irc logs at freenode.irclog.whitequark.org

2012-06-17 20:21 lekernel changed the topic of #milkymist to: Milkymist One, Migen, Milkymist SoC & Flickernoise :: Logs: http://en.qi-hardware.com/mmlogs :: EHSM Berlin Dec 28-30 http://ehsm.eu :: latest video http://www.youtube.com/playlist?list=PL181AAD8063FCC9DC

00:23 Jia has joined #milkymist

00:43 xiangfu has joined #milkymist

00:54 fpgaminer has joined #milkymist

01:18 <wolfspraul> good morning

01:19 <wolfspraul> the other day lekernel said fpgas are mostly designed/optimized for synchronous designs

01:19 <wolfspraul> I'm wondering what the underlying technical reasons are? What specifically makes them geared towards synchronous vs. asynchronous designs?

01:23 rejon has quit [Ping timeout: 264 seconds]

02:31 rejon has joined #milkymist

03:16 <wpwrak> perhaps the structure of the clock distribution ?

03:17 <wpwrak> e.g., perhaps many things connect to the same clock ? (instead of letting you just use any random signal as clock)

03:34 <wpwrak> Fallenou: wow. 1 k TLB entries. that's **!!!*HUGE*!!!**. 1k+1k may be the largest TLB in existence for a uniprocessor design :)

04:00 rejon has quit [Ping timeout: 255 seconds]

04:13 rejon has joined #milkymist

04:19 rejon has quit [Ping timeout: 250 seconds]

05:11 rejon has joined #milkymist

06:16 cladamw has joined #milkymist

06:25 mumptai has joined #milkymist

06:32 rejon has quit [Ping timeout: 248 seconds]

06:42 <Fallenou> (huge TLB) : yes but we don't have hardware page table walker, so tlb miss will be quite expensive (exception raised, then software lookup, TLB refill, and return from exception)

06:42 <Fallenou> so we want to avoid spending all the cpu resources on TLB refilling :)

06:43 <Fallenou> and we have BlockRAM resources, better use them ! ;)
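The refill sequence Fallenou describes (miss exception, software lookup, TLB refill, return and replay) can be modelled on a host machine. This is only a sketch under assumptions that are not confirmed by the log: a direct-mapped TLB, 4 KiB pages, and a made-up page_table_lookup() standing in for whatever the operating system would really do. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT   12u     /* assumption: 4 KiB pages */
#define TLB_ENTRIES  1024u   /* the "1 k TLB entries" mentioned above */
#define TLB_INDEX(v) (((v) >> PAGE_SHIFT) & (TLB_ENTRIES - 1u))

struct tlb_entry {
    bool     valid;
    uint32_t vpn;   /* virtual page number */
    uint32_t pfn;   /* physical frame number */
};

static struct tlb_entry dtlb[TLB_ENTRIES];
static unsigned refill_count;   /* how often the software handler ran */

/* Hypothetical stand-in for the OS page table: maps vpn to vpn + 0x100. */
static uint32_t page_table_lookup(uint32_t vpn)
{
    return vpn + 0x100u;
}

/* Models the miss exception handler: software lookup, then TLB refill. */
static void tlb_miss_handler(uint32_t vaddr)
{
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    struct tlb_entry *e = &dtlb[TLB_INDEX(vaddr)];

    e->valid = true;
    e->vpn = vpn;
    e->pfn = page_table_lookup(vpn);
    refill_count++;
}

/* Translate an address; on a miss, run the handler and replay the access. */
static uint32_t translate(uint32_t vaddr)
{
    for (;;) {
        const struct tlb_entry *e = &dtlb[TLB_INDEX(vaddr)];

        if (e->valid && e->vpn == (vaddr >> PAGE_SHIFT))
            return (e->pfn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1u));
        tlb_miss_handler(vaddr);
    }
}
```

Every call that increments refill_count corresponds to one full exception round trip on the real CPU, which is why a large TLB is attractive when there is no hardware page table walker.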

06:58 rejon has joined #milkymist

07:09 Martoni has joined #milkymist

07:31 rejon_ has joined #milkymist

07:31 rejon has quit [Read error: Connection reset by peer]

07:50 mumptai has quit [Ping timeout: 264 seconds]

08:47 cladamw has quit [Quit: Ex-Chat]

09:09 <wpwrak> well, if you have block RAM to burn ... ;-)

09:13 <wpwrak> the TLB size probably doesn't help all that much to improve performance. but let's see ...

09:13 <qi-bot> The MMU firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-mmu-20120709-0910/

09:17 <Fallenou> wpwrak: anyway it's not hard coded, you can easily change the TLB size in the code

09:17 <Fallenou> and all the rest of the code will adapt to it

09:17 <Fallenou> index width, bit position etc

09:21 Jia has quit [Quit: Konversation terminated!]

09:23 <wpwrak> great

09:23 <Fallenou> wpwrak: that could be a marketing bullshit pitch line "we have a f*cking huge TLB"

09:23 <Fallenou> bigger than Cortex A15

09:24 <Fallenou> too bad our value is not in marketing BS :p

09:24 <wpwrak> use of the lowest bit of a phys/virt address as TLB selector seems a bit hackish. but perhaps you often need to mix other bits in there anyway ?

09:25 <wpwrak> "The M in M1 is for Monster" ;)

09:25 <Fallenou> ahah

09:26 <Fallenou> (but perhaps you often need to mix other bits in there anyway ?) <= what do you mean ?

09:26 * Fallenou didn't get that

09:27 <wpwrak> things like permission bits, when writing a TLB entry

09:28 <wpwrak> i suppose they would also be encoded in the address, wouldn't they ?

09:32 <Fallenou> yes

09:32 <Fallenou> in the page offset

09:32 <Fallenou> you have 12 free bits

09:32 <Fallenou> err 11 free bits

09:32 <Fallenou> because lowest bit is used to choose the TLB :)
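The bit budget discussed here (12 page-offset bits, the lowest one taken by the TLB selector, 11 left over) can be written down as masks. A minimal sketch, assuming 32-bit addresses; the flag positions and all names are invented for illustration and are not taken from the milkymist-mmu code.

```c
#include <stdint.h>

#define PAGE_OFFSET_BITS 12u
#define PAGE_OFFSET_MASK ((1u << PAGE_OFFSET_BITS) - 1u)        /* bits 11:0 */
#define TLB_SELECT_MASK  0x1u                                   /* bit 0 chooses the TLB */
#define FLAGS_MASK       (PAGE_OFFSET_MASK & ~TLB_SELECT_MASK)  /* bits 11:1, 11 free bits */

/* Hypothetical flag positions inside the free bits. */
#define FLAG_READ  (1u << 1)
#define FLAG_WRITE (1u << 2)
#define FLAG_EXEC  (1u << 3)

/* Combine a page-aligned address, flag bits and the TLB selector bit
 * into the single word that would be written to an address register. */
static inline uint32_t tlb_addr_word(uint32_t addr, uint32_t flags,
                                     uint32_t tlb_select_bit)
{
    return (addr & ~PAGE_OFFSET_MASK)
         | (flags & FLAGS_MASK)
         | (tlb_select_bit & TLB_SELECT_MASK);
}
```

Masking the flags with FLAGS_MASK keeps a stray flag value from flipping the selector bit or leaking into the page number.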

09:33 <wpwrak> so these registers are write-only ?

09:33 <wpwrak> or do you get anything meaningful if you read them ?

09:34 <wpwrak> i should actually ask this on the list :)

09:36 <Fallenou> hehe that would allow someone else to benefit from the information

09:36 <Fallenou> you cannot read back what you have written in TLBCTRL, TLBPADDR and TLBVADDR

09:37 <Fallenou> the rcsr (read) gives you another kind of information

09:37 <Fallenou> reading tlbpaddr and tlbvaddr gives you address of latest tlb miss

09:38 <Fallenou> in my sample code, I read it in the miss handler

09:41 <Fallenou> https://github.com/fallen/milkymist-mmu/blob/mmu-bios/software/mmu-bios/tlb_miss_handler.c

09:41 <Fallenou> asm volatile("rcsr %0, dtlbma" : "=r"(vaddr) :: );

09:41 <Fallenou> dtlbma is an alias for tlbpaddr or tlbvaddr I don't remember which one

09:41 <Fallenou> asm volatile("rcsr %0, itlbma" : "=r"(vaddr) :: );

09:41 <Fallenou> same thing here

09:42 <Fallenou> itlbma is an alias

09:47 <Fallenou> tlbctrl does not have anything bound to rcsr yet; maybe we will need to get some information one day, and it could be used to retrieve some data

10:02 * Fallenou will put the README.txt documentation in his github Wiki to update it

10:08 * Fallenou going to eat

10:24 rejon_ has quit [Ping timeout: 255 seconds]

11:05 km2 has quit [Ping timeout: 276 seconds]

11:10 km2 has joined #milkymist

11:54 <Fallenou> wpwrak: thanks for your feed back on the ML :)

11:54 <Fallenou> I will detail a little bit later

11:55 <Fallenou> btw I meant other bits than ([21:12] and 0)

11:55 <Fallenou> and not "other bits than [21:12], and [0]"

11:55 <Fallenou> but you got the idea :)

12:04 <Fallenou> wpwrak : updated documentation is there : https://github.com/fallen/milkymist-mmu/wiki

12:50 xiangfu has quit [Quit: Leaving]

12:53 rejon has joined #milkymist

13:02 <wpwrak> ah yes, parse error on my end ;-)

13:03 <wpwrak> by the way, page 17 on www.latticesemi.com/documents/doc20890x45.pdf says "DataBusError exceptions are imprecise"

13:03 <wpwrak> could this perhaps be connected to the problems you're experiencing ?

13:04 <Fallenou> hum it's something else

13:04 <Fallenou> it's about when wishbone says "error !"

13:04 <Fallenou> I've seen a comment in the code about that

13:04 <Fallenou> let me find it

13:05 <wpwrak> ah, good. so you're not using that mechanism

13:06 <Fallenou> no I'm not

13:07 <Fallenou> it's for unaligned access or something like that

13:09 <wpwrak> hmm. i wonder if this could cause trouble for linux. hopefully only in theory

13:14 <Fallenou> well, we cannot forbid the user to try to do unaligned access

13:15 <Fallenou> so we have to handle this correctly in the exception handler

13:15 <Fallenou> unfortunatel

13:15 <Fallenou> +y

13:15 <Fallenou> maybe another upcoming surprise :)

13:16 <Fallenou> wpwrak: for now I only added two exception vectors : DTLB_MISS and ITLB_MISS

13:17 <wpwrak> i guess it all depends on just how much of a mess LM32 can leave behind in such a case

13:17 <Fallenou> I don't know yet if I will share those too with the "page fault" (read/write/execute protection stuff)

13:17 <Fallenou> or if I will add exception vector specific to protection fault

13:19 <wpwrak> how would permission checks work ? e.g., if there's a write to an address that isn't in the TLB. would the TLB fault handler add the entry, return, and then, if there's a permission issue, you'd get another fault ?

13:19 <GitHub169> [migen] sbourdeauducq pushed 1 new commit to master: https://github.com/milkymist/migen/commit/ed27783a5363cd80ad9409aa8298d40bcf8ed412

13:19 <GitHub169> [migen/master] fhdl: arrays (TODO: use correct BV for intermediate signals) - Sebastien Bourdeauducq

13:20 <wpwrak> and i think it would help to keep code paths short if you separate TLB fault and permission exceptions

13:20 <Fallenou> http://gattis.github.com/milkshake/ < ahah nice WebGL milkdrop renderer

13:21 <wpwrak> you also get a pseudo-exception, which is the page fault (i.e., if a page isn't present). that would basically be a continuation of the TLB fault

13:21 <wpwrak> in case we add a page table walker later, it would become an exception on its own and the TLB faults would disappear

13:21 <Fallenou> wpwrak: I would say : 1°) TLB miss exception, TLB is refilled, access is replayed 2°) protection fault because the type of access violates the page right

13:21 <Fallenou> so two exceptions
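The two-exception sequence proposed here (TLB miss first, then a protection fault when the replayed access violates the page rights) comes down to the order of two checks. A minimal host-side sketch; the log does not say how or whether permission checks are implemented, so the permission encoding and every name below are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding of access kinds as permission bits. */
enum access_kind { ACCESS_READ = 1, ACCESS_WRITE = 2, ACCESS_EXEC = 4 };

enum outcome { OUTCOME_OK, OUTCOME_TLB_MISS, OUTCOME_PROTECTION_FAULT };

struct tlb_entry {
    bool     valid;
    uint32_t vpn;     /* virtual page number held by this entry */
    unsigned perms;   /* OR of the access_kind values that are allowed */
};

/* Decide what one access does: a missing entry always raises the miss
 * first; permissions are only checked once a matching entry is present. */
static enum outcome check_access(const struct tlb_entry *e, uint32_t vpn,
                                 enum access_kind kind)
{
    if (!e->valid || e->vpn != vpn)
        return OUTCOME_TLB_MISS;
    if ((e->perms & (unsigned)kind) == 0)
        return OUTCOME_PROTECTION_FAULT;
    return OUTCOME_OK;
}
```

A write to an unmapped read-only page therefore yields OUTCOME_TLB_MISS on the first attempt and OUTCOME_PROTECTION_FAULT on the replay after the refill, which matches the "two exceptions" ordering.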

13:22 <Fallenou> wpwrak: page table walker is really a mess to implement

13:22 <Fallenou> it will touch a much broader part of the lm32 source code

13:22 <wpwrak> two exceptions sounds good. checking permissions in software may be messy

13:22 <Fallenou> I am trying to touch as little as possible the source code

13:23 <wpwrak> (page table walker) hmm, dunno. we'll see.

13:23 <wpwrak> btw, have you thought about the case of having multiple faults in the pipeline ?

13:27 <Fallenou> not yet, sorry :(

13:35 <wpwrak> well, it's also something that can hopefully wait a bit :)

13:42 <Fallenou> this week I will have very little time to spend on MMU

13:43 <Fallenou> but starting next week I will have a looooot more time than since january

13:43 <Fallenou> (no I'm not getting fired :p)

13:45 <wpwrak> have you convinced your boss to let you work on it at work ? :)

13:46 <Fallenou> hehe no

13:46 <Fallenou> and I don't think it will ever happen :p

13:46 <Fallenou> unless I convince them to integrate lm32 in their next ASIC

13:47 <wpwrak> heh :) now you have an objective

13:47 <Fallenou> which they won't since they are using more powerful ARM cores

13:52 <Fallenou> they are prototyping an asic with a Cortex A9 inside

13:52 <Fallenou> I think they won't switch to lm32 ^^

13:55 <wpwrak> hmm yes, may be hard to sell them that idea

13:56 <Fallenou> I could convince them, they only have 128 TLB entries ;)

13:56 <Fallenou> "with lm32 you can have 1k and more !"

14:04 <sh4rm4> hmm is an arm cortex that much faster than lm32 on an asic ? afaik you can only get ~500 mhz anyway

14:08 <Fallenou> arm cortex -> 800 MHz

14:08 <Fallenou> and you can get it up to 2 GHz

14:13 <lekernel> clock frequency is easy, just add more pipeline stages

14:13 <lekernel> doesn't mean software runs faster though ;)

14:14 <Fallenou> hehe sure

14:15 <wpwrak> for stages -> infinity: fCLK -> infinity && work_done -> 0 ;-)

14:33 <Fallenou> for stages -> infinity: time_for_instruction_completion -> infinity

14:33 <Fallenou> :p

14:37 <wpwrak> yup :) also makes debugging much easier. you'll never get an incorrect result :)

14:39 <Fallenou> lekernel: you canceled your presentation at RMLL ?

15:00 <kristianpaul> wolfspraul: basically for me, it's because there are "lots" of flip flops, which is the basis of rtl, adding that to wpwrak's answer

15:04 <wolfspraul> why does "lots of ff" favors sync over async?

15:05 <wolfspraul> favor

15:05 jumpercable has quit [Quit: leaving]

15:06 <kristianpaul> because of the clock distribution around them

15:06 <kristianpaul> i haven't seen it in detail, but must flip flops on a s6 LUT have a clock signal, right?

15:07 <wolfspraul> yes

15:08 <wpwrak> i don't think the FFs per se make a difference

15:09 <wpwrak> but if you assume that all of your FFs will use the same clock, it's easier to have a good clock distribution

15:09 <kristianpaul> okay, so there is a clock distribution in the s6 switch matrix that makes it easy to have, like you said, "main" clocks or such

15:10 <wolfspraul> we are all just guessing :-)

15:10 <kristianpaul> ;)

15:10 <wpwrak> wolfspraul: you're the expert on the internal structures, so you should know :)

15:10 <wolfspraul> nah

15:10 <kristianpaul> yes ! ;-)

15:10 <wolfspraul> it's not just the structures, it's what they mean

15:10 <lekernel> there are lots of FFs (which are synchronous elements, already) and a limited clock routing

15:10 <wpwrak> i.e., do large groups of FFs generally share the same clock ?

15:11 <wolfspraul> why are the ff synchronous elements?

15:11 <lekernel> if you have too many clocks, they will use the local interconnect, which will cause a lot of skew

15:11 <lekernel> because they have a clock

15:11 <kristianpaul> yup

15:11 <wolfspraul> each slice has a sync/async flag, what does that mean?

15:11 <lekernel> as opposed to e.g. latches

15:11 <lekernel> though you can also use some FFs in latch mode

15:11 <wpwrak> (skew) ah, so you could actually go async if you insist. but at a cost.

15:12 <wolfspraul> half of them, but I think you then lose the other half :-) (nice punishment)

15:12 <lekernel> in fact, if you can deal with the skew problems, you can probably have nice async circuits

15:12 <lekernel> but the xilinx toolchain won't do it

15:12 <lekernel> you need a completely new toolchain if you want to do async

15:12 <wpwrak> slight complication :)

15:12 <wolfspraul> what does the sync/async flag in each slice mean?

15:13 <wpwrak> wolfspraul: so that's a feature for version 2 then ;-)

15:13 <lekernel> verilog/vhdl aren't even nice languages for async either

15:13 <lekernel> also, no one uses async these days... except in very limited portions of designs...

15:13 <wpwrak> lost on a maze of little always blocks :)

15:14 <lekernel> yes, exactly

15:14 <wpwrak> s/on/in/

15:14 <wolfspraul> ok I got it now - thanks a lot!

15:15 <lekernel> what's the per-slice sync/async flag?

15:16 <lekernel> Fallenou: yes, can't go for multiple reasons

15:18 <Fallenou> lekernel: too bad :(

16:31 hypermodern has joined #milkymist

16:50 <qi-bot> The firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-20120709-1520/

17:08 jimmythehorn has joined #milkymist

17:20 lekernel_ has joined #milkymist

17:20 lekernel has quit [Ping timeout: 248 seconds]

18:49 sh4rm4 has quit [Remote host closed the connection]

18:50 sh4rm4 has joined #milkymist

18:59 mumptai has joined #milkymist

20:07 hypermodern has left #milkymist [#milkymist]

20:29 <Fallenou> mwalle: activating and deactivating both TLBs is what I called "going into kernel/user mode"

20:30 <Fallenou> the name is badly chosen I agree

20:30 <Fallenou> I will change this :)

20:31 <Fallenou> documentation is now available there : https://github.com/fallen/milkymist-mmu/wiki/Documentation-of-milkymist-mmu

20:32 <Fallenou> so switching off I/D TLB is using the command "switch to kernel mode" : number 5'h8

20:32 <Fallenou> but since lowest bit is for choosing ITLB or DTLB

20:32 <Fallenou> you multiply it by two (shift left)

20:32 <Fallenou> so you write 0x10 to TLBCTRL

20:32 <Fallenou> lowest bit 0 => acts on ITLB

20:33 <Fallenou> writing 0x11 to TLBCTRL => lowest bit 1 => acts on DTLB

20:34 <Fallenou> err all of this is switching ON the TLBs ... sorry I was wrong in my last email

20:35 <Fallenou> ok I'm definitely tired ...

20:36 <Fallenou> I repeat correctly, switching OFF I/D TLB is using the command "switch to kernel mode" : number 5'h4

20:37 <Fallenou> so you shift left the command ID

20:37 <Fallenou> it becomes 0x8

20:37 <Fallenou> so you write 0x8 to TLBCTRL to switch OFF ITLB, and 0x9 (lowest bit set) to switch OFF DTLB.

20:37 <Fallenou> that's my last word !
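The TLBCTRL encoding from this exchange (command ID shifted left by one, lowest bit selecting ITLB or DTLB) fits in a one-line helper. A minimal sketch: only the numeric values (5'h4, 5'h8, 0x8, 0x9, 0x10, 0x11) come from the log, the identifiers are hypothetical, and the actual CSR write is left out so that the code compiles on a host.

```c
#include <stdint.h>

/* Lowest bit of the TLBCTRL word: 0 acts on the ITLB, 1 on the DTLB. */
enum tlb_select { TLB_SELECT_ITLB = 0, TLB_SELECT_DTLB = 1 };

#define TLB_CMD_SWITCH_OFF 0x4u   /* 5'h4, "switch to kernel mode" */
#define TLB_CMD_SWITCH_ON  0x8u   /* 5'h8, which turned out to switch the TLBs ON */

/* Build the word to write to TLBCTRL: the command ID moves up by one bit
 * to make room for the TLB selector in bit 0. */
static inline uint32_t tlbctrl_word(uint32_t cmd_id, enum tlb_select which)
{
    return (cmd_id << 1) | (uint32_t)which;
}
```

With these definitions, tlbctrl_word(TLB_CMD_SWITCH_OFF, TLB_SELECT_DTLB) gives 0x9, the value quoted for switching off the DTLB.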

20:46 <qi-bot> The firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-20120709-1910/

21:19 mumptai has quit [Quit: Verlassend]

23:56 Jia has joined #milkymist