lekernel changed the topic of #milkymist to: Milkymist One, Migen, Milkymist SoC & Flickernoise :: Logs: http://en.qi-hardware.com/mmlogs :: EHSM Berlin Dec 28-30 http://ehsm.eu :: latest video http://www.youtube.com/playlist?list=PL181AAD8063FCC9DC
Jia has joined #milkymist
xiangfu has joined #milkymist
fpgaminer has joined #milkymist
<wolfspraul> good morning
<wolfspraul> the other day lekernel said fpgas are mostly designed/optimized for synchronous designs
<wolfspraul> I'm wondering what the underlying technical reasons are? What specifically makes them geared towards synchronous vs. asynchronous designs?
rejon has quit [Ping timeout: 264 seconds]
rejon has joined #milkymist
<wpwrak> perhaps the structure of the clock distribution ?
<wpwrak> e.g., perhaps many things connect to the same clock ? (instead of letting you just use any random signal as clock)
<wpwrak> Fallenou: wow. 1 k TLB entries. that's **!!!*HUGE*!!!**. 1k+1k may be the largest TLB in existence for a uniprocessor design :)
rejon has quit [Ping timeout: 255 seconds]
rejon has joined #milkymist
rejon has quit [Ping timeout: 250 seconds]
rejon has joined #milkymist
cladamw has joined #milkymist
mumptai has joined #milkymist
rejon has quit [Ping timeout: 248 seconds]
<Fallenou> (huge TLB) : yes but we don't have hardware page table walker, so tlb miss will be quite expensive (exception raised, then software lookup, TLB refill, and return from exception)
<Fallenou> so we want to avoid spending all the cpu resources on TLB refilling :)
<Fallenou> and we have BlockRAM resources, better use them ! ;)
rejon has joined #milkymist
Martoni has joined #milkymist
rejon_ has joined #milkymist
rejon has quit [Read error: Connection reset by peer]
mumptai has quit [Ping timeout: 264 seconds]
cladamw has quit [Quit: Ex-Chat]
<wpwrak> well, if you have block RAM to burn ... ;-)
<wpwrak> the TLB size probably doesn't help all that much to improve performance. but let's see ...
<qi-bot> The MMU firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-mmu-20120709-0910/
<Fallenou> wpwrak: anyway it's not hard coded, you can easily change the TLB size in the code
<Fallenou> and all the rest of the code will adapt to it
<Fallenou> index width, bit position etc
Jia has quit [Quit: Konversation terminated!]
<wpwrak> great
<Fallenou> wpwrak: that could be a marketing bullshit pitch line "we have a f*cking huge TLB"
<Fallenou> bigger than Cortex A15
<Fallenou> too bad our value is not in marketing BS :p
<wpwrak> use of the lowest bit of a phys/virt address as TLB selector seems a bit hackish. but perhaps you often need to mix other bits in there anyway ?
<wpwrak> "The M in M1 is for Monster" ;)
<Fallenou> ahah
<Fallenou> (but perhaps you often need to mix other bits in there anyway ?) <= what do you mean ?
* Fallenou didn't get that
<wpwrak> things like permission bits, when writing a TLB entry
<wpwrak> i suppose they would also be encoded in the address, wouldn't they ?
<Fallenou> yes
<Fallenou> in the page offset
<Fallenou> you have 12 free bits
<Fallenou> err 11 free bits
<Fallenou> because lowest bit is used to choose the TLB :)
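[note] The encoding described above (page-aligned address in the upper bits, bit 0 selecting the TLB, bits [11:1] left free for things like permission bits) could be sketched roughly as follows. This is only an illustration: the helper name tlb_word, the mask names, the placement of the flags and the polarity of bit 0 (taken here as 0 = ITLB, 1 = DTLB) are assumptions, not the actual milkymist-mmu definitions.

```c
#include <stdint.h>

/* 4 KiB pages: the low 12 bits of an address are the page offset. */
#define PAGE_OFFSET_MASK 0xfffu
/* Assumed polarity: bit 0 clear = ITLB, bit 0 set = DTLB. */
#define TLB_SELECT_DTLB  0x1u
/* Bits [11:1]: the 11 spare bits, usable for hypothetical flags. */
#define TLB_FLAGS_MASK   0xffeu

/* Hypothetical helper: build the word written to TLBVADDR/TLBPADDR. */
static uint32_t tlb_word(uint32_t addr, uint32_t flags, int is_dtlb)
{
    uint32_t page = addr & ~PAGE_OFFSET_MASK;         /* drop the page offset  */
    uint32_t meta = (flags << 1) & TLB_FLAGS_MASK;    /* flags into bits 11:1  */
    uint32_t sel  = is_dtlb ? TLB_SELECT_DTLB : 0u;   /* bit 0 picks the TLB   */

    return page | meta | sel;
}
```

Keeping the selector in bit 0 means one register pair can serve both TLBs, at the cost of one of the twelve otherwise unused offset bits.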
<wpwrak> so these registers are write-only ?
<wpwrak> or do you get anything meaningful if you read them ?
<wpwrak> i should actually ask this on the list :)
<Fallenou> hehe that would allow someone else to benefit from the information
<Fallenou> you cannot read back what you have written in TLBCTRL, TLBPADDR and TLBVADDR
<Fallenou> the rcsr (read) gives you another kind of information
<Fallenou> reading tlbpaddr and tlbvaddr gives you address of latest tlb miss
<Fallenou> in my sample code, I read it in the miss handler
<Fallenou> asm volatile("rcsr %0, dtlbma" : "=r"(vaddr) :: );
<Fallenou> dtlbma is an alias for tlbpaddr or tlbvaddr I don't remember which one
<Fallenou> asm volatile("rcsr %0, itlbma" : "=r"(vaddr) :: );
<Fallenou> same thing here
<Fallenou> itlbma is an alias
<Fallenou> tlbctrl does not have anything bound to rcsr yet; if we need to get some information out one day, it could be used to retrieve that data
* Fallenou will put the README.txt documentation in his github Wiki to update it
* Fallenou going to eat
rejon_ has quit [Ping timeout: 255 seconds]
km2 has quit [Ping timeout: 276 seconds]
km2 has joined #milkymist
<Fallenou> wpwrak: thanks for your feed back on the ML :)
<Fallenou> I will detail a little bit later
<Fallenou> btw I meant other bits than ([21:12] and 0)
<Fallenou> and not "other bits than [21:12], and [0]"
<Fallenou> but you got the idea :)
<Fallenou> wpwrak : updated documentation is there : https://github.com/fallen/milkymist-mmu/wiki
xiangfu has quit [Quit: Leaving]
rejon has joined #milkymist
<wpwrak> ah yes, parse error on my end ;-)
<wpwrak> by the way, page 17 on www.latticesemi.com/documents/doc20890x45.pdf says "DataBusError exceptions are imprecise"
<wpwrak> could this perhaps be connected to the problems you're experiencing ?
<Fallenou> hum it's something else
<Fallenou> it's about when wishbone says "error !"
<Fallenou> I've seen a comment in the code about that
<Fallenou> let me find it
<wpwrak> ah, good. so you're not using that mechanism
<Fallenou> no I'm not
<Fallenou> it's for unaligned access or something like that
<wpwrak> hmm. i wonder if this could cause trouble for linux. hopefully only in theory
<Fallenou> well, we cannot forbid the user to try to do unaligned access
<Fallenou> so we have to handle this correctly in the exception handler
<Fallenou> unfortunatel
<Fallenou> +y
<Fallenou> maybe another upcoming surprise :)
<Fallenou> wpwrak: for now I only added two exception vectors : DTLB_MISS and ITLB_MISS
<wpwrak> i guess it all depends on just how much of a mess LM32 can leave behind in such a case
<Fallenou> I don't know yet if I will share those too with the "page fault" (read/write/execute protection stuff)
<Fallenou> or if I will add exception vector specific to protection fault
<wpwrak> how would permission checks work ? e.g., if there's a write to an address that isn't in the TLB. would the TLB fault handler add the entry, return, and then, if there's a permission issue, you'd get another fault ?
<GitHub169> [migen] sbourdeauducq pushed 1 new commit to master: https://github.com/milkymist/migen/commit/ed27783a5363cd80ad9409aa8298d40bcf8ed412
<GitHub169> [migen/master] fhdl: arrays (TODO: use correct BV for intermediate signals) - Sebastien Bourdeauducq
<wpwrak> and i think it would help to keep code paths short if you separate TLB fault and permission exceptions
<Fallenou> http://gattis.github.com/milkshake/ < ahah nice WebGL milkdrop renderer
<wpwrak> you also get a pseudo-exception, which is the page fault (i.e., if a page isn't present). that would basically be a continuation of the TLB fault
<wpwrak> in case we add a page table walker later, it would become an exception on its own and the TLB faults would disappear
<Fallenou> wpwrak: I would say : 1°) TLB miss exception, TLB is refilled, access is replayed 2°) protection fault because the type of access violates the page right
<Fallenou> so two exceptions
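[note] A small host-side model of the two-step flow described above: first a TLB miss, then, after the refill and replay, a separate protection fault. It is a sketch under assumptions and not the lm32 implementation: it assumes a direct-mapped 1024-entry DTLB indexed by address bits [21:12] with 4 KiB pages, a single "writable" permission bit, and made-up names (try_access, tlb_refill).

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT  12
#define TLB_ENTRIES 1024u   /* 1k entries, as mentioned earlier in the log */

enum access_result { ACCESS_OK, TLB_MISS, PROTECTION_FAULT };

struct tlb_entry {
    bool     valid;
    uint32_t vpn;       /* virtual page number (tag) */
    uint32_t pfn;       /* physical frame number */
    bool     writable;  /* hypothetical permission bit */
};

static struct tlb_entry dtlb[TLB_ENTRIES];

/* Index = virtual address bits [21:12] (assumed direct-mapped). */
static uint32_t tlb_index(uint32_t vaddr)
{
    return (vaddr >> PAGE_SHIFT) & (TLB_ENTRIES - 1u);
}

/* What the hardware would decide for one data access. */
static enum access_result try_access(uint32_t vaddr, bool is_write)
{
    const struct tlb_entry *e = &dtlb[tlb_index(vaddr)];

    if (!e->valid || e->vpn != (vaddr >> PAGE_SHIFT))
        return TLB_MISS;            /* exception 1: software refill needed */
    if (is_write && !e->writable)
        return PROTECTION_FAULT;    /* exception 2: raised on the replay */
    return ACCESS_OK;
}

/* What the software miss handler would do after its page table lookup. */
static void tlb_refill(uint32_t vaddr, uint32_t paddr, bool writable)
{
    struct tlb_entry *e = &dtlb[tlb_index(vaddr)];

    e->valid    = true;
    e->vpn      = vaddr >> PAGE_SHIFT;
    e->pfn      = paddr >> PAGE_SHIFT;
    e->writable = writable;
}
```

In this model a write to an unmapped read-only page costs two exceptions, but the miss handler never has to look at permissions, which keeps its code path short.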
<Fallenou> wpwrak: page table walker is really a mess to implement
<Fallenou> it will touch a much broader part of the lm32 source code
<wpwrak> two exceptions sounds good. checking permissions in software may be messy
<Fallenou> I am trying to touch the source code as little as possible
<wpwrak> (page table walker) hmm, dunno. we'll see.
<wpwrak> btw, have you thought about the case of having multiple faults in the pipeline ?
<Fallenou> not yet, sorry :(
<wpwrak> well, it's also something that can hopefully wait a bit :)
<Fallenou> this week I will have very little time to spend on MMU
<Fallenou> but starting next week I will have a looooot more time than I've had since january
<Fallenou> (no I'm not getting fired :p)
<wpwrak> have you convinced your boss to let you work on it at work ? :)
<Fallenou> hehe no
<Fallenou> and I don't think it will ever happen :p
<Fallenou> unless I convince them to integrate lm32 in their next ASIC
<wpwrak> heh :) now you have an objective
<Fallenou> which they won't since they are using more powerful ARM cores
<Fallenou> they are prototyping an asic with a Cortex A9 inside
<Fallenou> I think they won't switch to lm32 ^^
<wpwrak> hmm yes, may be hard to sell them that idea
<Fallenou> I could convince them, they only have 128 TLB entries ;)
<Fallenou> "with lm32 you can have 1k and more !"
<sh4rm4> hmm is an arm cortex that much faster than lm32 on an asic ? afaik you can only get ~500 mhz anyway
<Fallenou> arm cortex -> 800 MHz
<Fallenou> and you can get it up to 2 GHz
<lekernel> clock frequency is easy, just add more pipeline stages
<lekernel> doesn't mean software runs faster though ;)
<Fallenou> hehe sure
<wpwrak> for stages -> infinity: fCLK -> infinity && work_done -> 0 ;-)
<Fallenou> for stages -> infinity: time_for_instruction_completion -> infinity
<Fallenou> :p
<wpwrak> yup :) also makes debugging much easier. you'll never get an incorrect result :)
<Fallenou> lekernel: you canceled your presentation at RMLL ?
<kristianpaul> wolfspraul: for me it's basically because there are "lots" of flip-flops, which is the basis of rtl; adding that to wpwrak's answer
<wolfspraul> why does "lots of ff" favors sync over async?
<wolfspraul> favor
jumpercable has quit [Quit: leaving]
<kristianpaul> because of the clock distribution around them
<kristianpaul> i haven't seen it in detail, but most flip-flops on an s6 LUT have a clock signal, right?
<wolfspraul> yes
<wpwrak> i don't think the FFs per se make a difference
<wpwrak> but if you assume that all of your FFs will use the same clock, it's easier to have a good clock distribution
<kristianpaul> okay, so there is a clock distribution in the s6 switch matrix that makes it easy to have, like you said, "main" clocks or such
<wolfspraul> we are all just guessing :-)
<kristianpaul> ;)
<wpwrak> wolfspraul: you're the expert on the internal structures, so you should know :)
<wolfspraul> nah
<kristianpaul> yes ! ;-)
<wolfspraul> it's not just the structures, it's what they mean
<lekernel> there are lots of FFs (which are synchronous elements, already) and a limited clock routing
<wpwrak> i.e., do large groups of FFs generally share the same clock ?
<wolfspraul> why are the ff synchronous elements?
<lekernel> if you have too many clocks, they will use the local interconnect, which will cause a lot of skew
<lekernel> because they have a clock
<kristianpaul> yup
<wolfspraul> each slice has a sync/async flag, what does that mean?
<lekernel> as opposed to e.g. latches
<lekernel> though you can also use some FFs in latch mode
<wpwrak> (skew) ah, so you could actually go async if you insist. but at a cost.
<wolfspraul> half of them, but I think you then lose the other half :-) (nice punishment)
<lekernel> in fact, if you can deal with the skew problems, you can probably have nice async circuits
<lekernel> but the xilinx toolchain won't do it
<lekernel> you need a completely new toolchain if you want to do async
<wpwrak> slight complication :)
<wolfspraul> what does the sync/async flag in each slice mean?
<wpwrak> wolfspraul: so that's a feature for version 2 then ;-)
<lekernel> verilog/vhdl aren't even nice languages for async either
<lekernel> also, no one uses async these days... except in very limited portions of designs...
<wpwrak> lost on a maze of little always blocks :)
<lekernel> yes, exactly
<wpwrak> s/on/in/
<wolfspraul> ok I got it now - thanks a lot!
<lekernel> what's the per-slice sync/async flag?
<lekernel> Fallenou: yes, can't go for multiple reasons
<Fallenou> lekernel: too bad :(
hypermodern has joined #milkymist
<qi-bot> The firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-20120709-1520/
jimmythehorn has joined #milkymist
lekernel_ has joined #milkymist
lekernel has quit [Ping timeout: 248 seconds]
sh4rm4 has quit [Remote host closed the connection]
sh4rm4 has joined #milkymist
mumptai has joined #milkymist
hypermodern has left #milkymist [#milkymist]
<Fallenou> mwalle: activating and deactivating both TLBs is what I called "going into kernel/user mode"
<Fallenou> the name is badly chosen I agree
<Fallenou> I will change this :)
<Fallenou> so switching off I/D TLB is using the command "switch to kernel mode" : number 5'h8
<Fallenou> but since lowest bit is for choosing ITLB or DTLB
<Fallenou> you multiply it by two (shift left)
<Fallenou> so you write 0x10 to TLBCTRL
<Fallenou> lowest bit 0 => acts on ITLB
<Fallenou> writing 0x11 to TLBCTRL => lowest bit 1 => acts on DTLB
<Fallenou> err all of this is switching ON the TLBs ... sorry I was wrong in my last email
<Fallenou> ok I'm definitely tired ...
<Fallenou> I repeat correctly, switching OFF I/D TLB is using the command "switch to kernel mode" : number 5'h4
<Fallenou> so you shift left the command ID
<Fallenou> it becomes 0x8
<Fallenou> so you write 0x8 to TLBCTRL to switch OFF ITLB, and 0x9 (lowest bit set) to switch OFF DTLB.
<Fallenou> that's my last word !
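[note] The corrected TLBCTRL encoding above (command ID shifted left by one, bit 0 selecting ITLB or DTLB) can be written down as a tiny helper. The constant and function names are made up for illustration; only the numeric values (command 5'h4 switches a TLB off, 5'h8 switches it on, bit 0 = 0 for ITLB and 1 for DTLB) come from the discussion.

```c
#include <stdint.h>

/* Command IDs as given in the chat (5-bit values). Names are hypothetical. */
#define TLB_CMD_OFF 0x4u   /* "switch to kernel mode": TLB disabled */
#define TLB_CMD_ON  0x8u   /* TLB enabled */

/* Bit 0 of the TLBCTRL word selects the target TLB. */
enum tlb_select { TLB_SEL_ITLB = 0, TLB_SEL_DTLB = 1 };

/* Build the value to write to TLBCTRL: command in bits [5:1], selector in bit 0. */
static uint32_t tlbctrl_encode(uint32_t cmd, enum tlb_select sel)
{
    return (cmd << 1) | (uint32_t)sel;
}
```

With this, tlbctrl_encode(TLB_CMD_OFF, TLB_SEL_ITLB) gives 0x8 and tlbctrl_encode(TLB_CMD_OFF, TLB_SEL_DTLB) gives 0x9, matching the values quoted above; the "on" command yields 0x10 and 0x11.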
<qi-bot> The firmware build was successful, see images here: http://fidelio.qi-hardware.com/~xiangfu/build-milkymist/milkymist-firmware-20120709-1910/
mumptai has quit [Quit: Verlassend]
Jia has joined #milkymist