#milkymist on 2012-07-23 — irc logs at freenode.irclog.whitequark.org

2012-06-17 20:21 lekernel changed the topic of #milkymist to: Milkymist One, Migen, Milkymist SoC & Flickernoise :: Logs: http://en.qi-hardware.com/mmlogs :: EHSM Berlin Dec 28-30 http://ehsm.eu :: latest video http://www.youtube.com/playlist?list=PL181AAD8063FCC9DC

00:05 Jia has joined #milkymist

00:21 elldekaa has quit [Remote host closed the connection]

00:33 xiangfu has joined #milkymist

01:05 rejon has joined #milkymist

01:13 cladamw has joined #milkymist

01:39 sh4rm4 has quit [Ping timeout: 276 seconds]

01:40 sh4rm4 has joined #milkymist

01:49 rejon has quit [Ping timeout: 264 seconds]

02:07 cladamw has quit [Quit: Ex-Chat]

02:40 rejon has joined #milkymist

02:51 rejon has quit [Ping timeout: 260 seconds]

02:54 kristianpaul has quit [Ping timeout: 248 seconds]

02:55 kristianpaul has joined #milkymist

03:32 azonenberg has quit [Ping timeout: 245 seconds]

04:21 azonenberg has joined #milkymist

05:20 jimmythehorn has joined #milkymist

06:01 xiangfu has quit [Ping timeout: 252 seconds]

06:21 Martoni has joined #milkymist

06:35 rejon has joined #milkymist

06:54 lekernel has quit [Quit: Konversation terminated!]

07:35 rejon has quit [Ping timeout: 264 seconds]

07:44 elldekaa has joined #milkymist

07:56 rejon has joined #milkymist

08:02 rejon_ has joined #milkymist

08:04 rejon has quit [Ping timeout: 255 seconds]

08:55 kilae has joined #milkymist

09:44 azonenberg has quit [Ping timeout: 245 seconds]

10:04 Jia has quit [Quit: Konversation terminated!]

10:11 rejon_ has quit [Ping timeout: 260 seconds]

12:09 robmyers has quit [*.net *.split]

12:09 fpgaminer has quit [*.net *.split]

12:09 mwalle has quit [*.net *.split]

12:10 robmyers has joined #milkymist

12:10 fpgaminer has joined #milkymist

12:10 mwalle has joined #milkymist

12:16 xiangfu has joined #milkymist

12:21 robmyers has quit [*.net *.split]

12:21 fpgaminer has quit [*.net *.split]

12:21 mwalle has quit [*.net *.split]

12:22 robmyers has joined #milkymist

12:22 fpgaminer has joined #milkymist

12:22 mwalle has joined #milkymist

12:25 kristianpaul has quit [Ping timeout: 248 seconds]

12:44 kristianpaul has joined #milkymist

13:07 xiangfu has quit [Ping timeout: 245 seconds]

13:38 xiangfu has joined #milkymist

13:42 antgreen has joined #milkymist

13:43 xiangfu has quit [Ping timeout: 240 seconds]

14:50 mumptai_ has joined #milkymist

15:20 jimmythehorn has quit [Quit: jimmythehorn]

15:48 rejon_ has joined #milkymist

15:49 mumptai_ has quit [Ping timeout: 248 seconds]

16:22 lekernel has joined #milkymist

16:31 Martoni has quit [Quit: ChatZilla 0.9.88.2 [Firefox 14.0.1/20120713225625]]

16:42 mumptai has joined #milkymist

16:51 elldekaa has quit [Remote host closed the connection]

17:16 jimmythehorn has joined #milkymist

17:17 elldekaa has joined #milkymist

17:23 rejon_ has quit [Ping timeout: 252 seconds]

17:24 azonenberg has joined #milkymist

19:57 <larsc> mwalle: hm, the fix doesn't work. I've now added explicit mv instructions for the syscall registers. This generates quite a bit of overhead, but well it works for now.

19:57 <larsc> BusyBox v1.18.5 (2011-12-27 18:34:58 CET) hush - the humble shell

19:57 <larsc> Enter 'help' for a list of built-in commands.

19:57 <larsc> # uname -a

19:57 <larsc> Linux (none) 3.5.0+ #2412 Mon Jul 23 21:58:10 CEST 2012 lm32 GNU/Linux

20:06 <Fallenou> :)

20:17 <kristianpaul> oh

20:17 <kristianpaul> and the other busybox utils work?

20:21 <larsc> yes

20:24 <larsc> no mmu yet, though ;)

20:27 <Fallenou> really ? no one did a mmu ?

20:28 <Fallenou> sounds trivial to do :o

20:29 * Fallenou returns to its pipeline drawings

20:32 <kristianpaul> larsc: GREAT !

20:32 <kristianpaul> so i can run my silly apps on uclinux now :)

20:42 <mwalle> larsc: does it work reliably? even in qemu?

20:42 <larsc> mwalle: I only tested in qemu so far

20:43 <sh4rm4> you can build uclibc from the official kernel sources ?

20:43 <mwalle> larsc: did you spawn some processes?

20:43 <sh4rm4> *uclinux

20:43 <larsc> mwalle: only a few

20:43 <mwalle> uclinux? isnt that really ancient? :)

20:43 <larsc> but i can run a while `true`; do date; done

20:44 <mwalle> larsc: mh ok, cool ;)

20:44 <sh4rm4> so this is a full linux ?

20:44 <mwalle> sh4rm4: no shared libs yet

20:44 <mwalle> sh4rm4: only static flat binarys

20:45 <mwalle> binaries

20:45 * kristianpaul no cares as soon have a serial port and a working busybox

20:45 <kristianpaul> mwalle: but it can run static elfs?

20:45 <sh4rm4> i didn't know you can build official linux without mmu

20:46 <kristianpaul> or i need embedded custom apps in owrt build?

20:46 <mwalle> kristianpaul: that worked for years now :b

20:46 <kristianpaul> he, asking just in case :)

20:47 <mwalle> but there was still a bug causing linux to abort in qemu (and hanging in real hw i guess)

20:47 <mwalle> and i guess signals arent working completely

20:48 <kristianpaul> what that last means?

20:48 <mwalle> (because thats still theobroma code)

20:48 <mwalle> larsc: correct me if im wrong ;)

20:49 <larsc> I added a bug to it as well, when i cleaned it up

20:49 <mwalle> kristianpaul: dunno, timers/alarms may not work

20:50 <larsc> ctrl+c works fine

20:50 <larsc> so signals are at least somewhat working

20:50 <mwalle> wpwrak: (tlb miss handler) my idea was to distinguish between itlb and dtlb by the control word you write in the control reg

20:51 <kristianpaul> larsc: what about mmu-less malloc?

20:51 <mwalle> TLBVADDR TLBPADDR would be shared between both

20:51 <mwalle> kristianpaul: sbrk should work

20:52 <mwalle> kristianpaul: of course theres no memory protection between tasks

20:52 <mwalle> larsc: btw i guess i should push my elf2flt patches upstream..

20:53 <larsc> anonymous mmap also works

20:56 <mwalle> wpwrak: i dunno if its really worth the hassle to implicitly update xTLB on TLBPADDR write

20:56 <mwalle> wpwrak: btw does unlikely work with gcc-lm32? :)

21:04 <mwalle> mh http://pastebin.com/ZUfrWs2s

21:04 <mwalle> strange

21:05 <mwalle> ah forget it ;)

21:15 kilae has quit [Quit: ChatZilla 0.9.88.2 [Firefox 14.0.1/20120713134347]]

21:18 <mwalle> still strange: http://pastebin.com/tVAvBqWN, likely() produces worse code than code without annotations

21:20 antgreen has quit [Ping timeout: 260 seconds]

21:20 lekernel has quit [Read error: Connection reset by peer]

21:20 lekernel_ has joined #milkymist

21:22 <Fallenou> what's __builtin_expect() doing ?

21:25 <sh4rm4> it optimizes code for the likely branch

21:25 <Fallenou> oh ok, you as a developer tell gcc what branch you think is most likely to be taken

21:25 <Fallenou> nice

21:26 <sh4rm4> yep, like p = malloc()... if (unlikely(p)) { ...

21:26 <sh4rm4> oops, !p

21:26 <Fallenou> ok :)

21:26 <Fallenou> got it, thx
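
The likely()/unlikely() annotations discussed above are usually thin macros over GCC's `__builtin_expect`, in the style used by the Linux kernel. A minimal sketch of sh4rm4's corrected malloc example follows; the helper name `alloc_ints` is hypothetical and only there for illustration. Whether the hint improves the generated code depends on the target, as mwalle's lm32 pastebin suggests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Linux-kernel-style wrappers around GCC's __builtin_expect.
 * The !! normalises any truthy value to 1 before the comparison. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: allocate n zero-initialised ints.
 * Returns NULL on allocation failure, which is hinted as the cold path. */
static int *alloc_ints(size_t n)
{
    int *p = calloc(n, sizeof *p);

    if (unlikely(!p))
        return NULL;    /* rarely taken: allocation failed */

    return p;           /* expected fall-through path */
}
```

The macros only change how the compiler lays out the branches; the value of the condition is unchanged.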

21:27 <wpwrak> mwalle: (likely/unlikely) interesting :)

21:29 <wpwrak> mwalle: (itlb/dtlb) if you put it in the control word, this may add a few more checks and branches

21:30 lekernel_ is now known as lekernel

21:30 <wpwrak> mwalle: (hassle) dunno if it's a hassle. if it's easy, you save one CSR write. cyclesss are (my) precioussss ;-)

21:31 <Fallenou> I agree we should optimize tlb miss as much as possible

21:32 <Fallenou> as they will happen a looot

21:33 <Fallenou> I quickly had a look at your email wpwrak it sounded great, even if I didn't get everything

21:44 Gurty` has quit [Quit: ...]

21:51 mumptai has quit [Ping timeout: 260 seconds]

21:56 Gurty has joined #milkymist

22:01 lekernel has quit [Quit: Konversation terminated!]

22:35 <mwalle> wpwrak: so we need at least an own tlb miss handler

22:36 <wpwrak> yeah, i'd say so

22:37 <mwalle> and since we dont want to distinguish between itlb and dtlb within the handler code, we need either make the hw remember the current tlb or we need two exceptions

22:37 <mwalle> then i vote for the second

22:37 <mwalle> to keep hw simple

22:37 <wpwrak> Fallenou: i'm glad you like it :) and yes, while writing, i noticed how it was gradually getting trickier :)

22:38 <wpwrak> mwalle: we could reuse Fallenou's idea of indicating the TLB with a bit in the address. e.g., VADDR[0]. that way, just writing/keeping the fault address would select the correct TLB.

22:39 <mwalle> mh .. but if we need to set some magic bits in the paddr... thats one more instruction

22:39 <wpwrak> mwalle: otherwise, yes, separate handlers are easier than fancy status bits

22:39 <mwalle> we could use wcsr TLBCTRL, r1 too..

22:39 <wpwrak> i would have the magic in the VADDR. use free PADDS bits for permission and such

22:39 <wpwrak> s/PADDS/PADDR

22:40 <wpwrak> (PADDR) because the PADDR can come straight from the page table. which is where all the other bits live, too

22:41 Padawan- has quit [Ping timeout: 248 seconds]

22:41 <mwalle> wpwrak: the thing is, with the bits magic, PADDR and VADDR are two separate registers for both TLBs and you select between them with the lowest bit but you have to set that bit for both PADDR and VADDR

22:41 <wpwrak> so the PADDR would be sort of a union { unsigned page_addr:20; unsigned flags:12; }

22:42 <wpwrak> why ? have one set of registers and only use VADDR[0] to select the TLB

22:42 <mwalle> ah, yes ;)

22:42 <wpwrak> when entering the TLB miss handler, VADDR is already pre-set, including VADDR[0]

22:43 <wpwrak> then, with my "magic" PADDR write, you just fetch the page table entry and write to update. very simple :)
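
The PADDR layout wpwrak describes (20 bits of page address plus 12 spare bits for flags) can be sketched in C with plain masks instead of bit-fields, since bit-field layout is implementation-defined. The flag names and helper functions below are hypothetical; the log does not fix any particular flag assignment.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12u
#define PTE_FLAGS_MASK ((UINT32_C(1) << PAGE_SHIFT) - 1u)  /* low 12 bits */

/* Hypothetical flag assignments, for illustration only. */
#define PTE_FLAG_VALID UINT32_C(0x001)
#define PTE_FLAG_WRITE UINT32_C(0x002)

/* Combine a page-aligned physical address with flag bits into one word,
 * as it could be stored in a page table and written straight to PADDR. */
static uint32_t pte_make(uint32_t phys_addr, uint32_t flags)
{
    return (phys_addr & ~PTE_FLAGS_MASK) | (flags & PTE_FLAGS_MASK);
}

/* Upper 20 bits: physical page address. */
static uint32_t pte_page_addr(uint32_t pte)
{
    return pte & ~PTE_FLAGS_MASK;
}

/* Lower 12 bits: permission and status flags. */
static uint32_t pte_flags(uint32_t pte)
{
    return pte & PTE_FLAGS_MASK;
}
```

Because the flags live in the bits that a 4 KiB page offset would otherwise occupy, the page table entry can be handed to the hardware without any repacking.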

22:43 <mwalle> ok agreed, then one exception is enough, isnt it?

22:44 <mwalle> because the handler doesnt need to distinguish between instruction and data tlb

22:44 <wpwrak> should be, yes. as far as i can tell, there's nothing really different between them anyway

22:44 <wpwrak> of course, while implementing things, surprises may surface :)

22:45 <mwalle> and BADADDR should be the same CSR number as VADDR

22:45 <mwalle> with the lowest bit set or not?

22:45 <wpwrak> aye. and more than that, it should be the same register. not one being a read and the other a write register that have nothing in common

22:46 <wpwrak> if we use the (VADDR[0] ? I : D)##TLB approach, yes

22:47 <mwalle> masking that bit out, doesnt cost us an instruction, right? because we need to shift anyway?

22:47 <wpwrak> we never look at the lower 12 bits. that is, unless we run into trouble. but that's no longer in the fast path

22:47 <wpwrak> trouble = segfault / oops
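
The point about masking can be made concrete: the shift that turns a virtual address into a page number already discards the page offset, and with it the bit-0 TLB tag. The sketch below assumes a flat, single-level page table indexed by virtual page number, which is a simplification chosen for illustration; the log does not say how the page table is organised, and `lookup_pte` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u

/* Return the page table entry for vaddr, or 0 when the page number
 * falls outside the (hypothetical, flat) table. */
static uint32_t lookup_pte(const uint32_t *table, size_t entries,
                           uint32_t vaddr)
{
    /* The shift drops the 12 offset bits, including the bit-0 TLB tag,
     * so no separate masking instruction is needed. */
    size_t vpn = vaddr >> PAGE_SHIFT;

    return (vpn < entries) ? table[vpn] : 0;
}
```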

22:50 <mwalle> ok so to conclude, we have a TLBVADDR, which is read/write (read for BADADDR), VADDR[0] indicates tlb, writes to PADDR trigger a TLB update

22:50 <mwalle> TLBCTRL is only needed for invalidation/flushing

22:50 <mwalle> (yet)

22:51 <mwalle> reading PADDR should return the TLB entry for a given VADDR imho

22:51 <mwalle> shouldnt be too hard in H/W ;)
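
The interface mwalle summarises here can be written down as a small software model: one VADDR register that the hardware loads on a miss, bit 0 of VADDR selecting the ITLB or DTLB, a PADDR write that updates the selected entry, and a PADDR read that returns the entry for the current VADDR. This is only a sketch under assumptions that are not in the log: the TLBs are modelled as direct-mapped with 64 entries each, and all struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12u
#define TLB_ENTRIES 64u   /* assumption: size is not given in the log */

struct tlb_entry {
    uint32_t vpn;    /* virtual page number used as tag */
    uint32_t pte;    /* value last written through PADDR */
    int      valid;
};

struct mmu_model {
    uint32_t vaddr;                      /* shared TLBVADDR / BADADDR */
    struct tlb_entry itlb[TLB_ENTRIES];
    struct tlb_entry dtlb[TLB_ENTRIES];
};

/* VADDR[0] set selects the ITLB, clear selects the DTLB. */
static struct tlb_entry *select_entry(struct mmu_model *m)
{
    struct tlb_entry *tlb = (m->vaddr & 1u) ? m->itlb : m->dtlb;

    return &tlb[(m->vaddr >> PAGE_SHIFT) % TLB_ENTRIES];
}

/* Hardware side: a miss latches the fault address and tags bit 0. */
static void mmu_raise_miss(struct mmu_model *m, uint32_t addr, int is_insn)
{
    m->vaddr = (addr & ~UINT32_C(1)) | (is_insn ? 1u : 0u);
}

/* Software side: a write to PADDR updates the entry chosen by VADDR. */
static void mmu_write_paddr(struct mmu_model *m, uint32_t pte)
{
    struct tlb_entry *e = select_entry(m);

    e->vpn = m->vaddr >> PAGE_SHIFT;
    e->pte = pte;
    e->valid = 1;
}

/* Reading PADDR returns the entry for the current VADDR, or 0 if none. */
static uint32_t mmu_read_paddr(struct mmu_model *m)
{
    struct tlb_entry *e = select_entry(m);

    return (e->valid && e->vpn == (m->vaddr >> PAGE_SHIFT)) ? e->pte : 0;
}
```

In this model the miss handler never touches VADDR: it reads the fault address, fetches the page table entry, and performs one PADDR write.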

22:53 <Fallenou> 00:50 < wpwrak> when entering the TLB miss handler, VADDR is already pre-set, including VADDR[0] < the one you read from, not the one you write to

22:53 <wpwrak> hmm. do we have a read strobe ?

22:53 <Fallenou> if you read from VADDR, you get the address causing the miss

22:53 <wpwrak> Fallenou: i would make the two the same

22:53 <Fallenou> but behind the scene you have two different registers

22:53 <Fallenou> one you read from (faulty address)

22:53 <wpwrak> Fallenou: have one VADDR register. if there's a fault, you write the fault address to it

22:54 <Fallenou> one you write to (to set up a mapping or invalidate a line)

22:54 <Fallenou> well, ok it's possible I think :)

22:54 * Fallenou 's brain is heating up reading pipeline drawings

22:54 <wpwrak> Fallenou: this means that you have to be careful when updating the TLB, but i think you don't need to be more careful than you already have to be

22:55 <mwalle> gn8

22:55 <Fallenou> gn8 !

22:55 <wpwrak> Fallenou: e.g., avoid all exceptions and disable interrupts before doing any such thing. or else you may find yourself in trouble :)

22:55 <Fallenou> wpwrak: indeed having the vaddr already set up is good, because you don't have to set it using software when updating the line

22:55 <Fallenou> but beware not having another miss while handling the previous

22:56 <Fallenou> you would lose vaddr

22:56 <Fallenou> but it should not happen

22:56 <Fallenou> irq are disabled, that's ok

22:56 <Fallenou> exceptions are not

22:56 <Fallenou> lm32's exception are not designed to be "turned off"

22:56 <mwalle> Fallenou: but mmu is disabled, so you cant get another miss

22:56 <Fallenou> yes

22:57 <Fallenou> we just have to try not using misaligned load/store, avoid divide by zero, etc etc :)

22:57 <Fallenou> and we should be ok :p

22:57 <Fallenou> gn8 mwalle !

22:57 <wpwrak> you can also have non-{exception

22:57 <wpwrak> let's try this again

22:57 * Fallenou is fighting against icache refilling during tlb miss

22:58 <wpwrak> you can also have non-{exception|interrupt} TLB updates that get interrupted by an exception/interrupt and have VADDR changed by a fault

22:58 <Fallenou> oh, yes

22:58 <Fallenou> right

22:58 <Fallenou> that could really happen

22:58 <wpwrak> which wouldn't be much fun either. so you have to 1) disable interrupts and 2) code it such that you can't fault in the middle of the operation

22:58 <mwalle> mh not interrupts, because we can say 'you have to turn off interrupts'

22:59 <mwalle> but then still, we could have a fault...

22:59 <mwalle> page miss

22:59 <Fallenou> yes but your code which is playing with tlb can generate a tlb miss

22:59 <wpwrak> just put it into a bit of asm("...") :)

22:59 <Fallenou> you have to be sure you don't cross a page boundary

22:59 <wpwrak> load registers first, then work from registers

23:00 <wpwrak> ah, ITLB miss. right.

23:00 <Fallenou> yes

23:00 <mwalle> mhmhmh, why does the os need to run with mmus enabled?

23:00 <wpwrak> yes, you need alignment, too :)

23:00 <Fallenou> mwalle: because OS needs virtual addressing

23:00 <Fallenou> I mean kernel

23:00 <Fallenou> for vmalloc etc

23:00 <mwalle> just because linux expected it to be at 0xc0000000 ?!

23:00 <wpwrak> don't modules vmalloc their code space ?

23:01 <mwalle> Fallenou: well it does run without it atm ;)

23:01 <Fallenou> I am really not a linux expert but several people told me "linux runs with MMU enabled"

23:01 <Fallenou> yes indeed :p

23:01 <mwalle> wpwrak: dunno

23:02 <Fallenou> I guess if you don't have vmalloc for kernel you may lose a lot of functionality ? (modules ?)

23:02 <Fallenou> don't know exactly

23:02 <Fallenou> a good question to ask on kernelnewbies :)

23:02 <wpwrak> vmalloc is normally used for data. but in the back of my head, i seem to remember that modules do ugly things.

23:02 <mwalle> do we really need to update the tlb from other places than the miss handler?

23:02 <wpwrak> of course, that may just be to avoid having to make a large contiguous allocation

23:02 <Fallenou> in theory no mwalle but there must be exceptions

23:02 <wpwrak> yes, now it makes sense

23:02 <Fallenou> that we don't think of

23:03 <Fallenou> most kernel code that interacts with user space (ioctl, syscalls) are allocating using vmalloc

23:03 <wpwrak> mwalle: invalidation

23:03 <Fallenou> because user space does not care if it's physically contiguous or not

23:03 <wpwrak> mwalle: writing a valid entry ... maybe not

23:03 <Fallenou> and physically contiguous memory is a rare resource

23:03 <Fallenou> you need to keep it for hardware/dma

23:03 <mwalle> wpwrak: mah,, so simple.. ;) invalidation, right..

23:04 <wpwrak> we could always just update the page table and let the TLB miss handler do the propagation

23:04 <Fallenou> well you need to apply changes to the tlb

23:04 <mwalle> wpwrak: yeah but as you said invalidation of just one entry wont work

23:04 <Fallenou> else the mapping will not be invalidated

23:04 <wpwrak> Fallenou: errr, vmalloc'ed memory normally doesn't go to user space

23:04 <Fallenou> and between write to VADDR and write to TLBCTRL you can be interrupted by a itlb miss

23:05 <Fallenou> wpwrak: really ?

23:05 <Fallenou> vmalloc'ed+zeroed ?

23:05 * Fallenou opens his kernel book at page vmalloc()

23:05 <wpwrak> it's for large kernel-internal allocations. not sure if you could actually send vmalloc'ed memory to user space. (i mean, whether there's code that does that. of course, you could hack it.)

23:06 <mwalle> so guys think again ;) there must be a smart solution for this ;)

23:06 <mwalle> im going to bed

23:06 <wpwrak> we could have a INVALIDATE_VADDR register. that would make the operation atomic ;-)

23:06 <Fallenou> indeed when modules are loaded into memory

23:06 <Fallenou> they are loaded in memory allocated via vmalloc

23:07 <mwalle> wpwrak: please not ;)

23:07 <mwalle> smart solution ;)

23:07 <Fallenou> ok let's think a bit more, tomorrow :p

23:07 <wpwrak> mwalle: how about "if you write to TLBCTRL, the VADDR is fetched from r1" ? ;-)

23:07 <Fallenou> gn8 mwalle ! it's great to have those brainstormings :)

23:08 <Fallenou> no please no

23:08 <Fallenou> no regular registers

23:08 <wpwrak> (-:C

23:08 <Fallenou> it's a pain , really

23:08 <mwalle> wpwrak: still too transparent :b

23:09 <wpwrak> have a two-level FIFO. if you get an exception while it's half-full, adjust the return address such that you return to the previous instruction :)

23:09 <mwalle> to make it easier to fetch data, load the value at r1 ;b

23:10 <Fallenou> wpwrak: ohoh, not bad :)

23:10 <Fallenou> a bit hackish

23:10 <wpwrak> does the WCSR instruction have any unused bits ? maybe we could hide the TLBCTRL command there (-:C

23:10 <Fallenou> let the poor mwalle go to sleep

23:10 <Fallenou> I should go as well

23:11 <wpwrak> he should have plenty of material for some impressive nightmares ;-)

23:11 <mwalle> haha

23:11 <Fallenou> wpwrak: only possible unused bits are "csr id" but I don't think we can divide their numbers by two

23:11 <Fallenou> or we limit value written to csr to 15 bits :p

23:11 <Fallenou> bweark

23:12 <mwalle> Fallenou: the opcode itself has many unused bits

23:12 <mwalle> csr id is only 5 bits wide iirc

23:12 <wpwrak> well, if all else fails, we can still do just what i've already described above: disable interrupts, make sure you don't get a DTLB miss, and align the code such that it doesn't cross a page boundary.

23:12 <wpwrak> plan B: make a trampoline that executes with the TLB off

23:13 <Fallenou> oh right

23:13 <Fallenou> indeed wcsr only write from register

23:13 <Fallenou> so you have the 16 immediate bits for free

23:14 <wpwrak> 16 bits to play with. bwahaha ! :)

23:14 <Fallenou> maybe we could use them ^^

23:14 <Fallenou> that would make "two wcsr instructions"

23:14 <wpwrak> 48 bit CSR registers :)

23:15 <Fallenou> we could put tlbctrl commands in those 16 bits

23:15 <Fallenou> and then suppress tlbctrl

23:15 <Fallenou> only write to vaddr/paddr with the command hidden in the 16 lower bits

23:16 <Fallenou> tlbwcsr tlbvaddr, my_value << 16 | my_command

23:16 <Fallenou> well no

23:16 <Fallenou> but you got the idea

23:16 <Fallenou> that would be a mess :( plenty of wcsr*** commands

23:17 <wpwrak> asm("") is your friend :)

23:17 <Fallenou> or that would mean adding a tlbwcsr statement in gnu-as, which takes two args

23:18 <Fallenou> 1 register and 1 immediate

23:18 <wpwrak> we could also keep the TLBCTRL code in the lower bits of VADDR

23:18 <Fallenou> oh, right

23:18 <Fallenou> simpler

23:18 <wpwrak> write to PADDR -> update entry. write VADDR -> perform operation. (and make one a no-op)

23:19 <Fallenou> yes, good idea

23:19 <wpwrak> (tlbwcsr) three args. you may have more than one destination CSR :)

23:19 <Fallenou> 01:26 < wpwrak> write to PADDR -> update entry. write VADDR -> perform operation. (and make one a no-op) <= I guess you solved the problem ?

23:20 <Fallenou> this seems enough to take care of everything

23:21 <wpwrak> already ? that would be disappointingly easy :)

23:21 <Fallenou> well that makes invalidating a line a single instruction

23:22 <Fallenou> only writing to VADDR

23:22 <wpwrak> but yes, in case we find we need anything more, we can always introduce extra CSRs

23:22 <wpwrak> i think atomic operations are the best approach. no need to worry about a lot of things.

23:22 <Fallenou> tlb flush would be performed using vaddr ? or still tlbctrl ?

23:23 <Fallenou> would be kind of ugly to do this through vaddr I think

23:23 <Fallenou> since it involves ALL the tlb, and not a vaddr in particular

23:23 <wpwrak> only perhaps what happens if you're executing a TLB change and an ITLB miss occurs when fetching one of the following instructions. may be a bit delicate not to mess this up :)

23:23 <Fallenou> and tlbctrl seems useless now that you can feed commands in vaddr lower bits :(

23:24 <wpwrak> yeah, we killed TLBCTRL :)

23:24 <Fallenou> I think too

23:24 <Fallenou> RIP

23:24 <wpwrak> VADDR could take care of all the operations. we have a lot of bits to play with :)
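
The scheme the discussion converges on (a command carried in the low bits of the value written to VADDR, so that invalidation or a flush is a single atomic CSR write and TLBCTRL is no longer needed) can be sketched as a software model. The bit positions, command codes, TLB size and direct-mapped organisation below are all assumptions made for illustration; the log only establishes that bit 0 selects the TLB and that one command code should be a no-op.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12u
#define TLB_ENTRIES 64u   /* assumption: size is not given in the log */

/* Hypothetical encoding of the low bits of a VADDR write. */
#define VADDR_TLB_SEL   0x1u          /* bit 0: 1 = ITLB, 0 = DTLB */
#define VADDR_CMD_SHIFT 1u
#define VADDR_CMD_MASK  (0x3u << VADDR_CMD_SHIFT)

enum { CMD_NOP = 0, CMD_INVALIDATE = 1, CMD_FLUSH = 2 };

struct tlb {
    uint32_t vpn[TLB_ENTRIES];
    uint8_t  valid[TLB_ENTRIES];
};

struct mmu {
    uint32_t   vaddr;
    struct tlb itlb, dtlb;
};

/* Build the value software would write to VADDR. */
static uint32_t vaddr_encode(uint32_t addr, int is_insn, unsigned cmd)
{
    uint32_t page = addr & ~((UINT32_C(1) << PAGE_SHIFT) - 1u);

    return page | ((cmd << VADDR_CMD_SHIFT) & VADDR_CMD_MASK)
                | (is_insn ? VADDR_TLB_SEL : 0u);
}

/* Model of a single write to VADDR: latch the value, then run the command. */
static void mmu_write_vaddr(struct mmu *m, uint32_t value)
{
    struct tlb *t = (value & VADDR_TLB_SEL) ? &m->itlb : &m->dtlb;
    uint32_t vpn = value >> PAGE_SHIFT;
    unsigned idx = vpn % TLB_ENTRIES;
    unsigned i;

    m->vaddr = value;

    switch ((value & VADDR_CMD_MASK) >> VADDR_CMD_SHIFT) {
    case CMD_INVALIDATE:              /* drop one matching entry */
        if (t->valid[idx] && t->vpn[idx] == vpn)
            t->valid[idx] = 0;
        break;
    case CMD_FLUSH:                   /* drop every entry of the selected TLB */
        for (i = 0; i < TLB_ENTRIES; i++)
            t->valid[i] = 0;
        break;
    default:                          /* CMD_NOP: only latch the address */
        break;
    }
}
```

Since each operation completes within one write, an interrupt or miss arriving between two CSR writes can no longer leave an invalidation half done, which is the property wpwrak argues for.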

23:24 <Fallenou> sure

23:25 <Fallenou> enough killing for tonight

23:25 <Fallenou> see you tomorrow :)

23:25 <Fallenou> gn8 !

23:26 <wpwrak> sweet dreams ! :)

23:27 <Fallenou> thanks, you too !

23:28 <wpwrak> hmm, in a few hours :)