<larsc>
mwalle: hm, the fix doesn't work. I've now added explicit mv instructions for the syscall registers. This generates quite a bit of overhead, but well it works for now.
<mwalle>
still strange: http://pastebin.com/tVAvBqWN, likely() produces worse code than code without annotations
<Fallenou>
what's __builtin_expect() doing ?
<sh4rm4>
it optimizes code for the likely branch
<Fallenou>
oh ok, you as a developer tell gcc what branch you think is most likely to be taken
<Fallenou>
nice
<sh4rm4>
yep, like p = malloc()... if (unlikely(p)) { ...
<sh4rm4>
oops, !p
<Fallenou>
ok :)
<Fallenou>
got it, thx
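The malloc() case sh4rm4 describes can be written out as a small sketch. The likely()/unlikely() macros below follow the shape commonly used in the Linux kernel; alloc_buffer() is a hypothetical helper made up for illustration, not code from the pastebin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Branch hints for gcc. !!(x) normalizes any truthy value to 1 so it
 * can be compared against the expected constant. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

/* Hypothetical helper: returns 0 on success, -1 if malloc() failed. */
int alloc_buffer(size_t n, void **out)
{
    void *p = malloc(n);

    assert(out != NULL);
    if (unlikely(!p))   /* allocation failure is the rare path */
        return -1;
    *out = p;
    return 0;
}
```

The hint only affects how gcc lays out the two branches; the result of the condition is unchanged.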
<wpwrak>
mwalle: (likely/unlikely) interesting :)
<wpwrak>
mwalle: (itlb/dtlb) if you put it in the control word, this may add a few more checks and branches
lekernel_ is now known as lekernel
<wpwrak>
mwalle: (hassle) dunno if it's a hassle. if it's easy, you save one CSR write. cyclesss are (my) precioussss ;-)
<Fallenou>
I agree we should optimize tlb miss as much as possible
<Fallenou>
as they will happen a looot
<Fallenou>
I quickly had a look at your email, wpwrak. it sounded great, even if I didn't get everything
<mwalle>
wpwrak: so we need at least a dedicated tlb miss handler
<wpwrak>
yeah, i'd say so
<mwalle>
and since we don't want to distinguish between itlb and dtlb within the handler code, we either need to make the hw remember the current tlb or we need two exceptions
<mwalle>
then i vote for the second
<mwalle>
to keep hw simple
<wpwrak>
Fallenou: i'm glad you like it :) and yes, while writing, i noticed how it was gradually getting trickier :)
<wpwrak>
mwalle: we could reuse Fallenou's idea of indicating the TLB with a bit in the address. e.g., VADDR[0]. that way, just writing/keeping the fault address would select the correct TLB.
<mwalle>
mh .. but if we need to set some magic bits in the paddr... that's one more instruction
<wpwrak>
mwalle: otherwise, yes, separate handlers are easier than fancy status bits
<mwalle>
we could use wcsr TLBCTRL, r1 too..
<wpwrak>
i would have the magic in the VADDR. use free PADDS bits for permission and such
<wpwrak>
s/PADDS/PADDR
<wpwrak>
(PADDR) because the PADDR can come straight from the page table. which is where all the other bits live, too
<mwalle>
wpwrak: the thing is, with the bit magic, PADDR and VADDR are two separate registers for both TLBs, and you select between them with the lowest bit, but you have to set that bit for both PADDR and VADDR
<wpwrak>
so the PADDR would be sort of a union { unsigned page_addr:20; unsigned flags:12; }
<wpwrak>
why ? have one set of registers and only use VADDR[0] to select the TLB
<mwalle>
ah, yes ;)
<wpwrak>
when entering the TLB miss handler, VADDR is already pre-set, including VADDR[0]
<wpwrak>
then, with my "magic" PADDR write, you just fetch the page table entry and write to update. very simple :)
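A minimal sketch of the PADDR layout wpwrak describes, assuming 32-bit addresses and 4 KiB pages: the upper 20 bits hold the page address and the lower 12 bits are free for flags. Explicit masks are used instead of C bitfields because bitfield ordering is implementation-defined. The flag names PTE_VALID and PTE_WRITE and all function names are hypothetical; the chat does not define any flag bits.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT     12u
#define PTE_FLAGS_MASK ((1u << PAGE_SHIFT) - 1u)   /* low 12 bits */

/* Hypothetical flag assignments, for illustration only. */
#define PTE_VALID 0x001u
#define PTE_WRITE 0x002u

/* Combine a page-aligned physical address with its flag bits. */
uint32_t pte_make(uint32_t phys_addr, uint32_t flags)
{
    assert((phys_addr & PTE_FLAGS_MASK) == 0);   /* must be page aligned */
    assert((flags & ~PTE_FLAGS_MASK) == 0);      /* flags fit in 12 bits */
    return phys_addr | flags;
}

/* Upper 20 bits: physical page address. */
uint32_t pte_phys(uint32_t pte)
{
    return pte & ~PTE_FLAGS_MASK;
}

/* Lower 12 bits: permission and status flags. */
uint32_t pte_flags(uint32_t pte)
{
    return pte & PTE_FLAGS_MASK;
}
```

With this layout a page table entry can be written to PADDR unmodified, which is the point of keeping the flags in the otherwise unused low bits.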
<mwalle>
ok agreed, then one exception is enough, isn't it?
<mwalle>
because the handler doesn't need to distinguish between instruction and data tlb
<wpwrak>
should be, yes. as far as i can tell, there's nothing really different between them anyway
<wpwrak>
of course, while implementing things, surprises may surface :)
<mwalle>
and BADADDR should be the same CSR number as VADDR
<mwalle>
with the lowest bit set or not?
<wpwrak>
aye. and more than that, it should be the same register. not one being a read and the other a write register that have nothing in common
<wpwrak>
if we use the (VADDR[0] ? I : D)##TLB approach, yes
<mwalle>
masking that bit out doesn't cost us an instruction, right? because we need to shift anyway?
<wpwrak>
we never look at the lower 12 bits. that is, unless we run into trouble. but that's no longer in the fast path
<wpwrak>
trouble = segfault / oops
<mwalle>
ok so to conclude, we have a TLBVADDR, which is read/write (read for BADADDR), VADDR[0] indicates the tlb, and a write to PADDR triggers a TLB update
<mwalle>
TLBCTRL is only needed for invalidation/flushing
<mwalle>
(yet)
<mwalle>
reading PADDR should return the TLB entry for a given VADDR imho
<mwalle>
shouldn't be too hard in H/W ;)
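The interface mwalle summarizes can be modeled in software as follows. This is only a sketch under stated assumptions: the TLBs are taken to be direct mapped with 64 entries each, pages are 4 KiB, and every name (mmu_model, mmu_raise_miss, mmu_write_paddr, mmu_read_paddr) is hypothetical. None of this reflects the actual LM32 hardware.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT  12u
#define TLB_ENTRIES 64u   /* assumed size, direct mapped */

struct tlb_entry {
    uint32_t vpn;     /* virtual page number (tag) */
    uint32_t paddr;   /* page address plus flags, as written to PADDR */
    int      valid;
};

struct mmu_model {
    uint32_t vaddr;                        /* the single shared VADDR register */
    struct tlb_entry itlb[TLB_ENTRIES];
    struct tlb_entry dtlb[TLB_ENTRIES];
};

/* VADDR[0] selects the TLB: 1 = ITLB, 0 = DTLB. */
static struct tlb_entry *selected(struct mmu_model *m)
{
    struct tlb_entry *tlb = (m->vaddr & 1u) ? m->itlb : m->dtlb;
    uint32_t vpn = m->vaddr >> PAGE_SHIFT;

    return &tlb[vpn % TLB_ENTRIES];
}

/* Hardware side: a miss latches the fault address and the TLB bit. */
void mmu_raise_miss(struct mmu_model *m, uint32_t fault_addr, int is_itlb)
{
    m->vaddr = (fault_addr & ~1u) | (is_itlb ? 1u : 0u);
}

/* Writing PADDR updates the entry that VADDR currently selects. */
void mmu_write_paddr(struct mmu_model *m, uint32_t pte)
{
    struct tlb_entry *e = selected(m);

    e->vpn = m->vaddr >> PAGE_SHIFT;
    e->paddr = pte;
    e->valid = 1;
}

/* Reading PADDR returns the entry for VADDR, or 0 if none matches. */
uint32_t mmu_read_paddr(struct mmu_model *m)
{
    struct tlb_entry *e = selected(m);

    if (e->valid && e->vpn == (m->vaddr >> PAGE_SHIFT))
        return e->paddr;
    return 0;
}
```

In this model the miss handler never touches VADDR: it reads the fault address, looks up the page table entry, and writes it to PADDR.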
<Fallenou>
00:50 < wpwrak> when entering the TLB miss handler, VADDR is already pre-set, including VADDR[0] < the one you read from, not the one you write to
<wpwrak>
hmm. do we have a read strobe ?
<Fallenou>
if you read from VADDR, you get the address causing the miss
<wpwrak>
Fallenou: i would make the two the same
<Fallenou>
but behind the scene you have two different registers
<Fallenou>
one you read from (faulty address)
<wpwrak>
Fallenou: have one VADDR register. if there's a fault, you write the fault address to it
<Fallenou>
one you write to (to set up a mapping or invalidate a line)
<Fallenou>
well, ok it's possible I think :)
* Fallenou
's brain is heating up reading pipeline drawings
<wpwrak>
Fallenou: this means that you have to be careful when updating the TLB, but i think you don't need to be more careful than you already have to be
<mwalle>
gn8
<Fallenou>
gn8 !
<wpwrak>
Fallenou: e.g., avoid all exceptions and disable interrupts before doing any such thing. or else you may find yourself in trouble :)
<Fallenou>
wpwrak: indeed having the vaddr already set up is good, because you don't have to set it using software when updating the line
<Fallenou>
but beware of taking another miss while handling the previous one
<Fallenou>
you would lose vaddr
<Fallenou>
but it should not happen
<Fallenou>
irq are disabled, that's ok
<Fallenou>
exceptions are not
<Fallenou>
lm32's exceptions are not designed to be "turned off"
<mwalle>
Fallenou: but mmu is disabled, so you can't get another miss
<Fallenou>
yes
<Fallenou>
we just have to try not using misaligned load/store, avoid divide by zero, etc etc :)
<Fallenou>
and we should be ok :p
<Fallenou>
gn8 mwalle !
<wpwrak>
you can also have non-{exception
<wpwrak>
let's try this again
* Fallenou
is fighting against icache refilling during tlb miss
<wpwrak>
you can also have non-{exception|interrupt} TLB updates that get interrupted by an exception/interrupt and have VADDR changed by a fault
<Fallenou>
oh, yes
<Fallenou>
right
<Fallenou>
that could really happen
<wpwrak>
which wouldn't be much fun either. so you have to 1) disable interrupts and 2) code it such that you can't fault in the middle of the operation
<mwalle>
mh not interrupts, because we can say 'you have to turn off interrupts'
<mwalle>
but then still, we could have a fault...
<mwalle>
page miss
<Fallenou>
yes but your code which is playing with tlb can generate a tlb miss
<wpwrak>
just put it into a bit of asm("...") :)
<Fallenou>
you have to be sure you don't cross a page boundary
<wpwrak>
load registers first, then work from registers
<wpwrak>
ah, ITLB miss. right.
<Fallenou>
yes
<mwalle>
mhmhmh, why does the os need to run with mmus enabled?
<wpwrak>
yes, you need alignment, too :)
<Fallenou>
mwalle: because OS needs virtual addressing
<Fallenou>
I mean kernel
<Fallenou>
for vmalloc etc
<mwalle>
just because linux expected it to be at 0xc0000000 ?!
<wpwrak>
don't modules vmalloc their code space ?
<mwalle>
Fallenou: well it does run without it atm ;)
<Fallenou>
I am really not a linux expert but several people told me "linux runs with MMU enabled"
<Fallenou>
yes indeed :p
<mwalle>
wpwrak: dunno
<Fallenou>
I guess if you don't have vmalloc for kernel you may lose a lot of functionality ? (modules ?)
<Fallenou>
don't know exactly
<Fallenou>
a good question to ask on kernelnewbies :)
<wpwrak>
vmalloc is normally used for data. but in the back of my head, i seem to remember that modules do ugly things.
<mwalle>
do we really need to update the tlb from other places than the miss handler?
<wpwrak>
of course, that may just be to avoid having to make a large contiguous allocation
<Fallenou>
in theory no mwalle but there must be exceptions
<wpwrak>
yes, now it makes sense
<Fallenou>
that we don't think of
<Fallenou>
most kernel code that interacts with user space (ioctl, syscalls) are allocating using vmalloc
<wpwrak>
mwalle: invalidation
<Fallenou>
because user space does not care if it's physically contiguous or not
<wpwrak>
mwalle: writing a valid entry ... maybe not
<Fallenou>
and physically contiguous memory is a rare resource
<Fallenou>
you need to keep it for hardware/dma
<mwalle>
wpwrak: mah,, so simple.. ;) invalidation, right..
<wpwrak>
we could always just update the page table and let the TLB miss handler do the propagation
<Fallenou>
well you need to apply changes to the tlb
<mwalle>
wpwrak: yeah but as you said invalidation of just one entry won't work
<Fallenou>
else the mapping will not be invalidated
<wpwrak>
Fallenou: errr, vmalloc'ed memory normally doesn't go to user space
<Fallenou>
and between the write to VADDR and the write to TLBCTRL you can be interrupted by an itlb miss
<Fallenou>
wpwrak: really ?
<Fallenou>
vmalloc'ed+zeroed ?
* Fallenou
opens his kernel book at page vmalloc()
<wpwrak>
it's for large kernel-internal allocations. not sure if you could actually send vmalloc'ed memory to user space. (i mean, whether there's code that does that. of course, you could hack it.)
<mwalle>
so guys think again ;) there must be a smart solution for this ;)
<mwalle>
im going to bed
<wpwrak>
we could have a INVALIDATE_VADDR register. that would make the operation atomic ;-)
<Fallenou>
indeed when modules are loaded into memory
<Fallenou>
they are loaded in memory allocated via vmalloc
<mwalle>
wpwrak: please not ;)
<mwalle>
smart solution ;)
<Fallenou>
ok let's think a bit more, tomorrow :p
<wpwrak>
mwalle: how about "if you write to TLBCTRL, the VADDR is fetched from r1" ? ;-)
<Fallenou>
gn8 mwalle ! it's great to have those brainstormings :)
<Fallenou>
no please no
<Fallenou>
no regular registers
<wpwrak>
(-:C
<Fallenou>
it's a pain , really
<mwalle>
wpwrak: still too transparent :b
<wpwrak>
have a two-level FIFO. if you get an exception while it's half-full, adjust the return address such that you return to the previous instruction :)
<mwalle>
to make it easier to fetch data, load the value at r1 ;b
<Fallenou>
wpwrak: ohoh, not bad :)
<Fallenou>
a bit hackish
<wpwrak>
does the WCSR instruction have any unused bits ? maybe we could hide the TLBCTRL command there (-:C
<Fallenou>
let the poor mwalle go to sleep
<Fallenou>
I should go as well
<wpwrak>
he should have plenty of material for some impressive nightmares ;-)
<mwalle>
haha
<Fallenou>
wpwrak: the only possible unused bits are "csr id" but I don't think we can divide their numbers by two
<Fallenou>
or we limit value written to csr to 15 bits :p
<Fallenou>
bweark
<mwalle>
Fallenou: the opcode itself has many unused bits
<mwalle>
csr id is only 5 bits wide iirc
<wpwrak>
well, if all else fails, we can still do just what i've already described above: disable interrupts, make sure you don't get a DTLB miss, and align the code such that it doesn't cross a page boundary.
<wpwrak>
plan B: make a trampoline that executes with the TLB off
<Fallenou>
oh right
<Fallenou>
indeed wcsr only writes from a register
<Fallenou>
so you have the 16 immediate bits for free
<wpwrak>
16 bits to play with. bwahaha ! :)
<Fallenou>
maybe we could use them ^^
<Fallenou>
that would make "two wcsr instructions"
<wpwrak>
48 bit CSR registers :)
<Fallenou>
we could put tlbctrl commands in those 16 bits
<Fallenou>
and then suppress tlbctrl
<Fallenou>
only write to vaddr/paddr with the command hidden in the 16 lower bits
<Fallenou>
that would be a mess :( plenty of wcsr*** commands
<wpwrak>
asm("") is your friend :)
<Fallenou>
or that would mean adding a tlbwcsr statement in gnu-as, which takes two args
<Fallenou>
1 register and 1 immediate
<wpwrak>
we could also keep the TLBCTRL code in the lower bits of VADDR
<Fallenou>
oh, right
<Fallenou>
simpler
<wpwrak>
write to PADDR -> update entry. write VADDR -> perform operation. (and make one a no-op)
<Fallenou>
yes, good idea
<wpwrak>
(tlbwcsr) three args. you may have more than one destination CSR :)
<Fallenou>
01:26 < wpwrak> write to PADDR -> update entry. write VADDR -> perform operation. (and make one a no-op) <= I guess you solved the problem ?
<Fallenou>
this seems enough to take care of everything
<wpwrak>
already ? that would be disappointingly easy :)
<Fallenou>
well that makes invalidating a line a single instruction
<Fallenou>
only writing to VADDR
<wpwrak>
but yes, in case we find we need anything more, we can always introduce extra CSRs
<wpwrak>
i think atomic operations are the best approach. no need to worry about a lot of things.
<Fallenou>
tlb flush would be performed using vaddr ? or still tlbctrl ?
<Fallenou>
would be kind of ugly to do this through vaddr I think
<Fallenou>
since it involves ALL the tlb, and not a vaddr in particular
<wpwrak>
only perhaps what happens if you're executing a TLB change and an ITLB miss occurs when fetching one of the following instructions. may be a bit delicate not to mess this up :)
<Fallenou>
and tlbctrl seems useless now that you can feed commands in vaddr lower bits :(
<wpwrak>
yeah, we killed TLBCTRL :)
<Fallenou>
I think too
<Fallenou>
RIP
<wpwrak>
VADDR could take care of all the operations. we have a lot of bits to play with :)
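One way the "command in the lower bits of VADDR" idea could be encoded is sketched below. Only bit 0 as the ITLB/DTLB select comes from the discussion; the 2-bit command field in bits 2:1, the command values, and all names are assumptions made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MASK       0xfffff000u
#define VADDR_ITLB      0x1u   /* bit 0: 1 = ITLB, 0 = DTLB (from the chat) */
#define VADDR_CMD_SHIFT 1u     /* assumed position of the command field */
#define VADDR_CMD_MASK  0x3u   /* assumed width: 2 bits */

/* Hypothetical command set. A write with TLB_CMD_NOP only sets VADDR. */
enum tlb_cmd {
    TLB_CMD_NOP        = 0,
    TLB_CMD_INVALIDATE = 1,   /* invalidate the line for this page */
    TLB_CMD_FLUSH      = 2    /* flush the whole selected TLB */
};

/* Build the value for a single, atomic write to VADDR. */
uint32_t vaddr_encode(uint32_t vaddr, int is_itlb, enum tlb_cmd cmd)
{
    assert(((uint32_t)cmd & ~VADDR_CMD_MASK) == 0);
    return (vaddr & PAGE_MASK)
         | ((uint32_t)cmd << VADDR_CMD_SHIFT)
         | (is_itlb ? VADDR_ITLB : 0u);
}

/* Decoders, as the hardware side would see the written value. */
enum tlb_cmd vaddr_cmd(uint32_t value)
{
    return (enum tlb_cmd)((value >> VADDR_CMD_SHIFT) & VADDR_CMD_MASK);
}

int vaddr_is_itlb(uint32_t value)
{
    return (value & VADDR_ITLB) != 0;
}

uint32_t vaddr_page(uint32_t value)
{
    return value & PAGE_MASK;
}
```

Because the page, the TLB select, and the command travel in one word, invalidating a line needs only one CSR write, so there is no window in which a fault can overwrite VADDR between two writes.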