#milkymist on 2011-04-27 — irc logs at freenode.irclog.whitequark.org

01:46 <kristianpaul> Fallenou: so in rtems i just need to modify a pointer value to freely point at memory, right? or are there limitations on what i can point at, considering that the FS is also in ram..

01:46 <kristianpaul> it may sound stupid, but i need to confirm

06:07 <Fallenou> kristianpaul: look at how registers are read from and written to

06:07 <Fallenou> you can use the address directly

06:07 <Fallenou> beware of cache problems, volatile etc
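
A minimal sketch of what Fallenou describes, assuming a 32-bit memory-mapped register reached through a plain pointer. The address `EXAMPLE_REG_ADDR` and the helper names are hypothetical and not taken from RTEMS or the Milkymist headers. `volatile` only covers the compiler side; the data cache still has to be dealt with separately (for example by using an uncached region or flushing), which this sketch does not do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register address, for illustration only; real addresses
 * come from the SoC headers of the board support package. */
#define EXAMPLE_REG_ADDR 0x80001000u

/* Read a 32-bit register. The volatile qualifier stops the compiler from
 * keeping the value in a CPU register or dropping repeated reads. */
static inline uint32_t reg_read(volatile uint32_t *reg)
{
    assert(((uintptr_t)reg & 3u) == 0);  /* word accesses must be aligned */
    return *reg;
}

/* Write a 32-bit register; volatile keeps the store from being elided. */
static inline void reg_write(volatile uint32_t *reg, uint32_t value)
{
    assert(((uintptr_t)reg & 3u) == 0);
    *reg = value;
}

/* Set bits with a read-modify-write sequence. */
static inline void reg_set_bits(volatile uint32_t *reg, uint32_t mask)
{
    reg_write(reg, reg_read(reg) | mask);
}
```

On real hardware the pointer would be formed as `(volatile uint32_t *)EXAMPLE_REG_ADDR`; on a host machine the helpers can be exercised against an ordinary variable.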

08:12 <terpstra> lekernel, were you imagining a separate TLB for instruction and data buses? (seems best given the dual-ported design of the LM32)

08:33 <lekernel> yeah

08:52 <lekernel> aw: so it seems the new protection system works great

08:53 <lekernel> http://en.qi-hardware.com/wiki/Protection_of_Reversed_Polarity_on_DC_plug-in#Sch._D

08:53 <aw> yes, SCH D.

08:53 <aw> i am using the official adapter to record data again though...

08:54 <aw> the holding current actually seems higher than any I measured before. yup, of course it must be, due to the 2A fuse.

08:55 <aw> meanwhile I am watching the temperature, especially with our current adapter, to see the surrounding temperature around the DC jack.

08:56 <aw> this is mostly what I am checking now though. :-)

08:59 <aw> you can also see that the whole marked '2.85A' is the limit of my lab power supply...and its output/capability won't drop too much when loading. I think that I need to have a 'burning run' for at least 1 week or do an ageing test on the adapter.

09:02 <aw> lekernel, i really forgot that 1A must be available for the two host usb ports, thanks for that last email reminding me.

09:12 <lekernel> aw: why do you have 40mA going through the diodes at 5V?

09:13 <aw> lekernel, where?

09:13 <lekernel> table 1 non-reversed 5 4.994 4.992 0.04 / -

09:14 <aw> umm..it's no loads condition.(without MM1)

09:14 <lekernel> yes

09:14 <lekernel> there's another zener in the same series with a 5.6V voltage, maybe it's better to take that one

09:15 <aw> no.

09:15 <lekernel> with yours the minimum specified zener current is 4.85V, that might explain that 40mA current

09:15 <lekernel> s/zener current/zener voltage

09:16 <aw> when initially power up on NO LOAD circuit, the fuse is cold as before it goes into 'holding' stage.

09:16 <lekernel> you didn't get my point. the thing is that with a 5V voltage, your circuit should consume ZERO power. but instead of that you have 40mA through the diodes.

09:17 <aw> the fuse can still have current flowing even when it stays over the 'holding' value, then the current slowly rises to its 'cut/trip' current.

09:18 <lekernel> do we really want to have this protection circuit continually consume power and get hot?

09:18 <aw> hmm..i know you find it strange, later I'll measure again. :-) 5V is less than 5.1V. so you think there should be no current. :-)

09:18 <lekernel> well your measurement is probably correct

09:19 <lekernel> the diode datasheet specifies that the zener voltage can be as low as 4.85V

09:19 <aw> i actually haven't decided whether to use this circuit now.

09:19 <lekernel> and we do not want that, so I'm suggesting that we take the 5.6V diode instead with a minimum voltage of 5.32V

09:19 <aw> when I saw/discovered those temperatures.

09:20 <aw> btw, even if we pick a 5.6V diode, I can imagine that the temperature will still be there though. this is worse than rc2.

09:21 <lekernel> it will still get hot **when the user exceeds the specified voltage**

09:21 <aw> well the true thing is this h/w batch is better than rc2 to have protection function.

09:21 <aw> but yes.

09:21 <lekernel> not when they use the recommended adapter

09:21 <lekernel> with your zener, it would get hot with the recommended adapter

09:22 <lekernel> getting hot when the user does something stupid isn't a problem

09:22 <aw> so i really have not decided this though. I even think that I personally don't like this batch now.

09:22 <lekernel> I do. 2A fuse, 5.6V zener, done.

09:23 <aw> so I am trying to find out how warm our adapter will get.

09:23 <lekernel> with the 5.6V zener there should be ZERO current and ZERO heating
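
The argument above boils down to a threshold comparison, sketched here under simplifying assumptions: the zener is treated as fully off below its minimum specified voltage, and the figures (4.85 V and 5.32 V minimums, the 4.75 V to 5.25 V USB window) are the ones quoted in this log. The function names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimum zener voltages quoted in the discussion, in millivolts. */
#define VZ_MIN_5V1_MV 4850  /* 5.1 V part */
#define VZ_MIN_5V6_MV 5320  /* 5.6 V part */

/* USB supply window quoted by aw: 4.75 V to 5.25 V. */
#define USB_VBUS_MIN_MV 4750
#define USB_VBUS_MAX_MV 5250

/* True if a zener with this minimum breakdown voltage could already
 * conduct at the given supply voltage (worst-case part). */
bool zener_may_conduct(int supply_mv, int vz_min_mv)
{
    assert(supply_mv >= 0 && vz_min_mv > 0);
    return supply_mv >= vz_min_mv;
}

/* True if the clamp stays off over the whole USB supply window. */
bool clamp_idle_over_usb_range(int vz_min_mv)
{
    return !zener_may_conduct(USB_VBUS_MAX_MV, vz_min_mv);
}
```

With these numbers a worst-case 5.1 V part can conduct at a nominal 5 V supply, while a worst-case 5.6 V part stays off even at the top of the USB window, which is the trade-off lekernel is arguing for.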

09:24 <aw> well...good idea on 5.6V though.

09:25 <aw> usb spec needs 4.75~5.25V too. so 5.6V diode is over. that's why i picked 5.1V.

09:25 <lekernel> yeah I know

09:25 <aw> but I can try though to see how low it will be. :-)

09:26 <lekernel> but 5.6V for a short period shouldn't do much damage, and is definitely better than having 20V or so if the user is stupid

09:26 <lekernel> and there is still good protection for reversed polarity or AC adapters

09:26 <aw> but the true conditions on user we don't want them to use 20V adapter.

09:27 <lekernel> I know. but the whole point of this protection is to provide some security against human stupidity

09:27 <lekernel> and I insist on "some", as stupidity is infinite there can be no fully adequate protection

09:28 <aw> so like we declare that board "suggested input range: 4.75 V ~ ?V"..

09:28 <aw> yeah

09:28 <lekernel> no, we declare it as a *mandatory* input range

09:28 <lekernel> no change compared to rc2

09:28 <lekernel> but there is an additional safety belt if users do not listen to that

09:28 <lekernel> so it ends up doing less damage

09:29 <aw> well..wait I am supposedly .not provide a real conditional. :-)

09:30 <lekernel> 5.6V is 1N5339BG

09:30 <aw> yup...that's why i said that I have no idea / haven't decided whether we need this h/w patch...too many unknown conditions could happen.

09:30 <lekernel> just take that, do some quick testing and go ahead

09:30 <aw> umm..yes

09:31 <aw> ha...you really want to try that 5.6V even though it's over 5.25V for usb?

09:31 <lekernel> the wanted result is that the board should a) have no regression b) incur less or no damage when fed inappropriate voltages

09:31 <aw> surely i can quickly go for it.

09:31 <lekernel> yes, definitely

09:31 <lekernel> as I said

09:32 <lekernel> 5.6V on USB wouldn't damage much in most cases, and is still a lot better than whatever overvoltage an inappropriate adapter would give

09:32 <lekernel> imagine this situation: user plugs a 20V adapter to the M1

09:32 <aw> okay..imaginable though...

09:32 <lekernel> with USB devices on it

09:33 <lekernel> without the protection you get 20V on port, and this will probably break the USB devices

09:33 <aw> sorry that i am going to outside now...

09:33 <aw> talk to you later.

09:33 <lekernel> with the protection you get 5.6V or so for a dozen seconds, and this will probably NOT break the USB devices

09:33 <aw> later back to see this. cu

09:33 <lekernel> so. 2A fuse, 5.6V zener.

09:34 <lekernel> period :-p

09:34 <lekernel> cu

09:34 <aw> time to go..cu

09:45 <terpstra> lekernel, a thought: couldn't a LM32 TLB just work like CSRs work right now?

09:45 <terpstra> in 'kernel mode' there is no address translation

09:46 <terpstra> in 'user mode' you use a CAM lookup of these TLB registers for the appropriate page

09:46 <terpstra> and if there is a miss, a segfault exception is raised

09:46 <terpstra> and the OS has to fill in the missing page into some CSRs

09:46 <terpstra> we already have to save/restore 32 registers on context switch, so saving some 16-32 extra TLB entries doesn't seem like much more overhead

09:48 <terpstra> i guess the CSR namespace has been filled up too much with other CSRs, but a single new instruction 'WTLB' that behaves almost like the 'WCSR' should be enough to get the job done
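
terpstra's proposal can be modelled in software roughly as follows. This is a sketch only: the entry count, the 4 KiB page size, and all names (`tlb_write` standing in for the proposed WTLB instruction, `tlb_translate` for the lookup) are assumptions, and the linear loop stands in for what would be a parallel CAM match in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_ENTRIES 16u   /* assumed size; the log suggests 16 to 32 */
#define PAGE_SHIFT  12u   /* assumed 4 KiB pages */

struct tlb_entry {
    bool     valid;
    uint32_t vpn;   /* virtual page number */
    uint32_t ppn;   /* physical page number */
};

struct tlb {
    struct tlb_entry e[TLB_ENTRIES];
};

/* Model of the proposed WTLB instruction: the OS fills one slot. */
void tlb_write(struct tlb *t, unsigned slot, uint32_t vpn, uint32_t ppn)
{
    assert(slot < TLB_ENTRIES);
    t->e[slot] = (struct tlb_entry){ .valid = true, .vpn = vpn, .ppn = ppn };
}

/* Translate vaddr. Returns true and stores the physical address on
 * success. Returns false on a miss, where hardware would raise the
 * exception and the OS would fill in the missing page. */
bool tlb_translate(const struct tlb *t, bool kernel_mode,
                   uint32_t vaddr, uint32_t *paddr)
{
    if (kernel_mode) {           /* kernel mode: no address translation */
        *paddr = vaddr;
        return true;
    }
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    for (unsigned i = 0; i < TLB_ENTRIES; i++) {   /* CAM-style match */
        if (t->e[i].valid && t->e[i].vpn == vpn) {
            *paddr = (t->e[i].ppn << PAGE_SHIFT)
                   | (vaddr & ((1u << PAGE_SHIFT) - 1u));
            return true;
        }
    }
    return false;
}
```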

10:04 <wpwrak> terpstra: the kernel also needs to be able to copy to/from user space. better if it can use the TLB for this, instead of having to figure out these things "manually". could be a one-shot switch, though. e.g., set a bit that makes the next access use the TLB, then switch back.

10:04 <wpwrak> terpstra: another thing for the kernel: for vmalloc, you also want the MMU in kernel space

10:05 <terpstra> wpwrak, why does it need the tlb to copy to user space? it knows which page is at which address for the user-space, so it can just copy to the appropriate page's physical address

10:06 <wpwrak> terpstra: yes, it's possible but messy

10:06 <terpstra> if i recall correctly, the linux kernel already has a function you are supposed to call when accessing user-space memory via a pointer as provided from user-space

10:07 <terpstra> ie: if you get a pointer from user-land via an ioctl, you are supposed to convert it for use inside kernel space

10:07 <wpwrak> terpstra: yup. you have these functions. as i said, you can do all this without mmu support, but it's a lot of overhead

10:08 <terpstra> not so much overhead as compared to reloading the TLB i'd wager... ?

10:08 <wpwrak> terpstra: for example, if you copy a string byte by byte, you need to do a page table lookup and permission check for each access. messy.

10:08 <terpstra> what? why would you do that?

10:08 <terpstra> do one lookup for the block transfer

10:08 <wpwrak> terpstra: for larger accesses, you also have to check if you're crossing a page boundary

10:09 <terpstra> crossing page boundary, sure

10:09 <terpstra> but doing a single table lookup per page copied sounds like negligible overhead to me

10:09 <wpwrak> terpstra: yes, if this is implemented as a block transfer. this isn't always the case.

10:10 <wpwrak> (reloading the tlbs) why not have two ? one for user space and one for kernel space

10:10 <terpstra> area cost

10:10 <wpwrak> is the cost prohibitively high ?

10:10 <terpstra> well TLB will need to be a fairly high associative cache

10:10 <terpstra> and we'll need one for each bus already

10:11 <terpstra> making kernel-mode need it too doubles the cost

10:11 <wpwrak> (each bus) you mean instruction and data ?

10:11 <terpstra> yes

10:12 <wpwrak> you probably don't need an I-TLB for the kernel. so the extra cost is only +50% ;)

10:13 <terpstra> for an FPGA we probably can't make it a fully associative cache like in a real CPU... as we don't want to use tons of registers, so we will need a 2- or 4-way associative TLB in order to use FPGA ram blocks

10:13 <terpstra> TLB is going to be really expensive in area i think

10:14 <terpstra> going to be slow too. :-/

10:14 <wpwrak> well, you could make a really simple TLB (e.g., one entry) and collect statistics :)

10:16 <terpstra> you need in sequence: RAM block indexing (based on low page id bits), then comparison of TLB tag to high page bits, a MUX to pick the correct entry in the associative cache, then comparison of TLB result to L1 cache tag for the physical tagging check, finally the signal has to trigger an exception

10:16 <terpstra> that's some deep signalling...

10:17 <terpstra> all this happens between two clock edges
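
The chain of steps terpstra lists can be written out sequentially as a sketch. Everything here is an assumption made for illustration: a 2-way TLB with 16 sets, 4 KiB pages, and the L1 cache tag reduced to a single physical page number (`l1_tag_ppn`). In hardware these steps are combinational logic within one cycle, not a loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WAYS       2u
#define INDEX_BITS 4u                    /* assumed: 16 sets */
#define SETS       (1u << INDEX_BITS)
#define PAGE_SHIFT 12u                   /* assumed 4 KiB pages */

struct way { bool valid; uint32_t tag; uint32_t ppn; };
struct tlb2 { struct way set[SETS][WAYS]; };

enum lookup_result { LOOKUP_HIT, LOOKUP_TLB_MISS, LOOKUP_CACHE_MISS };

enum lookup_result tlb_lookup_2way(const struct tlb2 *t, uint32_t vaddr,
                                   uint32_t l1_tag_ppn, uint32_t *ppn_out)
{
    assert(t && ppn_out);
    uint32_t vpn   = vaddr >> PAGE_SHIFT;
    uint32_t index = vpn & (SETS - 1u);   /* 1: RAM indexing, low page bits */
    uint32_t tag   = vpn >> INDEX_BITS;   /*    remaining high page bits */

    for (unsigned w = 0; w < WAYS; w++) {
        const struct way *e = &t->set[index][w];
        if (e->valid && e->tag == tag) {  /* 2: tag comparison per way */
            *ppn_out = e->ppn;            /* 3: mux picks the matching way */
            /* 4: physical tag check against the L1 cache line */
            return (e->ppn == l1_tag_ppn) ? LOOKUP_HIT : LOOKUP_CACHE_MISS;
        }
    }
    return LOOKUP_TLB_MISS;               /* 5: would trigger the exception */
}
```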

10:18 <wpwrak> yeah. well, you have to do this anyway, whether you have a kernel tlb or not.

10:18 <terpstra> yes

10:18 <terpstra> but kernel TLB just makes it even bigger ;)

10:19 <wpwrak> ah, and you don't need the kernel tlb for kernel/user space access. you'd just reuse the user space tlb. what you need is a way to switch it on while in kernel mode.

10:19 <terpstra> maybe just one TLB

10:20 <terpstra> and have kernel mode bit enable access to a 'restricted' memory range

10:20 <terpstra> then you can happily re-use user-space pointers when copying to/from your kernel-land memory in the restricted range

10:20 <wpwrak> not sure how badly you need vmalloc in the kernel. it's kinda frowned upon, not enough that people wouldn't use it ...

10:20 <terpstra> the restricted range doesn't go through TLB

10:21 <terpstra> think 1GB is enough memory for userland? ;)

10:21 <wpwrak> that would be more or less equivalent to a 2GB/2GB split. yes, a possibility

10:22 <terpstra> or maybe: 2GB user-land, 1GB kernel land, 1GB memory mapped IO non-cached region

10:22 <terpstra> user mode cannot access addresses with high bit set

10:22 <wpwrak> you're very generous with that address space :)

10:22 <terpstra> addresses with high bit set do not go through TLB

10:22 <wpwrak> well, for a first version that'll do. can always be improved later.

10:25 <terpstra> unfortunately, my idea of a WTLB instruction won't work

10:25 <terpstra> since a TLB entry will need to be 40 bits wide

10:26 <terpstra> well, i guess it could be made to work if we have 256 TLB entries. *cackle*

10:27 <terpstra> <1 bit user/kernel> <19 bits virtual page number> <12 bits page offset>

10:27 <wpwrak> why 40 bits ?

10:28 <terpstra> the 19 bits virtual page number = <13 bits TLB tag> <6 bits TLB index>

10:29 <terpstra> then your TLB entries have: <13 bits TLB tag> <19 bits physical address>

10:29 <terpstra> and it fits!

10:29 <terpstra> and only 32 TLB entries needed

10:29 <terpstra> (i was imagining a full 20-bits for virtual address and physical address)

10:30 <terpstra> this way you can pack it better, though
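
The field widths terpstra gives for 4 kB pages can be checked with a few shifts and masks. This sketch follows the widths exactly as stated (1 + 13 + 6 + 12 = 32 address bits, 13 + 19 = 32 entry bits); the placement of the tag in the upper bits of the entry and all function names are assumptions. Note that 6 index bits address 64 slots, while the log speaks of 32 entries; the sketch does not try to resolve that.

```c
#include <assert.h>
#include <stdint.h>

#define OFFSET_BITS 12u   /* 4 kB page offset */
#define INDEX_BITS   6u   /* TLB index, low bits of the page number */
#define TAG_BITS    13u   /* TLB tag, high bits of the page number */
#define PPN_BITS    19u   /* physical page number stored in the entry */

struct vaddr_fields { uint32_t kernel, tag, index, offset; };

/* Split a virtual address into <kernel><tag><index><offset>. */
struct vaddr_fields split_vaddr(uint32_t va)
{
    struct vaddr_fields f;
    f.offset = va & ((1u << OFFSET_BITS) - 1u);
    f.index  = (va >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1u);
    f.tag    = (va >> (OFFSET_BITS + INDEX_BITS)) & ((1u << TAG_BITS) - 1u);
    f.kernel = va >> 31;
    return f;
}

/* Pack <13 bits tag><19 bits physical page> into one 32-bit word. */
uint32_t pack_entry(uint32_t tag, uint32_t ppn)
{
    assert(tag < (1u << TAG_BITS) && ppn < (1u << PPN_BITS));
    return (tag << PPN_BITS) | ppn;
}

uint32_t entry_tag(uint32_t e) { return e >> PPN_BITS; }
uint32_t entry_ppn(uint32_t e) { return e & ((1u << PPN_BITS) - 1u); }
```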

10:30 <wpwrak> ah, regarding the split. it's not so nice, because you'd then have to check that user pointers are in the correct address range, along with overflow issues. probably still better to have a means to just switch the user mode for the next access.
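
A sketch of the address split and of the range check wpwrak is worried about, assuming the 2 GB user / 1 GB kernel / 1 GB memory-mapped I/O layout floated earlier. Which of the two upper gigabytes is kernel and which is I/O is an assumption, as are all the names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum region { REGION_USER, REGION_KERNEL, REGION_MMIO };

/* High bit clear: user space. High bit set: kernel or uncached I/O. */
enum region classify(uint32_t addr)
{
    if ((addr & 0x80000000u) == 0)
        return REGION_USER;
    return (addr & 0x40000000u) ? REGION_MMIO : REGION_KERNEL;
}

/* Only user-half addresses go through the TLB. */
bool uses_tlb(uint32_t addr)
{
    return classify(addr) == REGION_USER;
}

/* User mode may not touch addresses with the high bit set. */
bool access_allowed(bool kernel_mode, uint32_t addr)
{
    return kernel_mode || classify(addr) == REGION_USER;
}

/* Check that [addr, addr + len) lies entirely in the user half,
 * including the wrap-around case wpwrak mentions. */
bool user_range_ok(uint32_t addr, uint32_t len)
{
    assert(len > 0);
    uint32_t last = addr + (len - 1u);
    if (last < addr)
        return false;                 /* wrapped past 2^32 */
    return classify(addr) == REGION_USER && classify(last) == REGION_USER;
}
```

The extra `user_range_ok` call on every kernel access to a user pointer is the cost wpwrak is pointing at.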

10:32 <wpwrak> you also need permission bits: read, write, and execute would be desirable, too

10:32 <terpstra> lies

10:32 <terpstra> we have two TLBs one for data and one for instruction

10:32 <terpstra> so execute means it is in the instruction TLB

10:32 <terpstra> i suppose read/write needs a bit, though for the data bus

10:32 <wpwrak> very good. so just one for write.

10:32 <wpwrak> yes

10:33 <terpstra> damn you

10:33 <wpwrak> hehe :)

10:33 <terpstra> there be not enough bits ;)

10:34 <terpstra> should it be possible for a user to map device memory ?

10:34 <terpstra> i suppose this is useful especially for a micro kernel

10:34 <wpwrak> hmm yes. that would be very nice to have.

10:35 <terpstra> so you need a full 20 bit physical address in the TLB

10:35 <wpwrak> also for plain user space. think the old architecture of the X server.

10:35 <wpwrak> or all my current atrocities surrounding UBB on the ben ;-)

10:35 <terpstra> so 20 bits for physical address, 1 bit for read/write flag.....

10:35 <terpstra> that means only 11 bits for the tag

10:36 <terpstra> i guess if you had 8 bits of TLB index (256 entries... eek)

10:37 <terpstra> that's too bgi

10:37 <terpstra> big*

10:37 <terpstra> or give up on fitting the TLB entry in 32 bits

10:39 <terpstra> or go for a bigger page size ;)

10:40 <wpwrak> keep things easy - use 1 GB pages :)

10:40 <terpstra> 8k page size would mean <19 bits physical address> and thus <12 bits virtual address tag> and only <6 bits for the TLB index>

10:41 <terpstra> so back to 32 TLB entries

10:41 <terpstra> that is nice
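
With 8k pages the entry fits in 32 bits even with a write-permission bit: 1 + 12 + 19 = 32. A sketch of such a packed entry follows; the bit positions (write flag in bit 31, tag above the physical page number) and the names are assumptions, only the widths come from the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 13u                /* 8k pages */
#define PPN_BITS   19u                /* physical page number */
#define TAG_BITS   12u                /* virtual address tag */
#define ENTRY_WRITABLE (1u << 31)     /* assumed position of the r/w flag */

uint32_t make_entry(bool writable, uint32_t tag, uint32_t ppn)
{
    assert(tag < (1u << TAG_BITS) && ppn < (1u << PPN_BITS));
    return (writable ? ENTRY_WRITABLE : 0u) | (tag << PPN_BITS) | ppn;
}

bool entry_writable(uint32_t e)
{
    return (e & ENTRY_WRITABLE) != 0;
}

uint32_t entry_tag(uint32_t e)
{
    return (e >> PPN_BITS) & ((1u << TAG_BITS) - 1u);
}

uint32_t entry_ppn(uint32_t e)
{
    return e & ((1u << PPN_BITS) - 1u);
}

/* Combine the entry's physical page with the 13-bit page offset. */
uint32_t entry_paddr(uint32_t e, uint32_t vaddr)
{
    return (entry_ppn(e) << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1u));
}
```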

10:41 <wpwrak> plus, that way you'll find all the programs that assume that a page is 4 kB :)

10:42 <terpstra> they've been fixed already i think

10:42 <terpstra> debian must run on stuff with 8k pages by now

10:42 <terpstra> afk

10:43 <wpwrak> run or stumble :) well, you can try 8 k and if it sucks too much, go to 4 k

11:04 <lekernel> can't we just disable address translation in kernel mode?

11:04 <lekernel> this way we're also backward compatible with programs like RTEMS stuff that do not use the MMU

11:04 <lekernel> they just run in kernel mode all the time

11:08 <terpstra> lekernel, that's what i wanted to do too

11:08 <terpstra> but wpwrak says its a problem

11:09 <terpstra> so what do you think about just grabbing the entire TLB on context switch like we have to handle registers anyway?

11:10 <terpstra> it doesn't/shouldn't be so big as the L1 caches anyway

11:10 <lekernel> depends... how big is the TLB?

11:10 <lekernel> and how do we ensure compatibility with programs that do not use the MMU?

11:11 <terpstra> well, i also liked the idea that kernel mode = no MMU... then you have your compatibility

11:11 <lekernel> I don't think there's a problem, Norman pointed out on the list that Microblaze does that

11:11 <terpstra> i've been reading around, and it seems that the TLB for mips isn't so big

11:11 <terpstra> even the AMD64 only has 1024 entries

11:11 <terpstra> so 32 should be fine i guess

11:11 <terpstra> probably 16 is already plenty

11:12 <terpstra> http://www.linux-mips.org/wiki/TLB

11:12 <terpstra> R2000 had 64 entries

11:13 <terpstra> R4000 had 32 to 64

11:13 <terpstra> (so later versions had less entries, which seems suggestive to me)

11:14 <lekernel> "TLB is organized as 3-way set associative."

11:14 <lekernel> hmm...

11:14 <terpstra> yeah, we definitely will need associativity

11:14 <lekernel> if we have only 32 entries, it can be fully associative, no?

11:14 <terpstra> i suppose we could try without at first tho

11:15 <terpstra> problem with fully associative is it rules out using RAM cells

11:15 <terpstra> you need full registers then

11:15 <terpstra> which is a lot

11:16 <terpstra> on my cyclon3 the LM32 needed only like 1k registers for the full design i think

11:16 <lekernel> we can also have no associativity and a lot of TLB entries to compensate

11:16 <lekernel> so we take advantage of the BRAM

11:16 <terpstra> i think for a first version this makes the most sense

11:16 <lekernel> but reloading the TLB would take time during context switches then...

11:17 <terpstra> however, i don't buy totally into the 2- and 4- way associative is like 2* and 4* bigger cache

11:17 <lekernel> though probably not a lot more than those architectures which flush the L1 caches on each context switches

11:17 <terpstra> there are many byzantine scenarios that can happen in practise where associativity is >>> more slots

11:17 <lekernel> yeah sure

11:18 <lekernel> as a general rule x-way associative has better performance than x times the size

11:18 <terpstra> but for a first version, i think non-associative makes sense

11:19 <lekernel> http://www.xilinx.com/support/documentation/application_notes/xapp203.pdf

11:20 <lekernel> non portable though

11:20 <terpstra> that's nice for you xilinx users

11:21 <lekernel> yeah... and xilinx patented the srl16 too

11:21 <terpstra> so basically one LUT can decode 4-bit index ?

11:21 <terpstra> that's possible on altera too

11:21 <terpstra> problem is that you can't reprogram the LUT at run time ;)

11:22 <terpstra> i guess this is the value added part of the xilinx approach?

11:22 <terpstra> ahh, yes, i see it now

11:22 <terpstra> SRL16E diagram

11:23 <terpstra> to mimic a SRL16E portably i would need 4 registers, and 3 LUTs i think

11:24 <terpstra> anyway

11:25 <terpstra> wpwrak, do you realllllly need the mmu in kernel mode?

11:25 <wpwrak> terpstra: maybe the best approach is to implement a trivially simple TLB, run a test load (e.g., kernel compilation, emacs, whatever) and keep statistics of what happens. then pick a design accordingly.
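
wpwrak's "trivially simple TLB plus statistics" can be prototyped in software before touching any HDL. The sketch below models a direct-mapped (non-associative) TLB and counts hits and misses for a stream of addresses. The slot count, page size, and names are assumptions; a real experiment would feed it an address trace from a simulator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_SLOTS  64u   /* assumed size, power of two */
#define PAGE_SHIFT 12u   /* assumed 4 KiB pages */

struct dm_tlb {
    bool          valid[TLB_SLOTS];
    uint32_t      vpn[TLB_SLOTS];
    unsigned long hits, misses;
};

/* Record one access. On a miss the slot is refilled, as a software
 * miss handler would do. Returns true on a hit. */
bool dm_tlb_access(struct dm_tlb *t, uint32_t vaddr)
{
    assert(t);
    uint32_t vpn  = vaddr >> PAGE_SHIFT;
    uint32_t slot = vpn % TLB_SLOTS;      /* direct mapped: one candidate */

    if (t->valid[slot] && t->vpn[slot] == vpn) {
        t->hits++;
        return true;
    }
    t->valid[slot] = true;
    t->vpn[slot]   = vpn;
    t->misses++;
    return false;
}
```

Two pages whose page numbers differ by a multiple of `TLB_SLOTS` evict each other on every access, which is the thrashing case that associativity is meant to address.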

11:26 <terpstra> we also need a way to determine the address that triggered a TLB miss

11:26 <wpwrak> terpstra: (mmu in kernel mode) well, for vmalloc ...

11:26 <terpstra> wpwrak, why does vmalloc need an mmu?

11:26 <terpstra> can't it just allocate from the physical address space?

11:27 <lekernel> terpstra: well I think that having a large non-associative TLB in a block RAM is good for starters

11:27 <wpwrak> terpstra: because it can give you virtually contiguous allocations even if your pages are all physically fragmented

11:28 <terpstra> Code that uses vmalloc is likely to get a chilly reception if submitted for inclusion in the kernel. If possible, you should work directly with individual pages rather than trying to smooth things over with vmalloc.

11:28 <terpstra> lol

11:28 <lekernel> s6 FPGAs have RAM blocks of up to 16 kilobits each... a few or even just one of them can hold a sizable amount of TLB entries

11:28 <terpstra> i don't think we need/want more than 32 TLB entries

11:28 <terpstra> by keeping the TLB small we can more easily just load/store it from the kernel instead of trying to preserve it like the L1 cache

11:29 <wpwrak> terpstra: (chilly reception) for sure. yet it exists, so .. :)

11:29 <lekernel> terpstra: you mean for encoding the WTLB instruction?

11:29 <wpwrak> terpstra: anyway, you can make the kernel tlb fairly inefficient.

11:29 <lekernel> I don't see what the problem is with a large TLB, except more context switch overhead

11:29 <terpstra> yeah

11:29 <terpstra> i don't want context switch overhead

11:30 <terpstra> either we need to leave stale TLB entries that get flushed on demand (more work for the hardware)

11:30 <wpwrak> terpstra: ah, and i think modules may use the mmu too. so, i-tlb for the kernel as well. life sucks, doesn't it ? :)

11:30 <terpstra> or we need to save/restore more TLB entries on context switch

11:31 <terpstra> modules get loaded at different addresses

11:31 <terpstra> i don't think there's MMU action there

11:31 <terpstra> that's why it's a pain to find the symbol of a module from a kernel register dump

11:31 <lekernel> otoh a larger TLB means less TLB misses

11:31 <lekernel> well

11:31 <lekernel> I don't think it'd be hard to make the TLB size configurable with this approach

11:31 <lekernel> so we can just try and see :-)

11:31 <terpstra> it impacts the layout of the TLB tho

11:32 <terpstra> if you want to pack the TLB entries into 32 bits ;)

11:32 <terpstra> in a perfect world you could have 32 TLB entries, each 32 bit wide

11:32 <terpstra> then it would have a 'normal' LM32 register encoding

11:32 <terpstra> ie: a simple WTLB instruction would work just like WCSR does now

11:34 <juliusb> just give up on this LM32 stuff, use OpenRISC ;)

11:34 <terpstra> ...

11:34 <juliusb> We've already got this MMU stuff going

11:34 <juliusb> our kernel port is solid, too

11:34 <terpstra> hmmmmm

11:34 <terpstra> :)

11:34 <wpwrak> terpstra: (i-tlb) you're right. doesn't actually run code from the vmalloc'ed region

11:34 <juliusb> one interesting experiment I want to do very soon is actually calculate overhead for TLB misses and reloading

11:34 <juliusb> and the effect TLB sizing and associativity has on that

11:34 <terpstra> juliusb, how does the openrisc do tlb ?

11:35 <juliusb> good question. the architecture is fairly flexible - allows various sizes and up to 4-way associativity

11:35 <juliusb> i'm not across the details of it specifically off the top of my head

11:35 <terpstra> physically tagged and indexed?

11:37 <juliusb> well,...

11:37 <lekernel> yeah, let's use openrisc. then the flickernoise framerate would drop to something like 0.2 fps while the FPGA LUT count increases :-)

11:37 <juliusb> no, I think virtually tagged

11:37 <juliusb> hangon no

11:37 <juliusb> lekernel: prove it :)

11:38 <juliusb> no I agree, or1200 aint so tiny

11:38 <terpstra> juliusb, to be honest i haven't fairly evaluated the openrisc

11:38 <terpstra> it is just so big

11:38 <juliusb> but, i'm serious about using it if you're considering doing a Linux port

11:38 <terpstra> but adding an mmu to the lm32 will make it big too

11:38 <juliusb> it's been like 2 years of work for us to just get the kernel port and toolchain to a point where it's usable now

11:39 <lekernel> terpstra: I don't think that a simple TLB in a block RAM would make it very big

11:39 <juliusb> we have some good kernel developers now, and the HW seems quite stable across various technologies

11:39 <lekernel> my guess is something like 2 BRAM + 200 LUTs, not more

11:39 <terpstra> lekernel, the OR is only 6* bigger than the lm32 :)

11:39 <juliusb> lekernel: but as described before, you need a lot more than just a block ram, you need a tag ram and then all the appropriate error detection and exception handling logic

11:39 <juliusb> for each port

11:40 <juliusb> ... it would be an interesting experiment though

11:40 <terpstra> yes, juliusb is right that it will cost us

11:40 <lekernel> sure, that's what those 200 LUTs are for

11:40 <juliusb> .. hey by the way, why do you want to run Linux in the first place??

11:40 <terpstra> cause i want debisn!

11:40 <terpstra> debian!

11:40 <juliusb> it's not a good idea for embedded stuff I argue - you have this MMU mess, and it only gets worse if you want shared library code

11:40 <terpstra> ;)

11:41 <juliusb> you need all that indirect function calling garbage

11:41 <terpstra> (for gsi/cern we don't want linux tbh)

11:41 <juliusb> it helps extensibility at the software level, but that's it right?

11:41 <terpstra> i am just interested from a hypothetical point of view

11:41 <juliusb> i think you sacrifice a lot of performance just to have the basic benefits of a GNU/Linux, namely the plethora of software out there

11:42 <terpstra> i agree with you

11:42 <lekernel> same here. i'm globally satisfied with RTEMS.

11:42 <wpwrak> i think 2-way could be useful to avoid thrashing block copies. a dirty approach would be to have only one entry 2-way. basically if you evict a tlb entry, you move it to the 2nd way.

11:42 <wpwrak> (that's for data)

11:42 <juliusb> software based on RTOS, however, is far more complicated to write and maintain than stuff that's POSIX compliant for Linux

11:42 <terpstra> wpwrak, that's what a victim cache has been for traditionally ;)
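
wpwrak's "one extra entry" idea, which terpstra identifies as a victim cache, might look like this in a software model. It is a sketch under assumptions: a direct-mapped main array of 16 slots plus a single victim entry, with illustrative names throughout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SLOTS 16u   /* assumed size of the direct-mapped part */

struct entry { bool valid; uint32_t vpn; uint32_t ppn; };

struct vtlb {
    struct entry main[SLOTS];
    struct entry victim;      /* holds the most recently evicted entry */
};

/* Look up a virtual page. A hit in the victim entry swaps it back into
 * its main slot, so two conflicting pages can keep trading places
 * without a miss. */
bool vtlb_lookup(struct vtlb *t, uint32_t vpn, uint32_t *ppn)
{
    assert(t && ppn);
    struct entry *m = &t->main[vpn % SLOTS];
    if (m->valid && m->vpn == vpn) {
        *ppn = m->ppn;
        return true;
    }
    if (t->victim.valid && t->victim.vpn == vpn) {
        struct entry tmp = *m;
        *m = t->victim;
        t->victim = tmp;
        *ppn = m->ppn;
        return true;
    }
    return false;
}

/* Refill after a miss: the displaced entry moves to the victim slot. */
void vtlb_insert(struct vtlb *t, uint32_t vpn, uint32_t ppn)
{
    assert(t);
    struct entry *m = &t->main[vpn % SLOTS];
    if (m->valid && m->vpn != vpn)
        t->victim = *m;
    *m = (struct entry){ true, vpn, ppn };
}
```

In the block-copy case, a source and a destination page that map to the same slot both stay resident after their first miss.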

11:43 <lekernel> not that much

11:43 <wpwrak> not sure what code would be most happy with

11:43 <juliusb> ...i mean more complicated to write and then port to a new design or architecture etc.

11:43 <lekernel> as a matter of fact, a lot of 3rd party POSIX stuff runs almost flawlessly on RTEMS

11:43 <lekernel> I have freetype, libpng, libjpeg, libgd, mupdf, ...

11:43 <juliusb> ya, I saw RTEMS is POSIX friendly

11:43 <terpstra> the main advantage of an mmu: fork()

11:43 <juliusb> that is very good

11:43 <terpstra> i think most of the rest can be dealt with

11:44 <wpwrak> terpstra: aah, already invented. darn.

11:44 <terpstra> wpwrak, i didn't mean to invent it---i meant that's the functionality you gain from an mmu

11:44 <terpstra> you can't really do fork() without an mmu

11:45 <juliusb> but who is going to do the port of the kernel to LM32??

11:45 <juliusb> or does it exist already?

11:45 <terpstra> there is a uclinux port afaik ?

11:45 <juliusb> oh good, 2.4 kernels are fun

11:45 <juliusb> :)

11:46 <juliusb> there's no such thing as far as I'm aware, it got merged with the mainline a long time ago, no?

11:46 <terpstra> i've not used it

11:46 <terpstra> i just know lattice claims this

11:46 <wpwrak> terpstra: (invented) i meant the victim cache

11:46 <terpstra> wpwrak, ack

11:47 <lekernel> terpstra: there is a super crappy uclinux port by lattice, which larsc, mwalle, Takeshi and I have improved

11:47 <lekernel> it's still not merged upstream though

11:47 <terpstra> it's 2.6 or 2.4?

11:47 <juliusb> i've just looked, they've got a 2.6 version now

11:47 <lekernel> 2.6... in fact we follow upstream

11:47 <juliusb> but there's MMU-less kernel now, right? and uClibc

11:47 <lekernel> yes

11:48 <terpstra> so if an mmu were added, not so hard to get 'proper' linux on it i guess?

11:48 <juliusb> what's the difference, then, between uClibc and real kernel?

11:48 <juliusb> err, uClinux and real kernel

11:48 <juliusb> they strip a lot of crap out of it?

11:48 <lekernel> I don't know. I have little knowledge about linux memory management internals

11:48 <terpstra> uclibc has nothing to do with mmu or not

11:48 <terpstra> uclibc is just a smaller version of libc

11:49 <terpstra> uclibc is under 200k compared to > 3MB for glibc

11:49 <terpstra> you usually see uclibc + busybox on embedded devices like routers/etc

11:49 <terpstra> where you have 8-32MB of RAM

11:49 <terpstra> those systems also have an MMU

11:49 <juliusb> i'm sure there's some NO_MMU stuff in uClibc

11:49 <terpstra> sure, to remove fork() ;)

11:49 <wpwrak> crawls to bed and hopes for happy dreams of an mmu :)

11:50 <terpstra> you won't be getting fork() without an MMU

11:50 <terpstra> and that's why even embedded devices with linux have one

11:50 <terpstra> those cheapo little routers, kindles, android phones, etc --- they all have an MMU even when they have almost no memory

11:51 <terpstra> (tho the kindle actually has half a GB of ram)

11:52 <juliusb> sure, it's ASIC and probably the extra silicon required to put in an MMU, and the reduced amount of software executed to do virtual memory management, is worth it

11:52 <terpstra> yep

11:52 <juliusb> If you're really, really, stretched for area, maybe MMU-less makes sense

11:52 <terpstra> we should really see how much area a completely primitive mmu takes

11:52 <terpstra> if lekernel is right that it's 200 LUTs or less, then might as well have it on an FPGA too

11:53 <lekernel> in milkymist we're only using 44% of the fpga area, so a mmu would get merged provided it does not slow things down or introduce other regressions

11:53 <juliusb> I think you'll want all the performance you can get on FPGA running Linux and it would make a lot of sense to have one

11:53 <juliusb> We're so concerned about performance on Or1K linux that we're looking at doing hardware page table lookups instead of handling misses

11:53 <juliusb> ... in software

11:54 <juliusb> it's really, really, slow

11:54 <terpstra> lm32 is very fast

11:54 <terpstra> i bet i could write a TLB replacement algorithm that ran in under 50 cycles

11:54 <terpstra> possibly even under 30

11:54 <juliusb> no, i'm not talking /MHz here, I'm talking overall performance because Linux is just a state-swapping machine

11:55 <juliusb> always loading and storing and accessing various process states

11:55 <terpstra> i see

11:55 <juliusb> terpstra: sure, but what about saving and configuring your state to get into the place where you can then do your TLB algorithm in 30 cycles??

11:55 <terpstra> that's a good reason to make kernel-land not mmu mapped ?

11:56 <juliusb> I think it's a good reason to avoid Linux :)

11:56 <terpstra> juliusb, i was including the save/restore in those 30 estimate

11:56 <terpstra> if we added an mmu to the lm32 it would launch an exception handler where you do a quick LRU/heap operation and then an eret
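
The refill path terpstra has in mind could be sketched as below. This is a software model, not LM32 code: the flat `page_table`, the 4-slot TLB, and the timestamp-based LRU are all assumptions. In particular, `tlb_touch` models hit-side bookkeeping that a handler which only sees misses would not get for free and would have to approximate, for example by refill order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TLB_SLOTS 4u    /* assumed, kept tiny for illustration */
#define NUM_PAGES 64u   /* hypothetical tiny virtual address space */

struct slot { bool valid; uint32_t vpn, ppn, last_used; };

struct mmu {
    struct slot tlb[TLB_SLOTS];
    uint32_t    page_table[NUM_PAGES];  /* vpn -> ppn, flat for simplicity */
    uint32_t    clock;                  /* monotonically increasing stamp */
};

/* Prefer an invalid slot, otherwise the least recently used one. */
unsigned pick_victim(const struct mmu *m)
{
    unsigned best = 0;
    for (unsigned i = 0; i < TLB_SLOTS; i++) {
        if (!m->tlb[i].valid)
            return i;
        if (m->tlb[i].last_used < m->tlb[best].last_used)
            best = i;
    }
    return best;
}

/* What the exception handler would do before returning with eret:
 * look up the faulting page and overwrite the chosen slot. */
unsigned tlb_miss_handler(struct mmu *m, uint32_t vpn)
{
    assert(vpn < NUM_PAGES);
    unsigned s = pick_victim(m);
    uint32_t stamp = ++m->clock;
    m->tlb[s] = (struct slot){ true, vpn, m->page_table[vpn], stamp };
    return s;
}

/* Mark a slot as recently used (hit-side bookkeeping, an assumption). */
void tlb_touch(struct mmu *m, unsigned s)
{
    assert(s < TLB_SLOTS);
    m->tlb[s].last_used = ++m->clock;
}
```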

11:56 <juliusb> terpstra: It's not so much but with a pissweak TLB you're doing it all the time (seriously, every new function call) and it adds up

11:57 <terpstra> hmm

11:57 <lekernel> juliusb: how many function calls are new?

11:57 <lekernel> you're talking about lazy linking, right?

11:58 <juliusb> I'm not sure exactly how it works but I'm pretty sure it occurs quite frequently

11:58 <juliusb> well, anything outside of the page

11:58 <juliusb> well, instruction and data, too, mind you

11:58 <terpstra> wouldn't the page with the 'got' stay in the TLB most of the time?

11:58 <lekernel> well, the TLB miss on each new function call just hits at application startup

11:58 <juliusb> hopefully the data TLB miss doesn't occur so often

11:58 <lekernel> the code gets patched after that and no longer misses the TLB

11:58 <terpstra> lekernel, the code doesn't get patched -- the 'got' gets filled

11:59 <juliusb> I'm talking about statically linked programs here, I don't know about dynamically linked stuff

11:59 <terpstra> your function calls to global symbols go via the data bus

11:59 <juliusb> we don't have dynamic linking yet in our toolchain, but we're working on it, and it looks like extra headache for userspace execution

11:59 <juliusb> s/headache/overhead

11:59 <terpstra> yes, indirection is expensive
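
terpstra's point that the table gets filled rather than the code patched can be illustrated with a one-entry stand-in for a GOT slot. This is a toy model, not how any particular dynamic linker is implemented: `got_square`, `lazy_stub`, and `resolve_calls` are made-up names, and the "resolver" just stores a known function pointer.

```c
#include <assert.h>

typedef int (*func_t)(int);

static int real_square(int x) { return x * x; }

static int resolve_calls;        /* counts how often the resolver ran */
static int lazy_stub(int x);

/* One-entry stand-in for a GOT slot; it starts out pointing at the stub. */
static func_t got_square = lazy_stub;

/* First call lands here: fill the table entry, then forward the call.
 * The calling code itself is never modified. */
static int lazy_stub(int x)
{
    resolve_calls++;
    got_square = real_square;
    return got_square(x);
}

/* Every call loads the pointer from data memory and jumps through it,
 * so each call costs a data access in addition to the instruction fetch. */
int call_square(int x)
{
    assert(got_square != 0);
    return got_square(x);
}
```

Only the first call pays for resolution; later calls still pay for the indirect load, which is the overhead being discussed.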

12:00 <terpstra> i'm somewhat skeptical that the TLB miss rate is so high

12:00 <juliusb> but, I'm contributing to this discussion because I'm going to be starting some work shortly on really gauging the overhead of TLBs

12:00 <terpstra> why would the mips folks move from 64 TLB entries to 48 if it is such a problem?

12:01 <lekernel_> terpstra: yeah, you're probably right. but in either case, I don't think that lazy linking significantly increases any TLB miss rate.

12:01 <juliusb> and our feeling is, after playing with our port, is that TLB misses occur often, and a good way to increase time spent doing useful things, rather than management overhead, is minimising this

12:01 <terpstra> juliusb, fair enough.

12:01 <terpstra> your current tlb is how big?

12:01 <juliusb> 64

12:01 <juliusb> we can have up to 128

12:02 <terpstra> and you still have lots of misses, eh? that's somewhat worrying. 2-way associative?

12:02 <juliusb> but is single way

12:02 <terpstra> ah

12:02 <terpstra> then i believe you

12:02 <juliusb> yes, I want to add ways

12:02 <terpstra> most TLB in 'real hardware' is CAN

12:02 <juliusb> CAN?

12:02 <terpstra> so fully associative

12:02 <juliusb> ah ok

12:03 <terpstra> sorry, CAM

12:03 <terpstra> i typo'd

12:03 <lekernel_> 2-way associative looks doable... lm32 does it for the caches

12:03 <terpstra> yes
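
Why a 64-entry single-way TLB can still miss a lot can be shown with a small simulation. This is a sketch under stated assumptions: LRU replacement, set index taken as `page % num_sets`, and a made-up access trace. None of that is taken from the OR1200 or LM32 sources.

```python
def tlb_misses(page_trace, num_entries, ways):
    """Count misses for a set-associative TLB with LRU replacement.

    ways == 1 is direct-mapped ("single way"); ways == num_entries is
    fully associative, which is what a CAM-based TLB provides.
    """
    num_sets = num_entries // ways
    sets = [[] for _ in range(num_sets)]  # each list is ordered LRU -> MRU
    misses = 0
    for page in page_trace:
        entries = sets[page % num_sets]
        if page in entries:
            entries.remove(page)          # hit: refresh its LRU position
        else:
            misses += 1
            if len(entries) == ways:
                entries.pop(0)            # evict the least recently used entry
        entries.append(page)
    return misses


# Hypothetical trace: code bouncing between two pages that are 64 pages apart.
trace = [0, 64] * 1000
print(tlb_misses(trace, 64, 1))    # -> 2000 (the two pages evict each other)
print(tlb_misses(trace, 64, 2))    # -> 2    (both fit in one 2-way set)
print(tlb_misses(trace, 64, 64))   # -> 2    (fully associative)
```

With the same number of entries, only the conflict pattern changes: the direct-mapped case thrashes on two aliasing pages, while two ways are enough to absorb this particular trace.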

12:06 <juliusb> ... or come and pimp out the OR1200's TLBs to do multi-way ;)

12:06 <terpstra> hmm

12:06 <terpstra> give me the or1k vs. lm32 sales pitch :)

12:09 <juliusb> well, I'm not the expert but I know the licensing on LM32 isn't pure BSD (has some taint from LM), whereas or1200 is all LGPL

12:09 <terpstra> true

12:09 <juliusb> i don't know LM32 architecture so well, but I think OR1K has pretty solid architecture, missing a few key things like atomic synchronisation instructions

12:09 <juliusb> but those can be added

12:09 <juliusb> OR1200 as an implementation is bad I think

12:10 <juliusb> I've been hacking on it for a few years and hopefully have made it better, but certainly it hasn't become leaner and more efficient

12:10 <lekernel_> which diminishes your point about LGPL

12:10 <juliusb> our toolchain is good now

12:10 <terpstra> so your position is that the or1k + toolchain + kernel support is good, but the or1200 implementation is the bad part/

12:10 <juliusb> our toolchain was a joke, but now it's good

12:11 <juliusb> yes, but it at least has MMUs already in there to save you working on that, but I think having a full on kernel port (we're going to start pushing for acceptance in GCC and Linux sometime this year) is a pretty big deal

12:11 <juliusb> it's a lot of work to add all the bells and whistles

12:12 <juliusb> or1200 isn't bad, it's just not awesome

12:12 <juliusb> ... i may know of a rewrite in progress

12:12 <juliusb> ... but that's a little ways off yet

12:12 <lekernel_> gcc/linux kernel: true. but as far as I'm concerned it is not my priority

12:12 <terpstra> binutils+gcc for lm32 is already in mainline

12:12 <juliusb> lekernel_: I understand you need as much performance as possible, but again I ask why even consider Linux when you need to be productive on almost every cycle, the pitch kind of isn't for that

12:12 <terpstra> so here the lm32 is further than the or1k

12:12 <juliusb> it's for anyone considering Linux

12:13 <lekernel_> neither is the MMU, and I cannot accept the regressions that OR1K would introduce just to get some work already done on the MMU

12:13 <juliusb> ok, sure, but we will be sometime this week

12:13 <lekernel_> terpstra: otoh the mainline lm32 gcc is often broken... it was somewhat acceptable in gcc 4.5 and was badly broken in 4.6

12:13 <terpstra> is there a good document for the or1k comparable to the lm32's archman pdf?

12:13 <juliusb> i'm saying as an open source CPU that has a working full on kernel port, I would consider or1200

12:14 <lekernel_> maintaining gcc is a pain in the ass

12:14 <juliusb> yep, but we have guys doing that

12:14 <juliusb> terpstra: yes, we have recently re-worked the architecture spec

12:14 <juliusb> cleaned it up, etc

12:14 <terpstra> could you toss me a link?

12:14 <terpstra> i'd like to read it

12:15 <juliusb> http://opencores.org/download,or1k - click on the openrisc_arch_submit4.odt link

12:15 <juliusb> it's not in SVN yet I think

12:15 <juliusb> we've still got it out for review

12:15 <juliusb> but... it's on logincores.org (opencores.org I mean)

12:15 <juliusb> hehe

12:15 <juliusb> gotta register

12:15 <lekernel_> juliusb: I'm not considering linux, except for demos and just the fun of it

12:16 <lekernel_> juliusb: when are you going to change that policy?

12:16 <terpstra> i have an opencores account, not a problem,

12:16 <juliusb> i just had lunch with the guy in charge here, he's not convinced

12:16 <juliusb> I tried

12:16 <juliusb> he argues that what's the big deal - you're getting access to stuff for free, give us some information so we can provide it to advertisers who come here so we can fund the webserver

12:17 <roh> juliusb: never discuss too much with stupid people. work around them.

12:17 <juliusb> hehe

12:17 <juliusb> well, there's already a fork happening: openrisc.net

12:17 <juliusb> they got fedup with opencores

12:17 <lekernel_> another irritating thing in opencores policy is the requirement that files be uploaded on your server. which in turn mandates the use of SVN and your web interface, both being far inferior to e.g. git and github

12:17 <terpstra> ohwr.org

12:17 <juliusb> sure, I think they're fighting a losing battle

12:17 <juliusb> ohh nice, ohwr.org

12:18 <terpstra> (that's where my stuff lives)

12:18 <juliusb> cool, thanks

12:18 <juliusb> anyway, this is an ongoing thing with OpenCores - they still don't see, even after talking a lot with them, why they can't take a little if they give a little

12:19 <lekernel_> and btw I can't see why running such a webserver would be so expensive

12:19 <juliusb> I'm at least trying to get them to dump the forums and bugtracker (both some custom hack they got this young guy to do) and use a mailinglist and bugzilla

12:19 <juliusb> ya, well, it shouldn't be, but it is if you go about the wrong way for 3 years

12:19 <juliusb> I think their heart is in the right place - they didn't want OpenCores to die and thought they could make it great

12:19 <juliusb> but I think they're not so open-sourcey

12:19 <juliusb> i probably shouldn't be saying this :P

12:20 <juliusb> anyway

12:20 <juliusb> it's in flux, I hope, and things will change eventually

12:20 <terpstra> meh - until someone writes an opensource hdl toolchain, we don't reallllly have 'opencores' anyway

12:20 <lekernel_> well, you're among friends. I'd even dare say you've just joined the #opencores-haters channel *g*

12:20 <juliusb> i know the guy who started openrisc.net well and it'll be interesting to see the response they have

12:21 <juliusb> hehe sure, and I'm working hard on OpenRISC and just like to see others getting into the oshw stuff, too

12:21 <lekernel_> terpstra: this is under way :p

12:21 <juliusb> i come in peace, but I'm employed by ORSoC and feel I should at least try to provide them with good advice on OpenCores

12:22 <terpstra> juliusb, i don't hate opencores. i hate the blinky flash ads. ;)

12:22 <juliusb> but, anyway, just wanted to point out if you really want Linux on an open source CPU, try Or1K

12:22 <juliusb> I think there's some tuning to be done, like anything, but it's probably a good place to start

12:22 <terpstra> juliusb, i will read the arch manual and then form a more informed opinion :)

12:23 <juliusb> i expect nothing less :)

12:23 <juliusb> but I, too, am very interested in the fully open source toolchain for HDL synthesis and backend

12:24 <juliusb> hence popping in here the other day to ask lekernel_ about his work so far

12:25 <lekernel> he, it's coming :)

12:25 <terpstra> juliusb, or1k has a branch delay slot?

12:25 <lekernel> wanna help?

12:25 <terpstra> wasn't this proven to be a bad idea by mips?

12:25 <terpstra> learn from the past! ;)

12:25 <lekernel> why is it a bad idea?

12:25 <juliusb> architecture is initially from 1999

12:25 <lekernel> fwiw microblaze has it, and from studies I've read it does provide a performance advantage

12:25 <terpstra> "The most serious drawback to delayed branches is the additional control complexity they entail. If the delay slot instruction takes an exception, the processor has to be restarted on the branch, rather than that next instruction. Exceptions now have essentially two addresses, the exception address and the restart address, and generating and distinguishing between the two correctly in all cases has been a source of bugs for later designs."

12:26 <juliusb> precisely

12:26 <juliusb> i'm dealing with this now, actually
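
The "two addresses" problem from the quoted passage can be made concrete with a minimal sketch. It assumes a fixed 4-byte instruction width and a single delay slot; `ExceptionInfo` and `take_exception` are hypothetical names, and this is not how OR1K or MIPS actually record exception state.

```python
from dataclasses import dataclass

INSN_BYTES = 4  # assumption: fixed-width 32-bit instructions


@dataclass
class ExceptionInfo:
    exception_pc: int  # address of the instruction that faulted
    restart_pc: int    # address to resume at once the handler is done


def take_exception(faulting_pc: int, in_delay_slot: bool) -> ExceptionInfo:
    if in_delay_slot:
        # Restart on the branch, not on the delay slot instruction,
        # otherwise the branch would never be taken after the handler returns.
        return ExceptionInfo(faulting_pc, faulting_pc - INSN_BYTES)
    # Ordinary instruction: both addresses are the same.
    return ExceptionInfo(faulting_pc, faulting_pc)


print(take_exception(0x1004, in_delay_slot=True))
print(take_exception(0x1004, in_delay_slot=False))
```

Without delay slots the second field is redundant, which is the extra control state being argued about here.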

12:26 <lekernel> what is in fact a bad idea is have several delay slots

12:26 <lekernel> just one is still reasonable

12:26 <terpstra> http://en.wikipedia.org/wiki/Classic_RISC_pipeline -- scroll down to the area where they list the reasons

12:26 <terpstra> that reason is just the most pertinent i think

12:26 <juliusb> well, I think the control overhead of having one compared to none is far more than from having one compared to two

12:26 <lekernel> ok...

12:27 <juliusb> it's a hassle for out of order etc

12:27 <lekernel> well a lot of features make a mess from exceptions. out of order execution being most infamous for that.

12:27 <lekernel> but if you want a simple design, then yeah it's probably better not to have the delay slot

12:27 <juliusb> that sounds about right, but it just adds a little bit of extra complexity where you don't want anything extra

12:28 <lekernel> it does increase performance, so it's a trade-off

12:28 <terpstra> lekernel, it increases performance only if the compiler can find a good instruction to put there

12:28 <terpstra> which at the end of a basic block usually means putting a 'write to memory'

12:28 <juliusb> but pipelines that run really fast now are very long

12:28 <lekernel> yes. but from the paper I've read it still works

12:28 <terpstra> but those are precisely the instructions which generate faults

12:29 <juliusb> part of the idea was to offload complexity into the compiler from the HW, as the HW development wasn't so advanced right?

12:29 <juliusb> but now it just makes things more complicated at the HW level

12:29 <terpstra> sure

12:30 <terpstra> i am a firm believer in simpler cores, but many cores

12:30 <juliusb> and compilers are actually fairly clever now, so I guess that's not an issue, but why cause the HW to be more complex when really there's marginal benefit

12:30 <terpstra> we've carried the hardware supporting crappy sequential software about as far as it can go

12:30 <lekernel> well... if you have OOO execution, delay slots sure make no sense

12:30 <juliusb> yes, as someone who writes, tests and debugs cores, I would eliminate the delay slow

12:31 <juliusb> slot

12:31 <lekernel> but I wouldn't toss it as a definitely crappy idea either

12:31 <terpstra> fair enough

12:31 <lekernel> I think it still does some good in some cases.

12:31 <juliusb> for OR2K, we propose eliminating them http://opencores.org/or2k/OR2K:Community_Portal

12:31 <terpstra> i agree it is a nice way to avoid the wasted instructions you otherwise have

12:32 <juliusb> yes,for the simple 4/5 stage pipelines, they do gain you some advantage compared to not, there

12:32 <lekernel> yup

12:33 <lekernel> juliusb: do you want to help with the synthesis toolchains?

12:34 <lekernel> (speaking about delay slots: for the OR2K, sure, eliminate them)

12:35 <juliusb> lekernel: probably not right at the moment, sorry, I was just curious to see how it was looking

12:35 <juliusb> perhaps in a while, though

12:35 <juliusb> I think it's definitely needed and would be very cool

12:35 <lekernel> there are some relatively simple things to do, like implementing Verilog case statements

12:35 <juliusb> mainly i'd be interested to see an open source synthesis engine

12:35 <juliusb> to check the impact of various design choices

12:35 <lekernel> (all that's needed is to translate those statements to IR muxes)

12:36 <lekernel> at least for now, then we'll see how to do things like FSM extraction
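
Lowering a `case` statement to muxes, as described above, can be sketched as a chain of 2:1 multiplexers. The node types below (`Const`, `Signal`, `Eq`, `Mux`) and the helper names are invented for illustration; they are not the LLHDL IR.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Signal:
    name: str


@dataclass(frozen=True)
class Eq:
    a: "Node"
    b: "Node"


@dataclass(frozen=True)
class Mux:
    sel: "Node"       # 1-bit select
    if_true: "Node"
    if_false: "Node"


Node = Union[Const, Signal, Eq, Mux]


def lower_case(selector, arms, default):
    """Lower `case (selector)` to nested 2:1 muxes; the first arm has priority."""
    result = default
    for match_value, expr in reversed(arms):
        result = Mux(Eq(selector, Const(match_value)), expr, result)
    return result


def evaluate(node, env):
    """Reference interpreter, used here only to check the lowering."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Signal):
        return env[node.name]
    if isinstance(node, Eq):
        return int(evaluate(node.a, env) == evaluate(node.b, env))
    if isinstance(node, Mux):
        branch = node.if_true if evaluate(node.sel, env) else node.if_false
        return evaluate(branch, env)
    raise TypeError(f"unknown IR node: {node!r}")


# case (op) 0: y = a; 1: y = b; default: y = 0; endcase
ir = lower_case(Signal("op"), [(0, Signal("a")), (1, Signal("b"))], Const(0))
print(evaluate(ir, {"op": 1, "a": 10, "b": 20}))  # -> 20
print(evaluate(ir, {"op": 7, "a": 10, "b": 20}))  # -> 0
```

A priority chain like this is the simplest correct lowering; recognising that the arms are mutually exclusive (and emitting a wide parallel mux) or extracting an FSM would be later optimisations.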

12:38 <juliusb> cool, if I get some time i'll let you know, will find out how to get started

12:39 <lekernel> ok. just ask here or on llhdl@lists.milkymist.org if you have questions or problems.

12:40 <juliusb> will do

12:41 <lekernel> btw, I was a bit stuck lately with the placement engine

12:41 <lekernel> I wanted to do post placement packing, but this is rather hard especially with the current chip database architecture

12:42 <lekernel> so I think i'll revert to good old pre-placement packing heuristics for now

12:42 <lekernel> not sure how good it's going to work with the relatively complex s6 slices, but we'll see

12:42 <lekernel> maybe it works great

12:43 <lekernel> as a matter of fact, I think Altera has even more complex logic blocks ("LAB clusters" or something)... and it's not clear how they pack them

12:44 <lekernel> also, with post placement packing, I'd lose one of the potential benefits of clustering, which is that the placer algorithm can be faster because it has to deal with fewer elements

12:44 <lekernel> so perhaps it's simply a bad idea after all

13:06 <terpstra> lekernel, why does an LM32 dcache read (lw instruction) take 3 cycles for result? X stage calculates address, M stage touches cache.... what happens in W stage?

13:07 <lekernel> write to register file?

13:07 <lekernel> mh

13:08 <lekernel> I don't know

13:08 <terpstra> but at the end of the M stage it could have used the bypass

13:08 <terpstra> just like the 2-stage shift instruction does

13:09 <terpstra> hrm

13:10 <terpstra> there's an "align" step in the block diagram

13:10 <terpstra> so D fetches base register, X adds offset, M fetches the cache, and W 'aligns' the result (and writes back to register file at end of cycle)

13:11 <terpstra> what is this magical align?

13:11 <lekernel> I guess this is for reading bytes or 16-bit words on any offset

13:12 <terpstra> ahhh

13:12 <terpstra> and sign extension / etc

13:12 <terpstra> makes sense

13:12 <terpstra> yes

13:13 <terpstra> thanks
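
What such an "align" step has to compute for byte and halfword loads can be sketched as follows. The sketch assumes a 32-bit big-endian word (byte 0 is the most significant byte) and naturally aligned accesses; `align_load` is a hypothetical helper and is not derived from the LM32 RTL.

```python
def align_load(word: int, addr: int, size: int, signed: bool) -> int:
    """Extract a 1-, 2- or 4-byte value from a 32-bit big-endian word."""
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    offset = addr & 3                    # byte position inside the word
    if offset % size:
        raise ValueError("unaligned access")
    shift = 8 * (4 - size - offset)      # big-endian: byte 0 sits in bits 31..24
    mask = (1 << (8 * size)) - 1
    value = (word >> shift) & mask       # select and right-justify the bytes
    if signed and value & (1 << (8 * size - 1)):
        value -= 1 << (8 * size)         # sign-extend
    return value


word = 0x1280F3F4
print(align_load(word, 1, 1, signed=False))  # -> 128  (0x80)
print(align_load(word, 1, 1, signed=True))   # -> -128
print(hex(align_load(word, 2, 2, signed=False)))  # -> 0xf3f4
```

In hardware the shift and the sign extension are a layer of muxes after the cache read, which is consistent with giving them their own pipeline stage.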

14:29 <lekernel> hi xiangfu

14:29 <xiangfu> hi

14:32 <guyzmo> +-*****

14:42 <lekernel> hi guyzmo

14:45 <guyzmo> hey :)

14:45 <guyzmo> sorry, was plugging in stuff

14:52 <guyzmo> damn, so sad rlwrap can't work over flterm :/

14:53 <guyzmo> (and all control characters just output garbage)

15:03 <guyzmo> hum

15:04 <guyzmo> can't get the led par to lighten up :/

15:28 <lekernel> did you try it in flickernoise?

15:33 <guyzmo> not yet

15:33 <guyzmo> of course I'm gonna try it

15:34 <lekernel> control panel -> dmx -> dmx table (called "dmx desk" if you have upgraded, but I don't want to be negative here, but I'd tend to bet you did not)

15:35 <guyzmo> ok

15:35 <lekernel> fortunately the dmx desk works with all released versions :-)

15:35 <guyzmo> ;)

15:49 <guyzmo> damn, why did I forget my DMX cable :-S

16:15 <kristianpaul> Fallenou: (registers) like in the drivers and sys_conf.h?

16:20 <kristianpaul> oh, yes i think

16:20 <kristianpaul> :p

18:37 <guyzmo> grmbl

18:37 <guyzmo> none of my XLR cables work with DMX signal

18:38 <guyzmo> though I remember we had one of them working

18:40 <guyzmo> I will have to get one cable from the Gaîté Lyrique tomorrow

21:36 <lekernel> http://colossus.cs.rpi.edu/~azonenberg/papers/litho1.pdf

21:37 <lekernel> http://siliconexposed.blogspot.com/

22:14 <lekernel> "Since writing it I've made features at 5 micron half-pitch using the camera-port method, and am about to buy a 1-watt 385nm LED as an exposure source. This is way more power than I need so I will be able to use a nice thick diffuser on it. Once the exposure lamp is fixed I should be able to make 75 \lambda square dies at 5 micron resolution using the 40x objective, or 20 micron using the 10x."

22:19 <lekernel> http://i.imgur.com/DR6O9.jpg

22:19 <lekernel> http://i.imgur.com/s9RMP.jpg

22:27 <wpwrak> lekernel: the first one looks like a forest ;-)

22:28 <lekernel> hi azonenberg

22:28 <azonenberg> hi

22:28 <lekernel> welcome, honored to see you here :)

22:28 <lekernel> i'm sebastien

22:28 <azonenberg> ah, k

22:28 <azonenberg> Lets move our discussion to here rather than fb chat so other people can see

22:29 <azonenberg> The paper i sent you only describes my work at the 15um node

22:29 <lekernel> ok :)

22:29 <azonenberg> Though i did outline the process that I later reached 5um at

22:30 <lekernel> how do you engrave through the silicon?

22:30 <azonenberg> I plan to open the project as much as possible btw, all tools etc will be released under an open license (probably BSD or similar)

22:30 <lekernel> excellent :)

22:30 <azonenberg> Read the FB note (which i need to post publicly somewhere)

22:30 <azonenberg> Long story short, apply hardmask (probably Ta2O5) to the silicon by spin coating and heat treatment

22:31 <azonenberg> Spin coat photoresist over that

22:31 <azonenberg> expose and develop

22:31 <lekernel> i see

22:31 <azonenberg> Etch hardmask with 2% HF (Whink rust remover, same stuff jeri uses for gate oxide)

22:31 <azonenberg> Then etch the silicon using 30% KOH / 15% IPA / 55% water at ~80C

22:31 <lekernel> sorry about the dumb question, i'm still going through the pile of material and links on your website and fb :)

22:31 <azonenberg> You cant use KOH directly because it will attack the resist

22:32 <azonenberg> Lol, no questions are dumb

22:32 <azonenberg> For the record i have no formal training in EE myself :P

22:32 <azonenberg> my BS (and PhD in a few years) will be in comp sci

22:32 <azonenberg> Anyway so the nice thing about KOH is that its very anisotropic

22:33 <azonenberg> FeCl3 and similar etchants for copper, if you've ever done home PCB fab, are isotropic - they eat equally in all directions

22:33 <azonenberg> So you get rounded sidewalls and such

22:33 <azonenberg> But KOH eats along the <100> crystal plane nearly 100x faster than <111>

22:33 <azonenberg> And <110> is a hair slower than <100> but not by too much

22:33 <lekernel> cool. I talked about this to a fab employee, and he told me I'd never get any good anisotropic etchant because they are super expensive, hard to buy, etc.

22:33 <lekernel> if it's just KOH, well... :)

22:34 <azonenberg> If you get <110> you can go straight down (assuming your features are parallel to the <111> plane)

22:34 <azonenberg> h/o let me send you a paper

22:34 <azonenberg> "Fabrication of very smooth walls and bottoms of silicon microchannels for heat dissipation of semiconductor devices"

22:34 <azonenberg> http://www.sciencedirect.com/science?_ob=ArticleURL&_udi=B6V44-40D0MGJ-3&_user=659639&_coverDate=06%2F30%2F2000&_rdoc=1&_fmt=high&_orig=gateway&_origin=gateway&_sort=d&_docanchor&view=c&_searchStrId=1730655729&_rerunOrigin=google&_acct=C000035878&_version=1&_urlVersion=0&_userid=659639&md5=c27c2d506cd9137d148a914c8bde1407&searchtype=a

22:34 <azonenberg> Look at the figure they have in there (fig 9 i think?) - 400 micron deep etch with almost vertical sidewalls

22:34 <azonenberg> if i didnt know better i'd say it was made with RIE
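
A back-of-the-envelope estimate shows why a roughly 100:1 etch-rate ratio gives near-vertical walls. The sketch assumes the ratio quoted above also applies, approximately, between the downward etch direction and the <111> sidewalls, and that both rates are constant; the 400 micron depth is the figure mentioned for the paper.

```python
import math


def sidewall_undercut_um(etch_depth_um: float, selectivity: float) -> float:
    """Lateral etch into a slow-etching sidewall while etching straight down.

    selectivity = (downward etch rate) / (sidewall etch rate), assumed constant.
    """
    return etch_depth_um / selectivity


depth_um = 400.0     # depth quoted for the figure in the paper
selectivity = 100.0  # "nearly 100x" from the discussion, treated as exact here

undercut = sidewall_undercut_um(depth_um, selectivity)
tilt_deg = math.degrees(math.atan2(undercut, depth_um))

print(undercut)            # -> 4.0 (microns of sidewall loss)
print(round(tilt_deg, 2))  # well under one degree off vertical
```

An isotropic etchant corresponds to a selectivity of 1, where the undercut equals the etch depth, which is the rounded-sidewall case described for FeCl3.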

22:35 <azonenberg> that's what i used as the starting point for the comb drive process i have on fbook

22:35 <lekernel> warms up his university proxy to get through the cretinous sciencedirect paywall

22:36 <azonenberg> Lol

22:36 <azonenberg> I have an openvpn server running at a friend's house

22:36 <azonenberg> the machine in my office on campus, and my laptop here, tunnel into it

22:36 <azonenberg> then the office machine advertises routes to most journal websites ;)

22:37 <lekernel> that's more sophisticated than I do... I use ssh redirect and /etc/hosts

22:39 <azonenberg> I run OSPF http://pastebin.com/Tn4T2e8k

22:39 <azonenberg> .11 is the vpn addy of my box on campus lol

22:40 <lekernel> hm... can't reach any server at uni tonight

22:40 <azonenberg> mirrors

22:41 <lekernel> have you done multilayer yet?

22:42 <azonenberg> I havent done any etching yet since i cant afford the materials until my next payday lol

22:42 <azonenberg> Look at the date on the paper

22:42 <azonenberg> i only got litho working reliably last week

22:42 <lekernel> yeah, saw it :)

22:42 <azonenberg> this was an unsolved problem for months

22:42 <lekernel> man that's awesome work

22:42 <azonenberg> cant belive how simple the solution turned out to be lol

22:42 <lekernel> best hack i've seen lately :-)

22:44 <z4qx> o/

22:45 <azonenberg> lekernel: http://colossus.cs.rpi.edu/~azonenberg/mirror/smoothwalls.pdf

22:45 <lekernel> thanks

22:46 <lekernel> do you think you can etch vertically like this in e.g. SiO2?

22:46 <azonenberg> lekernel: Why not?

22:46 <azonenberg> I can buy KOH for $4 a pound

22:46 <lekernel> I don't know... since you are relying on the crystal structure

22:47 <lekernel> what happens when you grow oxide on a wafer? do you have a neat crystal structure or a messy one?

22:47 <azonenberg> First off, i will be buying wafers aligned to <110>

22:47 <azonenberg> Probably these http://www.mtixtl.com/sisinglecrystalsubstrate110orn10x10x05mm1spundoped.aspx

22:47 <azonenberg> they arent technically wafers as they arent round, but <110> is hard to find in full wafers for decent prices

22:48 <azonenberg> And i will not be growing oxide, also

22:48 <azonenberg> iirc they used Si3N4 deposited by LPCVD as a hardmask, but i dont have CVD capabilities

22:48 <azonenberg> So i'll be spin coating this stuff http://emulsitone.com/taf.html

22:48 <lekernel> so you want to focus on MEMS?

22:48 <azonenberg> After heat treating it forms Ta2O5, which is pretty easy to etch with HF

22:49 <lekernel> growing oxide is mandatory for most transistors (afaik)

22:49 <azonenberg> But it's resistant to alkaline etches

22:49 <azonenberg> Dielectric is, it need not be SiO2

22:49 <azonenberg> tantalum pentoxide was actually considered as a high-K dielectric for DRAM a while back - it would work

22:49 <azonenberg> But emulsitone also sells a SiO2 coating solution

22:50 <azonenberg> And, more importantly, i plan to buy a furnace i can do thermal oxidation in

22:50 <azonenberg> I just dont have $1200 to spare yet

22:50 <azonenberg> i can do bulk micromachining for much less ($500 or so)

22:50 <azonenberg> Including all of the consumables

22:50 <azonenberg> CMOS is definitely on the to-do list but its down the road

22:51 <lekernel> do you know about this? http://visual6502.org/

22:51 <azonenberg> among other things because transistors are so sensitive to trace metal contamination whereas MEMS are less so

22:51 <azonenberg> Yep

22:51 <lekernel> there are also the 4004 masks published by Intel for you to chew on :-)

22:51 <azonenberg> I do reversing too

22:51 <azonenberg> Lol, um

22:51 <lekernel> less transistors than the 6502

22:51 <azonenberg> you *do* know that one of my dreams has been to make a 1:1 scale model of the 4004?

22:52 <azonenberg> fully functional

22:52 <lekernel> haha :)

22:52 <azonenberg> But like i said mems is easier so that comes first

22:52 <azonenberg> no need for doping or tons of masks, the process i'm looking at only needs three masks and only one even somewhat precise alignment step

22:52 <azonenberg> the first mask is contact litho at \lambda = 200um lol

22:53 <azonenberg> just thinning the wafer in the middle and leaving a thick rim around the edge for handling

22:53 <azonenberg> then the through-wafer etch for the fingers followed by metal 1

22:55 <azonenberg> though, as you saw in the paper, getting sub-5um alignment will be pretty easy

22:55 <lekernel> another thing that could potentially be interesting is MMIC's

22:55 <azonenberg> ?

22:55 <lekernel> microwave ICs

22:55 <lekernel> those are a pain to buy

22:55 <azonenberg> oh... Those will be trickier - tighter tolerances

22:55 <lekernel> do you think so?

22:55 <lekernel> maybe the transistors are

22:55 <azonenberg> Once i get the basic process working i'll see where it goes lol

22:56 <lekernel> but a big MMIC advantage is in the ability to print microstrip lines with more precision than on a PCB

22:56 <azonenberg> Good point

22:56 <azonenberg> Actually, funny thing - i was thinking of making a hybrid of PCB and IC technology at some point to do massively multilayer boards

22:56 <lekernel> I actually do not know how to build a good microwave transistor

22:57 <azonenberg> Start with dual layer FR4 with copper on both sides

22:57 <lekernel> but it does seem to use very nasty chemicals like germane gas

22:57 <azonenberg> Pattern your metal 1 and 2 (for power distribution)

22:57 <azonenberg> lay down oxide on top of M2

22:57 <azonenberg> sputter or evaporate a micron or so of Al or Cu, etch M3

22:57 <azonenberg> rinse and repeat lol

22:58 <lekernel> germane is one of the few chemicals I dare not touch, close to sarin gas and the like

22:58 <azonenberg> What about concentrated HF?

22:58 <azonenberg> or SiH4?

22:58 <azonenberg> I draw the line at 2% HF myself lol

22:58 <lekernel> HF is still a lot less dangerous than germane

22:58 <lekernel> even concentrated HF

22:58 <azonenberg> Phosphine?

22:58 <azonenberg> They use that for ion implantation

22:59 <azonenberg> Arsine too

22:59 <azonenberg> Neither of those are healthy to be around

22:59 <azonenberg> My process will be diffusion based using spin on dopants though

22:59 <azonenberg> Less precise but safer and requires less fancy equipment

22:59 <azonenberg> just HF wet etch the doped oxide film, coat undoped oxide around it, and heat for a while

23:01 <azonenberg> According to wiki, GeH4 is used for CVD epitaxy in a similar manner to SiH4

23:01 <azonenberg> So that means they're using germanium based substrates

23:02 <lekernel> ok

23:02 <lekernel> so no CVD etc.?

23:02 <azonenberg> Nope

23:03 <azonenberg> I'm ranking processes in order of preference

23:03 <lekernel> what about metal layers? how can you do them without PVD?

23:03 <azonenberg> Spin coating is pretty much impossible to avoid and easy to do (though precise coating thickness control will be a bit tricky until i get a speed controller)

23:03 <azonenberg> Metalization will be done by filament evaporation or DC sputtering

23:03 <azonenberg> I'm exploring both in parallel and whichever one starts working first is the one i'll use

23:04 <azonenberg> though eventually i want both

23:04 <azonenberg> Thermal diffusion id going to be necessary for CMOS but not MEMS

23:04 <azonenberg> is*

23:04 <azonenberg> or at least, not the comb drive

23:05 <lekernel> heard of this? http://www.gdiy.com/projects/thin-film-sputtering-machine/index.php

23:05 <azonenberg> No, actually, I havent

23:05 <azonenberg> But i do have a friend doing research in sputtering

23:05 <lekernel> there you can get your metal layers :-)

23:05 <azonenberg> Metalization was my second area to focus on after litho

23:06 <azonenberg> To be done in parallel with etching

23:07 <azonenberg> I really havent studied it in nearly as much depth lol

23:10 <lekernel> at electrolab (a hackspace near Paris) someone got their hands on a couple of turbopumps. we haven't used them yet, though.

23:10 <lekernel> I was actually thinking about doing the sputtering first

23:10 <azonenberg> Nice

23:11 <azonenberg> I was planning to do thermal evaporation initially, actually, since i thought it would be easier

23:11 <lekernel> yeah, maybe I'll start with that too :)

23:11 <azonenberg> but if you get sputtering working I might send you guys a few dies to metalize lol

23:12 <azonenberg> the tricky thing with sputtering is gonna be doing it *cheaply*

23:12 <azonenberg> For $3.5K - $5K you can buy a small sputtering rig from MTI or similar

23:12 <lekernel> my #1 problem is time (and then money to build such expensive stuff). i'm doing too much stuff ...

23:12 <azonenberg> Homebrewing cheaper is not going to be easy

23:12 <azonenberg> But evaporation looks like it will be a lot easier to do cheaply

23:12 <lekernel> yeah probably

23:13 <azonenberg> You need a high current, precisely controlled power supply (may be possible to adapt one designed for welding, i may build one for the low-power ~100W prototype)

23:13 <lekernel> with a little effort we can also probably get an old evaporator from the 70s too

23:13 <azonenberg> A 2-stage rotary vane vacuum pump will get me down to ~40 mtorr, i dont know if thats deep enough

23:13 <azonenberg> Ted Pella will sell tungsten boats, filaments, etc for a decent price

23:13 <lekernel> we merely need to rent a van and drive it on some 600km to pick the evaporator up :)

23:13 <azonenberg> As with wire / pellet charges for evaporation

23:13 <lekernel> but again there are time problems

23:14 <azonenberg> I projected (given the pump and vacuum gauge i am thinking of borrowing from a friend) that building a working evaporator would cost ~$1.5K

23:14 <azonenberg> maybe only $1K

23:14 <lekernel> http://paillard.claude.free.fr/ is very cool too

23:15 <lekernel> that guy built his vacuum pumps himself

23:15 <lekernel> including a molecular one

23:15 <azonenberg> Nice, but i dont know french :(

23:15 <lekernel> unfortunately he's stopped doing this

23:16 <azonenberg> And i dont plan to build a pump since i can get access to one

23:16 <azonenberg> Or, at least a roughing pump

23:16 <azonenberg> if high-vac turns out to be necessary i may try my hand at making a diffusion pump

23:16 <lekernel> sure. but vacuum pumps are otherwise expensive like hell, so it's good if there is a DIY alternative

23:17 <azonenberg> unitednuclear sells a 2-stage rotary vane roughing pump for $295

23:17 <azonenberg> i cant imagine DIYing one for less

23:17 <lekernel> in fact, vacuum anything is expensive like hell, even when it clearly needs not to be

23:17 <azonenberg> Yeah

23:17 <azonenberg> But i am not really focusing on vacuum too much yet

23:17 <azonenberg> I'm designing processes in the order that i'd use 'em

23:17 <azonenberg> and next after spin coating and exposure is etching

23:18 <lekernel> that guy http://benkrasnow.blogspot.com/2011/03/diy-scanning-electron-microscope.html uses spark plugs as voltage feedthrough

23:18 <azonenberg> Yeah, i saw that one

23:18 <lekernel> those otherwise cost around 100-200€ or so at a professional vacuum equipment manufacturer

23:18 <azonenberg> Not bad at all

23:20 <lekernel> rotary vane pumps aren't the worst... the main problem is turbomolecular pumps which are around $8000

23:21 <azonenberg> Turbopumps are not cheap, that's for sure

23:21 <lekernel> and also seem to be easily damaged if for example your vacuum is suddenly broken with the pump running

23:21 <azonenberg> But do you really think you can build one?

23:21 <azonenberg> And yes, that will kill them

23:21 <lekernel> well, apparently Claude Paillard did something like that

23:22 <azonenberg> Impressive

23:22 <lekernel> yeah :)

23:22 <lekernel> his work is amazing

23:22 <azonenberg> But the question i'm asking right now is, how high vacuum is needed for basic evaporation?

23:22 <lekernel> unfortunately he did not publish all the details and he's no longer into that

23:22 <azonenberg> If I purge the chamber with argon or something to remove any traces of oxygen

23:22 <azonenberg> then pump down to 40 microns vacuum

23:22 <azonenberg> will that be adequate?
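
One quantity that is commonly looked at for this question is the mean free path of gas molecules, from the kinetic-theory formula lambda = k*T / (sqrt(2) * pi * d^2 * P). The sketch below is only an estimate: the molecule diameter (about 3.7e-10 m, roughly that of nitrogen) and room temperature are assumptions, and it says nothing about oxidation or film quality.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K
PA_PER_TORR = 133.322     # pascals per torr


def mean_free_path_m(pressure_torr: float,
                     temperature_k: float = 293.0,
                     molecule_diameter_m: float = 3.7e-10) -> float:
    """Kinetic-theory mean free path; the default diameter is an assumed N2-like value."""
    pressure_pa = pressure_torr * PA_PER_TORR
    return BOLTZMANN * temperature_k / (
        math.sqrt(2) * math.pi * molecule_diameter_m ** 2 * pressure_pa
    )


rough = mean_free_path_m(40e-3)  # 40 mtorr, the roughing-pump figure above
high = mean_free_path_m(1e-5)    # an illustrative high-vacuum pressure

print(f"40 mtorr:  {rough * 1e3:.2f} mm")  # on the order of a millimetre
print(f"1e-5 torr: {high:.1f} m")          # several metres
```

Under these assumptions, at 40 mtorr an evaporated atom would collide many times on its way to a substrate a few centimetres from the source, whereas at high vacuum it would travel in an essentially straight line.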

23:22 <lekernel> that's what I'm thinking too. but why is it that no professional installation does that?

23:22 <azonenberg> I mean, i've seen DC sputtering done at ~100 mtorr

23:23 <azonenberg> Its probably less efficient, slower deposition, etc

23:23 <azonenberg> But for DIY the first rule is "make it work"

23:23 <azonenberg> not "make it cost effective for mass production"

23:24 <lekernel> well, even in research labs when mass production isn't a priority, all sputtering i've heard of is done with first high vacuum then letting a little bit of noble gas in

23:24 <azonenberg> Yeah

23:24 <azonenberg> I'm not sure why

23:24 <lekernel> I'm asking myself the same question.

23:24 <azonenberg> But RF sputtering is normally done at much lower (1-2 mtorr) pressures

23:24 <azonenberg> i'll be doing DC

23:24 <lekernel> but no one has been able to answer it yet

23:25 <azonenberg> Yep, one more item on the todo list

23:25 <azonenberg> I want to set up some kind of proper website for coordinating this, now that i have people interested from all over the place

23:26 <azonenberg> right now i'm the main guy pushing the research, i'm bouncing ideas off of two friends who live near me

23:26 <azonenberg> and there are a bunch of folks i know online who i talk to about it here and there

23:26 <azonenberg> But there's no central location for posting status reports etc

23:27 <azonenberg> Any recommendations on some kind of web-based tool that will work well for it?

23:27 <lekernel> maybe for starters, just a mailing list with public archives?

23:28 <azonenberg> I set up the group "homecmos" on google groups but there's been zero traffic so far lol

23:28 <azonenberg> i havent tried using it much

23:28 <lekernel> personally I don't really like google groups... good old mailman is best

23:29 <azonenberg> Want to host the list somewhere? Be my guest

23:29 <lekernel> I can probably create a mailman list for you on lists.milkymist.org

23:29 <lekernel> if you want...

23:31 <azonenberg> that might work... right now i'm still trying to figure out what kind of web presence to have

23:31 <azonenberg> right now it's just static html hosted from my office box lol

23:31 <azonenberg> any wiki hosts to recommend?

23:32 <lekernel> otherwise I think sourceforge also provides mailing lists

23:32 <lekernel> wiki... hmm... actually, no

23:32 <lekernel> I use mediawiki and it's awful because of spam problems

23:32 <lekernel> it would not even let you mass delete accounts or edits and comes with no captcha by default

23:32 <azonenberg> As a minimum I want a wiki (posting restricted to registered users probably) and a mailing list

23:33 <lekernel> so a default mediawiki installation is unusable because it gets vandalized daily by bots and you spend hours fixing it

23:33 <azonenberg> Yeah, i run default mediawiki for one project but it's internal and on a LAN-only server

23:33 <azonenberg> behind a firewall

23:33 <lekernel> there's also github which provides a wiki

23:33 <azonenberg> grrrr git

23:33 <azonenberg> no

23:33 <lekernel> the nice thing is that the wiki is backed by a git repository

23:33 <azonenberg> prefers svn

23:34 <lekernel> huh? why?

23:34 <lekernel> svn is slower and more unstable than git

23:34 <azonenberg> Never liked distributed vcs in general

23:34 <lekernel> well you can forget about the distributed features if you don't need them

23:34 <azonenberg> i'm a big fan of continuous integration so i want everyone committing to trunk so the code gets as many eyes on it as possible early on

23:34 <azonenberg> git seems to encourage branching to an extent i dislike

23:34 <lekernel> that is possible with git as well

23:34 <azonenberg> but i dont want to start any religious wars lol

23:35 <lekernel> well, personally, since I switched from svn to git I don't understand how I endured svn that long

23:36 <lekernel> corrupt repositories (both on client and on server), slowness, bugs, segfaults, crashes, etc.

23:36 <lekernel> I do not use the distributed features of git a lot either (though being able to commit while offline is nice), and use it mostly for its speed and robustless

23:37 <azonenberg> lol i've never seen any of those, but w/e

23:37 <lekernel> robustness

23:37 <azonenberg> Right now i have an svn repo but it's pretty empty, migrating wouldn't be hard

23:37 <wpwrak> lekernel: never has stability issues with svn. but i agree on the slowness. once you get used to the speed of git, svn becomes quite unbearable

23:37 <wpwrak> s/has/had/

23:37 <azonenberg> I want the wiki and mailing list first, vcs can be hosted wherever

23:38 <azonenberg> thoughts on google code? They support VCS backed wikis

23:38 <lekernel> wpwrak: well you can try to grab the milkymist tree and commit it in one go to a svn repository. there's a good chance this will fail.

23:38 <lekernel> with git no problem

23:39 <wpwrak> lekernel: hehe, i'll pass :) but we used svn quite extensively at openmoko for many years and i don't remember any stability issues. we actually had more trouble with git :)

23:41 <azonenberg> So I think i'm going to go google on this

23:41 <azonenberg> i already have the group so i'll google-code the wiki

23:42 <lekernel> if you have a good wiki engine to recommend (mediawiki isn't) I can also host it for you

23:43 <azonenberg> lekernel: I dont, unfortunately

23:43 <azonenberg> nice thing about google code is that the wiki is VCS backed

23:43 <azonenberg> So you can even send out commit emails on wiki changes etc

23:43 <lekernel> but I don't want to have more mediawiki problems. one wiki is already enough to get me pissed.

23:44 <azonenberg> Yeah lol

23:45 <wpwrak> lekernel: to paraphrase a joke i once heard about IBM: mediawiki is not a necessary evil. mediawiki is not necessary.

23:47 <lekernel> thoughts about pmwiki?

23:48 <azonenberg> lekernel: Never heard of it, i think i'll run with google for a while and see how it works

23:49 <wpwrak> azonenberg: btw, i agree that vcs-based makes a lot of sense. particularly if you also have an offline renderer/formatter such that you can edit your pages locally and just commit

23:53 <lekernel> btw use of mm w/ video input and camera: http://www.vimeo.com/22966103

23:53 <lekernel> gn8

23:58 <wpwrak> lekernel: (video) nice !