<kristianpaul> Fallenou: so in rtems i just need to modify a pointer value to freely point at memory, right? or are there limitations on what i can point at, considering that the FS is also ram..
<kristianpaul> it may sound stupid, but i need to confirm
<Fallenou> kristianpaul: look at how registers are read from and written to
<Fallenou> you can use the address directly
<Fallenou> beware of cache problems, volatile, etc.
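[editor's note] For readers following along: memory-mapped register access in RTEMS/C boils down to dereferencing a volatile pointer at the register's address. A minimal sketch, with a made-up register address (the real one comes from the SoC's address map):

```c
#include <stdint.h>

/* Hypothetical register address -- substitute the real one from the
 * SoC address map.  'volatile' stops the compiler from caching the
 * value in a register or eliding "redundant" reads and writes. */
static volatile uint32_t *gpio_out = (volatile uint32_t *)0xe0001000u;

static inline void gpio_write(uint32_t v) { *gpio_out = v; }
static inline uint32_t gpio_read(void)    { return *gpio_out; }
```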
<terpstra> lekernel, were you imagining a separate TLB for instruction and data buses? (seems best given the dual-ported design of the LM32)
<lekernel> yeah
<lekernel> aw: so it seems the new protection system works great
<aw> yes, SCH D.
<aw> i am using the official adapter to record data again though...
<aw> the holding current actually seems higher than any I measured before. yup, of course it must be, due to the 2A fuse.
<aw> meanwhile I am watching the temperature, especially with our current adapter, to see the surrounding temperature around the DC jack.
<aw> this is mostly what I am checking now though. :-)
<aw> you can also see the line marked '2.85A' is my limit from the lab power supply... its output/capability won't drop too much when loaded. I think that I need to do a 'burn-in run' for at least 1 week or do ageing on the adapter.
<aw> lekernel, i really forgot that 1A must be available for the two host usb ports, thanks for that last email from you reminding me.
<lekernel> aw: why do you have 40mA going through the diodes at 5V?
<aw> lekernel, where?
<lekernel> table 1 non-reversed 5 4.994 4.992 0.04 / -
<aw> umm.. that's the no-load condition (without MM1)
<lekernel> yes
<lekernel> there's another zener in the same series with a 5.6V voltage, maybe it's better to take that one
<aw> no.
<lekernel> with yours the minimum specified zener current is 4.85V, that might explain that 40mA current
<lekernel> s/zener current/zener voltage
<aw> when initially powering up the NO LOAD circuit, the fuse is cold, as before it goes into the 'holding' stage.
<lekernel> you didn't get my point. the thing is that with a 5V voltage, your circuit should consume ZERO power. but instead of that you have 40mA through the diodes.
<aw> the fuse can still have current flowing even when it stays over the 'holding' value, then the current slowly rises to its 'cut/trip' current.
<lekernel> do we really want to have this protection circuit continually consume power and get hot?
<aw> hmm.. i know you find it strange, later I'll measure again. :-) 5V is less than 5.1V, so you think there should be no current. :-)
<lekernel> well your measurement is probably correct
<lekernel> the diode datasheet specifies that the zener voltage can be as low as 4.85V
<aw> i actually haven't decided whether to use this circuit now.
<lekernel> and we do not want that, so I'm suggesting that we take the 5.6V diode instead with a minimum voltage of 5.32V
<aw> when I saw/discovered those temperatures.
<aw> btw, no matter whether we pick a 5.6V diode, I can imagine that the temperature is still there though. this is worse than rc2.
<lekernel> it will still get hot **when the user exceeds the specified voltage**
<aw> well the truth is this h/w batch is better than rc2 in having a protection function.
<aw> but yes.
<lekernel> not when they use the recommended adapter
<lekernel> with your zener, it would get hot with the recommended adapter
<lekernel> getting hot when the user does something stupid isn't a problem
<aw> so i really have not decided on this though. I even think that I personally don't like this batch now.
<lekernel> I do. 2A fuse, 5.6V zener, done.
<aw> so I am trying to find out how warm our adapter will get?
<lekernel> with the 5.6V zener there should be ZERO current and ZERO heating
<aw> well...good idea on 5.6V though.
<aw> usb spec needs 4.75~5.25V too. so a 5.6V diode is over that. that's why i picked 5.1V.
<lekernel> yeah I know
<aw> but I can try though to see how low it will be. :-)
<lekernel> but 5.6V for a short period shouldn't do much damage, and is definitely better than having 20V or so if the user is stupid
<lekernel> and there is still good protection for reversed polarity or AC adapters
<aw> but under real conditions we don't want users to use a 20V adapter.
<lekernel> I know. but the whole point of this protection is to provide some security against human stupidity
<lekernel> and I insist on "some", as stupidity is infinite there can be no fully adequate protection
<aw> so like we declare that board "suggested input range: 4.75 V ~ ?V"..
<aw> yeah
<lekernel> no, we declare it as a *mandatory* input range
<lekernel> no change compared to rc2
<lekernel> but there is an additional safety belt if users do not listen to that
<lekernel> so it ends up doing less damage
<aw> well.. wait, I am supposedly not providing a real condition. :-)
<lekernel> 5.6V is 1N5339BG
<aw> yup... that's why i said that I have no idea / haven't decided whether we need these h/w patches... too many unknown conditions could happen.
<lekernel> just take that, do some quick testing and go ahead
<aw> umm..yes
<aw> ha... you really want to try that 5.6V even though it's over 5.25V for usb?
<lekernel> the wanted result is that the board should a) have no regression b) incur less or no damage when fed inappropriate voltages
<aw> surely i can quickly go for it.
<lekernel> yes, definitely
<lekernel> as I said
<lekernel> 5.6V on USB wouldn't damage much in most cases, and is still a lot better than whatever overvoltage an inappropriate adapter would give
<lekernel> imagine this situation: user plugs a 20V adapter to the M1
<aw> okay..imaginable though...
<lekernel> with USB devices on it
<lekernel> without the protection you get 20V on port, and this will probably break the USB devices
<aw> sorry that i am going to outside now...
<aw> talk to you later.
<lekernel> with the protection you get 5.6V or so for a dozen seconds, and this will probably NOT break the USB devices
<aw> later back to see this. cu
<lekernel> so. 2A fuse, 5.6V zener.
<lekernel> period :-p
<lekernel> cu
<aw> time to go..cu
<terpstra> lekernel, a thought: couldn't a LM32 TLB just work like CSRs work right now?
<terpstra> in 'kernel mode' there is no address translation
<terpstra> in 'user mode' you use a CAM lookup of these TLB registers for the appropriate page
<terpstra> and if there is a miss, a segfault exception is raised
<terpstra> and the OS has to fill in the missing page into some CSRs
<terpstra> we already have to save/restore 32 registers on context switch, so saving some 16-32 extra TLB entries doesn't seem like much more overhead
<terpstra> i guess the CSR namespace has been filled up too much with other CSRs, but a single new instruction 'WTLB' that behaves almost like the 'WCSR' should be enough to get the job done
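[editor's note] A toy software model of the scheme terpstra sketches, with sizes assumed from this discussion (32 direct-mapped entries, 4 kB pages), not from any LM32 spec: a miss stands in for the segfault exception, after which the OS refills the slot with a WTLB-style write.

```c
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 32
#define PAGE_SHIFT  12

struct tlb_entry { uint32_t vpn, ppn; bool valid; };
static struct tlb_entry tlb[TLB_ENTRIES];

/* model of "wtlb": write one entry, slot chosen by the low VPN bits,
 * the way wcsr writes a CSR today */
static void wtlb(uint32_t vpn, uint32_t ppn)
{
    struct tlb_entry *e = &tlb[vpn % TLB_ENTRIES];
    e->vpn = vpn; e->ppn = ppn; e->valid = true;
}

/* returns false on a miss, where the hardware would raise the
 * TLB-miss exception and the OS would refill via wtlb() */
static bool tlb_translate(uint32_t vaddr, uint32_t *paddr)
{
    struct tlb_entry *e = &tlb[(vaddr >> PAGE_SHIFT) % TLB_ENTRIES];
    if (!e->valid || e->vpn != vaddr >> PAGE_SHIFT)
        return false;
    *paddr = (e->ppn << PAGE_SHIFT) | (vaddr & ((1u << PAGE_SHIFT) - 1));
    return true;
}
```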
<wpwrak> terpstra: the kernel also needs to be able to copy to/from user space. better if it can use the TLB for this, instead of having figure out these things "manually". could be a one-shot switch, though. e.g., set a bit that makes the next access use the TLB, then switch back.
<wpwrak> terpstra: another thing for the kernel: for vmalloc, you also want the MMU in kernel space
<terpstra> wpwrak, why does it need the tlb to copy to user space? it knows which page is at which address for the user-space, so it can just copy to the appropriate page's physical address
<wpwrak> terpstra: yes, it's possible but messy
<terpstra> if i recall correctly, the linux kernel already has a function you are supposed to call when accessing user-space memory via a pointer provided from user-space
<terpstra> ie: if you get a pointer from user-land via an ioctl, you are supposed to convert it for use inside kernel space
<wpwrak> terpstra: yup. you have these functions. as i said, you can do all this without mmu support, but it's a lot of overhead
<terpstra> not so much overhead as compared to reloading the TLB i'd wager... ?
<wpwrak> terpstra: for example, if you copy a string byte by byte, you need to do a page table lookup and permission check for each access. messy.
<terpstra> what? why would you do that?
<terpstra> do it one lookup for the block transfer
<wpwrak> terpstra: for larger accesses, you also have to check if you're crossing a page boundary
<terpstra> crossing page boundary, sure
<terpstra> but doing a single table lookup per page copied sounds like negligible overhead to me
<wpwrak> terpstra: yes, if this is implemented as a block transfer. this isn't always the case.
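[editor's note] The block-transfer idea being discussed can be sketched like this: one translation per page touched, splitting the copy at page boundaries. `user_page_to_phys` is a hypothetical stand-in for whatever per-process page-table lookup the real kernel would use; it is not a Linux API.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096u

/* hypothetical lookup: kernel-visible address of one mapped user page,
 * or NULL if the page is unmapped */
extern uint8_t *user_page_to_phys(uintptr_t user_page);

/* copy into user space with one translation per page, not per byte */
static int copy_to_user_sw(uintptr_t dst, const void *src, size_t n)
{
    const uint8_t *s = src;
    while (n) {
        size_t in_page = PAGE_SIZE - (dst & (PAGE_SIZE - 1));
        size_t chunk = n < in_page ? n : in_page;
        uint8_t *phys = user_page_to_phys(dst & ~(uintptr_t)(PAGE_SIZE - 1));
        if (!phys)
            return -1;                  /* -EFAULT in a real kernel */
        memcpy(phys + (dst & (PAGE_SIZE - 1)), s, chunk);
        dst += chunk; s += chunk; n -= chunk;
    }
    return 0;
}
```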
<wpwrak> (reloading the tlbs) why not have two ? one for user space and one for kernel space
<terpstra> area cost
<wpwrak> is the cost prohibitively high ?
<terpstra> well TLB will need to be a fairly high associative cache
<terpstra> and we'll need one for each bus already
<terpstra> making kernel-mode need it too doubles the cost
<wpwrak> (each bus) you mean instruction and data ?
<terpstra> yes
<wpwrak> you probably don't need an I-TLB for the kernel. so the extra cost is only +50% ;)
<terpstra> for an FPGA we probably can't make it a fully associative cache like in a real CPU... as we don't want to use tons of registers, so we will need a 2- or 4-way associative TLB in order to use FPGA ram blocks
<terpstra> TLB is going to be really expensive in area i think
<terpstra> going to be slow too. :-/
<wpwrak> well, you could make a really simple TLB (e.g., one entry) and collect statistics :)
<terpstra> you need in sequence: RAM block indexing (based on low page id bits), then comparison of TLB tag to high page bits, a MUX to pick the correct entry in the associative cache, then comparison of TLB result to L1 cache tag for the physical tagging check, finally the signal has to trigger an exception
<terpstra> that's some deep signalling...
<terpstra> all this happens between two clock edges
<wpwrak> yeah. well, you have to do this anyway, whether you have a kernel tlb or not.
<terpstra> yes
<terpstra> but kernel TLB just makes it even bigger ;)
<wpwrak> ah, and you don't need the kernel tlb for kernel/user space access. you'd just reuse the user space tlb. what you need is a way to switch it on while in kernel mode.
<terpstra> maybe just one TLB
<terpstra> and have kernel mode bit enable access to a 'restricted' memory range
<terpstra> then you can happily re-use user-space pointers when copying to/from your kernel-land memory in the restricted range
<wpwrak> not sure how badly you need vmalloc in the kernel. it's kinda frowned upon, but not enough that people wouldn't use it ...
<terpstra> the restricted range doesn't go through TLB
<terpstra> think 1GB is enough memory for userland? ;)
<wpwrak> that would be more or less equivalent to a 2GB/2GB split. yes, a possibility
<terpstra> or maybe: 2GB user-land, 1GB kernel land, 1GB memory mapped IO non-cached region
<terpstra> user mode cannot access addresses with high bit set
<wpwrak> you're very generous with that address space :)
<terpstra> addresses with high bit set do not go through TLB
<wpwrak> well, for a first version that'll do. can always be improved later.
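[editor's note] The fixed split terpstra proposes (an assumption from this chat, not an existing LM32 layout) reduces the access checks to a couple of bit tests:

```c
#include <stdint.h>
#include <stdbool.h>

/* proposed split:
 *   0x00000000-0x7fffffff  user space, translated by the TLB
 *   0x80000000-0xbfffffff  kernel RAM, untranslated
 *   0xc0000000-0xffffffff  uncached memory-mapped I/O, untranslated
 */
static bool user_may_access(uint32_t vaddr)
{
    return (vaddr & 0x80000000u) == 0;   /* high bit set => kernel only */
}

static bool addr_is_translated(uint32_t vaddr)
{
    return (vaddr & 0x80000000u) == 0;   /* high-bit addresses bypass the TLB */
}

static bool addr_is_uncached_io(uint32_t vaddr)
{
    return (vaddr & 0xc0000000u) == 0xc0000000u;
}
```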
<terpstra> unfortunately, my idea of a WTLB instruction won't work
<terpstra> since a TLB entry will need to be 40 bits wide
<terpstra> well, i guess it could be made to work if we have 256 TLB entries. *cackle*
<terpstra> <1 bit user/kernel> <19 bits virtual page number> <12 bits page offset>
<wpwrak> why 40 bits ?
<terpstra> the 19 bits virtual page number = <13 bits TLB tag> <6 bits TLB index>
<terpstra> then your TLB entries have: <13 bits TLB tag> <19 bits physical address>
<terpstra> and it fits!
<terpstra> and only 32 TLB entries needed
<terpstra> (i was imagining a full 20-bits for virtual address and physical address)
<terpstra> this way you can pack it better, though
<wpwrak> ah, regarding the split. it's not so nice, because you'd then have to check that user pointers are in the correct address range, along with overflow issues. probably still better to have a means to just switch the user mode for the next access.
<wpwrak> you also need permission bits: read, write, and execute would be desirable, too
<terpstra> lies
<terpstra> we have two TLBs one for data and one for instruction
<terpstra> so execute means it is in the instruction TLB
<terpstra> i suppose read/write needs a bit, though for the data bus
<wpwrak> very good. so just one for write.
<wpwrak> yes
<terpstra> damn you
<wpwrak> hehe :)
<terpstra> there be not enough bits ;)
<terpstra> should it be possible for a user to map device memory ?
<terpstra> i suppose this is useful especially for a micro kernel
<wpwrak> hmm yes. that would be very nice to have.
<terpstra> so you need a full 20 bit physical address in the TLB
<wpwrak> also for plain user space. think the old architecture of the X server.
<wpwrak> or all my current atrocities surrounding UBB on the ben ;-)
<terpstra> so 20 bits for physical address, 1 bit for read/write flag.....
<terpstra> that means only 11 bits for the tag
<terpstra> i guess if you had 8 bits of TLB index (256 entries... eek)
<terpstra> that's too big
<terpstra> or give up on fitting the TLB entry in 32 bits
<terpstra> or go for a bigger page size ;)
<wpwrak> keep things easy - use 1 GB pages :)
<terpstra> 8k page size would mean <19 bits physical address> and thus <12 bits virtual address tag> and only <6 bits for the TLB index>
<terpstra> so back to 32 TLB entries
<terpstra> that is nice
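[editor's note] One self-consistent reading of the layout terpstra arrives at: 8 kB pages (13-bit offset), user space confined to the low 2 GB so the VPN is 18 bits, split into a 12-bit tag and a 6-bit index (strictly 64 entries, not 32), and an entry packed as <12-bit tag><19-bit PPN><1 write bit> in exactly 32 bits. A sketch of the packing, not a spec:

```c
#include <stdint.h>

#define PAGE_SHIFT   13                 /* 8 kB pages */
#define INDEX_BITS   6
#define TLB_ENTRIES  (1u << INDEX_BITS) /* 64 */

#define VPN(vaddr)        ((vaddr) >> PAGE_SHIFT)
#define TLB_INDEX(vaddr)  (VPN(vaddr) & (TLB_ENTRIES - 1))
#define TLB_TAG(vaddr)    (VPN(vaddr) >> INDEX_BITS)

/* entry layout: bits 31..20 tag, bits 19..1 PPN, bit 0 write permission */
static uint32_t pack_entry(uint32_t tag, uint32_t ppn, uint32_t writable)
{
    return (tag << 20) | ((ppn & 0x7ffffu) << 1) | (writable & 1u);
}

static uint32_t entry_tag(uint32_t e)      { return e >> 20; }
static uint32_t entry_ppn(uint32_t e)      { return (e >> 1) & 0x7ffffu; }
static uint32_t entry_writable(uint32_t e) { return e & 1u; }
```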
<wpwrak> plus, that way you'll find all the programs that assume that a page is 4 kB :)
<terpstra> they've been fixed already i think
<terpstra> debian must run on stuff with 8k pages by now
<terpstra> afk
<wpwrak> run or stumble :) well, you can try 8 k and if it sucks too much, go to 4 k
<lekernel> can't we just disable address translation in kernel mode?
<lekernel> this way we're also backward compatible with programs like RTEMS stuff that do not use the MMU
<lekernel> they just run in kernel mode all the time
<terpstra> lekernel, that's what i wanted to do too
<terpstra> but wpwrak says it's a problem
<terpstra> so what do you think about just grabbing the entire TLB on context switch like we have to handle registers anyway?
<terpstra> it isn't/shouldn't be as big as the L1 caches anyway
<lekernel> depends... how big is the TLB?
<lekernel> and how do we ensure compatibility with programs that do not use the MMU?
<terpstra> well, i also liked the idea that kernel mode = no MMU... then you have your compatibility
<lekernel> I don't think there's a problem, Norman pointed out on the list that Microblaze does that
<terpstra> i've been reading around, and it seems that the TLB for mips isn't so big
<terpstra> even the AMD64 only has 1024 entries
<terpstra> so 32 should be fine i guess
<terpstra> probably 16 is already plenty
<terpstra> R2000 had 64 entries
<terpstra> R4000 had 32 to 64
<terpstra> (so later versions had fewer entries, which seems suggestive to me)
<lekernel> "TLB is organized as 3-way set associative."
<lekernel> hmm...
<terpstra> yeah, we definitely will need associativity
<lekernel> if we have only 32 entries, it can be fully associative, no?
<terpstra> i suppose we could try without at first tho
<terpstra> problem with fully associative is it rules out using RAM cells
<terpstra> you need full registers then
<terpstra> which is a lot
<terpstra> on my cyclone 3 the LM32 needed only like 1k registers for the full design i think
<lekernel> we can also have no associativity and a lot of TLB entries to compensate
<lekernel> so we take advantage of the BRAM
<terpstra> i think for a first version this makes the most sense
<lekernel> but reloading the TLB would take time during context switches then...
<terpstra> however, i don't totally buy that a 2- or 4-way associative cache is like a 2* or 4* bigger cache
<lekernel> though probably not a lot more than those architectures which flush the L1 caches on each context switches
<terpstra> there are many byzantine scenarios that can happen in practice where associativity is >>> more slots
<lekernel> yeah sure
<lekernel> as a general rule x-way associative has better performance than x times the size
<terpstra> but for a first version, i think non-associative makes sense
<lekernel> non portable though
<terpstra> that's nice for you xilinx users
<lekernel> yeah... and xilinx patented the srl16 too
<terpstra> so basically one LUT can decode 4-bit index ?
<terpstra> that's possible on altera too
<terpstra> problem is that you can't reprogram the LUT at run time ;)
<terpstra> i guess this is the value added part of the xilinx approach?
<terpstra> ahh, yes, i see it now
<terpstra> SRL16E diagram
<terpstra> to mimic a SRL16E portably i would need 4 registers, and 3 LUTs i think
<terpstra> anyway
<terpstra> wpwrak, do you realllllly need the mmu in kernel mode?
<wpwrak> terpstra: maybe the best approach is to implement a trivially simple TLB, run a test load (e.g., kernel compilation, emacs, whatever) and keep statistics of what happens. then pick a design accordingly.
<terpstra> we also need a way to determine the address that triggered a TLB miss
<wpwrak> terpstra: (mmu in kernel mode) well, for vmalloc ...
<terpstra> wpwrak, why does vmalloc need an mmu?
<terpstra> can't it just allocate from the physical address space?
<lekernel> terpstra: well I think that having a large non-associative TLB in a block RAM is good for starters
<wpwrak> terpstra: because it can give you virtually contiguous allocations even if your pages are all physically fragmented
<terpstra> Code that uses vmalloc is likely to get a chilly reception if submitted for inclusion in the kernel. If possible, you should work directly with individual pages rather than trying to smooth things over with vmalloc.
<terpstra> lol
<lekernel> s6 FPGAs have RAM blocks of up to 16 kilobits each... a few or even just one of them can hold a sizable amount of TLB entries
<terpstra> i don't think we need/want more than 32 TLB entries
<terpstra> by keeping the TLB small we can more easily just load/store it from the kernel instead of trying to preserve it like the L1 cache
<wpwrak> terpstra: (chilly reception) for sure. yet it exists, so .. :)
<lekernel> terpstra: you mean for encoding the WTLB instruction?
<wpwrak> terpstra: anyway, you can make the kernel tlb fairly inefficient.
<lekernel> I don't see what the problem is with a large TLB, except more context switch overhead
<terpstra> yeah
<terpstra> i don't want context switch overhead
<terpstra> either we need to leave stale TLB entries that get flushed on demand (more work for the hardware)
<wpwrak> terpstra: ah, and i think modules may use the mmu too. so, i-tlb for the kernel as well. life sucks, doesn't it ? :)
<terpstra> or we need to save/restore more TLB entries on context switch
<terpstra> modules get loaded at different addresses
<terpstra> i don't think there's MMU action there
<terpstra> that's why it's a pain to find the symbol of a module from a kernel register dump
<lekernel> otoh a larger TLB means less TLB misses
<lekernel> well
<lekernel> I don't think it'd be hard to make the TLB size configurable with this approach
<lekernel> so we can just try and see :-)
<terpstra> it impacts the layout of the TLB tho
<terpstra> if you want to pack the TLB entries into 32 bits ;)
<terpstra> in a perfect world you could have 32 TLB entries, each 32 bit wide
<terpstra> then it would have a 'normal' LM32 register encoding
<terpstra> ie: a simple WTBL instruction would work just like WCSR does now
<juliusb> just give up on this LM32 stuff, use OpenRISC ;)
<terpstra> ...
<juliusb> We've already got this MMU stuff going
<juliusb> our kernel port is solid, too
<terpstra> hmmmmm
<terpstra> :)
<wpwrak> terpstra: (i-tlb) you're right. doesn't actually run code from the vmalloc'ed region
<juliusb> one interesting experiment I want to do very soon is actually calculate overhead for TLB misses and reloading
<juliusb> and the effect TLB sizing and associativity has on that
<terpstra> juliusb, how does the openrisc do tlb ?
<juliusb> good question. the architecture is fairly flexible - allows various sizes and up to 4-way associativity
<juliusb> i'm not across the details of it specifically off the top of my head
<terpstra> physically tagged and indexed?
<juliusb> well,...
<lekernel> yeah, let's use openrisc. then the flickernoise framerate would drop to something like 0.2 fps while the FPGA LUT count increases :-)
<juliusb> no, I think virtually tagged
<juliusb> hangon no
<juliusb> lekernel: prove it :)
<juliusb> no I agree, or1200 aint so tiny
<terpstra> juliusb, to be honest i haven't fairly evaluated the openrisc
<terpstra> it is just so big
<juliusb> but, i'm serious about using it if you're considering doing a Linux port
<terpstra> but adding an mmu to the lm32 will make it big too
<juliusb> it's been like 2 years of work for us to just get the kernel port and toolchain to a point where it's usuable now
<lekernel> terpstra: I don't think that a simple TLB in a block RAM would make it very big
<juliusb> we have some good kernel developers now, and the HW seems quite stable across various technologies
<lekernel> my guess is something like 2 BRAM + 200 LUTs, not more
<terpstra> lekernel, the OR is only 6* bigger than the lm32 :)
<juliusb> lekernel: but as described before, you need a lot more than just a block ram, you need a tag ram and then all the appropriate error detection and exception handling logic
<juliusb> for each port
<juliusb> ... it would be an interesting experiment though
<terpstra> yes, juliusb is right that it will cost us
<lekernel> sure, that's what those 200 LUTs are for
<juliusb> .. hey by the way, why do you want to run Linux in the first place??
<terpstra> cause i want debian!
<juliusb> it's not a good idea for embedded stuff I argue - you have this MMU mess, and it only gets worse if you want shared library code
<terpstra> ;)
<juliusb> you need all that indirect function calling garbage
<terpstra> (for gsi/cern we don't want linux tbh)
<juliusb> it helps extensibility at the software level, but that's it right?
<terpstra> i am just interested from a hypothetical point of view
<juliusb> i think you sacrifice a lot of performance just to have the basic benefits of a GNU/Linux, namely the plethora of software out there
<terpstra> i agree with you
<lekernel> same here. i'm globally satisfied with RTEMS.
<wpwrak> i think 2-way could be useful to avoid thrashing block copies. a dirty approach would be to have only one entry 2-way. basically if you evict a tlb entry, you move it to the 2nd way.
<wpwrak> (that's for data)
<juliusb> software based on RTOS, however, is far more complicated to write and maintain than stuff that's POSIX compliant for Linux
<terpstra> wpwrak, that's what a victim cache has been for traditionally ;)
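[editor's note] wpwrak's "one-entry second way" is indeed a classic victim cache. A minimal model, with sizes assumed from this discussion: a direct-mapped TLB plus a single victim slot that holds whatever was last evicted and is checked in parallel on lookup.

```c
#include <stdint.h>
#include <stdbool.h>

#define N 32

struct ent { uint32_t vpn, ppn; bool valid; };
static struct ent tlb[N], victim;

static void tlb_fill(uint32_t vpn, uint32_t ppn)
{
    struct ent *e = &tlb[vpn % N];
    if (e->valid)
        victim = *e;                /* evicted entry becomes the victim */
    e->vpn = vpn; e->ppn = ppn; e->valid = true;
}

static bool tlb_hit(uint32_t vpn, uint32_t *ppn)
{
    struct ent *e = &tlb[vpn % N];
    if (e->valid && e->vpn == vpn)         { *ppn = e->ppn;      return true; }
    if (victim.valid && victim.vpn == vpn) { *ppn = victim.ppn;  return true; }
    return false;
}
```

This avoids thrashing when two hot pages collide on the same index, e.g. the source and destination of a block copy.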
<lekernel> not that much
<wpwrak> not sure what code would be most happy with
<juliusb> ...i mean more complicated to write and then port to a new design or architecture etc.
<lekernel> as a matter of fact, a lot of 3rd party POSIX stuff runs almost flawlessly on RTEMS
<lekernel> I have freetype, libpng, libjpeg, libgd, mupdf, ...
<juliusb> ya, I saw RTEMS is POSIX friendly
<terpstra> the main advantage of an mmu: fork()
<juliusb> that is very good
<terpstra> i think most of the rest can be dealt with
<wpwrak> terpstra: aah, already invented. darn.
<terpstra> wpwrak, i didn't mean to invent it---i meant that's the functionality you gain from an mmu
<terpstra> you can't really do fork() without an mmu
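[editor's note] The point about fork() in one small POSIX C program: after fork() the parent and child have separate (copy-on-write) address spaces, so the child's writes are invisible to the parent. Without address translation both processes would see the same physical memory, which is why MMU-less uClinux historically offered only vfork()/exec-style process creation.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* returns 1 if the child's write to x was private to the child */
int forked_write_is_private(void)
{
    int x = 1;
    pid_t pid = fork();
    if (pid == 0) {             /* child: write to its own copy of x */
        x = 2;
        _exit(x);               /* report the child's view via exit code */
    }
    int status;
    waitpid(pid, &status, 0);
    /* child saw its write (exit code 2); the parent's x is untouched */
    return WEXITSTATUS(status) == 2 && x == 1;
}
```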
<juliusb> but who is going to do the port of the kernel to LM32??
<juliusb> or does it exist already?
<terpstra> there is a uclinux port afaik ?
<juliusb> oh good, 2.4 kernels are fun
<juliusb> :)
<juliusb> there's no such thing as far as I'm aware, it got merged with the mainline a long time ago, no?
<terpstra> i've not used it
<terpstra> i just know lattice claims this
<wpwrak> terpstra: (invented) i meant the victim cache
<terpstra> wpwrak, ack
<lekernel> terpstra: there is a super crappy uclinux port by lattice, which larsc, mwalle, Takeshi and I have improved
<lekernel> it's still not merged upstream though
<terpstra> it's 2.6 or 2.4?
<juliusb> i've just looked, they've got a 2.6 version now
<lekernel> 2.6... in fact we follow upstream
<juliusb> but there's MMU-less kernel now, right? and uClibc
<lekernel> yes
<terpstra> so if an mmu were added, not so hard to get 'proper' linux on it i guess?
<juliusb> what's the difference, then, between uClibc and real kernel?
<juliusb> err, uClinux and real kernel
<juliusb> they strip a lot of crap out of it?
<lekernel> I don't know. I have little knowledge about linux memory management internals
<terpstra> uclibc has nothing to do with mmu or not
<terpstra> uclibc is just a smaller version of libc
<terpstra> uclibc is under 200k compared to > 3MB for glibc
<terpstra> you usually see uclibc + busybox on embedded devices like routers/etc
<terpstra> where you have 8-32MB of RAM
<terpstra> those systems also have an MMU
<juliusb> i'm sure there's some NO_MMU stuff in uClibc
<terpstra> sure, to remove fork() ;)
<wpwrak> crawls to bed and hopes for happy dreams of an mmu :)
<terpstra> you won't be getting fork() without an MMU
<terpstra> and that's why even embedded devices with linux have one
<terpstra> those cheapo little routers, kindles, android phones, etc --- they all have an MMU even when they have almost no memory
<terpstra> (tho the kindle actually has half a GB of ram)
<juliusb> sure, it's an ASIC, and probably the extra silicon required to put in an MMU, plus the reduced amount of software executed to do virtual memory management, is worth it
<terpstra> yep
<juliusb> If you're really, really, stretched for area, maybe MMU-less makes sense
<terpstra> we should really see how much area a completely primitive mmu takes
<terpstra> if lekernel is right that it's 200 LUTs or less, then might as well have it on an FPGA too
<lekernel> in milkymist we're only using 44% of the fpga area, so a mmu would get merged provided it does not slow things down or introduce other regressions
<juliusb> I think you'll want all the performance you can get on FPGA running Linux and it would make a lot of sense to have one
<juliusb> We're so concerned about performance on Or1K linux that we're looking at doing hardware page table lookups instead of handling misses
<juliusb> ... in software
<juliusb> it's really, really, slow
<terpstra> lm32 is very fast
<terpstra> i bet i could write a TLB replacement algorithm that ran in under 50 cycles
<terpstra> possibly even under 30
<juliusb> no, i'm not talking /MHz here, I'm talking overall performance because Linux is just a state-swapping machine
<juliusb> always loading and storing and accessing various process states
<terpstra> i see
<juliusb> terpstra: sure, but what about saving and configuring your state to get to the place where you can then do your TLB algorithm in 30 cycles??
<terpstra> that's a good reason to make kernel-land not mmu mapped ?
<juliusb> I think it's a good reason to avoid Linux :)
<terpstra> juliusb, i was including the save/restore in that 30-cycle estimate
<terpstra> if we added an mmu to the lm32 it would launch an exception handler where you do a quick LRU/heap operation and then an eret
<juliusb> terpstra: It's not so much, but with a pissweak TLB you're doing it all the time (seriously, on every new function call) and it adds up
<terpstra> hmm
<lekernel> juliusb: how many function calls are new?
<lekernel> you're talking about lazy linking, right?
<juliusb> I'm not sure exactly how it works but I'm pretty sure it occurs quite frequently
<juliusb> well, anything outside of the page
<juliusb> well, instruction and data, too, mind you
<terpstra> wouldn't the page with the 'got' stay in the TLB most of the time?
<lekernel> well, the TLB miss on each new function call just hits at application startup
<juliusb> hopefully the data TLB miss doesn't occurr so often
<lekernel> the code gets patched after that and no longer misses the TLB
<terpstra> lekernel, the code doesn't get patched -- the 'got' gets filled
<juliusb> I'm talking about statically linked programs here, I don't know about dynamicaly linked stuff
<terpstra> your function calls to global symbols go via the data bus
<juliusb> we don't have dynamic linking yet in our toolchain, but we're working on it, and it looks like extra headache for userspace execution
<juliusb> s/headache/overhead
<terpstra> yes, indirection is expensive
<terpstra> i'm somewhat skeptical that the TLB miss rate is so high
<juliusb> but, I'm contributing to this discussion because I'm going to be starting some work shortly on really gauging the overhead of TLBs
<terpstra> why would the mips folks move from 64 TLB entries to 48 if it is such a problem?
<lekernel_> terpstra: yeah, you're probably right. but in either case, I don't think that lazy linking significantly increases any TLB miss rate.
<juliusb> and our feeling, after playing with our port, is that TLB misses occur often, and a good way to increase time spent doing useful things, rather than management overhead, is minimising this
<terpstra> juliusb, fair enough.
<terpstra> your current tlb is how big?
<juliusb> 64
<juliusb> we can have up to 128
<terpstra> and you still have lots of misses, eh? that's somewhat worrying. 2-way associative?
<juliusb> but it is single-way
<terpstra> ah
<terpstra> then i believe you
<juliusb> yes, I want to add ways
<terpstra> most TLB in 'real hardware' is CAN
<juliusb> CAN?
<terpstra> so fully associative
<juliusb> ah ok
<terpstra> sorry, CAM
<terpstra> i typo'd
<lekernel_> 2-way associative looks doable... lm32 does it for the caches
<terpstra> yes
<juliusb> ... or come and pimp out the OR1200's TLBs to do multi-way ;)
<terpstra> hmm
<terpstra> give me the or1k vs. lm32 sales pitch :)
<juliusb> well, I'm not the expert but I know the licensing on LM32 isn't pure BSD (has some taint from LM), whereas or1200 is all LGPL
<terpstra> true
<juliusb> i don't know LM32 architecture so well, but I think OR1K has pretty solid architecture, missing a few key things like atomic synchronisation instructions
<juliusb> but those can be added
<juliusb> OR1200 as an implementation is bad I think
<juliusb> I've been hacking on it for a few years and hopefully have made it better, but certainly it hasn't become leaner and more efficient
<lekernel_> which diminishes your point about LGPL
<juliusb> our toolchain is good now
<terpstra> so your position is that the or1k + toolchain + kernel support is good, but the or1200 implementation is the bad part?
<juliusb> our toolchain was a joke, but now it's good
<juliusb> yes, but it at least has MMUs already in there to save you working on that, and I think having a full-on kernel port (we're going to start pushing for acceptance in GCC and Linux sometime this year) is a pretty big deal
<juliusb> it's a lot of work to add all the bells and whistles
<juliusb> or1200 isn't bad, it's just not awesome
<juliusb> ... i may know of a rewrite in progress
<juliusb> ... but that's a little ways off yet
<lekernel_> gcc/linux kernel: true. but as far as I'm concerned it is not my priority
<terpstra> binutils+gcc for lm32 is already in mainline
<juliusb> lekernel_: I understand you need as much performance as possible, but again I ask why even consider Linux when you need to be productive on almost every cycle, the pitch kind of isn't for that
<terpstra> so here the lm32 is further than the or1k
<juliusb> it's for anyone considering Linux
<lekernel_> neither is the MMU, and I cannot accept the regressions that OR1K would introduce just to get some work already done on the MMU
<juliusb> ok, sure, but we will be sometime this week
<lekernel_> terpstra: otoh the mainline lm32 gcc is often broken... it was somewhat acceptable in gcc 4.5 and was badly broken in 4.6
<terpstra> is there a good document for the or1k comparable to the lm32's archman pdf?
<juliusb> i'm saying as an open source CPU that has a working full on kernel port, I would consider or1200
<lekernel_> maintaining gcc is a pain in the ass
<juliusb> yep, but we have guys doing that
<juliusb> terpstra: yes, we have recently re-worked the architecture spec
<juliusb> cleaned it up, etc
<terpstra> could you toss me a link?
<terpstra> i'd like to read it
<juliusb> http://opencores.org/download,or1k - click on the openrisc_arch_submit4.odt link
<juliusb> it's not in SVN yet I think
<juliusb> we've still got it out for review
<juliusb> but... it's on logincores.org (opencores.org I mean)
<juliusb> hehe
<juliusb> gotta register
<lekernel_> juliusb: I'm not considering linux, except for demos and just the fun of it
<lekernel_> juliusb: when are you going to change that policy?
<terpstra> i have an opencores account, not a problem,
<juliusb> i just had lunch with the guy in charge here, he's not convinced
<juliusb> I tried
<juliusb> he argues: what's the big deal - you're getting access to stuff for free, give us some information we can provide to the advertisers who come here, so we can fund the webserver
<roh> juliusb: never discuss too much with stupid people. work around them.
<juliusb> hehe
<juliusb> well, there's already a fork happening: openrisc.net
<juliusb> they got fedup with opencores
<lekernel_> another irritating thing in opencores policy is the requirement that files be uploaded on your server. which in turn mandates the use of SVN and your web interface, both being a lot inferior to e.g. git and github
<terpstra> ohwr.org
<juliusb> sure, I think they're fighting a losing battle
<juliusb> ohh nice, ohwr.org
<terpstra> (that's where my stuff lives)
<juliusb> cool, thanks
<juliusb> anyway, this is an ongoing thing with OpenCores - they still don't see, even after talking a lot with them, why they can't take a little if they give a little
<lekernel_> and btw I can't see why running such a webserver would be so expensive
<juliusb> I'm at least trying to get them to dump the forums and bugtracker (both some custom hack they got this young guy to do) and use a mailinglist and bugzilla
<juliusb> ya, well, it shouldn't be, but it is if you go about it the wrong way for 3 years
<juliusb> I think their heart is in the right place - they didn't want OpenCores to die and thought they could make it great
<juliusb> but I think they're not so open-sourcey
<juliusb> i probably shouldn't be saying this :P
<juliusb> anyway
<juliusb> it's in flux, I hope, and things will change eventually
<terpstra> meh - until someone writes an opensource hdl toolchain, we don't reallllly have 'opencores' anyway
<lekernel_> well, you're among friends. I'd even dare say you've just joined the #opencores-haters channel *g*
<juliusb> i know the guy who started openrisc.net well and it'll be interesting to see the response they have
<juliusb> hehe sure, and I'm working hard on OpenRISC and just like to see others getting into the oshw stuff, too
<lekernel_> terpstra: this is under way :p
<juliusb> i come in peace, but I'm employed by ORSoC and feel I should at least try to provide them with good advice on OpenCores
<terpstra> juliusb, i don't hate opencores. i hate the blinky flash ads. ;)
<juliusb> but, anyway, just wanted to point out if you really want Linux on an open source CPU, try Or1K
<juliusb> I think there's some tuning to be done, like anything, but it's probably a good place to start
<terpstra> juliusb, i will read the arch manual and then form a more informed opinion :)
<juliusb> i expect nothing less :)
<juliusb> but I, too, am very interested in the fully open source toolchain for HDL synthesis and backend
<juliusb> hence popping in here the other day to ask lekernel_ about his work so far
<lekernel> he, it's coming :)
<terpstra> juliusb, or1k has a branch delay slot?
<lekernel> wanna help?
<terpstra> wasn't this proven to be a bad idea by mips?
<terpstra> learn from the past! ;)
<lekernel> why is it a bad idea?
<juliusb> architecture is initially from 1999
<lekernel> fwiw microblaze has it, and from studies I've read it does provide a performance advantage
<terpstra> "The most serious drawback to delayed branches is the additional control complexity they entail. If the delay slot instruction takes an exception, the processor has to be restarted on the branch, rather than that next instruction. Exceptions now have essentially two addresses, the exception address and the restart address, and generating and distinguishing between the two correctly in all cases has been a source of bugs for later designs."
<juliusb> precisely
<juliusb> i'm dealing with this now, actually
<lekernel> what is in fact a bad idea is having several delay slots
<lekernel> just one is still reasonable
<terpstra> http://en.wikipedia.org/wiki/Classic_RISC_pipeline -- scroll down to the area where they list the reasons
<terpstra> that reason is just the most pertinent i think
<juliusb> well, I think the control overhead of having one compared to none is far more than from having one compared to two
<lekernel> ok...
<juliusb> it's a hassle for out of order etc
<lekernel> well, a lot of features make a mess of exceptions. out of order execution being most infamous for that.
<lekernel> but if you want a simple design, then yeah it's probably better not to have the delay slot
<juliusb> that sounds about right, but it just adds a little bit of extra complexity where you don't want anything extra
<lekernel> it does increase performance, so it's a trade-off
<terpstra> lekernel, it increases performance only if the compiler can find a good instruction to put there
<terpstra> which at the end of a basic block usually means putting a 'write to memory'
<juliusb> but pipelines that run really fast now are very long
<lekernel> yes. but from the paper I've read it still works
<terpstra> but those are precisely the instructions which generate faults
<juliusb> part of the idea was to offload complexity into the compiler from the HW, as the HW development wasn't so advanced right?
<juliusb> but now it just makes things more complicated at the HW level
<terpstra> sure
<terpstra> i am a firm believer in simpler cores, but many cores
<juliusb> and compilers are actually fairly clever now, so I guess that's not an issue, but why cause the HW to be more complex when really there's marginal benefit
<terpstra> we've carried the hardware supporting crappy sequential software about as far as it can go
<lekernel> well... if you have OOO execution, delay slots sure make no sense
<juliusb> yes, as someone who writes, tests and debugs cores, I would eliminate the delay slot
<lekernel> but I wouldn't toss it as a definitely crappy idea either
<terpstra> fair enough
<lekernel> I think it still does some good in some cases.
<juliusb> for OR2K, we propose eliminating them http://opencores.org/or2k/OR2K:Community_Portal
<terpstra> i agree it is a nice way to avoid the wasted instructions you otherwise have
<juliusb> yes, for the simple 4/5 stage pipelines, they do gain you some advantage compared to not having them
<lekernel> yup
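The trade-off argued above can be made concrete with a toy model. This Python sketch (no real ISA; the opcodes and program layout are invented for illustration) runs the same tiny program with and without a single architectural delay slot: with the slot, the instruction after a taken branch still executes. If that slot instruction is a store that faults, the handler must restart at the branch rather than at the faulting instruction, which is exactly the two-address exception problem quoted from the Wikipedia article.

```python
# Toy pipeline model contrasting a delayed branch with a plain branch.
# Instruction stream: a taken branch at index 0, a slot instruction at
# index 1, and the branch target at index 3. All names are illustrative.

def execute(program, delay_slot):
    """Run a trivial fetch loop; `program` maps index -> (op, arg)."""
    trace, pc = [], 0
    while pc in program:
        op, arg = program[pc]
        trace.append(op)
        if op == "branch":
            if delay_slot:
                # The instruction after the branch executes unconditionally
                # before control transfers (one architectural delay slot).
                slot_op, _ = program[pc + 1]
                trace.append(slot_op)
            pc = arg
        else:
            pc += 1
    return trace

prog = {
    0: ("branch", 3),    # taken branch to index 3
    1: ("store", None),  # delay-slot filler (terpstra's "write to memory")
    2: ("nop", None),    # skipped either way
    3: ("halt", None),
}
```

With `delay_slot=True` the trace is `["branch", "store", "halt"]`; without it, the store never runs.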
<lekernel> juliusb: do you want to help with the synthesis toolchains?
<lekernel> (speaking about delay slots: for the OR2K, sure, eliminate them)
<juliusb> lekernel: probably not right at the moment, sorry, I was just curious to see how it was looking
<juliusb> perhaps in a while, though
<juliusb> I think it's definitely needed and would be very cool
<lekernel> there are some relatively simple things to do, like implementing Verilog case statements
<juliusb> mainly i'd be interested to see an open source synthesis engine
<juliusb> to check the impact of various design choices
<lekernel> (all that's needed is to translate those statements to IR muxes)
<lekernel> at least for now, then we'll see how to do things like FSM extraction
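The case-to-mux lowering lekernel describes can be sketched as follows; the tuple-based IR here is invented for illustration and is not LLHDL's actual representation. A case statement is a selector, so in an IR that only has 2-input muxes it becomes a chain of `(sel == value) ? branch : rest` nodes.

```python
# Lower a Verilog-style case statement into a chain of 2-input muxes.
# Node types ("mux", "eq") are made up for this sketch.

def lower_case(sel, arms, default):
    """arms: list of (match_value, expr); returns a nested mux tuple."""
    node = default
    for value, expr in reversed(arms):
        node = ("mux", ("eq", sel, value), expr, node)
    return node

def evaluate(node, env):
    """Interpret the mux tree against a signal environment."""
    if isinstance(node, tuple):
        if node[0] == "mux":
            _, cond, a, b = node
            return evaluate(a, env) if evaluate(cond, env) else evaluate(b, env)
        if node[0] == "eq":
            _, sig, value = node
            return env[sig] == value
    return node
```

For example, `lower_case("s", [(0, "x"), (1, "y")], "z")` selects `"x"` when `s == 0`, `"y"` when `s == 1`, and the default `"z"` otherwise.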
<juliusb> cool, if I get some time i'll let you know, will find out how to get started
<lekernel> ok. just ask here or on llhdl@lists.milkymist.org if you have questions or problems.
<juliusb> will do
<lekernel> btw, I was a bit stuck lately with the placement engine
<lekernel> I wanted to do post placement packing, but this is rather hard especially with the current chip database architecture
<lekernel> so I think i'll revert to good old pre-placement packing heuristics for now
<lekernel> not sure how well it's going to work with the relatively complex s6 slices, but we'll see
<lekernel> maybe it works great
<lekernel> as a matter of fact, I think Altera has even more complex logic blocks ("LAB clusters" or something)... and it's not clear how they pack them
<lekernel> also, with post placement packing, I'd lose one of the potential benefits of clustering, which is that the placer algorithm can be faster because it has to deal with fewer elements
<lekernel> so perhaps it's simply a bad idea after all
<terpstra> lekernel, why does an LM32 dcache read (lw instruction) take 3 cycles for result? X stage calculates address, M stage touches cache.... what happens in W stage?
<lekernel> write to register file?
<lekernel> mh
<lekernel> I don't know
<terpstra> but at the end of the M stage it could have used the bypass
<terpstra> just like the 2-stage shift instruction does
<terpstra> hrm
<terpstra> there's an "align" step in the block diagram
<terpstra> so D fetches base register, X adds offset, M fetches the cache, and W 'aligns' the result (and writes back to register file at end of cycle)
<terpstra> what is this magical align?
<lekernel> I guess this is for reading bytes or 16-bit words on any offset
<terpstra> ahhh
<terpstra> and sign extension / etc
<terpstra> makes sense
<terpstra> yes
<terpstra> thanks
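The align step discussed above amounts to lane selection plus sign extension. A minimal Python sketch, assuming big-endian byte numbering (the LM32 is big-endian); the function name and argument shapes are illustrative, not from the LM32 RTL:

```python
# Sketch of what a load aligner does in the W stage: pick the addressed
# byte/halfword out of the 32-bit word the cache returned, then
# sign- or zero-extend it.

def align_load(word, addr, size, signed):
    """word: 32-bit word from the cache; size: access width in bytes (1/2/4)."""
    shift = (4 - size - (addr & 3)) * 8       # big-endian lane select
    mask = (1 << (size * 8)) - 1
    value = (word >> shift) & mask
    if signed and value & (1 << (size * 8 - 1)):
        value -= 1 << (size * 8)              # sign-extend
    return value
```

So a signed byte load of `0xFF` from the last lane of a word yields `-1`, while an unsigned halfword load just masks and shifts.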
<lekernel> hi xiangfu
<xiangfu> hi
<guyzmo> +-*****
<lekernel> hi guyzmo
<guyzmo> hey :)
<guyzmo> sorry, was plugging in stuff
<guyzmo> damn, so sad rlwrap can't work over flterm :/
<guyzmo> (and all control characters just output garbage)
<guyzmo> hum
<guyzmo> can't get the LED PAR to light up :/
<lekernel> did you try it in flickernoise?
<guyzmo> not yet
<guyzmo> of course I'm gonna try it
<lekernel> control panel -> dmx -> dmx table (called "dmx desk" if you have upgraded; I don't want to be negative, but I'd tend to bet you did not)
<guyzmo> ok
<lekernel> fortunately the dmx desk works with all released versions :-)
<guyzmo> ;)
<guyzmo> damn, why did I forget my DMX cable :-S
<kristianpaul> Fallenou: (registers) like in the drivers and sys_conf.h?
<kristianpaul> oh, yes i think
<kristianpaul> :p
<guyzmo> grmbl
<guyzmo> none of my XLR cables work with DMX signal
<guyzmo> though I remember we had one of them working
<guyzmo> I will have to get one cable from the Gaîté Lyrique tomorrow
<lekernel> "Since writing it I've made features at 5 micron half-pitch using the camera-port method, and am about to buy a 1-watt 385nm LED as an exposure source. This is way more power than I need so I will be able to use a nice thick diffuser on it. Once the exposure lamp is fixed I should be able to make 75 λ square dies at 5 micron resolution using the 40x objective, or 20 micron using the 10x."
<wpwrak> lekernel: the first one looks like a forest ;-)
<lekernel> hi azonenberg
<azonenberg> hi
<lekernel> welcome, honored to see you here :)
<lekernel> i'm sebastien
<azonenberg> ah, k
<azonenberg> Lets move our discussion to here rather than fb chat so other people can see
<azonenberg> The paper i sent you only describes my work at the 15um node
<lekernel> ok :)
<azonenberg> Though i did outline the process that I later reached 5um at
<lekernel> how do you engrave through the silicon?
<azonenberg> I plan to open the project as much as possible btw, all tools etc will be released under an open license (probably BSD or similar)
<lekernel> excellent :)
<azonenberg> Read the FB note (which i need to post publicly somewhere)
<azonenberg> Long story short, apply hardmask (probably Ta2O5) to the silicon by spin coating and heat treatment
<azonenberg> Spin coat photoresist over that
<azonenberg> expose and develop
<lekernel> i see
<azonenberg> Etch hardmask with 2% HF (Whink rust remover, same stuff jeri uses for gate oxide)
<azonenberg> Then etch the silicon using 30% KOH / 15% IPA / 55% water at ~80C
<lekernel> sorry about the dumb question, i'm still going through the pile of material and links on your website and fb :)
<azonenberg> You cant use KOH directly because it will attack the resist
<azonenberg> Lol, no questions are dumb
<azonenberg> For the record i have no formal training in EE myself :P
<azonenberg> my BS (and PhD in a few years) will be in comp sci
<azonenberg> Anyway so the nice thing about KOH is that its very anisotropic
<azonenberg> FeCl3 and similar etchants for copper, if you've ever done home PCB fab, are isotropic - they eat equally in all directions
<azonenberg> So you get rounded sidewalls and such
<azonenberg> But KOH eats along the <100> crystal plane nearly 100x faster than <111>
<azonenberg> And <110> is a hair slower than <100> but not by too much
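Those rate ratios make the undercut easy to estimate. A back-of-the-envelope sketch; the 1 um/min downward rate and the exact 100:1 anisotropy are illustrative assumptions, not measured values for the 30% KOH / 15% IPA recipe above:

```python
# Anisotropic KOH etch estimate: if the <100> (downward) rate is ~100x
# the <111> (sidewall) rate, the lateral undercut after a deep etch
# stays tiny. Rates here are placeholders, not measured data.

def koh_etch(depth_um, r100_um_min=1.0, anisotropy=100.0):
    """Return (etch time in minutes, lateral undercut in um)."""
    time_min = depth_um / r100_um_min
    undercut_um = time_min * (r100_um_min / anisotropy)
    return time_min, undercut_um
```

Under these assumptions a 400 um through-wafer etch undercuts only about 4 um, which is consistent with the near-vertical sidewalls in the microchannel paper mentioned below.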
<lekernel> cool. I talked about this to a fab employee, and he told me I'd never get any good anisotropic etchant because they are super expensive, hard to buy, etc.
<lekernel> if it's just KOH, well... :)
<azonenberg> If you get <110> you can go straight down (assuming your features are parallel to the <111> plane)
<azonenberg> h/o let me send you a paper
<azonenberg> "Fabrication of very smooth walls and bottoms of silicon microchannels for heat dissipation of semiconductor devices"
<azonenberg> Look at the figure they have in there (fig 9 i think?) - 400 micron deep etch with almost vertical sidewalls
<azonenberg> if i didnt know better i'd say it was made with RIE
<azonenberg> that's what i used as the starting point for the comb drive process i have on fbook
<lekernel> warms up his university proxy to get through the cretinous sciencedirect paywall
<azonenberg> Lol
<azonenberg> I have an openvpn server running at a friend's house
<azonenberg> the machine in my office on campus, and my laptop here, tunnel into it
<azonenberg> then the office machine advertises routes to most journal websites ;)
<lekernel> that's more sophisticated than I do... I use ssh redirect and /etc/hosts
<azonenberg> I run OSPF http://pastebin.com/Tn4T2e8k
<azonenberg> .11 is the vpn addy of my box on campus lol
<lekernel> hm... can't reach any server at uni tonight
<azonenberg> mirrors
<lekernel> have you done multilayer yet?
<azonenberg> I havent done any etching yet since i cant afford the materials until my next payday lol
<azonenberg> Look at the date on the paper
<azonenberg> i only got litho working reliably last week
<lekernel> yeah, saw it :)
<azonenberg> this was an unsolved problem for months
<lekernel> man that's awesome work
<azonenberg> can't believe how simple the solution turned out to be lol
<lekernel> best hack i've seen lately :-)
<z4qx> o/
<lekernel> thanks
<lekernel> do you think you can etch vertically like this in e.g. SiO2?
<azonenberg> lekernel: Why not?
<azonenberg> I can buy KOH for $4 a pound
<lekernel> I don't know... since you are relying on the crystal structure
<lekernel> what happens when you grow oxide on a wafer? do you have a neat crystal structure or a messy one?
<azonenberg> First off, i will be buying wafers aligned to <110>
<azonenberg> they arent technically wafers as they arent round, but <110> is hard to find in full wafers for decent prices
<azonenberg> And i will not be growing oxide, also
<azonenberg> iirc they used Si3N4 deposited by LPCVD as a hardmask, but i dont have CVD capabilities
<azonenberg> So i'll be spin coating this stuff http://emulsitone.com/taf.html
<lekernel> so you want to focus on MEMS?
<azonenberg> After heat treating it forms Ta2O5, which is pretty easy to etch with HF
<lekernel> growing oxide is mandatory for most transistors (afaik)
<azonenberg> But it's resistant to alkaline etches
<azonenberg> Dielectric is, it need not be SiO2
<azonenberg> tantalum pentoxide was actually considered as a high-K dielectric for DRAM a while back - it would work
<azonenberg> But emulsitone also sells a SiO2 coating solution
<azonenberg> And, more importantly, i plan to buy a furnace i can do thermal oxidation in
<azonenberg> I just dont have $1200 to spare yet
<azonenberg> i can do bulk micromachining for much less ($500 or so)
<azonenberg> Including all of the consumables
<azonenberg> CMOS is definitely on the to-do list but its down the road
<lekernel> do you know about this? http://visual6502.org/
<azonenberg> among other things because transistors are so sensitive to trace metal contamination whereas MEMS are less so
<azonenberg> Yep
<lekernel> there are also the 4004 masks published by Intel for you to chew on :-)
<azonenberg> I do reversing too
<azonenberg> Lol, um
<lekernel> less transistors than the 6502
<azonenberg> you *do* know that one of my dreams has been to make a 1:1 scale model of the 4004?
<azonenberg> fully functional
<lekernel> haha :)
<azonenberg> But like i said mems is easier so that comes first
<azonenberg> no need for doping or tons of masks, the process i'm looking at only needs three masks and only one even somewhat precise alignment step
<azonenberg> the first mask is contact litho at λ = 200um lol
<azonenberg> just thinning the wafer in the middle and leaving a thick rim around the edge for handling
<azonenberg> then the through-wafer etch for the fingers followed by metal 1
<azonenberg> though, as you saw in the paper, getting sub-5um alignment will be pretty easy
<lekernel> another thing that could potentially be interesting is MMIC's
<azonenberg> ?
<lekernel> microwave ICs
<lekernel> those are a pain to buy
<azonenberg> oh... Those will be trickier - tighter tolerances
<lekernel> do you think so?
<lekernel> maybe the transistors are
<azonenberg> Once i get the basic process working i'll see where it goes lol
<lekernel> but a big MMIC advantage is in the ability to print microstrip lines with more precision than on a PCB
<azonenberg> Good point
<azonenberg> Actually, funny thing - i was thinking of making a hybrid of PCB and IC technology at some point to do massively multilayer boards
<lekernel> I actually do not know how to build a good microwave transistor
<azonenberg> Start with dual layer FR4 with copper on both sides
<lekernel> but it does seem to use very nasty chemicals like germane gas
<azonenberg> Pattern your metal 1 and 2 (for power distribution)
<azonenberg> lay down oxide on top of M2
<azonenberg> sputter or evaporate a micron or so of Al or Cu, etch M3
<azonenberg> rinse and repeat lol
<lekernel> germane is one of the few chemicals I dare not touch, close to sarin gas and the like
<azonenberg> What about concentrated HF?
<azonenberg> or SiH4?
<azonenberg> I draw the line at 2% HF myself lol
<lekernel> HF is still a lot less dangerous than germane
<lekernel> even concentrated HF
<azonenberg> Phosgene?
<azonenberg> They use that for ion implantation
<azonenberg> Arsine too
<azonenberg> Neither of those are healthy to be around
<azonenberg> My process will be diffusion based using spin on dopants though
<azonenberg> Less precise but safer and requires less fancy equipment
<azonenberg> just HF wet etch the doped oxide film, coat undoped oxide around it, and heat for a while
<azonenberg> According to wiki, GeH4 is used for CVD epitaxy in a similar manner to SiH4
<azonenberg> So that means they're using germanium based substrates
<lekernel> ok
<lekernel> so no CVD etc.?
<azonenberg> Nope
<azonenberg> I'm ranking processes in order of preference
<lekernel> what about metal layers? how can you do them without PVD?
<azonenberg> Spin coating is pretty much impossible to avoid and easy to do (though precise coating thickness control will be a bit tricky until i get a speed controller)
<azonenberg> Metalization will be done by filament evaporation or DC sputtering
<azonenberg> I'm exploring both in parallel and whichever one starts working first is the one i'll use
<azonenberg> though eventually i want both
<azonenberg> Thermal diffusion is going to be necessary for CMOS but not MEMS
<azonenberg> or at least, not the comb drive
<azonenberg> No, actually, I havent
<azonenberg> But i do have a friend doing research in sputtering
<lekernel> there you can get your metal layers :-)
<azonenberg> Metalization was my second area to focus on after litho
<azonenberg> To be done in parallel with etching
<azonenberg> I really havent studied it in nearly as much depth lol
<lekernel> at electrolab (a hackspace near Paris) someone got their hands on a couple of turbopumps. we haven't used them yet, though.
<lekernel> I was actually thinking about doing the sputtering first
<azonenberg> Nice
<azonenberg> I was planning to do thermal evaporation initially, actually, since i thought it would be easier
<lekernel> yeah, maybe I'll start with that too :)
<azonenberg> but if you get sputtering working I might send you guys a few dies to metalize lol
<azonenberg> the tricky thing with sputtering is gonna be doing it *cheaply*
<azonenberg> For $3.5K - $5K you can buy a small sputtering rig from MTI or similar
<lekernel> my #1 problem is time (and then money to build such expensive stuff). i'm doing too much stuff ...
<azonenberg> Homebrewing cheaper is not going to be easy
<azonenberg> But evaporation looks like it will be a lot easier to do cheaply
<lekernel> yeah probably
<azonenberg> You need a high current, precisely controlled power supply (may be possible to adapt one designed for welding, i may build one for the low-power ~100W prototype)
<lekernel> with a little effort we can also probably get an old evaporator from the 70s too
<azonenberg> A 2-stage rotary vane vacuum pump will get me down to ~40 mtorr, i dont know if thats deep enough
<azonenberg> Ted Pella will sell tungsten boats, filaments, etc for a decent price
<lekernel> we merely need to rent a van and drive some 600 km to pick the evaporator up :)
<azonenberg> As with wire / pellet charges for evaporation
<lekernel> but again there are time problems
<azonenberg> I projected (given the pump and vacuum gauge i am thinking of borrowing from a friend) that building a working evaporator would cost ~$1.5K
<azonenberg> maybe only $1K
<lekernel> http://paillard.claude.free.fr/ is very cool too
<lekernel> that guy built his vacuum pumps himself
<lekernel> including a molecular one
<azonenberg> Nice, but i dont know french :(
<lekernel> unfortunately he's stopped doing this
<azonenberg> And i dont plan to build a pump since i can get access to one
<azonenberg> Or, at least a roughing pump
<azonenberg> if high-vac turns out to be necessary i may try my hand at making a diffusion pump
<lekernel> sure. but vacuum pumps are otherwise expensive like hell, so it's good if there is a DIY alternative
<azonenberg> unitednuclear sells a 2-stage rotary vane roughing pump for $295
<azonenberg> i cant imagine DIYing one for less
<lekernel> in fact, vacuum anything is expensive like hell, even when it clearly need not be
<azonenberg> Yeah
<azonenberg> But i am not really focusing on vacuum too much yet
<azonenberg> I'm designing processes in the order that i'd use 'em
<azonenberg> and next after spin coating and exposure is etching
<lekernel> that guy http://benkrasnow.blogspot.com/2011/03/diy-scanning-electron-microscope.html uses spark plugs as voltage feedthrough
<azonenberg> Yeah, i saw that one
<lekernel> those otherwise cost around 100-200€ or so at a professional vacuum equipment manufacturer
<azonenberg> Not bad at all
<lekernel> rotary vane pumps aren't the worst... the main problem is turbomolecular pumps which are around $8000
<azonenberg> Turbopumps are not cheap, that's for sure
<lekernel> and also seem to be easily damaged if for example your vacuum is suddenly broken with the pump running
<azonenberg> But do you really think you can build one?
<azonenberg> And yes, that will kill them
<lekernel> well, apparently Claude Paillard did something like that
<azonenberg> Impressive
<lekernel> yeah :)
<lekernel> his work is amazing
<azonenberg> But the question i'm asking right now is, how high vacuum is needed for basic evaporation?
<lekernel> unfortunately he did not publish all the details and he's no longer into that
<azonenberg> If I purge the chamber with argon or something to remove any traces of oxygen
<azonenberg> then pump down to 40 microns vacuum
<azonenberg> will that be adequate?
<lekernel> that's what I'm thinking too. but why is it that no professional installation does that?
<azonenberg> I mean, i've seen DC sputtering done at ~100 mtorr
<azonenberg> Its probably less efficient, slower deposition, etc
<azonenberg> But for DIY the first rule is "make it work"
<azonenberg> not "make it cost effective for mass production"
<lekernel> well, even in research labs when mass production isn't a priority, all sputtering i've heard of is done with first high vacuum then letting a little bit of noble gas in
<azonenberg> Yeah
<azonenberg> I'm not sure why
<lekernel> I'm asking myself the same question.
<azonenberg> But RF sputtering is normally done at much lower (1-2 mtorr) pressures
<azonenberg> i'll be doing DC
<lekernel> but no one has been able to answer it yet
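One standard way to reason about the open question above is the kinetic-theory mean free path: an evaporated atom has to cross from source to substrate without scattering off residual gas. A sketch, assuming a nitrogen-like molecular diameter and room temperature (both assumptions, as is the formula's applicability to the metal vapor itself):

```python
import math

# Kinetic-theory mean free path: lambda = kT / (sqrt(2) * pi * d^2 * P).
# d = 3.7e-10 m is a typical N2 molecular diameter and T = 300 K is
# room temperature; both are assumed values for this estimate.

K_B = 1.380649e-23   # Boltzmann constant, J/K
TORR_TO_PA = 133.322

def mean_free_path_m(pressure_torr, temp_k=300.0, diameter_m=3.7e-10):
    """Mean free path in metres at the given pressure in torr."""
    p_pa = pressure_torr * TORR_TO_PA
    return K_B * temp_k / (math.sqrt(2) * math.pi * diameter_m ** 2 * p_pa)
```

At 40 mtorr this gives a mean free path on the order of a millimetre, far short of a typical source-to-substrate distance of tens of centimetres, while at 1e-5 torr it is several metres. That is one plausible reason professional rigs pump to high vacuum first and only then backfill with a little noble gas.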
<azonenberg> Yep, one more item on the todo list
<azonenberg> I want to set up some kind of proper website for coordinating this, now that i have people interested from all over the place
<azonenberg> right now i'm the main guy pushing the research, i'm bouncing ideas off of two friends who live near me
<azonenberg> and there are a bunch of folks i know online who i talk to about it here and there
<azonenberg> But there's no central location for posting status reports etc
<azonenberg> Any recommendations on some kind of web-based tool that will work well for it?
<lekernel> maybe for starters, just a mailing list with public archives?
<azonenberg> I set up the group "homecmos" on google groups but there's been zero traffic so far lol
<azonenberg> i havent tried using it much
<lekernel> personally I don't really like google groups... good old mailman is best
<azonenberg> Want to host the list somewhere? Be my guest
<lekernel> I can probably create you a mailman list on lists.milkymist.org
<lekernel> if you want...
<azonenberg> that might work... right now i'm still trying to figure out what kind of web presence to have
<azonenberg> right now its just static html hosted from my office box lol
<azonenberg> any wiki hosts to recommend?
<lekernel> otherwise I think sourceforge also provides mailing lists
<lekernel> wiki... hmm... actually, no
<lekernel> I use mediawiki and it's awful because of spam problems
<lekernel> it would not even let you mass delete accounts or edits and comes with no captcha by default
<azonenberg> As a minimum I want a wiki (posting restricted to registered users probably) and a mailing list
<lekernel> so a default mediawiki installation is unusable because it gets daily vandalized by bots and you spend hours fixing it
<azonenberg> Yeah, i run default mediawiki for one project but its internal and on a LAN-only server
<azonenberg> behind a firewall
<lekernel> there's also github which provides a wiki
<azonenberg> grrrr git
<azonenberg> no
<lekernel> the nice thing is that the wiki is backed by a git repository
<azonenberg> prefers svn
<lekernel> huh? why?
<lekernel> svn is slower and more unstable than git
<azonenberg> Never liked distributed vcs in general
<lekernel> well you can forget about the distributed features if you don't need them
<azonenberg> i'm a big fan of continuous integration so i want everyone committing to trunk so the code gets as many eyes on it as possible early on
<azonenberg> git seems to encourage branching to an extent i dislike
<lekernel> that is possible with git as well
<azonenberg> but i dont want to start any religious wars lol
<lekernel> well, personally when I switched from svn to git I don't understand how I have endured svn that long
<lekernel> corrupt repositories (both on client on server), slowness, bugs, segfaults, crashes, etc.
<lekernel> I do not use the distributed features of git a lot either (though being able to commit while offline is nice), and use it mostly for its speed and robustness
<azonenberg> lol i've never seen any of those, but w/e
<azonenberg> Right now i have an svn repo but its pretty empty, migrating wouldnt be hard
<wpwrak> lekernel: never had stability issues with svn. but i agree on the slowness. once you get used to the speed of git, svn becomes quite unbearable
<azonenberg> I want the wiki and mailing list first, vcs can be hosted wherever
<azonenberg> thoughts on google code? They support VCS backed wikis
<lekernel> wpwrak: well you can try to grab the milkymist tree and commit it in one go to a svn repository. there's a good chance this will fail.
<lekernel> with git no problem
<wpwrak> lekernel: hehe, i'll pass :) but we used svn quite extensively at openmoko for many years and i don't remember any stability issues. we actually had more trouble with git :)
<azonenberg> So I think i'm going to go google on this
<azonenberg> i already have the group so i'll google-code the wiki
<lekernel> if you have a good wiki engine to recommend (mediawiki isn't one) I can also host it for you
<azonenberg> lekernel: I dont, unfortunately
<azonenberg> nice thing about google code is that the wiki is VCS backed
<azonenberg> So you can even send out commit emails on wiki changes etc
<lekernel> but I don't want to have more mediawiki problems. one wiki is already enough to get me pissed.
<azonenberg> Yeah lol
<wpwrak> lekernel: to paraphrase a joke i once heard about IBM: mediawiki is not a necessary evil. mediawiki is not necessary.
<lekernel> thoughts about pmwiki?
<azonenberg> lekernel: Never heard of it, i think i'll run with google for a while and see how it works
<wpwrak> azonenberg: btw, i agree that vcs-based makes a lot of sense. particularly if you also have an offline renderer/formatter such that you can edit your pages locally and just commit
<lekernel> btw use of mm w/ video input and camera: http://www.vimeo.com/22966103
<lekernel> gn8
<wpwrak> lekernel: (video) nice !