lekernel changed the topic of #milkymist to: Mixxeo, Migen, Milkymist-ng & other Milkymist projects :: Logs: http://en.qi-hardware.com/mmlogs :: Mixxeo preorder lists.milkymist.org/pipermail/devel-milkymist.org/2013-May/003344.html
antgreen has quit [Ping timeout: 240 seconds]
antgreen has joined #milkymist
bentley` has quit [Remote host closed the connection]
proppy has joined #milkymist
bentley` has joined #milkymist
antgreen has quit [Ping timeout: 248 seconds]
<azonenberg> my xc2c32a bitstream decompiler is coming along nicely
<azonenberg> flipflop and clock config is the only thing i have left at the core conceptual level
<azonenberg> then i need to script decoding of the other 3/4 of the global routing matrix, i only have one quadrant done so far
Alarm has joined #milkymist
lekernel has joined #milkymist
<larsc> azonenberg: pretty neat
<azonenberg> larsc: it's getting there, at this point i'd say only a few more days (of actual work, depends on my schedule when i can get to it) to have the whole chip pretty much decoded
<azonenberg> At which point i can stop the reverse engineering and start the forward engineering
<azonenberg> as well as some refactoring to make it easier to scale the code to larger CR-II devices
<azonenberg> right now i only support the 32a
<azonenberg> the 64a will be a fairly easy upgrade as the architectures are almost identical
<azonenberg> the 128 and larger are a different architecture which would require some code changes
<larsc> how does the architecture influence the bitstream format? Is it mostly different, or same structure but different data?
<azonenberg> the low end devices each have one input-only pin, i have not yet decoded the bits for using it
<azonenberg> the higher-end do not
<azonenberg> low end are two I/O banks, larger is more (but thats an easy change too)
<azonenberg> the PLA is identical
<azonenberg> the global routing is different for each device but i have not yet seen any reason to believe the big are any more different from each other than the small
<azonenberg> the big difference is the low-end devices have one macrocell format
<azonenberg> the high-end have two, and it's not the same as that of the low end
<larsc> hm
<azonenberg> they introduce a new class "buried macrocells" which are internal logic only, not broken out to pads
<azonenberg> these are basically a subset of the low-end macrocell
<azonenberg> minus the I/O config
<azonenberg> but the non-buried mcells in the high-end devices have a lot of new options
<azonenberg> for example "datagate"
<azonenberg> and SSTL capability on the inputs
<azonenberg> that will take more work
<azonenberg> first priority is a full F/OSS toolchain for the 2c32a
<larsc> it's nice seeing you getting there
<azonenberg> I think in another week i'll be able to go from bitstream to RTL source
<azonenberg> then use the xilinx tools to re-synthesize
<azonenberg> and get a bit-for-bit identical output
<azonenberg> (for the 32a)
<azonenberg> or at least, logically equivalent
<azonenberg> there's a few spots where its hard to control what the optimizer does
<azonenberg> which makes generating certain structures to study tricky :p
<larsc> if your tool were e.g. a compiler and the CPLDs were ISAs, would you say the difference between high end and low end is more like the difference between MIPS I and MIPS II or more like between MIPS and ARM?
<azonenberg> I'd say it's closer to x86 and x86-64 lol
<larsc> ok
<azonenberg> Low end macrocell config (xilinx-generated comment in the bitstream)
<azonenberg> N Aclk ClkOp Clk:2 ClkFreq R:2 P:2 RegMod:2 INz:2 FB:2 InReg St XorIn:2 RegCom Oe:4 Tm Slw Pu*
<azonenberg> sorry the N shouldnt be there
<azonenberg> thats the comment marker
<azonenberg> Buried macrocells in high-end
<azonenberg> Aclk Clk:2 ClkFreq ClkOp FB:2 P:2 Pu RegMod:2 R:2 XorIn:2
<azonenberg> I/O macrocells in high end
<azonenberg> Aclk Clk:2 ClkFreq ClkOp DG FB:2 InMod:2 InReg INz:2 Oe:4 P:2 Pu RegCom RegMod:2 R:2 Slw Tm XorIn:2
<azonenberg> the DG bit enables/disables datagate but i dont know which is which yet
<azonenberg> the schmitt trigger (ST) bit was replaced with a two-bit "InMod" field
<azonenberg> this is to handle the fact that an IOB now has three encodings
<azonenberg> normal, schmitt trigger, SSTL comparator
<azonenberg> vs just two
<azonenberg> i dont know if the fourth value is used for anything
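The three field lists quoted above can be compared mechanically. The following is a minimal Python sketch, not azonenberg's code: it assumes each token is `Name` (one bit) or `Name:N` (N bits), and that the trailing `*` on `Pu*` is the JEDEC field terminator rather than part of the name. The function and constant names are hypothetical.

```python
# Field lists copied from the log (the stray leading "N" comment marker is dropped).
LOW_END = ("Aclk ClkOp Clk:2 ClkFreq R:2 P:2 RegMod:2 INz:2 FB:2 "
           "InReg St XorIn:2 RegCom Oe:4 Tm Slw Pu*")
HIGH_END_BURIED = "Aclk Clk:2 ClkFreq ClkOp FB:2 P:2 Pu RegMod:2 R:2 XorIn:2"
HIGH_END_IO = ("Aclk Clk:2 ClkFreq ClkOp DG FB:2 InMod:2 InReg INz:2 Oe:4 "
               "P:2 Pu RegCom RegMod:2 R:2 Slw Tm XorIn:2")


def parse_fields(spec):
    """Turn 'Clk:2 Aclk ...' into [(name, width_in_bits), ...]."""
    fields = []
    for token in spec.split():
        token = token.rstrip("*")            # assumption: '*' only terminates the comment
        name, _, width = token.partition(":")
        fields.append((name, int(width) if width else 1))
    return fields


def total_bits(spec):
    """Number of configuration bits described by one field list."""
    return sum(width for _, width in parse_fields(spec))


def field_names(spec):
    """Set of field names in one field list, ignoring widths."""
    return {name for name, _ in parse_fields(spec)}


if __name__ == "__main__":
    for label, spec in [("low end", LOW_END),
                        ("high-end buried", HIGH_END_BURIED),
                        ("high-end I/O", HIGH_END_IO)]:
        print(label, total_bits(spec), "bits")
    print("only in high-end I/O:", sorted(field_names(HIGH_END_IO) - field_names(LOW_END)))
    print("only in low end:", sorted(field_names(LOW_END) - field_names(HIGH_END_IO)))
```

Under those assumptions the lists add up to 27 bits (low end), 16 bits (buried) and 29 bits (high-end I/O). The buried list is a strict subset of the low-end one, and the only name changes between low end and high-end I/O are `St` going away and `DG` and `InMod` appearing, which matches the description in the conversation.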
<azonenberg> then the PLA is the same
<azonenberg> the global routing differs from device to device
<azonenberg> global clock stuff i have not decoded for the 32a yet so i dont know how that differs
<larsc> but is that something that can be described using data (e.g. a big table) or do you have to write different code for each device?
<azonenberg> Most code is going to be the same for all devices
<larsc> ok
<azonenberg> There are several spots i have device-specific stuff though
<azonenberg> My tools are on track to be *massively* faster than the xilinx ones lol
<larsc> hehe
<azonenberg> as a minimum, i'll take muuuuch less time to go from an in-memory model of the device to a bitstream
<azonenberg> their tool takes like a second and a half
<azonenberg> lol
<azonenberg> mine took 260ms to load a bitstream, parse it, decompile it, then reserialize :p
<azonenberg> ... and most of that time was spent in printf calls in the decompiler
<azonenberg> this is also debug -O0 builds
<azonenberg> vs the xilinx shipping release build lol
<azonenberg> When, in the future, I write any FPGA stuff
<azonenberg> i will take great pains to design for scalability from the ground up
<azonenberg> including parallel algorithms from day one
<larsc> yea
<azonenberg> But for CPLDs i think optimized serial code is fast enough
<larsc> cores per machine won't be getting less
<azonenberg> Lol
<azonenberg> that, and i have a rack of stuff in the living room
<azonenberg> i dont want it gathering dust while i build with one core :p
<larsc> I guess another interesting thing is to actually split the task over multiple physical machines in a network
<azonenberg> That is the goal
<larsc> nice
<azonenberg> I plan to design my FPGA tools to be extremely scalable
<azonenberg> exactly what algorithms i use are TBD
<azonenberg> But i was reading some papers that got like 100x speedups on simulated annealing
<azonenberg> before i do that i'd want to look at more future-looking algorithms that are deterministic though
<azonenberg> so i dont end up like xilinx's tools :p
<azonenberg> even if i do twice as much work as the random algorithm
<azonenberg> on 16 cores i can afford that :p
<azonenberg> Anyway that's a LONG way out
<azonenberg> CPLDs first
<larsc> doesn't vivado use some kind of multivariable function solver to do P&R?
<azonenberg> So they claim, yes
<azonenberg> i want to explore such algorithms
<azonenberg> they seem to give better results
<azonenberg> Which is why i dont want to implement SA
<larsc> hm
<azonenberg> Right now i'm actually tuning some cluster settings to make my research codebase build / test faster
<azonenberg> i have lots of different dev boards and many test cases could run on any of several
<azonenberg> so i'm trying to load-balance so that each board is used about the same amount
<azonenberg> rather than having long queues on a few
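The board load-balancing idea can be illustrated with a small greedy scheduler. This is only a sketch of one possible approach, not the cluster configuration discussed here: board names, test names and runtimes are invented, and it assumes each test lists the boards it can run on plus an estimated runtime.

```python
def assign_tests(tests, boards):
    """Greedily place each test on the least-loaded compatible board.

    tests:  {test_name: (list_of_compatible_boards, estimated_seconds)}
    boards: iterable of board names
    Returns (plan, load) where plan maps test -> board and load maps
    board -> total queued seconds.
    """
    load = {board: 0.0 for board in boards}
    plan = {}
    # Place the most constrained tests first, then the longest ones,
    # so flexible short tests can fill the remaining gaps.
    order = sorted(tests.items(),
                   key=lambda kv: (len(kv[1][0]), -kv[1][1], kv[0]))
    for name, (compatible, seconds) in order:
        target = min(compatible, key=lambda b: (load[b], b))
        plan[name] = target
        load[target] += seconds
    return plan, load


# Hypothetical example data: three boards, five test cases.
BOARDS = ["board_a", "board_b", "board_c"]
TESTS = {
    "t_jtag": (["board_a"], 30),
    "t_ddr":  (["board_a", "board_b"], 20),
    "t_eth":  (["board_b", "board_c"], 20),
    "t_gpio": (["board_a", "board_b", "board_c"], 10),
    "t_uart": (["board_a", "board_b", "board_c"], 10),
}

if __name__ == "__main__":
    plan, load = assign_tests(TESTS, BOARDS)
    for test, board in sorted(plan.items()):
        print(test, "->", board)
    print(load)
```

With this example data every board ends up with 30 seconds of queued work. A greedy pass like this is not optimal in general, but it avoids the long queues on a few boards that the conversation mentions.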
playthatbeat has quit [Quit: KVIrc 4.1.3 Equilibrium http://www.kvirc.net/]
playthatbeat has joined #milkymist
antgreen has joined #milkymist
antgreen has quit [Ping timeout: 264 seconds]
mumptai has quit [Ping timeout: 256 seconds]
<lekernel> nice... AMOLED interface standards also use the I€€€ model
<lekernel> oooh but it *is* I€€€
<GitHub138> [linux-milkymist] larsclausen pushed 4 new commits to master: http://git.io/mmYQZA
<GitHub138> linux-milkymist/master ac44f7f Lars-Peter Clausen: lm32: Put signal trampoline in static code...
<GitHub138> linux-milkymist/master d9b73e7 Lars-Peter Clausen: lm32: Directly link against libgcc...
<GitHub138> linux-milkymist/master f6c70cb Lars-Peter Clausen: lm32: Use free_reserved_area helpers...
antgreen has joined #milkymist
<larsc> ysionnea1: https://github.com/milkymist/linux-milkymist/commit/d9b73e7c50bb795ea70962980b183db6675f8049 much easier than copying the files from libgcc
<lekernel> the rpi way: "let's use a members-only standard implemented with a proprietary chip plus an obscure blob, and go conference-hopping in OSHW events"
<ysionnea1> larsc: oh, very nice trick :)
<ysionnea1> thanks
<ysionnea1> lekernel: :/ sad
lekernel has quit [Ping timeout: 276 seconds]
<ysionnea1> larsc: ./obj/tooldir.Darwin-10.8.0-i386/lm32--netbsd/bin/gcc -print-libgcc-file-name
<ysionnea1> libgcc.a
<ysionnea1> that gives me only the filename, not the path
<larsc> gives me the full path here
<larsc> ${CROSS_COMPILE}gcc -print-libgcc-file-name
<larsc> /opt/rtems-4.11/lib/gcc/lm32-rtems4.11/4.5.2/libgcc.a
<ysionnea1> hum weird I'll have a look at my gcc source tree
ysionnea1 is now known as ysionneau
lekernel has joined #milkymist
<ysionneau> macbookprodeyannsionneau:NetBSD fallen$ $PWD/obj/tooldir.Darwin-10.8.0-i386/lm32--netbsd/bin/gcc --sysroot=$HOME/dev/NetBSD/obj/tooldir.Darwin-10.8.0-i386/lm32--netbsd/ -print-libgcc-file-name
<ysionneau> libgcc.a
<larsc> probably your macbook or something ;)
<ysionneau> yep, or maybe the lm32--netbsd toolchain with a wrong configuration
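The difference between the two outputs above can be checked from a build script. GCC prints the bare file name when it cannot locate `libgcc.a` in its search path, so a result that is not an absolute path is a sign the toolchain did not resolve it. The sketch below assumes a POSIX host; the helper names are hypothetical and the compiler path is whatever `${CROSS_COMPILE}gcc` expands to.

```python
import os
import subprocess


def is_resolved(printed):
    """True if the -print-libgcc-file-name output is a full path."""
    return os.path.isabs(printed.strip())


def find_libgcc(compiler):
    """Ask the given gcc for libgcc.a; return the path, or None if unresolved."""
    result = subprocess.run([compiler, "-print-libgcc-file-name"],
                            capture_output=True, text=True, check=True)
    printed = result.stdout.strip()
    return printed if is_resolved(printed) else None


if __name__ == "__main__":
    # CROSS_COMPILE is read from the environment, as in the command quoted above.
    gcc = os.environ.get("CROSS_COMPILE", "") + "gcc"
    path = find_libgcc(gcc)
    print(path if path else "libgcc.a not found in the compiler's search path")
```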
<GitHub196> [linux-milkymist] larsclausen pushed 1 new commit to master: http://git.io/cEnpSw
<GitHub196> linux-milkymist/master 064001a Lars-Peter Clausen: lm32: Fix idle function...
<larsc> one fallout from the v3.10 merge, idle didn't work anymore
<wpwrak_> azonenberg: (My tools are on track to be *massively* faster than the xilinx ones) what's next on your list ? competitive swimming against a pile of rocks ? ;-)
mumptai has joined #milkymist
lekernel has quit [Quit: Leaving]
Alarm has quit [Ping timeout: 252 seconds]
<azonenberg> wpwrak_: lol
<azonenberg> I was actually thinking a 100m sprint vs a tree sloth
* wpwrak_ wonders how the sloth would perform if adding hornets to the equation
<larsc> a sloth is quite power efficient though
<wpwrak_> optimally fine-tuned sleep states
<larsc> by years and years of natural selection, he who sleeps the most survives ;)
mumptai has quit [Quit: Leaving]