#milkymist on 2013-07-06 — irc logs at freenode.irclog.whitequark.org

2013-05-16 16:04 lekernel changed the topic of #milkymist to: Mixxeo, Migen, Milkymist-ng & other Milkymist projects :: Logs: http://en.qi-hardware.com/mmlogs :: Mixxeo preorder lists.milkymist.org/pipermail/devel-milkymist.org/2013-May/003344.html

03:07 antgreen has quit [Ping timeout: 240 seconds]

03:22 antgreen has joined #milkymist

03:36 bentley` has quit [Remote host closed the connection]

03:45 proppy has joined #milkymist

03:52 bentley` has joined #milkymist

04:23 antgreen has quit [Ping timeout: 248 seconds]

06:05 <azonenberg> woot http://pastebin.com/raw.php?i=ZaYb9mvY

06:06 <azonenberg> my xc2c32a bitstream decompiler is coming along nicely

06:06 <azonenberg> flipflop and clock config is the only thing i have left at the core conceptual level

06:06 <azonenberg> then i need to script decoding of the other 3/4 of the global routing matrix, i only have one quadrant done so far

08:11 Alarm has joined #milkymist

08:17 lekernel has joined #milkymist

08:18 <azonenberg> lekernel: http://pastebin.com/raw.php?i=ZaYb9mvY

08:58 <larsc> azonenberg: pretty neat

08:59 <azonenberg> larsc: it's getting there, at this point i'd say only a few more days (of actual work, depends on my schedule when i can get to it) to have the whole chip pretty much decoded

08:59 <azonenberg> At which point i can stop the reverse engineering and start the forward engineering

08:59 <azonenberg> as well as some refactoring to make it easier to scale the code to larger CR-II devices

08:59 <azonenberg> right now i only support the 32a

08:59 <azonenberg> the 64a will be a fairly easy upgrade as the architectures are almost identical

09:00 <azonenberg> the 128 and larger are a different architecture which would require some code changes

09:01 <larsc> how does the architecture influence the bitstream format? Is it mostly different, or same structure but different data?

09:02 <azonenberg> the low end devices each have one input-only pin, i have not yet decoded the bits for using it

09:02 <azonenberg> the higher-end do not

09:02 <azonenberg> low end are two I/O banks, larger is more (but thats an easy change too)

09:02 <azonenberg> the PLA is identical

09:03 <azonenberg> the global routing is different for each device but i have not yet seen any reason to believe the big are any more different from each other than the small

09:03 <azonenberg> the big difference is the low-end devices have one macrocell format

09:03 <azonenberg> the high-end have two, and it's not the same as that of the low end

09:03 <larsc> hm

09:03 <azonenberg> they introduce a new class "buried macrocells" which are internal logic only, not broken out to pads

09:04 <azonenberg> these are basically a subset of the low-end macrocell

09:04 <azonenberg> minus the I/O config

09:04 <azonenberg> but the non-buried mcells in the high-end devices have a lot of new options

09:04 <azonenberg> for example "datagate"

09:04 <azonenberg> and SSTL capability on the inputs

09:04 <azonenberg> that will take more work

09:04 <azonenberg> first priority is a full F/OSS toolchain for the 2c32a

09:05 <larsc> it's nice seeing you getting there

09:05 <azonenberg> I think in another week i'll be able to go from bitstream to RTL source

09:05 <azonenberg> then use the xilinx tools to re-synthesize

09:05 <azonenberg> and get a bit-for-bit identical output

09:05 <azonenberg> (for the 32a)

09:06 <azonenberg> or at least, logically equivalent

09:06 <azonenberg> there's a few spots where its hard to control what the optimizer does

09:06 <azonenberg> which makes generating certain structures to study tricky :p

09:06 <larsc> if your tool was e.g. a compiler and the cplds were ISAs, would you say the difference between high end and low end is more like the difference between MIPS I and MIPS II or more like between MIPS and ARM?

09:07 <azonenberg> I'd say it's closer to x86 and x86-64 lol

09:07 <larsc> ok

09:07 <azonenberg> Low end macrocell config (xilinx-generated comment in the bitstream)

09:07 <azonenberg> N Aclk ClkOp Clk:2 ClkFreq R:2 P:2 RegMod:2 INz:2 FB:2 InReg St XorIn:2 RegCom Oe:4 Tm Slw Pu*

09:07 <azonenberg> sorry the N shouldnt be there

09:08 <azonenberg> thats the comment marker

09:08 <azonenberg> Buried macrocells in high-end

09:08 <azonenberg> Aclk Clk:2 ClkFreq ClkOp FB:2 P:2 Pu RegMod:2 R:2 XorIn:2

09:08 <azonenberg> I/O macrocells in high end

09:08 <azonenberg> Aclk Clk:2 ClkFreq ClkOp DG FB:2 InMod:2 InReg INz:2 Oe:4 P:2 Pu RegCom RegMod:2 R:2 Slw Tm XorIn:2

09:08 <azonenberg> the DG bit enables/disables datagate but i dont know which is which yet

09:09 <azonenberg> the schmitt trigger (ST) bit was replaced with a two-bit "InMod" field

09:09 <azonenberg> this is to handle the fact that an IOB now has three encodings

09:09 <azonenberg> normal, schmitt trigger, SSTL comparator

09:09 <azonenberg> vs just two

09:09 <azonenberg> i dont know if the fourth value is used for anything
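
The three field lists quoted above can be compared mechanically. The following is a minimal sketch, not azonenberg's code: it assumes each token is either a one-bit flag (`Name`) or a multi-bit field (`Name:width`), that the leading `N` is the comment marker as stated in the chat, and that the trailing `*` on the low-end list is a terminator rather than part of the field name. The function and constant names are made up for illustration.

```python
# Field lists copied from the chat; the leading "N" comment marker is dropped.
LOW_END = ("Aclk ClkOp Clk:2 ClkFreq R:2 P:2 RegMod:2 INz:2 FB:2 InReg St "
           "XorIn:2 RegCom Oe:4 Tm Slw Pu*")
BURIED_HIGH_END = "Aclk Clk:2 ClkFreq ClkOp FB:2 P:2 Pu RegMod:2 R:2 XorIn:2"
IO_HIGH_END = ("Aclk Clk:2 ClkFreq ClkOp DG FB:2 InMod:2 InReg INz:2 Oe:4 "
               "P:2 Pu RegCom RegMod:2 R:2 Slw Tm XorIn:2")


def parse_fields(spec):
    """Turn 'Name' / 'Name:width' tokens into a list of (name, width) pairs."""
    fields = []
    # Assumption: a trailing '*' terminates the list and is not a field name.
    for token in spec.rstrip("*").split():
        name, _, width = token.partition(":")
        fields.append((name, int(width) if width else 1))
    return fields


def total_bits(spec):
    """Number of configuration bits described by one field list."""
    return sum(width for _, width in parse_fields(spec))


def field_names(spec):
    """Set of field names in one field list, ignoring widths."""
    return {name for name, _ in parse_fields(spec)}


if __name__ == "__main__":
    for label, spec in [("low end", LOW_END),
                        ("buried (high end)", BURIED_HIGH_END),
                        ("I/O (high end)", IO_HIGH_END)]:
        print(label, total_bits(spec), "bits")
    # Fields that only exist in the high-end I/O macrocell, and vice versa.
    print("added:", sorted(field_names(IO_HIGH_END) - field_names(LOW_END)))
    print("dropped:", sorted(field_names(LOW_END) - field_names(IO_HIGH_END)))
```

Comparing the name sets this way is a quick check of the statements in the chat: the buried list should be a subset of the low-end list, and the high-end I/O list should differ from it only around the datagate and input-mode fields.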

09:09 <azonenberg> then the PLA is the same

09:09 <azonenberg> the global routing differs from device to device

09:10 <azonenberg> global clock stuff i have not decoded for the 32a yet so i dont know how that differs

09:10 <larsc> but is that something that can be described using data (e.g. a big table) or do you have to write different code for each device?

09:11 <azonenberg> Most code is going to be the same for all devices

09:11 <larsc> ok

09:11 <azonenberg> There are several spots i have device-specific stuff though

09:11 <azonenberg> My tools are on track to be *massively* faster than the xilinx ones lol

09:11 <larsc> hehe

09:13 <azonenberg> as a minimum, i'll take muuuuch less time to go from an in-memory model of the device to a bitstream

09:13 <azonenberg> their tool takes like a second and a half

09:13 <azonenberg> lol

09:13 <azonenberg> mine took 260ms to load a bitstream, parse it, decompile it, then reserialize :p

09:13 <azonenberg> ... and most of that time was spent in printf calls in the decompiler

09:14 <azonenberg> this is also debug -O0 builds

09:15 <azonenberg> vs the xilinx shipping release build lol

09:15 <azonenberg> When, in the future, I write any FPGA stuff

09:15 <azonenberg> i will take great pains to design for scalability from the ground up

09:15 <azonenberg> including parallel algorithms from day one

09:15 <larsc> yea

09:16 <azonenberg> But for CPLDs i think optimized serial code is fast enough

09:16 <larsc> cores per machine won't be getting less

09:16 <azonenberg> Lol

09:16 <azonenberg> that, and i have a rack of stuff in the living room

09:16 <azonenberg> i dont want it gathering dust while i build with one core :p

09:17 <larsc> I guess another interesting thing is to actually split the task over multiple physical machines in a network

09:17 <azonenberg> That is the goal

09:17 <larsc> nice

09:17 <azonenberg> I plan to design my FPGA tools to be extremely scalable

09:17 <azonenberg> exactly what algorithms i use are TBD

09:19 <azonenberg> But i was reading some papers that got like 100x speedups on simulated annealing

09:19 <azonenberg> before i do that i'd want to look at more future-looking algorithms that are deterministic though

09:19 <azonenberg> so i dont end up like xilinx's tools :p

09:19 <azonenberg> even if i do twice as much work as the random algorithm

09:20 <azonenberg> on 16 cores i can afford that :p

09:20 <azonenberg> Anyway that's a LONG way out

09:20 <azonenberg> CPLDs first

09:23 <larsc> doesn't vivado use some kind of multivariable function solver to do P&R?

09:23 <azonenberg> So they claim, yes

09:24 <azonenberg> i want to explore such algorithms

09:24 <azonenberg> they seem to give better results

09:24 <azonenberg> Which is why i dont want to implement SA

09:26 <larsc> hm

09:27 <azonenberg> Right now i'm actually tuning some cluster settings to make my research codebase build / test faster

09:28 <azonenberg> i have lots of different dev boards and many test cases could run on any of several

09:28 <azonenberg> so i'm trying to load-balance so that each board is used about the same amount

09:28 <azonenberg> rather than having long queues on a few
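
The load-balancing idea described above can be sketched as a greedy assignment. This is only an illustration under stated assumptions, not the cluster configuration azonenberg was tuning: every test is assumed to cost the same, each test lists the boards it can run on, and the board and test names are hypothetical.

```python
def assign_tests(tests, boards):
    """Assign each test to the compatible board with the shortest queue.

    tests:  list of (test_name, [compatible board names])
    boards: list of board names
    Returns a dict mapping board name -> list of assigned test names.
    """
    queues = {board: [] for board in boards}
    # Place the most constrained tests first, so tests that can run
    # anywhere are left to fill whichever queues end up shortest.
    for name, compatible in sorted(tests, key=lambda t: len(t[1])):
        target = min(compatible, key=lambda board: len(queues[board]))
        queues[target].append(name)
    return queues


if __name__ == "__main__":
    # Hypothetical boards and tests, for illustration only.
    boards = ["board_a", "board_b", "board_c"]
    tests = [
        ("t1", ["board_a"]),
        ("t2", ["board_a", "board_b"]),
        ("t3", ["board_a", "board_b", "board_c"]),
        ("t4", ["board_a", "board_b", "board_c"]),
        ("t5", ["board_b", "board_c"]),
        ("t6", ["board_a"]),
    ]
    for board, queue in assign_tests(tests, boards).items():
        print(board, queue)
```

A real scheduler would also weigh test duration and board availability; the greedy rule only captures the goal of keeping queue lengths roughly equal.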

12:45 playthatbeat has quit [Quit: KVIrc 4.1.3 Equilibrium http://www.kvirc.net/]

12:49 playthatbeat has joined #milkymist

12:54 antgreen has joined #milkymist

13:15 antgreen has quit [Ping timeout: 264 seconds]

14:21 mumptai has quit [Ping timeout: 256 seconds]

14:39 <lekernel> nice... AMOLED interface standards also use the I€€€ model

14:39 <lekernel> http://www.mipi.org/specifications/display-interface

14:44 <lekernel> oooh but it *is* I€€€

14:44 <lekernel> http://www.ieee-isto.org/member-programs/mipi-alliance

14:49 <GitHub138> [linux-milkymist] larsclausen pushed 4 new commits to master: http://git.io/mmYQZA

14:49 <GitHub138> linux-milkymist/master ac44f7f Lars-Peter Clausen: lm32: Put signal trampoline in static code...

14:49 <GitHub138> linux-milkymist/master d9b73e7 Lars-Peter Clausen: lm32: Directly link against libgcc...

14:49 <GitHub138> linux-milkymist/master f6c70cb Lars-Peter Clausen: lm32: Use free_reserved_area helpers...

15:06 antgreen has joined #milkymist

15:23 <larsc> ysionnea1: https://github.com/milkymist/linux-milkymist/commit/d9b73e7c50bb795ea70962980b183db6675f8049 much easier than copying the files from libgcc

15:41 <lekernel> meanwhile... http://www.raspberrypi.org/phpBB3/viewtopic.php?f=7&t=2061#p39441 :)

15:44 <lekernel> the rpi way: "let's use a members-only standard implemented with a proprietary chip plus an obscure blob, and go conference-hopping in OSHW events"

16:56 <ysionnea1> larsc: oh, very nice trick :)

16:57 <ysionnea1> thanks

16:59 <ysionnea1> lekernel: :/ sad

17:04 lekernel has quit [Ping timeout: 276 seconds]

17:09 <ysionnea1> larsc: ./obj/tooldir.Darwin-10.8.0-i386/lm32--netbsd/bin/gcc -print-libgcc-file-name

17:09 <ysionnea1> libgcc.a

17:09 <ysionnea1> that gives me only the filename, not the path

17:11 <larsc> gives me the full path here

17:12 <larsc> ${CROSS_COMPILE}gcc -print-libgcc-file-name

17:12 <larsc> /opt/rtems-4.11/lib/gcc/lm32-rtems4.11/4.5.2/libgcc.a

17:14 <ysionnea1> hum weird I'll have a look at my gcc source tree

17:14 ysionnea1 is now known as ysionneau

17:17 lekernel has joined #milkymist

17:22 <larsc> https://lists.yoctoproject.org/pipermail/poky/2012-March/007675.html

17:27 <ysionneau> macbookprodeyannsionneau:NetBSD fallen$ $PWD/obj/tooldir.Darwin-10.8.0-i386/lm32--netbsd/bin/gcc --sysroot=$HOME/dev/NetBSD/obj/tooldir.Darwin-10.8.0-i386/lm32--netbsd/ -print-libgcc-file-name

17:27 <ysionneau> libgcc.a

17:30 <larsc> probably your macbook or something ;)

17:33 <ysionneau> yep, or maybe the lm32--netbsd toolchain with a wrong configuration
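
The symptom above (a bare `libgcc.a` instead of a full path) can be detected in a build script. The sketch below assumes the usual GCC behaviour that `-print-libgcc-file-name` echoes the bare file name when the library is not found in the compiler's search directories; the helper names are hypothetical and the compiler name passed in is whatever cross compiler is in use.

```python
import os
import subprocess


def libgcc_path_is_resolved(output):
    """Return True if the compiler printed a path with a directory part.

    A bare 'libgcc.a' suggests the compiler could not locate the library.
    """
    return os.path.dirname(output.strip()) != ""


def query_libgcc(compiler="gcc"):
    """Run '<compiler> -print-libgcc-file-name' and return its output."""
    result = subprocess.run(
        [compiler, "-print-libgcc-file-name"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The two outputs quoted in the chat.
    for sample in ["/opt/rtems-4.11/lib/gcc/lm32-rtems4.11/4.5.2/libgcc.a",
                   "libgcc.a"]:
        print(sample, "->", libgcc_path_is_resolved(sample))
```

A build that links directly against libgcc could call `query_libgcc` with its cross compiler and stop early with a clear message when the result is not resolved.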

18:24 <GitHub196> [linux-milkymist] larsclausen pushed 1 new commit to master: http://git.io/cEnpSw

18:24 <GitHub196> linux-milkymist/master 064001a Lars-Peter Clausen: lm32: Fix idle function...

18:25 <larsc> one fallout from the v3.10 merge, idle didn't work anymore

18:36 <wpwrak_> azonenberg: (My tools are on track to be *massively* faster than the xilinx ones) what's next on your list ? competitive swimming against a pile of rocks ? ;-)

18:40 mumptai has joined #milkymist

18:50 lekernel has quit [Quit: Leaving]

19:04 Alarm has quit [Ping timeout: 252 seconds]

19:44 <azonenberg> wpwrak_: lol

19:44 <azonenberg> I was actually thinking a 100m sprint vs a tree sloth

19:46 * wpwrak_ wonders how the sloth would perform if adding hornets to the equation

19:52 <larsc> a sloth is quite power efficient though

19:53 <wpwrak_> optimally fine-tuned sleep states

19:55 <larsc> by years and years of natural selection, he who sleeps the most survives ;)

22:23 mumptai has quit [Quit: Verlassend]