Topic for #milkymist is now Milkymist One, Milkymist SoC & Flickernoise development channel (LLHDL/Antares are welcome too) :: Logs: http://en.qi-hardware.com/mmlogs :: JFDI
aw_ joined #milkymist
aw joined #milkymist
wolfspraul joined #milkymist
xiangfu joined #milkymist
aw_ joined #milkymist
Guest42658 joined #milkymist
errordeveloper joined #milkymist
wolfspraul joined #milkymist
Gurty joined #milkymist
rejon joined #milkymist
<GitHub187>
[scripts] xiangfu pushed 2 new commits to master: http://git.io/vmKSDw
<GitHub187>
[scripts/master] reflash_m1.sh snapshot: don't flash data by default - Xiangfu Liu
<GitHub187>
[scripts/master] update the power-on message - Xiangfu Liu
<azonenberg>
lekernel_: Any experience working with multiprocessor softcores?
<azonenberg>
I'm particularly interested in the interconnect
<azonenberg>
in terms of cache coherency and how multiple processors share the bus
<azonenberg>
i'm working on a triple-core SoC from scratch
<azonenberg>
and am designing the interconnect fabric now
<azonenberg>
i'm using a shared bus (only one core can talk at a time, but it's full duplex)
<lekernel_>
what's at the other end of the shared bus? DRAM?
<lekernel_>
also, softcores are slow. why not use dedicated accelerators?
<azonenberg>
A fixed-mapping MMU
<azonenberg>
That splits the address bus between memory mapped IO and DDR2
<azonenberg>
the DDR2 has an L2 cache in front of it
<lekernel_>
MMU? you mean address decoder?
<azonenberg>
each core has its own dedicated L1
<azonenberg>
Basically, yes
<azonenberg>
hardwired mapping
<azonenberg>
The L1 is going to be structured in such a way as to be a passthrough for the IO address range and cache DRAM and flash addresses
<azonenberg>
then DRAM and flash will have their own SoC-wide L2 caches
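
The hardwired mapping described above (I/O accesses pass straight through the L1, DRAM and flash accesses are cached) can be sketched roughly as follows. This is only an illustration: the region names, base addresses and sizes are hypothetical and do not come from the discussion, and a real design would do this decode in HDL rather than software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int
    cacheable: bool


# Hypothetical address map: these ranges are illustrative assumptions only.
REGIONS = (
    Region("io", 0x0000_0000, 0x1000_0000, cacheable=False),
    Region("flash", 0x1000_0000, 0x1000_0000, cacheable=True),
    Region("dram", 0x4000_0000, 0x4000_0000, cacheable=True),
)


def decode(addr: int) -> Region:
    """Fixed-mapping address decode: return the region that contains addr."""
    for region in REGIONS:
        if region.base <= addr < region.base + region.size:
            return region
    raise ValueError(f"unmapped address {addr:#010x}")


def l1_action(addr: int) -> str:
    """What the L1 does with an access: bypass for I/O, cache for DRAM and flash."""
    return "cache" if decode(addr).cacheable else "passthrough"
```

With this assumed map, `l1_action(0x0000_0100)` selects the passthrough path and `l1_action(0x4000_0000)` selects the cached path.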
<azonenberg>
I know i'm reinventing the wheel a bit, its mostly an educational exercise
* azonenberg
is writing a dissertation on computer architecture soon and wants to sharpen his skills first
<azonenberg>
But its actually going to be quite fast
<azonenberg>
on spartan6 -2 speed i am shooting for 200 MHz
<azonenberg>
* 2-way superscalar
<azonenberg>
= 800 mflops for 2 cores
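
The 800 MFLOPS figure is a peak number and follows from simple multiplication. The sketch below assumes one floating-point operation retires per issue slot on every clock, which is an idealisation; the function name is made up for illustration.

```python
def peak_mflops(clock_mhz: int, issue_width: int, cores: int) -> int:
    """Peak throughput, assuming one FP op per issue slot per clock (idealised)."""
    return clock_mhz * issue_width * cores


# 200 MHz * 2-way superscalar * 2 cores
print(peak_mflops(200, 2, 2))  # prints 800
```

Under the same assumption, the triple-core configuration mentioned earlier would peak at `peak_mflops(200, 2, 3)`, i.e. 1200 MFLOPS.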
<azonenberg>
I had to pipeline the heck out of it, but its looking feasible
<lekernel_>
until you get timing paths into the bus arbiter? :)
<azonenberg>
Actually, the bus arbiter is looking just fine
<azonenberg>
i just did a standalone test of it at 200 mhz and it works just fine
<azonenberg>
on hardware
<azonenberg>
My solution to this thing is, pipeline it like crazy
<azonenberg>
its a barrel processor
<lekernel_>
with all the cores and memory controller connected to it?
<azonenberg>
so a 16 stage pipeline means zero latency
<azonenberg>
and 32 stages means one stall
<azonenberg>
i run 16 threads and context switch every clock
<lekernel_>
ah, i see
<azonenberg>
Right now its looking like when running out of L1 cache with a 16 stage pipeline i will have no stalls
<azonenberg>
despite not having any forwarding whatsoever
<azonenberg>
an L1 cache miss that hits in L2 will most likely stall one instruction
<azonenberg>
if i can fit the L1=>L2 and back in 16 clocks
<azonenberg>
or 2 instructions if it takes me 32
<azonenberg>
as long as i can keep the entire bus structure pipelined
<azonenberg>
this is a very GPU-esque architecture
<azonenberg>
hiding latency by multithreading
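
The stall counts quoted above fall out of a simple model: with 16 threads issued round-robin, each thread gets one issue slot every 16 clocks, so a result that is ready within 16 clocks costs nothing. The sketch below is one way to express that model. It assumes strict round-robin issue with no forwarding, and the function name is hypothetical.

```python
import math


def stalled_slots(latency_clocks: int, threads: int = 16) -> int:
    """Issue slots one thread loses while waiting latency_clocks for a result.

    Assumes strict round-robin issue: a thread's next slot comes `threads`
    clocks after its previous one.
    """
    if latency_clocks <= 0 or threads <= 0:
        raise ValueError("latency_clocks and threads must be positive")
    return max(0, math.ceil(latency_clocks / threads) - 1)


print(stalled_slots(16))       # 16-stage pipeline: prints 0
print(stalled_slots(32))       # 32-stage pipeline: prints 1
print(stalled_slots(16 + 16))  # L1 miss, 16-clock L2 round trip: prints 1
print(stalled_slots(16 + 32))  # L1 miss, 32-clock L2 round trip: prints 2
```

The last two calls model an L1 miss as the 16-stage pipeline latency plus the L1=>L2 round trip, which reproduces the "one instruction" and "2 instructions" estimates.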
<lekernel_>
what about cache miss rates when you have 16 threads switching so fast?
<azonenberg>
I envision it being something like CUDA, each thread executing mostly the same instructions
<azonenberg>
But they can branch as they see fit
<azonenberg>
The entire architecture is mostly an experiment
<lekernel_>
you should compile dedicated hardware accelerators ...
<azonenberg>
you mean, ASIC level?
<lekernel_>
adding layers over layers makes things slow
<azonenberg>
Sure, go get me $30K and i'll get it fabbed in MOSIS :p
<lekernel_>
yes, generate VHDL from CUDA directly
<azonenberg>
and no, this is mostly an educational exercise
<lekernel_>
no, I mean use the FPGA fabric directly
<azonenberg>
The goal is to see how many flops i can pull out of a softcore CPU
<azonenberg>
running real code
<lekernel_>
softcores are only good to run housekeeping or legacy software
<azonenberg>
also i have a project in mind that will involve me working with non-hardware people
<azonenberg>
I have dedicated accelerators for stuff like JPEG encoding that i'm working on
<azonenberg>
But the flight control code has to be in C
<azonenberg>
or C++
<azonenberg>
or assembly
<azonenberg>
since i am working with CS people who don't know hardware
<azonenberg>
So i want to design a nice powerful architecture for them to run it on
<azonenberg>
the other motivation as i said is just cutting my teeth on computer architecture
<azonenberg>
this is not something i envision being a softcore forever, but custom ASICs are not cheap
<azonenberg>
if things go well and it works as planned i might try sending it out to mosis eventually
<azonenberg>
i would love to have a laptop running a CPU i designed
<azonenberg>
in 180nm TSMC or something
<azonenberg>
But i'm not that advanced yet :p
<azonenberg>
I read your post about the latticemico32 synthesis lol
<azonenberg>
and i think my processor will be faster
<azonenberg>
But i'd have to reimplement some of the xilinx hard IP cores like the memory controller
<azonenberg>
and their soft FPU
<azonenberg>
I'm pretty sure i can write a better FPU but i havent gotten around to it yet, and as long as it's interface-compatible with theirs it'd be a drop-in replacement
<lekernel_>
their soft fpu? what's that?
<lekernel_>
you're using coregen for a fpu?
<azonenberg>
Yes, for now
<azonenberg>
i wanted to focus on the datapath and interconnect first
<azonenberg>
then go and write myself an FPU when i had all of the surrounding stuff done
<azonenberg>
in the meantime i have theirs because it tells me an FPU of that size and speed is possible
<azonenberg>
iow, setting a lower bound
<azonenberg>
then i can try and outperform it with an open one
<azonenberg>
Coregen lets you generate floating point add/sub, multiply, divide, and sqrt units separately
<azonenberg>
So i'll replace them with my own one by one
<azonenberg>
But again the focus for now is on the datapath and microarchitecture more than implementation
<lekernel_>
you can use the milkymist pfpu pipelines btw ...
<azonenberg>
The goal here is to practice efficient pipelined architecture
<azonenberg>
So i want to use as little premade code as possible
<azonenberg>
like i said i'm doing a thesis on computer architecture soon and i want practice
<lekernel_>
but you reused the coregen pipelines already :-)
<azonenberg>
Temporarily, so i could build the other stuff around them
<azonenberg>
its not expected to stay
<azonenberg>
if i had used a free one i'd have less incentive to replace it :p
<lekernel_>
so that's what I get for developing free hardware ...
<azonenberg>
production project? Sure
<azonenberg>
But for educational value sometimes its better to reimplement
<azonenberg>
Once i build mine, i'll compare it to yours and any other open ones i find
<azonenberg>
and use the best one in real projects
<kristianpaul>
Once an application for custom ASIC cores, this demanding computer graphics process is now the province of low-cost FPGAs.
<Alarm>
The problem is downloading the latest version. I'm using wget, but it's not great for a set of files
<lekernel>
the M1 downloads the latest version itself
<lekernel>
just connect it to your internet router ...
<wpwrak>
lekernel: (ab-use) what on earth is that presentation about anyway ?
<lekernel>
USB in QEMU it seems
<lekernel>
but I asked myself the same question for a while ;)
Alarm joined #milkymist
<wpwrak>
;-))
<Alarm>
I want to do the update over JTAG for pedagogic reasons. The "WebUpdate" method is of no interest to me
<Alarm>
my problem is basic. I am looking for a simple command to download binaries
<Alarm>
"wget -r" pulls down all the files
DJTachyon joined #milkymist
Gurty` joined #milkymist
mumptai joined #milkymist
* lekernel
is giving orcc a try. of course, hundreds of MB of java bloat to install ...
Alarm_ joined #milkymist
mumptai joined #milkymist
Alarm joined #milkymist
errordeveloper joined #milkymist
juliusb joined #milkymist
mumptai joined #milkymist
<kristianpaul>
some comments from a friend: "you can get a video switch for 8 USD, but a mixer.. at minimum it should do fading from one picture to another"
<kristianpaul>
and please dont be angry with me for posting this, i'm just replying comments
<lekernel>
the M1 isn't a video switch or mixer. the switch functionality is just a little add-on. you can also get an arduino led blinker for $25 which can do the same as the front panel LEDs on the M1... same kind of stupid comparison
<wpwrak>
mixer may be tricky: you need two codecs for that
<wpwrak>
and i'm not sure if the chip we use has multiple codecs inside
<lekernel>
it does not
<lekernel>
M1 was never intended as a video mixer
<kristianpaul>
i'm very excited to bug other friends about the new M1/FN features and also bring back some feedback
<kristianpaul>
sure not
<lekernel>
the main feature of this software update is image support - and stress that it can be used with MIDI controllers. the rest is secondary.
juliusb_ joined #milkymist
<kristianpaul>
sure sure
<kristianpaul>
and for your happiness, he really likes the pacman video from wpwrak
<wpwrak>
and one more device enumerates :)
<wpwrak>
hehe ;-)
<wpwrak>
we need a few more images per patch. then we can have real games :)
<kristianpaul>
wee :)
<wpwrak>
C64 retro style :)
<wpwrak>
of course, the LV3 is still mute. that one's a tough cookie
juliusb joined #milkymist
antgreen joined #milkymist
<wpwrak>
stekern: the latest patch set may also fix the low-speed regression you experienced.
<wpwrak>
stekern: at least it removes quite a bit of confusion i had added before :)
<stekern>
wpwrak: cool, do you keep those patches in a git repo somewhere?
<wpwrak>
only locally
<stekern>
ok, well, lekernel seems to be quite quick to apply them anyways
<stekern>
I need to sign up on the ML
<wpwrak>
yeah. he probably has his alarm clock connected to "grep PATCH" :)
<mwalle>
lekernel: (usb abuse) that's qemu, and it used the hid layer in a strange way