<wpwrak>
(age of tuner circuit) the html files are from 2007. also the PDF of the schematics says january 2007. amazing, all this looks more 1970-ish ;-)
<roh>
i dont get whats new there
<roh>
isnt it a dead boring default cable tuner?
<wpwrak>
dunno. maybe it has an unusually wide range or such ?
<roh>
not that i can see
<wpwrak>
ah well. maybe just good marketing then ;-)
<terpstra>
hello all!
<terpstra>
I've ported the LM32 from milkymist to a small SoC system for use on our Altera FPGAs. In the process I got the JTAG working and wrote a little tool that talks to the FPGA over the USB Blaster's JTAG. I've been able to happily load and execute small programs via this tool into the SRAM. As my next step I wanted to try to get gdb working (as a pre-step to configuring a kernel for our SoC). I've seen that milkymist somehow uses gdb with the LM32 already. Where can I find out more about this?
<Fallenou>
Hi terpstra really nice job :)
<Fallenou>
is your project open source ? do you have a web page or something ?
<Fallenou>
I would like to have a look :)
<terpstra>
it is open source
<terpstra>
but it's not got a project page per-se
<terpstra>
at the moment i've been tasked to evaluate the alternative soft CPUs, and part of that is determining what they can do---jtag, simulation, debugger, toolchain, LUT size, speed, etc
<terpstra>
the contenders are: leon3, openrisc1000, zpu, and lm32
<terpstra>
so if we end up picking the lm32, then it would end up as a visible part of the project
<terpstra>
i'd be happy to send you a tarball tho
<terpstra>
Fallenou, am i mistaken about milkymist and gdb? all i've found is the page about using gdb with qemu, but i'm trying to get it to talk directly to the CPU in-chip
<Fallenou>
I really don't know, sorry
<Fallenou>
but mwalle or lekernel would know
<terpstra>
i already have a working connection to the CPU via JTAG and was just thinking of whipping up a small debug ROM on the instruction bus and a register dump WOM on the data bus
<terpstra>
but i don't want to reinvent the wheel
<terpstra>
especially the wheel that implements the gdbserver protocol
<Fallenou>
You can send an e-mail to the Milkymist mailing list if you want, it would be great to say a little "hello I am doing this and this about lm32" :)
<Fallenou>
it's always good to know what others are doing
<kristianpaul>
wpwrak: (UBB) yes, well, i need the 3V3 to 5V shifter, but is on my todo once i tune and can listen something..
<terpstra>
btw, where can i find a "clean" copy of the milkymist lm32? i've fixed a few bugs in the copy i got from your tree and probably should give you the patches
<lekernel>
it's undocumented however
<lekernel>
and, afaik, not thoroughly tested
<terpstra>
these things are not roadblocks ;)
<lekernel>
terpstra: hmm, there is no copy of the "milkymist lm32" other than in the milkymist github repository
<lekernel>
what kind of bugs have you fixed?
<terpstra>
the jtag tweaks you guys did had problems with clock domains (for altera at least)
<terpstra>
i 'fixed it' by making it use the capture JTAG state instead of just grabbing data on e1dr
<terpstra>
and thus removed sensitivity to stuff that wasn't the clock
<terpstra>
the other problem was that you couldn't flush the icache over jtag
<terpstra>
the jtag write csr was just ignored
<lekernel>
interesting
<terpstra>
this is needed if you load your firmware over jtag (as i do) and then want the cpu to execute it
<terpstra>
also i changed the register file
<terpstra>
you had it using actual registers
<terpstra>
i switched it to use the positive edge EBR implementation
<terpstra>
which costs 2k of memory bits but saves like 1-1.5k LUTs
<lekernel>
mh, last time I checked Xst was able to synthesize the LM32 register file on distributed RAM
<terpstra>
(making the LM32 only 3k instead of 4.5k on cyclone3 and 1.5k on arria2)
<terpstra>
the code i copied was still using lattice blackbox logic for this.. .?
<terpstra>
i switched it to inferred
<lekernel>
ha, so you're not using the milkymist code I think... I stripped out all the lattice logic
<terpstra>
not all
<terpstra>
oh: i also pipelined the multiplier
<terpstra>
that let me get it up to 175MHz
<Fallenou>
wow :)
<lekernel>
in cyclone?
<Fallenou>
nice !
<terpstra>
you stripped out a lot of the ram stuff in the i/d-cache
<terpstra>
cyclone3 is 125MHz
<terpstra>
arria2 is 175
<lekernel>
ah, yes :)
<lekernel>
that's still pretty fast
<terpstra>
i am pretty happy with it
<lekernel>
even without a multiplier at all spartan6 barely reaches 100MHz
<terpstra>
it seems to run stable (and quartus timequest was happy anyway)
<terpstra>
i have everything except DIV enabled
<terpstra>
the DIV seems quite expensive for very poor performance gain :P
<lekernel>
where is the remaining lattice blackbox logic? i'm checking the code atm
<lekernel>
it's probably `ifdef'd out anyway, since Xst doesn't whine
<terpstra>
you need dual port ram
<terpstra>
that's why you didn't do it
<terpstra>
you made a single port lm32_ram
<terpstra>
but the register file needs dual port
<terpstra>
look in lm32_cpu.v
<terpstra>
search down to
<lekernel>
lm32_ram is always supposed to be single port, no?
<terpstra>
search to: 'Register file instantiation as Pseudo-Dual Port EBRs.'
<terpstra>
that code is disabled b/c you don't set:
<terpstra>
`define CFG_EBR_POSEDGE_REGISTER_FILE
<terpstra>
in your include.v
<terpstra>
that saved me a lot of area and might be why mine is faster than yours
<terpstra>
i made a lm32_dp_ram.v
<terpstra>
for inferring dual-port memory and plopped it on top of the lattice blackbox
<terpstra>
they use 2x dual-port for the register file as follows:
<terpstra>
each cycle, the target register in both is updated
<terpstra>
each cycle, the source register r0 is read from the copy0 and r1 from copy1
<terpstra>
so single-port won't cut it
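(editor's sketch) The two-copy scheme terpstra describes can be modelled in a few lines of Python. This is only a behavioural toy, not the LM32 source: the class name, the method signature and the choice that a read returns the value stored before the same cycle's write are all assumptions made for illustration.

```python
class DualCopyRegisterFile:
    """Toy model of a register file built from two dual-port RAM copies.

    Each copy has one write port and one read port. Both copies receive
    every write, so they always hold identical data; copy0 serves the
    first source operand and copy1 serves the second.
    """

    def __init__(self, num_regs=32):
        self.copy0 = [0] * num_regs
        self.copy1 = [0] * num_regs

    def cycle(self, write_enable, write_idx, write_data, read_idx0, read_idx1):
        # Assumption: reads see the contents from before this cycle's write.
        operand0 = self.copy0[read_idx0]
        operand1 = self.copy1[read_idx1]
        if write_enable:
            # The target register is updated in both copies.
            self.copy0[write_idx] = write_data
            self.copy1[write_idx] = write_data
        return operand0, operand1


if __name__ == "__main__":
    rf = DualCopyRegisterFile()
    rf.cycle(True, 3, 42, 0, 0)          # write r3 = 42
    rf.cycle(True, 5, 7, 0, 0)           # write r5 = 7
    print(rf.cycle(False, 0, 0, 3, 5))   # read r3 and r5 in one cycle
```

One write plus two independent reads per cycle is three accesses, which is why a single-port memory cannot serve as the register file.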
<lekernel>
iirc now the register file is implemented on asynchronous distributed RAM. yeah, maybe putting it into the block RAM might improve things a bit
<terpstra>
like i said, it cut 150% to 100% area for me
<lekernel>
though I doubt it would be as much as 125MHz
<lekernel>
yeah of course, if you had it on pure LUTs in the beginning, it becomes slow and bloated
<lekernel>
with distributed RAM, it's not as bloated as pure LUTs
<terpstra>
afaik, that's what you're doing atm?
<terpstra>
you're using the registers[] array
<Fallenou>
the array if inferred in blockram i guess
<Fallenou>
is*
<terpstra>
reg [`LM32_WORD_RNG] registers[0:(1<<`LM32_REG_IDX_WIDTH)-1];  // Register file
<terpstra>
that didn't become inferred blockram
<terpstra>
and i doubt it can
<terpstra>
since it's used with multiple access
<lekernel>
because the LUT becomes used as an optimized RAM
<lekernel>
i.e. the portion of the LUT that is normally used for configuration stores instead the RAM data
<lekernel>
it's a special mode of the Xilinx LUTs
<terpstra>
i've not used xilinx yet
<lekernel>
and, iirc, Xst infers this mode for the LM32 register file
<kristianpaul>
(125MHz yay !)
<terpstra>
kristianpaul, it gets slower once you hook the WB up to a crossbar
<terpstra>
atm my design only hits 124.6MHz (which is extremely frustrating)
<kristianpaul>
ah :--/
<lekernel>
terpstra: still it'd be interesting to examine the possibility to map it to block RAM
<lekernel>
could you send your patch to the mailing list, along with the jtag fixes?
<terpstra>
lekernel, like i said, i just made an inferred dual port memory and plopped it in
<Fallenou>
please share your code ;)
<terpstra>
give me a pointer to where your clean tree is
<terpstra>
and i'll break my tree into patches wrt. it
<lekernel>
you should use git://github.com/lekernel/milkymist.git as git clone URL
<lekernel>
not the HTTP link
<terpstra>
-.-
<terpstra>
thanks
<terpstra>
you guys use pure verilog, yeah?
<lekernel>
yes
<terpstra>
our interconnect is all vhdl so i guess you don't want that
<roh>
uh. 200$ is surprisingly cheap for a develboard
<Fallenou>
yes
<lekernel>
when we move to our own synthesis technology, it'll be easier if we have only one language :)
<terpstra>
it's quite nice to use, the cyclone3
<roh>
terpstra: well.. does it have nicer tools than quartus ;)
<terpstra>
what's wrong with quartus?
<terpstra>
you can still use joe and make ;)
<terpstra>
ok
<terpstra>
got a patch file with all the whitespace and/or warning-silencing edits removed
<terpstra>
but it kinda rolls all the changes together
<terpstra>
where to send it?
<lekernel>
devel at lists.milkymist.org
<roh>
terpstra: well.. its the same crap as xilinx tools.. in 'getting them' as well as 'installing them'
<roh>
they are _huuuughe_ and the vendors are a pain in the ass with only for account users and stuff..
<lekernel>
roh: you are welcome to send me llhdl contributions ;)
<lekernel>
let's replace this crap
<roh>
lekernel: heh... first i need to be able to WRITE code in verilog. currently i am quite happy to make sense of it when reading. knowing electronics and C helps a lot tho
<roh>
my last attempt to install ISE was prohibited by available diskspace *sigh*
<roh>
what the f*ck do they need 2-digit gbytes for?
<lekernel>
own copy of the C libraries, own copy of the C++ libraries, JVM, own copy of Perl, ....
<roh>
last time i installed quartus i needed win32 for it and it crashed on a (guided) attempt to compile something.. i guess because it was a guide and code for another version or so... sigh.
<lekernel>
plus a ton of bloated pseudo-cross-platform libraries
<roh>
jvm? wtf?
<lekernel>
yeah, some parts of the xilinx toolchain are in java
<terpstra>
ok, email sent
<roh>
if their licence wouldnt suck someone could make it _very_ small i guess.. removing all the stuff already packaged in the distro (if you got an os with packaging and not win32)
<lekernel>
also some of their executables aren't stripped
<terpstra>
i suppose i should subscribe to this list
<lekernel>
terpstra: i'll moderate your message
<larsc>
roh: i guess you are lucky that it doesn't bring its own apache with php
<lekernel>
terpstra: so, you're doing heavy ion research? interesting :)
<terpstra>
i apologize for writing the firmware loader in tcl.... altera still won't give me the headers to talk jtag via C
<terpstra>
that's what GSI has done in the past
<roh>
larsc: these would be actually small ;)
<terpstra>
we will be producing positrons and stuff now
<terpstra>
we need a softcore as part of the control system that runs the accelerator
<scrts`>
how much did You pay for that arriaII pci-e board?
<scrts`>
I am looking for a pci-e board :)
<lekernel>
terpstra: there's also Uwe Bonnes from the Institut für Kernphysik in Darmstadt who's doing similar things
<lekernel>
with softcores I mean
<terpstra>
my colleague says we paid 1500EUR
<terpstra>
for the PCIe
<lekernel>
he's on the milkymist list
<terpstra>
but that's the more expensive development one we use for prototyping
<wolfspraul>
lekernel: do you have some Milkymist news I should mention in the qi february community update?
<lekernel>
just because the placer's heuristic algorithm then picks the wrong numbers
<lekernel>
and so far I haven't really found a better way of getting things to work than to run multiple instances of the place and route on a multicore machine each with a different PRNG sequence
<lekernel>
until one happens to work
<roh>
eh. how do you know which one 'works' ?
<lekernel>
there's a timing model that tells you if the design is ok or not
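(editor's sketch) The "run many seeds until one passes" approach might look roughly like this. `run_place_and_route` is a hypothetical stand-in that fakes a timing slack from a seeded PRNG; it does not invoke any real Xilinx tool, and the slack range is invented purely for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def run_place_and_route(seed):
    """Hypothetical stand-in for one P&R run.

    Returns the worst timing slack in ns as reported by the timing model.
    A real flow would launch the vendor tool with this seed instead.
    """
    rng = random.Random(seed)
    return rng.uniform(-1.0, 0.3)


def find_passing_seeds(seeds, workers=4):
    """Run one P&R job per seed in parallel and keep those that meet timing."""
    seeds = list(seeds)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        slacks = list(pool.map(run_place_and_route, seeds))
    # Non-negative slack means the timing model considers the design OK.
    return [(seed, slack) for seed, slack in zip(seeds, slacks) if slack >= 0.0]


if __name__ == "__main__":
    for seed, slack in find_passing_seeds(range(1, 33)):
        print(f"seed {seed}: slack {slack:+.3f} ns")
```

The pass/fail decision here rests entirely on the timing model's slack figure, so a bug in that model would go unnoticed by such a loop.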
<lekernel>
this, of course, is also subject to bugs
<roh>
ouch.
<lekernel>
especially with spartan6 it seems
<lekernel>
i've overclocked designs without problems by as much as one nanosecond
<roh>
the more i learn about fpga and their develtools.. the more frightened i am about the wicked state of affairs. man thats dirty and bloody.
<lekernel>
and, on the other hand, designs that were supposed to meet timing exhibited intermittent issues until I froze the FPGA to some -40C
<lekernel>
fortunately it's rare, but it happens
<lekernel>
bottom line, a good freeze spray is sometimes handy when you're tracking down weird FPGA bugs
<roh>
lekernel: hehe. yes. also when debugging other electronics.
<roh>
even on repairing analog electronics
<lekernel>
roh: all silicon compilers are dirty and bloody. you can read even better stories e.g. on http://deepchip.com/
<terpstra>
lekernel, why are you using such a bad tool for place and route?
<terpstra>
if it's so unreliable, pick another
<lekernel>
there's no other
<terpstra>
or is this the xilinx tools you speak about?
<lekernel>
yeah, it's the xilinx place and route
<terpstra>
quartus seems quite reliable when it comes to synthesis
<terpstra>
and it can show you nicely where and why your design is slow
<lekernel>
well, I had my share of Altera bugs too
<terpstra>
i'm not exactly experienced with these things... only learned vhdl and verilog three months ago ;)
<lekernel>
while I think their software is slightly less bad than Xilinx's, I wouldn't be surprised if such p&r woes also happen with Altera
<terpstra>
you mean the inconsistent timings that come out of placed and routed logic?
<terpstra>
supposedly altera can use timing-driven synthesis to help here
<lekernel>
no, I'm talking about timing model bugs
<terpstra>
i see
<terpstra>
well altera has two different timing analysis tools
<terpstra>
hopefully at least one of them will work ;)
<terpstra>
(tho so far i've only seen both work)
<lekernel>
that sounds quite painful too :)
<terpstra>
nah
<terpstra>
the newer chips use the newer tool "time quest"
<lekernel>
i'd say: let's simply develop our own open source timing engine
<terpstra>
the older ones used some other tool
<terpstra>
lekernel, i think you underestimate how hard it is to do this well
<lekernel>
oh, I never said it was easy
<terpstra>
it's my understanding also that the bitstream you need to program FPGAs is a closely guarded trade secret
<terpstra>
so making a new synthesis tool would be a PITA
<lekernel>
i'll probably get the complete (reverse engineered) spartan6 xilinx bitstream format in my mailbox during the next weeks :)
<terpstra>
hah
<lekernel>
and btw, it's not that hard to reverse engineer
<terpstra>
are you seriously planning on building your own synthesis tool?
<lekernel>
I wouldn't say it's easy, but it's not impossible or super-hard either
<terpstra>
i didn't say it would be hard, i said it would be a PITA. expect lawsuits coming your way if you use that reverse engineered info.
<lekernel>
phew. that's what everyone says. but i've heard a lot of rumors and all turned out to be false
<terpstra>
as part of installing quartus, i agreed not to reverse engineer it
<terpstra>
(in the licence text)
<terpstra>
don't get me wrong, though: building an open source synthesis tool would be a great project!
<terpstra>
just like gcc is the cornerstone of open source
<lekernel>
iirc the xilinx license prohibits decompilation and disassembly. the bitstream format is recovered using black box techniques
<lekernel>
the guy runs the toolchain the normal way, and then uses custom binary analysis tools on the result
<lekernel>
and btw, I'm not even sure Xilinx would go after a project that produces bitstreams for their devices
<lekernel>
there are tons of rumors depicting FPGA companies as "evil guys", but an astonishing amount of them are pure bullshit
<terpstra>
i've been pretty impressed by how open altera has been with me
<terpstra>
i asked them for the header files for their jtag client library
<terpstra>
and mentioned that an NDA would be a problem
<lekernel>
Xilinx even provides (unsupported) lists of all the interconnect in their chips
<terpstra>
... and they sent me complete documentation and a whole whack of source
<lekernel>
i'm even writing a parser for them atm ;)
<lekernel>
those are multi-GB text files
<lekernel>
what's missing is 1. timing information (this will be hard, probably need to build a chip characterization system) and 2. how this information relates to bitstream content (not extremely hard to find out)
<terpstra>
this debug ROM is evil. r0!=0. bad. :)
<lekernel>
terpstra: it's done on purpose... and works nicely
<terpstra>
i know
<terpstra>
i was planning on doing it too ;)
<terpstra>
it's the only available register that one can smash to point to the register save region
<lekernel>
the characterization system would probably involve building various ring oscillators with the elements to characterize in the loop
<lekernel>
then measuring the resulting frequency
<terpstra>
eh?
<lekernel>
and finally solving the system of equations to find out the timing property of each element
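(editor's sketch) The "system of equations" step could be worked like this. It assumes an idealised ring oscillator whose period is two trips around the loop, so 1/(2f) equals the sum of the element delays in that loop. The element types, counts and delay values in the demo are made up; real measurements would also need averaging over voltage and temperature.

```python
def solve_linear(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("oscillator set does not separate the elements")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def delays_from_frequencies(counts, freqs_hz):
    """counts[i][j] = how many elements of type j sit in oscillator i's loop."""
    # Idealised model: one oscillation period is two trips around the loop.
    loop_delays = [1.0 / (2.0 * f) for f in freqs_hz]
    return solve_linear(counts, loop_delays)


if __name__ == "__main__":
    # Made-up element delays (seconds): LUT, routing hop, carry stage.
    true_delays = [0.5e-9, 0.3e-9, 0.1e-9]
    counts = [[3, 2, 0], [5, 2, 1], [3, 4, 2]]
    freqs = [1.0 / (2.0 * sum(n * d for n, d in zip(row, true_delays)))
             for row in counts]
    print(delays_from_frequencies(counts, freqs))
```

With more oscillators than element types, the same idea would become a least-squares fit rather than an exact solve.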
<lekernel>
this sounds like a lot of fun
<terpstra>
wouldn't you just trace the signal from inputs to outputs?
<terpstra>
some sort of graph traversal algorithm
<terpstra>
with weights based on chip-specific timing information
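(editor's sketch) The graph-traversal view of timing analysis terpstra has in mind is essentially a longest-path search over a directed acyclic graph. A minimal version, with invented node names and delays, assuming purely combinational logic between inputs and outputs:

```python
from collections import defaultdict, deque


def arrival_times(edges):
    """edges: iterable of (src, dst, delay_ns). Returns worst-case arrival per node."""
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst, delay in edges:
        successors[src].append((dst, delay))
        indegree[dst] += 1
        nodes.update((src, dst))

    arrival = {node: 0.0 for node in nodes}
    ready = deque(node for node in nodes if indegree[node] == 0)
    visited = 0
    # Kahn's topological order: a node is final once all its inputs are final.
    while ready:
        node = ready.popleft()
        visited += 1
        for dst, delay in successors[node]:
            arrival[dst] = max(arrival[dst], arrival[node] + delay)
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    if visited != len(nodes):
        raise ValueError("combinational loop in netlist")
    return arrival


if __name__ == "__main__":
    # Hypothetical netlist: weights stand for chip-specific element delays.
    netlist = [("a", "lut1", 0.4), ("b", "lut1", 0.4), ("lut1", "lut2", 0.7),
               ("b", "lut2", 0.3), ("lut2", "q", 0.5)]
    worst = max(arrival_times(netlist).values())
    print(f"critical path {worst:.2f} ns, fmax about {1000.0 / worst:.0f} MHz")
```

The edge weights are exactly the per-element timing data that the ring-oscillator measurements are meant to recover.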
<lekernel>
yes, but it's easier to do that automatically and on-chip with a ring oscillator
<terpstra>
why would you want to do timing analysis on-chip?
<lekernel>
oh, the purpose is to recover that timing information
<terpstra>
i'd rather be running it on my fat intel devel system
<terpstra>
oh
<terpstra>
sorry, i gotcha
<lekernel>
it's built into the xilinx software atm
<lekernel>
and we can either reverse engineer the software, or measure it ourselves
<terpstra>
so you want to make what amounts to a bitstream that can measure the delays in the chip
<lekernel>
imo the second technique is more fun, accurate and legal
<terpstra>
and then use that to feed into the timing analysis program
<lekernel>
yes
<terpstra>
nifty
<terpstra>
you would be able to account for per-chip variability that way
<lekernel>
yeah, we'll probably need to run measurements on many chips and at different voltages and temperatures
<lekernel>
but it's easy. all it would need is a jtag probe and a on-board stable clock source
<terpstra>
in terms of open source synthesis tools
<lekernel>
the rest is automated
<terpstra>
i'd be more interested in the front-end stuff
<lekernel>
the front end stuff is already working to some extent
<lekernel>
no... I spent less than two months on that
<lekernel>
at first I focus on producing working netlists
<lekernel>
not necessarily fully optimized
<terpstra>
yep
<lekernel>
though it's already capable of using carry chains and such
<lekernel>
missing optimizations are a good "random logic" LUT mapper (I'm thinking of using the BDS-PGA algorithm)
<terpstra>
i like the idea of a LLVM for hardware a lot
<lekernel>
FSM re-encoding
<lekernel>
and a couple of smaller things, like shift register extraction, large mux extraction, large comparator extraction, ...
<lekernel>
most can be implemented with the current architecture as Mapkit "plug-ins"
<lekernel>
also, there are a couple of things that the Verilog front end doesn't support, e.g. instantiations, parameters, case statements and generate
<lekernel>
lots of work :p
<terpstra>
indeed
<lekernel>
but still not too bad for < 2 months
<terpstra>
i'm somewhat surprised no one else has started doing this already?
<lekernel>
well, there have been attempts
<lekernel>
but usually they all degenerate into sterile debate and often undue trolling towards FPGA companies
<terpstra>
hah
<lekernel>
and, sometimes, fail because of mere technical incompetence
<lekernel>
but I think the main factor is trolling and other management problems
<terpstra>
if i had more time, i'd be interested in helping out
<lekernel>
and the funniest thing is JHDLBits never got squashed by Xilinx
<lekernel>
it's all rumors
<terpstra>
by not interfacing with their code at all
<terpstra>
but building from bitstream+jtag up, you avoid a lot of the stickiness
<lekernel>
LLHDL puts out EDIF, which is a standard format...
<lekernel>
which is then read by the xilinx p&r (for now)
<lekernel>
LLHDL is just the front end, it doesn't do any physical implementation
<lekernel>
this will be handled by a separate project (and now by the fpga vendor's tools, through the standard EDIF interface)
<terpstra>
so you can already run your compiled llhdl?
<terpstra>
that's nice!
<lekernel>
yeah, the verilog file I posted works nicely on the MM1 board
<terpstra>
nice
<lekernel>
with llhdl synthesis
<lekernel>
I expect that in some other two months, it'll be with open source antares p&r and bitstream generation :)
<terpstra>
do i understand this rom that on a breakpoint it saves everything to ram, reports via uart that its ready and then reads the offset (!) of the command to execute?
<lekernel>
mh... I don't know... mwalle wrote this
<terpstra>
you guys are french, yeah?
<lekernel>
not everyone
<lekernel>
actually most people here are German
<lekernel>
and I live in Berlin, though I'm French :)
<terpstra>
i think i understand this ROM enough now to use it. now to try and compile this openocd-lm32 :)
<terpstra>
i live in darmstadt, though i'm canadian
<lekernel>
roh: we can take stuff out of that X-Ray system I told you about. last attempt stopped when I ran into the probably PCB-contaminated oil cooling system
<lekernel>
maybe i'll come later with appropriate gloves etc.
<roh>
*g*
<lekernel>
The number of participants should exceed 10 persons and stay below a maximum of 60 persons.
<lekernel>
ok, who's in?
<roh>
lekernel: ccc berlin does 'hackertours' sometimes.. maybe you should ask for participants there.
<terpstra>
lol
<terpstra>
if you do come to the GSI, let me know
<terpstra>
you can meet the real hardware hackers here
<roh>
terpstra: bring people to the camp 2011 ;)
<roh>
we always want to meet people working on the interesting stuff nobody else understands
<roh>
that always worked good afaik. also we know that the workshop-orga was not good at the congress (there wasnt any) and we need to get better there (wanna help with it? ;)
<terpstra>
doh.  my initial DEBA doesn't match
<lekernel>
well, maybe :)
<lekernel>
terpstra: out of those 1.5K LUTs on Arria2, how many of those LUTs use the "fracturing" feature?
<terpstra>
i don't know what that is ;)
<lekernel>
i'm quite amazed that this FPGA architecture cuts the LUT count in more than half
<terpstra>
yeah
<lekernel>
and I even suspect some figure manipulation ;)
<terpstra>
what is the fracturing feature?
<lekernel>
make two LUTs with one
<terpstra>
i've never seen it mentioned in the consumed resources reports
<terpstra>
i'll copy-paste the relevant bits from the report
<terpstra>
; Family                            ; Cyclone III                                  ;
<terpstra>
; Device                            ; EP3C25F324C6                                  ;
<terpstra>
; Timing Models                      ; Final                                        ;
<terpstra>
; Total logic elements              ; 3,571 / 24,624 ( 15 % )                      ;
<terpstra>
;    Total combinational functions  ; 3,330 / 24,624 ( 14 % )                      ;
<terpstra>
;    Dedicated logic registers      ; 1,650 / 24,624 ( 7 % )                        ;
<terpstra>
; Total registers                    ; 1650                                          ;
<terpstra>
that's for the cyclone3
<terpstra>
i'll rebuild it now for arria2
<lekernel>
ah, it's "logic elements"
<lekernel>
so a LUT fractured in two would count as one
<lekernel>
(imo)
<lekernel>
who, are they shipping cyclone 5 now, or is it the same vaporware as xilinx 7 series?
<terpstra>
argh
<terpstra>
i can't compile for arria2 under linux
<terpstra>
i forgot
<lekernel>
so, you see, software problems with altera too :)
<terpstra>
the stupid parallel port dongle only works under windows :P
<terpstra>
well, if we'd bought linux-friendly licences...
<terpstra>
it will let me target a generic arria2
<terpstra>
but that won't give an accurate fill %age and it picks the smallest that works
<terpstra>
(i don't want to reboot)
<lekernel>
kk never mind
<terpstra>
; Family                            ; Arria II GX                                  ;
<terpstra>
; Met timing requirements          ; N/A                                          ;
<terpstra>
; Logic utilization                ; N/A                                          ;
<terpstra>
;    Combinational ALUTs          ; 1,805                                        ;
<terpstra>
;    Memory ALUTs                  ; 0                                            ;
<terpstra>
;    Dedicated logic registers    ; 1,650                                        ;
<terpstra>
; Total registers                  ; 1650                                          ;
<terpstra>
; Total pins                        ; 4                                            ;
<terpstra>
; Total virtual pins                ; 0                                            ;
<terpstra>
; Total block memory bits          ; 126,976                                      ;
<terpstra>
here it talks about ALUTs instead of logic elements
<terpstra>
in my design, i use a full crossbar interconnect
<terpstra>
so their little example showing a 2* savings is somewhat relevant
<terpstra>
but that's not the majority of the used area ...
<lekernel>
"The benchmark comparison uses 80 real customer designs." ...does the Altera software includes, like the Xilinx one, a mandatory phone home "feature" to gather those statistics?
<terpstra>
it's opt-in
<terpstra>
but, yes
<lekernel>
if you take the free of charge version of the xilinx tool, it's always enabled and you can't "opt out"
<terpstra>
the web edition version of quartus is quite nice
<terpstra>
i use it under linux even though i have a fully licenced windows version
<lekernel>
well, the way to opt out anyway is to delete its curl library, so it isn't too hard
<terpstra>
i just miss the signaltap2 logic analyzer (which to be fair is quite essential)
<terpstra>
and synthesis to the higher end fpgas
<lekernel>
yeah... one should design an open replacement to signaltap/chipscope
<lekernel>
preferably platform independent
<terpstra>
doesn't seem that hard a task really
<lekernel>
no, it isn't
<terpstra>
it could be done as a compiler pass in your llhdl
<lekernel>
sure
<terpstra>
just a bit of tooling of the llhdl to add hooks and a capture logic
<lekernel>
well, in LLHDL you can write the IR to files and manipulate that in custom applications
<lekernel>
you won't even need to touch the core code, just develop an independent utility
<terpstra>
i doubt that would work as cleanly as you envision
<terpstra>
you need access to the original symbol names and hierarchy
<terpstra>
otherwise the user won't be able to say what signals he wants
<lekernel>
those are accessible from the "external" flow
<terpstra>
(i assume your llhdl will perform optimizations which rename the signals during their work)
<lekernel>
only in the last passes
<lekernel>
but you can hook before aht
<lekernel>
that
<terpstra>
regardless, not a difficult task
<lekernel>
the first passes compile Verilog (and maybe VHDL) without any optimization
<terpstra>
just some work
<lekernel>
and directly write LLHDL interchange files that you could pass to linker, optimizers and mappers
<lekernel>
or fancy things like logic analyzer insertion utilities
<terpstra>
anyway, the moral of that document you sent me seems to be this:
<terpstra>
stratix 2 ALUTs are bigger than stratix 1 / cyclone 3 LEs
<terpstra>
so it's apples and oranges
<terpstra>
don't suppose you know where the openocd.cfg for the lm32 is?
<lekernel>
iirc there were some threads on the mailing list about that some months ago
<lekernel>
Marcus Erlandsson, Chief Technology Officer and Founder, OpenCores
<lekernel>
Abstract: Open-source hardware IP-cores is today the only efficient way of developing the next generation of products. A problem today with product development is that when product complexity increases, the verification workload increases exponentially, which leads to significant higher development costs. Open-source hardware enables companies to significantly reduce verification costs and therefore allow a more cost-effective development method.
<lekernel>
oh my...
<lekernel>
there are more bugs in a opencores project than in the average rainforest, and they invite _him_ to talk about _verification_
<lekernel>
omg
<lekernel>
oh, and Arduino "the father of Open Source hardware"
<lekernel>
ok, got it
<lekernel>
I wonder what percentage of that opencores bullshit talk is delusions and what is outright lies to please some investor or (poor) ORSoC customer
<wpwrak>
lekernel: maybe you should speak at such conferences, too ? ;-)
<lekernel>
I don't know
<kristianpaul>
i agree with wpwrak
<wpwrak>
show people that there's life beyond the bovine feces :)
<lekernel>
do they want to know? everyone feels good about blinking LEDs...
<lekernel>
the only way to pull that off is to make big lectures at large/central conferences, preferably where there are lots of journalists and well-known people
<lekernel>
otherwise you're just gesticulating
<tuxbrain_away>
lekernel: If it wasn't for you and for your work I had still part of this bovine feces.... well in fact I'm still there but at least you let me know there are higher grounds out there to walktrough time to time :)
<kristianpaul>
bovine = moo ? ;-)
<scrts>
hm, there are working cores on opencores!
<scrts>
I mean exists
<scrts>
:))
<lekernel>
yeah, some 0.1%, sadly not including their flagship openrisc
<lekernel>
how many products can you count that reliably use opencores IPs? except the cases where ORSoC claims one is used for "a large customer" which is never named?
<lekernel>
that, and the "tracking everything" e.g. compulsory registration - about which they have double standards, they whine because lattice does the same - "because download statistics are essential to build credibility" (well, they should quit FPGAs and do lolcats and pr0n then)
<kristianpaul>
lekernel: usrp2 i think uses zpu, not sure is that is on opencores
<lekernel>
zpu is on opencores, and zpu is crap
<lekernel>
well, it's not an "official" opencores project
<kristianpaul>
that 0.1% is your sdram controller? ;-)
<kristianpaul>
I saw it on opencores last time..
<kristianpaul>
also navre
<lekernel>
nah, there are also some other decent designs there - aeMB for example
<kristianpaul>
who else
<kristianpaul>
?
<terpstra>
hey, jumping in here
<terpstra>
could you be more specific in your rant against openrisc and zpu :)
<lekernel>
well, openrisc uses 3 times as many LUTs as LM32 for half the speed
<kristianpaul>
ah, i remember you are benchmarking those too, aren't you terpstra ?
<terpstra>
yes
<terpstra>
i already more-or-less rejected openrisc as fatter than the leon3 with less functionality
<lekernel>
and, last time I checked, the design contained latches
<terpstra>
but thought maybe you lot had more to say
<lekernel>
which are 1. surprising in a design I thought would be synchronous and 2. undocumented
<terpstra>
some people at CERN really like the ZPU and i need more ammunition against it
<lekernel>
so I guess they come from the usual beginner's HDL pitfall
<kristianpaul>
you fint the right place for that ;-)
<kristianpaul>
find**
<terpstra>
found* ;)
<kristianpaul>
oh, sorry
<lekernel>
I concede the ZPU isn't as crappy as OpenRISC, its only problem is it's ridiculously slow
<kristianpaul>
what are doing with zpu at CERN?
<terpstra>
same thing we are looking at the LM32 for
<kristianpaul>
oh
<terpstra>
use to run DHCP/ARP/PTP inside a timing controller to coordinate devices
<kristianpaul>
zpu have his own ethernet controller?
<terpstra>
they like the ZPU because it's small. and it is. about 1/3rd the LM32
<terpstra>
nope.
<lekernel>
that's not counting the microcode ROM
<terpstra>
the ethernet core will be a custom wishbone device from us
<lekernel>
though it can be OK in a FPGA if you have spare block RAMs you wouldn't use otherwise
<terpstra>
the microcode is less than 4k i think?
<terpstra>
using a LM32 means you have fat icache and dcache
<terpstra>
which clock in at more
<lekernel>
4K is already a lot of area in an ASIC
<lekernel>
you can disable the LM32 caches, can't you?
<terpstra>
in theory
<lekernel>
and it still would be faster than ZPU
<terpstra>
in practice, without icache you have no JTAG
<scrts>
lekernel, regarding the question about used cores from opencores: my colleague uses the i2c core from opencores in our company, he said it works :)
<terpstra>
speed is clearly in favour of the LM32... but we don't need soooo much speed from something just running dhcp/arp/etc
<lekernel>
terpstra: there's also the navre (AVR core) that I made for the USB controller
<terpstra>
i guess i could fix jtag for the lm32 without icache. it does seem strange that it doesn't have the hooks in the instruction_unit.v
<terpstra>
i have that in my list... i rejected it, because ...
<lekernel>
iirc it's some 1k Spartan-6 LUTs
<terpstra>
things i listed against navre: only 1 committer/he could die (i guess that's you), self-reported status: beta, # of pages documentation: 0, tested # of FPGAs: 2, debug support/JTAG: no, no wishbone bus
<lekernel>
well, yes. I only wanted those damn USB ports to work, not make a softcore
<lekernel>
unfortunately they all were unusable
<terpstra>
it was 1k LUTs on an arria2
<terpstra>
for the navre
<terpstra>
compared to 2-3k for the LM32
<kristianpaul>
terpstra: how fast is zpu in your cyclone?
<terpstra>
and 500 for the ZPU
<terpstra>
my table must be wrong
<terpstra>
it says 300MHz
<terpstra>
but i don't believe that
<lekernel>
seriously ZPU is super-slow. but if you can live with that slowness, good for you
<kristianpaul>
what? ;)
<lekernel>
oh, it's perhaps really 300MHz
<terpstra>
the instructions do almost nothing tho
<lekernel>
but ZPU takes some 50-100 cycles to do what another processor would do in one
<terpstra>
this was all timed on an arria2
<terpstra>
where LM32 is 175
<terpstra>
so it's 'possible'
<lekernel>
yeah, I'm not surprised
<lekernel>
in terms of clock speed, ZPU was also very fast for me
<terpstra>
i haven't done an in-depth test of the ZPU yet tho
<terpstra>
so take those #s with a grain of saly
<terpstra>
salt*
<terpstra>
the leon3 is pretty nice too
<lekernel>
so your 300MHz ZPU might perform like a 3MHz LM32
<terpstra>
the code is hideous tho
<lekernel>
maybe even worse, depending on code
<terpstra>
i know the ZPU is much slower than the LM32
<terpstra>
but is that the only bad thing i can say?
<terpstra>
i've seen several implementations of the ZPU floating around
<terpstra>
do any of them have JTAG?
<terpstra>
i didn't find any
<lekernel>
if you do the speed/LUT ratio, the ZPU doesn't look that good
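A rough sketch of that speed/LUT arithmetic, using only the figures quoted in this conversation (Arria II timings, LUT counts from terpstra's table, lekernel's 50-100 cycle estimate). The 2500-LUT midpoint for the LM32, the one-operation-per-cycle simplification for the LM32, and the function names are assumptions for illustration, not measured data.

```python
# Figures quoted in the chat; treat them as rough, unverified numbers.
cores = {
    # name: (clock_mhz, luts, cycles_per_equivalent_op)
    "LM32": (175, 2500, 1),   # 2-3k LUTs quoted; 2500 is an assumed midpoint,
                              # 1 cycle per op is a simplification
    "ZPU": (300, 500, 100),   # 50-100 cycles per op quoted; 100 is the
                              # pessimistic end, microcode ROM not counted
}


def effective_mops(clock_mhz, cycles_per_op):
    """Millions of 'equivalent operations' per second."""
    return clock_mhz / cycles_per_op


def mops_per_kilo_lut(clock_mhz, luts, cycles_per_op):
    """Effective throughput normalised by area (per 1000 LUTs)."""
    return effective_mops(clock_mhz, cycles_per_op) / (luts / 1000)


for name, (clk, luts, cpo) in cores.items():
    print(
        f"{name}: {effective_mops(clk, cpo):.1f} Mops/s, "
        f"{mops_per_kilo_lut(clk, luts, cpo):.1f} Mops/s per kLUT"
    )
```

Under these assumptions the ZPU comes out well behind the LM32 on throughput per LUT even at the optimistic 50-cycle end, despite its higher clock and smaller footprint.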
<terpstra>
my table is at the bottom of this page btw:
<lekernel>
but you can simulate some software and tune the number of CPU pipeline stages, enable/disable out of order execution, make the CPU superscalar with different issue widths, etc.
<lekernel>
and it would tell you in minutes how fast your software would go
<terpstra>
nios has such configurability as well
<lekernel>
yup. but you need to go through a lengthy logic synthesis and, for unimplemented features, weeks or more of development time
<terpstra>
so, i guess i have finally figured out why openocd doesn't work for me
<terpstra>
it uses the usb blaster jtag directly
<terpstra>
to access jtag devices in the core logic you need to go through the 'jtag sld hub' indirection
<terpstra>
and openocd doesn't know how
<terpstra>
guess i have to teach it. :-/
<lekernel>
scrts: we _might_ participate in GSoC this year. it could _maybe_ be a good opportunity to get that LM32 MMU done
<terpstra>
before a really big open hardware revolution can begin---where vendors end up opensourcing b/c they used quality opensource IPcores ... we need a gcc for hardware.
<terpstra>
looks at lekernel.
<roh>
gcc? naaah. something free and open.. just not so bitrotten ;) better compare it to clang
<terpstra>
gcc's bitrot is the proof of its success ;)
<roh>
and a problem for everybody wanting to do fancy experiments and develop new stuff for compilers
<terpstra>
sure, llvm is great
<terpstra>
but if there had not been gcc, i doubt there'd have been llvm
<roh>
a friend of mine will travel to nasa soon.. and give a talk at JPL about his 'theorem prover'
<terpstra>
also know as an ML compiler? ;)
<terpstra>
known*
<roh>
sure. my guess is gcc will stay a macroassembler, while llvm will be the future of c compilers
<roh>
and a backend for lots of other languages
<terpstra>
at this point, that's a pretty safe "guess" to make :)
<kristianpaul>
all altera?, "Virtex 6 GTX simulation (Nikhef)." :-)
<terpstra>
cern uses xilinx
<terpstra>
we use altera
<kristianpaul>
haha
<kristianpaul>
nice
<terpstra>
so our stuff has to work on both
<lekernel>
it'll be used at CERN as well?
<terpstra>
yes
<terpstra>
the LHC
<lekernel>
mh? I thought they were done with the design
<terpstra>
to be honest, i'm not entirely clear why they need it either
<terpstra>
we're still in the design phase for the new accelerator here
<terpstra>
i've even heard that they want to use it for their RF devices
<antgreen>
you are evicting xilinx's tools from your workflow?
<lekernel>
yes
<lekernel>
but it will take time
<antgreen>
everything worthwhile takes time!
<antgreen>
cool stuff.
<Fallenou>
Applications for mentoring organization for gsoc are now being accepted
<lekernel>
yup. Jon is taking care of it this year
<lekernel>
after last year's experience I'm not that much into applying myself
<mwalle>
terpstra: (debug rom) basically that's a reverse engineered version of lattice's original rom, with some tweaks regarding its size and added 16 bit and 32 bit access
<mwalle>
terpstra: (openocd) my lm32 port is wip (at least it was in progress .. ;) it should at least be possible to set some breakpoints and single-step
<mwalle>
terpstra: BREAK is sent as a JTAG DP (lattice calls it debug protocol, the real jtag commands, not the JTAG UART), isnt it?
<mwalle>
so i guess it should jump to DEBA right after issuing the command
<mwalle>
terpstra: (jtag core) iirc i just coded the (xilinx) jtag core according to xilinx schematics in some user guide. i dont know for sure if capture and reset are synchronous to tck for the xilinx BSCAN cell