<ysionneau>
not sure I understand how/why they use a lot of "software" terms for something in an fpga
<ysionneau>
for instance they compare their garbage collector which seems to manage some kind of heap implemented on blockram
<ysionneau>
with "malloc"
<ysionneau>
is there some kind of "hardware malloc" somewhere? :o
<ysionneau>
never heard of that
<sb0>
"By uniform we mean that the shape of the objects (the size of the data fields and the location of pointers) is fixed."
<sb0>
that's cheating
<sb0>
even string manipulation won't work with that
<sb0>
well, you could split long strings into a linked list of uniform objects
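[editor's sketch] The linked-list idea above can be illustrated in a few lines of Python. This is only a minimal sketch, not anything from the paper being discussed: `CHUNK_BYTES`, `Node`, `to_chunks` and `from_chunks` are hypothetical names, and the 8-byte payload is an arbitrary assumption. Every node has the same shape (fixed-size data field, a length, one pointer), which is what "uniform" means in the quote.

```python
from dataclasses import dataclass
from typing import Optional

CHUNK_BYTES = 8  # hypothetical fixed payload size of every object


@dataclass
class Node:
    data: bytes               # always exactly CHUNK_BYTES long (zero padded)
    used: int                 # number of valid bytes in data
    next: Optional["Node"]    # the single pointer field, always in the same place


def to_chunks(s: bytes) -> Optional[Node]:
    """Split s into a linked list of fixed-shape nodes (built tail first)."""
    head = None
    for start in reversed(range(0, len(s), CHUNK_BYTES)):
        piece = s[start:start + CHUNK_BYTES]
        head = Node(piece.ljust(CHUNK_BYTES, b"\0"), len(piece), head)
    return head


def from_chunks(head: Optional[Node]) -> bytes:
    """Walk the list and reassemble the original string."""
    out = bytearray()
    node = head
    while node is not None:
        out += node.data[:node.used]
        node = node.next
    return bytes(out)


if __name__ == "__main__":
    text = b"hello, uniform objects"
    assert from_chunks(to_chunks(text)) == text
```

The cost is visible right away: indexing into the string becomes a list walk, and the last node wastes its padding bytes.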
<sb0>
"For the first time, garbage collection of programs synthesized to hardware is practical and realizable." meh
<sb0>
the right thing to do with this is a Python machine :-) and the block RAMs should be SDRAM-backed caches.
rofl__ is now known as sh4rm4
<ysionneau>
sb0: the thing is, I don't even understand the point of what they are doing
<ysionneau>
what's the point of a "hardware" GC ?
<sb0>
make it faster than a software GC
<ysionneau>
ah so the point is to handle the garbage collection of the software, ok
<ysionneau>
I was thinking it would be in order to manage dynamic allocations of hardware buffers or dynamically allocated stuff in the fpga
<ysionneau>
but no, ok it's for software
<ysionneau>
so that would need to be tightly coupled with the MMU I guess
<sb0>
in their case they suggest using it for some hw-synthesized algo that uses dynamic memory. it seems to me that most HW accelerators don't need that, but it might make sense to have hardware GC in a CPU.
<sb0>
one thing I can think of is a CPU where registers contain pointers at all times, that would address such "uniform" objects
<sb0>
and you put the BRAM in the pipeline (which adds 2 stages)
<sb0>
then you can implement HW GC, and accelerated duck typing
<sb0>
and since those uniform objects are relatively large, you can use the extra space for doing SIMD/vector operations
<sb0>
gee the official Python/LLVM backend is terrible... seems they even have debug prints still lying around
<ysionneau>
sb0: ok I see
<ysionneau>
thanks for the light
<ysionneau>
llvm seems young and quickly evolving which can be a pain
<ysionneau>
but it also seems to be the way forward
<sb0>
yeah, llvmpy.org (the decent binding) doesn't work with dev/3.5
<ysionneau>
if you use submodules you could freeze llvmpy to the one version that works for you :/
<ysionneau>
if you find one :p
<ysionneau>
then you only struggle with updates when you are ready to do so ^^
<sb0>
been trying... couldn't find a version of llvm-or1k that would 1) work on its own 2) be compatible with llvmpy
<sb0>
also I can't build llvm-lm32 anymore for some reason, tblgen segfaults when processing LM32.td
<ysionneau>
speaking about llvm, what was the conclusion about gcc 4.9 for lm32? it works well? C? C++? or still a bit buggy on C++?
<sb0>
seems -or1k is generally more up-to-date, and with more people working on it
<ysionneau>
last time I tried, it would not compile, and I didn't retry
<sb0>
gcc-lm32 4.9 generally works fine
<sb0>
the only source I've had problems with is this lnfpus.i that I posted on the list, which causes an ICE
<ysionneau>
17:44 < sb0> seems -or1k is generally more up-to-date, and with more people working on it < yeah they have a lot of dedicated people working on all aspects of the toolchain
<ysionneau>
*and* the support of a company whose job is to do compilers
<ysionneau>
^^"
<ysionneau>
sb0: ok
<sb0>
or1k also doesn't have those stupid 'export control' clauses in its license
<ysionneau>
really?! you've got to check some crazy blacklist?
<ysionneau>
whaaa
<ysionneau>
but ... the fact that it's now on github ... basically violates the export rule, doesn't it?
<ysionneau>
on github, people in Iraq, North Korea etc. can just see the code
<ysionneau>
and the blacklisted people as well
<sb0>
yeah... though I know of no one respecting that clause, not even Lattice themselves :-) (you could download it freely from their FTP, up until last year or so)
<ysionneau>
rha it sucks so much that they put this ridiculous license
<ysionneau>
means that "if they wish", they can just ask for removal on github
<ysionneau>
-_-
<sb0>
up until recently, the alternative to lm32 was to use a ludicrously bloated, slow and/or buggy CPU, but mor1kx is becoming reasonable now...
<sb0>
I'm going to get some hard numbers on mor1kx vs. lm32 ...
<ysionneau>
I guess mor1kx is the way forward if the performance is there
<ysionneau>
they have so much software/toolchain support
<ysionneau>
and a clean license(?)
<sb0>
regarding toolchain, their GCC isn't upstream yet
<sb0>
and there hasn't been a binutils release since it was merged. so it's actually a bit more painful to build than for lm32.
<ysionneau>
they have upstream linux support
<ysionneau>
but for very embedded stuff you don't care about linux
<ysionneau>
but it's just cool
<sb0>
do you know of any good CPU benchmark tools btw?
<ysionneau>
not at all, never did cpu benchmarking
<sb0>
that do a good number of SDRAM/bus accesses (unlike dhrystone) but still do not use a lot of libc/OS calls
<ysionneau>
I guess you could use things like a sorting algorithm
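[editor's sketch] A sorting kernel of the kind suggested above could look roughly like this. It is a hedged illustration of the access pattern only: on a soft CPU it would be written in C and timed with a hardware counter, and the choice of heapsort, the LCG fill and all names here are assumptions, not something proposed in the channel. Heapsort works in place, calls no library routines, and jumps around the array, so once the array is larger than the cache most accesses should go out to SDRAM.

```python
def lcg_fill(n, seed=1):
    """Fill a list with pseudo-random 32-bit values (Numerical Recipes LCG)."""
    values = []
    x = seed
    for _ in range(n):
        x = (1664525 * x + 1013904223) & 0xFFFFFFFF
        values.append(x)
    return values


def heapsort(a):
    """Sort list a in place, in ascending order, using a binary max-heap."""

    def sift_down(root, end):
        # Push a[root] down until the heap property holds within a[0..end].
        while True:
            child = 2 * root + 1
            if child > end:
                return
            if child + 1 <= end and a[child] < a[child + 1]:
                child += 1
            if a[root] >= a[child]:
                return
            a[root], a[child] = a[child], a[root]
            root = child

    n = len(a)
    # Build the heap, then repeatedly move the maximum to the end.
    for start in range(n // 2 - 1, -1, -1):
        sift_down(start, n - 1)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(0, end - 1)


if __name__ == "__main__":
    data = lcg_fill(10000)
    heapsort(data)
    assert all(data[i] <= data[i + 1] for i in range(len(data) - 1))
```

Checking that the output is sorted doubles as a correctness test for the CPU and memory path, which a pure timing loop would not give.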
<ysionneau>
so which one do we rewrite in Migen ? mor1kx or lm32 ? :)
<stekern>
and I have 16KB of cache in that config
<stekern>
which reminds me of something unrelated, when I was poking at the then milkymist-ng, I noticed that wrapped wb bursts weren't supported. Has that changed?
<sb0>
no, they are still unsupported
<sb0>
it expands the instruction to "l.mfspr r3,r0,6"
<sb0>
those spr instructions work correctly in the crt. are there restrictions on what registers can be used?
<stekern>
no, but it might be that it doesn't grok that what you are feeding that function is indeed a constant
<stekern>
ysionneau: why not both ;)
<sb0>
the 6?
<stekern>
umm, no the 6 should be fine.
<stekern>
ok, it's actually the assembler that chokes on that?
<ysionneau>
when I see this error message it's the assembler
<stekern>
(I just copied that from what other archs do, I claim no credit for it ;)
<sb0>
so, with clang, the performance is essentially the same: 133 iterations/s, 1273324392 ticks for the 2000 iterations (was 1256780316 with gcc)
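[editor's sketch] The numbers quoted above can be cross-checked with a little arithmetic. The constants are taken from the message; the function names are made up for illustration. The tick rate is not stated in the log, so the value derived here is only what the rounded "133 iterations/s" figure implies.

```python
ITERATIONS = 2000
TICKS_CLANG = 1273324392
TICKS_GCC = 1256780316
CLANG_RATE = 133  # iterations/s as quoted (rounded)


def ticks_per_iteration(total_ticks, iterations=ITERATIONS):
    """Average number of timer ticks spent on one iteration."""
    return total_ticks / iterations


def slowdown_percent(ticks_new, ticks_old):
    """How much longer the new run took than the old one, in percent."""
    return 100.0 * (ticks_new - ticks_old) / ticks_old


def implied_tick_rate_hz(total_ticks, iterations, iterations_per_second):
    """Tick rate implied by a tick count and a quoted iteration rate."""
    return total_ticks / iterations * iterations_per_second


if __name__ == "__main__":
    print("clang ticks/iteration:", ticks_per_iteration(TICKS_CLANG))
    print("gcc ticks/iteration:  ", ticks_per_iteration(TICKS_GCC))
    print("clang vs gcc: %.2f%% more ticks" % slowdown_percent(TICKS_CLANG, TICKS_GCC))
    print("implied tick rate: %.1f MHz"
          % (implied_tick_rate_hz(TICKS_CLANG, ITERATIONS, CLANG_RATE) / 1e6))
```

The clang build needs about 1.3% more ticks than the gcc build, which matches "essentially the same", and the implied tick rate comes out a little under 85 MHz.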
<ysionneau>
stekern: ah nice trick indeed :)
<GitHub186>
[misoc] sbourdeauducq pushed 1 new commit to master: http://git.io/DPTM4A
<GitHub186>
misoc/master 4c2a209 Sebastien Bourdeauducq: libbase: remove crt during make clean
<ysionneau>
gn8
<sb0>
stekern, any idea why TargetRegistry::lookupTarget would fail with "No available targets are compatible with this triple, see -version for the available targets." when passed any CPU type?
<sb0>
llc -version does list or1k
<sb0>
seriously, llvm arch management code is a gnu/autocrap-level fuckup
<sb0>
and they used C++ for it... only C or asm would have been worse
<sb0>
ah, that's because in that particular version of llvm, that stupid function ignores whatever arch you specify and uses the default (x86)