<tpb>
Title: GitHub - tommythorn/Reduceron: FPGA Haskell machine with game changing performance. Reduceron is Matthew Naylor, Colin Runciman and Jason Reich's high performance FPGA softcore for running lazy functional programs, including hardware garbage collection. Reduceron has been implemented on various FPGAs with clock frequency ranging from 60 to 150 MHz depending on the FPGA. A high degree of parallelism allows Reduceron to implement graph
<shapr>
yikes
<qu1j0t3>
holy shit.
<shapr>
did not think the title would be that large
<qu1j0t3>
i'm glad it was.
<qu1j0t3>
i am very pleased to know about this. Relevant to my interests
<qu1j0t3>
i wonder how small an fpga it can fit in.
<shapr>
don't know
<qu1j0t3>
i'd better study this one day when i get a vacation
<shapr>
but last I heard the prototype required you to build the program in when you compiled it for the FPGA
<shapr>
also, graph reduction CPUs require *much* more memory bandwidth than every day CPUs
<shapr>
but also allow parallel reduction
<sorear>
modern architectures are close to the point where you are generically starved for memory bandwidth rather than available ALUs
<shapr>
yup
<shapr>
sorear: yeah, thanks for making that explicit
<shapr>
I realize I should have said that
<shapr>
sorear: do you see a solution for that in the near future?
<pie_>
i havent heard of reconfigure.io
<pie_>
shapr, i wouldnt know but the only thing i can imagine helping offhand is spatially local memory
<pie_>
problem there probably is...its spatially local
<pie_>
(..oh wait fpga already do this?)
<shapr>
NUMA systems?
<pie_>
(but its not a lot of memory)
<shapr>
yeah, I just don't know.
<pie_>
disclaimer i have no fucking clue what im talking about
<shapr>
heh
<sorear>
spatially local memory is the only thing you can do, and what we've been doing for years (caches move data closer to the cores that use them)
<sorear>
designs with scratchpads and explicit message passing can use less bandwidth but are much harder to program
<shapr>
more lanes from memory to CPU? reduce latency via alien magic? what's the fix?
<sorear>
adding more hardware means that your program executes using the same amount of memory in half the time
<sorear>
s/memory/energy/
<pie_>
oh reconfigure.io is go, meh
<shapr>
pie_: but they are making money using a Go -> Haskell -> FPGA setup, so that's cool
<pie_>
true true
<sorear>
silicon photonics will make communicating with memory cheaper and is worth following
<pie_>
ive gone from maybe ill get around to playing with clash in the next few months to next few weeks (???)
<sorear>
(IMO silicon photonics is most interesting in the high end, since once your data is on fiber instead of copper you can send it a km instead of 20cm with no increase in energy per bit)
<pie_>
hm
<pie_>
oh god
<pie_>
fucking christ
<pie_>
i can just imagine pipelining involving "data on wire lag"
<shapr>
sorear: I hope we find a solution soon
<pie_>
so uhh propagation speed
<pie_>
lol
<pie_>
do we already use that?
<sorear>
yes
<sorear>
a bit in 10G ethernet is 2cm long
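A quick check of the 2cm figure, as a rough sketch. The velocity factor of about two thirds of c is an assumption (it varies by cable or fiber), and the nominal 10 Gbit/s rate is used rather than the exact line rate.

```python
# Rough length of one bit "on the wire" at a given line rate.
# Assumption: signals travel at about two thirds of c in copper or fiber.
SPEED_OF_LIGHT_M_PER_S = 299_792_458
VELOCITY_FACTOR = 2 / 3  # assumed, varies by medium


def bit_length_m(bits_per_second, velocity_factor=VELOCITY_FACTOR):
    """Distance a signal travels during one bit time."""
    bit_time_s = 1.0 / bits_per_second
    return SPEED_OF_LIGHT_M_PER_S * velocity_factor * bit_time_s


length_cm = bit_length_m(10e9) * 100
print(f"{length_cm:.1f} cm")  # prints "2.0 cm"
```

At 1 Gbit/s the same function gives roughly 20 cm per bit, so the faster the link, the more bits are in flight on a given length of wire.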
<pie_>
actually not really related, qu1j0t3 there was that 74xx computer (iirc?) thread where the dude was doing crazy shit like that? (hmm doesnt sound quite right)
<pie_>
(might have just been stabilization related stuff)
<pie_>
sorear, true, im not sure what i was going fore
<pie_>
*fore
<pie_>
...
<pie_>
*for
<sorear>
wires *inside* chips are kinda bad transmission lines, they have too much resistance and too little inductance
<pie_>
i guess i was thinking more in-cpu
<pie_>
i see
<pie_>
also, clocks
<sorear>
so you can't really have multiple bits in flight at once, but longer wires do take longer to switch
<pie_>
i see
<sorear>
unless you have buffers ;)
<qu1j0t3>
pie_: The 6502 on a breadboard thing?
<qu1j0t3>
pie_: i think i still have that open in a browser window somewhere...
<pie_>
ah yeah that was it
<qu1j0t3>
he made a big deal out of his graphics engine
* sorear
needs to dig into the reduceron at some point and understand what its secret sauce is
<qu1j0t3>
i guess i'd be pretty excited if my breadboard were that big, and if i got off my ass, it should be
<qu1j0t3>
sorear: update us with what you find out. it looks pretty interesting to me.
<pie_>
i googled crazy 6502 breadboard lol
<qu1j0t3>
pie_: :D
<pie_>
qu1j0t3, kind of makes me think of those big mixing tables and walls of modular synths or whatever
<pie_>
just get a table where the whole table surface is breadboard
<pie_>
man that sounds like it would be pretty cool
<qu1j0t3>
LOL
<pie_>
albeit immobile :P
<qu1j0t3>
hahaha why not a whole room
<pie_>
that makes no sense? :P
<qu1j0t3>
pie_: you're giving me ideas for my Future Lab
<qu1j0t3>
yeah it does
<pie_>
err, sorry for the hijack anyway
<qu1j0t3>
its like a pad of breadboards
<qu1j0t3>
don't unplug, just move 2 feet right
<qu1j0t3>
and i have enough scopes to have one every six feet
<pie_>
haha
<qu1j0t3>
not a whole wall, just a strip at working height.
<qu1j0t3>
or all along a bench
<pie_>
oh
<pie_>
i see
<qu1j0t3>
p. cool idea pie_++
<pie_>
well yeah thats basically the whole table breadboard?
<pie_>
sorear, im not sure how this is related to the reduceron but iirc i was looking into it at the same time as this other thing, something about graph IRs (might be just something i made up), uhhh...what was it called
<awygle>
But actually you probably want something closer to raid 1
<pie_>
isnt that what ddr does? (i mean ive no idea)
<pie_>
isnt the problem be the bus bandwidth?
<pie_>
s/be//
<sorear>
awygle: my laptop splits each cache line between 8 RAM chips
<kc8apf>
RAM is already heavily interleaved
<awygle>
I did a bad job explaining myself lol
<pie_>
maybe :P
<pie_>
disparity of mental models
<sorear>
What I don't understand is why they did "each cache line takes 8 bytes from each of 8 RAM chips" and not "each cache line is in one RAM chip, and 8 of them can process 8 misses concurrently"
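The two layouts sorear contrasts can be sketched as a toy model. Everything here is an assumption made for illustration: a 64-byte cache line, 8 chips, and a simple modulo mapping for the one-chip-per-line case. It ignores ranks, banks, burst length and timing, so it only shows which chips are touched, not which design is faster.

```python
from collections import Counter

CACHE_LINE_BYTES = 64  # assumed line size
NUM_CHIPS = 8
BYTES_PER_CHIP = CACHE_LINE_BYTES // NUM_CHIPS  # 8 bytes from each chip


def striped(line):
    """Layout from the chat: every chip supplies 8 bytes of each line."""
    return [(chip, BYTES_PER_CHIP) for chip in range(NUM_CHIPS)]


def whole_line(line):
    """Hypothetical alternative: the whole line lives in a single chip."""
    return [(line % NUM_CHIPS, CACHE_LINE_BYTES)]


def chip_accesses(layout, lines):
    """Count how many separate accesses each chip serves for these misses."""
    counts = Counter()
    for line in lines:
        for chip, _nbytes in layout(line):
            counts[chip] += 1
    return counts


misses = range(8)  # eight misses to consecutive cache lines
print(chip_accesses(striped, misses))
print(chip_accesses(whole_line, misses))
```

In the striped layout every chip takes part in every miss. In the hypothetical one-chip-per-line layout, each of these eight misses lands on a different chip, which is the concurrency sorear is asking about.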
<pie_>
sorear, interaction combinators is what i was thinking of
* pie_
actively seeks weird (cool) shit but doesnt have the background knowledge to process it (yet)
<awygle>
pie_: I feel like you'd like Papers We Love
<qu1j0t3>
awygle | The Crazy Loper Guy || Stan. He has a name. Stan. :-)
<qu1j0t3>
he's on irc, too
<qu1j0t3>
he's done a drive-by of a channel i'm in
<pie_>
qu1j0t3, he sounds like you two would get along well :P
<pie_>
well ideologically anyway
<qu1j0t3>
wow, not sure whether to be flattered or not
<pie_>
well considering youve corrupted me quite a lot, probably the former
<qu1j0t3>
no, i like to think Stan is way farther out on the John Titor scale than i am
<pie_>
yeah i didnt think you were that far gone :P
<pie_>
(im sorry whos john titor)
<qu1j0t3>
thank goodness
<awygle>
"John Titor is a name used on several bulletin boards during 2000 and 2001 by a poster claiming to be an American military time traveler from 2036"
<awygle>
Wow, that's amazing.
<awygle>
"Subsequent closer examination of Titor's assertions provoked widespread skepticism" never change, Wikipedia
<qu1j0t3>
[Citation jet needed]
<pie_>
lol
<pie_>
john titor scale. hah.
* pie_
chuckles
<pie_>
sorear, pls make an IR for yosys that isnt vhdl :P
<sorear>
yosys already has an IR and it's not vhdl
<sorear>
i don't know much more about it
<pie_>
oh ok i must have confused something
emeb has quit [Quit: Leaving.]
m_w_ has quit [Quit: Leaving]
dys has quit [Ping timeout: 260 seconds]
digshadow has quit [Ping timeout: 264 seconds]
digshadow has joined #yosys
<pie_>
re: the ram speed stuff, the problem is bus speed right? so if you want more speed you make more bus, but i imagine that has its issues (space?) , so more bus with less bus is spatial localization
<pie_>
i never did look into how infiniband works
<pie_>
s/localization/localization. right?/
<sorear>
pie_: the problem that doesn't go away no matter what you do is energy
<pie_>
how does that come up
<pie_>
well besides the obvious clock speed power dissipation problem, but i dont see how that would apply here
<sorear>
there's a statutory limit on the largest battery you can carry undeclared onto a US airplane
<sorear>
since people like 8 hour battery life and they also like being able to fly with laptops, you're limited to 15-30W for a laptop
<sorear>
also if you go much above that you start burning people's laps
<pie_>
but i dont see what that has anything to do with this :x
<pie_>
unless its just more memory bandwidth => faster cpu => more heat
formalnewb has joined #yosys
<sorear>
pie_: traces on a PCB have a lot of capacitance, and driving them at multiple GHz requires a *significant fraction of total system power*
<pie_>
aha
<pie_>
til 0.o
<sorear>
twice as many memory channels? twice the capacitance, twice the power
<pie_>
is that why bus clocks are relatively low
<sorear>
twice the data rate on the same number of wires? twice the frequency, twice the current, twice the power
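The scaling sorear describes matches the usual first-order model for switching power of a capacitive load, P = activity * C * V^2 * f. A minimal sketch, with placeholder numbers that are assumptions for illustration only and not measurements of any real memory bus:

```python
def dynamic_power_w(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """Switching power of a capacitive load: P = activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


# Placeholder values, chosen only for illustration (not measured).
BUS_CAPACITANCE_F = 100e-12
SWING_V = 1.2
CLOCK_HZ = 1.0e9

base = dynamic_power_w(BUS_CAPACITANCE_F, SWING_V, CLOCK_HZ)
# Twice as many channels: twice the capacitance being driven.
two_channels = dynamic_power_w(2 * BUS_CAPACITANCE_F, SWING_V, CLOCK_HZ)
# Twice the data rate on the same wires: twice the switching frequency.
double_rate = dynamic_power_w(BUS_CAPACITANCE_F, SWING_V, 2 * CLOCK_HZ)

print(two_channels / base, double_rate / base)
```

Both ratios come out as a factor of two in this model. The voltage term is squared, which is why lowering the signal swing is the lever that helps most when it is available.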
<pie_>
(or just one of the many reasons)
<pie_>
well mumble mumble parasitics
<pie_>
ideal finite elements pls
<sorear>
I'm not quite sure? GDDR5 runs at a much higher frequency over a smaller number of data lines compared to DDR4, and I wish I understood the tradeoffs better
<sorear>
as a general rule GPU hardware is more sensitive to BW and less sensitive to latency than CPU hardware, which must be the driver of the differences, but I don't always see how to get from there to here
<shapr>
sorear: I have a scar from this laptop
<formalnewb>
how experienced in hdl design do you have to be to say you are an entry level professional?
<formalnewb>
i've designed a UART and some other basic stuff but i dont feel like ive done anything really complicated
<awygle>
i mean, you can call yourself whatever you want lol
<awygle>
i would say that's enough to apply for entry level _jobs_ if that's what you're asking
<formalnewb>
yeah pretty much
<formalnewb>
i mean ive done more complex stuff for my current work
<formalnewb>
its just usually i have a senior engineer helping architect and i implement it
formalnewb has quit [Quit: Page closed]
kraiskil has joined #yosys
rqou has quit [Remote host closed the connection]
rqou has joined #yosys
pie_ has quit [Ping timeout: 264 seconds]
GuzTech has joined #yosys
AlexDaniel has joined #yosys
pie_ has joined #yosys
SpaceCoaster has quit [Ping timeout: 245 seconds]
SpaceCoaster has joined #yosys
AlexDaniel has quit [Ping timeout: 248 seconds]
pie_ has quit [Ping timeout: 240 seconds]
GuzTech has quit [Ping timeout: 248 seconds]
kraiskil has quit [Ping timeout: 276 seconds]
leviathan has joined #yosys
pie_ has joined #yosys
GuzTech has joined #yosys
leviathan has quit [Remote host closed the connection]
leviathan has joined #yosys
cemerick has joined #yosys
AlexDaniel has joined #yosys
kraiskil has joined #yosys
kishore has joined #yosys
<kishore>
module dff (clk,reset, q, d); input clk, reset, d; output q; reg q; always @ (posedge clk ) begin if (reset == 1) q <= 0; else q <= d; end endmodule
pie_ has quit [Ping timeout: 256 seconds]
pie_ has joined #yosys
kishore has quit [Quit: Page closed]
proteus-guy has joined #yosys
FabM has quit [Quit: ChatZilla 0.9.93 [Firefox 52.5.0/20171114221957]]
eduardo_ has joined #yosys
eduardo__ has quit [Ping timeout: 260 seconds]
<ZipCPU>
awygle: (Responding to reddit) A* was always my favorite pathfinding algorithm from college. There's a nice wikipedia page on it.
<awygle>
Yes, I like it too
<awygle>
A nice, uncomplicated extension of Dijkstra's
<awygle>
Which yields a significant improvement
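To make the "extension of Dijkstra's" remark concrete, here is a minimal sketch. The only difference between the two searches is the heuristic added to the queue priority. The grid, the unit step cost and the Manhattan heuristic are assumptions picked for illustration, and this is not the checkers variant ZipCPU mentions.

```python
import heapq


def a_star(start, goal, neighbors, heuristic=lambda node: 0):
    """Return (cost of a cheapest path or None, number of nodes expanded).

    With the default heuristic (always 0) this is Dijkstra's algorithm.
    """
    best = {start: 0}
    frontier = [(heuristic(start), 0, start)]
    expanded = 0
    while frontier:
        _, cost, node = heapq.heappop(frontier)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper route was found already
        expanded += 1
        if node == goal:
            return cost, expanded
        for nxt, step in neighbors(node):
            new_cost = cost + step
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                priority = new_cost + heuristic(nxt)
                heapq.heappush(frontier, (priority, new_cost, nxt))
    return None, expanded


SIZE = 10  # hypothetical 10x10 open grid
GOAL = (4, 4)


def grid_neighbors(node):
    """4-connected moves with unit cost, clipped to the grid."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE:
            yield (nx, ny), 1


def manhattan(node):
    """Admissible heuristic for unit-cost 4-connected grids."""
    return abs(node[0] - GOAL[0]) + abs(node[1] - GOAL[1])


print(a_star((0, 0), GOAL, grid_neighbors))             # Dijkstra
print(a_star((0, 0), GOAL, grid_neighbors, manhattan))  # A*
```

Both calls find the same path cost, but the heuristic keeps A* from expanding nodes that lead away from the goal, so its expansion count is lower on this grid.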
<ZipCPU>
Would you believe I've tried to use a variant of the A* algorithm for determining computer moves in Checkers?
<awygle>
That sounds reasonable
<ZipCPU>
While the resulting strategy probably wasn't good enough for a grand master, and while it was quite memory intensive, it was good enough to beat my instructor--who wasn't expecting the game I was writing to be nearly as challenging.
cemerick_ has joined #yosys
cemerick has quit [Ping timeout: 240 seconds]
cemerick_ has quit [Ping timeout: 256 seconds]
kraiskil has quit [Quit: Leaving]
swick has quit [Ping timeout: 248 seconds]
swick has joined #yosys
<shapr>
ZipCPU: nice!
<ZipCPU>
Mornin, shapr!
<shapr>
mornin ZipCPU!
<shapr>
I'm in DC the rest of the week, should be a fun adventure.
<ZipCPU>
Especially with the weather!
<ZipCPU>
We're not all that far away. Once the weather clears, feel free to holler and we might have a chance to meet if you'd like.
<shapr>
sure, we're flying back to Atlanta on Sunday
<shapr>
My gf is doing capstone stuff for her master's degree, so I'll not have much to do.
<ZipCPU>
:D
AlexDaniel has quit [Ping timeout: 240 seconds]
seldridge has joined #yosys
clifford has quit [Read error: Connection reset by peer]
<ZipCPU>
promach_: You realize you are programming hardware, not software, right?
<promach__>
ZipCPU: I do realize that
<promach__>
but ...
<ZipCPU>
Problem #1: The blocking assignment operator (=) is unreliable in simulation and should be avoided where possible
<promach__>
it is formal verification, not simulation
<ZipCPU>
Problem #2: The arrays you are setting are actually a set of wires. They need to be set on all clocks. (You might wish to set them using assign statements instead.)
<promach__>
yeah, true
<ZipCPU>
There is also a "keep" attribute that you (might) find valuable, but I'd fix the other problems first.
<ZipCPU>
promach__: Try this. Remove the assert from the same always @(*) block as the assignment. Then make certain that the value is *always* assigned.
<promach__>
ZipCPU: remove the assert() ? what ?!
<ZipCPU>
Place it into its own always@(*) block
<ZipCPU>
Separate it from the logic creating it.
<promach__>
I just commented the assert() line and it does not help
<ZipCPU>
BTW: It's illegal to set a "wire" from within an always block ...
<ZipCPU>
You need to use an "assign" statement for such values.
<ZipCPU>
Worse, you are only setting this value on some clocks, not all clocks. This means you are inferring a latch and ... may not be getting the logic you are intending.
<promach__>
i see
<promach__>
okay
<ZipCPU>
so, perhaps you want an: assign Tx_shift_reg_assertion[Tx_shift_reg_index] = (Tx_shift_reg_index > (INPUT_DATA_WIDTH-cnt)) || (!transmission_had_started) || (!UART_is_transmitting) ||(shift_reg[Tx_shift_g_index] == i_data[i_data_index[Tx_shift_reg_index]));
<ZipCPU>
Even better ... you want variable names that will fit within an 80 column display when using 8-space tabs for indentation. :P
<promach__>
what do you mean by 80 column display ?
<ZipCPU>
I mean my terminal has only 80 columns across, and hence when I try to examine your code within vi ... :P
<promach__>
mine one is 211x51 dimension
AlexDaniel has quit [Ping timeout: 256 seconds]
<promach__>
ZipCPU: I am not really familiar with your predicate logic simplification mechanism
<ZipCPU>
if (A) assert(B); is the same as assert((!A)||(B));
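The equivalence ZipCPU states is ordinary logical implication (A implies B is the same as not-A or B), and it can be checked by brute force over all four input combinations. A small sketch, where the two Python functions are stand-ins for the two Verilog forms and not anything a formal tool runs:

```python
from itertools import product


def guarded_assert_passes(a, b):
    """Model of `if (A) assert(B);`: the check only runs when A holds."""
    if a:
        return b
    return True


def implication_assert_passes(a, b):
    """Model of `assert((!A)||(B));`."""
    return (not a) or b


# Exhaustive truth table: the two forms agree on every input.
for a, b in product([False, True], repeat=2):
    assert guarded_assert_passes(a, b) == implication_assert_passes(a, b)
    print(a, b, implication_assert_passes(a, b))
```

The only failing row is A true with B false, which is why each extra `||` term in the long `assign` suggested earlier acts as one more condition under which the check is skipped.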
<ZipCPU>
Is your code posted?
proteus-guy has quit [Remote host closed the connection]
proteus-guy has joined #yosys
<promach__>
ZipCPU: one moment
<promach__>
Warning: Identifier `\Tx_shift_g_index' is implicitly declared at ../rtl/test_UART.v:149.
<promach__>
Warning: Wire test_UART.\Tx_shift_g_index is used but has no driver.