<d1b2>
<EmilJ> vup: I think the critical warnings are a symptom of something bad. It is complaining that, for example, a 32-bit register wire [31:0] \$1071; is fed a 10-bit init value, but it only expects the init value to be 1-bit
<vup>
yep, agreed, however my Verilog is not good enough to actually figure out what is going wrong
<FL4SHK>
tpw_rules: so I think floating point operations in general are just going to be stuff tacked onto integer operations
<FL4SHK>
with this in mind
<FL4SHK>
I don't think an FPU is going to end up being very big
<tpw_rules>
which is why the sentiment "does anybody know why nobody's used a fast float divider to make a fast integer divider" confuses me
<FL4SHK>
well
<FL4SHK>
I said that
<FL4SHK>
because I was told you could maybe do a float divide in six cycles?
<FL4SHK>
I just pinged ZipCPU
<FL4SHK>
he's the guy who told me it's possible to do float divides quickly
<FL4SHK>
I'm not currently sure it's possible to do IEEE float operations correctly if they're that fast
<tpw_rules>
also who gets to define how long a cycle lasts
<FL4SHK>
hmmmm
<FL4SHK>
well, he was talking about using, perhaps, Newton-Raphson division
<FL4SHK>
and I'm not sure you're guaranteed correct results with something like that...
<d1b2>
<EmilJ> vup: also rebuilding with the FIFO gave slightly worse slack :D it's slacking 2ns on things like hdmi/fifo/fifo/unbuffered/storage_reg/CLKARDCLK, hdmi/hdmi/encoder_b/n1q_m_reg[1]/C, hdmi/hdmi/timing_generator/x_reg[2]/C. Those aren't relevant to pushing data fast?
<FL4SHK>
tpw_rules: so I was told that FPUs tend to take up a lot of LUTs
<FL4SHK>
and if you do the integer unit plus extra stuff on top business, I just don't see it being a problem
<tpw_rules>
i mean afaik it's been that way since the dawn of time
<FL4SHK>
then 20 cycles should be reasonable for my float32 divide
<FL4SHK>
it should end up being about like that if I use my long udiv thing I'm currently implementing
<FL4SHK>
that thing takes one cycle to compute three quotient bits
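[editor's note] A minimal Python sketch of long division that retires three quotient bits per step (radix 8), to make the cycle count concrete. It is not FL4SHK's actual design: the name udiv_radix8, the restoring scheme, and the width parameter are assumptions for illustration. Each loop iteration stands in for one clock cycle; real hardware would compare against 1*d through 7*d in parallel instead of scanning.

```python
def udiv_radix8(dividend, divisor, width=32):
    """Unsigned restoring division, three quotient bits per step.

    Returns (quotient, remainder, steps). Hypothetical helper, not taken
    from any real core.
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    if not 0 <= dividend < (1 << width):
        raise ValueError("dividend does not fit in the given width")

    # Round the width up to a whole number of 3-bit digits.
    steps = (width + 2) // 3
    quotient = 0
    remainder = 0
    for step in range(steps):
        shift = 3 * (steps - 1 - step)
        # Bring down the next three dividend bits.
        remainder = (remainder << 3) | ((dividend >> shift) & 0b111)
        # Pick the largest digit whose multiple of the divisor still fits.
        digit = 0
        for candidate in range(7, 0, -1):
            if candidate * divisor <= remainder:
                digit = candidate
                break
        remainder -= digit * divisor
        quotient = (quotient << 3) | digit
    return quotient, remainder, steps


if __name__ == "__main__":
    print(udiv_radix8(100, 7))               # 32-bit operands
    print(udiv_radix8(0xABCDEF, 0x1234, 24))  # 24-bit, float32 significand size
```

Under these assumptions a 32-bit divide takes 11 steps and a 24-bit significand divide takes 8, before any cycles spent on unpacking, normalizing, and rounding.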
<sorear>
rocket-chip uses newton-raphson for float division, with some funny control logic where you can have multiple in-flight fdivs
<FL4SHK>
does it improve speed?
<FL4SHK>
my understanding is that it does
<FL4SHK>
I was thinking of doing Newton-Raphson for *integer* division
<sorear>
alpha and ia64 had no integer divide hardware, software was expected to convert to float before dividing anything, ia64 also required software newton-raphson steps (although the loop logic was partly hardware)
<FL4SHK>
but I am inexperienced with the stuff I need to do to do things correctly
<FL4SHK>
well
<FL4SHK>
My thinking is, if you can do Newton-Raphson for floats, you can do it for fixed point
<FL4SHK>
and if you have fixed point, you've already got an integer
<FL4SHK>
all you need to do is shift...
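[editor's note] A rough Python sketch of that idea: Newton-Raphson on a fixed-point reciprocal, then a shift to get an integer quotient. The function name nr_udiv, the 2*width fractional bits, the 48/17 - 32/17*d starting estimate, and the iteration count are all assumptions chosen for illustration, not anything from rocket-chip or ZipCPU. The raw Newton-Raphson result is only approximate, so the sketch ends with a correction step that nudges the quotient until the remainder is in range; that step is what makes the final answer exact.

```python
def nr_udiv(n, d, width=32, iterations=4):
    """Unsigned integer division via a fixed-point Newton-Raphson reciprocal.

    Returns (quotient, remainder). Hypothetical sketch.
    """
    if d == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    if not (0 <= n < (1 << width) and 0 < d < (1 << width)):
        raise ValueError("operands do not fit in the given width")

    frac = 2 * width                 # fractional bits of the fixed-point format
    k = d.bit_length()               # d = d_norm * 2**k with d_norm in [0.5, 1)
    d_fx = d << (frac - k)           # d_norm in fixed point

    # Linear starting estimate x0 = 48/17 - (32/17) * d_norm.
    c1 = (48 << frac) // 17          # constants, precomputed in hardware
    c2 = (32 << frac) // 17
    x = c1 - ((c2 * d_fx) >> frac)

    two = 2 << frac
    for _ in range(iterations):
        # x <- x * (2 - d_norm * x); the error roughly squares each pass.
        x = (x * (two - ((d_fx * x) >> frac))) >> frac

    # n / d = n * (1 / d_norm) / 2**k, so multiply and shift.
    q = (n * x) >> (frac + k)

    # Correction: the estimate may be slightly off, so fix up the remainder.
    r = n - q * d
    while r < 0:
        q -= 1
        r += d
    while r >= d:
        q += 1
        r -= d
    return q, r


if __name__ == "__main__":
    for n, d in [(100, 7), (0xFFFFFFFF, 1), (0xFFFFFFFF, 0xFFFFFFFF), (12345, 99999)]:
        assert nr_udiv(n, d) == divmod(n, d)
    print("all checks passed")
```

The multiplies here are wider than the operands, which is the usual trade against the bit-serial approach: fewer steps, bigger multiplier.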
<sorear>
the key thing to understand about rocket-chip is that the mandate of the research project was to develop a vector microprocessor, the scalar core is cobbled together out of parts primarily designed for vector use
<Lofty>
sorear: I thought 32-bit integer multiply was performed as double-precision float, which can store all 32-bit integers precisely?
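[editor's note] A small Python check of the representability point, assuming Python floats are IEEE 754 double precision (true on common platforms, with a 53-bit significand). The helper names are made up for illustration. It shows that every 32-bit integer survives a round trip through a double, that a full 64-bit product of two 32-bit values may not, and spot-checks a few 32-bit divisions done in double precision and truncated.

```python
import sys


def exact_in_double(n):
    """True if the integer n converts to a double and back unchanged."""
    return int(float(n)) == n


def udiv_via_double(n, d):
    """Unsigned quotient computed with one double-precision divide.

    Hypothetical helper; only meant for non-negative 32-bit operands.
    """
    return int(float(n) / float(d))


if __name__ == "__main__":
    print("significand bits:", sys.float_info.mant_dig)

    # 32-bit values fit comfortably inside the significand.
    print(exact_in_double(2**32 - 1))

    # The product of two 32-bit values can need up to 64 bits.
    print(exact_in_double((2**32 - 1) * (2**32 - 1)))

    # Spot-check a few divisions against integer division.
    for n, d in [(2**32 - 1, 3), (2**32 - 2, 2**32 - 1), (2**32 - 1, 1)]:
        print(n, d, udiv_via_double(n, d) == n // d)
```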