<anarsoul>
alyssa: have you looked into introducing a CAP to indicate that driver doesn't need in-memory zsbuf when rendering for scanout?
<alyssa>
anarsoul: Memory usage hasn't been high prio tbh
<alyssa>
So it'd be nice, but no, no plans to do so
<anarsoul>
alyssa: it's also a waste of memory bandwidth
_whitelogger has joined #panfrost
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
_whitelogger has joined #panfrost
raster has joined #panfrost
TheCycoTWO has quit [Ping timeout: 244 seconds]
TheCycoONE has joined #panfrost
stikonas has joined #panfrost
_whitelogger has joined #panfrost
<alyssa>
anarsoul: No? You're not actually writing to it (you disable that CAP or no CAP), the issue is just the unused block.
<alyssa>
So yes, it's a waste, but not for bw (at least not in 'frost)
raster has quit [Remote host closed the connection]
<HdkR>
ack! Cat jumped on computer desk and spooked me
<HdkR>
She's exploring the shelves in my storage room :D
<alyssa>
Meow.
<anarsoul>
alyssa: oh, so you're not setting it for scanout?
<alyssa>
anarsoul: Correct
<alyssa>
Well, I'm setting it but the hw doesn't touch it so it's 0 bw impact
<anarsoul>
alyssa: got it. Now I'm doing the same and it gains me 2fps in the 'shadow' scene (14 vs 16)
<alyssa>
anarsoul: Nice :)
<alyssa>
Probably the extra allocation isn't affecting perf much so
<anarsoul>
well, it would be nice to fix it as well to save 8mb
<alyssa>
Sure
Lyude has quit [Quit: WeeChat 2.2]
Lyude has joined #panfrost
<anarsoul>
alyssa: do you know if there's a common nir lowering pass to lower fsin/fcos? Looks like GP on utgard can't do it
<alyssa>
anarsoul: Lower to..?
<anarsoul>
polynomial?
<alyssa>
anarsoul: ....You need to lower it to a polynomial? Ouch.
<alyssa>
I'm not aware of a pass for that, no
<alyssa>
I guess it's not too hard to emulate yourself, go back to high school math ;P
<anarsoul>
well, maybe not
<anarsoul>
let me see what blob does
<alyssa>
anarsoul: If you do need to lower to a polynomial, I mean, the Maclaurin series will be easy enough to implement via nir_builder
<alyssa>
x - x^3/6 + x^5/120 - x^7/... or something
<alyssa>
Although, even better, there's fancy games you can play to keep the multiplications down, I don't remember the name of the technique offhand
<alyssa>
anarsoul: Wikipedia diving says the word I was looking for was "Horner's method"
<alyssa>
Bear in mind I don't have any numerical analysis background so I'm probably talking quack
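[Editor's sketch] The series alyssa quotes, evaluated with Horner's method, might look roughly like the following. This is plain Python for illustration, not nir_builder code and not what the blob does; the names `SIN_COEFFS`, `horner`, and `sin_maclaurin` are hypothetical, and stopping at the x^7 term is an assumption made here. Factoring out x leaves a polynomial in x^2, so the nested form needs only a handful of multiplies.

```python
import math

# Maclaurin coefficients of sin(x)/x, viewed as a polynomial in t = x^2,
# lowest order first: 1 - t/6 + t^2/120 - t^3/5040 (truncation is an assumption).
SIN_COEFFS = [1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0]

def horner(coeffs, t):
    """Evaluate c0 + c1*t + c2*t^2 + ... using one multiply-add per coefficient."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * t + c
    return acc

def sin_maclaurin(x):
    """Approximate sin(x) for small |x| as x * P(x^2)."""
    return x * horner(SIN_COEFFS, x * x)
```

The truncated series is only accurate near zero (the first dropped term is x^9/362880), so a real lowering would pair it with some form of range reduction.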
<anarsoul>
alyssa: thanks, I'll try to poke output of offline compiler first
<alyssa>
anarsoul: Probably fair -- there's a good chance you may have ops you don't know about yet
jolan has quit [Quit: leaving]
<cwabbott>
anarsoul: iirc, that was just handled by a huge polynomial
<alyssa>
cwabbott: Oh, hi!
jolan has joined #panfrost
<cwabbott>
alyssa: hi!
<HdkR>
HI!
* HdkR
needs more tea
<anarsoul>
cwabbott: thanks
<anarsoul>
cwabbott: and there's no nir pass for that, is there?
<cwabbott>
anarsoul: sadly, no
<alyssa>
anarsoul: Have fun :P
<cwabbott>
GP was the only thing crazy enough not to have dedicated sin/cos acceleration
<alyssa>
Hey, I kinda think implementing that would be fun!
<anarsoul>
alyssa: probably means that mesa doesn't support hardware with this level of sanity yet :)
<alyssa>
But I'll let anarsoul have that pleasure :P
<anarsoul>
cwabbott: but they have log and exp! :)
<cwabbott>
yeah, crazy right :)
<anarsoul>
looks like vc4 does something like that, but not with nir pass
<anarsoul>
probably anholt had this reason to implement it like this
<cwabbott>
well, that reason could've just been "no one else will need to do this" for all we know
<anarsoul>
or it just was there before he converted vc4 to nir
<anarsoul>
cwabbott: alyssa: do you know if there's input range for sin/cos in glsl? I.e. what will happen if I pass 4*PI to sin? Is it expected to return the same as sin(0)?
<cwabbott>
anarsoul: yeah, there are some precision limitations but it should be around 0
<cwabbott>
if you dump the blob's output, you'll see they do some range reduction before the polynomial
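[Editor's sketch] One way range reduction ahead of a polynomial can work is shown below. It is a guess at the general shape only, not a reconstruction of the blob's code; `reduce_angle` and `sin_approx` are hypothetical names, and the degree-7 polynomial is an assumption. The argument is first wrapped into [-pi, pi] by removing whole periods, then folded into [-pi/2, pi/2] using sin(pi - r) = sin(r), where a short polynomial is accurate enough.

```python
import math

def reduce_angle(x):
    """Map x to r in [-pi/2, pi/2] with sin(r) == sin(x)."""
    two_pi = 2.0 * math.pi
    # Step 1: drop whole periods so that r lies in [-pi, pi].
    r = x - round(x / two_pi) * two_pi
    # Step 2: fold the outer quarters inward; sin(pi - r) == sin(r)
    # and sin(-pi - r) == sin(r).
    if r > math.pi / 2.0:
        r = math.pi - r
    elif r < -math.pi / 2.0:
        r = -math.pi - r
    return r

def sin_approx(x):
    """Approximate sin(x) for any finite x: reduce, then evaluate a polynomial."""
    r = reduce_angle(x)
    r2 = r * r
    # Degree-7 Maclaurin polynomial in nested (Horner) form.
    return r * (1.0 + r2 * (-1.0 / 6.0 + r2 * (1.0 / 120.0 + r2 * (-1.0 / 5040.0))))
```

With this scheme `sin_approx(4 * math.pi)` comes out at (or extremely close to) zero, matching the "should be around 0" expectation, and the worst-case error over the reduced range is on the order of 1e-4 at |r| = pi/2.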
<anarsoul>
cwabbott: I'm not used yet to mbs_dump output for vertex shader :)
<cwabbott>
anarsoul: iirc there's a decompile option that will give you a much saner output
<cwabbott>
trying to read raw GP assembly is... not fun
stikonas has quit [Remote host closed the connection]
mifritscher has quit [Ping timeout: 252 seconds]
<anarsoul>
OK, I'm a bit puzzled why nir_alu_type_get_type_size() returns 1 for fmul
<anarsoul>
and as a result, for the very first fmul I get glmark2-es2-drm: ../src/compiler/nir/nir_builder.h:413: nir_build_alu: Assertion `src_bit_size == nir_alu_type_get_type_size(op_info->input_types[i])' failed.