alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - - Logs - <daniels> avoiding X is a huge feature
_whitelogger has joined #panfrost
tgall_foo has joined #panfrost
mifritscher has quit [Ping timeout: 258 seconds]
mifritscher has joined #panfrost
<anarsoul> alyssa: hey
<anarsoul> alyssa: have you looked into introducing a CAP to indicate that driver doesn't need in-memory zsbuf when rendering for scanout?
<alyssa> anarsoul: Memory usage hasn't been high prio tbh
<alyssa> So it'd be nice, but no, no plans to do so
<anarsoul> alyssa: it's also a waste of memory bandwidth
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
raster has joined #panfrost
TheCycoTWO has quit [Ping timeout: 244 seconds]
TheCycoONE has joined #panfrost
stikonas has joined #panfrost
<alyssa> anarsoul: No? You're not actually writing to it (you disable that, CAP or no CAP), the issue is just the unused block.
<alyssa> So yes, it's a waste, but not for bw (at least not in 'frost)
raster has quit [Remote host closed the connection]
<HdkR> ack! Cat jumped on computer desk and spooked me
<HdkR> She's exploring the shelves in my storage room :D
<alyssa> Meow.
<anarsoul> alyssa: oh, so you're not setting it for scanout?
<alyssa> anarsoul: Correct
<alyssa> Well, I'm setting it but the hw doesn't touch it so it's 0 bw impact
<anarsoul> alyssa: got it. Now I'm doing the same and it gives me an extra 2fps in the 'shadow' scene (14 vs 16)
<alyssa> anarsoul: Nice :)
<alyssa> Probably the extra allocation isn't affecting perf much so
<anarsoul> well, it would be nice to fix it as well to save 8mb
<alyssa> Sure
Lyude has quit [Quit: WeeChat 2.2]
Lyude has joined #panfrost
<anarsoul> alyssa: do you know if there's a common nir lowering pass to lower fsin/fcos? Looks like GP on utgard can't do it
<alyssa> anarsoul: Lower to..?
<anarsoul> polynomial?
<alyssa> anarsoul: ....You need to lower it to a polynomial? Ouch.
<alyssa> I'm not aware of a pass for that, no
<alyssa> I guess it's not too hard to emulate yourself, go back to high school math ;P
<anarsoul> well, maybe not
<anarsoul> let me see what blob does
<alyssa> anarsoul: If you do need to lower to a polynomial, I mean, the Maclaurin series will be easy enough to implement via nir_builder
<alyssa> x - x^3/6 + x^5/120 - x^7/... or something
<alyssa> Although, even better, there's fancy games you can play to keep the multiplications down, I don't remember the name of the technique offhand
<alyssa> anarsoul: Wikipedia diving says the word I was looking for was "Horner's method"
<alyssa> Bear in mind I don't have any numerical analysis background so I'm probably talking quack
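A minimal sketch of the idea alyssa describes above: the truncated Maclaurin series for sin, evaluated with Horner's method so that each extra term costs one multiply-add. This is plain Python for illustration, not nir_builder code; the names `SIN_COEFFS` and `sin_horner` are hypothetical, and the choice of five terms is an assumption, not what the blob or any driver actually uses.

```python
import math

# Assumed truncation: sin(x) = x * (1 - x^2/6 + x^4/120 - x^6/5040 + x^8/362880)
# Coefficients are listed in increasing powers of x^2.
SIN_COEFFS = [1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0, 1.0 / 362880.0]


def sin_horner(x, coeffs=SIN_COEFFS):
    """Evaluate the truncated Maclaurin series of sin(x) using Horner's method."""
    x2 = x * x
    acc = 0.0
    # Walk from the highest-order coefficient down: acc = acc * x^2 + c
    for c in reversed(coeffs):
        acc = acc * x2 + c
    # Factor the odd power back in: sin(x) ~= x * P(x^2)
    return x * acc


if __name__ == "__main__":
    for x in (0.0, 0.5, math.pi / 2):
        print(x, sin_horner(x), math.sin(x))
```

The series only converges quickly near zero, so a sketch like this is only useful on a small input interval; larger arguments would need to be reduced first.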
<anarsoul> alyssa: thanks, I'll try to poke output of offline compiler first
<alyssa> anarsoul: Probably fair -- there's a good chance you may have ops you don't know about yet
jolan has quit [Quit: leaving]
<cwabbott> anarsoul: iirc, that was just handled by a huge polynomial
<alyssa> cwabbott: Oh, hi!
jolan has joined #panfrost
<cwabbott> alyssa: hi!
<HdkR> HI!
* HdkR needs more tea
<anarsoul> cwabbott: thanks
<anarsoul> cwabbott: and there's no nir pass for that, is there?
<cwabbott> anarsoul: sadly, no
<alyssa> anarsoul: Have fun :P
<cwabbott> GP was the only thing crazy enough not to have dedicated sin/cos acceleration
<alyssa> Hey, I kinda think implementing that would be fun!
<anarsoul> alyssa: probably means that mesa doesn't support hardware with this level of sanity yet :)
<alyssa> But I'll let anarsoul have that pleasure :P
<anarsoul> cwabbott: but they have log and exp! :)
<cwabbott> yeah, crazy right :)
<anarsoul> looks like vc4 does something like that, but not with nir pass
<anarsoul> probably anholt had this reason to implement it like this
<cwabbott> well, that reason could've just been "no one else will need to do this" for all we know
<anarsoul> or it just was there before he converted vc4 to nir
<anarsoul> cwabbott: alyssa: do you know if there's input range for sin/cos in glsl? I.e. what will happen if I pass 4*PI to sin? Is it expected to return the same as sin(0)?
<cwabbott> anarsoul: yeah, there are some precision limitations but it should be around 0
<cwabbott> if you dump the blob's output, you'll see they do some range reduction before the polynomial
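A rough sketch of what "range reduction before the polynomial" can look like, so that sin(4*PI) comes out the same as sin(0). This is an assumption about the general technique, not a reconstruction of the blob's output: the function names, the fold into [-pi/2, pi/2], and the polynomial degree are all illustrative choices.

```python
import math

TWO_PI = 2.0 * math.pi
HALF_PI = 0.5 * math.pi


def sin_poly(r):
    """Truncated Maclaurin series for sin, in Horner form (assumed degree 9)."""
    r2 = r * r
    return r * (1.0 + r2 * (-1.0 / 6.0 + r2 * (1.0 / 120.0
                + r2 * (-1.0 / 5040.0 + r2 / 362880.0))))


def reduce_angle(x):
    """Map x to an angle in [-pi/2, pi/2] that has the same sine."""
    # Subtract the nearest multiple of 2*pi, landing in [-pi, pi).
    r = x - TWO_PI * math.floor(x / TWO_PI + 0.5)
    # Fold the outer quarters inward using sin(pi - r) == sin(r).
    if r > HALF_PI:
        r = math.pi - r
    elif r < -HALF_PI:
        r = -math.pi - r
    return r


def sin_reduced(x):
    """sin(x) for arbitrary x: range reduction followed by the polynomial."""
    return sin_poly(reduce_angle(x))


if __name__ == "__main__":
    for x in (0.0, 4.0 * math.pi, 2.5, -100.0):
        print(x, sin_reduced(x), math.sin(x))
```

As cwabbott notes, precision is limited: for very large inputs the subtraction in `reduce_angle` loses bits, so the result is only approximately periodic.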
<anarsoul> cwabbott: I'm not yet used to mbs_dump output for vertex shaders :)
<cwabbott> anarsoul: iirc there's a decompile option that will give you a much saner output
<cwabbott> trying to read raw GP assembly is... not fun
<anarsoul> ouch
<anarsoul> that's a bit longer than I expected
<cwabbott> it duplicates common subexpressions, so it can get a bit long
<anarsoul> you mean decompiler?
<cwabbott> yeah
<anarsoul> well, I'm pretty sure I can use anholt's vc4 code as a reference
<alyssa> cwabbott: *Resisting urge to write Midgard decompiler intensifies*
<anarsoul> decompiler? I thought that midgard assembly is sane enough
<alyssa> anarsoul: It definitely is, but that doesn't make decompiler authoring super enticing regardless ;P
<anarsoul> what's the difference between nir_op_flt and nir_op_slt?
<alyssa> Ask in dri-devel?
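For reference, a small Python model of the distinction as described in NIR's opcode definitions, to the best of my understanding: `flt` is a comparison that produces a boolean, while `slt` ("set on less than") produces a float, 1.0 or 0.0. The Python functions below are stand-ins for illustration only, not Mesa code, and the exact bit size of NIR booleans is left out on purpose.

```python
def flt(a, b):
    """Model of nir_op_flt: float less-than, result is a boolean."""
    return a < b


def slt(a, b):
    """Model of nir_op_slt: set on less than, result is the float 1.0 or 0.0."""
    return 1.0 if a < b else 0.0


if __name__ == "__main__":
    print(flt(1.0, 2.0), slt(1.0, 2.0))
    # An slt-style result can feed float arithmetic directly, e.g. as a mask:
    x = 3.0
    print(x * slt(1.0, 2.0))
```

The practical consequence is that an `slt`-style result can be used as a float operand, whereas a `flt` result is a boolean and would need a conversion or a select before it can feed float arithmetic.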
<anarsoul> OK, it compiles. Will see if it works in ~30mins (need to compile it on device now)
<alyssa> Ook
<alyssa> anarsoul: I love how #panfrost became #lima? :P
<anarsoul> :)
<anarsoul> there's no one else to ask on weekend
<alyssa> Ah, right, but I have no life so you can ask here, got it :)
<anarsoul> haha, do you imply I have no life either? :)
<alyssa> anarsoul: You're working on lima on a Sunday too ^_^
<anarsoul> it's raining here
<alyssa> Uh-huh ;)
<anarsoul> so I have an excuse :P
<alyssa> So what's if it's us, it's us and only us? And what came before, what count anymore or matter, can we thaaaat?
<anarsoul> darn, [jellyfish] <default>:gpir: unsupported nir_op: flog2
<alyssa> Ack!
stikonas has quit [Remote host closed the connection]
mifritscher has quit [Ping timeout: 252 seconds]
<anarsoul> OK, I'm a bit puzzled why nir_alu_type_get_type_size() returns 1 for fmul
<anarsoul> and as a result, for the very first fmul I get glmark2-es2-drm: ../src/compiler/nir/nir_builder.h:413: nir_build_alu: Assertion `src_bit_size == nir_alu_type_get_type_size(op_info->input_types[i])' failed.
<alyssa> anarsoul: More information please
<anarsoul> first nir_fmul_imm() throws this assertion
<alyssa> anarsoul: Sample source NIR shader
<alyssa> ?
<anarsoul> how do I print it?
<alyssa> nir_print_shader or something
<alyssa> Probably lima has an env flag for it
<alyssa> (Panfrost has MESA_MIDGARD_DEBUG=shaders)
<alyssa> anarsoul: General nitpick... nir_builder can make a lot of this code easier I think
<anarsoul> but it already uses nir_builder
<alyssa> Wait, you use that stuff there
<alyssa> Nvm
* alyssa eyes
<alyssa> This doesn't make sense :(
<anarsoul> interesting, it gets into lower_sin() twice
<anarsoul> well, there're 2 sins, so it's expected
<alyssa> Yeah
fysa has joined #panfrost
mifritscher has joined #panfrost