alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - Transientification is terminating. Memory reductions in progress.
<alyssa> Wonder if I should push my monochromatic thing to planet.fd.o
<alyssa> It's not Panfrost but it is graphics so
stikonas has quit [Remote host closed the connection]
* alyssa looks into sign()
<alyssa> Some wackiness
<HdkR> Nice function for getting a multiplier depending on sign-ness
<alyssa> HdkR: I meant wackiness in midgard :p
<HdkR> ah
<HdkR> a couple conditional selections I presume?
<alyssa> Not really
<HdkR> minmax?
<alyssa> A pair of unknown ops
<alyssa> sign(x) = -alu_op_2E(-alu_op_18(x, inf), -0.0f)
<alyssa> If you care to explain how a negative zero snuck in there, I'm listening ;P
<HdkR> Could use the sign of the constant to choose which direction it wants to go
<HdkR> two instructions to emulate it isn't bad though
<HdkR> Nvidia does 3 or 10 depending :P
<alyssa> I mean
<alyssa> The obvious explanation for - zero is that we're indexing the sign bit
<HdkR> yep
<alyssa> But I don't know that I buy it, since this works on fp32s and the constants are encoded as fp16
<HdkR> Could just be a derp
<alyssa> And these are floating point ops, not integer ops
<HdkR> Same ops if it is a float sign versus integer sign?
* alyssa checks
<alyssa> No matching overload for function 'sign'
<HdkR> #version 300 es?
<alyssa> Yupyup
<HdkR> ...That's a bug in their shader compiler then
<HdkR> Just checked spec and it exists
<HdkR> `genIType sign (genIType x)`
<alyssa> Second form is only there for 300
<HdkR> Yea, that's why I asked for `#version 300 es` :P
<HdkR> were you assigning to a float still? Since it returns integer now
<alyssa> For the integer sign function, it's simple min/max
<alyssa> Note: we know how fmin/fmax are, and it's not that
<HdkR> Curious
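For reference, one way an integer sign() can be written with nothing but min and max is to clamp the input to [-1, 1]. The sketch below is only an illustration of that idea; the log does not show the exact sequence the compiler emitted, and the name `isign` is made up here.

```python
def isign(x: int) -> int:
    """Integer sign via clamping: min caps positives at 1, max caps negatives at -1."""
    return max(min(x, 1), -1)


if __name__ == "__main__":
    for value in (5, 0, -7):
        print(value, isign(value))
```

The same trick does not carry over to floats, since a fractional input such as 0.5 would pass through the clamp unchanged.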
<alyssa> Though the ops do look superficially similar
<alyssa> fmin = 0x28
<alyssa> fmax = 0x2C
<alyssa> unknown 1 = 0x18
<alyssa> unknown 2 = 0x2E
<alyssa> The "8"s are probably just coincidence, the latter pair could be saying something tho
<HdkR> Maybe an op that explicitly ignores NaNs?
<HdkR> :shruggie:
<alyssa> hm
<alyssa> Kind of wishing I had compute shaders ;P
<HdkR> Output them locally and see what they return? :D
<alyssa> Kind of a complex process but aight, here goes
<HdkR> Woo RE
<alyssa> No I just
<alyssa> Have a convoluted dev environment
<alyssa> Rebuilding mesa intensifies
marcodiego has quit [Ping timeout: 250 seconds]
<HdkR> Intense!
<alyssa> Well, this is... something
<alyssa> Our two unknown ops work exactly like fmin/fmax
<alyssa> So we have
<alyssa> sign(x) = -max(-min(x, inf), -0.0f)
<alyssa> But why on earth would you do min(x, inf)
<alyssa> But hypothetically, uh
<alyssa> sign(1) = -max(-min(1, inf), -0.0f) = -max(-1, -0) = -(-0) = 0
<alyssa> sign(0) = -max(-min(0, inf), -0.0) = -max(-0, -0) = -(-0) = 0
<alyssa> sign(-1) = -max(-min(-1, inf), -0.0) = -max(-(-1), -0.0) = -max(1, -0.0) = -1
<alyssa> :blink:
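The three evaluations above can be checked mechanically. This is a minimal sketch that assumes the two unknown ops behave like ordinary floating-point min and max (Python's built-ins stand in for them); the function name is invented for illustration.

```python
import math


def sign_with_ordinary_minmax(x: float) -> float:
    """Evaluate -max(-min(x, inf), -0.0) with ordinary min/max semantics."""
    inner = min(x, math.inf)   # a no-op for any finite x
    return -max(-inner, -0.0)  # positives collapse to 0, negatives pass through


if __name__ == "__main__":
    for value in (1.0, 0.0, -1.0, 2.0, -3.5):
        print(value, sign_with_ordinary_minmax(value))
```

Under this reading every positive input gives 0, and a negative input is returned unchanged, so sign(-1) = -1 only comes out right because the input happened to be -1.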
<HdkR> weird, sign(1) should return 1
<alyssa> Maybe it has to do with handling of negatives
<alyssa> Like, our new min chooses based on smaller absolute value, rather than smaller value
<alyssa> I.e.
<HdkR> hm, min/max ignoring sign
<alyssa> sign(1) = -max(-min(1, inf), -0.0) = -max(-1, -0.0) = -(-1) = 1
<alyssa> sign(0) = -max(-min(0, inf), -0.0) = -max(-0, -0) = -(-0) = 0
<alyssa> sign(-1) = -max(-min(-1, inf), -0.0) = -max(-(-1), -0.0) = -max(1, -0.0) = -1
<HdkR> Would make sense
<alyssa> It's self-consistent, at least
<alyssa> And explains why there are multiple ops
<alyssa> What I don't quite get is what the min(x, inf) is for
<alyssa> Is it enough to do, uh
<alyssa> -max(-x, -0)
<alyssa> sign(1) = -max(-1, -0) = -(-1) = 1
<alyssa> sign(0) = -max(-0, -0) = -(-0) = 0
<alyssa> sign(-1) = -max(-(-1), -0) = -max(1, -0) = -1
<alyssa> What are they possibly trying to do with min(x, inf)
<alyssa> You can't have greater than infinity :P
<alyssa> Unless it's like
<alyssa> Erm wait
<alyssa> sign(2) = -max(-min(2, inf), -0.0) = -max(-2, -0) = -(-2) = 2
<alyssa> Not helpful :p
<alyssa> In the above formulation of min/max, sign(x) is just the identity function, so something more is at play
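The identity-function observation can be reproduced with a small model of the hypothesis. This sketch assumes both ops compare by absolute value and return the chosen operand with its sign intact; `min_abs` is a name invented here, and the model is a guess at the hardware, not a confirmed description of it.

```python
import math


def min_abs(x: float, y: float) -> float:
    """Hypothetical op: return the operand with the smaller magnitude."""
    return x if abs(x) < abs(y) else y


def max_abs(x: float, y: float) -> float:
    """Hypothetical op: return the operand with the larger magnitude."""
    return x if abs(x) > abs(y) else y


def sign_with_abs_minmax(x: float) -> float:
    """Evaluate -max_abs(-min_abs(x, inf), -0.0)."""
    return -max_abs(-min_abs(x, math.inf), -0.0)


if __name__ == "__main__":
    for value in (1.0, 2.0, 0.0, -1.0, -3.5):
        print(value, sign_with_abs_minmax(value))
```

Every finite input comes back unchanged, which is why sign(1) and sign(-1) looked correct while sign(2) did not.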
* alyssa laments lack of compute shaders, grah
<HdkR> SSBOs would work as well
<alyssa> Bah
<alyssa> But yeah, with fancy max,
<alyssa> max(in, 0.2)
<alyssa> Er
<alyssa> max(in, -0.2)
<alyssa> = in when abs(in) > 0.2, -0.2 otherwise
<alyssa> Similarly
<alyssa> max(-in, 0.5)
<alyssa> = -in when abs(in)> 0.5, 0.5 otherwise
<alyssa> Of course, recreating the pair of magic instructions in GLSL does do a fsign, but..
<alyssa> But why? :P
<HdkR> :D
<alyssa> Alright, with 0 < in < 1
<alyssa> min(in, 1.0/0.0) is infinite
<alyssa> Wat
<alyssa> Whaa?
<alyssa> Is min doing a... multiply?
<alyssa> It is definitely doing a multiply
<alyssa> --------And suddenly the mysteries start making sense
<alyssa> Okay, so,
<alyssa> min(a, b)
<alyssa> where both a,b positive = a*b
<alyssa> min(-a, b) = -min(a, b) = -a*b
<alyssa> Okay what
<alyssa> how is this not just a multiply now
<alyssa> K, let's suppose for a sec this is just a multiply
<alyssa> max appears to at least be an actual max
<alyssa> Yeah, max appears to be the max_abs thing
<alyssa> i.e.: max_abs(x, y) = x if abs(x)>abs(y), y otherwise
<alyssa> So, wait,
<alyssa> sign(x) = -max(-mul(x, inf), -0.0f) ??
<alyssa> sign(1) = -max(-mul(1, inf), -0.0) = -max(-inf, -0.0) = inf
<alyssa> sign(0) = -max(-mul(0, inf), -0.0) = -max(-0, -0) = 0
<alyssa> sign(-1) = -max(-mul(-1, inf), -0.0) = -max(inf, -0.0) = -inf
<alyssa> Which is fine, save for that, you know, factor of infinity :V
<alyssa> But it does mean, uh
<alyssa> sign(2) = -max(-mul(2, inf), -0.0) = -max(-inf, -0.0) = +inf
<alyssa> So one of the problems is resolved
<alyssa> I rather hypothesise that max starts also acting like a multiply in some way but not sure how
<alyssa> SDfsdlkjfds
<alyssa> This isn't even self-consistent
<alyssa> I'm emitting the same set of opcodes
<alyssa> Why is it behaving different
<alyssa> HdkR: Any brilliant ideas? :P
<HdkR> hm?
<alyssa> HdkR: When I do the above ops on here (composed the way they have it -- verified via disasm), it ends up emitting infinity, not 1
<alyssa> Wonder if there's a disasm bug
<HdkR> Could be. If it is the exact same instructions then why would it be different? :P
_whitelogger has joined #panfrost
<alyssa> HdkR: So I feel pretty good that the inner op could be "multiply, with 0*inf = 0"
<alyssa> (Whereas the main multiply op would NaN, I think)
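The "multiply, with 0*inf = 0" idea can be modelled as below. This is a sketch of the hypothesis only: `mul_zero_wins` and `sign_candidate` are invented names, and the outer op is assumed to be the magnitude-based max described earlier. It reproduces the stray factor of infinity rather than explaining how the hardware ends up at 1.

```python
import math


def mul_zero_wins(a: float, b: float) -> float:
    """Hypothetical multiply in which a zero operand forces a zero result."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b


def max_abs(x: float, y: float) -> float:
    """Hypothetical op: return the operand with the larger magnitude."""
    return x if abs(x) > abs(y) else y


def sign_candidate(x: float) -> float:
    """Evaluate -max_abs(-mul_zero_wins(x, inf), -0.0)."""
    return -max_abs(-mul_zero_wins(x, math.inf), -0.0)


if __name__ == "__main__":
    print(0.0 * math.inf)  # the ordinary IEEE multiply yields NaN here
    for value in (2.0, 1.0, 0.0, -1.0):
        print(value, sign_candidate(value))
```

The special case matters only for x = 0: with an ordinary multiply the NaN would flow into the outer op instead of a clean zero.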
<HdkR> I could see that
<alyssa> HdkR: What's sign(-0) anyway
<alyssa> 0 or -1?
<HdkR> 0 I think
<alyssa> K
<alyssa> HdkR: ...Of course, these are not things I can test without COMPUTE SHADERS
<alyssa> :V
<alyssa> I mean or floating-point render targets but
<HdkR> Yea, once you get compute then testing of results is a lot easier since you can stuff everything in to SSBOs
hanetzer has quit [Ping timeout: 244 seconds]
yann has quit [Ping timeout: 245 seconds]
<davidlt> I got more pictures of Pinebook Pro, incl. close ups of PCB with WiFi chip
<davidlt> alyssa, and others, Pine guys said that they could give you Pinebook Pros if you want it for development. You just need to write them.
<davidlt> I think, I have the contacts somewhere if you want.
ph5 has quit [Quit: bye]
indy has quit [Quit: ZNC - http://znc.sourceforge.net]
indy has joined #panfrost
ph5 has joined #panfrost
BenG83 has joined #panfrost
raster has joined #panfrost
<raster> nyan
raster has quit [Remote host closed the connection]
Elpaulo has quit [Quit: Elpaulo]
Elpaulo has joined #panfrost
afaerber has joined #panfrost
raster has joined #panfrost
<mmind00> HdkR: but it looks like you don't want to share your fun: "Paste not found" :-P
<HdkR> https://paste.fedoraproject.org/paste/v~K-8iC05wGOvdF8aS9yug Oops. I didn't realize the link had a tilde in it and my GNU screen setup eats them.
tgall_foo has quit [Ping timeout: 246 seconds]
yann has joined #panfrost
<alyssa> davidlt: Alright, thank you :)
<alyssa> HdkR: Oh dear, what've you done
jernej has joined #panfrost
yann has quit [Ping timeout: 246 seconds]
ph5 has quit [Quit: bye]
BenG83 has quit [Quit: Leaving]
stikonas has joined #panfrost
ph5 has joined #panfrost
raster has quit [Remote host closed the connection]
afaerber has quit [Quit: Leaving]
<HdkR> alyssa: Having fun of course
yann has joined #panfrost
<HdkR> Was actually curious about what all was required to kick off the beginnings of a vulkan driver in mesa. Definitely some duplication of scripts and things from anv
<HdkR> Which is what radv did as well
<HdkR> Also compute is like a core feature of Vulkan. Might be nice for testing things
<HdkR> :P
<HdkR> Zink is an interesting prospect as well
belgin has joined #panfrost
<robclark> alyssa, btw, did you look at intel genxml? iirc it was generating bitpacked structs, so might be a better fit for you? If not, that kinda sucks, I might still be tempted to invent something to autogen encoding vs decoding from to avoid keeping both in sync..
belgin has quit [Quit: leaving]
<Lyude> robclark: what's this about if you don't mind me asking?
* robclark just mixing mediums and replying to email on irc :-P
<robclark> re: somehow generating cmdstream encode and decode from single hw db
<robclark> (ie. xml or whatever you care to invent.. although genxml and rnndb both use xml and that more or less seems to work ok)
tgall_foo has joined #panfrost
<cwabbott> robclark: I don't know too much about the intel thing, but the tricky thing about it is that there are lots of variable-length structs
<cwabbott> and of course, everything is done with structs pointing to other structs instead of a single cmdstream
<robclark> I guess the question is whether something can't be represented as structs.. packed structs vs what rnndb gets rid of the restriction that things are packed in multiple of 32b, which seems useful for you.. but not sure if that is all you need
<cwabbott> they're usually aligned to at least 16 bytes iirc
<cwabbott> there's the framebuffer struct, where a bit set somewhere means that there's a whole section between the main struct and the array of render targets
<robclark> I meant whether some field can span dwords, mostly.. which is awkward w/ envytools/rnndb.
<cwabbott> oh, I've never seen that yet
<cwabbott> I suspect they use actual C structs in their driver
<cwabbott> everything is always aligned
<cwabbott> gotta have dat cmdstream building efficiency!
<robclark> anyways, when I come across a new gen, I tend to go thru many iterations of updating xml and re-running decoder + regen headers when I'm debugging things.. so keeping the two sides in sync *somehow* seems like a useful thing..
<robclark> maybe there is something for dealing w/ network protocols that would be a better fit, idk..
<krh> if you think they're using C structs, you should take a look at genxml
<krh> it started as "lets use structs and bitfields for the intel command stream"
<krh> and then it turned out that compilers still generate terrible code for bitfields
<krh> what genxml gives you is autogenerated "template structs" that you fill out, then pass to an autogenerated pack function that then shifts and masks the values into place
<krh> this sounds slow, but it generates about as good code as manual shifting and or'ing the values together, since the compiler propagates the values from the template struct
<krh> as a bonus you can compile it in debug mode, which gives you a place to hook in range checks (where bitfields silently truncate), valgrind checks or even automatic conversion from float to, say, fixed point S8.2 or whatever for linewidth
<krh> it doesn't handle the "if bit is set, add another optional struct" case, but it's easy enough to handle that by hand
<krh> ah, I actually wrote a little essay about it: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/intel/genxml/README
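The template-struct-plus-pack-function pattern krh describes, together with robclark's wish to keep encode and decode in sync, might look roughly like this. It is a toy sketch in Python rather than generated C, and everything in it is invented for illustration: the `ExampleLineState` command, its field names, and its bit positions do not correspond to any real Intel or Mali layout.

```python
from dataclasses import dataclass


def pack_field(value: int, start: int, end: int) -> int:
    """Shift an unsigned value into bits start..end (inclusive) of a dword."""
    width = end - start + 1
    if not 0 <= value < (1 << width):
        # A C bitfield would silently truncate; a debug build can complain instead.
        raise ValueError(f"{value} does not fit in {width} bits")
    return value << start


def unpack_field(dword: int, start: int, end: int) -> int:
    """Mask bits start..end (inclusive) back out of a dword."""
    width = end - start + 1
    return (dword >> start) & ((1 << width) - 1)


@dataclass
class ExampleLineState:
    """Hypothetical template struct; the layout is made up for this sketch."""
    opcode: int = 0          # bits 24..31
    antialias: bool = False  # bit 16
    line_width: float = 1.0  # bits 0..9, unsigned fixed point, 2 fractional bits


def pack_example_line_state(state: ExampleLineState) -> int:
    """Encode the template into one dword, converting float to fixed point."""
    width_fixed = round(state.line_width * 4)
    return (pack_field(state.opcode, 24, 31)
            | pack_field(int(state.antialias), 16, 16)
            | pack_field(width_fixed, 0, 9))


def unpack_example_line_state(dword: int) -> ExampleLineState:
    """Decode a dword using the same field table as the pack function."""
    return ExampleLineState(
        opcode=unpack_field(dword, 24, 31),
        antialias=bool(unpack_field(dword, 16, 16)),
        line_width=unpack_field(dword, 0, 9) / 4,
    )


if __name__ == "__main__":
    state = ExampleLineState(opcode=0x12, antialias=True, line_width=2.5)
    dword = pack_example_line_state(state)
    print(hex(dword), unpack_example_line_state(dword))
```

In a real generator both functions would be emitted from one description of the fields, so the decoder cannot drift from the encoder; here the bit positions are simply repeated by hand.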
<HdkR> Alright. pushing this vulkan code to a local branch and ignoring. It'll turn in to a full time job if I attempt that
<HdkR> Only 20k-40k lines of code seemingly from anv and radv :P
<HdkR> alyssa: Congrats on commit access
<alyssa> HdkR: Fun indeed. :P
<alyssa> krh: Wouldn't it be a better idea just to, you know, optimize gcc or whatever? :P
<alyssa> Oh wait, this is gcc we're talking about. Never mind, understood ;)
<Lyude> wait, commit access
<Lyude> you aren't talking about mesa commit access are you? :)
<HdkR> Mesa commit access woo
<Lyude> oh hell yeah! so that means panfrost is upstream now too doesn't it?
<HdkR> I presume once it is actually pushed yes :P
<alyssa> Yeah, need to do the actual push. Guess that'll happen today :)
<HdkR> alyssa: We can start creating a Vulkan driver so we can use Zink right? :)
<Lyude> what is zink?
<HdkR> OpenGL emulation over Vulkan
<HdkR> It's kusma's side project
<alyssa> HdkR: I mean
<alyssa> I'm sticking with Gallium :P
<alyssa> Until Zink and Vulkan-on-Mesa mature
<HdkR> hehe