marcan changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
macc24 has quit [Ping timeout: 260 seconds]
artemist has quit [Ping timeout: 260 seconds]
artemist has joined #asahi-gpu
<bloom> Taking a peek at fsin/fcos
<bloom> Observation 1: the asm of fcos is identical to fsin, except the first instruction
<bloom> 8: ba05c0021810 fmadd $r1, r0.discard, u0, 0.25
<bloom> vs
<bloom> fadd r1, r0, u0
<bloom> *fmul
<bloom> This corresponds to the trig identity cos(x) = sin(x + pi/2) = sin(x + tau/4), so we expect that u0 = 2pi = tau
<bloom> er, no, we want the reciprocal of that. units are hard
<bloom> anyway, sure enough checking the memory dump u0 = 0x3E22F983, and
<bloom> struct.unpack('<f', bytes([0x83, 0xF9, 0x22, 0x3E]))[0] = 0.15915493667125702
<bloom> 1.0 / math.tau = 0.15915494309189535
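The comparison above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `bits_to_float32` is made up for illustration, and the only input taken from the log is the dumped word 0x3E22F983.

```python
import math
import struct


def bits_to_float32(bits: int) -> float:
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision float.

    Hypothetical helper, not part of any tool mentioned in the log.
    """
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Value of u0 as read from the memory dump quoted in the log.
u0 = bits_to_float32(0x3E22F983)
inv_tau = 1.0 / math.tau

# The two values should agree to roughly single precision.
print(u0, inv_tau, abs(u0 - inv_tau))
```

The small difference between the two numbers is what one would expect from storing 1/tau as a 32-bit float.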
<bloom> So the first thing we do in either sin or cos is divide by tau to change from radians to "times around the circle"
<bloom> and for cos we just push forward by 90 degrees.
<bloom> Observation 2: The next few instructions are the sequence floor/fsub. We immediately recognize this as the fract(...) function. This is again what we'd expect -- fsin/fcos are periodic, with period 2pi, which is now period 1 after the above change of units.
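The first two observations amount to a range reduction, which might be modelled on the CPU as below. This is only a sketch of the arithmetic described in the log, in double precision rather than the GPU's float32; the function names `fract`, `sin_turns` and `cos_turns` are assumptions chosen for illustration.

```python
import math

INV_TAU = 1.0 / math.tau  # the constant the log finds in u0


def fract(v: float) -> float:
    """floor/fsub pair: keep only the fractional part of v."""
    return v - math.floor(v)


def sin_turns(x: float) -> float:
    """Radians -> turns (fmul by 1/tau), then reduce to [0, 1)."""
    return fract(x * INV_TAU)


def cos_turns(x: float) -> float:
    """Same as sin_turns, but pushed forward a quarter turn (fmadd ... 0.25)."""
    return fract(x * INV_TAU + 0.25)


# Sanity check: feeding the reduced angle back into sin reproduces sin/cos.
for x in (-7.5, -1.0, 0.0, 0.3, 2.0, 9.25):
    print(x, math.sin(math.tau * sin_turns(x)), math.sin(x),
          math.sin(math.tau * cos_turns(x)), math.cos(x))
```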
<bloom> The last four instructions are where things get mysterious.
<bloom> There are special one-op instructions parametrized as 0b1010 and 0b1110... call them F and G
<bloom> and reading off, we see they take that angle 0 <= x < 1 and calculate... G(F(4 * x)) * F(4 * x)
<bloom> and somehow that equals sin?
<bloom> The multiplication by four is just another change of units. Now the fraction is angle within a quadrant, and the integer part is the quadrant. This is natural enough given the quadrant (anti)symmetries we have with sin.
<bloom> All we're doing is a series of reductions to make G and F as simple as possible. Or rather, as small as possible -- they're probably lookup tables.
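To see how a product of the form G(F(4x)) * F(4x) could equal sin at all, here is one purely hypothetical pair of functions that makes the identity work. Nothing in the log determines what the hardware's F and G actually compute, so the definitions below are guesses for illustration only: `F` folds the quadrant-unit angle into [-1, 1] using the quadrant symmetries, and `G` is sin(pi/2 * t) / t.

```python
import math


def F(q: float) -> float:
    """HYPOTHETICAL fold: angle q in quadrant units, [0, 4) -> t in [-1, 1]."""
    if q < 1.0:
        return q          # first quadrant: unchanged
    if q < 3.0:
        return 2.0 - q    # second and third quadrants: mirror around q = 1
    return q - 4.0        # fourth quadrant: wrap to a small negative angle


def G(t: float) -> float:
    """HYPOTHETICAL second stage: sin(pi/2 * t) / t, with its limit at t = 0."""
    if t == 0.0:
        return math.pi / 2.0
    return math.sin(math.pi / 2.0 * t) / t


def model_sin(x: float) -> float:
    """Whole pipeline from the log: turns, fract, times four, then G(F) * F."""
    turns = x / math.tau
    turns -= math.floor(turns)
    t = F(4.0 * turns)
    return G(t) * t


for x in (-3.0, 0.0, 0.5, 2.0, 4.0, 6.0):
    print(x, model_sin(x), math.sin(x))
```

Under this guess the second stage would be an even, slowly varying function on a small interval, which fits the suspicion that these ops are lookup tables, but running the ops in isolation would be needed to confirm or refute it.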
<bloom> At this point we have two options:
<bloom> 1. Just call F and G sin_pt_1 and sin_pt_2 and call it a day. Who cares?
<bloom> 2. Using a compute shader test (I believe dougallj has a script for this), run each op in isolation and dump the results. Then plot them and see what happens.
<bloom> (There's no magic for tan. It's just computing fsin and fcos and dividing them. Just looks funny due to aggressive scheduling.)
<dougall> yeah, maybe 1 is the way to go for now? (i do have a script that i have technically done something like this with, but it'll be a lot easier once we've made more progress on everything else)
<bloom> dougall: sure
<bloom> I was secretly hoping there would be Taylor series involved
<bloom> The lowering for Bifrost is super cute
<dougall> but yeah, it's a nice self-contained challenge for anyone interested in numerics if people want to try :)
<dougall> haha nice!
odmir has quit [Remote host closed the connection]
odmir has joined #asahi-gpu
odmir has quit [Ping timeout: 246 seconds]
<bloom> patches incoming
<bloom> Patches sent
odmir has joined #asahi-gpu
<dougall> looks great!
<bloom> thanks!
<bloom> no clue on the extended fields, although IIRC you have a clever way to determine those that I don't :p
<bloom> also, I still can't get over the fact the shaders are keyed to formats
<bloom> I am not ok with this
<dougall> what does 'keyed to formats' mean?
<bloom> The code of the vertex shader depends on the format of the vertex attributes,
<bloom> likewise the code of the fragment shader depends on the format of the framebuffer
<bloom> That means you *can't* compile shaders up front,
<bloom> rather you have to wait until the app is actually drawing things and potentially have to recompile many times with different shader "keys" - combinations of "leaky" state
<bloom> ("shader variants")
<bloom> a number of architectures are specifically designed to eliminate all shader variants in e.g. core OpenGL ES
<bloom> Apple has gone the other end... why? well, because they can - Metal requires all this info at pipeline create time anyway, so why not use it? /s
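The recompile-on-demand pattern described here is often handled with a variant cache keyed on the shader plus the state that leaks into codegen. The sketch below is a generic illustration, not Mesa's implementation: `ShaderKey`, `VariantCache` and the `compile_fn` callback are hypothetical names, and the format strings are placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class ShaderKey:
    """HYPOTHETICAL key: the 'leaky' state baked into the generated code.

    For a vertex shader this would be the vertex attribute formats, for a
    fragment shader the framebuffer formats.
    """
    formats: Tuple[str, ...]


class VariantCache:
    """Compile a (source, key) pair at most once and reuse it afterwards."""

    def __init__(self, compile_fn: Callable[[str, ShaderKey], bytes]):
        self._compile = compile_fn
        self._variants: Dict[Tuple[str, ShaderKey], bytes] = {}
        self.compiles = 0  # how many real compiles happened

    def get(self, source: str, key: ShaderKey) -> bytes:
        slot = (source, key)
        if slot not in self._variants:
            # Only known at draw time, so this may run in the middle of a frame.
            self.compiles += 1
            self._variants[slot] = self._compile(source, key)
        return self._variants[slot]


# Stand-in compiler that just tags the source with its key.
cache = VariantCache(lambda src, key: f"{src}|{key.formats}".encode())
cache.get("fs_main", ShaderKey(("rgba8",)))
cache.get("fs_main", ShaderKey(("rgba8",)))    # cache hit, no recompile
cache.get("fs_main", ShaderKey(("rgba16f",)))  # new format, new variant
print(cache.compiles)
```

Each distinct format combination costs one extra compile per shader, which is the overhead being complained about.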
<dougall> ah, that makes sense, thanks
odmir has quit [Ping timeout: 240 seconds]
<chrisf> only trouble for GL. we know this upfront for vulkan too
<bloom> chrisf: still means a lot more recompiles if the app swaps things in the pipeline
<chrisf> may be feasible to patch it?
<chrisf> is just the shader epilog that's affected?
<bloom> tbd
<bloom> I've just compiled my first shader (from GLSL down to AGX machine code) with Mesa :)
<dougall> :o
<bloom> it's gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0), don't get too excited :p
<bloom> doing all that dev on linux, let's see if it works on macos too :-p
<bloom> It does not work. Down the debug hole.
<bloom> Glaring problems include missing a stop instruction :-p
phiologe has quit [Ping timeout: 250 seconds]
phiologe has joined #asahi-gpu
<bloom> Also, while obviously I can't enforce this, I kindly ask y'all not to make a splash about that before I can get around to blogging ;-P
<bloom> ok, stop/trap implemented
<bloom> ok, now it doesn't fault but there are crazy colours, guessing I screwed up the register alloc
<bloom> ofc I don't know the regalloc field for FS...
<bloom> Yep, there it is.
<bloom> First shader compiled from GLSL with mesa running successfully on the hardware
<bloom> not bad seeing as I started writing the compiler yesterday :p
mxw39 has joined #asahi-gpu
<dougall> haha awesome!
<bloom> ld_vary is up next at which point I can start writing real shaders and implementing the heaps of ALU needed for anything interesting
<bloom> thanks :)
<bloom> have ld_var emitted but some cmdbuf work remains to use it in the demo..
j`ey_ has joined #asahi-gpu
mofux[m] has quit [*.net *.split]
meiji163[m] has quit [*.net *.split]
izzyisles[m] has quit [*.net *.split]
winocm has quit [*.net *.split]
j`ey has quit [*.net *.split]
winocm has joined #asahi-gpu
meiji163[m] has joined #asahi-gpu
izzyisles[m] has joined #asahi-gpu
mofux[m] has joined #asahi-gpu
mxw39 has quit [Quit: Konversation terminated!]
mxw39 has joined #asahi-gpu
mxw39 has quit [Quit: Konversation terminated!]
mxw39 has joined #asahi-gpu
the-mentor3 has quit [Quit: The Lounge - https://thelounge.chat]
the-mentor3 has joined #asahi-gpu
mxw39 has quit [Quit: Konversation terminated!]
mxw39 has joined #asahi-gpu
j`ey_ is now known as j`ey
j`ey has quit [Changing host]
j`ey has joined #asahi-gpu
cwabbott has joined #asahi-gpu
jaalsa has joined #asahi-gpu
macc24 has joined #asahi-gpu
odmir has joined #asahi-gpu
odmir has quit [Ping timeout: 240 seconds]
artemist has quit [Quit: artemist]
<chrisf> bloom: you seem to be making a ton of progress
<bloom> :)
<dhewg> woot, sounds like a milestone to me!
radex has joined #asahi-gpu
kendfinger has joined #asahi-gpu
vlixa has quit [Remote host closed the connection]
vlixa has joined #asahi-gpu
jixbo has joined #asahi-gpu
HeN has quit [Quit: Connection closed for inactivity]
jixbo has quit [Quit: jixbo]