marcan changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics RE and development | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
aratuk has joined #asahi-gpu
aratuk has quit [Ping timeout: 265 seconds]
<bloom> Oh, and by properly splitting out unaligned/aligned portions, the M1 gets the job done in 0.33s while the RK3399 is at ~1.58s
<bloom> What's intriguing there is that the optimization helps only 4% on the Rockchip but 23% on the Apple.
<bloom> The optimization reduces ALU work while keeping memory access constant. So it seems not only is the ALU on the Mac all around faster, the memory is _so_ much better.
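A minimal sketch of the aligned/unaligned split bloom mentions, not the actual routine being benchmarked. The function name `split_span` and the 16-unit tile size are assumptions for illustration. The idea is that only the tile-aligned body goes through the fast path, while the ragged head and tail fall back to a generic path.

```python
def split_span(start, length, tile=16):
    """Split [start, start + length) into (offset, size) pieces:
    an unaligned head, a tile-aligned body, and an unaligned tail.

    `tile` is a hypothetical tile width; real hardware layouts differ.
    """
    end = start + length
    # Round start up to the next tile boundary, but never past the end.
    body_start = min(end, -(-start // tile) * tile)
    # Round end down to a tile boundary, but never before body_start.
    body_end = max(body_start, (end // tile) * tile)
    head = (start, body_start - start)
    body = (body_start, body_end - body_start)
    tail = (body_end, end - body_end)
    return head, body, tail


if __name__ == "__main__":
    # A span covering units 5..44 with 16-wide tiles.
    print(split_span(5, 40))  # ((5, 11), (16, 16), (32, 13))
```

The three sizes always sum to the original length, so a caller can dispatch each piece independently without overlap or gaps.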
<davidrysk[m]> This was making the rounds the other day, but these tests are too elementary to really provide any interesting info
<bloom> I don't think that should come as a surprise to anyone, memory performance is critical to both CPU and GPU performance, and if you want to kick ass at both for an SoC, optimizing the hell out of the memory access is a 2 for 1 deal..
<davidrysk[m]> I wish we knew more about Apple's (CPU) microarchitecture. From what I can tell, they're still using the scheduler model that was intended for Cyclone (A7) in LLVM, and that chip had much less instruction level parallelism available.
<bloom> icecream95: Oh, hello there.
<bloom> If this is about your NEON routine, yes, you win, I'm over it, we both know you're better at writing tiling routines than me :P
<icecream95> bloom: IIRC My NEON tiling routine was "only" about 3x faster
<bloom> How could I forget?
icecream95 has quit [Ping timeout: 265 seconds]
icecream95 has joined #asahi-gpu
klaus has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<krbtgt> iirc wrt tiling: i think powervr/nvidia/apple are the biggest users of it
<bloom> krbtgt: Everybody uses it as much as they can.
<krbtgt> huh, i thought other companies had different strategies, but it's been a while since i looked into it
<bloom> krbtgt: The specific tiling patterns vary, but everyone uses some form of tiling.
<davidrysk[m]> found this with a quick google :)
<bloom> Mary_-: Sorry for forgetting to respond, but neat stuff! :) re: metallib
<bloom> tiling != tiling..
<bloom> Overloaded term :(
ransom has joined #asahi-gpu
Tokamak has joined #asahi-gpu
aratuk has joined #asahi-gpu
<bloom> Wat?
<bloom> Why are u64 loads/stores more than 2x slower than u32 loads/stores on a 64-bit machine?
<bloom> (the rockchip, mac is off)
aratuk has quit [Ping timeout: 264 seconds]
ransom has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Tokamak has quit [Remote host closed the connection]
<marcan> bloom: not just memory though, the CPU pipe to memory
<marcan> it seems their load/store buffers are way wider than anything else
<marcan> more outstanding transactions
Tokamak has joined #asahi-gpu
ransom has joined #asahi-gpu
q3k|m has quit [Remote host closed the connection]
ransom has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
Tokamak has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
klaus_ has joined #asahi-gpu
aratuk has joined #asahi-gpu
aratuk has quit [Ping timeout: 264 seconds]
icecream95 has quit [Ping timeout: 240 seconds]
<marcan> PSA: we are adjusting the RE policy and banning userspace GPU stack disassembly, since it is unlikely that approach will be necessary. at the same time, this channel is no longer allowed for any kind of disasm/binary RE
marcan changed the topic of #asahi-gpu to: Asahi Linux: porting Linux to Apple Silicon macs | GPU / 3D graphics stack black-box RE and development (NO binary reversing) | Keep things on topic | GitHub: https://alx.sh/g | Wiki: https://alx.sh/w | Logs: https://alx.sh/l/asahi-gpu
<marcan> we are not banning GPU kernel driver disassembly (but see https://alx.sh/re for the policy), but that discussion will be confined to #asahi-re
<marcan> this channel is therefore now "clean"
<marcan> the discussion here should focus on black-box RE of the userspace stack (what bloom is doing), and on hardware details and kernel driver development discussion (strictly only *results*, i.e. documentation resulting from kernel driver RE, are allowed; absolutely no discussion of the actual kext code)
<marcan> that is, this channel is the clean end of the clean room now
DarthCloud has quit [Ping timeout: 240 seconds]
DarthCloud has joined #asahi-gpu
aratuk has joined #asahi-gpu
aratuk has quit [Ping timeout: 265 seconds]
mogery has joined #asahi-gpu
aratuk has joined #asahi-gpu
aratuk has quit [Ping timeout: 272 seconds]
q3k|m has joined #asahi-gpu
robher has joined #asahi-gpu
sumoon[m] has joined #asahi-gpu
klaus_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
HeN has quit [Quit: Connection closed for inactivity]
Tokamak has joined #asahi-gpu
klaus_ has joined #asahi-gpu
aratuk has joined #asahi-gpu
<davidrysk[m]> bloom: Is there any documentation on retrieving compiled and assembled object code of a shader?
aratuk has quit [Ping timeout: 264 seconds]
ransom has joined #asahi-gpu
ransom_ has joined #asahi-gpu
ransom has quit [Ping timeout: 272 seconds]
ransom has joined #asahi-gpu
modwizcode has joined #asahi-gpu
ransom_ has quit [Ping timeout: 256 seconds]
mogeryy has joined #asahi-gpu
mogery has quit [Ping timeout: 260 seconds]
<bloom> davidrysk[m]: 'documentation'? no, fraid not
<bloom> I was hardcoding offsets
<bloom> but it shows up in mem_0.bin if you dump with the wrap library there
<bloom> and wrap should be clang'able by just *.c -framework IOKit -dylib or something like that
<bloom> (there is some code I need to push to help with that, but my local setup is still a giant mess of shell scripts split across two machines)
<davidrysk[m]> bloom: "Does that require dual licensing the other files too?" APSL sections 1.5 and 4 seem to say no
HeN has joined #asahi-gpu
<bloom> davidrysk[m]: Ah, thank you :)
aratuk has joined #asahi-gpu
aratuk has quit [Ping timeout: 256 seconds]
tiagom has joined #asahi-gpu
tiagom has quit [Quit: tiagom]
tiagom has joined #asahi-gpu
icecream95 has joined #asahi-gpu
mogeryy has quit [Quit: Leaving]
ransom has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
wolf511[m] has joined #asahi-gpu
DarthCloud has quit [Ping timeout: 240 seconds]
DarthCloud has joined #asahi-gpu
aratuk has joined #asahi-gpu
aratuk has quit [Ping timeout: 240 seconds]
tiagom has quit [Quit: tiagom]
ransom has joined #asahi-gpu
ransom has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]