<neonking>
I'm glad to be here, and first of all, greetings to all the devs who made Panfrost; that can't have been an easy task
<neonking>
also saw today that Collabora implemented a draft Vulkan driver :o
<macc24>
correct
<HdkR>
tomeu: Nice. Now we just need to get those applications running natively on panfrost capable boards :)
<macc24>
if only there was an emulator that could run x86 code fast on arm... like box86
<alyssa>
macc24: especially x86_64 fast on modern armv8
<alyssa>
we could call it "fast x86 emulator"
<macc24>
yeah
<alyssa>
Maybe Valve would be interested in that sort of thing?
<macc24>
maybe
<alyssa>
Sounds like the kinda thing HdkR might be into.
<macc24>
hmmmm
<macc24>
maybe we should convince him to do that?
<HdkR>
pfft, dorks
<amonakov>
you can't properly emulate multicore amd64 on armv8 because of the latter's weaker memory model (the hw needs to provide native TSO)
<HdkR>
You can
<HdkR>
It'll just tear across 16byte and 64byte boundaries
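To make the tearing concrete, here is a minimal C sketch that checks whether an emulated access spans two aligned blocks of a given size, such as 16 or 64 bytes. The helper name `crosses_boundary` is hypothetical and not taken from FEX or any other emulator; under the scheme described in this log, only accesses for which it returns true could be observed half-written by another core.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true if an access of `size` bytes starting at `addr`
 * touches two different `granule`-aligned blocks (granule must be a power
 * of two, e.g. 16 or 64). Such an access cannot be done as one aligned
 * atomic operation, so it may tear at the block boundary. */
bool crosses_boundary(uintptr_t addr, size_t size, uintptr_t granule)
{
    assert(size > 0);
    assert(granule != 0 && (granule & (granule - 1)) == 0);
    uintptr_t first_block = addr & ~(granule - 1);
    uintptr_t last_block = (addr + size - 1) & ~(granule - 1);
    return first_block != last_block;
}
```

For example, an 8-byte access at 0x100C crosses a 16-byte boundary but stays within one 64-byte block, while the same access at 0x103C crosses both.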
<macc24>
close enough
<macc24>
oh wait, byte, not gigabyte
<HdkR>
AMD also tears across 64byte, not a big deal
<HdkR>
16byte also isn't resolved with ARMv8.1; it needs ARMv8.4 LSE2
<HdkR>
64byte is resolved with TME, which isn't in any CPU yet
<amonakov>
HdkR: so how would you translate a plain store?
<HdkR>
amonakov: Atomic store
<amonakov>
but then it can't have an unaligned address, can it?
<HdkR>
Catch the SIGBUS and translate it to an aligned 16byte LDAXP+STLXP loop
<HdkR>
This is why it'll tear at 16byte and why LSE2 fixes that issue
<HdkR>
LSE2 enables unaligned atomics
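A rough, portable illustration of the "atomic store, then fall back to an aligned exclusive loop" idea, assuming C11 atomics. It does not catch SIGBUS or emit LDAXP/STLXP; instead, a compare-and-swap retry loop on an aligned 8-byte word stands in for the 16-byte exclusive-pair loop, and both function names are hypothetical. A store that straddles a word boundary is split into two independently atomic updates, which is exactly where the tear shows up.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: atomically replace `size` bytes at byte offset `off`
 * inside one naturally aligned 8-byte word. The CAS retry loop plays the
 * role of an AArch64 load-exclusive/store-exclusive loop. */
void store_within_word(_Atomic uint64_t *word, size_t off,
                       const void *src, size_t size)
{
    assert(off + size <= sizeof(uint64_t));
    uint64_t old = atomic_load_explicit(word, memory_order_acquire);
    uint64_t desired;
    do {
        desired = old;
        /* Patch only the target bytes; memcpy keeps this endian-neutral. */
        memcpy((unsigned char *)&desired + off, src, size);
        /* On failure, `old` is refreshed with the current value and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(
                 word, &old, desired,
                 memory_order_acq_rel, memory_order_acquire));
}

/* Hypothetical emulated store of `size` bytes at byte position `pos` in an
 * array of aligned words. Each word is updated atomically, but a store that
 * straddles a word boundary becomes two separate updates: the tear. */
void emulated_unaligned_store(_Atomic uint64_t *words, size_t pos,
                              const void *src, size_t size)
{
    const unsigned char *bytes = src;
    while (size > 0) {
        size_t idx = pos / sizeof(uint64_t);
        size_t off = pos % sizeof(uint64_t);
        size_t chunk = sizeof(uint64_t) - off;
        if (chunk > size)
            chunk = size;
        store_within_word(&words[idx], off, bytes, chunk);
        pos += chunk;
        bytes += chunk;
        size -= chunk;
    }
}
```

Shrinking the word to 8 bytes keeps the sketch free of 128-bit atomics, which may need libatomic; the structure would be the same with a 16-byte granule.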
<alyssa>
HdkR: ..wait, atomic store for a plain store? won't that nerf your perf?
<HdkR>
Yes. Yes it does.
<macc24>
HdkR: what if you just align it back?
<HdkR>
You can't guarantee that
<amonakov>
yeah, unaligned memcpys are common, that's why I had "properly" in my initial message
<HdkR>
It is proper aside from the tear on the edge
<amonakov>
a SIGBUS for each store in an unaligned memcpy? how is that usable?
<HdkR>
memcpy internally tries to align the buffer, so it usually ends up as a small number of unaligned moves compared to the whole buffer
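The following C sketch shows why only a few moves remain unaligned, assuming a copy routine that aligns the destination pointer first (real memcpy implementations differ in whether and how they do this). Only the head before the first aligned address and the tail after the last full aligned block are unaligned; `split_copy` and `struct copy_split` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical breakdown of an alignment-aware copy of n bytes. */
struct copy_split {
    size_t head; /* unaligned bytes copied before dst reaches alignment */
    size_t body; /* bytes copied with aligned `align`-sized moves */
    size_t tail; /* unaligned bytes left after the last full aligned move */
};

struct copy_split split_copy(uintptr_t dst, size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    struct copy_split s;
    size_t misalign = (size_t)(dst & (align - 1));
    s.head = misalign ? align - misalign : 0;
    if (s.head > n)
        s.head = n; /* tiny copies may never reach an aligned address */
    size_t rest = n - s.head;
    s.body = rest - rest % align;
    s.tail = rest - s.body;
    return s;
}
```

For example, copying 4096 bytes to address 0x1003 with 16-byte alignment leaves 13 head bytes and 3 tail bytes, with the remaining 4080 bytes moved in aligned chunks.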
<HdkR>
It's a concession for correctness. We have some speed hacks that can be done around those to lighten the pain
<HdkR>
But with the majority of the SIGBUSes going away with ARMv8.4? Kind of not a worry on post-Hercules cores. Just need ARM to also provide a TSO enablement register like Apple :)
<alyssa>
Looking forward to FEX on the M1 lol
<HdkR>
Me too
<HdkR>
We need native linux and prctl for that register :)
<HdkR>
Hm, Civ5 is 32bit but works; it needs perf improvements because it's 32bit. Civ6 is 64bit and works, but there's a timing bug to figure out, which means it renders at half the speed it could
<HdkR>
Hollow Knight is a Unity game, so it's a bit rough to support
<macc24>
why
<HdkR>
Its garbage collector's signals require state reconstruction, which will just take time to support