alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - Discord Discard
<alyssa> anarsoul|2: Since rn when we send invalid cmdstreams we get an opaque DATA_INVALID_FAULT and trying to debug from there is a crapshoot
<Lyude> well keep in mind most of the people trying to debug this are still familiarizing themselves with the mesa portions of panfrost too :p
<Lyude> it's quite likely that's another reason it's taking a while
<alyssa> Hm?
<alyssa> I meant for myself too?
<anarsoul|2> alyssa: you probably need to dump cmdstream, not to validate it
<Lyude> alyssa: oh
<Lyude> I didn't know you were debugging this as well
<alyssa> Lyude: Trying to? :
<alyssa> :P
<alyssa> anarsoul|2: Dumps are massive and it's rarely obvious looking at them what's wrong
<anarsoul|2> hm
<Lyude> we could try to get the actual kernel replay in mali_kbase working
<Lyude> so then we could actually modify the command stream and play around with it, then isolate where the issue is
<alyssa> ...That's not what it's for
<Lyude> it isn't? o_O
<alyssa> It's a very specific errata workaround, not a debug feature :p
<Lyude> really? wtf
<Lyude> how exactly does that erratum work
<alyssa> Lyude: IIRC, hardware race condition in the tiler that causes geometry stuff to fail nondeterministically, so they try resubmitting the job a bunch of times until it goes through
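The "resubmit until it goes through" idea alyssa describes can be sketched as a bounded retry loop. This is only an illustration of the pattern, not the mali_kbase replay code: `submit_fn`, `submit_with_retry` and `flaky_submit` are hypothetical names, and the flaky hardware is simulated with a failure counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job-submission callback: returns true when the job completes. */
typedef bool (*submit_fn)(void *job);

/* Resubmit `job` until it succeeds or `max_attempts` is reached.
 * Returns the number of attempts used, or -1 if every attempt failed. */
static int submit_with_retry(submit_fn submit, void *job, int max_attempts)
{
    assert(submit != NULL && max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        if (submit(job))
            return attempt;
    }
    return -1;
}

/* Stand-in for flaky hardware: fails until the counter in `job` hits zero. */
static bool flaky_submit(void *job)
{
    int *failures_left = job;
    if (*failures_left > 0) {
        (*failures_left)--;
        return false;
    }
    return true;
}
```

The retry bound matters: a job that fails deterministically (like the tiler fault discussed later in this log) would otherwise be resubmitted forever.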
<Lyude> LOL
<Lyude> i mean it's not terribly surprising
<Lyude> but hw bugs like that always make me laugh
<alyssa> yupyup
<Lyude> reminds me of the powergating issues with nvidia tesla (the gpu generation, not the compute workhorses)
<Lyude> where with power gating enabled some bits of firmware in vram would randomly get corrupted, so they solved it by teaching the nvidia driver to periodically reupload all of the firmware whenever power gating was on
<alyssa> Nice
<alyssa> Trying to replay an apitrace for mpv: "error: waffle_context_create failed"
<alyssa> Bwah?
<alyssa> (Tracing/replaying es2gears works)
<alyssa> Oh, nvm, it's only when I'm trying to replay with panfrost that it bugs, I can replay with softpipe
paulk-leonov has quit [Ping timeout: 268 seconds]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Ping timeout: 252 seconds]
klaxa has quit [Ping timeout: 250 seconds]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Ping timeout: 240 seconds]
paulk-leonov has joined #panfrost
NeuroScr has quit [Ping timeout: 240 seconds]
NeuroScr has joined #panfrost
_whitelogger has joined #panfrost
anarsoul|2 has quit [Ping timeout: 240 seconds]
urjaman has quit [Ping timeout: 250 seconds]
urjaman has joined #panfrost
TheCycoONE has joined #panfrost
TheCycoONE has quit [Ping timeout: 250 seconds]
TheCycoONE has joined #panfrost
<HdkR> Lyude: I had to purchase that Razer Blade Stealth 13.3" 4k with the MX150. So I'll just come to you with problems right? :)
<alyssa> So there's no hw protection against infinite loops in branching shaders
<alyssa> It doesn't seem to cause any problems (you can just quit the app and restart) but I was rather expecting a timeout or something :P
* alyssa remembers this for when she does raytracing on a Mali just for fun
<HdkR> alyssa: You would need to have a watchdog outside the GPU
<HdkR> Only D3D9 had HW protection against infinite loops
<alyssa> I s'pose
<alyssa> Well, I guess loops work now.
<alyssa> Just not the break part
<HdkR> er, maybe D3D9 didn't protect against it in all cases either... It only made recursive calls bottom out at 32 deep..
<alyssa> Hmmm
mifritscher has quit [Ping timeout: 252 seconds]
mifritscher has joined #panfrost
anarsoul|2 has joined #panfrost
<alyssa> Almost have loops going
<Lyude> HdkR: it won't work!
<Lyude> I told you :s
<HdkR> =O!
<HdkR> The MX150 or the laptop entirely? :P
<Lyude> The laptop will not be able to shut off the GPU and nouveau will hang on the GPU
<HdkR> neat
<HdkR> Am I not able to just have the GPU not turn on? :P
<Lyude> Very possibly not
<HdkR> Interesting. I'll have to mess with it in the days before the end of the return period
<Lyude> It's due to firmware issues Nvidia hasn't wanted to fix
<HdkR> hm
<HdkR> Firmware issues outside of the signed PMU blobs?
<tomeu> alyssa: thanks, that's useful
<tomeu> alyssa: any hints at how we'd implement that?
chewitt has joined #panfrost
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
ppchain has quit [Quit: No Ping reply in 180 seconds.]
<narmstrong> alyssa: here is the apitrace of kmscube (master branch of apitrace on github, master branch of kmscube) http://termbin.com/emu7
<narmstrong> alyssa: On the panwrap dump, on the last draw leading to the FAULT, it faults on the first JOB_TYPE_TILER, but I don't have a f***** clue what's wrong about this particular tiler job... and it fails systematically, maybe it needs the replay feature ?
chewitt has quit [Quit: Zzz..]
paulk-leonov has quit [Ping timeout: 252 seconds]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Excess Flood]
<narmstrong> Hmm, it locks at exactly the 127th draw
<narmstrong> But if I relaunch kmscube it's ok
paulk-leonov has joined #panfrost
paulk-leonov has quit [Ping timeout: 252 seconds]
<narmstrong> and I changed the content of the draw (redrawing the same or skipping some) so it's the content of the draw, but the number
<narmstrong> *not the content
<tomeu> nice finding :)
<tomeu> could be related to the atom numbers?
<narmstrong> I should have played with that much earlier
<tomeu> ah, right
<tomeu> it resets at 256
<tomeu> and there's two atoms per draw job
<narmstrong> oh yes, maybe
<tomeu> don't know why I don't see this problem though
paulk-leonov has joined #panfrost
<tomeu> uint8_t atom_counter = 0;
<tomeu> see allocate_atom()
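tomeu's arithmetic (an 8-bit counter, two atoms per draw) can be checked with a small sketch. `allocate_atom_sketch` is a simplified stand-in for the `allocate_atom()` named above, not the real function, and `first_draw_after_wrap` is a hypothetical helper. With two atoms per draw the counter wraps after 128 draws, which is close to the 127th draw narmstrong reports; the exact off-by-one depends on details not shown in this log.

```c
#include <assert.h>
#include <stdint.h>

/* An 8-bit counter wraps from 255 back to 0 on increment. */
static uint8_t atom_counter = 0;

/* Simplified stand-in for the allocate_atom() named in the chat. */
static uint8_t allocate_atom_sketch(void)
{
    return atom_counter++;
}

/* Return the 0-based index of the first draw whose first atom number is
 * lower than that of the previous draw, i.e. the first draw after a wrap. */
static int first_draw_after_wrap(int atoms_per_draw)
{
    assert(atoms_per_draw > 0 && atoms_per_draw < 256);
    atom_counter = 0;
    int prev_first = -1;
    for (int draw = 0; ; draw++) {
        int first = allocate_atom_sketch();
        for (int i = 1; i < atoms_per_draw; i++)
            allocate_atom_sketch();
        if (first < prev_first)
            return draw;
        prev_first = first;
    }
}
```

A wrap like this is harmless by itself; it only becomes a problem if something still depends on an atom number that is about to be reused.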
* narmstrong looking at this
paulk-leonov has quit [Ping timeout: 260 seconds]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Excess Flood]
paulk-leonov has joined #panfrost
<narmstrong> tomeu: nah it wraps correctly, but I suspect we don't read()
<narmstrong> `last_fragment_flushed = true;` in force_flush_fragment()
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
<tomeu> we don't read()?
<narmstrong> nop, it's disabled
<narmstrong> \o/
<narmstrong> now it works
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
* tomeu is curious to see the diff
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
<narmstrong> i feel stupid, you already fixed this
<narmstrong> but it still failed
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
* narmstrong Time to clean and resync with tomeu’s branch...
<narmstrong> Would be simpler if the winsys was in panfrost repo !
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
<tomeu> yeah, maybe we should merge it all
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
urjaman has quit [Ping timeout: 250 seconds]
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
chewitt has joined #panfrost
urjaman has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
tomeu has left #panfrost [#panfrost]
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
<chewitt> please merge down .. it's impossible to track stuff when there are many repos involved
paulk-leonov has joined #panfrost
<narmstrong> chewitt: ok I have a 18.3 branch on top of lima
<narmstrong> that works
<narmstrong> no more locks
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
<narmstrong> driver build: make ARCH=arm64 CROSS_COMPILE=aarch64-linux-gnu- KDIR=/home/narmstrong/projects/amlogic/linux-upstream/ CONFIG_NAME=config.meson-gxm
<narmstrong> mesa build: meson -Dgbm=true -Dglx=disabled -Dgallium-drivers=lima,panfrost,meson -Dplatforms=drm build
<chewitt> I'll have a poke
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
tomeu has joined #panfrost
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
chewitt has quit [Quit: Zzz..]
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Max SendQ exceeded]
paulk-leonov has joined #panfrost
paulk-leonov has quit [Remote host closed the connection]
chewitt has joined #panfrost
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
chewitt has quit [Quit: Zzz..]
raster has joined #panfrost
chewitt has joined #panfrost
<narmstrong> tomeu: ok, your fix was badly applied ;-) next time I cherry-pick your patches...
<narmstrong> alyssa: src/gallium/drivers/panfrost/include/meson.build breaks cross-compilation !
<narmstrong> damn I feel stupid...
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
<Lyude> narmstrong: that would be my fault
<Lyude> but also I only use cross compilation
<Lyude> what exactly is breaking?
<narmstrong> `src/gallium/drivers/panfrost/include/meson.build:3:16: ERROR: Can not run test applications in this cross environment.`
<Lyude> narmstrong: gah, it's supposed to print a notice that you can work around it by setting page_size manually in your toolchain file. tbh though setting it up so you can run test applications with cross compilation isn't terribly difficult
<Lyude> (that's how I have it setup here, lemme grab my toolchain file)
<Lyude> in that instance /mnt/amethyst is an nfs mount of my vim2's rootfs
<narmstrong> interesting, but it's to cross-build in a build-system (LibreELEC for Kodi) so this kind of workaround is hard to implement
<Lyude> narmstrong: ahhh, just set page_size = 12 in your toolchain properties then
<Lyude> or page_shift rather
<Lyude> unfortunately without that defined mali_kbase doesn't really compile
<narmstrong> bad !
<narmstrong> did someone fix the `mali_ptr ptr : 64 - 10;` error on 32bit builds ?
<Lyude> I haven't tried yet-should probably get my T6xx system setup for that
<Lyude> narmstrong: yeah it's annoying
<chewitt> LE has 64bit kernel and 32bit userspace for compatibility with 32bit only libwidevine for Netflix/Amazon support
<Lyude> almost considered just adding a constructor to our mesa driver to just grab the pagesize/pageshift at runtime, then point the macro that mali_kbase uses at the global variable
<narmstrong> Lyude: the macro could also store it once at first call and use a global for the rest of the runtime (dirty but won't impact performance)
<Lyude> narmstrong: mhm, true
<Lyude> either works
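The "query once, then reuse a global" variant narmstrong suggests could look roughly like the sketch below. It assumes a POSIX system where `sysconf(_SC_PAGESIZE)` is available; `runtime_page_shift`, `PAGE_SHIFT_SKETCH` and `PAGE_SIZE_SKETCH` are hypothetical names, not the macros mali_kbase actually uses.

```c
#include <assert.h>
#include <unistd.h>

/* Cached page shift; 0 means "not queried yet". */
static unsigned cached_page_shift = 0;

/* Query the page size on first use and cache log2 of it.
 * Not thread-safe as written; a constructor would avoid the race. */
static unsigned runtime_page_shift(void)
{
    if (cached_page_shift == 0) {
        long size = sysconf(_SC_PAGESIZE);
        assert(size > 0);
        unsigned shift = 0;
        while ((1L << shift) < size)
            shift++;
        cached_page_shift = shift;
    }
    return cached_page_shift;
}

/* Hypothetical replacements for a compile-time PAGE_SHIFT / PAGE_SIZE. */
#define PAGE_SHIFT_SKETCH (runtime_page_shift())
#define PAGE_SIZE_SKETCH  (1UL << PAGE_SHIFT_SKETCH)
```

The trade-off is that these macros are no longer compile-time constants, so any header that uses the page size in an array bound or a static initializer would still need changes.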
<narmstrong> So the ARM Mali blob had this hardcoded ?
<Lyude> narmstrong: I think? iirc android defines PAGE_SIZE
<Lyude> *PAGE_SHIFT
<Lyude> it's hard to tell
<alyssa> tomeu: Soooort of? In mali_payload_fragment, we have a pair for min_tile_coord/max_tile_coord, which controls (up to tile granularity) how much we write out. (TODO: Combine with scissoring for a substantial perf optimization, probably). If you do a partial writeout, I'm assuming you need to set those appropriately to only "select" the window you care about. If the partial writeout doesn't divide nicely along tile lines (8x8 pixel tiles),
<alyssa> I'm assuming you also need to do a driver-side glScissor.
<alyssa> That said, I don't know how the above scheme translates into the corresponding OpenGL extension
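As a rough illustration of the scheme alyssa outlines, the sketch below maps a pixel rectangle to the range of tiles that covers it and flags when the rectangle is not tile-aligned, which is the case where a driver-side scissor would also be needed. `struct tile_window` and `tiles_for_rect` are hypothetical; the inclusive min/max convention is an assumption, and how the hardware actually encodes min_tile_coord/max_tile_coord is not shown in this log. The tile size is a parameter; the chat mentions 8x8.

```c
#include <assert.h>
#include <stdbool.h>

struct tile_window {
    unsigned min_x, min_y;   /* first tile covered, inclusive (assumed) */
    unsigned max_x, max_y;   /* last tile covered, inclusive (assumed) */
    bool needs_scissor;      /* true if the rect is not tile-aligned */
};

/* Pixel rectangle [x0, x1) x [y0, y1), must be non-empty. */
static struct tile_window
tiles_for_rect(unsigned x0, unsigned y0, unsigned x1, unsigned y1,
               unsigned tile)
{
    assert(tile > 0 && x1 > x0 && y1 > y0);
    struct tile_window w;
    w.min_x = x0 / tile;
    w.min_y = y0 / tile;
    w.max_x = (x1 - 1) / tile;
    w.max_y = (y1 - 1) / tile;
    w.needs_scissor = (x0 % tile) || (y0 % tile) ||
                      (x1 % tile) || (y1 % tile);
    return w;
}
```

For example, with 8-pixel tiles the rectangle from (8, 16) to (24, 32) covers tiles (1, 2) through (2, 3) exactly, while a rectangle starting at x = 3 would cover tile column 0 only partially and need the scissor.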
<alyssa> I could look into it if you'd like (I'm itching to do some real work again :P)
<tomeu> alyssa: well, that would be great!
<tomeu> even if you only get far enough to give some more detailed directions, that would be helpful
<tomeu> my panfrost work keeps being trampled by other stuff, but hopefully next week it will be calmer
<alyssa> Sure :)
<tomeu> if partial updates worked, then it would be much more comfortable to work on egl wayland clients and fixing things such as the gallium HUD
<alyssa> Alright
<Lyude> btw tomeu: I will make sure to create branches and stuff on the main panfrost page this weekend
<alyssa> tomeu: The problem with partial updates, fwiw, is that there's some intense pipelining going on and tilers have to clear everything every time, so unless you use the partial update egl extensions, the driver has to incur a serious performance hit loading everything back into memory first
<narmstrong> Lyude: for the mali_kbase driver, I did a clean rebase and clean support for the Amlogic S912, did you have a look at it ? https://gitlab.freedesktop.org/narmstrong/mali_kbase/tree/TX041-SW-99002-r27p0-01rel0_panfrost
<narmstrong> Lyude:
<narmstrong> Lyude: you should not have the initial fault anymore
<narmstrong> Lyude: seems Amlogic failed to integrate the T820 correctly and we need to manually enable/reset the cores
<alyssa> Rockchip <3
chewitt has quit [Quit: Zzz..]
<narmstrong> alyssa: does the mali_ptr need to be 64bit ?
<urjaman> yes
<urjaman> mali has 64bit pointers even if the ARM core has 32bit
<alyssa> ^^
<Lyude> narmstrong: a-ha!
<Lyude> I guessed that might be the case :)
<narmstrong> ok so mali_ptr should not be uintptr_t then...
<alyssa> Definitely not, no.
<Lyude> I may have done that by accident, whoops :p
<Lyude> narmstrong: anyway-I'll take a look next chance I get
<alyssa> No worries :)
<alyssa> (On my old branch, we have u64 mali_ptr. No idea what happened with the kbase stuff)
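The point about `mali_ptr` can be made concrete with a short sketch. If the type is `uintptr_t`, it is only 32 bits wide on a 32-bit userspace, so a bit-field of width `64 - 10` no longer fits in it; a fixed 64-bit type avoids that. The names below (`mali_ptr_sketch`, `packed_ptr_sketch`, `roundtrip_ptr`) and the 10-bit `flags` field are hypothetical, chosen only to mirror the `64 - 10` width quoted earlier. Bit-fields on `uint64_t` are implementation-defined in C, though GCC and Clang accept them.

```c
#include <assert.h>
#include <stdint.h>

/* GPU addresses are 64-bit regardless of the CPU's pointer width. */
typedef uint64_t mali_ptr_sketch;
_Static_assert(sizeof(mali_ptr_sketch) == 8,
               "GPU pointer type must be 64 bits on every build");

/* Hypothetical packed descriptor: 10 low bits of flags plus a 54-bit
 * address field. With a 32-bit base type the 54-bit width would be
 * rejected by the compiler. */
struct packed_ptr_sketch {
    uint64_t flags : 10;
    uint64_t ptr   : 64 - 10;
};

/* Store a value in the 54-bit field and read it back. */
static uint64_t roundtrip_ptr(uint64_t value)
{
    struct packed_ptr_sketch p = { .flags = 0, .ptr = value };
    return p.ptr;
}
```

The static assertion turns a silent truncation on 32-bit builds into a compile-time error, which is usually the easier failure to debug.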
<narmstrong> No problem, it's still very WiP ;-) just trying to make it go further !
chewitt has joined #panfrost
chewitt has quit [Quit: Zzz..]
klaxa has joined #panfrost
anarsoul|2 has quit [Ping timeout: 272 seconds]
raster has quit [Remote host closed the connection]
jernej has joined #panfrost
<hanetzer> finally got distcc setup :)
chewitt has joined #panfrost
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
anarsoul|2 has joined #panfrost
chewitt has quit [Quit: Zzz..]
belgin has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
belgin has quit [Quit: Leaving]
urjaman has quit [Ping timeout: 250 seconds]
belgin has joined #panfrost
urjaman has joined #panfrost
pH5 has quit [Quit: -_-]
AntonioND has joined #panfrost
AntonioND has quit [Quit: Quit]