ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
kaspter has joined #lima
jernej has quit [Quit: Free ZNC ~ Powered by LunarBNC: https://LunarBNC.net]
jernej has joined #lima
jernej has quit [Client Quit]
jernej has joined #lima
jernej has quit [Client Quit]
jernej has joined #lima
camus has joined #lima
kaspter has quit [Ping timeout: 246 seconds]
camus is now known as kaspter
camus has joined #lima
kaspter has quit [Ping timeout: 246 seconds]
camus is now known as kaspter
camus has joined #lima
camus1 has joined #lima
kaspter has quit [Ping timeout: 256 seconds]
camus1 is now known as kaspter
camus has quit [Ping timeout: 265 seconds]
<bshah> I am afraid I did not manage to bisect it (well, "bisect" is a strong word, because I don't have any
<bshah> erm
<bshah> I don't have a good commit to begin with, or even know which repo
<bshah> as I understand it, the frequency of this was previously quite low, but now it's an almost instant GPU crash on rotation :(
<anarsoul> so did it start with your compositor update?
<anarsoul> or with mesa update?
<anarsoul> or something else?
<bshah> now that I think about it, I realize that pmOS, where I cannot reproduce this, has mesa 20.2, and the other systems where I can reproduce it are on 20.3 or master
kaspter has quit [Ping timeout: 256 seconds]
camus has joined #lima
camus is now known as kaspter
<bshah> hm or not
<bshah> both had mesa 20.2.3 :(
<bshah> 20.3 was released just yesterday
Barada has joined #lima
Net147 has quit [Quit: Quit]
Net147 has joined #lima
kaspter has quit [Ping timeout: 265 seconds]
kaspter has joined #lima
Barada has quit [Quit: Barada]
Viciouss has quit [Ping timeout: 246 seconds]
<enunes> bshah: can you upload all shaders you captured somewhere, just to take a look?
<bshah> sure, give me a moment
<enunes> bshah: hmm, nothing too weird, I don't remember seeing one with that many texture references
<enunes> maybe trying to simplify the most complex ones and seeing if it makes a difference could be something
<enunes> but kind of a blind shot
<bshah> enunes: if I can somehow figure out exactly which shader is causing this, then I can do something about it; currently I have almost zero idea where to look :/
<enunes> as in, finding wherever they are defined in Qt or something, patching the shader (removing mostly loops, conditionals, long calculation sequences and things that are hard to optimize), rebuilding that component, and running it to see if it makes a difference
<enunes> I'd just try to grep some part of that shader in the sources for the involved components
<enunes> again, I don't know if this will solve anything, but it would be interesting data, to at least eliminate that possibility
<bshah> is there no way to add some debug output in mesa or something, to see what shader it is processing?
<bshah> mostly because these 20-ish shaders are kind of spread across 5-6 different repos
<enunes> I mostly use MESA_SHADER_CAPTURE_PATH, but you already have the shaders
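(For context: MESA_SHADER_CAPTURE_PATH is a standard Mesa debug variable that writes every GLSL shader the driver compiles into the given directory, one file per shader. A minimal usage sketch; kwin_wayland is just a stand-in for whatever GL client is being debugged:

    mkdir -p /tmp/shaders
    MESA_SHADER_CAPTURE_PATH=/tmp/shaders kwin_wayland

)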
<rellla> bshah: I didn't follow the whole discussion, but did you already try with LIMA_DEBUG=gp,pp ?
<bshah> no I did not, but I can try
<rellla> you can try first pp, then gp, and see what the lima compiler does with the shaders and whether there is anything dubious ...
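(For context: LIMA_DEBUG is the lima Mesa driver's debug variable, a comma-separated list of flags; the gp and pp flags print the vertex (gp) and fragment (pp) compiler output for every shader the application compiles. An illustrative invocation, with glmark2-es2-wayland standing in for any GLES client that triggers the suspect shaders:

    LIMA_DEBUG=pp glmark2-es2-wayland
    LIMA_DEBUG=gp,pp kwin_wayland

)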
<rellla> thanks for the link
<enunes> that probably requires some knowledge of how pp and gp work; I was thinking more of finding a way to help narrow the issue down without having to know that
<bshah> huh, of course, now with LIMA_DEBUG exported I somehow can't reproduce this
<bshah> :'(
<bshah> it had been running for like 4 mins, but that's already more than what I was able to get in the past
<rellla> enunes: different context question: shouldn't the gp scheduler be successful even if https://gitlab.freedesktop.org/mesa/mesa/-/blob/master/src/gallium/drivers/lima/ir/gp/scheduler.c#L1507 is not done?
camus has joined #lima
kaspter has quit [Ping timeout: 256 seconds]
camus is now known as kaspter
<enunes> rellla: I'm a bit out of the loop in gpir, but it seems like it tries to schedule a node as is, and if that's not successful it needs to insert a move for that node and try again later
<enunes> in the loop at 1563, so the logic makes sense to me
e is now known as demiurge
<rellla> true - I'm trying to figure out why we have an endless loop here ...
<enunes> endless loop while compiling or endless loop in the generated code?
Viciouss has joined #lima
<rellla> I tried the remaining deqp test that hits the 512 limit, and if I drop the sched_move I end up with a ready list that can't be scheduled any more and generates no-op instructions until we reach 512
<rellla> but I guess I have to re-add the sched_move and see what it does...
<enunes> yeah, it is probably important; it is by design, since the gp doesn't have registers, so those moves are needed to keep the values alive until their nodes can be scheduled
<rellla> I guess the bug results in having all the slots (mul0|mul1|add0|add1|pass|cmpl) blocked with sched_moves, so we are not able to schedule the instruction anymore
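(To illustrate the failure mode rellla describes: gpir packs nodes into a fixed set of slots per instruction, and when a value must survive until a later instruction, a move is inserted to carry it forward. A toy model in C, with made-up types rather than the real mesa data structures, shows how an instruction whose slots are all consumed by such moves leaves no room for the node that actually needs scheduling, so the scheduler can make no progress and keeps emitting filler until the 512-instruction cap:

    /* Toy model of the gpir "schedule or insert a move" slot pressure;
     * types and slot names are illustrative, not the real mesa code. */
    #include <stdbool.h>
    #include <stdio.h>

    enum slot { MUL0, MUL1, ADD0, ADD1, PASS, CMPL, NUM_SLOTS };

    struct instr {
        const char *holder[NUM_SLOTS]; /* NULL = slot still free */
    };

    /* try to place a node into any free slot of this instruction */
    static bool try_place(struct instr *in, const char *node)
    {
        for (int s = 0; s < NUM_SLOTS; s++) {
            if (!in->holder[s]) {
                in->holder[s] = node;
                printf("placed %s in slot %d\n", node, s);
                return true;
            }
        }
        return false;
    }

    int main(void)
    {
        struct instr in = {0};

        /* moves inserted to keep earlier values alive fill every slot... */
        for (int i = 0; i < NUM_SLOTS; i++)
            try_place(&in, "sched_move");

        /* ...so the node we actually need never fits; the real scheduler
         * would then emit no-op instructions until it hits the 512 limit */
        if (!try_place(&in, "real_node"))
            printf("all slots blocked: no progress possible\n");
        return 0;
    }

)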
<bshah> sorry for the noob question ... so we have been trying to narrow down the issue; one thing I noticed is that this crash happens only when the device is locked, and even after we dismantled our lock greeter to the bare bones it still caused the crash
<bshah> now our current theory is that there is some issue wrt windows in the background stopping their rendering when the screen is locked
<bshah> (well, not the windows, but the compositor deciding not to update any windows in the background)
camus has joined #lima
kaspter has quit [Ping timeout: 240 seconds]
camus is now known as kaspter
<enunes> bshah: when you hit the issue, what does it say in status= ?
<enunes> in dmesg
<bshah> enunes: the first failing process has:
<bshah> [ 85.037167] lima 1c40000.gpu: pp task error 0 int_state=0 status=1
<bshah> [ 85.043382] lima 1c40000.gpu: pp task error 1 int_state=0 status=1
<bshah> and then the next:
<bshah> [ 85.581390] lima 1c40000.gpu: pp task error 0 int_state=0 status=5
<bshah> [ 85.587612] lima 1c40000.gpu: pp task error 1 int_state=0 status=5
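(These errors come from the lima kernel driver's pp task error handling; int_state and status appear to be raw register values read back from the failing pp core. A simple way to watch for them live while reproducing, assuming a util-linux dmesg:

    dmesg -w | grep lima

)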
<enunes> so, a different status, but the same way to reproduce?
<bshah> they both happen at the same time
<bshah> one after another
<bshah> I guess one fails and then borks the other processes?
<enunes> I have a different way to cause a pp task error, but it's status=5 only
<enunes> that issue report has status=0
<enunes> so it's possible there are multiple issues in here; a pp task error can be many things, I suppose
<bshah> https://invent.kde.org/plasma/kwin/-/blob/master/composite.cpp#L674 we verified that if we remove these 3 lines from kwin, the crash is gone
<enunes> in my case I can only reproduce it by running multiple things in parallel, running one at a time doesn't reproduce it
<enunes> which is bad news I guess
<bshah> more or less the same thing here, I guess
<enunes> did you try bisecting kernel versions?
<bshah> I have 3 things running: the compositor, and 2 clients (lockscreen and app)
<enunes> like, going back to something like 5.7 or some time where it didn't reproduce
<bshah> I did not tbh
<enunes> lima didn't change much, but there are many reworks going on in drm; maybe we missed something
<enunes> if you could try it, that would save some debug time too
<bshah> I am, erm, kinda time-stressed at the moment, but can probably try it the week after
<enunes> yeah, same for me
<bshah> the kwin code I refer to basically causes only the rendering of the lockscreen client and the compositor to stay active; it stops rendering other clients
<bshah> maybe that kinda provides some hints on what might be going wrong... dunno
<enunes> to me it's a bit far away from something we can relate to lima
<enunes> but if it reduces the things running in parallel, it can be the same issue I hit
<bshah> yep, sorry, can't help much :D
<enunes> I don't remember this from before, so if I had time now I'd try a fairly old kernel
<enunes> the problem is it takes a while to reproduce so I'd have to run it for hours to ensure it really doesn't reproduce and bisect successfully
<enunes> maybe it can be accelerated by coming up with some application that renders a whole lot of things at the same time, to stress it
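(One cheap way to approximate such a stress load, assuming the GLES2 build of glmark2 is installed, is to run several instances in parallel so that multiple clients keep submitting gp/pp jobs at once; glmark2's --run-forever flag loops its benchmarks indefinitely:

    for i in 1 2 3 4; do glmark2-es2-wayland --run-forever & done

)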
<bshah> in a way, the userspace setup I have can reproduce it easily
<bshah> I will try to find time to bisect the kernel
<enunes> is there some image?
<enunes> I have a pinephone too
<bshah> oooh
<enunes> so just boot it and rotate the screen?
<bshah> the easiest way to reproduce this issue is to boot the phone in landscape
<bshah> and it will crash within 1 minute max
kaspter has quit [Quit: kaspter]
<bshah> (username password kde/123456)
<enunes> ok, no promises but maybe next week I can give it a try
<bshah> thanks, meanwhile I will see if we can do some "userspace" hack :D
kaspter has joined #lima
mmind00 has quit [Quit: No Ping reply in 180 seconds.]
mmind00 has joined #lima
yann has quit [Ping timeout: 272 seconds]
yann has joined #lima
champagneg has quit [Quit: WeeChat 2.3]
yann has quit [Ping timeout: 260 seconds]
dev1990 has quit [Quit: Konversation terminated!]
dev1990 has joined #lima
yann has joined #lima
warpme_ has joined #lima