ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
<enunes> ok I haven't found the root cause yet, but I am able to isolate the frame that causes it and I can see the backtrace of what it's trying to do when it triggers the gp error on sway
<enunes> it seems to be related to cursor
<enunes> all goes well until it switches from the normal resolution to a 64x64 viewport to do something with the cursor, and the following lima_flush triggers it
<enunes> everything else in the frame, other than the switch to the 64x64 viewport, seems to be things it already did successfully in the previous frames
<enunes> but too late now, continue tomorrow
yuq825 has joined #lima
megi has quit [Ping timeout: 240 seconds]
yuq825 has quit [Ping timeout: 265 seconds]
jrmuizel has quit [Remote host closed the connection]
<anarsoul> enunes: interesting
jonkerj has quit [Ping timeout: 264 seconds]
dddddd has quit [Remote host closed the connection]
yuq825 has joined #lima
_whitelogger has joined #lima
Barada has joined #lima
nerdboy has quit [Ping timeout: 258 seconds]
rellla has quit [Remote host closed the connection]
rellla has joined #lima
rellla has quit [Remote host closed the connection]
rellla has joined #lima
monstr has joined #lima
jernej has quit [Remote host closed the connection]
jernej has joined #lima
<MoeIcenowy> enunes: if you can replay the trace, maybe you can try trimming the 64x64 cursor operations?
Barada has quit [Quit: Barada]
yuq825 has quit [Quit: Leaving.]
Barada has joined #lima
jbrown has quit [Quit: Leaving]
jbrown has joined #lima
<enunes> MoeIcenowy: I did that of course, though it also needs the previous frames to set up other stuff; I'm down to 18 frames that reproduce it, and if I trim the last one off, it doesn't
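Shrinking a trace to the fewest frames that still trigger the error can be automated as a binary search. The sketch below assumes trimming is done from the end of the trace and that the bug is monotonic (if the first n frames reproduce it, every longer prefix does too). `repro_fn` stands in for "replay the first n frames and check for the GP error"; it and `repro_at_least` are hypothetical helpers, not apitrace or lima APIs.

```c
#include <stdbool.h>

/* Hypothetical predicate: replay the first n_frames of a trace and report
 * whether the GPU error was triggered. A real version would wrap an
 * apitrace replay; here it is a function pointer so the search is testable. */
typedef bool (*repro_fn)(int n_frames, void *user);

/* Demo predicate standing in for a real replay: "reproduces" once at
 * least *(int *)user frames are replayed. */
static bool repro_at_least(int n_frames, void *user)
{
    return n_frames >= *(const int *)user;
}

/* Smallest n in [1, total] for which reproduces(n) is true, assuming
 * monotonicity. Returns -1 if even the full trace does not reproduce. */
static int min_repro_prefix(int total, repro_fn reproduces, void *user)
{
    if (total < 1 || !reproduces(total, user))
        return -1;
    int lo = 1, hi = total;          /* answer lies in [lo, hi] */
    while (lo < hi) {
        int mid = lo + (hi - lo) / 2;
        if (reproduces(mid, user))
            hi = mid;                /* mid frames are already enough */
        else
            lo = mid + 1;            /* need more than mid frames */
    }
    return lo;
}
```

Each probe costs one replay, so a trace of a few dozen frames needs only a handful of replays. If trimming from the front is also needed (frames that only set up state), the same search can be run on the start index.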
jonkerj has joined #lima
jbrown has quit [Ping timeout: 276 seconds]
megi has joined #lima
jbrown has joined #lima
dddddd has joined #lima
mardikene193 has joined #lima
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
enunes has quit [Ping timeout: 265 seconds]
enunes has joined #lima
jrmuizel has quit [Remote host closed the connection]
Barada has quit [Quit: Barada]
<mardikene193> I know that you don't understand much about how modern chips are built! However, I still demonstrate the queues in simulation with regard to resetting them: https://www.edaplayground.com/x/2K3J
jrmuizel has joined #lima
Elpaulo has joined #lima
nerdboy has joined #lima
nerdboy has quit [Changing host]
nerdboy has joined #lima
<mardikene193> so in case of rasterizer or vertex processor, reset signal is pushed down the pipeline to the shaders per pixel or per vertex
<mardikene193> in case of Opencl or Cuda kernels, it is a dispatcher which does that on entrance of the kernel
monstr has quit [Remote host closed the connection]
<mardikene193> this reset signal that gets pushed down the pipeline still only resets the program counter, so without memory protection it is impossible to have the fetch fail on the graphics pipeline
<mardikene193> but on CUDA and OPENCL kernels there is no rasterizer neither any of the conventional fixed function pipelines present, so reset is toggled on kernel launch
<mardikene193> in other words, with modified host code it is possible to enter the queues without catching the program end and calling new entrance to the kernel
<mardikene193> you are all clueless monsters and conspirers , I have seen many of such violators in my personal life, crawling of such
<mardikene193> i can not behave better with such dickheads than I already have, terrible people really!
<mardikene193> I feel like a bunch of flies over shit that you fly over me so to speak, attacked from every corner, you are total lunatics and never make sense either.
<mardikene193> What happens under the hood, in fetch module is that issue module finished_wf.v delegates signals to fetch module
<Tofe> anarsoul: I've tracked down the moment when the problematic const op node is created
<mardikene193> that delegates round_robin.v which is a program counter fetcher from wavefront buffers, so the decode id is
<mardikene193> forwarded but opcode isn't
<Tofe> it's actually created by gpir_lower_branch_uncond
<Tofe> but I'll continue my debug, it seems it's not the end of it
<mardikene193> the signal is vacant from wfid_generator.v where the blockage happens; it looks like every chip in history does it similarly
<mardikene193> so yeah the reset signal understandably makes the finished_wf.v to toggle the fetching back on
<mardikene193> so to sum things up what i talked about, I made an edaplayground minor module which can be simulated, do not forget to open epwave window
<mardikene193> which shows that queues are not being reset with the reset signal
<mardikene193> so what is the point with this, is that when you fetch a wfid, it zeros 39 wfids in hardware due to continuous assignment
<Tofe> anarsoul: the issue seems to be that we create these const nodes *after* gpir_lower_const has been called, i.e. after gpir_pre_rsched_lower_prog
<mardikene193> so when one opcode is fetched it is replicated to fail on others, however when the fetch fails you sort of compute from stale or previous queue values
<mardikene193> of instructions
<mardikene193> I think it should always be programmed similarly, since that is what Cadence or Synopsys tools would generate; hardware is partly generated from scripts
<mardikene193> there are no big mistakes there, however; everything is done elegantly and clinically
<Tofe> yes, we are in gpir_post_rsched_lower_prog when we create the problematic const node
<Tofe> I'll just try and call gpir_lower_const also in gpir_post_rsched_lower_prog -- but not sure it makes sense
<Tofe> Seems to work for my phone app at least!
<Tofe> I'll submit an MR
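The ordering problem Tofe describes can be shown with a toy pass pipeline: a lowering pass only handles the nodes that exist when it runs, so a node created by a later pass (here, a const created during branch lowering) is never lowered unless the pass runs again. Everything below (`toy_node`, `toy_lower_const`, `toy_lower_branch`, `toy_compile`) is a hypothetical stand-in, not the real gpir data structures or the actual `gpir_lower_const()` implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy IR node: a const must be "lowered" before the backend can emit it. */
typedef struct {
    bool is_const;
    bool lowered;
} toy_node;

typedef struct {
    toy_node nodes[16];
    size_t count;
} toy_prog;

/* Stand-in for const lowering: lowers every const present right now.
 * Nodes created later are not touched. */
static void toy_lower_const(toy_prog *p)
{
    for (size_t i = 0; i < p->count; i++)
        if (p->nodes[i].is_const)
            p->nodes[i].lowered = true;
}

/* Stand-in for a later pass (like unconditional-branch lowering) that
 * creates a brand-new, not-yet-lowered const node. */
static void toy_lower_branch(toy_prog *p)
{
    if (p->count < 16)
        p->nodes[p->count++] = (toy_node){ .is_const = true, .lowered = false };
}

static bool toy_all_consts_lowered(const toy_prog *p)
{
    for (size_t i = 0; i < p->count; i++)
        if (p->nodes[i].is_const && !p->nodes[i].lowered)
            return false;
    return true;
}

/* "Pre" lowering runs const lowering; "post" lowering creates a const.
 * Re-running const lowering afterwards (the proposed fix) catches it. */
static void toy_compile(toy_prog *p, bool rerun_lower_const)
{
    toy_lower_const(p);
    toy_lower_branch(p);
    if (rerun_lower_const)
        toy_lower_const(p);
}
```

An alternative design is to have the pass that creates the const emit it already in lowered form; re-running the lowering pass is simpler but only works if that pass is safe to run twice.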
jernej has quit [Remote host closed the connection]
jernej has joined #lima
<mardikene193> I am unsure where in the shader model spec they had queues as big as the instruction limits; obviously the instruction limits in the SM5 spec are bigger than the queues can accommodate
<Tofe> Ok, issue closed.
<mardikene193> My eyes are so tired; when I drink some alcohol I feel like I am blind sometimes, my eyes are painful and tired at the same time, and my vision is vague
<mardikene193> obviously I can not forever try to clean your crap, however i advise to look at milymist tmu2 and the kaist e-book still
<mardikene193> I am still almost sure about what I am talking about!
<mardikene193> I would like to see you having some memory, cause i crown the effort with some small lines of code i.e implementation, which you gonna need to know how to read.
<mardikene193> cause i hate random questions later, especially when i am unemployed and no one gives me the benefits for helping others.
<enunes> anarsoul|c: ha, I think I got it
<anarsoul> Tofe: always use mesa master :)
<enunes> anarsoul: and it's a 1-liner :)
<anarsoul> enunes: hehe
<anarsoul> what's the fix? :)
<enunes> plbu scissor clear seems to be using the values from the previous scissor settings
<enunes> so when sway switches framebuffer to a smaller one, the scissor from the old value is used in the clear, which is too big
<enunes> using the fb size in clear fixes it
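The failure mode is that the clear reuses a scissor rectangle sized for the previous (larger) framebuffer. A minimal sketch of the fix builds the clear rectangle from the current framebuffer size and shows why the stale one is out of bounds. The `rect` type and helper names are hypothetical; the field order merely follows the `PLBU_CMD_SCISSORS(minx, maxx, miny, maxy)` argument order quoted in the log, and this sketch treats `maxx`/`maxy` as exclusive, which may not match what the hardware expects.

```c
#include <stdbool.h>

/* Hypothetical scissor rectangle; maxx/maxy are exclusive in this sketch. */
typedef struct {
    unsigned minx, maxx, miny, maxy;
} rect;

/* Rectangle a full-framebuffer clear should use: derived from the
 * current framebuffer, never from previously stored scissor state. */
static rect clear_rect_for_fb(unsigned fb_w, unsigned fb_h)
{
    rect r = { .minx = 0, .maxx = fb_w, .miny = 0, .maxy = fb_h };
    return r;
}

/* True if r is well-formed and lies entirely inside a fb_w x fb_h target. */
static bool rect_fits_fb(const rect *r, unsigned fb_w, unsigned fb_h)
{
    return r->minx <= r->maxx && r->miny <= r->maxy &&
           r->maxx <= fb_w && r->maxy <= fb_h;
}
```

With sway's switch to a 64x64 cursor buffer, a stale 1920x1080 scissor fails `rect_fits_fb(&stale, 64, 64)`, while `clear_rect_for_fb(64, 64)` passes.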
<anarsoul> oh
<mardikene193> I identify people after their talks, and i talk very specific technology stuff, and i am very rarely wrong i think, where i am in a dark and shooting info that won't add up, hence i do not understand what you try to acheive.
<anarsoul> enunes: so we should reset scissors when fb is changed?
<enunes> lima_pack_clear_plbu_cmd uses PLBU_CMD_SCISSORS(scissor->minx, scissor->maxx, scissor->miny, scissor->maxy);
<mardikene193> i think mostly around a year or more and many times in the past i have been spot on and have given some efforts to study things, even have some common sense.
<enunes> but scissor is even disabled in the frame that crashes
jrmuizel has quit [Remote host closed the connection]
<enunes> maybe we shouldn't even send the scissor command
<enunes> anyway, now need to test a couple solutions to see what is best
<mardikene193> if you looked at the epwave of the simulation you'd understand that i had calculated of what i talked, and quite some few assumptions were in place.
<mardikene193> so basically the scoreboard data gets arbitrated perpendicular or according to alu/lsu/wr_done_wfid signals
arti has quit [Quit: No Ping reply in 180 seconds.]
<mardikene193> which when chip stalls get redirected from RFA to default X i.e X is propagated which is later made to be zero
<mardikene193> case the mux_XXX_PARAMb.... just asks for the built 111111 or 000000 which was made from rfa default case label
arti has joined #lima
<mardikene193> 6{1'bx}
<mardikene193> and it decodes that from the mux as zero, so after the chip stalls the wfid arbiter wraps around back to 0
<mardikene193> oooh damn, I think this is where I was wrong
<anarsoul> enunes: we should reset scissors if fb changes
<enunes> I think we already do that, and even MoeIcenowy's solution went in that direction
<enunes> probably just the clear was using the wrong settings
<anarsoul> or rather we should clamp scissor to viewport
<anarsoul> that's what vc4 does
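Clamping the scissor to the viewport means intersecting three rectangles: the viewport bounds (derived from the Gallium-style viewport `scale`/`translate`, where x_window = scale * x_ndc + translate, so the bounds are translate ± |scale|), the framebuffer, and the user scissor only when the scissor test is enabled. The sketch below is loosely modeled on the approach described for vc4, not a copy of `vc4_emit_state()`; `viewport_xy`, `irect` and `effective_scissor` are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical subset of viewport state: x/y scale and translate. */
typedef struct {
    float scale[2], translate[2];
} viewport_xy;

/* Hypothetical integer rect; maxx/maxy are exclusive. */
typedef struct {
    int minx, miny, maxx, maxy;
} irect;

static int imax(int a, int b) { return a > b ? a : b; }
static int imin(int a, int b) { return a < b ? a : b; }

static irect effective_scissor(const viewport_xy *vp, bool scissor_enabled,
                               const irect *scissor, int fb_w, int fb_h)
{
    float sx = vp->scale[0] < 0 ? -vp->scale[0] : vp->scale[0];
    float sy = vp->scale[1] < 0 ? -vp->scale[1] : vp->scale[1];

    /* Always clip to the viewport and to the framebuffer. */
    irect r = {
        .minx = imax((int)(vp->translate[0] - sx), 0),
        .miny = imax((int)(vp->translate[1] - sy), 0),
        .maxx = imin((int)(vp->translate[0] + sx), fb_w),
        .maxy = imin((int)(vp->translate[1] + sy), fb_h),
    };

    /* Intersect with the user scissor only when the test is enabled. */
    if (scissor_enabled) {
        r.minx = imax(r.minx, scissor->minx);
        r.miny = imax(r.miny, scissor->miny);
        r.maxx = imin(r.maxx, scissor->maxx);
        r.maxy = imin(r.maxy, scissor->maxy);
    }

    /* An empty intersection becomes a zero-area rect, not an inverted one. */
    if (r.maxx < r.minx) r.maxx = r.minx;
    if (r.maxy < r.miny) r.maxy = r.miny;
    return r;
}
```

Because the framebuffer is always part of the intersection, a stale scissor left over from a larger framebuffer can no longer produce a rectangle bigger than the current target, whether or not the enable bit is tracked correctly.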
<enunes> yeah, looks like there is support for partial clears, so we might just need to reset the scissor settings when the fb is changed
<enunes> I'll test that
<enunes> if the A64 ever finishes compiling it, as I had to jump to a distribution to test stuff with sway...
<anarsoul> use ccache
mardikene193 has quit [Quit: Leaving]
<anarsoul> panfrost also clips scissor to viewport
<anarsoul> see vc4_emit_state() for reference
jrmuizel has joined #lima
mardikene193 has joined #lima
<mardikene193> not much wrong still. it posts a garbage opcode actually, but this scheme lets the decode_opcode catch up with simd arbiters one, and indeed it wraps-around the stuff
mardikene193 has quit [Ping timeout: 250 seconds]
jrmuizel has quit [Remote host closed the connection]
<enunes> it's not as easy to just clip as it seemed; it looks like the lima state has somewhat bogus internal clip coords, while apitrace at least expects the state to be sane
<anarsoul> hm
<anarsoul> so we're not tracking state updates properly?
<enunes> we probably need to clip anyway as everyone else does it or in case the user explicitly loads bad clip coords
<anarsoul> yeah
<enunes> but I wonder if one of the api calls is expected to reset the coords
<anarsoul> ask #dri-devel? :)
<enunes> I was about to paste the same page
<enunes> indeed it is disabled
Kwiboo has quit [Quit: .]
<enunes> but I think it thinks it's enabled
Kwiboo has joined #lima
<anarsoul> if I understand correctly, it's bound to context, not to particular framebuffer
<anarsoul> so I guess it stays valid after framebuffer changes?
<enunes> yes, what I wonder now is whether our method of checking it with "ctx->rasterizer->base.scissor" works
<anarsoul> can you share your apitrace?
<anarsoul> enunes: that's what other drivers use
<enunes> ok, so it might be that that conditional is what is missing in the plbu clear path
Kwiboo has quit [Client Quit]
Kwiboo has joined #lima
Kwiboo has quit [Client Quit]
<anarsoul> it doesn't need it
<anarsoul> lima_pack_clear_plbu_cmd() is only called if lima_is_scissor_full_fb() is called
Kwiboo has joined #lima
<enunes> so it doesn't make sense, because since scissor is disabled, it is supposed to be full fb
<anarsoul> can you share your apitrace?
<enunes> I attached it to the MR
<anarsoul> what frame should I look into?
<enunes> I posted there, frame 21
<anarsoul> did you add traces to lima_set_scissor_states() and to lima_bind_rasterizer_state()?
<enunes> right now I only have them on lima_pack_clear_plbu_cmd
<anarsoul> add them there to confirm that scissor test is indeed disabled
<anarsoul> enunes: I'm pretty sure that sway worked fine ~2 months ago, so it could be generic bug
<anarsoul> I wonder if it's related to 2037478702adab5a3863a120be821626191b2e3e
<anarsoul> oh
<anarsoul> I think I got it
<anarsoul> or maybe not...
<enunes> there seems to be something strange with ctx->rasterizer->base.scissor indeed, that is the only frame where it is set to 1 but full_fb_clear is not set
<anarsoul> if rasterizer->base.scissor is true it means that scissor test is enabled
<enunes> I reverted that commit and it seems to help too
<enunes> scissor is not enabled, it is disabled at the end of frame 20
<anarsoul> yeah, but it's not reflected in rasterizer state
<anarsoul> enunes: yeah, looks like bug somewhere in mesa to me
<anarsoul> rasterizer state isn't updated when it should?
<anarsoul> ask #dri-devel :)
<anarsoul> I expect it to be faster than digging through mesa codebase
<enunes> it's too late for me for today, I'll post a comment pinging the guy from that commit
<anarsoul> sounds good
<anarsoul> with this commit reverted st_update_scissor() clamps scissor rects to (0,0, fb_width, fb_height)
<anarsoul> at this point I'm pretty sure it's a bug in mesa
<anarsoul> we don't get new rasterizer state
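Why a missing rasterizer bind matters: a Gallium driver typically caches the bound rasterizer state object and only re-reads its fields (such as the scissor-enable bit) when a new one is bound and the state is marked dirty. If the state tracker never binds a new object after the app disables the scissor test, the driver keeps emitting the old enable bit. The sketch below shows generic dirty-flag tracking; `raster_cso`, `driver_ctx`, `bind_rasterizer` and `emit_state` are hypothetical and do not reproduce lima's or Mesa's actual code.

```c
#include <stdbool.h>

/* Hypothetical rasterizer state object: only the field discussed here. */
typedef struct {
    bool scissor;                 /* scissor test enabled */
} raster_cso;

enum { DIRTY_RASTERIZER = 1u << 0 };

/* Hypothetical driver context with a dirty mask. */
typedef struct {
    const raster_cso *rasterizer; /* currently bound state object */
    unsigned dirty;
    bool emitted_scissor_enable;  /* what the last emit used */
} driver_ctx;

static void bind_rasterizer(driver_ctx *ctx, const raster_cso *cso)
{
    ctx->rasterizer = cso;
    ctx->dirty |= DIRTY_RASTERIZER;
}

/* The enable bit is only refreshed when the rasterizer is dirty, so a
 * state change that never reaches bind_rasterizer() goes unnoticed. */
static void emit_state(driver_ctx *ctx)
{
    if ((ctx->dirty & DIRTY_RASTERIZER) && ctx->rasterizer)
        ctx->emitted_scissor_enable = ctx->rasterizer->scissor;
    ctx->dirty = 0;
}
```

In this model, disabling the scissor at the API level without a new bind leaves `emitted_scissor_enable` set, which matches the symptom of `ctx->rasterizer->base.scissor` still reading 1 in frame 21.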
<enunes> well, at least it's tracked down and not so much of a mystery bug anymore
<anarsoul> yeah
<anarsoul> nice work! :)