#lima on 2019-09-19 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

01:12 <enunes> ok I haven't found the root cause yet, but I am able to isolate the frame that causes it and I can see the backtrace of what it's trying to do when it triggers the gp error on sway

01:12 <enunes> it seems to be related to cursor

01:13 <enunes> all goes well until it switches from the normal resolution to a 64x64 viewport to do something with the cursor, and the following lima_flush triggers it

01:14 <enunes> everything else in the frame, other than the switch to the 64x64 viewport, seems to be things it already did successfully in the previous frames

01:15 <enunes> but too late now, continue tomorrow

01:41 yuq825 has joined #lima

02:18 megi has quit [Ping timeout: 240 seconds]

02:38 yuq825 has quit [Ping timeout: 265 seconds]

02:40 jrmuizel has quit [Remote host closed the connection]

02:53 <anarsoul> enunes: interesting

02:57 jonkerj has quit [Ping timeout: 264 seconds]

03:30 dddddd has quit [Remote host closed the connection]

03:48 yuq825 has joined #lima

04:59 _whitelogger has joined #lima

05:12 Barada has joined #lima

07:04 nerdboy has quit [Ping timeout: 258 seconds]

07:28 rellla has quit [Remote host closed the connection]

07:28 rellla has joined #lima

07:42 rellla has quit [Remote host closed the connection]

07:43 rellla has joined #lima

08:38 monstr has joined #lima

10:13 jernej has quit [Remote host closed the connection]

10:20 jernej has joined #lima

10:27 <MoeIcenowy> enunes: if you can retrace the play, maybe you can try to trim the operations of 64x64 cursor?

10:29 Barada has quit [Quit: Barada]

10:29 yuq825 has quit [Quit: Leaving.]

10:33 Barada has joined #lima

10:50 jbrown has quit [Quit: Leaving]

10:55 jbrown has joined #lima

11:20 <enunes> MoeIcenowy: I did that of course, though it also needs the previous frames to set up other stuff, I'm down to 18 frames that reproduce it and if I trim the last one off, it doesn't

11:32 jonkerj has joined #lima

11:36 jbrown has quit [Ping timeout: 276 seconds]

11:41 megi has joined #lima

11:47 jbrown has joined #lima

11:51 dddddd has joined #lima

12:10 mardikene193 has joined #lima

12:37 jrmuizel has joined #lima

12:45 jrmuizel has quit [Remote host closed the connection]

12:50 jrmuizel has joined #lima

13:24 enunes has quit [Ping timeout: 265 seconds]

13:27 enunes has joined #lima

13:33 jrmuizel has quit [Remote host closed the connection]

13:41 Barada has quit [Quit: Barada]

14:07 <mardikene193> I know that you don't understand much about how modern chips are built! However, I still demonstrate the queues in simulation, with regard to resetting them, at https://www.edaplayground.com/x/2K3J.

14:12 jrmuizel has joined #lima

14:23 Elpaulo has joined #lima

15:24 nerdboy has joined #lima

15:25 nerdboy has quit [Changing host]

15:25 nerdboy has joined #lima

16:01 <mardikene193> so in the case of the rasterizer or vertex processor, the reset signal is pushed down the pipeline to the shaders per pixel or per vertex

16:03 <mardikene193> in the case of OpenCL or CUDA kernels, it is a dispatcher that does that on kernel entry

17:08 monstr has quit [Remote host closed the connection]

17:17 <mardikene193> this reset signal that gets pushed down the pipeline resets only the program counter after all, so without memory protection it is impossible to have the fetch fail on the graphics pipeline

17:18 <mardikene193> but in CUDA and OpenCL kernels there is no rasterizer, nor any of the conventional fixed-function pipelines, so reset is toggled on kernel launch

17:20 <mardikene193> in other words, with modified host code it is possible to enter the queues without catching the program end and calling a new entry into the kernel

17:21 <mardikene193> you are all clueless monsters and conspirators; I have seen many such violators in my personal life, crawling with them

17:23 <mardikene193> I cannot behave better with such dickheads as I already have, terrible people really!

17:30 <mardikene193> I feel like a bunch of flies over shit that you fly over me so to speak, attacked from every corner, you are total lunatics and never make sense either.

17:32 <mardikene193> What happens under the hood in the fetch module is that the issue module's finished_wf.v delegates signals to the fetch module

17:34 <Tofe> anarsoul: I've tracked down the moment when the problematic const op node is created

17:34 <mardikene193> that delegates to round_robin.v, which is a program counter fetcher for wavefront buffers, so the decode id is

17:35 <mardikene193> forwarded but the opcode isn't

17:35 <Tofe> it's actually created by gpir_lower_branch_uncond

17:37 <Tofe> but I'll continue my debug, it seems it's not the end of it

17:41 <mardikene193> the signal is vacant from wfid_generator.v, where the blockage happens; it looks like every chip in history does it similarly

17:45 <mardikene193> so yeah the reset signal understandably makes the finished_wf.v to toggle the fetching back on

17:47 <mardikene193> so to sum up what I talked about: I made a minor module on EDA Playground that can be simulated; do not forget to open the EPWave window

17:47 <mardikene193> which shows that the queues are not being reset by the reset signal

17:48 <mardikene193> so the point of this is that when you fetch a wfid, it zeroes 39 wfids in hardware due to continuous assignment

17:48 <Tofe> anarsoul: the issue seems to be that we create these const nodes *after* gpir_lower_const has been called, i.e. after gpir_pre_rsched_lower_prog

17:49 <mardikene193> so when one opcode is fetched it is replicated to fail on the others; however, when the fetch fails you sort of compute from stale or previous queue values

17:49 <mardikene193> of instructions

17:50 <mardikene193> I think it is always programmed similarly, since that is what Cadence or Synopsys tools would generate; hardware is partly generated from scripts

17:51 <mardikene193> there are, however, no big mistakes; everything is done elegantly and clinically there

17:53 <Tofe> yes, we are in gpir_post_rsched_lower_prog when we create the problematic const node

17:53 <Tofe> I'll just try and call gpir_lower_const also in gpir_post_rsched_lower_prog -- but not sure it makes sense

17:58 <Tofe> Seems to work for my phone app at least!

17:59 <Tofe> I'll submit an MR

18:06 jernej has quit [Remote host closed the connection]

18:13 jernej has joined #lima

18:24 <Tofe> ooh but wait a bit... this was fixed by https://gitlab.freedesktop.org/mesa/mesa/commit/ee8cc90e553b56e8fbab5ecbaa5b1476221401d1 it seems

18:31 <mardikene193> I am unsure where in the shader model spec they had queues as big as the instruction limits; obviously the instruction limits in the SM5 spec are bigger than the queues can accommodate

18:46 <Tofe> Ok, issue closed.

18:53 <mardikene193> My eyes are so tired; when I drink some alcohol I feel like I am blind sometimes, my eyes are painful and tired at the same time, and my vision is vague

18:54 <mardikene193> obviously I cannot clean up your crap forever; however, I still advise looking at the Milkymist TMU2 and the KAIST e-book

18:54 <mardikene193> I am still almost sure about what I am talking about!

18:54 <mardikene193> https://doc.lagout.org/Others/Wiley-Mobile.3D.Graphics.SoC.From.Algorithm.to.Chip.2010.RETAiL.EBook.pdf

18:55 <mardikene193> I would like to see you have some memory, because I crown the effort with a few small lines of code, i.e. an implementation, which you are going to need to know how to read.

18:56 <mardikene193> because I hate random questions later, especially when I am unemployed and no one gives me any benefits for helping others.

18:57 <enunes> anarsoul|c: ha, I think I got it

18:58 <anarsoul> Tofe: always use mesa master :)

18:58 <enunes> anarsoul: and it's a 1-liner :)

18:58 <anarsoul> enunes: hehe

18:58 <anarsoul> what's the fix? :)

18:59 <enunes> plbu scissor clear seems to be using the values from the previous scissor settings

18:59 <enunes> so when sway switches framebuffer to a smaller one, the scissor from the old value is used in the clear, which is too big

18:59 <enunes> using the fb size in clear fixes it

19:00 <anarsoul> oh

19:00 <mardikene193> I identify people by their talk, and I talk about very specific technology stuff, and I am very rarely wrong, I think; when I am in the dark, shooting info that won't add up, I do not understand what you are trying to achieve.

19:00 <anarsoul> enunes: so we should reset scissors when fb is changed?

19:00 <enunes> lima_pack_clear_plbu_cmd uses PLBU_CMD_SCISSORS(scissor->minx, scissor->maxx, scissor->miny, scissor->maxy);

19:01 <mardikene193> I think that for around a year or more, and many times in the past, I have been spot on and have put some effort into studying things; I even have some common sense.

19:01 <enunes> but scissor is even disabled in the frame that crashes

19:04 jrmuizel has quit [Remote host closed the connection]

19:05 <enunes> maybe we shouldn't even send the scissor command

19:05 <enunes> anyway, now I need to test a couple of solutions to see what is best

19:06 <mardikene193> if you had looked at the EPWave output of the simulation, you'd understand that I had calculated what I talked about, and quite a few assumptions were in place.

19:07 <mardikene193> so basically the scoreboard data gets arbitrated perpendicular to, or according to, the alu/lsu/wr_done_wfid signals

19:08 arti has quit [Quit: No Ping reply in 180 seconds.]

19:08 <mardikene193> which, when the chip stalls, get redirected from RFA to the default X, i.e. X is propagated, which is later made to be zero

19:09 <mardikene193> the case in mux_XXX_PARAMb.... just asks for the built 111111 or 000000, which was made from the rfa default case label

19:09 arti has joined #lima

19:09 <mardikene193> 6{1'bx}

19:10 <mardikene193> and it decodes that from the mux as zero, so after the chip stalls the wfid arbiter wraps around back to 0

19:28 <mardikene193> oooh damn, I think this is where I was wrong

19:38 <anarsoul> enunes: we should reset scissors if fb changes

19:39 <enunes> I think we already do that and even MoeIcenowy's solution was on that direction

19:40 <enunes> probably just the clear was using the wrong settings

19:42 <anarsoul> or rather we should clamp scissor to viewport

19:42 <anarsoul> that's what vc4 does

19:56 <enunes> yeah, looks like there is support for partial clear, so we might just need to reset the scissor settings when the fb is changed

19:56 <enunes> I'll test that

19:57 <enunes> if the A64 ever finishes compiling it, as I had to jump to a distribution to test stuff with sway...

19:59 <anarsoul> use ccache

20:04 mardikene193 has quit [Quit: Leaving]

20:08 <anarsoul> panfrost also clips scissor to viewport

20:08 <anarsoul> see vc4_emit_state() for reference

20:25 jrmuizel has joined #lima

20:46 mardikene193 has joined #lima

20:49 <mardikene193> still not much wrong. It actually posts a garbage opcode, but this scheme lets decode_opcode catch up with the SIMD arbiter's, and indeed it wraps the stuff around

21:48 mardikene193 has quit [Ping timeout: 250 seconds]

22:05 jrmuizel has quit [Remote host closed the connection]

22:44 <enunes> not as easy as it seemed to just clip; looks like the lima state has sort of bogus internal clip coords, while apitrace at least expects the state to be sane

22:45 <anarsoul> hm

22:45 <anarsoul> so we're not tracking state updates properly?

22:45 <enunes> we probably need to clip anyway as everyone else does it or in case the user explicitly loads bad clip coords

22:46 <anarsoul> yeah

22:46 <enunes> but I wonder if one of the api calls is expected to reset the coords

22:49 <anarsoul> ask #dri-devel? :)

22:50 <anarsoul> https://www.khronos.org/registry/OpenGL-Refpages/es2.0/xhtml/glScissor.xml

22:51 <enunes> I was about to paste the same page

22:51 <enunes> indeed it is disabled

22:51 Kwiboo has quit [Quit: .]

22:51 <enunes> but I think it thinks it's enabled

22:51 Kwiboo has joined #lima

22:52 <anarsoul> if I understand correctly, it's bound to context, not to particular framebuffer

22:52 <anarsoul> so I guess it stays valid after framebuffer changes?

22:53 <enunes> yes, what I wonder now is whether our method of checking it with "ctx->rasterizer->base.scissor" works

22:53 <anarsoul> can you share your apitrace?

22:53 <anarsoul> enunes: that's what other drivers use

22:54 <enunes> ok, so it might be that that conditional is what is missing in the plbu clear path

22:54 Kwiboo has quit [Client Quit]

22:55 Kwiboo has joined #lima

22:55 Kwiboo has quit [Client Quit]

22:56 <anarsoul> it doesn't need it

22:57 <anarsoul> lima_pack_clear_plbu_cmd() is only called if lima_is_scissor_full_fb() is called

22:59 Kwiboo has joined #lima

22:59 <enunes> so it doesn't make sense, because as scissors is disabled, it is supposed to be full fb

23:01 <anarsoul> can you share your apitrace?

23:03 <enunes> I attached it to the MR

23:05 <anarsoul> what frame should I look into?

23:05 <enunes> I posted there, frame 21

23:08 <anarsoul> did you add traces to lima_set_scissor_states() and to lima_bind_rasterizer_state()?

23:09 <enunes> right now I only have them on lima_pack_clear_plbu_cmd

23:10 <anarsoul> add them there to confirm that scissor test is indeed disabled

23:21 <anarsoul> enunes: I'm pretty sure that sway worked fine ~2 months ago, so it could be a generic bug

23:22 <anarsoul> I wonder if it's related to 2037478702adab5a3863a120be821626191b2e3e

23:35 <anarsoul> oh

23:35 <anarsoul> I think I got it

23:35 <anarsoul> or maybe not...

23:38 <enunes> there seems to be something strange with ctx->rasterizer->base.scissor indeed, that is the only frame where it is set to 1 but full_fb_clear is not set

23:40 <anarsoul> if rasterizer->base.scissor is true it means that scissor test is enabled

23:40 <enunes> I reverted that commit and it seems to help too

23:40 <enunes> scissor is not enabled, it is disabled at the end of frame 20

23:40 <anarsoul> yeah, but it's not reflected in rasterizer state

23:48 <anarsoul> enunes: yeah, looks like a bug somewhere in mesa to me

23:48 <anarsoul> rasterizer state isn't updated when it should?

23:48 <anarsoul> ask #dri-devel :)

23:49 <anarsoul> I expect it to be faster than digging through mesa codebase

23:49 <enunes> it's too late for me for today, I'll post a comment pinging the guy from that commit

23:50 <anarsoul> sounds good

23:51 <anarsoul> with this commit reverted st_update_scissor() clamps scissor rects to (0,0, fb_width, fb_height)

23:53 <anarsoul> at this point I'm pretty sure it's a bug in mesa

23:54 <anarsoul> we don't get new rasterizer state

23:56 <enunes> well, at least it's tracked down and not so much of a mystery bug anymore

23:56 <anarsoul> yeah

23:56 <anarsoul> nice work! :)