#lima on 2020-02-13 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

01:06 yuq825 has joined #lima

01:45 <yuq825> anarsoul: hi, I get a "pak0.pk3" missing error when I run ioquake3.arm, how do you solve this?

01:57 <yuq825> ok, get one from github

02:12 <anarsoul> yuq825: I've bought q3a on gog.com quite a while ago

02:15 <yuq825> oh, I can't play it, it needs a key, and my keyboard freezes in the menu

02:15 <yuq825> i'll try other ones

02:24 <anarsoul> :(

02:25 <anarsoul> yuq825: that's weird that your kbd doesn't work

02:25 <anarsoul> yuq825: you can start demo from cmdline

02:25 <yuq825> how?

02:25 <anarsoul> try something like 'ioquake3.arm +demo four'

02:26 <anarsoul> yuq825: also make sure renderer is opengl1 in ~/.q3a/baseq3/q3config.cfg

02:34 <yuq825> get the demo run with opengl1, seem ok first time

02:35 <yuq825> maybe I need to try several times

02:39 <anarsoul> yuq825: yeah, with the latest MR it's more stable for q3a, it still fails once in a while but not as often as with version from two days ago

02:39 <anarsoul> supertuxkart still fails almost every time though :(

02:40 <yuq825> then it seems I should try supertuxkart, I see its readme says it needs opengl3?

02:40 <yuq825> and wayland only?

02:40 <anarsoul> yuq825: it has fallback for gles1

02:41 <anarsoul> it's not wayland-only

02:41 <anarsoul> yuq825: you can run q3a demo in loop, let me find cmdline for you...

02:41 <yuq825> then could you get me a supertuxkart apitrace for X11?

02:42 <yuq825> your last one is for wayland

02:42 <anarsoul> try '+set loop "timedemo 1; demo four; set nextdemo vstr loop" +vstr loop'

02:43 <yuq825> I may also try to apitrace it on x86 then replay on arm

02:43 <anarsoul> sure, just give me some time...

02:46 <yuq825> how to show the framerate with a command in q3a?

02:50 <anarsoul> +set cg_drawFPS 1

02:52 <anarsoul> check your ~/.q3a/baseq3/q3config.cfg

02:52 <anarsoul> :)

02:55 <yuq825> thanks

02:56 <yuq825> it's only around 10FPS on my H3...

03:03 <anarsoul> yuq825: up to 40 in 1024x768 on my A64

03:05 <yuq825> my monitor resolution is 1920x1080

03:06 <anarsoul> run it in windowed mode?

03:29 <yuq825> much better with '+set r_mode -1 +set r_customheight 768 +set r_customwidth 1024'
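The options mentioned so far (looped timedemo, FPS counter, custom 1024x768 mode) can be combined into one invocation. This is only a sketch: the binary name ioquake3.arm is taken from the log and assumed to be on PATH, while Q3A_BIN, DRY_RUN and run_q3a_loop are made-up helper names. With DRY_RUN set it prints one argument per line instead of starting the game.

```shell
#!/bin/sh
# Assumption: binary name as used in the log; override via Q3A_BIN if needed.
Q3A_BIN="${Q3A_BIN:-ioquake3.arm}"

# Hypothetical helper: assemble the arguments quoted in the channel and
# either print them (DRY_RUN set) or run the game with them.
run_q3a_loop() {
    set -- \
        +set r_mode -1 \
        +set r_customwidth 1024 \
        +set r_customheight 768 \
        +set cg_drawFPS 1 \
        +set loop "timedemo 1; demo four; set nextdemo vstr loop" \
        +vstr loop
    if [ -n "${DRY_RUN:-}" ]; then
        # One argument per line, so the quoting of the loop string is visible.
        printf '%s\n' "$Q3A_BIN" "$@"
    else
        "$Q3A_BIN" "$@"
    fi
}

# Inspect the command line first; drop DRY_RUN=1 to actually start the demo loop.
DRY_RUN=1 run_q3a_loop
```

The loop string has to reach the game as a single argument, which is why it stays inside one pair of double quotes.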

03:30 <yuq825> but I can't see the ppmmu fail when looping the demo, I only saw it once when doing '/r_mode -1' in the console

03:32 <yuq825> this reminds me it may be due to the resolution change, about the PLB/PP stream

03:32 <yuq825> when one job is saved with resolution A, then flushed at resolution B

04:48 dddddd has quit [Ping timeout: 260 seconds]

04:49 <yuq825> I got an apitrace of supertuxkart from x86 and replayed it on arm, before seeing the ppmmu fail, I get glretrace: ../../mesa/src/gallium/drivers/lima/lima_draw.c:764: lima_update_gp_attribute_info: Assertion `pve->vertex_buffer_index < vb->count' failed.

05:15 buzzmarshall has quit [Remote host closed the connection]

05:33 Barada has joined #lima

05:46 megi has quit [Ping timeout: 240 seconds]

06:06 <yuq825> anarsoul: I see the ppmmu fail now

06:37 <anarsoul> yuq825: with supertuxkart?

06:37 <yuq825> yeah,

07:00 Barada has quit [Quit: Barada]

07:26 Barada has joined #lima

07:39 <yuq825> anarsoul: some experiments show that with either "nobocache" or "nogrowheap" this problem is gone, and "nogrowheap" works better, with no damaged areas when the race begins

07:42 <anarsoul> yuq825: likely it just hides the issue

07:45 <yuq825> maybe a time delay exposes it

07:47 <anarsoul> yeah

07:47 <anarsoul> have you been able to figure out what exactly it tries to read?

07:47 <anarsoul> it should be either varyings, uniforms or texture

07:48 <yuq825> no

07:48 <yuq825> or heap

07:48 <anarsoul> right, or heap

07:51 <anarsoul> if there was an easier reproducer we could just use LIMA_DEBUG=dump and grep for the address

08:08 <MoeIcenowy> yuq825: stk hs two renderers

08:08 <MoeIcenowy> has *

08:09 <MoeIcenowy> if you want to record it on PC, please restrict GL version

08:09 <MoeIcenowy> to enable it to use the fallback renderer

08:10 <yuq825> yeah, I set MESA_GL_VERSION_OVERRIDE=1.2

08:10 <yuq825> but now I just download a arm build to run on arm directly
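The record-on-x86, replay-on-ARM workflow discussed above might look roughly like this. It is a sketch under assumptions: apitrace and glretrace are installed on the respective machines, the game binary is called supertuxkart, and the TRACE file name plus the record_trace and replay_trace helpers are hypothetical. Whether capping the GL version at 1.2 selects the fallback renderer is taken from the conversation, not verified here.

```shell
#!/bin/sh
# Hypothetical trace file name; override via TRACE if needed.
TRACE="${TRACE:-supertuxkart.trace}"

# On the x86 machine: cap the GL version Mesa advertises, as suggested in
# the channel, and record the given program's GL calls into $TRACE.
record_trace() {
    MESA_GL_VERSION_OVERRIDE=1.2 apitrace trace -o "$TRACE" "$@"
}

# On the ARM board: replay the recorded trace.
replay_trace() {
    glretrace "$TRACE"
}

# Usage (not executed here):
#   record_trace supertuxkart    # on x86
#   replay_trace                 # on ARM, after copying $TRACE over
```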

08:11 <MoeIcenowy> yuq825: BTW seems that lima really sucks on 1080p

08:12 <anarsoul> MoeIcenowy: I bet mali450 at 700mhz is not that bad :)

08:12 <anarsoul> (one on amlogic SoCs)

08:12 <MoeIcenowy> ah maybe

08:12 <yuq825> +mp8

08:13 <anarsoul> but yeah, 1080p has 2.25x more pixels than 720p

08:13 <MoeIcenowy> I think Amlogic SoCs use MP3

08:13 <anarsoul> so you need twice as fast GPU to keep the same FPS
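The pixel-count arithmetic behind the 2.25x figure can be checked quickly. The helper names pixels and ratio are made up for this sketch; the 1024x768 comparison uses the windowed resolution from earlier in the log.

```shell
#!/bin/sh
# Hypothetical helpers: pixel count of a mode, and ratio of two counts.
pixels() { echo $(( $1 * $2 )); }
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'; }

p1080=$(pixels 1920 1080)   # 2073600
p720=$(pixels 1280 720)     # 921600
p768=$(pixels 1024 768)     # 786432

ratio "$p1080" "$p720"      # prints 2.25
ratio "$p1080" "$p768"      # prints 2.64
```

If rendering is purely fill-rate bound, keeping the same FPS at 1080p would therefore need somewhat more than twice the pixel throughput available at 720p.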

08:15 <yuq825> lima performance still has room to improve, like the ppir

08:15 <yuq825> generate less code

08:16 <MoeIcenowy> yes

08:16 <MoeIcenowy> if possible, we should merge compare instructions with branch instruction

08:17 <yuq825> at the time of mali450, arm seems to have wanted to target 4K

08:22 <anarsoul> MoeIcenowy: that's actually not a good idea

08:22 <anarsoul> merging cmp into branch results in higher register pressure

08:23 <anarsoul> since branch is always the last instruction in block and 1-component condition is always smaller than any arguments to compare

08:24 <anarsoul> I have a patch that does it, but it increases number of spills

08:24 <anarsoul> there're other ways to optimize ppir

08:29 <yuq825> combining more arithmetic ops would benefit ppir more

08:33 <anarsoul> also we should cache pp stream for scissored draws, currently it's generated every time

08:33 <anarsoul> and it's expensive, IIRC 2% of CPU time in q3a

08:36 yann has quit [Ping timeout: 260 seconds]

08:37 <yuq825> i'm thinking of adding a debug interface in the kernel which can save all the bo a submit contains when it fails

08:40 <MoeIcenowy> yuq825: BTW will it be possible for the kernel to tell which process triggers the fault?

08:41 <yuq825> yes

08:41 <MoeIcenowy> on desktop environment it's difficult to find out which process fails

08:42 <yuq825> right

08:44 <anarsoul> yuq825: yeah, that'd be very helpful

08:45 <anarsoul> I never got to https://gitlab.freedesktop.org/lima/linux/issues/27 myself

08:45 <anarsoul> MoeIcenowy: it's usually one with frozen window :)

08:52 mripard has joined #lima

08:54 megi has joined #lima

09:42 <MoeIcenowy> anarsoul: it's only guessing

09:43 <MoeIcenowy> when the whole screen freezes, I can only guess it's Xorg that is guilty

09:44 yann has joined #lima

10:52 <rellla> what are our restrictions regarding PLBU cmd size, VS cmd size ... ?

10:53 <rellla> i'm looking into dEQP-GLES2.functional.color_clear.long_masked_rgb and i guess the fail has sth to do with some "overflow"...

10:54 <rellla> 231 glClears (which are quad draws) in one frame work perfectly fine, while test begins to fail from 232 on.

10:55 <rellla> the only difference i could imagine are different sizes of streams...

11:00 <anarsoul> rellla: 232 doesn't look like a sane limit, what address does it start with?

11:02 <rellla> http://imkreisrum.de/deqp/lima_cc/ is the one with 231 draws, http://imkreisrum.de/deqp/lima_cc_fail/ is the same test, but with one more glClear

11:03 <rellla> the blob jumps with a CONTINUE within the PLBU cmd at a size of 8MB - but i don't think this is related here

11:05 <anarsoul> what does the blob do here?

11:05 <rellla> anarsoul: i changed the test to run with a fixed number of clears (231 and 232) and 3 iterations btw

11:08 <rellla> the blob does the clear with a scissored blend instead of mesa's draw_quad if i can see correctly, so it doesn't have VS cmds at all

11:09 <anarsoul> I see

11:10 <rellla> afaik we cannot do it like the blob unless we do some changes in mesa.

11:14 <anarsoul> yeah

11:14 <anarsoul> rellla: try some other test with a lot of draws instead of clears?

11:40 yuq825 has quit [Remote host closed the connection]

11:48 deesix has quit [Ping timeout: 240 seconds]

11:48 deesix has joined #lima

11:56 <rellla> anarsoul: could you have a quick look at http://imkreisrum.de/deqp/tmp/

11:58 <rellla> i wonder why drawing stops at half of the width ...

11:59 <anarsoul> anything in dmesg?

11:59 <rellla> no.

12:00 <rellla> let me check the code again

12:00 <anarsoul> see SCISSORS: minx: 464.000000, maxx: 720.000000, miny: 559.000000, maxy: 560.000000 */

12:01 <anarsoul> that's roughly half screen for 1024x1024?

12:27 <rellla> anarsoul: 720 isn't half of 1024?

12:28 <rellla> let me try other values

12:29 <anarsoul> rellla: try the same with blob? or on your x86 PC

12:38 <rellla> anarsoul: sorry for the noise, it's simply the vertex coords :p

13:36 dddddd has joined #lima

13:44 yuq825 has joined #lima

14:02 Barada has quit [Quit: Barada]

14:05 yuq825 has quit [Ping timeout: 265 seconds]

14:46 <rellla> ok, so i need your help again. https://gitlab.freedesktop.org/rellla/gfx/tree/master/gbm-surface-draw2 is a simple program which reproduces the issue.

14:46 <rellla> it does several scissored clears.

14:47 <rellla> it works fine until a specific point - which doesn't depend on stream size but on some other barrier i think.

14:48 <rellla> i did some experiments with the scissor rect and found out, that the issue also depends on scissor size.

14:48 <rellla> setting https://gitlab.freedesktop.org/rellla/gfx/blob/master/gbm-surface-draw2/main.c#L320 to 85 and above makes "artifacts" appear in the result, which shouldn't be there.

14:49 <rellla> any ideas?

15:12 <rellla> bam. nogrowheap makes it work.

15:14 <rellla> better, but at one point we get gp task error without growheap, which is predictable

15:27 buzzmarshall has joined #lima

18:12 warpme_ has quit [Quit: Connection closed for inactivity]

18:15 yann has quit [Ping timeout: 272 seconds]

18:28 yann has joined #lima

19:12 <anarsoul> rellla: looks like growing heap needs more work