ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
yuq825 has joined #lima
<yuq825> anarsoul: hi, I get a "pak0.pk3" missing error when I run ioquake3.arm, how do you solve this?
<yuq825> ok, get one from github
<anarsoul> yuq825: I've bought q3a on gog.com quite a while ago
<yuq825> oh, I can't play it , need key, and my keyboard freeze in the menu
<yuq825> i'll try other ones
<anarsoul> :(
<anarsoul> yuq825: that's weird that your kbd doesn't work
<anarsoul> yuq825: you can start demo from cmdline
<yuq825> how?
<anarsoul> try something like 'ioquake3.arm +demo four'
<anarsoul> yuq825: also make sure renderer is opengl1 in ~/.q3a/baseq3/q3config.cfg
<yuq825> get the demo run with opengl1, seem ok first time
<yuq825> maybe I need to try several times
<anarsoul> yuq825: yeah, with the latest MR it's more stable for q3a, it still fails once in a while but not as often as with version from two days ago
<anarsoul> supertuxkart still fails almost every time though :(
<yuq825> then seems I should try supertuxkart, I see it's readme say it need opengl3?
<yuq825> and wayland only?
<anarsoul> yuq825: it has fallback for gles1
<anarsoul> it's not wayland-only
<anarsoul> yuq825: you can run q3a demo in loop, let me find cmdline for you...
<yuq825> then could you get me a supertuxkart apitrace for X11?
<yuq825> your last one is for wayland
<anarsoul> try '+set loop "timedemo 1; demo four; set nextdemo vstr loop" +vstr loop'
<yuq825> I may also try to apitrace it on x86 then replay on arm
<anarsoul> sure, just give me some time...
<yuq825> how to show framerate with a command in q3a?
<anarsoul> +set cg_drawFPS 1
<anarsoul> check your ~/.q3a/baseq3/q3config.cfg
<anarsoul> :)
<yuq825> thanks
<yuq825> it's only around 10FPS on my H3...
<anarsoul> yuq825: up to 40 in 1024x768 on my A64
<yuq825> my monitor resolution is 1920x1080
<anarsoul> run it in windowed mode?
<yuq825> much better with '+set r_mode -1 +set r_customheight 768 +set r_customwidth 1024'
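The command-line options quoted so far (the custom resolution, the FPS counter and the demo loop) can be collected into one argument list. This is only a sketch: the function name and its parameters are made up for illustration, and the cvars and values are exactly the ones pasted in the chat above.

```python
import shlex

def q3a_demo_args(binary="ioquake3.arm", demo="four",
                  width=1024, height=768, show_fps=True, loop=False):
    """Build an ioquake3 argument list from the options quoted in the chat.

    The helper and its parameter names are hypothetical; only the cvars
    (r_mode, r_customwidth, r_customheight, cg_drawFPS) and the loop
    string come from the conversation.
    """
    args = [binary,
            "+set", "r_mode", "-1",               # -1 selects the custom size
            "+set", "r_customwidth", str(width),
            "+set", "r_customheight", str(height)]
    if show_fps:
        args += ["+set", "cg_drawFPS", "1"]
    if loop:
        # Same loop string anarsoul pasted: replay the demo forever.
        loop_script = f"timedemo 1; demo {demo}; set nextdemo vstr loop"
        args += ["+set", "loop", loop_script, "+vstr", "loop"]
    else:
        args += ["+demo", demo]
    return args

# Print a shell-quoted command line that can be copied into a terminal.
print(shlex.join(q3a_demo_args(loop=True)))
```

Passing the list to subprocess.run() instead of printing it avoids shell quoting problems with the semicolons in the loop string.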
<yuq825> but I can't see ppmmu fail when loop demo, only see it once when do '/r_mode -1' in the console
<yuq825> this remind me it may be due to change resolution, about the PLB/PP stream
<yuq825> when one job is saved with resolution A, then flushed at resolution B
dddddd has quit [Ping timeout: 260 seconds]
<yuq825> I get apitrace of supertuxkart from x86 and replay it on arm, before see ppmmu fail, i get glretrace: ../../mesa/src/gallium/drivers/lima/lima_draw.c:764: lima_update_gp_attribute_info: Assertion `pve->vertex_buffer_index < vb->count' failed.
buzzmarshall has quit [Remote host closed the connection]
Barada has joined #lima
megi has quit [Ping timeout: 240 seconds]
<yuq825> anarsoul: I see the ppmmu fail now
<anarsoul> yuq825: with supertuxkart?
<yuq825> yeah,
Barada has quit [Quit: Barada]
Barada has joined #lima
<yuq825> anarsoul: some experiment shows that with either "nobocache" or "nogrowheap" this problem gone, and "nogrowheap" works better for no damaged areas when race begin
<anarsoul> yuq825: likely it just hides the issue
<yuq825> maybe time delay expose it
<anarsoul> yeah
<anarsoul> have you been able to figure out what exactly it tries to read?
<anarsoul> it should be either varyings, uniforms or texture
<yuq825> no
<yuq825> or heap
<anarsoul> right, or heap
<anarsoul> if there was an easier reproducer we could just use LIMA_DEBUG=dump and grep for the address
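The "dump and grep for the address" idea could look roughly like the sketch below. The chat does not show what LIMA_DEBUG=dump output looks like, so the sample lines here are invented placeholders and the helper name is hypothetical; the only assumption about the real dump is that addresses appear as 0x-prefixed hex numbers.

```python
import re

HEX_TOKEN = re.compile(r"0x[0-9a-fA-F]+")

def lines_mentioning(dump_lines, address):
    """Return (line_number, text) for every line containing `address`.

    Hex tokens are compared numerically, so 0x0010 and 0x10 match each other.
    """
    hits = []
    for number, line in enumerate(dump_lines, start=1):
        if any(int(token, 16) == address for token in HEX_TOKEN.findall(line)):
            hits.append((number, line.rstrip("\n")))
    return hits

# Invented sample lines, NOT real lima dump output.
sample = [
    "varying  va=0x10001000 size=0x200\n",
    "texture  va=0x10200000 size=0x4000\n",
]
for number, text in lines_mentioning(sample, 0x10200000):
    print(number, text)
```

This only finds literal occurrences of the address. A faulting address that lands in the middle of a buffer would need a range check against each buffer's start and size, which depends on the real dump layout.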
<MoeIcenowy> yuq825: stk hs two renderers
<MoeIcenowy> has *
<MoeIcenowy> if you want to record it on PC, please restrict GL version
<MoeIcenowy> to enable it to use the fallback renderer
<yuq825> yeah, I set MESA_GL_VERSION_OVERRIDE=1.2
<yuq825> but now I just download a arm build to run on arm directly
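Recording the trace on a PC with the GL version restricted, as described above, might be scripted like this. It is a sketch under assumptions: the helper name is made up, and it assumes the game binary is called supertuxkart and that apitrace is on PATH. MESA_GL_VERSION_OVERRIDE=1.2 is the value yuq825 mentions.

```python
import os

def trace_command(program="supertuxkart", gl_version="1.2", base_env=None):
    """Return (argv, env) for recording `program` under apitrace.

    The environment is a copy, so the caller's environment is not modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Restrict the GL version so the game picks its fallback renderer.
    env["MESA_GL_VERSION_OVERRIDE"] = gl_version
    return ["apitrace", "trace", program], env

argv, env = trace_command()
print(argv, env["MESA_GL_VERSION_OVERRIDE"])
# To actually record, something like:
#   import subprocess; subprocess.run(argv, env=env, check=True)
```

The resulting trace can then be replayed on the ARM board with glretrace, which is the tool that hit the assertion quoted earlier in the log.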
<MoeIcenowy> yuq825: BTW seems that lima really sucks on 1080p
<anarsoul> MoeIcenowy: I bet mali450 at 700mhz is not that bad :)
<anarsoul> (one on amlogic SoCs)
<MoeIcenowy> ah maybe
<yuq825> +mp8
<anarsoul> but yeah, 1080p has 2.25x more pixels than 720p
<MoeIcenowy> I think Amlogic SoCs uses MP3
<anarsoul> so you need twice as fast GPU to keep the same FPS
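The pixel arithmetic behind the 2.25x figure is easy to check. The sketch below also scales the 40 FPS at 1024x768 mentioned earlier up to 1920x1080, under the loud assumption that the frame rate is purely fill-rate bound and so inversely proportional to pixel count; real numbers will differ. The function names are made up for illustration.

```python
def pixel_count(width, height):
    return width * height

def fill_bound_fps(fps, from_res, to_res):
    """Scale a frame rate between resolutions, assuming FPS is inversely
    proportional to pixel count (a rough fill-rate-bound model only)."""
    return fps * pixel_count(*from_res) / pixel_count(*to_res)

# 1080p versus 720p
print(pixel_count(1920, 1080) / pixel_count(1280, 720))  # 2.25

# 40 FPS at 1024x768 scaled to 1920x1080 under the same assumption
print(fill_bound_fps(40, (1024, 768), (1920, 1080)))
```

Under that model, holding the same FPS at 1080p takes about 2.25 times the fill rate needed at 720p, which is the "twice as fast GPU" estimate above.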
<yuq825> lima performance still has space to improve, like the ppir
<yuq825> generate less code
<MoeIcenowy> yes
<MoeIcenowy> if possible, we should merge compare instructions with branch instruction
<yuq825> at the time of mali450, arm seems want to target 4K
<anarsoul> MoeIcenowy: that's actually not a good idea
<anarsoul> merging cmp into branch results in higher register pressure
<anarsoul> since branch is always the last instruction in block and 1-component condition is always smaller than any arguments to compare
<anarsoul> I have a patch that does it, but it increases number of spills
<anarsoul> there're other ways to optimize ppir
<yuq825> combine more arithmetic ops benefit ppir more
<anarsoul> also we should cache pp stream for scissored draws, currently it's generated every time
<anarsoul> and it's expensive, IIRC 2% of CPU time in q3a
yann has quit [Ping timeout: 260 seconds]
<yuq825> i'm thinking add a debug interface in the kernel which can save all the bo a submit contains when it fail
<MoeIcenowy> yuq825: BTW will it be possible for the kernel to tell which process triggers the fault?
<yuq825> yes
<MoeIcenowy> on desktop environment it's difficult to find out which process fails
<yuq825> right
<anarsoul> yuq825: yeah, that'd be very helpful
<anarsoul> MoeIcenowy: it's usually one with frozen window :)
mripard has joined #lima
megi has joined #lima
<MoeIcenowy> anarsoul: it's only guessing
<MoeIcenowy> when the whole screen freezes, I can only guess it's Xorg that is guilty
yann has joined #lima
<rellla> what are our restrictions regarding PLBU cmd size, VS cmd size ... ?
<rellla> i'm looking into dEQP-GLES2.functional.color_clear.long_masked_rgb and i guess the fail has sth to do with some "overflow"...
<rellla> 231 glClears (which are quad draws) in one frame work perfectly fine, while test begins to fail from 232 on.
<rellla> the only difference i could imagine are different sizes of streams...
<anarsoul> rellla: 232 doesn't look like sane limit, what address it starts with?
<rellla> http://imkreisrum.de/deqp/lima_cc/ is the one with 231 draws, http://imkreisrum.de/deqp/lima_cc_fail/ is the same test, but with one more glClear
<rellla> the blob jumps with a CONTINUE within the PLBU cmd at a size of 8MB - but i don't think this is related here
<anarsoul> what does the blob do here?
<rellla> anarsoul: i changed the test to run with a fixed number of clears (231 and 232) and 3 iterations btw
<rellla> the blob does the clear with a scissored blend instead of mesa's draw_quad if i can see correctly, so it doesn't have VS cmds at all
<anarsoul> I see
<rellla> afaik we cannot do it like the blob unless we do some changes in mesa.
<anarsoul> yeah
<anarsoul> rellla: try some other test with a lot of draws instead of clears?
yuq825 has quit [Remote host closed the connection]
deesix has quit [Ping timeout: 240 seconds]
deesix has joined #lima
<rellla> anarsoul: could you have a quick look at http://imkreisrum.de/deqp/tmp/
<rellla> i wonder why drawing stops at half of the width ...
<anarsoul> anything in dmesg?
<rellla> no.
<rellla> let me check the code again
<anarsoul> see SCISSORS: minx: 464.000000, maxx: 720.000000, miny: 559.000000, maxy: 560.000000 */
<anarsoul> that's roughly half screen for 1024x1024?
<rellla> anarsoul: 720 isn't half of 1024?
<rellla> let me try other values
<anarsoul> rellla: try the same with blob? or on your x86 PC
<rellla> anarsoul: sorry for the noise, it's simply the vertex coords :p
dddddd has joined #lima
yuq825 has joined #lima
Barada has quit [Quit: Barada]
yuq825 has quit [Ping timeout: 265 seconds]
<rellla> ok, so i need your help again. https://gitlab.freedesktop.org/rellla/gfx/tree/master/gbm-surface-draw2 is a simple program which reproduces the issue.
<rellla> it does several scissored clears.
<rellla> it works fine until a specific point - which doesn't depend on stream size but on some other barrier i think.
<rellla> i did some experiments with the scissor rect and found out, that the issue also depends on scissor size.
<rellla> setting https://gitlab.freedesktop.org/rellla/gfx/blob/master/gbm-surface-draw2/main.c#L320 to 85 and above makes "artifacts" appear in the result, which shouldn't be there.
<rellla> any ideas?
<rellla> bam. nogrowheap makes it work.
<rellla> better, but at one point we get gp task error without growheap, which is predictable
buzzmarshall has joined #lima
warpme_ has quit [Quit: Connection closed for inactivity]
yann has quit [Ping timeout: 272 seconds]
yann has joined #lima
<anarsoul> rellla: looks like growing heap needs more work