ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
<anarsoul> rellla: oh, I think I know what's the issue
<rellla> ?
<anarsoul> working on fix
<anarsoul> will be ready in a min
<anarsoul> I'll be quite surprised if it doesn't fix the issue :)
<anarsoul> we underallocate space for varyings, gl_pos and gl_pointsize for indexed draw
<anarsoul> so gl_pos overwrites varyings
<anarsoul> thus wrong color
<rellla> sounds very reasonable, but sadly i can't test it until tomorrow :p
<anarsoul> :)
<anarsoul> I can give it a run via CI
<rellla> yeah, then we finally can jump over to clipping :)
<anarsoul> that should fix *a lot* of weird failures
<anarsoul> cwabbott_: any plans to resume work in https://gitlab.freedesktop.org/cwabbott0/mesa/tree/lima-gpir-branch-opt-v4 ?
<anarsoul> rellla: yeah, it passes now
<anarsoul> interestingly it didn't fix any other tests from my list :)
<anarsoul> I'll submit MR later tonight
yuq825 has joined #lima
<yuq825> hi guys, need your review for the kernel patch: https://patchwork.kernel.org/patch/11315037/
<anarsoul> yuq825: I've seen it but haven't gotten to it yet. Will do in a few days
<yuq825> OK, thanks
megi has quit [Ping timeout: 240 seconds]
dddddd has quit [Ping timeout: 258 seconds]
chewitt has joined #lima
hell__ has quit [Ping timeout: 250 seconds]
hellsenberg has joined #lima
Barada has joined #lima
Barada has quit [Quit: Barada]
Barada has joined #lima
<anarsoul> yuq825: ^^ you should also take a look
<anarsoul> I'm not sure how we missed that
chewitt has quit [Quit: Zzz..]
<yuq825> oh, right, does the original way cause crashes or other problems?
<anarsoul> yes, dEQP-GLES2.functional.buffer.write.use.index_array.array fails
<anarsoul> if number of VS invocations is higher than info->count gl_pos overwrites varyings
<yuq825> I can't see how VS invocation will be bigger than info->count?
<anarsoul> because it's = (ctx->max_index - ctx->min_index + 1)
<anarsoul> that's what we pass to VS_CMD_DRAW()
<yuq825> ok, so it's caused by the command stream setting? I can see that neither the info->count way nor the max-min way is accurate for the output space needed, but at least we should use the correct one
<anarsoul> it's supposed to be ctx->varyings_stride * num of VS invocations
<yuq825> then for index array [0, 1000, 2], we waste a lot of VS invocations and mem space for output
<anarsoul> yeah
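[Editor's sketch of the sizing arithmetic discussed above, not code from the driver. The function names and the 16-byte stride are hypothetical; the formulas follow the chat: VS invocations = max_index - min_index + 1, and the output buffer needs varyings_stride * VS invocations, whereas sizing by info->count only covers the number of indices.]

```python
def vs_invocations(indices):
    # The VS shades the whole index range, not just the referenced vertices.
    return max(indices) - min(indices) + 1


def needed_output_size(varyings_stride, indices):
    # Size discussed in the chat: stride * number of VS invocations.
    return varyings_stride * vs_invocations(indices)


def count_based_size(varyings_stride, indices):
    # Sizing by the number of indices (info->count), which underallocates
    # whenever the index range is wider than the index count.
    return varyings_stride * len(indices)


indices = [0, 1000, 2]   # the single-triangle example from the chat
stride = 16              # hypothetical varyings stride in bytes

print(vs_invocations(indices))               # 1001 invocations for 3 indices
print(needed_output_size(stride, indices))   # 16016 bytes
print(count_based_size(stride, indices))     # 48 bytes
```

[With these assumed numbers the count-based buffer is far smaller than what the VS writes, which matches the overwrite described above.]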
<anarsoul> I guess that's why blob optimizes it :)
<anarsoul> rellla: ^^ and that's probably the answer for your question why blob splits VS_CMD_DRAW into several
<yuq825> we're lucky if we can divide a list of indices, but what about a single triangle with [0, 1000, 2]? can blob still divide it or just do 1001 VS?
yuq825 has quit [Ping timeout: 268 seconds]
yuq825 has joined #lima
Barada has quit [Quit: Barada]
<anarsoul> yuq825: I don't know the answer for that
<anarsoul> but looks like utgard architecture is not optimized for indexed draws
<anarsoul> and in worst case we have to shade every vertex
<yuq825> could you do a blob dump for this case to confirm, maybe there is some hidden command stream?
<anarsoul> also I'm not sure how common it is to have vertex buffer larger than index
<anarsoul> yuq825: I have pretty limited access to the hardware for the next few weeks
<anarsoul> I can add it to my TODO though
<anarsoul> I doubt there's hidden command though
<yuq825> fine, I can give it a try later today
<yuq825> me too, just for confirmation
yuq825 has quit [Ping timeout: 265 seconds]
yuq825 has joined #lima
yuq825 has quit [Ping timeout: 240 seconds]
yuq825 has joined #lima
Barada has joined #lima
abordado has joined #lima
Barada has quit [Quit: Barada]
abordado has quit [Remote host closed the connection]
abordado has joined #lima
abordado has quit [Client Quit]
yuq825 has quit [Remote host closed the connection]
megi has joined #lima
hellsenberg is now known as hell__
abordado has joined #lima
drod has joined #lima
dddddd has joined #lima
robertfoss has quit [Ping timeout: 265 seconds]
robertfoss has joined #lima
buzzmarshall has joined #lima
buzzmarshall has quit [Remote host closed the connection]
chewitt has joined #lima
chewitt has quit [Quit: Zzz..]
<enunes> anarsoul: btw, in the last days I've been looking at the ppir regalloc to see if I can make it work more efficiently so it doesn't fail with the remaining glamor shaders from shader-db
<anarsoul> well
<enunes> I implemented a way to consider register pressure in spilling, that improved a few cases but still didn't solve regalloc for those
<anarsoul> enunes: we need to implement optimization passes for ppir
<anarsoul> e.g. we need copy propagation and dce
<anarsoul> copy propagation should improve reg pressure
<anarsoul> also we may want to reuse LRCA algo that is used in panfrost for RA
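[Editor's sketch of what copy propagation and DCE do, on a toy IR. This is not ppir code: the tuple format (dest, op, args), the op names and the assumption that every dest is assigned exactly once in straight-line code are all hypothetical, chosen only to show why removing copies can lower register pressure.]

```python
def copy_propagate(instrs):
    # Replace uses of a "mov" destination with the value it copies.
    copies = {}
    out = []
    for dest, op, args in instrs:
        args = tuple(copies.get(a, a) for a in args)
        if op == "mov":
            copies[dest] = args[0]
        out.append((dest, op, args))
    return out


def dead_code_eliminate(instrs, live_out):
    # Walk backwards and keep only instructions whose result is still needed.
    live = set(live_out)
    kept = []
    for dest, op, args in reversed(instrs):
        if dest in live:
            kept.append((dest, op, args))
            live.update(args)
    kept.reverse()
    return kept


prog = [
    ("t0", "load", ("in0",)),
    ("t1", "mov", ("t0",)),
    ("t2", "mul", ("t1", "t1")),
    ("t3", "mov", ("t2",)),
    ("out", "add", ("t3", "t0")),
]

optimized = dead_code_eliminate(copy_propagate(prog), {"out"})
for instr in optimized:
    print(instr)
```

[In this toy program both movs become dead after propagation, so t1 and t3 never need a register.]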
<enunes> I'll look into these options, I mentioned mostly to ensure you're not working on something similar
<anarsoul> not atm
<anarsoul> I'm focusing on command stream
<anarsoul> speaking of which
<enunes> ok, sounds good
<anarsoul> I ran q3a with https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2283 and it didn't cause any GPU errors
<anarsoul> so please review it when you get some time
<anarsoul> cursor movements in X11 should be smooth with this MR :)
<anarsoul> I have a strong suspicion that, due to the underallocation fixed in 3266, the GPU overwrote part of the command and/or PLBU stream with varyings and/or gl_pos, and that caused the random GPU errors
adjtm_ has joined #lima
adjtm has quit [Ping timeout: 240 seconds]
afaerber has quit [Ping timeout: 250 seconds]
afaerber has joined #lima
<rellla> anarsoul: can you confirm, that the second attributes desc address here http://imkreisrum.de/deqp/index_array.array_mali/mali.dump.0001
afaerber has quit [Quit: Leaving]
<rellla> sry forget it
<rellla> second /* 0x10090468 (0x00000068) */0x10095fc0 0x20040000/* ATTRIBUTES_ADDRESS: address: 0x10095fc0, size: 2 */
<rellla> checking 0x10095fc0 gives me 0x3e8e155d, 0x3e9861f5, 0x3e34ebe8, 0x3ec89941, /* 0x00001BC0 */
<rellla> which looks strange to me. can you confirm that?
<rellla> /* 0x00001BA0 */ looks like the right address
afaerber has joined #lima
<rellla> this seems to be strange only in the draw_arrays dumps - each second one of http://imkreisrum.de/deqp/index_array.array_mali/
<rellla> for draw_elements it seems to be right. let me upload the new dumps with vary and attr desc parsed in a few minutes...
<rellla> online now-
<rellla> @all: if you want me to upload deqp mali dumps, let me know. i have all locally, sadly they are 84GB in size, so uploading seems a bit difficult :p
drod has quit [Excess Flood]
<anarsoul> ouch
<anarsoul> rellla: note command difference
<anarsoul> oh, nevermind
<anarsoul> rellla: yeah, it's weird.
<anarsoul> rellla: could be a bug in blob? :)
<anarsoul> rellla: so look at the semaphores
<anarsoul> PLBU waits for VS to complete shading here: /* 0x10092490 (0x00000090) */0x00010001 0x60000000/* ARRAYS_SEMAPHORE_END */
<anarsoul> VS signals completion here: /* 0x10090448 (0x00000048) */0x00018000 0x50000000/* SEMAPHORE_END: index_draw enabled */
<anarsoul> so technically whatever "/* 0x10090478 (0x00000078) */0x1b000001 0x00000000/* DRAW: num: 27, index_draw: true */" does is not even rasterized?
<anarsoul> maybe kernel driver just swallows GPU error?
<rellla> so everything after /* 0x10090448 (0x00000048) */0x00018000 0x50000000/* SEMAPHORE_END: index_draw enabled */ is probably bogus and not needed?
<anarsoul> actually
<anarsoul> .frame.vs_commands_end = 0x10090450
<anarsoul> I'm not sure why you're parsing it further :)
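[Editor's sketch of the bounds check implied above, not code from mali-syscall-tracker. Assumptions: each VS command is 8 bytes (two 32-bit words, as the dump lines suggest), vs_commands_end is an exclusive end address, and the stream starts at 0x10090400 (derived from the address/offset pairs in the quoted dump lines). The function names are hypothetical.]

```python
CMD_SIZE = 8  # assumed: one command = two 32-bit words


def command_addresses(start, end):
    # Addresses of all commands in [start, end); end is treated as exclusive.
    return list(range(start, end, CMD_SIZE))


def is_inside_stream(addr, start, end):
    return start <= addr < end


VS_START = 0x10090400   # derived from 0x10090448 being at offset 0x48
VS_END = 0x10090450     # .frame.vs_commands_end from the chat

print(len(command_addresses(VS_START, VS_END)))        # 10 commands
print(is_inside_stream(0x10090448, VS_START, VS_END))  # SEMAPHORE_END: True
print(is_inside_stream(0x10090468, VS_START, VS_END))  # ATTRIBUTES_ADDRESS: False
print(is_inside_stream(0x10090478, VS_START, VS_END))  # DRAW: False
```

[Under these assumptions SEMAPHORE_END is the last command of the stream, and the entries at 0x68 and 0x78 lie past vs_commands_end, so a parser that stops at the end address would never print them.]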
<rellla> f*** it's me :p
<rellla> have to think about it again, but maybe my comment here https://gitlab.freedesktop.org/lima/mali-syscall-tracker/blob/master/main.c#L1251 shows result :p
<anarsoul> :)
<anarsoul> it's easier with PLBU where it has explicit end frame cmd
<anarsoul> it'd be nice to merge it before 20.0 is branched out
BenG83 has joined #lima
BenG83 has quit [Ping timeout: 258 seconds]
<rellla> anarsoul: ok.
<anarsoul> great
<anarsoul> post an MR? :)
<rellla> i haven't tested the case, when the cmd stream is continued at another address. ideas should trigger that, but i haven't set this up
<rellla> done
<rellla> i will do some dumps tomorrow.
<rellla> anybody here that has glmark2 working with blob (would save me from setting it up :p)
<rellla> ?
<anarsoul> nope