<anarsoul>
if the number of VS invocations is higher than info->count, gl_pos overwrites varyings
<yuq825>
I can't see how VS invocation will be bigger than info->count?
<anarsoul>
because it's = (ctx->max_index - ctx->min_index + 1)
<anarsoul>
that's what we pass to VS_CMD_DRAW()
<yuq825>
ok, so it's caused by the command stream setting? I can see that neither the info->count way nor the max-min way is accurate for the output space needed, but at least we should use the correct one
<anarsoul>
it's supposed to be ctx->varyings_stride * num of VS invocations
<yuq825>
then for index array [0, 1000, 2], we waste a lot of VS invocations and memory space for output
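The sizing problem discussed here can be sketched in a few lines of C. This is only an illustration under the assumption stated in the chat, namely that the VS shades every vertex in the range from the minimum to the maximum index; `vs_invocations` and `varyings_size` are hypothetical helper names, not functions from the lima driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of VS invocations for an indexed draw,
 * assuming the VS shades every vertex in [min_index, max_index]. */
uint32_t vs_invocations(const uint32_t *indices, size_t count)
{
    assert(count > 0);
    uint32_t min_index = indices[0];
    uint32_t max_index = indices[0];

    for (size_t i = 1; i < count; i++) {
        if (indices[i] < min_index)
            min_index = indices[i];
        if (indices[i] > max_index)
            max_index = indices[i];
    }
    return max_index - min_index + 1;
}

/* Hypothetical helper: bytes of varyings output for a given number of
 * VS invocations (stride * invocations, as stated in the chat). */
size_t varyings_size(uint32_t varyings_stride, uint32_t invocations)
{
    return (size_t)varyings_stride * invocations;
}
```

For the index array [0, 1000, 2], `vs_invocations` returns 1001 while the index count is only 3, so a buffer sized by the index count would be far too small for what the VS writes.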
<anarsoul>
yeah
<anarsoul>
I guess that's why blob optimizes it :)
<anarsoul>
rellla: ^^ and that's probably the answer to your question of why blob splits VS_CMD_DRAW into several
<yuq825>
lucky if we can divide a list of indices, but what about a single triangle with [0, 1000, 2]? can blob still divide it or does it just do 1001 VS?
yuq825 has quit [Ping timeout: 268 seconds]
yuq825 has joined #lima
Barada has quit [Quit: Barada]
<anarsoul>
yuq825: I don't know the answer for that
<anarsoul>
but looks like utgard architecture is not optimized for indexed draws
<anarsoul>
and in worst case we have to shade every vertex
<yuq825>
could you do a blob dump for this case to confirm, maybe there is some hidden command stream?
<anarsoul>
also I'm not sure how common it is to have vertex buffer larger than index
<anarsoul>
yuq825: I have pretty limited access to the hardware for next few weeks
<anarsoul>
I can add it to my TODO though
<anarsoul>
I doubt there's hidden command though
<yuq825>
fine, I can give it a try later today
<yuq825>
me too, just for confirmation
yuq825 has quit [Ping timeout: 265 seconds]
yuq825 has joined #lima
yuq825 has quit [Ping timeout: 240 seconds]
yuq825 has joined #lima
Barada has joined #lima
abordado has joined #lima
Barada has quit [Quit: Barada]
abordado has quit [Remote host closed the connection]
abordado has joined #lima
abordado has quit [Client Quit]
yuq825 has quit [Remote host closed the connection]
megi has joined #lima
hellsenberg is now known as hell__
abordado has joined #lima
drod has joined #lima
dddddd has joined #lima
robertfoss has quit [Ping timeout: 265 seconds]
robertfoss has joined #lima
buzzmarshall has joined #lima
buzzmarshall has quit [Remote host closed the connection]
chewitt has joined #lima
chewitt has quit [Quit: Zzz..]
<enunes>
anarsoul: btw, in the last days I've been looking at the ppir regalloc to see if I can make it work more efficiently so it doesn't fail with the remaining glamor shaders from shader-db
<anarsoul>
well
<enunes>
I implemented a way to consider register pressure in spilling, that improved a few cases but still didn't solve regalloc for those
<anarsoul>
enunes: we need to implement optimization passes for ppir
<anarsoul>
e.g. we need copy propagation and dce
<anarsoul>
copy propagation should improve reg pressure
<anarsoul>
also we may want to reuse the LCRA algo that is used in panfrost for RA
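As a rough illustration of the two passes mentioned above, here is a minimal sketch of copy propagation followed by dead code elimination on a toy IR. Everything in it is an assumption made for the example: the `toy_instr` structure, the opcodes, and the SSA-style restriction (each register is written once and defined before use) are hypothetical and do not reflect how ppir represents instructions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR, not ppir. */
enum toy_op { TOY_MOV, TOY_ADD, TOY_STORE_OUT };

struct toy_instr {
    enum toy_op op;
    int dst;     /* destination register, -1 if none */
    int src[2];  /* source registers, -1 if unused */
    bool dead;   /* set by toy_dce() */
};

/* Copy propagation: rewrite later uses of a MOV's destination to read
 * the MOV's source instead. Assumes each register is written once. */
void toy_copy_prop(struct toy_instr *prog, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (prog[i].op != TOY_MOV)
            continue;
        for (size_t j = i + 1; j < n; j++)
            for (int s = 0; s < 2; s++)
                if (prog[j].src[s] == prog[i].dst)
                    prog[j].src[s] = prog[i].src[0];
    }
}

/* Dead code elimination: mark instructions whose result is never read
 * by a later live instruction. Repeats until nothing changes. */
size_t toy_dce(struct toy_instr *prog, size_t n)
{
    size_t removed = 0;
    bool progress = true;

    while (progress) {
        progress = false;
        for (size_t i = 0; i < n; i++) {
            if (prog[i].dead || prog[i].op == TOY_STORE_OUT)
                continue;
            bool used = false;
            for (size_t j = i + 1; j < n && !used; j++) {
                if (prog[j].dead)
                    continue;
                used = prog[j].src[0] == prog[i].dst ||
                       prog[j].src[1] == prog[i].dst;
            }
            if (!used) {
                prog[i].dead = true;
                removed++;
                progress = true;
            }
        }
    }
    return removed;
}
```

In this toy model, once the uses of a copy are rewritten the MOV itself becomes dead, so the register it occupied is no longer needed. That is the intuition behind the claim that copy propagation should help register pressure.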
<enunes>
I'll look into these options, I mentioned mostly to ensure you're not working on something similar
<anarsoul>
so please review it when you get some time
<anarsoul>
cursor movements in X11 should be smooth with this MR :)
<anarsoul>
I have a strong suspicion that, due to the underallocation fixed in 3266, the GPU overwrote part of the command and/or PLBU stream with varyings and/or gl_pos, and that causes random GPU errors
<rellla>
for draw_elements it seems to be right. let me upload the new dumps with vary and attr desc parsed in a few minutes...
<rellla>
online now.
<rellla>
@all: if you want me to upload deqp mali dumps, let me know. i have all locally, sadly they are 84GB in size, so uploading seems a bit difficult :p
drod has quit [Excess Flood]
<anarsoul>
ouch
<anarsoul>
rellla: note command difference
<anarsoul>
oh, nevermind
<anarsoul>
rellla: yeah, it's weird.
<anarsoul>
rellla: could be a bug in blob? :)
<anarsoul>
rellla: so look at the semaphores
<anarsoul>
PLBU waits for VS to complete shading here: /* 0x10092490 (0x00000090) */0x00010001 0x60000000/* ARRAYS_SEMAPHORE_END */
<anarsoul>
so technically whatever "/* 0x10090478 (0x00000078) */0x1b000001 0x00000000/* DRAW: num: 27, index_draw: true */" does is not even rasterized?
<anarsoul>
maybe kernel driver just swallows GPU error?
<rellla>
so everything after /* 0x10090448 (0x00000048) */0x00018000 0x50000000/* SEMAPHORE_END: index_draw enabled */ is probably bogus and not needed?
<anarsoul>
actually
<anarsoul>
.frame.vs_commands_end = 0x10090450
<anarsoul>
I'm not sure why you're parsing it further :)
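The point about `vs_commands_end` can be sketched as a bounds check for a dump parser. This is a hedged example: it assumes each command is two 32-bit words (8 bytes), as the dump lines quoted above suggest, and that the end address is exclusive. The function names are hypothetical and not taken from any existing dump tool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CMD_SIZE 8u /* assumption: two 32-bit words per command */

/* Hypothetical helper: number of commands in [start, end). */
size_t cmd_count(uint32_t start, uint32_t end)
{
    assert(end >= start && (end - start) % CMD_SIZE == 0);
    return (end - start) / CMD_SIZE;
}

/* Hypothetical helper: true if the command at addr lies before the
 * end address reported in the frame, i.e. it should be parsed. */
bool cmd_in_stream(uint32_t addr, uint32_t end)
{
    return addr < end;
}
```

With `vs_commands_end = 0x10090450`, the SEMAPHORE_END at 0x10090448 is the last command inside the stream, and the DRAW at 0x10090478 lies past the end, so a parser using this check would stop before reaching it.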