ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
yuq825 has joined #lima
megi has quit [Ping timeout: 258 seconds]
niceplaces has quit [Ping timeout: 260 seconds]
minicom has quit [Ping timeout: 260 seconds]
minicom1 has joined #lima
niceplace has joined #lima
bshah has quit [Ping timeout: 260 seconds]
<anarsoul> yuq825: so looks like we're PP bound in most cases?
bshah has joined #lima
buzzmarshall has quit [Remote host closed the connection]
_whitelogger has joined #lima
chewitt has quit [Quit: Zzz..]
Barada has joined #lima
Barada has quit [Quit: Barada]
gcl_ has quit [Ping timeout: 240 seconds]
gcl has joined #lima
<MoeIcenowy> anarsoul: personally I think so
chewitt has joined #lima
chewitt has quit [Client Quit]
<yuq825> yeah, pp GPU time is obviously too long
chewitt has joined #lima
<yuq825> time between two gp task submits is ~70ms, gp+pp task GPU time is ~50ms
<anarsoul> yuq825: likely because it takes some time to generate cmd stream?
<yuq825> you mean the 20ms CPU time?
chewitt has quit [Client Quit]
<yuq825> 50ms is pure GPU time
<anarsoul> yuq825: yeah
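[note] The numbers quoted above work out as follows: ~70ms between submits minus ~50ms of GPU time leaves ~20ms of CPU-side time per frame, so the GPU is busy for roughly 71% of each frame interval. A minimal sketch of that arithmetic, where the demo_* helper names are hypothetical and not part of lima:

```c
#include <stdio.h>

/* Hypothetical helpers illustrating the frame-time breakdown from the chat.
 * Inputs are the approximate figures quoted there, in milliseconds. */

/* Time per frame not covered by GPU work (CPU-side overhead). */
static int demo_cpu_ms(int submit_interval_ms, int gpu_ms)
{
   return submit_interval_ms - gpu_ms;
}

/* Share of the frame interval spent on GPU work, in whole percent. */
static int demo_gpu_share_percent(int submit_interval_ms, int gpu_ms)
{
   return (gpu_ms * 100) / submit_interval_ms;
}

/* Print the breakdown for one pair of measurements. */
static void demo_print_breakdown(int submit_interval_ms, int gpu_ms)
{
   printf("cpu: %dms, gpu share: %d%%\n",
          demo_cpu_ms(submit_interval_ms, gpu_ms),
          demo_gpu_share_percent(submit_interval_ms, gpu_ms));
}
```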
<anarsoul> flamegraph of ioq3 shows that lima_bo_wait() is pretty expensive
<anarsoul> it's 21% of time spent in lima_draw_vbo()
<anarsoul> also lima_job_add_bo() itself is expensive
<anarsoul> yuq825: I think using "util_dynarray_grow" in lima_job_add_bo() likely makes it slow
<yuq825> lima_bo_wait() waits for the task to finish, so if the task is slow (GPU time), that's expected
<anarsoul> yuq825: it uses zero timeout, so it should return immediately and indicate whether BO is busy or not
<anarsoul> it's called from lima_bo_create()
<yuq825> ok
<yuq825> giving the bo array a bigger initial size, or building a cache for lima_job, may work
<anarsoul> yeah
<yuq825> maybe add some trace points to check lima_bo_wait()
<yuq825> oh, your flamegraph already has kernel time
<anarsoul> yes
<anarsoul> I'll send an MR that preallocates dynarrays for job BOs in a few mins
monstr has joined #lima
dddddd has quit [Ping timeout: 240 seconds]
<anarsoul> rellla: please collect tags for 3884 and merge it
Elpaulo has quit [Read error: Connection reset by peer]
Elpaulo has joined #lima
<anarsoul> yuq825: I think it spends most of the time in util_dynarray_foreach()
<anarsoul> so preallocating dynarrays doesn't help
<anarsoul> guess we need another hash table to check whether BO is already in a job?
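[note] The idea in the last few messages is to replace a linear scan over the job's BO array with a hash lookup when checking for duplicates. A minimal standalone sketch of that technique follows. It is not lima's code: struct demo_bo, struct demo_job and all demo_* functions are hypothetical, and the fixed-capacity open-addressing set is an assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a buffer object; not the real struct lima_bo. */
struct demo_bo { uint32_t handle; };

/* Hypothetical job: a dense BO array plus a pointer set for O(1) lookups. */
struct demo_job {
   struct demo_bo **bos;   /* BOs in insertion order */
   size_t num_bos;
   struct demo_bo **slots; /* open-addressing set, NULL means empty */
   size_t num_slots;       /* power of two; at most half is ever filled */
};

/* Mix the pointer bits and reduce to a slot index. */
static size_t demo_hash(const void *p, size_t mask)
{
   uintptr_t v = (uintptr_t)p;
   v ^= v >> 16;
   v *= 0x9E3779B1u;
   v ^= v >> 13;
   return (size_t)v & mask;
}

static bool demo_job_init(struct demo_job *job, size_t num_slots)
{
   assert(num_slots >= 2 && (num_slots & (num_slots - 1)) == 0);
   job->bos = calloc(num_slots / 2, sizeof(*job->bos));
   job->slots = calloc(num_slots, sizeof(*job->slots));
   job->num_bos = 0;
   job->num_slots = num_slots;
   if (!job->bos || !job->slots) {
      free(job->bos);
      free(job->slots);
      return false;
   }
   return true;
}

static void demo_job_fini(struct demo_job *job)
{
   free(job->bos);
   free(job->slots);
}

/* Returns true if the BO is tracked by the job after the call (newly added
 * or already present), false if the fixed-capacity job is full. */
static bool demo_job_add_bo(struct demo_job *job, struct demo_bo *bo)
{
   size_t mask = job->num_slots - 1;
   size_t i = demo_hash(bo, mask);

   /* Linear probing; terminates because at least half the slots are empty. */
   while (job->slots[i]) {
      if (job->slots[i] == bo)
         return true; /* duplicate found without scanning the array */
      i = (i + 1) & mask;
   }
   if (job->num_bos == job->num_slots / 2)
      return false;
   job->slots[i] = bo;
   job->bos[job->num_bos++] = bo;
   return true;
}
```

The dense array is kept alongside the set so the BOs can still be walked in insertion order at submit time; a real implementation would also need to grow both structures rather than refuse new entries.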
<rellla> anarsoul: done
<anarsoul> rellla: thanks!
<rellla> btw, i noticed that a full deqp run is terribly slow with lima compared to the blob :)
<rellla> i have not measured anything but i'm feeling blob is ~10 times faster ...
yuq825 has quit [Remote host closed the connection]
yuq825 has joined #lima
<anarsoul> rellla: do you compile lima with debug?
<anarsoul> if yes, it does a ton of extra validations
<rellla> anarsoul: :/
<rellla> probably :)
yann has quit [Ping timeout: 265 seconds]
_whitelogger has joined #lima
<anarsoul> besides that, lima_update_textures() is expensive
<anarsoul> with lima_texture_desc_set_res() taking 1/3 time
Elpaulo has quit [Quit: Elpaulo]
yann has joined #lima
<MoeIcenowy> anarsoul: should I make an MR to expose derivatives?
<enunes> fyi, I'm still working on ppir scheduler and instruction combine optimizations, hopefully that helps pp execution time
minicom1 is now known as minicom
<MoeIcenowy> WHAT? enabling derivatives doesn't lead to CI failure
<rellla> MoeIcenowy: should it?
<MoeIcenowy> I think it's exposing new feature
<rellla> "PIPE_CAP_TGSI_FS_FINE_DERIVATIVE: Whether the fragment shader supports
<rellla> the FINE versions of DDX/DDY."
<rellla> what are the "FINE" versions?
megi has joined #lima
<rellla> plaes: yeah, found that. does mali support that?
<plaes> no clue :(
<plaes> but it should be possible to create a test for that? :P
<rellla> i think, we need piglit or a selfmade test to test this, because it's not in the dEQP-GLES2 mustpass series. dFdx and dFdy are shading language 3.00+
<MoeIcenowy> rellla: it's an ext in 1.0
<rellla> MoeIcenowy: but we don't have a deqp test for it, do we?
<MoeIcenowy> I don't know
<rellla> anarsoul: i disabled debug build of deqp and now it's much faster :p
dddddd has joined #lima
gcl has quit [Ping timeout: 258 seconds]
gcl has joined #lima
yuq825 has quit [Quit: Leaving.]
monstr has quit [Remote host closed the connection]
<rellla> finally got ETC1 fixed
megi has quit [Ping timeout: 260 seconds]
<anarsoul> rellla: great!
<anarsoul> MoeIcenowy: yeah, go ahead
<anarsoul> I don't see why derivatives shouldn't be exposed
yann has quit [Ping timeout: 255 seconds]
yann has joined #lima
yann has quit [Ping timeout: 255 seconds]
megi has joined #lima
gcl_ has joined #lima
gcl has quit [Ping timeout: 240 seconds]
yann has joined #lima
buzzmarshall has joined #lima
<anarsoul> enunes: btw if you're doing PP rework please make several incremental MRs if possible
<anarsoul> otherwise it would be hard to review it
<enunes> yes definitely
enunes has quit [Ping timeout: 240 seconds]
enunes has joined #lima