01:02
yuq825 has joined #lima
03:26
megi has quit [Ping timeout: 258 seconds]
03:51
niceplaces has quit [Ping timeout: 260 seconds]
03:51
minicom has quit [Ping timeout: 260 seconds]
03:51
minicom1 has joined #lima
03:51
niceplace has joined #lima
03:53
bshah has quit [Ping timeout: 260 seconds]
03:54
<
anarsoul >
yuq825: so looks like we're PP bound in most cases?
03:55
bshah has joined #lima
04:19
buzzmarshall has quit [Remote host closed the connection]
04:35
_whitelogger has joined #lima
04:39
chewitt has quit [Quit: Zzz..]
05:33
Barada has joined #lima
05:39
Barada has quit [Quit: Barada]
05:41
gcl_ has quit [Ping timeout: 240 seconds]
05:41
gcl has joined #lima
05:45
<
MoeIcenowy >
anarsoul: personally I think so
06:02
chewitt has joined #lima
06:03
chewitt has quit [Client Quit]
06:09
<
yuq825 >
yeah, pp GPU time is obviously too long
06:09
chewitt has joined #lima
06:11
<
yuq825 >
time between two gp task submits is ~70ms, gp+pp task GPU time is ~50ms
06:12
<
anarsoul >
yuq825: likely because it takes some time to generate cmd stream?
06:12
<
yuq825 >
you mean the 20ms CPU time?
06:13
chewitt has quit [Client Quit]
06:13
<
yuq825 >
50ms is pure GPU time
06:14
<
anarsoul >
yuq825: yeah
06:14
<
anarsoul >
flamegraph of ioq3 shows that lima_bo_wait() is pretty expensive
06:15
<
anarsoul >
it's 21% of time spent in lima_draw_vbo()
06:15
<
anarsoul >
also lima_job_add_bo() itself is expensive
06:20
<
anarsoul >
yuq825: I think using "util_dynarray_grow" in lima_job_add_bo() likely makes it slow
06:20
<
yuq825 >
lima_bo_wait() will wait for the task to finish; if the task is slow (GPU time), that's expected
06:21
<
anarsoul >
yuq825: it uses zero timeout, so it should return immediately and indicate whether BO is busy or not
06:21
<
anarsoul >
it's called from lima_bo_create()
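[editor's sketch] The zero-timeout wait described above is a non-blocking status query: the call returns at once and only reports whether the object is ready. The snippet below is not lima code. It shows the same pattern with POSIX poll() on a pipe as a stand-in for a BO, and the helper name is_ready_now() is hypothetical.

```c
#include <poll.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper (not part of lima or Mesa): report whether fd is
 * readable right now. A timeout of 0 makes poll() return immediately
 * instead of sleeping until the descriptor becomes ready. */
static bool is_ready_now(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret = poll(&pfd, 1, 0);

    /* ret == 0: nothing ready yet (the "still busy" case); ret < 0: error. */
    return ret > 0 && (pfd.revents & POLLIN) != 0;
}
```

Even with a zero timeout, such a query normally costs a system call, so calling it very often can still show up in a profile.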
06:23
<
yuq825 >
giving the bo array a bigger init size or building a cache for lima_job may work
06:24
<
yuq825 >
maybe add some tracepoints to check lima_bo_wait()
06:28
<
yuq825 >
oh, your flamegraph already has kernel time
06:31
<
anarsoul >
I'll send an MR that preallocates dynarrays for job BOs in a few mins
06:48
monstr has joined #lima
06:48
dddddd has quit [Ping timeout: 240 seconds]
06:52
<
anarsoul >
rellla: please collect tags for 3884 and merge it
07:18
Elpaulo has quit [Read error: Connection reset by peer]
07:19
Elpaulo has joined #lima
07:42
<
anarsoul >
yuq825: I think it spends most of time in util_dynarray_foreach()
07:43
<
anarsoul >
so preallocating dynarrays doesn't help
07:49
<
anarsoul >
guess we need another hash table to check whether BO is already in a job?
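[editor's sketch] The idea above, a set kept next to the BO list so that the "is this BO already in the job?" check no longer scans the whole array, could look roughly like this. It is a minimal illustration, not the actual lima_job_add_bo(): struct job, job_init(), job_add_bo(), and the fixed capacities are all hypothetical, and BO handles are assumed to be nonzero so that 0 can mark an empty slot.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define JOB_MAX_BOS   64   /* hypothetical fixed capacity of the BO list */
#define JOB_SET_SLOTS 128  /* power of two, larger than JOB_MAX_BOS */

struct job {
    uint32_t bos[JOB_MAX_BOS];    /* BO handles in insertion order */
    uint32_t num_bos;
    uint32_t set[JOB_SET_SLOTS];  /* open-addressing set, 0 = empty slot */
};

static void job_init(struct job *job)
{
    memset(job, 0, sizeof(*job));
}

/* Returns true if the handle is in the job after the call (newly added or
 * already present), false if the handle is 0 or the list is full. */
static bool job_add_bo(struct job *job, uint32_t handle)
{
    if (handle == 0)
        return false;  /* 0 is reserved as the empty-slot marker */

    /* Multiplicative hash; the mask keeps the index inside the table. */
    uint32_t slot = (handle * 2654435761u) & (JOB_SET_SLOTS - 1);

    /* Linear probing: stop at the handle itself or at the first empty slot.
     * The table always has empty slots because it is larger than the list. */
    while (job->set[slot] != 0) {
        if (job->set[slot] == handle)
            return true;  /* already tracked, nothing to append */
        slot = (slot + 1) & (JOB_SET_SLOTS - 1);
    }

    if (job->num_bos == JOB_MAX_BOS)
        return false;  /* list is full */

    job->set[slot] = handle;
    job->bos[job->num_bos++] = handle;
    return true;
}
```

The ordered array is kept because the submit path still needs the BO list itself; the set only answers the membership question in expected constant time.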
07:51
<
rellla >
anarsoul: done
07:52
<
anarsoul >
rellla: thanks!
07:52
<
rellla >
btw, i noticed that a full deqp run is terribly slow with lima compared to the blob :)
07:53
<
rellla >
i have not measured anything but i'm feeling blob is ~10 times faster ...
08:00
yuq825 has quit [Remote host closed the connection]
08:00
yuq825 has joined #lima
08:03
<
anarsoul >
rellla: do you compile lima with debug?
08:03
<
anarsoul >
if yes, it does a ton of extra validations
08:03
<
rellla >
anarsoul: :/
08:03
<
rellla >
probably :)
08:09
yann has quit [Ping timeout: 265 seconds]
08:41
_whitelogger has joined #lima
08:43
<
anarsoul >
besides that, lima_update_textures() is expensive
08:44
<
anarsoul >
with lima_texture_desc_set_res() taking 1/3 of the time
09:00
Elpaulo has quit [Quit: Elpaulo]
09:04
yann has joined #lima
09:45
<
MoeIcenowy >
anarsoul: should I make an MR to expose derivatives?
09:50
<
enunes >
I'm still working on ppir scheduler and instruction combine optimizations, fyi; hopefully that helps pp execution time
10:01
minicom1 is now known as minicom
10:21
<
MoeIcenowy >
WHAT? enabling derivatives doesn't lead to CI failure
10:21
<
rellla >
MoeIcenowy: should it?
10:22
<
MoeIcenowy >
I think it's exposing new feature
10:23
<
rellla >
"PIPE_CAP_TGSI_FS_FINE_DERIVATIVE: Whether the fragment shader supports
10:23
<
rellla >
the FINE versions of DDX/DDY."
10:23
<
rellla >
what are the "FINE" versions?
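[editor's sketch] On the "FINE" question: in GLSL terms, a fine derivative is computed from the current fragment and its immediate neighbour, while a coarse derivative may be computed at fewer locations, for example once per 2x2 quad and shared by all four fragments. The C model below only illustrates that difference for the x direction. The function names are hypothetical, taking the coarse value from the top row is just one possible implementation choice, and nothing here says what the Mali hardware actually does.

```c
/* quad[y][x] holds the value of some expression at each fragment of a
 * 2x2 pixel quad (y = row, x = column). */

/* Fine: each row uses its own horizontal difference, so the two rows of
 * the quad can report different derivatives. */
static float ddx_fine(const float quad[2][2], int x, int y)
{
    (void)x;  /* both fragments in a row share that row's difference */
    return quad[y][1] - quad[y][0];
}

/* Coarse (one possible choice): a single difference, taken here from the
 * top row, is reused for every fragment in the quad. */
static float ddx_coarse(const float quad[2][2], int x, int y)
{
    (void)x;
    (void)y;
    return quad[0][1] - quad[0][0];
}
```

For a quad whose rows change at different rates, the fine version returns a separate value per row, while this coarse version returns the top-row value everywhere.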
10:33
megi has joined #lima
10:42
<
rellla >
plaes: yeah, found that. does mali support that?
10:43
<
plaes >
but it should be possible to create a test for that? :P
10:53
<
rellla >
i think, we need piglit or a selfmade test to test this, because it's not in the dEQP-GLES2 mustpass series. dFdx and dFdy are shading language 3.00+
11:02
<
MoeIcenowy >
rellla: it's an ext in 1.0
11:12
<
rellla >
MoeIcenowy: but we don't have a deqp test for it, do we?
11:40
<
MoeIcenowy >
I don't know
12:08
<
rellla >
anarsoul: i disabled debug build of deqp and now it's much faster :p
13:07
dddddd has joined #lima
13:37
gcl has quit [Ping timeout: 258 seconds]
13:39
gcl has joined #lima
13:49
yuq825 has quit [Quit: Leaving.]
15:01
monstr has quit [Remote host closed the connection]
15:41
<
rellla >
finally got ETC1 fixed
16:39
megi has quit [Ping timeout: 260 seconds]
16:45
<
anarsoul >
rellla: great!
16:45
<
anarsoul >
MoeIcenowy: yeah, go ahead
16:46
<
anarsoul >
I don't see why derivatives shouldn't be exposed
17:20
yann has quit [Ping timeout: 255 seconds]
17:53
yann has joined #lima
18:02
yann has quit [Ping timeout: 255 seconds]
18:09
megi has joined #lima
18:54
gcl_ has joined #lima
18:56
gcl has quit [Ping timeout: 240 seconds]
18:56
yann has joined #lima
21:34
buzzmarshall has joined #lima
22:08
<
anarsoul >
enunes: btw if you're doing PP rework please make several incremental MRs if possible
22:09
<
anarsoul >
otherwise it would be hard to review it
22:09
<
enunes >
yes definitely
23:36
enunes has quit [Ping timeout: 240 seconds]
23:49
enunes has joined #lima