MistahDarcy has quit [Remote host closed the connection]
davidlt has joined #panfrost
megi has quit [Ping timeout: 272 seconds]
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 258 seconds]
davidlt_ has quit [Ping timeout: 248 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 245 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 246 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 258 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 268 seconds]
sravn has quit [Quit: WeeChat 2.4]
davidlt has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
vstehle has joined #panfrost
robert_ancell has quit [Ping timeout: 272 seconds]
davidlt has quit [Ping timeout: 268 seconds]
milloni has quit [Quit: No Ping reply in 210 seconds.]
<tomeu>
alyssa: oh, thanks for mentioning that, I guess that's what is going on
milloni has joined #panfrost
NeuroScr has joined #panfrost
<tomeu>
alyssa, Prf_Jakob: actually, those failures seem to happen when using --deqp-gl-config-name=rgba8888d24s8ms0
<tomeu>
otherwise, we get NotSupported
<tomeu>
Prf_Jakob: what do you think about adding an arg to set the config?
pH5 has joined #panfrost
<tomeu>
Prf_Jakob: with it, I think I could be close to being able to stop nagging you :)
<tomeu>
the time spent actually running the deqp tests has been reduced to two and a half minutes
<tomeu>
so I think we can increase test coverage quite a bit without impacting total time much
megi has joined #panfrost
<tomeu>
robher: the shrinker seems to be working very well now
<tomeu>
I still got the lockdep warning though
<tomeu>
but I can smoothly run glmark2 almost to the end on gnome-shell when booting with 512MB
<tomeu>
then the OOM killer steps in, but that was towards the end
<tomeu>
will review next
davidlt has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
<daniels>
tomeu: how long was it previously?
<tomeu>
daniels: around 8 mins I think
<tomeu>
most of the time is now spent waiting for gitlab runners to be assigned and for the lava device to boot up
raster has joined #panfrost
<daniels>
woosh, that's awesome
unoccupied has quit [Ping timeout: 246 seconds]
<tomeu>
yeah, we can now increase coverage without seriously impacting total times
<daniels>
when you say waiting for runners to be assigned - which runners?
<daniels>
arm64, x86-64, idle?
<tomeu>
x86-64 and idle
<daniels>
idle should be assigned instantly
<daniels>
you're the only one using idle, so I have no idea why it doesn't immediately take it
<tomeu>
yeah, sometimes it takes several minutes
<tomeu>
but in general, I think it just takes a while to set up the env in the assigned runner, etc
<daniels>
not several minutes it doesn't
<tomeu>
I would merge everything in a single stage if I could :)
<tomeu>
it definitely has in the past
<daniels>
whois agx
<daniels>
*slaps forehead*
<daniels>
tomeu: weird, i have no idea what's going on in that case, sorry
<tomeu>
np
<tomeu>
maybe we could come up with some scripts to submit jobs to lava without going through gitlab-ci, for the remote testing use case
<tomeu>
then gitlab-ci not being as fast as it could be isn't such a problem
<tomeu>
alyssa: just tested Steve's patch for _NEXT with glmark2 and the perf governor, and I see a drop in performance
<tomeu>
do you have any ideas on why that could be?
<tomeu>
I'm guessing mesa isn't submitting jobs to the kernel as fast as it could
<daniels>
the only thing I can think of is that you end up with suboptimal job distribution between the different job slots?
<daniels>
like, you end up with JS0 idle whilst JS1 has one job queued for now and another in _NEXT, when they could have been running in parallel
<daniels>
tomeu: gitlab-ci is fixable, and it should usually pick up jobs in _seconds_
<daniels>
so i have no idea why idle is being so rubbish
stepri01 has joined #panfrost
<tomeu>
I think in general it's being fast
unoccupied has joined #panfrost
<tomeu>
what always takes longest is checking out the code, setting up the container, etc
<tomeu>
ah, another source of delays is that we have one job per arch per stage
<tomeu>
and all jobs need to have finished in one stage to get to the next one
<tomeu>
so any delay in a particular job delays the others as well
<stepri01>
daniels: We have a scheduler per JS so there shouldn't be a situation of JS1 blocking JS0 like that
<stepri01>
But clearly if you submit work faster to the same job slot you could end up slowing down work on the other slot
<stepri01>
So it may be that the order of the work is less optimal with the _NEXT registers
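For context, the _NEXT registers being discussed let the driver queue the following job while the slot is still busy, roughly along these lines (register names as in the panfrost kernel driver; the helper itself is a simplified illustration, the real submit path also programs affinity and job configuration):

    /* Simplified sketch: queue another job chain on slot `js` while the
     * current one is still running, so the hw can start it the moment the
     * running job finishes instead of waiting for a new submit. */
    static void queue_next_job(struct panfrost_device *pfdev, int js, u64 jc_head)
    {
            job_write(pfdev, JS_HEAD_NEXT_LO(js), lower_32_bits(jc_head));
            job_write(pfdev, JS_HEAD_NEXT_HI(js), upper_32_bits(jc_head));
            job_write(pfdev, JS_COMMAND_NEXT(js), JS_COMMAND_START);
    }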
<daniels>
tomeu: yeah, that's pretty unavoidable I'm afraid - what's the slowest one? aarch64 or armhf?
<daniels>
if it's armhf we could look at whether standing up a native runner would be any quicker but tbh I'm not convinced it would be
<daniels>
stepri01: you'd definitely know better than me, I'm just spitballing :)
<stepri01>
kbase used to be quite good at scheduling vertex work multiple frames in advance and not quite getting round to doing the fragment work :)
<stepri01>
it creates an interesting stuttering effect
<daniels>
heh
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 246 seconds]
adjtm has quit [Ping timeout: 268 seconds]
<tomeu>
daniels: sometimes one is slower, sometimes the other
<tomeu>
I think it's mostly which one has slower network at that moment
<tomeu>
or maybe i/o load
<daniels>
if you can tie that down to a particular runner (or group of runners) being generally slower than the others, that would be interesting to find out
<tomeu>
ack, will keep an eye on it
<robher>
tomeu: How? I pushed 2 fixes to my kernel.org tree, but am in lockdep hell now. Basically, we can't take any lock that ever allocs memory in the shrinker call.
<tomeu>
robher: how what?
<robher>
tomeu: devfreq->lock is one.
<robher>
Runtime pm in panfrost_mmu_unmap.
* robher
about to lose internet...
<tomeu>
robher: I mean, I don't know what you are asking about when you said "How?"
<tomeu>
is it how I reproduce the lockdep warning?
<robher>
How is it fine?
davidlt_ has quit [Ping timeout: 245 seconds]
<tomeu>
robher: ah, the shrinker just seems to work as expected
<tomeu>
as long as memory pressure kicks in, memory gets released and we get to do more stuff than otherwise
<robher>
tomeu: devfreq takes a lock on registration. Seems unnecessary, but if I remove it, I just get a lockdep warning on another lock. What we need is to lock the runtime pm state, and skip hw access if suspended. No point in waking up.
<tomeu>
sounds good!
<robher>
tomeu: yes, but I haven't found a way to prevent waking, only to keep it awake.
<robher>
Yet another lock I guess...
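A minimal sketch of what robher is describing, assuming the runtime-PM state can simply be probed without adding a new lock: only touch the hardware from the shrinker if the GPU is already awake, so there is never a reason to wake it. The function and the surrounding panfrost_* names are illustrative, not the actual driver code:

    #include <linux/pm_runtime.h>

    /* Illustrative only: pm_runtime_get_if_in_use() never resumes the device
     * and never allocates, so it is safe to call from the shrinker path. */
    static bool shrinker_try_purge(struct panfrost_device *pfdev,
                                   struct panfrost_gem_object *bo)
    {
            if (pm_runtime_get_if_in_use(pfdev->dev) <= 0)
                    return false;           /* suspended: skip the hw access */

            panfrost_mmu_unmap(bo);         /* safe, the GPU is already awake */

            pm_runtime_mark_last_busy(pfdev->dev);
            pm_runtime_put_autosuspend(pfdev->dev);
            return true;
    }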
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
adjtm has joined #panfrost
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt_ has quit [Ping timeout: 258 seconds]
raster has quit [Read error: Connection reset by peer]
raster has joined #panfrost
mupuf has joined #panfrost
<mupuf>
I just introduced a "not set" value for priority and severity in fd.o's bugzilla. Would anyone mind if I made it the default for new bugs?
<alyssa>
mupuf: #dri-devel?
<mupuf>
alyssa: already asked there :)
<alyssa>
Ah
<alyssa>
Well, I don't even know if we use the fd.o bugzilla, so presumably nobody here would mind :)
<mupuf>
I asked in #dri-devel, #freedesktop, #freedreno, #intel-3d, #intel-gfx, #nouveau, #panfrost, #xorg-devel :D
<alyssa>
Right now, our tracing tools are essentially pretty-printers for GPU memory
<alyssa>
That's faithful to the original representation -- good for computers -- but is extremely verbose
<alyssa>
500 lines per draw, huge numbers of draws per frame, huge numbers of frames ===> traces are often many megabytes in size, and when debugging you're looking for a needle in a haystack.
<alyssa>
I'd like to add some analysis passes into the tracer so it reports *intention* rather than the raw bits
<alyssa>
In particular, it goes through the cmdstream, and determines semantically what you tried to do
<alyssa>
If it matches our understanding of the hardware, we print that semantic.
<alyssa>
If it does not, we print a huge XXXXXXXXX comment and dump the original bit-level representation
<alyssa>
(So if we trace the blob and see that comment, we know our understanding is wrong, so we fix it. If we trace Panfrost and get that, we know we have a driver bug.)
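A rough sketch of that validate-then-dump flow; decoded_draw, pack_draw(), emit_semantic() and hexdump() are made-up names standing in for pandecode's real structures and helpers:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Re-pack our semantic interpretation and compare it against the raw
     * words from the trace: print the intent if they agree, dump the bits
     * (with a loud marker) if they don't. */
    static void validate_and_print(const uint32_t *raw, size_t nwords,
                                   const struct decoded_draw *draw)
    {
            uint32_t repacked[MAX_DRAW_WORDS] = { 0 };

            pack_draw(repacked, draw);      /* our model of the hw layout */

            if (!memcmp(raw, repacked, nwords * sizeof(*raw))) {
                    emit_semantic(draw);    /* concise, intent-level output */
            } else {
                    printf("/* XXXXXXXXX packing doesn't match our model */\n");
                    hexdump(raw, nwords);   /* fall back to the raw bits */
            }
    }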
<shadeslayer>
alyssa: nice
<shadeslayer>
sounds pretty useful
<alyssa>
shadeslayer: You interested? :)
<alyssa>
hereby dubbed "anti-Gallium" since that name sounds awesome
stikonas has joined #panfrost
fysa has joined #panfrost
fysa has quit [Ping timeout: 268 seconds]
adjtm has quit [Ping timeout: 248 seconds]
* alyssa
thinks this holds promise.
<endrift>
just don't call it Mercury
<alyssa>
What I can't decide is whether I want to use literal Gallium structs or roll our own thing.
<alyssa>
context functions don't make sense, but the structs in p_state.h and the enums might be useful.
* alyssa
thinks
<alyssa>
Yeah, I think that might be good.
<alyssa>
Okay, yes, this is interesting
davidlt has quit [Remote host closed the connection]
fysa has joined #panfrost
fysa has quit [Ping timeout: 245 seconds]
unoccupied has quit [Ping timeout: 244 seconds]
unoccupied has joined #panfrost
davidlt has joined #panfrost
adjtm has joined #panfrost
NeuroScr has joined #panfrost
adjtm_ has joined #panfrost
fysa has joined #panfrost
adjtm has quit [Ping timeout: 268 seconds]
fysa has quit [Ping timeout: 268 seconds]
sravn has joined #panfrost
stikonas has quit [Read error: Connection reset by peer]
stikonas_ has joined #panfrost
<alyssa>
Essentially, I'm doing a shift of pandecode from focusing on packing to focusing on meaning.
<alyssa>
Focusing on packing was useful in the early days (when we were able to literally compile a trace and execute it against a Mali)
<alyssa>
But now, we want to go to higher levels of understanding, and focusing on the nitty gritty details of packing bits for the hw is... not useful.
<alyssa>
Better to simply *validate* the packing is correct / canonical.
<alyssa>
if it's not, yeah, dump everything around so we can fix things
<alyssa>
But if it is, no need to print anything except the actually decoded meaning.
<alyssa>
Cute..
<alyssa>
TILER jobs within a job chain are totally independent.
<alyssa>
So you can have different framebuffers (of different sizes / polygon lists) bound for different ones
<alyssa>
And then you just have multiple FRAGMENT jobs
<alyssa>
Blob does this to avoid flushing like crazy while mipmapping
<alyssa>
And likewise, you can have a whole bunch of FRAGMENT jobs in a chain together, with job_barrier set.
<alyssa>
This hw quirk was discovered via anti-Gallium.
<alyssa>
Good news: we know more about the hardware now!
<alyssa>
Bad news: I need to redesign a bunch of things in anti-Gallium to account for the new model of the hw.
<alyssa>
Still a net win.
<alyssa>
New model for anti-Gallium has to be stateless (well, not totally stateless, but not as stateful as GL)
<alyssa>
Instead, have hashmaps for things, so the check is more like "this framebuffer size corresponds to this framebuffer / polygon list / etc"
<alyssa>
This will take... somewhat more thought to get right.
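One way that hashmap idea could look, using Mesa's util/hash_table.h u64 API; the decoded_fb type, the callback names and the exact check are hypothetical:

    #include <inttypes.h>
    #include <stdio.h>
    #include "util/hash_table.h"

    /* Remember each framebuffer descriptor we decode, keyed by its GPU
     * address, so later jobs can be checked against it without carrying
     * GL-style bound state around. */
    static struct hash_table_u64 *known_fbs;

    static void on_fb_descriptor(uint64_t gpu_va, struct decoded_fb *fb)
    {
            if (!known_fbs)
                    known_fbs = _mesa_hash_table_u64_create(NULL);
            _mesa_hash_table_u64_insert(known_fbs, gpu_va, fb);
    }

    static void on_fragment_job(uint64_t fb_gpu_va)
    {
            if (!known_fbs || !_mesa_hash_table_u64_search(known_fbs, fb_gpu_va))
                    printf("/* XXXXXXXXX FRAGMENT job points at an unknown fb"
                           " (0x%" PRIx64 ") */\n", fb_gpu_va);
    }

Keying on the GPU address rather than on driver-side bound state is what makes the check stateless in the sense alyssa describes.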
fysa has joined #panfrost
<alyssa>
As for workgroups_x_shift_*:
<alyssa>
workgroups_x_shift_3 I've seen as 7 or 8, even
<alyssa>
and zero?
<alyssa>
0x8 seems to all be CL workloads with images
fysa has quit [Ping timeout: 245 seconds]
<alyssa>
Wut.
<alyssa>
Oh
<alyssa>
So, in OpenCL: workgroup_x_shift_size_3 =