alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
davidlt has quit [Ping timeout: 244 seconds]
vstehle has quit [Ping timeout: 245 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 268 seconds]
MistahDarcy has joined #panfrost
MistahDarcy has quit [Remote host closed the connection]
davidlt has joined #panfrost
megi has quit [Ping timeout: 272 seconds]
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 258 seconds]
davidlt_ has quit [Ping timeout: 248 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 245 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 246 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 258 seconds]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 268 seconds]
sravn has quit [Quit: WeeChat 2.4]
davidlt has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
vstehle has joined #panfrost
robert_ancell has quit [Ping timeout: 272 seconds]
davidlt has quit [Ping timeout: 268 seconds]
milloni has quit [Quit: No Ping reply in 210 seconds.]
<tomeu> alyssa: oh, thanks for mentioning that, I guess that's what is going on
milloni has joined #panfrost
NeuroScr has joined #panfrost
<tomeu> alyssa, Prf_Jakob: actually, those failures seem to happen when using --deqp-gl-config-name=rgba8888d24s8ms0
<tomeu> otherwise, we get NotSupported
<tomeu> Prf_Jakob: what do you think about adding an arg to set the config?
pH5 has joined #panfrost
<tomeu> Prf_Jakob: with it, I think I could be close to stop nagging you :)
<tomeu> the time spent actually running the deqp tests has been reduced to two and a half minutes
<tomeu> so I think we can increase test coverage quite a bit without much impact on total time
megi has joined #panfrost
<tomeu> robher: the shrinker seems to be working very well now
<tomeu> I still got the lockdep warning though
<tomeu> but I can smoothly run glmark2 almost to the end on gnome-shell when booting with 512MB
<tomeu> then the OOM killer steps in, but that was towards the end
<tomeu> will review next
davidlt has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
<daniels> tomeu: how long was it previously?
<tomeu> daniels: around 8 mins I think
<tomeu> most of the time is now spent waiting for gitlab runners to be assigned and for the lava device to boot up
raster has joined #panfrost
<daniels> woosh, that's awesome
unoccupied has quit [Ping timeout: 246 seconds]
<tomeu> yeah, we can now increase coverage without seriously impacting total times
<daniels> when you say waiting for runners to be assigned - which runners?
<daniels> arm64, x86-64, idle?
<tomeu> x86-64 and idle
<daniels> idle should be assigned instantly
<daniels> you're the only one using idle, so I have no idea why it doesn't immediately take it
<tomeu> yeah, sometimes it takes several minutes
<tomeu> but in general, I think it just takes a while to set up the env in the assigned runner, etc
<daniels> not several minutes it doesn't
<tomeu> I would merge everything in a single stage if I could :)
<tomeu> it definitely has in the past
<daniels> whois agx
<daniels> *slaps forehead*
<daniels> tomeu: weird, i have no idea what's going on in that case, sorry
<tomeu> np
<tomeu> maybe we could come up with some scripts to submit jobs to lava without going through gitlab-ci, for the remote testing use case
<tomeu> then gitlab-ci not being as fast as it could isn't such a problem
<tomeu> alyssa: just tested Steve's patch for _NEXT with glmark2 and the perf governor, and I see a drop in performance
<tomeu> do you have any ideas on why that could be?
<tomeu> I'm guessing mesa isn't submitting jobs to the kernel as fast as it could
<daniels> the only thing I can think of is that you end up with suboptimal job distribution between the different job slots?
<daniels> like, you end up with JS0 idle whilst JS1 has one job queued for now and another in _NEXT, when they could have been running in parallel
<daniels> tomeu: gitlab-ci is fixable, and it should usually pick up jobs in _seconds_
<daniels> so i have no idea why idle is being so rubbish
stepri01 has joined #panfrost
<tomeu> I think in general it's being fast
unoccupied has joined #panfrost
<tomeu> what always takes longer is checking out the code, setting up the container, etc
<tomeu> ah, another source of delays is that we have one job per arch per stage
<tomeu> and all jobs need to have finished in one stage to get to the next one
<tomeu> so any delay in a particular job delays the others as well
<stepri01> daniels: We have a scheduler per JS so there shouldn't be a situation of JS1 blocking JS0 like that
<stepri01> But clearly if you submit work faster to the same job slot you could end up slowing down work on the other slot
<stepri01> So it may be that the order of the work is less optimal with the _NEXT registers
<daniels> tomeu: yeah, that's pretty unavoidable I'm afraid - what's the slowest one? aarch64 or armhf?
<daniels> if it's armhf we could look at whether standing up a native runner would be any quicker but tbh I'm not convinced it would be
<daniels> stepri01: you'd definitely know better than me, I'm just spitballing :)
<stepri01> kbase used to be quite good at scheduling vertex work multiple frames in advance and not quite getting round to doing the fragment work :)
<stepri01> it creates an interesting stuttering effect
<daniels> heh
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 246 seconds]
adjtm has quit [Ping timeout: 268 seconds]
<tomeu> daniels: sometimes one is slower, sometimes the other
<tomeu> I think it's mostly which one has slower network at that moment
<tomeu> or maybe i/o load
<daniels> if you can tie that down to a particular runner (or group of runners) being generally slower than the others, that would be interesting to find out
<tomeu> ack, will keep an eye on it
<robher> tomeu: How? I pushed 2 fixes to my kernel.org tree, but am in lockdep hell now. Basically, we can't take any lock that ever allocs memory in the shrinker call.
<tomeu> robher: how what?
<robher> tomeu: devfreq->lock is one.
<robher> Runtime pm in panfrost_mmu_unmap.
* robher about to lose internet...
<tomeu> robher: I mean, I don't know what you are asking about when you said "How?"
<tomeu> is it how I reproduce the lockdep warning?
<robher> How is it fine?
davidlt_ has quit [Ping timeout: 245 seconds]
<tomeu> robher: ah, the shrinker just seems to work as expected
<tomeu> as long as memory pressure kicks in, memory gets released and we get to do more stuff than otherwise
<tomeu> that's what is fine :)
<tomeu> robher: but I got this other spinlock warning: http://paste.debian.net/1095964/
<tomeu> guess we slept somewhere within panfrost_gem_open ?
afaerber has joined #panfrost
<tomeu> robher: are you going to push yourself your madvise patches to mesa?
<tomeu> robher: any hints in kbase's shrinker code?
davidlt has joined #panfrost
<tomeu> robher: maybe something similar to the patch in https://lkml.org/lkml/2014/5/29/673 could help?
<robher> tomeu: devfreq takes a lock on registration. Seems unnecessary, but if I remove it, I just get a lockdep warning on another lock. What we need is to lock the runtime pm state, and skip hw access if suspended. No point in waking up.
<tomeu> sounds good!
<robher> tomeu: yes, but I haven't found a way to prevent waking, only keeping awake.
<robher> Yet another lock I guess...
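The trylock approach tomeu links above is the usual escape hatch for this: the shrinker's scan callback refuses to sleep on any lock that might be held across an allocation and returns SHRINK_STOP instead of waiting. A minimal sketch of that pattern, with assumed panfrost structure and field names rather than the actual driver code:

    #include <linux/mutex.h>
    #include <linux/shrinker.h>

    /* Hypothetical scan callback; panfrost_device::shrinker_lock and the BO list
     * walk are stand-ins, not necessarily how the real driver is laid out. */
    static unsigned long
    panfrost_gem_shrinker_scan(struct shrinker *shrinker, struct shrink_control *sc)
    {
        struct panfrost_device *pfdev =
            container_of(shrinker, struct panfrost_device, shrinker);
        unsigned long freed = 0;

        /* Never block here: whoever holds shrinker_lock may itself be
         * allocating memory, which is exactly the lockdep cycle above. */
        if (!mutex_trylock(&pfdev->shrinker_lock))
            return SHRINK_STOP;

        /* ... walk the madvise'd BO list, purge what is purgeable,
         * and accumulate the number of freed pages into 'freed' ... */

        mutex_unlock(&pfdev->shrinker_lock);
        return freed;
    }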
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
adjtm has joined #panfrost
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
davidlt_ has quit [Ping timeout: 258 seconds]
raster has quit [Read error: Connection reset by peer]
raster has joined #panfrost
mupuf has joined #panfrost
<mupuf> I just introduced in fd.o's bugzilla a "not set" value for the priority and severity. Would anyone mind if I made it the default for new bugs?
<alyssa> mupuf: #dri-devel?
<mupuf> alyssa: already asked there :)
<alyssa> Ah
<alyssa> Well, I don't even know if we use the fd.o bugzilla, so presumably nobody here would mind :)
<mupuf> I asked in #dri-devel, #freedesktop, #freedreno, #intel-3d, #intel-gfx, #nouveau, #panfrost, #xorg-devel :D
<mupuf> Well, weekend time, have a good one guys!
<daniels> o/
<tomeu> Prf_Jakob: no need to add a way to override the config I think
<tomeu> I think it's better to run with that config, like everybody else
<tomeu> so I just updated the expected failures
<tomeu> Prf_Jakob: if we had an armhf build, I think we could enable it in master :)
<tomeu> Prf_Jakob: btw, I guess it would be better if we were able to build deqp itself as in virgl ci
<tomeu> guess that should just work?
<tomeu> of course, ideally we would install it from debian :)
megi has quit [Ping timeout: 246 seconds]
<Prf_Jakob> tomeu: With armhf probably, the problem with AArch64 was an incomplete port of a dependency of Volt.
<tomeu> Prf_Jakob: wonder what happened here:
<tomeu> 2019-08-16T14:47:13 Unknown argument '--print-failing=false'
<tomeu> 2019-08-16T14:47:13 Print the failing tests
<tomeu> 2019-08-16T14:47:13 Argument: --print-failing
<tomeu> 2019-08-16T14:47:13 Config: printFailing=[true|false]
<Prf_Jakob> tomeu: The argument is just a flag with no argument, the config is the one that supports true or false.
<Prf_Jakob> tomeu: Just remove the flag from the arguments and it will be the same as false.
pH5 has quit [Quit: bye]
megi has joined #panfrost
raster has quit [Read error: Connection reset by peer]
davidlt has joined #panfrost
fysa has joined #panfrost
fysa has quit [Ping timeout: 272 seconds]
<alyssa> I'm embarking on a new (Panfrost) project
<alyssa> Automated command stream static analysis.
<alyssa> Right now, our tracing tools are essentially pretty-printers for GPU memory
<alyssa> That's faithful to the original representation -- good for computers -- but is extremely verbose
<alyssa> 500 lines per draw, huge numbers of draws per frame, huge numbers of frames ===> traces are often many megabytes in size, and when debugging you're looking for a needle in a haystack.
<alyssa> I'd like to add some analysis passes into the tracer so it reports *intention* rather than the raw bits
<alyssa> In particular, it goes through the cmdstream, and determines semantically what you tried to do
<alyssa> If it matches our understanding of the hardware, we print that semantic.
<alyssa> If it does not, we print a huge XXXXXXXXX comment and dump the original bit-level representation
<alyssa> (So if we trace the blob and see that comment, we know our understanding is wrong so we fix it. If we trace Panfrost and get that, we know we have a driver bug.)
<shadeslayer> alyssa: nice
<shadeslayer> sounds pretty useful
<alyssa> shadeslayer: You interested? :)
<alyssa> hereby dubbed "anti-Gallium" since that name sounds awesome
stikonas has joined #panfrost
fysa has joined #panfrost
fysa has quit [Ping timeout: 268 seconds]
adjtm has quit [Ping timeout: 248 seconds]
* alyssa thinks this holds promise.
<endrift> just don't call it Mercury
<alyssa> What I can't decide is whether I want to use literal Gallium structs or roll our own thing.
<alyssa> context functions don't make sense but maybe the structs in p_state.h and the enums might be useful.
* alyssa thinks
<alyssa> Yeah, I think that might be good.
<alyssa> Okay, yes, this is interesting
davidlt has quit [Remote host closed the connection]
fysa has joined #panfrost
fysa has quit [Ping timeout: 245 seconds]
unoccupied has quit [Ping timeout: 244 seconds]
unoccupied has joined #panfrost
davidlt has joined #panfrost
adjtm has joined #panfrost
NeuroScr has joined #panfrost
adjtm_ has joined #panfrost
fysa has joined #panfrost
adjtm has quit [Ping timeout: 268 seconds]
fysa has quit [Ping timeout: 268 seconds]
sravn has joined #panfrost
stikonas has quit [Read error: Connection reset by peer]
stikonas_ has joined #panfrost
<alyssa> Essentially, I'm doing a shift of pandecode from focusing on packing to focusing on meaning.
<alyssa> Focusing on packing was useful in the early days (when we were able to literally compile a trace and execute it against a Mali)
<alyssa> But now, we want to go to higher levels of understanding, and focusing on the nitty gritty details of packing bits for the hw is... not useful.
<alyssa> Better to simply *validate* the packing is correct / canonical.
<alyssa> if it's not, yeah, dump everything around so we can fix things
<alyssa> But if it is, no need to print anything except the actually decoded meaning.
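One concrete shape for that "validate, then print meaning" pass: re-encode the decoder's interpretation of a descriptor and compare it word-for-word against the trace, printing semantics on a match and dumping bits otherwise. A rough sketch with invented helper names, not actual pandecode API:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* All of these are hypothetical stand-ins for decoder internals. */
    struct decoded_draw;
    void unpack_draw(const uint32_t *raw, struct decoded_draw *d);    /* our model of the hw */
    void pack_draw(const struct decoded_draw *d, uint32_t *words);    /* canonical re-encoding */
    void print_draw_semantics(const struct decoded_draw *d);
    void hexdump_words(const uint32_t *raw, size_t count);

    void
    decode_draw(const uint32_t *raw, size_t count,
                struct decoded_draw *d, uint32_t *repacked)
    {
        unpack_draw(raw, d);
        pack_draw(d, repacked);

        if (!memcmp(raw, repacked, count * sizeof(*raw))) {
            /* Packing is canonical: report the intent, not the bits. */
            print_draw_semantics(d);
        } else {
            /* Either our model of the hw is wrong, or the driver has a bug. */
            printf("/* XXXXXXXXX non-canonical or unknown encoding */\n");
            hexdump_words(raw, count);
        }
    }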
<alyssa> Cute..
<alyssa> TILER jobs within a job chain are totally independent.
<alyssa> So you can have different framebuffers (of different sizes / polygon lists) bound for different ones
<alyssa> And then you just have multiple FRAGMENT jobs
<alyssa> Blob does this to avoid flushing like crazy while mipmapping
<alyssa> And likewise, you can have a whole bunch of FRAGMENT jobs in a chain together, with job_barrier set.
<alyssa> This hw quirk was discovered via anti-Gallium.
<alyssa> Good news: we know more about the hardware now!
<alyssa> Bad news: I need to redesign a bunch of things in anti-Gallium to account for the new model of the hw.
<alyssa> Still a net win.
<alyssa> New model for anti-Gallium has to be stateless (well, not totally stateless, but not as state-like as GL)
<alyssa> Instead having hashmaps for things, so the check is more "this framebuffer size corresponds to this framebuffer / polygon list / etc"
<alyssa> This will take... somewhat more thought to get right.
fysa has joined #panfrost
<alyssa> As for workgroups_x_shift_*:
<alyssa> workgroups_x_shift_3 I've seen as 7 or 8, even
<alyssa> and zero?
<alyssa> 0x8 seems to all be CL workloads with images
fysa has quit [Ping timeout: 245 seconds]
<alyssa> Wut.
<alyssa> Oh
<alyssa> So, in OpenCL: workgroup_x_shift_size_3 =
<alyssa> { 8 if (128 < local <= 256)
<alyssa> 7 if (64 < local <= 128)
<alyssa> 6 otherwise }
<alyssa> That's in 1D
<alyssa> Need to check 2D/3D now
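If the 1D observations above hold, the field is just ceil(log2(local_size)) clamped to a minimum of 6. A small helper expressing that guess (the name is invented, and it is only validated against the 1D cases so far):

    /* Hypothetical: reproduces the observed 1D pattern only.
     * local_size <= 64 -> 6, 65..128 -> 7, 129..256 -> 8, and so on upward. */
    static unsigned
    guess_workgroups_x_shift_3(unsigned local_size)
    {
        unsigned shift = 0;

        while ((1u << shift) < local_size)
            shift++;

        return shift < 6 ? 6 : shift;
    }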
robink has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
<alyssa> wut
<alyssa> What am I missing
<alyssa> ^ Some of that is good stuff to keep in mind
<HdkR> :)
fysa has joined #panfrost
<anarsoul> mmm, scalar
<HdkR> Bifrost and Valhall are nice architectures :D
<anarsoul> iirc bifrost has the same command stream as midgard, what about valhall?
<HdkR> Probably the same but it isn't released yet
<HdkR> Seems like mostly an ISA improvement again
<alyssa> Things I've learned so far:
<alyssa> - I don't know what I'm doing??
fysa has quit [Ping timeout: 272 seconds]
<alyssa> For OpenCL, workgroups_x_shift_2 == workgroups_x
<alyssa> for GL, it has the MAX2 thing going on
<alyssa> Wut.
<alyssa> Oh, and even crazier, for GL compute it's just.. always 0x2?
<alyssa> Also, for GL, workgroups_z_shift = 32, at least for the blob I have
<alyssa> It doesn't really matter what it is so I'm guessing it's just a blob quirk
<alyssa> (when instancing is disabled, anyway)
stikonas_ has quit [Remote host closed the connection]
* alyssa shrugs
<alyssa> This is progress, and we shall revisit later.
<anarsoul> this was a triumph? :)
<HdkR> Only once Panfrost can play Portal? :P
raster has joined #panfrost
<anarsoul> oh, I wish they released source code so it could be ported to ARM
<anarsoul> anyway it has no commercial value nowadays
<HdkR> It was already ported to ARM, just Nvidia only
<alyssa> and we're out of beta we're releasing on tiiiiiiiiiiiime
fysa has joined #panfrost
<anarsoul> HdkR: and android-only :(
<HdkR> aye
<alyssa> Okay, new approach
<alyssa> Anti-Gallium is going to maintain a hashmap of framebuffer keys to framebuffers.
<HdkR> anarsoul: We just need a fast x86 emulator that can run it :P
<alyssa> And then when the framebuffer is read in the FRAGMENT job, we ensure it's there
<anarsoul> HdkR: *sigh*
<alyssa> (Unless it's a clear-only, in which case we ensure there wasn't anything there)
<alyssa> And then evict the framebuffer from the hashmap
<alyssa> ("Even I know this is ridiculous")
<alyssa> Hrmph
<alyssa> I think the polygon_list pointer is the right key to use
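A minimal sketch of that bookkeeping, assuming the polygon_list GPU address as the key; the fixed-size table and all names here are invented for illustration, not whatever pandecode actually grows:

    #include <assert.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical: remember the framebuffer seen when a TILER job is decoded,
     * keyed by polygon_list, then check and evict it at FRAGMENT time. */
    struct tracked_fb {
        uint64_t polygon_list;   /* key; 0 means empty slot */
        unsigned width, height;
    };

    #define MAX_TRACKED_FBS 64
    static struct tracked_fb tracked_fbs[MAX_TRACKED_FBS];

    static void
    track_tiler_fb(uint64_t polygon_list, unsigned width, unsigned height)
    {
        for (unsigned i = 0; i < MAX_TRACKED_FBS; ++i) {
            if (!tracked_fbs[i].polygon_list) {
                tracked_fbs[i] = (struct tracked_fb) {
                    polygon_list, width, height
                };
                return;
            }
        }
        assert(!"too many outstanding framebuffers");
    }

    /* True (and evicts the entry) if a TILER job targeted this polygon list with
     * matching dimensions; a clear-only FRAGMENT job would expect no entry at all. */
    static bool
    check_fragment_fb(uint64_t polygon_list, unsigned width, unsigned height)
    {
        for (unsigned i = 0; i < MAX_TRACKED_FBS; ++i) {
            struct tracked_fb *fb = &tracked_fbs[i];
            if (fb->polygon_list == polygon_list) {
                bool ok = fb->width == width && fb->height == height;
                fb->polygon_list = 0;   /* evict */
                return ok;
            }
        }
        return false;
    }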
<alyssa> I'm just questioning the wisdom of this design
<alyssa> I suppose hashmaps of objects more generally is the way to go
<alyssa> I'm just questioning whether:
<alyssa> - This could ever work on any architecture beside Midgard/Bifrost
<alyssa> - If it matters.
<alyssa> This is inherently hw specific
fysa has quit [Ping timeout: 244 seconds]
raster has quit [Remote host closed the connection]