buzzmarshall has quit [Remote host closed the connection]
vstehle has joined #panfrost
robert_ancell has quit [Ping timeout: 272 seconds]
macc24 has quit [Ping timeout: 265 seconds]
icecream95 has joined #panfrost
kaspter has quit [Quit: kaspter]
davidlt has quit [Remote host closed the connection]
mixfix41_ has quit [Ping timeout: 256 seconds]
davidlt has joined #panfrost
raster has joined #panfrost
cwabbott has joined #panfrost
cwabbott has quit [Quit: cwabbott]
cwabbott has joined #panfrost
cwabbott has quit [Quit: cwabbott]
cwabbott has joined #panfrost
kaspter has joined #panfrost
stikonas has joined #panfrost
stikonas_ has joined #panfrost
stikonas has quit [Ping timeout: 240 seconds]
<robmur01>
HdkR: FWIW we've got Radeons in the eMAG and TX2 desktops at work, although that's not overly helpful just now :)
<robmur01>
(I don't have the patience for remote desktop shenanigans)
<robmur01>
mmind00: since the switch to generic OPP code we never actually change the regulator voltage, so for boards with kernel-controlled regulators how wonky things get depends on how close the initial default voltage is to that of the max OPP - Chromebooks seem worse off than most
<mmind00>
robmur01: so it's a matter of devfreq acting up ... I don't remember seeing cpufreq-related reports though
<robmur01>
there have been at least 3 attempts to fix it, but they all seem to get stuck in the mess of clock/regulator/OPP/devfreq optionality
<robmur01>
why would panfrost driver changes affect cpufreq? :P
<mmind00>
:-P ... I only read tidbits here yesterday, so I was wondering whether it was an OPP problem
<mmind00>
[aka the TL;DR I deduced from the backlog yesterday was "panfrost broken on Kevin due to frequency scaling" ;-) ]
<robmur01>
nope, it's a "panfrost fails to attach regulators to its own OPP table" problem ;)
macc24 has joined #panfrost
Elpaulo has quit [Quit: Elpaulo]
<shadeslayer>
austriancoder: did you manage to get a trace visualized ?
<daniels>
icecream95: thanks! is it useful to you at all?
<HdkR>
robmur01: Dang, no Xavier to get ARMv8.1 I guess?
<robmur01>
HdkR: pff, once N1SDP boards start turning up in numbers to replace the Junos it'll be v8.2 all the way... and then we wait and hope for Altra (and possibly KunPeng) :D
<HdkR>
haha sure, I'm not saying Xavier is a good choice :P
<HdkR>
Is the N1SDP board even something that will be available to purchase?
<robmur01>
On the more affordable side, I believe Macchiatobins are a popular "stick a GPU card in it" board
<icecream95>
daniels: I have been too busy hacking the Midgard instruction scheduler to use it much so far...
<HdkR>
Dang, only A72 on those though
<daniels>
icecream95: heh, that's cool :) what are you doing in the scheduler ooi?
<HdkR>
Alternatively, just need Panfrost to support GL 3.3 :p
<HdkR>
er, Bifrost GL 3.3 for SoCs that support ARMv8.1*
<icecream95>
HdkR: I'm sure you could very carefully remove the Bifrost GPU and glue in a Midgard one and everything would still work. :P
<HdkR>
Those atomics are too good to live without :)
<robmur01>
N1SDP> dunno - as far as I'm aware the original intent was a very-limited-scope CCIX demonstration platform, but I've since heard mumblings that there *might* be some shift to productise it at some point
<HdkR>
dang
<robmur01>
I wouldn't worry - if you don't care about CCIX then it's basically just 4 two-and-a-bit GHz cores plus a handful of PCIe lanes for ~$n000 ;)
<HdkR>
hm
<robmur01>
if you want a cheap v8.2 platform right now, consider bashing your head against S905X3
<HdkR>
I have a couple of the ODROID-C4 boards, just can't use that for sticking a dGPU on it :P
<HdkR>
Really I'm probably going to be waiting for an Nvidia Orin dev board, which is sad to say
<robmur01>
yeah, Cortex-A55 + usable PCIe is probably an unlikely combination, except perhaps for high-core-count networking stuff
<HdkR>
Especially with Orin being slated for 2022 and being Nvidia GPU. So no Bifrost/Valhall fun :P
<daniels>
icecream95: heh! that's neat!
<HdkR>
Looks like my best bet over the next few months is buying another Xavier and cringing at performance numbers though
* HdkR
stacks JITs
<daniels>
you can take the boy out of NVIDIA ...
nlhowell has joined #panfrost
<HdkR>
haha
<HdkR>
Sadly nobody makes Exynos devboards anymore, which would have been fun targets :)
<HdkR>
I guess I just have unreasonable performance desires
<daniels>
not interested in Snapdragon for perf?
<robmur01>
what is the "performance" you speak of? This is 2020, where 'hello world' is 300MB of packaged standalone JavaScript environment...
<daniels>
you can actually get pretty reasonably-priced devboards for those now
<HdkR>
I'd like the Snapdragon 865 dev board if it was a reasonable Linux target :P
<daniels>
we gave up on Exynos long before they gave up on actually selling the silicon to anyone else - a few iterations of us fixing Exynos for mainline in one kernel release, then Samsung breaking it for everyone apart from Tizen in the very next release, was pretty demoralising
<HdkR>
yea, I saw that over the years. Such a pain
<robmur01>
FWIW, I can vouch for the performance of SDM835 running x86 Thunderbird being somewhat less than "reasonable" (cue stall for ~10s in the middle of typing this...)
<daniels>
HdkR: not to try to talk you out of Panfrost or anything, but :P there are patches out there atm for 865 display & GPU
<HdkR>
My Snapdragon 850 device destroys my Snapdragon 8cx device in unit test run time. But that's just because WSL is terrible on the 8cx while the 850 device runs real Linux
<HdkR>
daniels: Oh yea, I saw that! Going to be a good time soon there
<HdkR>
I've already confirmed that Freedreno runs fine in an x86-64 environment, going to need to ensure Panfrost userspace also works in the same environment at some point :)
<HdkR>
Mali-G78, sounds like a good time for more Valhall
<HdkR>
Can't tell if the 24-core unit can get near Adreno's top-end perf
buzzmarshall has joined #panfrost
<alyssa>
"up to some more radical changes such as a complete redesign of its FMA units."
<alyssa>
robmur01: "The one key changed of the Mali-G78 that Arm had talked about the most, was the change from a single global frequency domain for the whole GPU to a new two-tier hierarchy, with decoupled frequency domains between the top-level shared GPU blocks, and the actual shader cores."
<alyssa>
I just read this as more devfreq bugs for us down the line.
<alyssa>
Bugs are O(N^2) to complexity IME ;)
stikonas_ has quit [Remote host closed the connection]
macc24 has quit [Read error: Connection reset by peer]
macc24 has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
nlhowell has joined #panfrost
raster has joined #panfrost
<robmur01>
yeah, I can't even imagine off-hand how you'd present the OPP tables for that, and I do wonder whether software is expected to forecast shader vs. tiler load for itself :/
<alyssa>
bbrezillon: what's the idea for BO_ACCESS_VERTEX/FRAGMENT flags?
<alyssa>
oh, for the dep graph later. got it.
<bbrezillon>
alyssa: yep, knowing which one is used in the frag job
<bbrezillon>
and which ones are used in the !frag job
* alyssa
is looking into refactoring away the hash tables so
<bbrezillon>
alyssa: what's the key of this hashtab?
<alyssa>
bbrezillon: currently, we have a lot of things indexed by panfrost_bo * via a hash table
<alyssa>
when we can get away with bo->gem_handle into an array/bitset/etc
<alyssa>
(see discussion with jekstrand in dri-devel yesterday - this is how it's handled in anv)
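(A minimal sketch of the idea being discussed, with hypothetical names rather than the actual anv or panfrost code: GEM handles are small dense integers, so per-batch BO tracking can use a growable bitset indexed by bo->gem_handle instead of a hash table keyed by panfrost_bo *.)

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: one bit per GEM handle marks whether a BO is
 * already attached to the batch. realloc failure handling is elided. */
struct batch_bo_set {
	uint32_t *words;
	unsigned num_words;
};

static void
batch_bo_set_add(struct batch_bo_set *set, uint32_t gem_handle)
{
	unsigned word = gem_handle / 32;

	if (word >= set->num_words) {
		set->words = realloc(set->words, (word + 1) * sizeof(*set->words));
		memset(set->words + set->num_words, 0,
		       (word + 1 - set->num_words) * sizeof(*set->words));
		set->num_words = word + 1;
	}

	set->words[word] |= 1u << (gem_handle % 32);
}

static bool
batch_bo_set_contains(const struct batch_bo_set *set, uint32_t gem_handle)
{
	unsigned word = gem_handle / 32;

	return word < set->num_words &&
	       (set->words[word] & (1u << (gem_handle % 32)));
}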
<bbrezillon>
yep, I saw that one
<bbrezillon>
and that sounds like a good idea, indeed
<alyssa>
bbrezillon: I don't see where PAN_BO_ACCESS_FRAGMENT is read, though
<alyssa>
(It looks like we only track deps on a per-batch level)
<alyssa>
and batch_submit_ioctl only uses the deps for the v/t side
* alyssa
wonders if we're losing perf there
<bbrezillon>
alyssa: yep, it's a per-batch thing
<bbrezillon>
inter-batch dep is handled through FBs
<bbrezillon>
alyssa: the dep of a fragment job, is the V/T job
<bbrezillon>
which already has deps on other jobs defined
<bbrezillon>
so the frag job indirectly depends on the V/T deps
<bbrezillon>
is that wrong?
<bbrezillon>
note that BO_ACCESS flags are here for resource refcounting, not deps
raster has quit [Quit: Gettin' stinky!]
marex-cloud has quit [Ping timeout: 256 seconds]
<bbrezillon>
well, they also act as implicit deps, since the kernel driver waits for all referenced BOs to be idle before scheduling a job
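(A hedged sketch of what those access flags amount to on the userspace side; the names and values are approximations, not the exact Mesa definitions. Each BO attached to a batch records whether it is read or written and whether the vertex/tiler job or the fragment job uses it, which is what later lets fragment-only BOs be treated differently.)

/* Approximate picture of the per-batch access flags being discussed;
 * the real names in pan_job.h may differ. */
enum batch_bo_access {
	BATCH_BO_ACCESS_READ         = 1 << 0,
	BATCH_BO_ACCESS_WRITE        = 1 << 1,
	BATCH_BO_ACCESS_VERTEX_TILER = 1 << 2,
	BATCH_BO_ACCESS_FRAGMENT     = 1 << 3,
};

/* e.g. a texture sampled only by the fragment shader would be added as
 * READ | FRAGMENT, while a vertex buffer would be READ | VERTEX_TILER. */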
<alyssa>
bbrezillon: Suppose batch A renders a cat to FBO #1.
<alyssa>
Then batch B renders a fullscreen quad (so no deps in vertex/tiler) which in the fragment shader textures from FBO #1 to do some post-processing to make the cat rainbow and bounce and say nyan.
<alyssa>
Ideally we would have:
<alyssa>
VERTEX: [ A ][ B ]
<alyssa>
FRAGME:      [ A ][ B ]
<alyssa>
since the vertex job of B does not depend on the fragment job of A, they can run concurrently
<alyssa>
If I understand the code right, though, it would actually end up being
<alyssa>
VERTEX: [ A ]     [ B ]
<alyssa>
FRAGME:      [ A ]     [ B ]
<alyssa>
which is slower due to the unnecessary dep.
<alyssa>
The ACCESS flags would signal that that's unnecessary, but I don't see how the kernel would know, since it just sees a dep of B on A, and it just sees B accessing a BO written by A (the FBO)
<bbrezillon>
right, I forgot that the tiler job was not responsible for texture sampling
<bbrezillon>
so we could indeed remove this dep
<alyssa>
(it's a bit confusing -- TILER jobs specify all the fragment shaders but they don't actually run until FRAGMENT)
<Lyude>
alyssa: you working on midgard perf stuff?
<alyssa>
Lyude: Yeah :-)
<bbrezillon>
yep, I think last time we discussed that you said tiler jobs were referencing textures, which is why I thought there was a hard dep here
<alyssa>
yeah, it's tricky. the TILER job does reference it in the sense that the job has the pointer, but it doesn't actually access it
<bbrezillon>
if that's not the case, then we should remove the explicit dep on BOs flagged with BO_ACCESS_FRAGMENT only
<bbrezillon>
we'd still pass the BO to the BO list, that's not a problem
<bbrezillon>
we can just get rid of the dep
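(Roughly what that could look like, as a sketch with hypothetical names - the struct and helpers here stand in for whatever pan_job.c actually does: fragment-only BOs still go into the submit's BO list, but stop contributing a dependency for the vertex/tiler job.)

#include <stdint.h>

#define BATCH_BO_ACCESS_VERTEX_TILER (1u << 2)
#define BATCH_BO_ACCESS_FRAGMENT     (1u << 3)

struct panfrost_bo;
struct panfrost_batch;

struct batch_bo_ref {
	struct panfrost_bo *bo;
	uint32_t flags;		/* BATCH_BO_ACCESS_* as sketched above */
};

/* Assumed helpers standing in for the real BO-list / dependency code. */
void batch_add_to_bo_list(struct panfrost_batch *batch, struct panfrost_bo *bo);
void batch_add_vertex_tiler_dep(struct panfrost_batch *batch, struct panfrost_bo *bo);

static void
batch_collect_vertex_tiler_deps(struct panfrost_batch *batch,
                                const struct batch_bo_ref *refs,
                                unsigned count)
{
	for (unsigned i = 0; i < count; i++) {
		/* The BO always stays in the submit's BO list so the kernel
		 * keeps it resident for the whole batch. */
		batch_add_to_bo_list(batch, refs[i].bo);

		/* A BO touched only by the fragment job no longer forces the
		 * vertex/tiler job to wait for it. */
		if ((refs[i].flags & BATCH_BO_ACCESS_FRAGMENT) &&
		    !(refs[i].flags & BATCH_BO_ACCESS_VERTEX_TILER))
			continue;

		batch_add_vertex_tiler_dep(batch, refs[i].bo);
	}
}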
<alyssa>
would that work if the kernel does implicit deps from the BO list..?
<bbrezillon>
anyway, none of that will help improve performance if the kernel isn't patched to support skipping the implicit waits on BOs
<bbrezillon>
which was in the pipe when I submitted the batch pipelining stuff
<alyssa>
I'm not convinced we need to specify that texture in the vertex/tiler BO list at all, though
<alyssa>
When I say it's a pointer, I literally just mean it's a pointer. It shouldn't ever get dereferenced by the GPU until the corresponding frag job executes.
<alyssa>
Not sure if that's a kosher use of the BO list, but it should work at this point
<alyssa>
and then it becomes UABI by default or something
<alyssa>
robher: *ducks*
<bbrezillon>
alyssa: I wouldn't worry about that, the BO is still referenced by the frag job
<bbrezillon>
which is executed after the tiler job is done
<bbrezillon>
so omitting the BO in the tiler BO list shouldn't be a problem
<alyssa>
agreed, just not sure it's totally intended :)
<bbrezillon>
probably not
<bbrezillon>
but adding a flag to skip the implicit deps would also be a good thing
<bbrezillon>
I mean, etnaviv has that too
<alyssa>
Mm
<bbrezillon>
don't you have cases where 2 jobs read from the same BO but never write it?
<bbrezillon>
clearly we don't want things to be serialized in this case
<bbrezillon>
but that's what happens
<alyssa>
ah, right. good point
<bbrezillon>
we have all the pieces to skip this unnecessary serialization already
<bbrezillon>
we just need this flag (and a lot of testing to make sure it doesn't regress things :))
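(For reference, a purely hypothetical sketch of what such a flag could look like in the panfrost UABI - this is not in panfrost_drm.h; etnaviv's equivalent is a per-submit flag, and the layout below just mirrors the existing submit ioctl with a flags word bolted on.)

#include <linux/types.h>

/* Hypothetical: let userspace opt out of the kernel's implicit waits on
 * every referenced BO, so two read-only users of a BO aren't serialized.
 * Userspace would then express any real ordering via in_syncs. */
#define PANFROST_SUBMIT_NO_IMPLICIT_FENCE (1u << 0)

struct drm_panfrost_submit_v2 {		/* hypothetical, not real UABI */
	__u64 jc;			/* GPU address of the job chain */
	__u64 in_syncs;			/* explicit dependencies */
	__u32 in_sync_count;
	__u32 out_sync;
	__u64 bo_handles;
	__u32 bo_handle_count;
	__u32 requirements;
	__u32 flags;			/* PANFROST_SUBMIT_* */
	__u32 pad;
};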
<alyssa>
testing? don't you mean pushing to master and waiting for the bug reports?
<alyssa>
(thanks icecream95 ;P)
<bbrezillon>
:D
<alyssa>
bbrezillon: As an aside, I notice we spend serious CPU time in the SUBMIT ioctl.. wonder what's up with that
<alyssa>
9.21% on this trace in panfrost_ioctl_submit
<alyssa>
within that 4.01% in panfrost_job_push, 1.23% in drm_gem..lookup, 1.18% in gem_mapping_get
<alyssa>
1.26% waiting on the wake up lock in drm_sched_wakeup
nerdboy has joined #panfrost
<alyssa>
Maybe we're using way too many BOs
stikonas has joined #panfrost
<robher>
I seem to recall some discussion on multiple readers. Related to resv_obj's I think.