alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
_whitelogger has joined #panfrost
<SolidHal> Cool, so assuming everything works, installing all of those packages should enable the use of panfrost?
<SolidHal> or rather grabbing that source package, building it, and installing it since it doesn't look like prebuilt packages are available
<SolidHal> *for everything
<alyssa> Yeah, with a 5.2+ kernel with DRM_PANFROST=m loaded in
<alyssa> Mesa 19.2 or master (either way) built and installed
<alyssa> Can't think of anything else
<SolidHal> Awesome, thanks a ton.
<anarsoul> alyssa: dts
rcf has quit [Ping timeout: 258 seconds]
<alyssa> anarsoul: Ah, yeah, right
<alyssa> Hal is working on RK3288 which should be dts'd as is
<alyssa> bbrezillon: "in the vt1,frag1 + vt2,frag2 example, can vt1 and vt2 run in //"
<alyssa> Not sure what you mean by //?
<anarsoul> parallel
<alyssa> Ahhh
<alyssa> Yes, conceptually vt1/vt2 could run in parallel
<alyssa> In practice, AFAIK they can't, because the hardware only has one job slot you can use for that (at least as implemented in the DRM kernel driver)
<alyssa> See some of the emails with stepri0
<anarsoul> oh, alyssa, I have a question for you
<anarsoul> how do you recover from GPU hangs in userspace driver?
<alyssa> We don't since they almost never happen
<anarsoul> infinite loop?
<alyssa> Shouldn't happen
<anarsoul> you can have a shader with infinite loop in it
<alyssa> Not in the backend!
<anarsoul> you can't detect all the infinite loops
* alyssa hears Turing rolling in his grave
<alyssa> I suppose not... In practice this has not been an issue
<alyssa> With WebGL, conceivably it could be.
<alyssa> WebGL is usually the problem.
<alyssa> Joking aside, the hardware has a watchdog. If the job takes too long, it times out which is equivalent to a fault.
<alyssa> As long as you don't overwhelm it, it usually recovers automatically, since the hw/kernel is reliable
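A minimal userspace-side sketch of that behaviour, assuming the drm_panfrost_wait_bo ioctl from the 5.2 panfrost uapi (a handle plus an absolute timeout_ns); the helper name and include path are illustrative:

    /* Sketch: bounded wait on a job's write BO. If the GPU watchdog killed a
     * hung job, the kernel resets the GPU and the wait returns; a persistent
     * timeout here is treated the same way as a fault. */
    #include <stdbool.h>
    #include <stdint.h>
    #include <time.h>
    #include <xf86drm.h>
    #include "drm-uapi/panfrost_drm.h"   /* illustrative include path */

    static bool
    job_bo_idle_within(int fd, uint32_t bo_handle, int64_t budget_ns)
    {
            struct timespec now;
            clock_gettime(CLOCK_MONOTONIC, &now);

            struct drm_panfrost_wait_bo wait = {
                    .handle = bo_handle,
                    /* timeout_ns is an absolute CLOCK_MONOTONIC deadline */
                    .timeout_ns = (int64_t)now.tv_sec * 1000000000ll +
                                  now.tv_nsec + budget_ns,
            };

            /* 0 on success; on timeout the ioctl fails, meaning the job is
             * still running (or hung and about to be reset) */
            return drmIoctl(fd, DRM_IOCTL_PANFROST_WAIT_BO, &wait) == 0;
    }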
<anarsoul> right
<anarsoul> but what if vertex job failed
<alyssa> What about it?
<anarsoul> do you run fragment job in this case?
<alyssa> Dunno
rcf has joined #panfrost
vstehle has quit [Ping timeout: 246 seconds]
<TheCycoTWO> has anyone investigated the weston 7 segfault?
lvrp16 has quit [Read error: Connection reset by peer]
austriancoder has quit [Ping timeout: 276 seconds]
austriancoder has joined #panfrost
lvrp16 has joined #panfrost
belgin has joined #panfrost
vstehle has joined #panfrost
fysa has joined #panfrost
chewitt has quit [Quit: Adios!]
NeuroScr has joined #panfrost
davidlt has joined #panfrost
davidlt_ has joined #panfrost
belgin has quit [Quit: Leaving]
davidlt has quit [Ping timeout: 245 seconds]
davidlt__ has joined #panfrost
davidlt_ has quit [Read error: Connection reset by peer]
davidlt_ has joined #panfrost
davidlt__ has quit [Ping timeout: 240 seconds]
<bbrezillon> alyssa: ok, I think the part I was missing is that tiles are written back to DRAM and the fragment job has an fbd that describes where those tiles are (tiler_heap I guess)
<daniels> TheCycoTWO: which 'weston 7 segfault'? a fair few people here are running weston 7 and it works fine
<bbrezillon> anarsoul: just had a look at the scheduler code, and the fence is not signaled when a job fails. AFAIU, if the vertex job fails repeatedly (the scheduler retries a few times), the fragment job will never be executed (actually, all jobs coming from this context/FD will be ignored after that)
warpme_ has joined #panfrost
adjtm has quit [Ping timeout: 246 seconds]
warpme_ has quit [Quit: warpme_]
warpme_ has joined #panfrost
<narmstrong> tomeu: I fixed the rootfs build, added the tags and switched to my kernel branch + fixed the arm64 config https://gitlab.freedesktop.org/narmstrong/mesa/-/jobs/592083
<narmstrong> tomeu: the job submission won't work, but I'll test locally
adjtm has joined #panfrost
warpme_ has quit [Quit: warpme_]
<narmstrong> tomeu: using variables is better for handling which board to test, otherwise the pipeline is set to `stuck` if we use tags https://gitlab.freedesktop.org/narmstrong/mesa/pipelines/62723
raster has joined #panfrost
<narmstrong> I managed to get artifacts; I'm testing them, then I'll set up the job submission
davidlt_ is now known as davidlt
<tomeu> narmstrong: oh, forgot to CC you on the CI changes I sent yesterday :/
<narmstrong> tomeu: no problem, I'm locally testing the artifact, and it works :-)
<narmstrong> tomeu: using surfaceless means it won't need a valid display anymore right ?
NeuroScr has quit [Quit: NeuroScr]
NeuroScr has joined #panfrost
<narmstrong> :: All test completed.
<narmstrong> ok: 11570, warn: 54, bad: 5771, skip: 76, total: 11767
<narmstrong> nice !
yann has joined #panfrost
<tomeu> narmstrong: yep
<tomeu> narmstrong: btw, I have started work on moving our CI to mesa's main CI scripts
<tomeu> so it's probably not worth it to make many changes to the current panfrost setup
<narmstrong> tomeu: ok, so in this case tags are preferable, to test only t820 on my runner
raster has quit [Remote host closed the connection]
<tomeu> narmstrong: maybe yeah, is the runner ready now?
<narmstrong> tomeu: nop, I need to finalize the lava part
<tomeu> ok
<tomeu> I need to check some regressions that have come from core mesa
<tomeu> narmstrong: btw, one more comment: for now we should probably only test one arch per board
<tomeu> once we think we have free capacity, we can change that
chewitt has joined #panfrost
raster has joined #panfrost
<TheCycoTWO> daniels: on start, right after it shows the GL info: "segmentation fault (core dumped) weston" [rest of the terminal output interleaved/garbled]
<daniels> TheCycoTWO: i've not seen that and it generally works really well for us - one thing you can do is to redirect stdout and stderr to a file to capture that, as well as look at the core dump to see where the crash happened
<TheCycoTWO> Stack trace shows #0 0x0 n/a (n/a) #1 0x0000ffffab6e0668 n/a (gl-renderer.so) and one more n/a level above that.
<TheCycoTWO> Was trying to build weston myself with symbols but ran into complaints about egl.
<TheCycoTWO> That's archlinuxarm on kevin with latest master mesa (and several other mesa versions back to its release) linux 5.2
* TheCycoTWO posted a file: weston-start.log (9KB) < https://matrix.org/_matrix/media/v1/download/matrix.org/yysLfNBgeULeZRcRcvidkUST >
<TheCycoTWO> will the coredump be useful without symbols?
warpme_ has joined #panfrost
chewitt has quit [Quit: Adios!]
<TheCycoTWO> daniels ^?
<daniels> TheCycoTWO: unfortunately not really useful without symbols, no - getting a dump with symbols would be really useful
<TheCycoTWO> ok. First thing to do is figure out why it says "Runtime dependency egl found: NO" when I run meson then
raster has quit [Remote host closed the connection]
<alyssa> bbrezillon: Re tiles written to memory and fragment reading it back, correct. Think what you will of the efficiency..... but that's what the tiler_* structures are for.
raster has joined #panfrost
raster- has joined #panfrost
<TheCycoTWO> ah, mesa had stopped providing the pc file for egl when compiled with glvnd - um - back in july. I have no need for glvnd anyway.
<bbrezillon> alyssa: I have a version with the dep graph stuff that starts working (at least CI is happy with it :)) => https://gitlab.freedesktop.org/bbrezillon/mesa/tree/panfrost-job-rework
<bbrezillon> but I have a few questions
<bbrezillon> looks like I have a few BOs that I was expecting to be used only by the vertex+tiler jobs, but if I don't attach them to the fragment job it starts failing
<bbrezillon> alyssa: the SSBO buf for instance
<alyssa> bbrezillon: Ah! :)
<alyssa> Fragment shaders run during FRAGMENT, not during TILER
<alyssa> Ummm
<alyssa> VERTEX jobs = upload the vertex shader, run the vertex shader
<alyssa> TILER jobs = upload the fragment shader, run the *tiler*
<alyssa> FRAGMENT jobs = run the fragment shader, copy to framebuffer
<alyssa> So if a fragment shader needs a BO, it needs to be attached to the FRAGMENT job
<alyssa> But if a vertex shader needs it, to the VERTEX
<alyssa> And if tiling itself needs it (varyings, tiler structures, fragment shader binaries, etc), to the TILER
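A rough sketch of the rule described above; all of the batch/BO names here (panfrost_batch, panfrost_batch_add_bo, PAN_JOB_*, draw_bos) are made up for illustration rather than the exact Mesa API:

    /* Sketch: attach each BO to the job stage(s) that actually touch it. */
    struct panfrost_bo;      /* a GPU buffer object (illustrative) */
    struct panfrost_batch;   /* one VERTEX/TILER/FRAGMENT job chain (illustrative) */

    enum pan_job_type {
            PAN_JOB_VERTEX,     /* runs the vertex shader */
            PAN_JOB_TILER,      /* runs the tiler */
            PAN_JOB_FRAGMENT,   /* runs the fragment shader, writes the framebuffer */
    };

    void panfrost_batch_add_bo(struct panfrost_batch *batch,
                               struct panfrost_bo *bo,
                               enum pan_job_type stage);   /* hypothetical helper */

    struct draw_bos {
            struct panfrost_bo *vertex_shader, *attributes;
            struct panfrost_bo *varyings, *fragment_shader;
            struct panfrost_bo *ssbo;
    };

    static void
    attach_draw_bos(struct panfrost_batch *batch, struct draw_bos *d)
    {
            /* vertex shader binary + inputs: needed while VERTEX runs */
            panfrost_batch_add_bo(batch, d->vertex_shader, PAN_JOB_VERTEX);
            panfrost_batch_add_bo(batch, d->attributes,    PAN_JOB_VERTEX);

            /* varyings, tiler structures, fragment shader *binary*: TILER */
            panfrost_batch_add_bo(batch, d->varyings,        PAN_JOB_TILER);
            panfrost_batch_add_bo(batch, d->fragment_shader, PAN_JOB_TILER);

            /* anything the fragment shader reads while shading (e.g. the SSBO
             * bbrezillon hit) must also be on the FRAGMENT job */
            panfrost_batch_add_bo(batch, d->ssbo, PAN_JOB_FRAGMENT);
    }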
raster- has quit [Read error: Connection reset by peer]
NeuroScr has quit [Quit: NeuroScr]
<bbrezillon> alyssa: ok
<TheCycoTWO> ok - weston 7 is working for me now. (mesa built without glvnd) so I guess it's not panfrost's fault at any rate (probably my fault)
NeuroScr has joined #panfrost
<bbrezillon> alyssa: so, the FRAGMENT job knows where the fragment shader (and associated BOs) are
<bbrezillon> something defined in the fbd I guess
<tomeu> alyssa: o/
<tomeu> alyssa: do you know why this commit broke two tests in panfrost, but not on llvmpipe? https://gitlab.freedesktop.org/tomeu/mesa/commit/b6384e57f5f6e454c06ec1ada1c1138dd0dc84f2
<tomeu> 2019-09-11T11:18:09 dEQP-GLES2.functional.uniform_api.random.3 Fail REGRESSED from (NotListed)
<tomeu> 2019-09-11T11:18:09 dEQP-GLES2.functional.uniform_api.random.21 Fail REGRESSED from (NotListed)
<alyssa> bbrezillon: Yeah, somehow
<alyssa> I suspect it's somewhere in the tiler_heap but I never investigated
<TheCycoTWO> popolon: you were the other person that mentioned weston 7, and you are on almost an identical setup to me. Did you sort it out already, or see my libglvnd notes?
<alyssa> tomeu: That is a good question :|
<alyssa> I mean
<alyssa> The obvious answer is that Panfrost ingests NIR but llvmpipe ingests LLVM IR :P
<alyssa> (and that commit is in the glsl_to_nir pass)
<tomeu> guess that iris isn't broken by it either, but I haven't seen the CI run for it
<alyssa> Mm, the uniform/varying/attribute counting code is... fragile... so I'm not surprised it broke on some edge cases
<alyssa> Could you grab a testlog.xml so I can see what the shaders are?
<tomeu> hmm, let me check
* tomeu is at plumbers now
raster has quit [Read error: Connection reset by peer]
<alyssa> How's the plumbing going?
<tomeu> there's no plumbing that I can see, only plumbing professionals going around and eating and drinking
<alyssa> No plumbing?
<alyssa> What if you need to use the washroom? :/
raster has joined #panfrost
raster has quit [Read error: Connection reset by peer]
raster has joined #panfrost
<tomeu> hmm, maybe there's a secret plumber somewhere...
<bbrezillon> alyssa, daniels, tomeu: looks like -bideas prefers the serialization => http://code.bulix.org/lje2vf-865832
<narmstrong> robmur01: thanks ! testing them right away
<daniels> bbrezillon: *blink*
urjaman has joined #panfrost
urjaman has quit [Quit: WeeChat 2.5]
urjaman has joined #panfrost
davidlt has quit [Ping timeout: 240 seconds]
warpme_ has quit [Quit: warpme_]
<tomeu> bbrezillon: any ideas why it would?
<alyssa> bbrezillon: What's the difference between v1 and v2 of serialization here?
<alyssa> Also, it is possible the parallel submission code has a subtle bug that could manifest as suboptimal performance (even if it's still correct)
<tomeu> narmstrong: it helps to have arm people around :p
<bbrezillon> alyssa: v1 is the version I sent last week
<bbrezillon> v2 is the one using a dep graph to delay even more the batch submission
<bbrezillon> and I'd expect v2 to be faster than v1 once Steven's patch has landed
<bbrezillon> tomeu: nope
adjtm has quit [Ping timeout: 258 seconds]
<bbrezillon> well, delaying the submission also means the GPU starts executing jobs later, and I suspect it makes a difference for GPU-bound workloads
<bbrezillon> or it could be a bug, as pointed by alyssa :)
<tomeu> yeah, I would bet on that
<bbrezillon> hm, I applied Steven's patch and it doesn't change the results
<tomeu> we need to improve tools to see how CPU and GPU load affect throughput
davidlt has joined #panfrost
raster has quit [Remote host closed the connection]
adjtm has joined #panfrost
yann has quit [Ping timeout: 276 seconds]
jolan has quit [Quit: leaving]
jolan has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
NeuroScr has joined #panfrost
<alyssa> Agh, GLSL IR, my eyes! :p
<alyssa> I just meant the GLSL itself (and also the NIR)
<alyssa> bbrezillon: Hmm, I'm trying to think why the dep graph could possibly be slower
<alyssa> "delaying the submission also means the GPU starts executing jobs later, and I suspect it makes a difference for gpu bound workloads"
<alyssa> This is a very real possibility and something I thought about; I think I had concluded to myself it should be negligible, but let's see..
<alyssa> The drop between v1/v2 on the simple scenes is probably just the extra overhead
<alyssa> Since there's no dependent FBOs or anything, so there's nothing to delay really
<alyssa> (We should try to keep down the overhead but again, negligible)
<alyssa> v2 is a big win on -bdesktop:effect=shadow
<alyssa> If I had to guess it maybe is eliminating wallpapers and things
<alyssa> I'm trying to figure out why -bideas is hurt with both v2 and v1..
* alyssa is suspecting some sort of bug introducing waits in places they shouldn't be but she's not sure
yann has joined #panfrost
<bbrezillon> alyssa: looks like calling bo_wait() (even with a 0 timeout, or when the BO is already ready) has a non-negligible impact
<bbrezillon> the userspace -> kernelspace context switch hurts
davidlt has quit [Ping timeout: 244 seconds]
<alyssa> bbrezillon: Eek, yeah, that sounds like it could definitely contribute..
<alyssa> Wonder if that can be mitigated in userspace somehow...?
<anarsoul> why do you call bo_wait()?
<bbrezillon> alyssa: it does not explain the 100 fps diff
<bbrezillon> but accounts for at least 20 AFAICT
<bbrezillon> we can mark BOs ready, but they need to be tested at least once
<alyssa> bbrezillon: I'm questioning if we made a bad ABI choice that we're stuck with now :-(
* alyssa wishes she understood fences/syncs when this stuff was first being done
<alyssa> Wait
<alyssa> bbrezillon: Why can't we use syncs?
<alyssa> When a BO is in flight, track "BO<->sync"
<alyssa> -----------Oh!
<alyssa> I am in Class but you know what this seems like fun, sec
<alyssa> Yeah, can't we specify the dependency graph in terms of syncs and eliminate the waits entirely?
<alyssa> You would still need wait_bo for shared BOs, I guess.... but the majority of BOs we deal with are not shared so it doesn't -terribly- matter
<alyssa> If there's really a reason that can't work, I mean, we can patch the kernel to do waits on `bo_handles` automatically, but ideally we make it work without any ABI changes so everything continues to work against 5.2
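A sketch of that idea, assuming the 5.2 drm_panfrost_submit uapi (bo_handles, in_syncs, out_sync) and libdrm's drmSyncobjCreate; the per-BO "last writer" bookkeeping (tracked_bo, remember_writer) is made up:

    /* Sketch: give every submit its own out_sync and remember it as the last
     * writer of each BO the job touches.  A later batch that reads one of
     * those BOs passes that syncobj in in_syncs instead of calling bo_wait(). */
    #include <stdint.h>
    #include <xf86drm.h>
    #include "drm-uapi/panfrost_drm.h"   /* illustrative include path */

    struct tracked_bo {
            uint32_t handle;
            uint32_t last_write_syncobj;   /* 0 when the BO is idle */
    };

    /* hypothetical bookkeeping helper */
    void remember_writer(struct tracked_bo *bos, uint32_t count, uint32_t syncobj);

    static int
    submit_batch(int fd, uint64_t first_job, uint32_t requirements,
                 uint32_t *bo_handles, uint32_t bo_count,
                 uint32_t *dep_syncobjs, uint32_t dep_count,
                 struct tracked_bo *tracked)
    {
            uint32_t out_sync;
            if (drmSyncobjCreate(fd, 0, &out_sync))
                    return -1;

            struct drm_panfrost_submit submit = {
                    .jc = first_job,
                    .in_syncs = (uintptr_t)dep_syncobjs,   /* wait on deps in-kernel */
                    .in_sync_count = dep_count,
                    .out_sync = out_sync,                  /* signalled when done */
                    .bo_handles = (uintptr_t)bo_handles,
                    .bo_handle_count = bo_count,
                    .requirements = requirements,
            };

            if (drmIoctl(fd, DRM_IOCTL_PANFROST_SUBMIT, &submit))
                    return -1;

            /* every BO written by this batch now depends on out_sync */
            remember_writer(tracked, bo_count, out_sync);
            return 0;
    }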
<anarsoul> bbrezillon: just check first BO, if it's ready - grab it, if it's not - others aren't ready either, so allocate new BO. Make sure that you add BO to list tail when you free them
<anarsoul> that'd be 1-2 syscalls per BO allocation and that's not bad
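A sketch of that allocation strategy; the list layout and the bo_is_ready()/bo_alloc_new() helpers are illustrative, with bo_is_ready() standing in for a wait_bo call with a 0 timeout:

    /* Sketch: keep freed BOs on the tail of a free list; on allocation, test
     * only the head.  If the oldest freed BO is still busy, the more recently
     * freed ones behind it are (roughly) busy too, so allocate a fresh BO
     * rather than scanning the list. */
    #include <stdbool.h>
    #include <stddef.h>

    struct cached_bo {
            struct cached_bo *next;
            /* ... handle, size, CPU mapping ... */
    };

    struct bo_cache {
            struct cached_bo *head;   /* oldest free BO: first to become idle */
            struct cached_bo *tail;   /* newest free BO: append here on free */
    };

    bool bo_is_ready(struct cached_bo *bo);        /* wait_bo with a 0 timeout */
    struct cached_bo *bo_alloc_new(size_t size);   /* fresh allocation */

    static struct cached_bo *
    bo_cache_get(struct bo_cache *cache, size_t size)
    {
            struct cached_bo *bo = cache->head;

            if (bo && bo_is_ready(bo)) {           /* 1 syscall on the hit path */
                    cache->head = bo->next;
                    if (!cache->head)
                            cache->tail = NULL;
                    return bo;
            }

            /* head still in flight => don't bother scanning the rest */
            return bo_alloc_new(size);             /* +1 syscall for the alloc */
    }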
* alyssa reads up on how sync objects work
<bbrezillon> alyssa: that's what I do
<bbrezillon> to track job lifetime
<bbrezillon> but we still need to wait for something when testing BO readiness
<bbrezillon> and calling bo_wait() is the same as calling wait_syncobj
<bbrezillon> anarsoul: exactly what we do already
<bbrezillon> we have extra BO waits in transfer_map()
<bbrezillon> but that's all
<anarsoul> it doesn't look like it's possible to optimize it further
<bbrezillon> the big difference is that, by serializing jobs (and waiting on their out_sync fences), we had a single sync point, instead of one per BO-cache alloc + one per transfer_map
<anarsoul> don't serialize jobs?
<bbrezillon> that's what I'm trying to do
<bbrezillon> and it regresses the -bideas benchmark
<bbrezillon> it's definitely not the only problem, but it still accounts for almost 20 fps out of 100
<anarsoul> interesting
<bbrezillon> what I noticed is that, in that test, batching is not possible (we only ever have one job to submit)
<bbrezillon> and when that happens, it seems we have a huge perf drop
<anarsoul> profile it?
<anarsoul> bbrezillon: also check if you're actually getting BO from cache or you're allocating it
<bbrezillon> I'm definitely not an expert in profiling, but I tried callgrind, and didn't spot anything obvious
<anarsoul> try perf and flamegraph
<bbrezillon> good point, I'll check if the cache does its job
* anarsoul loves flamegraphs
<bbrezillon> thanks for the tip, I'll try that
<bbrezillon> anarsoul: I have a pretty decent number of cache hits
<anarsoul> then profile it
<bbrezillon> yep, will do that tomorrow
<bbrezillon> anarsoul, alyssa: thanks for your help
stikonas has joined #panfrost
NeuroScr has quit [Quit: NeuroScr]
NeuroScr has joined #panfrost
<alyssa> I swear by the gnome profiling thing
<HdkR> This a wrapper around perf?
<alyssa> Dunno
<bbrezillon> anarsoul: flamegraph is awesome!
<anarsoul> yeah, told you :)
<HdkR> I use flamegraphs all the time
<bbrezillon> alyssa: I think I found the problem
<HdkR> flamegraph with xray and perf works great
<anarsoul> who's the culprit?
<bbrezillon> it's a mix of "panfrost is currently a bit lax regarding resource sync on transfer_map" (which my patch is addressing)
<bbrezillon> and "the kernel driver is a bit to restrictive when it comes to reserving BOs that we only read from"
<anarsoul> what side of 'we' you're referring to?
<bbrezillon> I think we'll have to extend the submit ioctl
<anarsoul> GPU? CPU?
<bbrezillon> GPU
<anarsoul> you don't have per-bo flags in submit ioctl yet?
<anarsoul> :qa
<anarsoul> oops
<bbrezillon> no, we don't
<anarsoul> wrong window :)
<bbrezillon> hm, let me check
<bbrezillon> no, we definitely don't have that
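A purely hypothetical sketch of what such an extension could look like; nothing here exists in the panfrost uapi, and the struct layout and PANFROST_BO_REF_* names are made up for illustration:

    /* Hypothetical submit ioctl extension with per-BO access flags, so the
     * kernel could take a shared (read) reservation instead of an exclusive
     * one for BOs a job only reads from. */
    #include <stdint.h>

    #define PANFROST_BO_REF_READ   (1u << 0)
    #define PANFROST_BO_REF_WRITE  (1u << 1)

    struct drm_panfrost_submit_v2 {
            uint64_t jc;              /* GPU address of the first job descriptor */
            uint64_t in_syncs;        /* pointer to an array of syncobj handles */
            uint32_t in_sync_count;
            uint32_t out_sync;
            uint64_t bo_handles;      /* pointer to an array of u32 BO handles */
            uint64_t bo_flags;        /* NEW: parallel array of PANFROST_BO_REF_*
                                         flags, one entry per handle */
            uint32_t bo_handle_count;
            uint32_t requirements;
    };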
enunes has quit [Quit: ZNC 1.7.2 - https://znc.in]
enunes has joined #panfrost
cwabbott has quit [Remote host closed the connection]
griffinp has quit [Quit: ZNC - http://znc.in]
cwabbott has joined #panfrost
stikonas has quit [Remote host closed the connection]
<alyssa> bbrezillon: Hm, not sure I follow. Could you elaborate?