<tomeu>
just that in util_make_fragment_tex_shader_writemask
<bbrezillon>
ok
<tomeu>
afaics, the cube is drawn after we have reloaded the FB contents just fine, but somehow the cube isn't fully drawn when we present it
<bbrezillon>
yes, I noticed that sometimes the new output was recovered by old content
<bbrezillon>
I thought is was something related to Z-testing
<bbrezillon>
*it
<tomeu>
well, I don't think it's new content covered by old content, as sometimes a whole cube side is missing
<tomeu>
so it looks to me as incomplete rendering
<tomeu>
bbrezillon: do you think it could be due to the linking patching done at the end of panfrost_draw_wallpaper ?
<tomeu>
cannot say I understand that code
<bbrezillon>
tomeu: could be
<bbrezillon>
but if you leave the relaod FB tiler job at the end, it simply doesn't work
<bbrezillon>
I tried that
<bbrezillon>
what happens is that you get the output completely replaced by the old FB content, which ends up being black since you always start with a cleared-FB (all bytes to 0)
<tomeu>
yep, I see all green here
<bbrezillon>
looks like specifying the dependency is not enough
<bbrezillon>
jobs have to be linked in the order they're supposed to be executed
<tomeu>
bbrezillon: and what should that be?
<tomeu>
I don't even know how many jobs we have there
<bbrezillon>
the very first tiler job should be the "reload FB" fragment shader
<bbrezillon>
then following jobs come in the order they were submitted by the gl app (glDraw() calls)
<bbrezillon>
and the sequence should be flushed when glFlush() is called
<tomeu>
what about the "reload FB" vertex shader?
pH5 has quit [Quit: bye]
<bbrezillon>
vertex shaders are all executed before fragment shaders
<tomeu>
and we don't need to set deps?
<bbrezillon>
we do need to have a dep, and we do set it
<bbrezillon>
it's done through dependency1 IIRC
<tomeu>
for the vertex job?
<bbrezillon>
a dep between the fragment/tiler and vertex job
<bbrezillon>
tiler job depends on vertex job
<tomeu>
ah, guess it's set somewhere else
<bbrezillon>
in pan_context
<tomeu>
panfrost_draw_wallpaper:81 tiler jobs 6 draw cnt 6
<tomeu>
panfrost_draw_wallpaper:83 tiler jobs 7 draw cnt 7
<tomeu>
why do we have 6 tiler jobs before we add the "reload FB" one?
<cwabbott>
also, you need a dep between each tiler job
<cwabbott>
each tiler job adds to the tile datastructure in the tiler heap
<bbrezillon>
tomeu: depends what the app does
<bbrezillon>
if you have 6 glDraw calls without a glFlush in between, that should generate 6 tiler jobs
<cwabbott>
you can't have multiple things appending it at the same time, or it'll get corrupted or things will get drawn in the wrong order
<tomeu>
was thinking that the reload jobs would be first of all, before anything from the app
<bbrezillon>
tomeu: except you don't know if it will be needed ahead of time
<cwabbott>
yes, the reload should be the first job
<bbrezillon>
cwabbott: it is placed first, at link time
<cwabbott>
if anything happens before it you need to flush by submitting a fragment job
<cwabbott>
oh, yeah, I see what you mean
<bbrezillon>
tomeu: we really need to check how the DDK handle that
<cwabbott>
bbrezillon: actually, you should know before the app starts drawing whether you need to insert a reload job or not
<bbrezillon>
maybe they have a simpler way for reloading the FB content (like instructing the GPU you don't want to clear/discard before rendering)
<bbrezillon>
cwabbott: do you know ahaed of time that you'll have a tiler job?
<cwabbott>
you can construct a reload job, then throw it away if the app clears
<bbrezillon>
that would work too, yes
<cwabbott>
or just delay it until right before the app does the first draw
<bbrezillon>
cwabbott: so you think the order jobs are created matters?
<cwabbott>
not so much, as long as they're linked together correctly at the end
<bbrezillon>
yes, that was also my understanding (actually the set_value job is created last but is placed first at link time)
<tomeu>
bbrezillon: pandecode seems to be broken in this scenario :/
<tomeu>
bbrezillon: oh, we generate the reload shaders before submitting to the kernel, so after the app has drawn? that's why we have 6 tiler jobs already?
<bbrezillon>
tomeu: yes
<tomeu>
cool
<bbrezillon>
we generate it at flush time
<bbrezillon>
only if there's at least one tiler job
<bbrezillon>
(and in the original code, also only if glClear(COLOR) was not called
<bbrezillon>
)
<bbrezillon>
tomeu: BTW, thanks for helping me with that, as I was running out of ideas ;-)
<tomeu>
ok, guess I need to dump the job chain at submission time and checking everything is fine
<tomeu>
well, so far I haven't done much of value :)
<bbrezillon>
tomeu: discussing the thing is already useful, it helps making sure I understood the problem correctly
<bbrezillon>
tomeu: I remember adding traces to make sure the jobs were linked in the correct order
<tomeu>
I see some traces here about that, but I want to dump the chain at submit time, to make sure I know what the hw ends up seeing
<bbrezillon>
sure
<tomeu>
what's the difference between job_dependency_index_* and next_64?
<bbrezillon>
if only I knew it :)
<cwabbott>
tomeu: so, what you submit to the hw is a "job chain", which is a singly-linked list of jobs defined by the next_64 pointer
<cwabbott>
the job manager scans the linked list, keeps a certain number of jobs in its internal memory, and submits them when they're ready
<cwabbott>
job_index is like the register that's written, and job_dependency_index_* are the source registers
<bbrezillon>
cwabbott: ok
<bbrezillon>
but then I don't get why specifying a dependency of the first (non reload) tiler job on the reload job is not enough
<cwabbott>
probably because the scoreboard starts off in the "dependency finished" state
<bbrezillon>
hm
<bbrezillon>
makes sense
<cwabbott>
it sees the first (non reload) job depending on a job it hasn't seen yet, and just assumes it has finished
<tomeu>
ok, so it's not enough with setting the job_dependency_index_* fields, but we also need to make sure that the job chain has the jobs in the right order
<tomeu>
so no job appears in the chain before one of its dependencies?
<cwabbott>
that's right, but it sounds like bbrezillon already fixed that
<tomeu>
with the memmove, yeah
pH5 has joined #panfrost
<cwabbott>
you shouldn't need to move the job with memmove
stikonas has joined #panfrost
<cwabbott>
the GPU only cares about the linked list, not which order things are in memory
<bbrezillon>
cwabbott: was just easier this way
<bbrezillon>
since it didn't involve reworking the function that's linking the jobs
<bbrezillon>
but I agree, we could avoid the memmove
<cwabbott>
why is there a function that's linking the jobs at all? I'd expect you to just keep appending to the linked list as you go along
<cwabbott>
that seems what the HW interface is set up for
<bbrezillon>
cwabbott: that's a question for alyssa I guess
<tomeu>
wonder if we can assume that the hw will schedule the vertex jobs before the tiler jobs
<tomeu>
other than vertex jobs not having deps, it looks fine to me
<bbrezillon>
tomeu: because of the deps between tiler jobs it's already guaranteed
<bbrezillon>
tiler0 depends on vertex0
<bbrezillon>
and tiler1 depends on vertex1+tiler0
<bbrezillon>
hm, just realized it's not guaranteed while writing it :)
<tomeu>
but couldn't tiler job 14 execute before vertex job 1?
<bbrezillon>
tomeu: good question
<bbrezillon>
I hope not
<bbrezillon>
actually, I'm not even sure that's a problem
<bbrezillon>
vertex shader are independent from each other (at least in our case), the only thing you must ensure is that tiler jobs are done in the right order
<alyssa>
tomeu: TBF, I don't understand the linking patching code either and I wrote it....
<bbrezillon>
tomeu: sorry, got interrupted by things on a different project
<alyssa>
tomeu: 6 tiler jobs = 6 draws = 6 faces of kmscube
<alyssa>
tomeu: pandecode is broken in a lot of scenarios... I really need to fix this stuff...
<tomeu>
alyssa: I think you can ignore most of my comments from today, except probably the last ones
<alyssa>
tomeu: But the blob handles reload stuff in literally this way, adding a special TILER job with a passthrough tex fragment shader. No vertex shader, specially computed varyings used instead.
<tomeu>
ok, I think the vertex side of things is fine
<tomeu>
but the sampling...
<alyssa>
cwabbott: In my defense the linking stuff is some of the oldest code in the driver, I'm well-aware it needs to be rewritten ......
<alyssa>
OA
<alyssa>
tomeu: Tiler job 14 depends on tiler job 13 dep... tiler job 1 depends on vertex job 1.
<tomeu>
alyssa: I think the linking is fine now
<alyssa>
Vertex jobs are genuinely independent
<alyssa>
bbrezillon: If the blit shader *flushes* then... we're in a loop since the whole bug this is around is "flushing without reloading first" >_<
<alyssa>
tomeu: "Just keeps increasing every frame" That's how vbo offsets are often setup... the app keeps changing stuff but it's better to append than modify in place for various reasons
<alyssa>
cwabbott: Yeah, the original (buggy) implementation was building the tiler job manually, as the blob does.. Still think that's the way to go
<alyssa>
cwabbott: The blob is doing a texelFetch op instead of a texture, and there's a weird interp0 on the varying, but beside that I don't think much is different..? But caches etc are totally likely
MoeIcenowy has quit [Quit: ZNC 1.6.5+deb1+deb9u1 - http://znc.in]
<tomeu>
I think we anyway want to end up there because of performance, but I liked the idea of starting with something that is used by other drivers, for sake of correctness
<alyssa>
cwabbott: kmscube won't be AFBC at this point, since it's going straight to GBM and I haven't wired up AFBC in the Rockchip display driver yet
<tomeu>
would have been cool to have weston working out of the box in debian without having to wait for 19.2
fysa has joined #panfrost
<alyssa>
tomeu: Better to wait for 19.2, for performance improvements and maybe glamor support and such..?
<alyssa>
I was more worried about getting in the kernel on-time
BenG83 has joined #panfrost
BenG83 has quit [Remote host closed the connection]
pH5 has quit [Quit: bye]
BenG83 has joined #panfrost
<alyssa>
tomeu: (Right now, if using `ondemand` governor, Weston uses the min clock and then is noticably very sluggish.)
<alyssa>
..Also, gets warm very easily. This is annoying.
<alyssa>
Just for Weston painting at minimal clock level, several watts drawn... I'm assuming this is very much not the expected
pH5 has joined #panfrost
stikonas has joined #panfrost
BenG83 has quit [Remote host closed the connection]
<tomeu>
alyssa: do you know what's the problem here with the width field in mali_texture_descriptor, and how things should be?
<tomeu>
other than "switch to kevin"? :p
BenG83 has joined #panfrost
BenG83 has quit [Client Quit]
raster has quit [Remote host closed the connection]
<alyssa>
tomeu: The texture stride = width * bytes per pixel
<alyssa>
The framebuffer stride = (width aligned to 16) * bytes per pixel
<alyssa>
If width is not aligned to 16, those differ
<anarsoul>
alyssa: utgard can sample from textures where stride != (width * bpp), I doubt that midgard can't
<alyssa>
anarsoul: It probably can, I just don't know the magic bits to do that
<anarsoul>
how long is texture descriptor?
<alyssa>
About as long as it is wide.
<alyssa>
anarsoul: mali_texture_descriptor in panfrost-job.h
<alyssa>
Let's see what wob blob does
afaerber has quit [Quit: Leaving]
<alyssa>
In the blend descriptor for the preserevd write, |= 0x800 flags
<alyssa>
clip set tighter than expected
<alyssa>
Texture descriptorage usage2=0x32 instead of 0x12
<alyssa>
Not obvious how it's recuperating the stride
adjtm has joined #panfrost
<anarsoul>
alyssa: there's stride somewhere in descriptor
<alyssa>
Apparently
<anarsoul>
for utgard it's only present if stride flag is set
<alyssa>
Hm
<anarsoul>
see lima_texture_desc_set_res() in lima_texture.c
adjtm_ has quit [Ping timeout: 258 seconds]
<alyssa>
Wait
<anarsoul>
grep for "for padded linear texture" in this fn
<anarsoul>
(I know code in lima_texture.c is a mess, need to refactor it...)
<alyssa>
Sec
<alyssa>
No, wait, I was hopeful for a second
<alyssa>
Ooooo
<alyssa>
Got it.
<alyssa>
It's the field _after_ swizzle_bitmaps
<alyssa>
Also it uses a negative stride + adjusted bitmap which solves the Y flip issue, same way we render upside down
<alyssa>
tomeu: ^^
<anarsoul>
so texture descriptor is larger than you thought?
<alyssa>
anarsoul: Yeah
<alyssa>
Thank you :)
<anarsoul>
np, I literally did nothing
<alyssa>
You offered encouragement and believed in me! That totally counts :)
<anarsoul>
:)
<alyssa>
anarsoul: Yup, the stride field is only there optionally if it's actually needed. That explains that, then
<alyssa>
(And explains why I never found it -- since it wasn't there!)
<alyssa>
Gotta run, but I'll write a patch for this evening if I get a chance :)
<anarsoul>
alyssa: on utgard place for it is always reserved (since it's at the very beginning of descriptor), but it's not set if flag is not set
<alyssa>
anarsoul: Here it's at the very end... Whether it's actually there or not isn't obvious, since I don't pay *that* much attention to alignment vs reserved memory etc
<anarsoul>
and IIRC Qiang spend quite some time to figure out that. Blob doesn't like to reveal its secrets!
<alyssa>
anarsoul: What I don't know is how the stride field interacts with mipmaps/cubemaps
<anarsoul>
it doesn't?
<alyssa>
anarsoul: On mdg
<anarsoul>
my guess it's the same here
<alyssa>
The texture descriptor is a fixed-size descriptor, followed by a variable number of points (levels*faces)
<alyssa>
And for the non-cube 1-lvl sample, after the one pointer was the stride field
<anarsoul>
on utgard we have fixed number of mipmap levels - 6
<alyssa>
Wack
<anarsoul>
so I guess no mipmapped cube textures? (is it even possible?)
<alyssa>
Yes? :p
<alyssa>
Pretty sure there are deqp tests for it
<alyssa>
So if I have a mipmapped cubemap, does that have one stride field for everything? One stride per mipmap but shared per cube? One stride for every pointer?
<anarsoul>
poor utgard is out of luck here
<alyssa>
Are they all in a chunk at the end? Or are they interleaved?
<alyssa>
Is it a 64-bit slot for 64-bit pointers? Or is 32-bit and just not aligned nicely?
<anarsoul>
good question
<alyssa>
anarsoul: Granted, if you're using cubemaps/mipmaps, hopefully you're not using a custom stride with a linear texture ....
<alyssa>
I'm trying to think of ways to get the blob to do that, but I'm coming up empty
<alyssa>
From GL, the only way I got linear textures at all was with wallpapering
<alyssa>
Ooo, this'll let be fix a bug in Weston
<alyssa>
Happy!
<HdkR>
woo
<alyssa>
...but now Weston is crashing. It was fine this morning .-.