<raster>
region reported is not "the whole buffer"
<raster>
DMG REGION 72,104 -> 616,600
<raster>
DMG REGION 568,560 -> 624,600
<raster>
DMG REGION 576,544 -> 624,600
<raster>
(minx/y -> max/y)
<bbrezillon>
hm, so something is calling the ->partial_update() hook
<bbrezillon>
raster: what's the res size?
<raster>
1920x1080
<raster>
oh wait
<raster>
wtf?
<raster>
we DO have support for partial update?
<raster>
i never through we added it
<raster>
wait
<raster>
pause
<raster>
this may be us
<raster>
it was added in 2016...
<raster>
well well
<raster>
wth...
<raster>
i didnt thnik we had it...
<raster>
we've been doing this in x11, wayland clients and drm/kms ever since then
<bbrezillon>
still doesn't explain why it doesn't work :)
<raster>
i'm digging
<mmind00>
urjaman: just saw in the logs that you seem to be doing x11 with panfrost ... any hints on what needs to be set? It seems to find panfrost for glamor, but things like glmark-es2 still seem to fall back onto llvmpipe ... glmark2-es2-drm works fine though, so it looks like some sort of x11 setting I'm missing
<alyssa>
mmind00: What does es2_info say?
<urjaman>
also i wouldnt know for i didnt try glmark (and i'm not running x11 w/ panfrost right now)
<daniels>
raster: are you sure your partial_update use is correct? :)
<raster>
i'm checking
<bbrezillon>
daniels: I'm also not sure my partial_update() implementation is correct :)
<raster>
it seems to be
<raster>
i was just tracing our regions higher up the stack
<raster>
they match the geometry that min/max x/y region give
<raster>
a small snippet of debug:
<raster>
BUFFER AGE: 2
<raster>
MASTER CLIP 72 104 8x24
<raster>
SWAP TIME...
<raster>
REGION SET 72 104 8x24
<raster>
DMG REGION 72,104 -> 80,128
<raster>
the DMG REGION is from inside mesa
<bbrezillon>
raster: did you try with after adding the missing damage_reset() call in panfrost_resource_from_handle()?
<bbrezillon>
raster: BTW, what are the symptoms?
<raster>
oh no
<raster>
not yet
<raster>
umm
<raster>
a black screen with some trails of update reagions
<raster>
with 1 frame seemingly mostly correct
<raster>
(most of the screen redrew)
<raster>
its just "buffer age/partial update" smells all over it :)
<bbrezillon>
well, if reverting 65ae86b85422
<raster>
gimme a bit
<raster>
going ot get more debug
<bbrezillon>
make it work, it's definitely related to partial-update :)
<raster>
ki wonder if perhaps we call eglSetDamageRegionKHR multiple times per draw/swap
<raster>
let me check
<raster>
like set it up for 1region, then draw
<raster>
then another, then draw
<raster>
nope. we set them all up in advance in one go
<raster>
would our scissor clips be the issue?
<raster>
we specifically also set up scissor clips to be our update region
<raster>
?
<raster>
interesting
<bbrezillon>
can you check the fine rendering region (batch->{minx,maxx,miny,max}) in draw_wallpaper()?
<raster>
when we set a single update region this code does the right yhing for mg region
<raster>
when we setup > 1 ... it drops back to "full update"
<mmind00>
urjaman: but you didn't do anything specific I guess?
<bbrezillon>
raster: nope
<mmind00>
urjaman: I'm just guessing that I must be missing something very basic
<raster>
:)
<daniels>
raster: just to be clear, you know that partial_update and swap_buffers_with_damage take totally different regions, right?
<bbrezillon>
raster: you mean several rects passed to a single partial_update() call?
<bbrezillon>
or several calls to to partial_update()?
<raster>
the logic reading this does seem to only handle 1 region for update
<daniels>
the partial_update spec has a pretty good visual explanation, and after writing a buggy Weston implementation, I went through and commented the hell out of all the surrounding code to explain the region juggling we go through
<raster>
daniels: i know - for us they have always been 1:1
<daniels>
ok, that's incorrect :)
<daniels>
SwapBuffersWithDamage takes _surface-relative_ damage - the change relative to the last time you called SwapBuffers(WithDamage)
<daniels>
SetUpdateRegion takes _buffer-relative_ damage - the region you're intending to change relative to the last time you _used that buffer_
<raster>
oh
<raster>
for us i think that doesnt matter because we "accumulate"
<raster>
let me mull it over but i think i've been here before
<daniels>
so your swap region is going to be driven by only the area you're redrawing in that cycle; the partial-update region needs to look at buffer_age, and accumulate all the damage back to the last time that buffer was current
<raster>
so if we're triple buffering our update region is this frame + last frame +_ frame before that
<raster>
so we kind of redraw a moving window across N frames
<urjaman>
mmind00: i did that revert that anarsoul linked above to get glamor (or even mpv) to eglInitialize() but that shouldnt have an effect if you have glamor working ...
<daniels>
right
<raster>
its not as efficient as it could be
<raster>
but it kind of side-steps these issues .. i think...
<raster>
(and if buffer age ever changes from being a constant value we throw our hands in the air and do a full redraw for that frame thus throwing in a full update in the pipeline if there is a hiccup)
<raster>
so let me think
<raster>
yeah a quick mull makes me think its moot due to the "over n frame accumulate" thing
<grw>
hiya- is panfrost known to work/not work on amlogic s912?
<raster>
so let's assume 0 == actual updates for the current frame
<raster>
thats where things changed this frame
<raster>
1 == what changes 1 frame ago
<raster>
and 2 == 2 frames ago
<raster>
every draw we actually have a sliding window and we actually draw the union of 0, 1, 2
<raster>
but the 1, 2 regions have no actual changes in them - tis dumbly repainging what was drawn a frame ago
<raster>
in general this isnt actually that bad in performance as most of the time those regiosn tend to be very close together or identical. but let's leave that aside
<raster>
i dont think this would break the partial update as such if it was per buffer not per surface
<daniels>
right, that's the only correct implementation if you don't have EGL_SWAP_BEHAVIOR_PRESERVED
<raster>
in this case we can actually dumbly keep them 1:1
<daniels>
you have to redraw the union regardless - you can optimise to only pass 0 to SwapBuffersWithDamage, but you have to pass (0 u 1 u 2) to both actual redraw as well as partial_update
<raster>
so as best i can tell here... we're submitting the right stuff if we are assuming regions 2, 1 are dumb "what was there before anyway" paints
<raster>
yeah
<raster>
we definitely could optimize the last bit for sure
<raster>
but it's been kind of moot in real life data so never bothered out of simplicity :)
<daniels>
(this is assuming that your buffer_age query returns 2)
<raster>
so my mulling is... wer're doing the right thing...
<raster>
am i wrong?
<raster>
i just want to not hunt in the wrong place
<raster>
that does not matuch up with the small rects we submitted for our partial update rects
<daniels>
rects
<raster>
oh
<raster>
hmm
<raster>
then its right
<daniels>
The list should consist of <n_rects> groups of four values, with each group representing a single rectangle in surface coordinates in the form {x, y, width, height}. Coordinates are specified relative to the lower left corner of the surface. It is not necessary to avoid overlaps of the specified rectangles.
<raster>
bbl
<raster>
meeting
<daniels>
don't worry if the damage co-ords are larger - for various reasons we don't submit each rect to reload as separate reload jobs, but we just take the largest possible union area and reload that
<shadeslayer>
well, I guess what I'm asking is, a context can have more than one colour buffer attached right?
<narmstrong>
daniels: oh, ok, but I don't understand why it worked without this at some point.... thanks for the pointer
<daniels>
narmstrong: i've made many mistakes in my life, but knowing too much about MMUs isn't one of them
<daniels>
shadeslayer: yeah, it can, but on the other hand Panfrost doesn't actually support simultaneous emission to multiple render targets
<daniels>
the hardware doesn't support it, so we have to emulate it by running the fragment shader multiple times, masking the outputs so we write to one cbuf at a time
<narmstrong>
daniels: no offense ! I only understand the basic concepts of IOMMU, I'm only wondering why it worked as-in on the initial panfrost patchset, and now no more, but it seems this `ias` check was there already...
<daniels>
narmstrong: no no, absolutely no offence taken! :) i'm very happy & comfortable with not knowing the details of Arm MMUs. my brain is already full enough with random crap.
<daniels>
narmstrong: so why it's regressed I have no idea ...
<daniels>
shadeslayer: i couldn't tell you which part was incorrect
<shadeslayer>
ok, so at least they're definitely at odds with each other?
<shadeslayer>
alyssa: ^^ thoughts?
megi has quit [Ping timeout: 245 seconds]
<alyssa>
shadeslayer: If there are multiple cbufs attached, it can't be scanout; MRT must be off-screen
<alyssa>
So the check in context.c#L159 is right
<alyssa>
The comment is misleading
<alyssa>
daniels: "the hardware doesn't support it, so we have to emulate it by running the fragment shader multiple times, masking the outputs so we write to one cbuf at a time"
<alyssa>
On T760+, I'm not sure if this is true
<narmstrong>
ok I know now, the initial panfrost had :
<narmstrong>
+ .ias = 48,
<narmstrong>
+ .oas = 40,
<shadeslayer>
alyssa: I see, any particular reason we only copy 4 cbufs then?
<alyssa>
shadeslayer: We only support 4 cbufs
<shadeslayer>
alyssa: but PIPE_MAX_COLOR_BUFS is 8 :S
<alyssa>
shadeslayer: Yes, but we don't support more than 4 even on chips that do MRT
<alyssa>
Desktop OpenGL requires 8 cbufs
<alyssa>
OpenGL ES only requires 4, and Mali only does 4 (when it does MRT at all... T720 and T6xx don't do any and have to soft-emulate it)
<alyssa>
Could we emulate 8 cbufs? Yes. Do we? No.
<alyssa>
So this will take the rest of eternity to implement.
<mmind00>
"rest of eternity" doesn't sound thaaaat long
<raster>
bbrezillon: yeah. the code looked pretty pessimistic reading it it seemed to want to do a bbox basically (well the inverse of one actually)
<anarsoul>
alyssa: looks pretty much like utgard to me
<raster>
bbrezillon: it seems the damage rect (the one NOT to copy in) doesnt match what we're submitting as regions it seems
yann has quit [Ping timeout: 244 seconds]
<anarsoul>
although we don't have condition register
<anarsoul>
select takes two slots -- condition is output of smul; branch can consume condition directly (any reg)
<alyssa>
anarsoul: Do you have the crazy scheduling reqs?
<anarsoul>
like what?
<alyssa>
All the VLIW stuff
<alyssa>
Because of scheduling alone, the blob is getting better compiles than us
<alyssa>
So I aim to fix that today
<alyssa>
I feel
<alyssa>
frozen not being able to start this um
<alyssa>
time to put on some music and start breaking things.
<raster>
bbrezillon: what damage_reset() call? cant find one i mesa grepping... ?
<anarsoul>
alyssa: yeah, Utgard PP is also VLIW
<alyssa>
anarsoul: Good luck.
<anarsoul>
it's more or less figured out
<anarsoul>
:)
<alyssa>
The lgood luck was for writing a scheduler.
<anarsoul>
we already do that
<anarsoul>
I'm working on merging loads into instruction where it's used
<alyssa>
:+1:
<anarsoul>
well, "working"
<anarsoul>
I know how to do that, but actual work hasn't been started yet
<alyssa>
In other news I've seen all of "LUT" "VLUT" and "VSFU" used to refer to the fifth unit in the ALU... not sure which is the real Arm term, I think VLUT but who knows
<alyssa>
(I've seen both VLUT and VSFU in public Arm stuff I think? I dunno)
<anarsoul>
what is vlut?
<alyssa>
anarsoul: vector lookup table
<anarsoul>
load uniform/temporary?
<anarsoul>
oh
<alyssa>
The unit for sin/cos as well as an extra multiplier
<alyssa>
The lookup tables are scalar but the multiplier/moves are vector so go figure
<anarsoul>
same here
<anarsoul>
:)
<anarsoul>
we lower sin/cos to scalar for this reason
<anarsoul>
but there's no dedicated unit for it in Utgard