alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - - Logs - <daniels> avoiding X is a huge feature
stikonas_ has quit [Remote host closed the connection]
chewitt has joined #panfrost
belgin has joined #panfrost
belgin has quit [Quit: Leaving]
vstehle has quit [Ping timeout: 246 seconds]
jeez_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<alyssa> Intermediate goal: learn more about ld/st
<HdkR> alyssa: They touch memory
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
chewitt has quit [Quit: Adios!]
vstehle has joined #panfrost
<alyssa> Since posting that statement, I have now identified 27 (!) ld/st opcodes
<alyssa> That beats my usual records a few times over
<alyssa> ...Granted ld/st opcodes are extremely repetitive and formulaic....
<alyssa> OpenCL stuff
<alyssa> Generic memory load/stores (known to be used for OpenCL kernel args, register spilling, probably other stuff), image writes, atomics
<alyssa> Integer gl stuff
<alyssa> Mostly going for identification, not nearly enough understood to actually write compiler pieces for the above \shrug/
<HdkR> RE is the first step to implementation :P
<HdkR> Even if it would be nicer to say that reading the ISA manual is the first step to implementation....
<alyssa> /* We write the ISA manual */
<HdkR> At least Intel and AMD are nice enough to release documentation on the ISA :P
<tomeu> alyssa, bbrezillon: to isolate the problems, I changed the fragment shader to return a solid color:
<HdkR> Whoa, you can really see the tiling jobs now
<tomeu> I'm not sure if that's z-fighting though, or just that we scanout before the GPU is done
<HdkR> I'm going to assume the latter until proven otherwise
<bbrezillon> tomeu: you mean the one reloading FB content?
<tomeu> bbrezillon: yep
<tomeu> + struct ureg_src imm2 = ureg_imm4f( ureg, 0, 1, 0, 1 );
<tomeu> + ureg_MOV(ureg, out, imm2);
<tomeu> just that in util_make_fragment_tex_shader_writemask
<bbrezillon> ok
<tomeu> afaics, the cube is drawn after we have reloaded the FB contents just fine, but somehow the cube isn't fully drawn when we present it
<bbrezillon> yes, I noticed that sometimes the new output was covered by old content
<bbrezillon> I thought it was something related to Z-testing
<tomeu> well, I don't think it's new content covered by old content, as sometimes a whole cube side is missing
<tomeu> so it looks to me as incomplete rendering
<tomeu> bbrezillon: do you think it could be due to the linking patching done at the end of panfrost_draw_wallpaper ?
<tomeu> cannot say I understand that code
<bbrezillon> tomeu: could be
<bbrezillon> but if you leave the reload FB tiler job at the end, it simply doesn't work
<bbrezillon> I tried that
<bbrezillon> what happens is that you get the output completely replaced by the old FB content, which ends up being black since you always start with a cleared-FB (all bytes to 0)
<tomeu> yep, I see all green here
<bbrezillon> looks like specifying the dependency is not enough
<bbrezillon> jobs have to be linked in the order they're supposed to be executed
<tomeu> bbrezillon: and what should that be?
<tomeu> I don't even know how many jobs we have there
<bbrezillon> the very first tiler job should be the "reload FB" fragment shader
<bbrezillon> then following jobs come in the order they were submitted by the gl app (glDraw() calls)
<bbrezillon> and the sequence should be flushed when glFlush() is called
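The ordering rule bbrezillon lays out (reload job first, then the app's draws in submission order, the whole chain submitted at glFlush) can be sketched as a toy model; the struct and function names here are illustrative, not the real panfrost descriptors:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative job chain: a singly-linked list, mirroring the hardware's
 * next pointer. Names are hypothetical, not panfrost-job.h. */
struct tjob {
    struct tjob *next;
    const char *label;   /* "reload", "draw0", ... for illustration */
};

/* Append a draw job at the tail, i.e. in submission order. */
static void tjob_append(struct tjob **head, struct tjob *j)
{
    j->next = NULL;
    while (*head)
        head = &(*head)->next;
    *head = j;
}

/* At link/flush time, place the "reload FB" job at the very front so it
 * runs before any of the app's tiler work. */
static void tjob_prepend(struct tjob **head, struct tjob *reload)
{
    reload->next = *head;
    *head = reload;
}
```

Appending draws as they come and prepending the reload job at flush time yields the chain "reload, draw0, draw1, ..." that the discussion below says the hardware expects.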
<tomeu> what about the "reload FB" vertex shader?
pH5 has quit [Quit: bye]
<bbrezillon> vertex shaders are all executed before fragment shaders
<tomeu> and we don't need to set deps?
<bbrezillon> we do need to have a dep, and we do set it
<bbrezillon> it's done through dependency1 IIRC
<tomeu> for the vertex job?
<bbrezillon> a dep between the fragment/tiler and vertex job
<bbrezillon> tiler job depends on vertex job
<tomeu> ah, guess it's set somewhere else
<bbrezillon> in pan_context
<tomeu> panfrost_draw_wallpaper:81 tiler jobs 6 draw cnt 6
<tomeu> panfrost_draw_wallpaper:83 tiler jobs 7 draw cnt 7
<tomeu> why do we have 6 tiler jobs before we add the "reload FB" one?
<cwabbott> also, you need a dep between each tiler job
<cwabbott> each tiler job adds to the tile datastructure in the tiler heap
<bbrezillon> tomeu: depends what the app does
<bbrezillon> if you have 6 glDraw calls without a glFlush in between, that should generate 6 tiler jobs
<cwabbott> you can't have multiple things appending it at the same time, or it'll get corrupted or things will get drawn in the wrong order
<tomeu> was thinking that the reload jobs would be first of all, before anything from the app
<bbrezillon> tomeu: except you don't know if it will be needed ahead of time
<cwabbott> yes, the reload should be the first job
<bbrezillon> cwabbott: it is placed first, at link time
<cwabbott> if anything happens before it you need to flush by submitting a fragment job
<cwabbott> oh, yeah, I see what you mean
<bbrezillon> tomeu: we really need to check how the DDK handles that
<cwabbott> bbrezillon: actually, you should know before the app starts drawing whether you need to insert a reload job or not
<bbrezillon> maybe they have a simpler way for reloading the FB content (like instructing the GPU you don't want to clear/discard before rendering)
<bbrezillon> cwabbott: do you know ahead of time that you'll have a tiler job?
<cwabbott> you can construct a reload job, then throw it away if the app clears
<bbrezillon> that would work too, yes
<cwabbott> or just delay it until right before the app does the first draw
<bbrezillon> cwabbott: so you think the order jobs are created matters?
<cwabbott> not so much, as long as they're linked together correctly at the end
<bbrezillon> yes, that was also my understanding (actually the set_value job is created last but is placed first at link time)
<tomeu> bbrezillon: pandecode seems to be broken in this scenario :/
<tomeu> bbrezillon: oh, we generate the reload shaders before submitting to the kernel, so after the app has drawn? that's why we have 6 tiler jobs already?
<bbrezillon> tomeu: yes
<tomeu> cool
<bbrezillon> we generate it at flush time
<bbrezillon> only if there's at least one tiler job
<bbrezillon> (and in the original code, also only if glClear(COLOR) was not called)
<bbrezillon> tomeu: BTW, thanks for helping me with that, as I was running out of ideas ;-)
<tomeu> ok, guess I need to dump the job chain at submission time and checking everything is fine
<tomeu> well, so far I haven't done much of value :)
<bbrezillon> tomeu: discussing the thing is already useful, it helps make sure I understood the problem correctly
<bbrezillon> tomeu: I remember adding traces to make sure the jobs were linked in the correct order
<tomeu> I see some traces here about that, but I want to dump the chain at submit time, to make sure I know what the hw ends up seeing
<bbrezillon> sure
<tomeu> what's the difference between job_dependency_index_* and next_64?
<bbrezillon> if only I knew it :)
<cwabbott> tomeu: so, what you submit to the hw is a "job chain", which is a singly-linked list of jobs defined by the next_64 pointer
<cwabbott> the job manager scans the linked list, keeps a certain number of jobs in its internal memory, and submits them when they're ready
<cwabbott> the job_dependency_index_* plus job_index form a scoreboard used to determine when the job is ready
<cwabbott> job_index is like the register that's written, and job_dependency_index_* are the source registers
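cwabbott's register analogy can be captured in a few lines. This is a speculative model of the scoreboard semantics as described here, not the real job manager; the field names are stand-ins for the descriptor fields:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy scoreboard: each job "writes" its job_index when it completes; a
 * job is ready once both of its dependency indices have been written.
 * Index 0 conventionally means "no dependency". */
struct sb_job {
    uint16_t job_index;    /* the "destination register" */
    uint16_t dep1, dep2;   /* stand-ins for job_dependency_index_* */
};

static bool sb_dep_met(uint16_t dep, uint64_t done_mask)
{
    return dep == 0 || (done_mask & (1ull << dep)) != 0;
}

static bool sb_job_ready(const struct sb_job *j, uint64_t done_mask)
{
    return sb_dep_met(j->dep1, done_mask) && sb_dep_met(j->dep2, done_mask);
}

/* Mark a job's index as written once it completes. */
static uint64_t sb_job_done(const struct sb_job *j, uint64_t done_mask)
{
    return done_mask | (1ull << j->job_index);
}
```

In this model a tiler job depending on a vertex job only becomes ready after the vertex job's index has been marked done, which matches the behavior debugged below.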
<bbrezillon> cwabbott: ok
<bbrezillon> but then I don't get why specifying a dependency of the first (non reload) tiler job on the reload job is not enough
<cwabbott> probably because the scoreboard starts off in the "dependency finished" state
<bbrezillon> hm
<bbrezillon> makes sense
<cwabbott> it sees the first (non reload) job depending on a job it hasn't seen yet, and just assumes it has finished
<tomeu> ok, so it's not enough with setting the job_dependency_index_* fields, but we also need to make sure that the job chain has the jobs in the right order
<tomeu> so no job appears in the chain before one of its dependencies?
<cwabbott> that's right, but it sounds like bbrezillon already fixed that
<tomeu> with the memmove, yeah
pH5 has joined #panfrost
<cwabbott> you shouldn't need to move the job with memmove
stikonas has joined #panfrost
<cwabbott> the GPU only cares about the linked list, not which order things are in memory
<bbrezillon> cwabbott: was just easier this way
<bbrezillon> since it didn't involve reworking the function that's linking the jobs
<bbrezillon> but I agree, we could avoid the memmove
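cwabbott's point that the GPU only cares about the linked list, not memory layout, can be demonstrated with a toy walk: a descriptor placed later in memory can still be linked first, so no memmove is needed. Names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal job node: only the next pointer defines execution order. */
struct lj {
    struct lj *next;
    int id;
};

/* Walk the chain the way the job manager would: follow next pointers,
 * ignoring where the descriptors happen to live in memory. */
static int lj_walk(const struct lj *head, int *out, int max)
{
    int n = 0;
    for (; head && n < max; head = head->next)
        out[n++] = head->id;
    return n;
}
```

Linking the last array element first shows the traversal order is fully decoupled from memory order, which is why relinking pointers suffices where the memmove was used.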
<cwabbott> why is there a function that's linking the jobs at all? I'd expect you to just keep appending to the linked list as you go along
<cwabbott> that seems what the HW interface is set up for
<bbrezillon> cwabbott: that's a question for alyssa I guess
<cwabbott> yeah, that seems kinda silly
<cwabbott> and won't work with compute jobs
<tomeu> cwabbott: what will be the problem with compute jobs?
<cwabbott> compute jobs can read/write images that are used by the other jobs
<cwabbott> or interact in other ways that require you to insert a compute job between two other random jobs
stikonas has quit [Remote host closed the connection]
<cwabbott> so you can't assume that the job chain is always just "vertex, tiler, vertex, tiler, ..., fragment"
<bbrezillon> I'd like to set up a config using the DDK on rockchip, is this the blob I should use?
<tomeu> cwabbott: actually, this is currently vertex, vertex, ..., tiler, tiler, ...
<tomeu> with the fragment job being submitted separately, afterwards
<tomeu> most jobs don't have the dep fields filled:
<tomeu> guess that's fine if no jobs have those fields used
<tomeu> but as soon as we use the deps fields for some jobs, the others should as well, I think
<tomeu> otherwise, a job with no deps can execute before another job earlier in the chain
raster has joined #panfrost
<bbrezillon> tomeu: my understanding is that all vertex jobs have to be executed before tiler jobs anyway, because the GPU is tile-based
<tomeu> bbrezillon: but, based on that log, couldn't the tiler job 2 execute before the vertex job 13?
<tomeu> while the set_value job 15 executes
<bbrezillon> tomeu: hm, there's something weird
<bbrezillon> each tiler job should depend on the previous tiler job in the list, if any
<tomeu> yeah, when the jobs are created, the deps fields are set
<tomeu> but I see them zeroed at link time
<tomeu> maybe I'm doing something wrong when logging
<tomeu> was looking at that :)
<tomeu> but even then, it should at most overwrite the next_job pointer
<tomeu> not the deps
<tomeu> I will print the values after the copy
<tomeu> after the copy the values look fine
* tomeu valgrinds
<tomeu> nothing
<tomeu> wonder about the memmoves in wallpaper.c
<tomeu> nope
<bbrezillon> I'm just moving pointers around, not the actual value in the job desc
<tomeu> will have to put a watchpoint
<tomeu> (gdb) p job_p->job_dependency_index_1
<tomeu> Cannot access memory at address 0xb0ee1e14
* tomeu scratches head
<tomeu> yeah, I was printing the vertex jobs' deps, but only tiler jobs have
<tomeu> so a bug in my logging
<bbrezillon> I see odd and even indexes
<bbrezillon> in the trace you provided
<bbrezillon> so tiler jobs are there
<tomeu> bbrezillon: this is better:
<tomeu> wonder if we can assume that the hw will schedule the vertex jobs before the tiler jobs
<tomeu> other than vertex jobs not having deps, it looks fine to me
<bbrezillon> tomeu: because of the deps between tiler jobs it's already guaranteed
<bbrezillon> tiler0 depends on vertex0
<bbrezillon> and tiler1 depends on vertex1+tiler0
<bbrezillon> hm, just realized it's not guaranteed while writing it :)
<tomeu> but couldn't tiler job 14 execute before vertex job 1?
<bbrezillon> tomeu: good question
<bbrezillon> I hope not
<bbrezillon> actually, I'm not even sure that's a problem
<bbrezillon> vertex shaders are independent from each other (at least in our case), the only thing you must ensure is that tiler jobs are done in the right order
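The dep wiring described here (each tiler job reading its own vertex job plus the previous tiler job, serializing tiler work while leaving vertex jobs free to run in any order) can be sketched as follows; struct and field names are illustrative stand-ins for the real descriptors:

```c
#include <assert.h>
#include <stdint.h>

struct dj {
    uint16_t job_index;
    uint16_t dep1, dep2;   /* stand-ins for job_dependency_index_* */
};

/* For draw N: tilerN depends on vertexN (dep1) and on the previous
 * tiler job if any (dep2). Vertex jobs get no deps of their own. */
static void wire_draw(struct dj *vertex, struct dj *tiler,
                      uint16_t *next_index, uint16_t *last_tiler)
{
    vertex->job_index = (*next_index)++;
    vertex->dep1 = vertex->dep2 = 0;
    tiler->job_index = (*next_index)++;
    tiler->dep1 = vertex->job_index;
    tiler->dep2 = *last_tiler;        /* 0 for the first draw */
    *last_tiler = tiler->job_index;
}
```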
<tomeu> guess you are right, because I'm seeing the same results with
<bbrezillon> tomeu: but you can try to add explicit deps between vertex jobs
<bbrezillon> just to see what happens then
<tomeu> yeah, that's what I just did
<tomeu> for no change
<tomeu> I think I'm going to try generating the reload shaders on first draw, instead of on flush
afaerber has joined #panfrost
<tomeu> wow, that works much better
<tomeu> I only have a cube, but otherwise it looks fine :)
<tomeu> so indeed, seems to be some problem with jobs chaining and deps
<tomeu> and we also seem to have some problem when sampling
<tomeu> so I guess there's something we don't yet understand regarding job submission
<bbrezillon> tomeu: you're adding the reload job before or after the first draw?
<bbrezillon> maybe the index number matters too
<tomeu> bbrezillon: before
<tomeu> HdkR: more demoscene stuff :)
<bbrezillon> tomeu: doesn't load on my end
<HdkR> oooo, so swirly
<bbrezillon> tomeu: looks like firefox doesn't like it, it works under chrome
<tomeu> works here on FF on fedora 29, but maybe I have some extra packages installed
<bbrezillon> nevermind, now it works
<bbrezillon> my internet connection is not working terribly well recently
<tomeu> besides the sampling of the texture when reloading, there's still an issue even when the reload fragment shader just uses a constant color:
<tomeu> there's a vertex that is always missing
<tomeu> so even if the index numbers mattered and had to be sequential, there would still be something we are missing regarding job scheduling
<bbrezillon> tomeu: did you try interleaving vertex and tiler jobs?
<bbrezillon> something like vertex1 <- tiler1 <- vertex2 <- tiler2 ...
<tomeu> bbrezillon: no, but why would that matter only when we generate reload jobs?
<bbrezillon> I don't know
<tomeu> FWICS, the draw that blitter submits causes the subsequent draws to lack a vertex
<bbrezillon> are you sure it's only one vertex?
<tomeu> wonder how I could figure out if it's the first vertex or not
<bbrezillon> I see at least 2 missing
<bbrezillon> hm, maybe not
<bbrezillon> tomeu: could be a problem in the save/restore vertex state/bufs
<bbrezillon> tomeu: you can add traces to panfrost_set_vertex_buffers() to see if things are properly restored
<tomeu> bbrezillon: hmm, but it's the first draw, is there anything to be restored?
* tomeu has no clue about that code
<tomeu> oh, I see now
<bbrezillon> tomeu: vertex buffers are bound before the draw call
<bbrezillon> and since you're inserting a new draw, you have to save/restore the context
<tomeu> yep
<tomeu> maybe I should try generating the reload shaders before the first vertex buffers are bound :)
<tomeu> oh, but that's not done at every frame
chewitt has joined #panfrost
<bbrezillon> tomeu: what?
<tomeu> binding the vertex buffer
<bbrezillon> no, you likely do it once, and then apply a transform on top
<bbrezillon> I didn't check kmscube code though
<tomeu> yeah, was just thinking it could be an easy way to test without the vertex buffer restore getting in the middle
<tomeu> I don't see anything obviously wrong with how the vertex buffer gets restored
<bbrezillon> IIRC, u_blitter complains if you don't save restore things properly
<bbrezillon> so that's not something you can skip
<bbrezillon> unless you want to make ->set_vertex_buffers() a NOOP just before calling the u_blitter_save/restore funcs
<bbrezillon> tomeu: can you push the code you have?
<tomeu> sure
<bbrezillon> tomeu: just reminds me that there was a flush forced by the blitter-based version of the code
<bbrezillon> when assigning the new FB
<bbrezillon> so I don't think adding the reload shader on first draw is doing what we expect
<tomeu> what would be the flush for?
<bbrezillon> have a look at panfrost_set_framebuffer_state()
<bbrezillon> which means you'll flush the queue when restoring the FB state (at the end of the u_blitter_blit() call)
<tomeu> hmm, should that be a problem?
<bbrezillon> which in turn means the reload is lost
<tomeu> hmm
<tomeu> so the vertex buffer isn't bound?
<bbrezillon> yes, that's a problem, because a reload is only here to load things back in the tile buffer
<bbrezillon> if you flush the queue, that means the tile buffers starts in a cleared state on the following draw calls
<tomeu> hmm, just commented that out and I'm seeing what seems to be the same thing
chewitt has quit [Quit: Adios!]
<bbrezillon> tomeu: kmscube is unmodified, right?
<tomeu> bbrezillon: yep
<tomeu> I think there's some problem with the blitter's vertex buffer, because its offset just keeps increasing every frame
<tomeu> or maybe it cycles around after a while
MoeIcenowy has quit [Quit: ZNC 1.6.5+deb1+deb9u1 -]
MoeIcenowy has joined #panfrost
<bbrezillon> tomeu: what do you mean by the vertex buffer offset?
<tomeu> bbrezillon: buffers[0].buffer_offset
<bbrezillon> ok
<tomeu> ah, it cycled after 1048448
<tomeu> so that's fine I guess
<bbrezillon> tomeu: wallpaper_draw() doesn't seem to be called on the branch you pushed
<tomeu> bbrezillon: maybe you need to do this?
<tomeu> - if (ctx->draw_count == 0 && !ctx->blitter->running && 0)
<tomeu> + if (ctx->draw_count == 0 && !ctx->blitter->running)
<bbrezillon> nevermind, there's an && 0
<bbrezillon> yep
<bbrezillon> tomeu: hm, aren't we calling panfrost_queue_draw() recursively with a draw_wallpaper() call placed in queue_draw()?
<bbrezillon> that's why you have this blitter running check
<tomeu> we guard against that
<tomeu> yep
<bbrezillon> got it
<tomeu> so, after moving the call to wallpaper_draw to the start of panfrost_draw_vbo, I get the cube complete
<tomeu> so something that is done before panfrost_queue_draw wasn't being properly restored
<tomeu> but I guess that it's more correct anyway to do it there?
<tomeu> now the only problem I see is with what gets sampled when reloading
<tomeu> bbrezillon: so this:
<bbrezillon> tomeu: not surprised calling draw from within the draw implem is not entirely safe :)
<cwabbott> I wonder if it isn't better from a maintainability point of view to just build a tiler job yourself
<cwabbott> sure, you'd be duplicating some stuff, but you'd avoid all the circular dependency pain
<cwabbott> you could reuse the compiler bits for the most part, and you wouldn't even need to make a vertex job, just a tiler job
<bbrezillon> cwabbott: that's what I ended up doing based on alyssa's initial implem
<bbrezillon> tomeu: reverting your change in simpler_shader.c triggers an assert()
<tomeu> bbrezillon: yeah, we don't have yet texf (sampling with LOD)
<bbrezillon> but commenting out the depth=1 line seems to do the trick
<tomeu> wonder if that's what is causing problems when sampling
<tomeu> hmm
<tomeu> cwabbott: I think from the maintainability POV this is quite good
<cwabbott> just a random guess, but have you checked that the blob isn't doing something weird with the sampler or texture descriptors?
<tomeu> but right now I think we should learn as much as possible, as it's clear some things aren't working as expected
<tomeu> I think bbrezillon was going to try that?
<cwabbott> there could be some cache it has to bypass or something
<bbrezillon> tomeu: yep
<bbrezillon> just got distracted by your live debugging session :-)
<cwabbott> also, is the framebuffer afbc, and if so do you have sampling from afbc wired up correctly?
<tomeu> nice, pandecode works now that we have the job chain properly built
<tomeu> cwabbott: don't think it's using afbc, this is the pandecode corresponding to the last video:
<alyssa> Scrolly guacamole
<tomeu> alyssa: hi!
<tomeu> bbrezillon: any progress with the blob?
<alyssa> tomeu: TBF, I don't understand the linking patching code either and I wrote it....
<bbrezillon> tomeu: sorry, got interrupted by things on a different project
<alyssa> tomeu: 6 tiler jobs = 6 draws = 6 faces of kmscube
<alyssa> tomeu: pandecode is broken in a lot of scenarios... I really need to fix this stuff...
<tomeu> alyssa: I think you can ignore most of my comments from today, except probably the last ones
<alyssa> tomeu: But the blob handles reload stuff in literally this way, adding a special TILER job with a passthrough tex fragment shader. No vertex shader, specially computed varyings used instead.
<tomeu> ok, I think the vertex side of things is fine
<tomeu> but the sampling...
<alyssa> cwabbott: In my defense the linking stuff is some of the oldest code in the driver, I'm well-aware it needs to be rewritten ......
<alyssa> OA
<alyssa> tomeu: Tiler job 14 depends on tiler job 13... tiler job 1 depends on vertex job 1.
<tomeu> alyssa: I think the linking is fine now
<alyssa> Vertex jobs are genuinely independent
<alyssa> bbrezillon: If the blit shader *flushes* then... we're in a loop since the whole bug this is around is "flushing without reloading first" >_<
<alyssa> tomeu: "Just keeps increasing every frame" That's how vbo offsets are often setup... the app keeps changing stuff but it's better to append than modify in place for various reasons
<alyssa> cwabbott: Yeah, the original (buggy) implementation was building the tiler job manually, as the blob does.. Still think that's the way to go
<alyssa> cwabbott: The blob is doing a texelFetch op instead of a texture, and there's a weird interp0 on the varying, but beside that I don't think much is different..? But caches etc are totally likely
MoeIcenowy has quit [Quit: ZNC 1.6.5+deb1+deb9u1 -]
<tomeu> I think we anyway want to end up there because of performance, but I liked the idea of starting with something that is used by other drivers, for sake of correctness
<alyssa> cwabbott: kmscube won't be AFBC at this point, since it's going straight to GBM and I haven't wired up AFBC in the Rockchip display driver yet
MoeIcenowy has joined #panfrost
<tomeu> alyssa: btw, I think this screenshot is more illustrative of what's going on, as it's the second frame:
<cwabbott> tomeu: looks like it could be a wrong stride?
<tomeu> yeah, but I'm at a loss on which stride that could be :)
<tomeu> have played with randomly changing the formats, but nothing positive has come from that
<cwabbott> did you check the texture state stride?
MoeIcenowy has quit [Client Quit]
<tomeu> saw nothing interesting in the cmdstream
MoeIcenowy has joined #panfrost
<tomeu> cwabbott: you mean somewhere else than mali_texture_descriptor ?
<cwabbott> no, I mean the mali_texture_descriptor
<cwabbott> although I guess that's gonna come from the blitter ultimately
<tomeu> don't see any stride in there
<cwabbott> my bet is on a unit mismatch somewhere, that would explain the roughly 4:1 slope of the lines
<tomeu> have this in the cmdstream:
<tomeu> .format = MALI_RGBA8_UNORM,
<tomeu> guess that plus width is used by the hw to compute the stride:
<cwabbott> that's fine
<tomeu> .width = MALI_POSITIVE(1366),
MoeIcenowy has quit [Client Quit]
<alyssa> tomeu: 1366..?
<alyssa> On a RK3288?
<alyssa> Er Veyron
MoeIcenowy has joined #panfrost
<alyssa> 1366 is not a multiple of 16 (the tile width)
<alyssa> When we render we have to round up to 16
<alyssa> But then in the texture we're not doing it right there
<tomeu> yep, a veyron jaq
<alyssa> Try on a Kevin, 10c it'll go away
* alyssa class
<cwabbott> is width in units of bytes or pixels?
<tomeu> cwabbott: should be pixels
* tomeu tries on a random hdmi monitor
MoeIcenowy has quit [Client Quit]
MoeIcenowy has joined #panfrost
<tomeu> haha
<tomeu> with this, I get it almost right:
<tomeu> .width = MALI_POSITIVE(ALIGN(texture->width0, 16)),
<tomeu> seems to be flipped vertically
<tomeu> and we are missing a few _columns_ at the end
<cwabbott> the flipping thing is probably the FBO flipping madness
<cwabbott> the system framebuffer in GLX is always flipped in the y direction, for what I've heard are "hilarious historical reasons"
<cwabbott> well, with DRI i guess
<cwabbott> normally the gallium state tracker handles that for you though... hmm
<urjaman> that is quite artistic
<tomeu> it has been only art since I started working on this
<alyssa> ^ what cwabbott said
<alyssa> tomeu: Pixels
<tomeu> wonder what happened to the wallpaper
<tomeu> pity that we missed the 19.1 deadline :/
<tomeu> would have been cool to have weston working out of the box in debian without having to wait for 19.2
fysa has joined #panfrost
<alyssa> tomeu: Better to wait for 19.2, for performance improvements and maybe glamor support and such..?
<alyssa> I was more worried about getting in the kernel on-time
BenG83 has joined #panfrost
BenG83 has quit [Remote host closed the connection]
pH5 has quit [Quit: bye]
BenG83 has joined #panfrost
<alyssa> tomeu: (Right now, if using `ondemand` governor, Weston uses the min clock and then is noticably very sluggish.)
<alyssa> ..Also, gets warm very easily. This is annoying.
<alyssa> Just for Weston painting at minimal clock level, several watts drawn... I'm assuming this is very much not expected
pH5 has joined #panfrost
stikonas has joined #panfrost
BenG83 has quit [Remote host closed the connection]
<tomeu> alyssa: do you know what's the problem here with the width field in mali_texture_descriptor, and how things should be?
<tomeu> other than "switch to kevin"? :p
BenG83 has joined #panfrost
BenG83 has quit [Client Quit]
raster has quit [Remote host closed the connection]
<alyssa> tomeu: The texture stride = width * bytes per pixel
<alyssa> The framebuffer stride = (width aligned to 16) * bytes per pixel
<alyssa> If width is not aligned to 16, those differ
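alyssa's two formulas, in numbers, assuming 4 bytes per pixel for RGBA8; `ALIGN16` and the function names are illustrative helpers, not driver API:

```c
#include <assert.h>

/* Round a pixel count up to the 16-pixel tile width. */
#define ALIGN16(x) (((x) + 15) & ~15)

/* Stride the texture sampler assumes for a linear surface... */
static int tex_stride(int width_px, int bpp) { return width_px * bpp; }

/* ...vs. the stride the renderer actually produced, with the width
 * padded up to the tile size. */
static int fb_stride(int width_px, int bpp)  { return ALIGN16(width_px) * bpp; }
```

For the 1366-wide veyron panel the two disagree (5464 vs 5504 bytes), producing the sheared sampling seen above, while any 16-aligned width makes them match.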
<anarsoul> alyssa: utgard can sample from textures where stride != (width * bpp), I doubt that midgard can't
<alyssa> anarsoul: It probably can, I just don't know the magic bits to do that
<anarsoul> how long is texture descriptor?
<alyssa> About as long as it is wide.
<alyssa> anarsoul: mali_texture_descriptor in panfrost-job.h
<alyssa> Let's see what the blob does
afaerber has quit [Quit: Leaving]
<alyssa> In the blend descriptor for the preserved write, |= 0x800 flags
<alyssa> clip set tighter than expected
<alyssa> Texture descriptor usage2=0x32 instead of 0x12
<alyssa> Not obvious how it's recuperating the stride
adjtm has joined #panfrost
<anarsoul> alyssa: there's stride somewhere in descriptor
<alyssa> Apparently
<anarsoul> for utgard it's only present if stride flag is set
<alyssa> Hm
<anarsoul> see lima_texture_desc_set_res() in lima_texture.c
adjtm_ has quit [Ping timeout: 258 seconds]
<alyssa> Wait
<anarsoul> grep for "for padded linear texture" in this fn
<anarsoul> (I know code in lima_texture.c is a mess, need to refactor it...)
<alyssa> Sec
<alyssa> No, wait, I was hopeful for a second
<alyssa> Ooooo
<alyssa> Got it.
<alyssa> It's the field _after_ swizzle_bitmaps
<alyssa> Also it uses a negative stride + adjusted bitmap which solves the Y flip issue, same way we render upside down
<alyssa> tomeu: ^^
<anarsoul> so texture descriptor is larger than you thought?
<alyssa> anarsoul: Yeah
<alyssa> Thank you :)
<anarsoul> np, I literally did nothing
<alyssa> You offered encouragement and believed in me! That totally counts :)
<anarsoul> :)
<alyssa> anarsoul: Yup, the stride field is only there optionally if it's actually needed. That explains that, then
<alyssa> (And explains why I never found it -- since it wasn't there!)
<alyssa> Gotta run, but I'll write a patch for this evening if I get a chance :)
<anarsoul> alyssa: on utgard place for it is always reserved (since it's at the very beginning of descriptor), but it's not set if flag is not set
<alyssa> anarsoul: Here it's at the very end... Whether it's actually there or not isn't obvious, since I don't pay *that* much attention to alignment vs reserved memory etc
<anarsoul> and IIRC Qiang spent quite some time to figure that out. Blob doesn't like to reveal its secrets!
<alyssa> anarsoul: What I don't know is how the stride field interacts with mipmaps/cubemaps
<anarsoul> it doesn't?
<alyssa> anarsoul: On mdg
<anarsoul> my guess it's the same here
<alyssa> The texture descriptor is a fixed-size descriptor, followed by a variable number of pointers (levels*faces)
<alyssa> And for the non-cube 1-lvl sample, after the one pointer was the stride field
<anarsoul> on utgard we have fixed number of mipmap levels - 6
<alyssa> Wack
<anarsoul> so I guess no mipmapped cube textures? (is it even possible?)
<alyssa> Yes? :p
<alyssa> Pretty sure there are deqp tests for it
<alyssa> So if I have a mipmapped cubemap, does that have one stride field for everything? One stride per mipmap but shared per cube? One stride for every pointer?
<anarsoul> poor utgard is out of luck here
<alyssa> Are they all in a chunk at the end? Or are they interleaved?
<alyssa> Is it a 64-bit slot for 64-bit pointers? Or is 32-bit and just not aligned nicely?
<anarsoul> good question
<alyssa> anarsoul: Granted, if you're using cubemaps/mipmaps, hopefully you're not using a custom stride with a linear texture ....
<alyssa> I'm trying to think of ways to get the blob to do that, but I'm coming up empty
<alyssa> From GL, the only way I got linear textures at all was with wallpapering
<alyssa> Ooo, this'll let me fix a bug in Weston
<alyssa> Happy!
<HdkR> woo
<alyssa> ...but now Weston is crashing. It was fine this morning .-.
<alyssa> Oh, I was messing with my .config
<alyssa> It usually *is* my fault, huh? :P
<Lyude> maybe it'll be fine tomorrow morning :)
<Lyude> oh hey the job thing is up
<HdkR> Everyone*
<Lyude> yeah if you want a recommendation and I'm familiar with your work, feel free to let me know
<Lyude> (so, that's pretty much anyone who is active in here)
<HdkR> If only that was available six months ago :P
<alyssa> Ouch.
<alyssa> HOWTO make code suck less?
<alyssa> Ha-ha! Bug fixed! Take that, Weston!
<anarsoul> what bug?
<alyssa> anarsoul: Stride being messed up when rendering to NPOT windows
<alyssa> Breaking gears
<alyssa> Now gears works again :)
<anarsoul> I see
<alyssa> anarsoul: Oh, don't be smug :V
<anarsoul> :P
<alyssa> Er, not NPOTs
<alyssa> Non-16-aligned
<anarsoul> non-tile-aligned
<alyssa> Yeah
<alyssa> Anyways, series on the list. Alyssa out.
* alyssa unplugs Matrix-esque cable from skull