#panfrost on 2019-05-14 — irc logs at freenode.irclog.whitequark.org

2019-02-15 17:52 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:40 stikonas_ has quit [Remote host closed the connection]

01:18 chewitt has joined #panfrost

01:18 belgin has joined #panfrost

02:15 belgin has quit [Quit: Leaving]

02:15 vstehle has quit [Ping timeout: 246 seconds]

02:25 jeez_ has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

02:59 <alyssa> Intermediate goal: learn more about ld/st

03:00 <HdkR> alyssa: They touch memory

03:08 chewitt has quit [Quit: Zzz..]

03:16 chewitt has joined #panfrost

03:24 chewitt has quit [Quit: Adios!]

05:00 vstehle has joined #panfrost

05:14 <alyssa> Since posting that statement, I have now identified 27 (!) ld/st opcodes

05:15 <alyssa> That beats my usual records a few times over

05:15 <alyssa> ...Granted ld/st opcodes are extremely repetitive and formulaic....

05:15 <alyssa> OpenCL stuff

05:16 <alyssa> Generic memory load/stores (known to be used for OpenCL kernel args, register spilling, probably other stuff), image writes, atomics

05:16 <alyssa> Integer gl stuff

05:16 <alyssa> Mostly going for identification, not nearly enough understood to actually write compiler pieces for the above \shrug/

05:19 <HdkR> RE is the first step to implementation :P

05:23 <HdkR> Even if it would be nicer to say that reading the ISA manual is the first step to implementation....

05:25 <alyssa> /* We write the ISA manual */

05:33 <HdkR> At least Intel and AMD are nice enough to release documentation on the ISA :P

06:53 <tomeu> alyssa, bbrezillon: to isolate the problems, I changed the fragment shader to return a solid color: https://people.collabora.com/~tomeu/panfrost_wallpaper_solid.mp4

06:55 <HdkR> Whoa, you can really see the tiling jobs now

06:55 <tomeu> I'm not sure if that's z-fighting though, or just that we scanout before the GPU is done

06:56 <HdkR> I'm going to assume the latter until proven otherwise

06:59 <bbrezillon> tomeu: you mean the one reloading FB content?

06:59 <tomeu> bbrezillon: yep

07:00 <tomeu> + struct ureg_src imm2 = ureg_imm4f( ureg, 0, 1, 0, 1 );

07:00 <tomeu> + ureg_MOV(ureg, out, imm2);

07:00 <tomeu> just that in util_make_fragment_tex_shader_writemask

07:00 <bbrezillon> ok

07:01 <tomeu> afaics, the cube is drawn after we have reloaded the FB contents just fine, but somehow the cube isn't fully drawn when we present it

07:01 <bbrezillon> yes, I noticed that sometimes the new output was recovered by old content

07:02 <bbrezillon> I thought is was something related to Z-testing

07:02 <bbrezillon> *it

07:02 <tomeu> well, I don't think it's new content covered by old content, as sometimes a whole cube side is missing

07:02 <tomeu> so it looks to me as incomplete rendering

07:03 <tomeu> bbrezillon: do you think it could be due to the linking patching done at the end of panfrost_draw_wallpaper ?

07:03 <tomeu> cannot say I understand that code

07:03 <bbrezillon> tomeu: could be

07:04 <bbrezillon> but if you leave the reload FB tiler job at the end, it simply doesn't work

07:04 <bbrezillon> I tried that

07:05 <bbrezillon> what happens is that you get the output completely replaced by the old FB content, which ends up being black since you always start with a cleared-FB (all bytes to 0)

07:06 <tomeu> yep, I see all green here

07:06 <bbrezillon> looks like specifying the dependency is not enough

07:06 <bbrezillon> jobs have to be linked in the order they're supposed to be executed

07:06 <tomeu> bbrezillon: and what should that be?

07:07 <tomeu> I don't even know how many jobs we have there

07:07 <bbrezillon> the very first tiler job should be the "reload FB" fragment shader

07:08 <bbrezillon> then following jobs come in the order they were submitted by the gl app (glDraw() calls)

07:09 <bbrezillon> and the sequence should be flushed when glFlush() is called

07:10 <tomeu> what about the "reload FB" vertex shader?

07:11 pH5 has quit [Quit: bye]

07:11 <bbrezillon> vertex shaders are all executed before fragment shaders

07:12 <tomeu> and we don't need to set deps?

07:13 <bbrezillon> we do need to have a dep, and we do set it

07:13 <bbrezillon> it's done through dependency1 IIRC

07:13 <tomeu> for the vertex job?

07:13 <bbrezillon> a dep between the fragment/tiler and vertex job

07:14 <bbrezillon> tiler job depends on vertex job

07:14 <tomeu> ah, guess it's set somewhere else

07:14 <bbrezillon> in pan_context

07:14 <tomeu> panfrost_draw_wallpaper:81 tiler jobs 6 draw cnt 6

07:14 <tomeu> panfrost_draw_wallpaper:83 tiler jobs 7 draw cnt 7

07:15 <tomeu> why do we have 6 tiler jobs before we add the "reload FB" one?

07:15 <cwabbott> also, you need a dep between each tiler job

07:16 <cwabbott> each tiler job adds to the tile datastructure in the tiler heap

07:16 <bbrezillon> tomeu: depends what the app does

07:16 <bbrezillon> if you have 6 glDraw calls without a glFlush in between, that should generate 6 tiler jobs

07:16 <cwabbott> you can't have multiple things appending it at the same time, or it'll get corrupted or things will get drawn in the wrong order

07:16 <tomeu> was thinking that the reload jobs would be first of all, before anything from the app

07:17 <bbrezillon> tomeu: except you don't know if it will be needed ahead of time

07:17 <cwabbott> yes, the reload should be the first job

07:18 <bbrezillon> cwabbott: it is placed first, at link time

07:18 <cwabbott> if anything happens before it you need to flush by submitting a fragment job

07:18 <cwabbott> oh, yeah, I see what you mean

07:19 <bbrezillon> tomeu: we really need to check how the DDK handle that

07:22 <cwabbott> bbrezillon: actually, you should know before the app starts drawing whether you need to insert a reload job or not

07:22 <bbrezillon> maybe they have a simpler way for reloading the FB content (like instructing the GPU you don't want to clear/discard before rendering)

07:23 <bbrezillon> cwabbott: do you know ahead of time that you'll have a tiler job?

07:23 <cwabbott> you can construct a reload job, then throw it away if the app clears

07:23 <bbrezillon> that would work too, yes

07:24 <cwabbott> or just delay it until right before the app does the first draw

07:24 <bbrezillon> cwabbott: so you think the order jobs are created matters?

07:24 <cwabbott> not so much, as long as they're linked together correctly at the end

07:25 <bbrezillon> yes, that was also my understanding (actually the set_value job is created last but is placed first at link time)

07:26 <tomeu> bbrezillon: pandecode seems to be broken in this scenario :/

07:27 <tomeu> bbrezillon: oh, we generate the reload shaders before submitting to the kernel, so after the app has drawn? that's why we have 6 tiler jobs already?

07:28 <bbrezillon> tomeu: yes

07:28 <tomeu> cool

07:28 <bbrezillon> we generate it at flush time

07:28 <bbrezillon> only if there's at least one tiler job

07:29 <bbrezillon> (and in the original code, also only if glClear(COLOR) was not called

07:29 <bbrezillon> )

07:30 <bbrezillon> tomeu: BTW, thanks for helping me with that, as I was running out of ideas ;-)

07:30 <tomeu> ok, guess I need to dump the job chain at submission time and check everything is fine

07:30 <tomeu> well, so far I haven't done much of value :)

07:31 <bbrezillon> tomeu: discussing the thing is already useful, it helps making sure I understood the problem correctly

07:38 <bbrezillon> tomeu: I remember adding traces to make sure the jobs were linked in the correct order

07:39 <tomeu> I see some traces here about that, but I want to dump the chain at submit time, to make sure I know what the hw ends up seeing

07:39 <bbrezillon> sure

07:40 <tomeu> what's the difference between job_dependency_index_* and next_64?

07:41 <bbrezillon> if only I knew it :)

07:41 <cwabbott> tomeu: so, what you submit to the hw is a "job chain", which is a singly-linked list of jobs defined by the next_64 pointer

07:42 <cwabbott> the job manager scans the linked list, keeps a certain number of jobs in its internal memory, and submits them when they're ready

07:43 <cwabbott> the job_dependency_index_* plus job_index forms a scoreboard (https://en.wikipedia.org/wiki/Scoreboarding) used to determine when the job is ready

07:44 <cwabbott> job_index is like the register that's written, and job_dependency_index_* are the source registers

07:45 <bbrezillon> cwabbott: ok

07:46 <bbrezillon> but then I don't get why specifying a dependency of the first (non reload) tiler job on the reload job is not enough

07:47 <cwabbott> probably because the scoreboard starts off in the "dependency finished" state

07:47 <bbrezillon> hm

07:48 <bbrezillon> makes sense

07:48 <cwabbott> it sees the first (non reload) job depending on a job it hasn't seen yet, and just assumes it has finished
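
Note: the scoreboard behaviour cwabbott describes can be modelled roughly as below. This is only a sketch of the idea, not the hardware's actual logic: the struct layouts, the helper names, the fixed-size table, and the convention that a dependency index of 0 means "no dependency" are all assumptions made for illustration. Only the field names job_index, job_dependency_index_* and next_64 come from the discussion.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_JOB_INDEX 64 /* arbitrary bound for this sketch */

/* Hypothetical, trimmed-down job header: only the fields discussed above. */
struct job {
    uint16_t job_index;              /* the "register" this job writes */
    uint16_t job_dependency_index_1; /* "source registers"; 0 assumed to mean "none" */
    uint16_t job_dependency_index_2;
    struct job *next;                /* stands in for the next_64 pointer */
};

/* One "pending" flag per job index. Zero-initialised, so every index
 * starts out in the "dependency finished" state. */
struct scoreboard {
    bool pending[MAX_JOB_INDEX];
};

/* The job manager has scanned this job, but it has not completed yet. */
void scoreboard_scan(struct scoreboard *sb, const struct job *j)
{
    sb->pending[j->job_index] = true;
}

/* The job has completed; jobs waiting on its index may now run. */
void scoreboard_retire(struct scoreboard *sb, const struct job *j)
{
    sb->pending[j->job_index] = false;
}

bool dependency_done(const struct scoreboard *sb, uint16_t dep)
{
    return dep == 0 || !sb->pending[dep];
}

/* A job is ready once neither of its dependencies is still pending. */
bool job_ready(const struct scoreboard *sb, const struct job *j)
{
    return dependency_done(sb, j->job_dependency_index_1) &&
           dependency_done(sb, j->job_dependency_index_2);
}
```

With a freshly zeroed scoreboard, a draw job that depends on a reload job the manager has not scanned yet already reports job_ready(), which matches the "just assumes it has finished" guess. Once the reload job has been scanned first, the draw job stays blocked until the reload job retires.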

07:54 <tomeu> ok, so it's not enough with setting the job_dependency_index_* fields, but we also need to make sure that the job chain has the jobs in the right order

07:54 <tomeu> so no job appears in the chain before one of its dependencies?

07:59 <cwabbott> that's right, but it sounds like bbrezillon already fixed that

08:01 <tomeu> with the memmove, yeah

08:02 pH5 has joined #panfrost

08:04 <cwabbott> you shouldn't need to move the job with memmove

08:05 stikonas has joined #panfrost

08:05 <cwabbott> the GPU only cares about the linked list, not which order things are in memory
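
Note: a minimal sketch of cwabbott's point that chain order is a property of the links, not of where the job descriptors sit in memory. The struct and both helpers are hypothetical and are not taken from pan_context.c; `next` stands in for the next_64 GPU pointer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical job header; `next` stands in for the next_64 GPU pointer. */
struct job {
    uint16_t job_index;
    struct job *next;
};

/* Append a job at the tail of the chain; returns the (possibly new) head. */
struct job *chain_append(struct job *head, struct job *j)
{
    j->next = NULL;
    if (head == NULL)
        return j;

    struct job *tail = head;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = j;
    return head;
}

/* Put a job in front of the chain by relinking only; nothing is moved. */
struct job *chain_prepend(struct job *head, struct job *j)
{
    j->next = head;
    return j;
}
```

A reload job created last can sit at the end of the descriptor array and still come first in the chain after one chain_prepend() call, so no memmove of the descriptors is required for ordering.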

08:08 <bbrezillon> cwabbott: was just easier this way

08:09 <bbrezillon> since it didn't involve reworking the function that's linking the jobs

08:09 <bbrezillon> but I agree, we could avoid the memmove

08:10 <cwabbott> why is there a function that's linking the jobs at all? I'd expect you to just keep appending to the linked list as you go along

08:11 <cwabbott> that seems what the HW interface is set up for

08:11 <bbrezillon> cwabbott: that's a question for alyssa I guess

08:13 <bbrezillon> cwabbott: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/gallium/drivers/panfrost/pan_context.c#L1289

08:13 <cwabbott> yeah, that seems kinda silly

08:13 <cwabbott> and won't work with compute jobs

08:14 <tomeu> cwabbott: what will be the problem with compute jobs?

08:15 <cwabbott> compute jobs can read/write images that are used by the other jobs

08:16 <cwabbott> or interact in other ways that require you to insert a compute job between two other random jobs

08:16 stikonas has quit [Remote host closed the connection]

08:17 <cwabbott> so you can't assume that the job chain is always just "vertex, tiler, vertex, tiler, ..., fragment"

08:17 <bbrezillon> I'd like to setup a config using the DDK on rockchip, is this the blob I should use https://github.com/rockchip-linux/libmali ?

08:37 <tomeu> cwabbott: actually, this is currently vertex, vertex, ..., tiler, tiler, ...

08:40 <tomeu> with the fragment job being submitted separately, afterwards

08:47 <tomeu> most jobs don't have the dep fields filled: http://paste.debian.net/1083793/

08:47 <tomeu> guess that's fine if no jobs have those fields used

08:47 <tomeu> but as soon as we use the deps fields for some jobs, the others should as well, I think

08:48 <tomeu> otherwise, a job with no deps can execute before another job earlier in the chain

08:59 raster has joined #panfrost

09:05 <bbrezillon> tomeu: my understanding is that all vertex jobs have to be executed before tiler jobs anyway, because the GPU is tile-based

09:06 <tomeu> bbrezillon: but, based on that log, couldn't the tiler job 2 execute before the vertex job 13?

09:07 <tomeu> while the set_value job 15 executes

09:11 <bbrezillon> tomeu: hm, there's something weird

09:11 <bbrezillon> each tiler job should depend on the previous tiler job in the list, if any

09:11 <tomeu> yeah, when the jobs are created, the deps fields are set

09:12 <tomeu> but I see them zeroed at link time

09:12 <tomeu> maybe I'm doing something wrong when logging

09:15 <bbrezillon> tomeu: can you check the offset value here https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/gallium/drivers/panfrost/pan_context.c#L652 ?

09:15 <bbrezillon> this memcpy() is suspicious https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/gallium/drivers/panfrost/pan_context.c#L659

09:15 <tomeu> was looking at that :)

09:15 <tomeu> but even then, it should at most overwrite the next_job pointer

09:15 <tomeu> not the deps

09:16 <tomeu> I will print the values after the copy

09:19 <tomeu> after the copy the values look fine

09:19 * tomeu valgrinds

09:21 <tomeu> nothing

09:21 <tomeu> wonder about the memmoves in wallpaper.c

09:22 <tomeu> nope

09:22 <bbrezillon> I'm just moving pointers around, not the actual value in the job desc

09:22 <tomeu> will have to put a watchpoint

09:26 <tomeu> (gdb) p job_p->job_dependency_index_1

09:26 <tomeu> Cannot access memory at address 0xb0ee1e14

09:26 * tomeu scratches head

09:32 <tomeu> yeah, I was printing the vertex jobs' deps, but only tiler jobs have

09:32 <tomeu> so a bug in my logging

09:34 <bbrezillon> I see odd and even indexes

09:34 <bbrezillon> in the trace you provide

09:34 <bbrezillon> *provided

09:34 <bbrezillon> so tiler jobs are there

09:35 <tomeu> bbrezillon: this is better: https://pastebin.com/e1AV2i7c

09:36 <tomeu> wonder if we can assume that the hw will schedule the vertex jobs before the tiler jobs

09:37 <tomeu> other than vertex jobs not having deps, it looks fine to me

09:38 <bbrezillon> tomeu: because of the deps between tiler jobs it's already guaranteed

09:38 <bbrezillon> tiler0 depends on vertex0

09:38 <bbrezillon> and tiler1 depends on vertex1+tiler0

09:39 <bbrezillon> hm, just realized it's not guaranteed while writing it :)

09:39 <tomeu> but couldn't tiler job 14 execute before vertex job 1?

09:43 <bbrezillon> tomeu: good question

09:43 <bbrezillon> I hope not

09:43 <bbrezillon> actually, I'm not even sure that's a problem

09:45 <bbrezillon> vertex shader are independent from each other (at least in our case), the only thing you must ensure is that tiler jobs are done in the right order
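
Note: the dependency pattern discussed above (each tiler job depends on its own vertex job and on the previous tiler job, while vertex jobs carry no dependencies) could be written out as follows. This is a sketch under assumptions: the struct, the function name, and the index numbering (vertex = 2i+1, tiler = 2i+2) are made up for illustration, and which of the two dependency slots holds which dependency is a guess based on the "dependency1" remark earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-job record mirroring the fields named in the log. */
struct job_deps {
    uint16_t job_index;
    uint16_t job_dependency_index_1;
    uint16_t job_dependency_index_2;
};

/* Fill in indices and dependencies for `count` draws.
 * The numbering scheme is an assumption of this sketch. */
void assign_draw_deps(struct job_deps *vertex, struct job_deps *tiler,
                      size_t count)
{
    for (size_t i = 0; i < count; i++) {
        /* Vertex jobs are independent of each other: no dependencies. */
        vertex[i].job_index = (uint16_t)(2 * i + 1);
        vertex[i].job_dependency_index_1 = 0;
        vertex[i].job_dependency_index_2 = 0;

        /* Each tiler job waits for its own vertex job... */
        tiler[i].job_index = (uint16_t)(2 * i + 2);
        tiler[i].job_dependency_index_1 = vertex[i].job_index;
        /* ...and for the previous tiler job, if there is one. */
        tiler[i].job_dependency_index_2 =
            (i > 0) ? tiler[i - 1].job_index : 0;
    }
}
```

For three draws this yields tiler jobs 2, 4 and 6 depending on (1, none), (3, 2) and (5, 4). Tiler jobs are serialised among themselves, but nothing orders one draw's vertex job against another draw's tiler job, which is the gap bbrezillon notices above.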

09:47 <tomeu> guess you are right, because I'm seeing the same results with https://pastebin.com/jYLmfYs7

09:47 <bbrezillon> tomeu: but you can try to add explicit deps between vertex jobs

09:47 <bbrezillon> just to see what happens then

09:47 <tomeu> yeah, that's what I just did

09:48 <tomeu> for no change

10:02 <tomeu> I think I'm going to try generating the reload shaders on first draw, instead of on flush

10:03 afaerber has joined #panfrost

10:11 <tomeu> wow, that works much better

10:12 <tomeu> I only have a cube, but otherwise it looks fine :)

10:12 <tomeu> so indeed, seems to be some problem with jobs chaining and deps

10:14 <tomeu> and we also seem to have some problem when sampling

10:14 <tomeu> so I guess there's something we don't yet understand regarding job submission

10:44 <bbrezillon> tomeu: you're adding the reload job before or after the first draw?

10:44 <bbrezillon> maybe the index number matters too

10:48 <tomeu> bbrezillon: before

10:51 <tomeu> bbrezillon: http://people.collabora.co.uk/~tomeu/panfrost_wallpaper_progress.webm

10:57 <tomeu> HdkR: more demoscene stuff :)

10:57 <bbrezillon> tomeu: doesn't load on my end

10:58 <HdkR> oooo, so swirly

10:58 <tomeu> bbrezillon: a similar one: https://people.collabora.com/~tomeu/panfrost_wallpaper_progress.mp4

10:59 <bbrezillon> tomeu: looks like firefox doesn't like it, it works under chrome

10:59 <tomeu> works here on FF on fedora 29, but maybe I have some extra packages installed

11:00 <bbrezillon> nevermind, now it works

11:00 <bbrezillon> my internet connection is not working terribly well recently

11:04 <tomeu> besides the sampling of the texture when reloading, there's still an issue even when the reload fragment shader just uses a constant color: http://people.collabora.co.uk/~tomeu/panfrost_wallpaper_missing_vertex.webm

11:05 <tomeu> there's a vertex that is always missing

11:12 <tomeu> so even if the index numbers mattered and had to be sequential, there would still be something we are missing regarding job scheduling

12:05 <bbrezillon> tomeu: did you try interleaving vertex and tiler jobs?

12:06 <bbrezillon> something like vertex1 <- tiler1 <- vertex2 <- tiler2 ...

12:06 <tomeu> bbrezillon: no, but why would that matter only when we generate reload jobs?

12:06 <bbrezillon> I don't know

12:07 <tomeu> FWICS, the draw that blitter submits causes the subsequent draws to lack a vertex

12:07 <bbrezillon> are you sure it's only one vertex?

12:07 <tomeu> wonder how I could figure out if it's the first vertex or not

12:07 <bbrezillon> I see at least 2 missing

12:09 <bbrezillon> hm, maybe not

12:10 <bbrezillon> tomeu: could be a problem in the save/restore vertex state/bufs

12:12 <bbrezillon> tomeu: you can add traces to panfrost_set_vertex_buffers() to see if things are properly restored

12:13 <tomeu> bbrezillon: hmm, but it's the first draw, is there anything to be restored?

12:13 * tomeu has no clue about that code

12:21 <tomeu> oh, I see now

12:25 <bbrezillon> tomeu: vertex buffers are bound before the draw call

12:25 <bbrezillon> and since you're inserting a new draw, you have to save restore the context

12:25 <tomeu> yep

12:26 <tomeu> maybe I should try generating the reload shaders before the first vertex buffers are bound :)

12:27 <tomeu> oh, but that's not done at every frame

12:29 chewitt has joined #panfrost

12:31 <bbrezillon> tomeu: what?

12:31 <tomeu> binding the vertex buffer

12:31 <bbrezillon> no, you likely do it once, and then apply a transform on top

12:32 <bbrezillon> I didn't check kmscube code though

12:34 <tomeu> yeah, was just thinking it could be an easy way to test without the vertex buffer restore getting in the middle

12:34 <tomeu> I don't see anything obviously wrong with how the vertex buffer gets restored

12:35 <bbrezillon> IIRC, u_blitter complains if you don't save restore things properly

12:35 <bbrezillon> so that's not something you can skip

12:36 <bbrezillon> unless you want to make ->set_vertex_buffers() a NOOP just before calling the u_blitter_save/restore funcs

12:36 <bbrezillon> tomeu: can you push the code you have?

12:37 <tomeu> sure

12:39 <tomeu> bbrezillon: https://gitlab.freedesktop.org/tomeu/mesa/commits/panfrost-partial-updates

12:40 <bbrezillon> tomeu: just reminds me that there was a flush forced by the blitter-based version of the code

12:40 <bbrezillon> when assigning the new FB

12:41 <bbrezillon> so I don't think adding the reload shader on first draw is doing what we expect

12:41 <tomeu> what would be the flush for?

12:42 <bbrezillon> have a look at panfrost_set_framebuffer_state()

12:42 <bbrezillon> I remember I was hitting that case https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/gallium/drivers/panfrost/pan_context.c#L2066

12:43 <bbrezillon> which means you'll flush the queue when restoring the FB state (at the end of the u_blitter_blit() call)

12:43 <tomeu> hmm, should that be a problem?

12:43 <bbrezillon> which in turn means the reload is lost

12:44 <tomeu> hmm

12:44 <tomeu> so the vertex buffer isn't bound?

12:44 <bbrezillon> yes, that's a problem, because a reload is only here to load things back in the tile buffer

12:45 <bbrezillon> if you flush the queue, that means the tile buffers starts in a cleared state on the following draw calls

12:45 <tomeu> hmm, just commented that out and I'm seeing what seems to be the same thing

12:47 chewitt has quit [Quit: Adios!]

12:49 <bbrezillon> tomeu: kmscube is unmodified, right?

12:49 <tomeu> bbrezillon: yep

12:53 <tomeu> I think there's some problem with the blitter's vertex buffer, because its offset just keeps increasing every frame

12:58 <tomeu> or maybe it cycles around after a while

12:58 MoeIcenowy has quit [Quit: ZNC 1.6.5+deb1+deb9u1 - http://znc.in]

12:59 MoeIcenowy has joined #panfrost

12:59 <bbrezillon> tomeu: what do you mean by the vertex buffer offset?

13:00 <tomeu> bbrezillon: buffers[0].buffer_offset

13:00 <bbrezillon> ok

13:00 <tomeu> ah, it cycled after 1048448

13:00 <tomeu> so that's fine I guess

13:04 <bbrezillon> tomeu: wallpaper_draw() doesn't seem to be called on the branch you pushed

13:05 <tomeu> bbrezillon: maybe you need to do this?

13:05 <tomeu> - if (ctx->draw_count == 0 && !ctx->blitter->running && 0)

13:05 <tomeu> + if (ctx->draw_count == 0 && !ctx->blitter->running)

13:05 <bbrezillon> nevermind, there's an && 0

13:05 <bbrezillon> yep

13:07 <bbrezillon> tomeu: hm, aren't we calling panfrost_queue_draw() recursively with a draw_wallpaper() call placed in queue_draw()?

13:08 <bbrezillon> that's why you have this blitter running check

13:08 <tomeu> we guard against that

13:08 <tomeu> yep

13:08 <bbrezillon> got it
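
Note: the recursion guard being discussed can be illustrated with a toy model. Everything here is hypothetical: the struct, the functions and the order[] log are stand-ins, and blitter_running is a plain flag playing the role of ctx->blitter->running. The real driver goes through u_blitter rather than calling the draw entry point directly.

```c
#include <stdbool.h>

#define MAX_DRAWS 16 /* arbitrary bound for this sketch */

enum draw_kind { DRAW_APP, DRAW_RELOAD };

/* Hypothetical stand-in for the driver context. */
struct ctx {
    unsigned draw_count;
    bool blitter_running;            /* stands in for ctx->blitter->running */
    enum draw_kind order[MAX_DRAWS]; /* records the order draws were queued */
};

void draw_vbo(struct ctx *ctx);

/* Queue the reload draw. The blit re-enters the draw entry point,
 * so the flag is raised for the duration of the call. */
void draw_wallpaper(struct ctx *ctx)
{
    ctx->blitter_running = true;
    draw_vbo(ctx);
    ctx->blitter_running = false;
}

void draw_vbo(struct ctx *ctx)
{
    /* Only the first application draw triggers a reload, and never
     * while the blitter's own draw is in flight. */
    if (ctx->draw_count == 0 && !ctx->blitter_running)
        draw_wallpaper(ctx);

    if (ctx->draw_count < MAX_DRAWS)
        ctx->order[ctx->draw_count] =
            ctx->blitter_running ? DRAW_RELOAD : DRAW_APP;
    ctx->draw_count++;
}
```

Starting from a zeroed context, the first draw_vbo() call queues the reload draw followed by the application draw, and later calls queue only application draws. Without the blitter_running check the first call would recurse without end.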

13:08 <tomeu> so, after moving the call to wallpaper_draw to the start of panfrost_draw_vbo, I get the cube complete

13:09 <tomeu> so something that is done before panfrost_queue_draw wasn't being properly restored

13:09 <tomeu> but I guess that it's more correct anyway to do it there?

13:09 <tomeu> now the only problem I see is with what gets sampled when reloading

13:10 <tomeu> bbrezillon: so this: https://pastebin.com/wqA9k83F

13:15 <bbrezillon> tomeu: not surprised calling draw from within the draw implem is not entirely safe :)

13:17 <cwabbott> I wonder if it isn't better from a maintainability point of view to just build a tiler job yourself

13:18 <cwabbott> sure, you'd be duplicating some stuff, but you'd avoid all the circular dependency pain

13:18 <cwabbott> you could reuse the compiler bits for the most part, and you wouldn't even need to make a vertex job, just a tiler job

13:19 <bbrezillon> cwabbott: that's what I ended up doing based on alyssa's initial implem

13:20 <bbrezillon> tomeu: reverting your change in simpler_shader.c triggers an assert()

13:21 <tomeu> bbrezillon: yeah, we don't have yet texf (sampling with LOD)

13:21 <bbrezillon> but commenting out the depth=1 line seems to do the trick

13:21 <tomeu> wonder if that's what is causing problems when sampling

13:21 <tomeu> hmm

13:26 <tomeu> so, this is the state of the art here: https://people.collabora.com/~tomeu/panfrost_wallpaper_fixed_vertex.webm

13:31 <tomeu> cwabbott: I think from the maintainability POV this is quite good

13:31 <cwabbott> just a random guess, but have you checked that the blob isn't doing something weird with the sampler or texture descriptors?

13:31 <tomeu> but right now I think we should learn as much as possible , as it's clear some things aren't working as expected

13:32 <tomeu> I think bbrezillon was going to try that?

13:32 <cwabbott> there could be some cache it has to bypass or something

13:32 <bbrezillon> tomeu: yep

13:33 <bbrezillon> just got distracted by your live debugging session :-)

13:33 <cwabbott> also, is the framebuffer afbc, and if so do you have sampling from afbc wired up correctly?

13:33 <tomeu> nice, pandecode works now that we have the job chain properly built

13:36 <tomeu> cwabbott: don't think it's using afbc, this is the pandecode corresponding to the last video: https://people.collabora.com/~tomeu/pandecode.txt

14:51 <alyssa> Scrolly guacomole

14:52 <tomeu> alyssa: hi!

14:52 <tomeu> bbrezillon: any progress with the blob?

14:55 <alyssa> tomeu: TBF, I don't understand the linking patching code either and I wrote it....

14:56 <bbrezillon> tomeu: sorry, got interrupted by things on a different project

14:56 <alyssa> tomeu: 6 tiler jobs = 6 draws = 6 faces of kmscube

14:57 <alyssa> tomeu: pandecode is broken in a lot of scenarios... I really need to fix this stuff...

14:58 <tomeu> alyssa: I think you can ignore most of my comments from today, except probably the last ones

14:58 <alyssa> tomeu: But the blob handles reload stuff in literally this way, adding a special TILER job with a passthrough tex fragment shader. No vertex shader, specially computed varyings used instead.

14:58 <tomeu> ok, I think the vertex side of things is fine

14:58 <tomeu> but the sampling...

15:00 <alyssa> cwabbott: In my defense the linking stuff is some of the oldest code in the driver, I'm well-aware it needs to be rewritten ......

15:00 <alyssa> OA

15:01 <alyssa> tomeu: Tiler job 14 depends on tiler job 13 dep... tiler job 1 depends on vertex job 1.

15:01 <tomeu> alyssa: I think the linking is fine now

15:01 <alyssa> Vertex jobs are genuinely independent

15:03 <alyssa> bbrezillon: If the blit shader *flushes* then... we're in a loop since the whole bug this is around is "flushing without reloading first" >_<

15:03 <alyssa> tomeu: "Just keeps increasing every frame" That's how vbo offsets are often setup... the app keeps changing stuff but it's better to append than modify in place for various reasons

15:05 <alyssa> cwabbott: Yeah, the original (buggy) implementation was building the tiler job manually, as the blob does.. Still think that's the way to go

15:06 <alyssa> cwabbott: The blob is doing a texelFetch op instead of a texture, and there's a weird interp0 on the varying, but beside that I don't think much is different..? But caches etc are totally likely

15:06 MoeIcenowy has quit [Quit: ZNC 1.6.5+deb1+deb9u1 - http://znc.in]

15:06 <tomeu> I think we anyway want to end up there because of performance, but I liked the idea of starting with something that is used by other drivers, for sake of correctness

15:06 <alyssa> cwabbott: kmscube won't be AFBC at this point, since it's going straight to GBM and I haven't wired up AFBC in the Rockchip display driver yet

15:06 MoeIcenowy has joined #panfrost

15:07 <tomeu> alyssa: btw, I think this screenshot is more illustrative of what's going on, as it's the second frame: https://people.collabora.com/~tomeu/2019-05-14-170228.jpg

15:07 <cwabbott> tomeu: looks like it could be a wrong stride?

15:08 <tomeu> yeah, but I'm at a loss on which stride that could be :)

15:08 <tomeu> have played with randomly changing the formats, but nothing positive has come from that

15:08 <cwabbott> did you check the texture state stride?

15:09 MoeIcenowy has quit [Client Quit]

15:09 <tomeu> saw nothing interesting in the cmdstream

15:09 MoeIcenowy has joined #panfrost

15:09 <tomeu> cwabbott: you mean somewhere else than mali_texture_descriptor ?

15:10 <cwabbott> no, I mean the mali_texture_descriptor

15:10 <cwabbott> although I guess that's gonna come from the blitter ultimately

15:11 <tomeu> don't see any stride in there

15:11 <cwabbott> my bet is on a unit mismatch somewhere, that would explain the roughly 4:1 slope of the lines

15:12 <tomeu> have this in the cmdstream:

15:12 <tomeu> .swizzle = MALI_CHANNEL_BLUE | (MALI_CHANNEL_GREEN << 3) | (MALI_CHANNEL_RED << 6) | (MALI_CHANNEL_ONE << 9),

15:12 <tomeu> .format = MALI_RGBA8_UNORM,

15:12 <tomeu> guess that plus width is used by the hw to compute the stride:

15:12 <cwabbott> that's fine

15:12 <tomeu> .width = MALI_POSITIVE(1366),

15:13 MoeIcenowy has quit [Client Quit]

15:13 <alyssa> tomeu: 1366..?

15:13 <alyssa> On a RK3288?

15:13 <alyssa> Er Veyron

15:13 MoeIcenowy has joined #panfrost

15:13 <alyssa> 1366 is not a multiple of 16 (the tile width)

15:13 <alyssa> When we render we have to round up to 16

15:13 <alyssa> But then in the texture we're not doing it right there

15:13 <tomeu> yep, a veyron jaq

15:13 <alyssa> Try on a Kevin, 10c it'll go away

15:13 * alyssa class

15:13 <cwabbott> is width in units of bytes or pixels?

15:14 <tomeu> cwabbott: should be pixels

15:14 * tomeu tries on a random hdmi monitor

15:15 MoeIcenowy has quit [Client Quit]

15:15 MoeIcenowy has joined #panfrost

15:17 <tomeu> haha

15:17 <tomeu> with this, I get it almost right:

15:17 <tomeu> .width = MALI_POSITIVE(ALIGN(texture->width0, 16)),

15:17 <tomeu> seems to be flipped vertically

15:18 <tomeu> and we are missing a few _columns_ at the end

15:18 <cwabbott> the flipping thing is probably the FBO flipping madness

15:19 <cwabbott> the system framebuffer in GLX is always flipped in the y direction, for what I've heard are "hilarious historical reasons"

15:19 <cwabbott> well, with DRI i guess

15:20 <cwabbott> normally the gallium state tracker handles that for you though... hmm

15:20 <tomeu> https://people.collabora.com/~tomeu/2019-05-14-171956.jpg

15:23 <tomeu> this one is hilarious: https://people.collabora.com/~tomeu/panfrost_wallpaper_flippy.webm

15:24 <urjaman> that is quite artistic

15:24 <tomeu> it has been only art since I started working on this

15:25 <alyssa> ^ what cwabbot said

15:25 <alyssa> t

15:26 <alyssa> tomeu: Pixels

15:27 <tomeu> wonder what happened to the wallpaper

15:29 <tomeu> if anybody else wants to play with it: https://gitlab.freedesktop.org/tomeu/mesa/tree/panfrost-partial-updates

15:29 <tomeu> pity that we missed the 19.1 deadline :/

15:29 <tomeu> would have been cool to have weston working out of the box in debian without having to wait for 19.2

15:30 fysa has joined #panfrost

15:54 <alyssa> tomeu: Better to wait for 19.2, for performance improvements and maybe glamor support and such..?

16:03 <alyssa> I was more worried about getting in the kernel on-time

16:18 BenG83 has joined #panfrost

16:20 BenG83 has quit [Remote host closed the connection]

16:26 pH5 has quit [Quit: bye]

16:38 BenG83 has joined #panfrost

16:39 <alyssa> tomeu: (Right now, if using `ondemand` governor, Weston uses the min clock and then is noticeably very sluggish.)

16:39 <alyssa> ..Also, gets warm very easily. This is annoying.

16:56 <alyssa> Just for Weston painting at minimal clock level, several watts drawn... I'm assuming this is very much not the expected

17:11 pH5 has joined #panfrost

17:21 stikonas has joined #panfrost

17:40 BenG83 has quit [Remote host closed the connection]

17:59 <tomeu> alyssa: do you know what's the problem here with the width field in mali_texture_descriptor, and how things should be?

17:59 <tomeu> other than "switch to kevin"? :p

18:29 BenG83 has joined #panfrost

18:32 BenG83 has quit [Client Quit]

18:37 raster has quit [Remote host closed the connection]

18:54 <alyssa> tomeu: The texture stride = width * bytes per pixel

18:54 <alyssa> The framebuffer stride = (width aligned to 16) * bytes per pixel

18:54 <alyssa> If width is not aligned to 16, those differ
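
Note: the two stride formulas alyssa gives can be checked with a few lines of C. The helper names are made up for this sketch; the tile width of 16 and the formulas themselves are taken from the messages above.

```c
#include <stdint.h>

#define TILE_WIDTH 16u /* tile width in pixels, per the discussion above */

/* Round x up to the next multiple of a (a must be a power of two). */
uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Stride the texture unit assumes when no explicit stride is given. */
uint32_t texture_stride(uint32_t width, uint32_t bytes_per_pixel)
{
    return width * bytes_per_pixel;
}

/* Stride the framebuffer is actually rendered with. */
uint32_t framebuffer_stride(uint32_t width, uint32_t bytes_per_pixel)
{
    return align_up(width, TILE_WIDTH) * bytes_per_pixel;
}
```

For the 1366-pixel-wide RGBA8 framebuffer from earlier (4 bytes per pixel), the texture stride comes out as 5464 bytes while the framebuffer stride is 5504 bytes, a difference of 40 bytes, or 10 pixels, per row. For a width that is already a multiple of 16, such as 1920, the two agree.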

18:57 <anarsoul> alyssa: utgard can sample from textures where stride != (width * bpp), I doubt that midgard can't

18:58 <alyssa> anarsoul: It probably can, I just don't know the magic bits to do that

18:58 <anarsoul> how long is texture descriptor?

19:07 <alyssa> About as long as it is wide.

19:08 <alyssa> anarsoul: mali_texture_descriptor in panfrost-job.h

19:10 <alyssa> Let's see what wob blob does

19:13 afaerber has quit [Quit: Leaving]

19:19 <alyssa> In the blend descriptor for the preserved write, |= 0x800 flags

19:19 <alyssa> clip set tighter than expected

19:22 <alyssa> Texture descriptorage usage2=0x32 instead of 0x12

19:22 <alyssa> Not obvious how it's recuperating the stride

19:23 adjtm has joined #panfrost

19:24 <anarsoul> alyssa: there's stride somewhere in descriptor

19:24 <alyssa> Apparently

19:24 <anarsoul> for utgard it's only present if stride flag is set

19:25 <alyssa> Hm

19:25 <anarsoul> see lima_texture_desc_set_res() in lima_texture.c

19:25 adjtm_ has quit [Ping timeout: 258 seconds]

19:25 <alyssa> Wait

19:25 <anarsoul> grep for "for padded linear texture" in this fn

19:26 <anarsoul> (I know code in lima_texture.c is a mess, need to refactor it...)

19:26 <alyssa> Sec

19:26 <alyssa> No, wait, I was hopeful for a second

19:28 <alyssa> Ooooo

19:29 <alyssa> Got it.

19:29 <alyssa> It's the field _after_ swizzle_bitmaps

19:29 <alyssa> Also it uses a negative stride + adjusted bitmap which solves the Y flip issue, same way we render upside down

19:29 <alyssa> tomeu: ^^

19:30 <anarsoul> so texture descriptor is larger than you thought?

19:30 <alyssa> anarsoul: Yeah

19:30 <alyssa> Thank you :)

19:30 <anarsoul> np, I literally did nothing

19:31 <alyssa> You offered encouragement and believed in me! That totally counts :)

19:31 <anarsoul> :)

19:35 <alyssa> anarsoul: Yup, the stride field is only there optionally if it's actually needed. That explains that, then

19:36 <alyssa> (And explains why I never found it -- since it wasn't there!)

19:36 <alyssa> Gotta run, but I'll write a patch for this evening if I get a chance :)

19:36 <anarsoul> alyssa: on utgard place for it is always reserved (since it's at the very beginning of descriptor), but it's not set if flag is not set

19:37 <alyssa> anarsoul: Here it's at the very end... Whether it's actually there or not isn't obvious, since I don't pay *that* much attention to alignment vs reserved memory etc

19:37 <anarsoul> and IIRC Qiang spent quite some time to figure that out. Blob doesn't like to reveal its secrets!

22:17 <alyssa> anarsoul: What I don't know is how the stride field interacts with mipmaps/cubemaps

22:17 <anarsoul> it doesn't?

22:17 <alyssa> anarsoul: On mdg

22:18 <anarsoul> my guess it's the same here

22:18 <alyssa> The texture descriptor is a fixed-size descriptor, followed by a variable number of pointers (levels*faces)

22:18 <alyssa> And for the non-cube 1-lvl sample, after the one pointer was the stride field

22:18 <anarsoul> on utgard we have fixed number of mipmap levels - 6

22:18 <alyssa> Wack

22:18 <anarsoul> so I guess no mipmapped cube textures? (is it even possible?)

22:19 <alyssa> Yes? :p

22:19 <alyssa> Pretty sure there are deqp tests for it

22:19 <alyssa> So if I have a mipmapped cubemap, does that have one stride field for everything? One stride per mipmap but shared per cube? One stride for every pointer?

22:19 <anarsoul> poor utgard is out of luck here

22:19 <alyssa> Are they all in a chunk at the end? Or are they interleaved?

22:19 <alyssa> Is it a 64-bit slot for 64-bit pointers? Or is 32-bit and just not aligned nicely?

22:20 <anarsoul> good question

22:23 <alyssa> anarsoul: Granted, if you're using cubemaps/mipmaps, hopefully you're not using a custom stride with a linear texture ....

22:26 <alyssa> I'm trying to think of ways to get the blob to do that, but I'm coming up empty

22:26 <alyssa> From GL, the only way I got linear textures at all was with wallpapering

22:44 <alyssa> Ooo, this'll let me fix a bug in Weston

22:44 <alyssa> Happy!

22:44 <HdkR> woo

22:48 <alyssa> ...but now Weston is crashing. It was fine this morning .-.

22:49 <alyssa> Oh, I was messing with my .config

22:49 <alyssa> It usually *is* my fault, huh? :P

22:49 <Lyude> maybe it'll be fine tomorrow morning :)

22:49 <HdkR> https://www.phoronix.com/scan.php?page=news_item&px=Red-Hat-Hiring-Gfx-2019 Everyoe needs graphics devs :D

22:49 <Lyude> oh hey the job thing is up

22:49 <HdkR> Everyone*

22:50 <Lyude> yeah if you want a recommendation and I'm familiar with your work, feel free to let me know

22:50 <Lyude> (so, that's pretty much anyone who is active in here)

22:57 <HdkR> If only that was available six months ago :P

23:02 <alyssa> Ouch.

23:12 <alyssa> HOWTO make code suck less?

23:13 <alyssa> Ha-ha! Bug fixed! Take that, Weston!

23:14 <anarsoul> what bug?

23:14 <alyssa> anarsoul: Stride being messed up when rendering to NPOT windows

23:14 <alyssa> Breaking gears

23:14 <alyssa> Now gears works again :)

23:15 <anarsoul> I see

23:17 <alyssa> anarsoul: Oh, don't be smug :V

23:17 <anarsoul> :P

23:19 <alyssa> Er, not NPOTs

23:19 <alyssa> Non-16-aligned

23:22 <anarsoul> non-tile-aligned

23:26 <alyssa> Yeah

23:26 <alyssa> Anyways, series on the list. Alyssa out.

23:26 * alyssa unplugs Matrix-esque cable from skull