#panfrost on 2019-02-18 — irc logs at freenode.irclog.whitequark.org

2019-02-15 17:52 alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature

00:05 Kwiboo has joined #panfrost

00:06 <alyssa> HdkR: Mmhmm

00:06 <alyssa> I implemented far jumps and have moved on to debugging `discard`

00:06 <alyssa> It's fine in simple shaders, but decidedly broken for complex stuff

00:07 <HdkR> Are they specifically broken for shaders that do texture fetching after discard? :D

00:09 <alyssa> No(t as far as I can tell)

00:09 <alyssa> Specifically broken for shaders that do branching

00:09 <HdkR> Conditional discard? :P

00:11 <alyssa> Um, no, branching and discard at once

00:14 <HdkR> Very common for early shader alpha test or something + discard, then a bunch of heavy things afterwards

00:15 <HdkR> There's a good chance that you need to change some state when using discard or fragment depth writing

00:16 gtucker has quit [Ping timeout: 246 seconds]

00:17 <HdkR> I asked about texture fetches specifically because if you have killed threads then calculating derivatives becomes more difficult and hardware might need some state setup to handle it

00:37 Kwiboo has quit [Quit: .]

00:44 Kwiboo has joined #panfrost

01:21 stikonas has quit [Remote host closed the connection]

01:29 Kwiboo has quit [Ping timeout: 245 seconds]

02:42 <alyssa> Gr, discard works fine in my test case

02:45 <HdkR> :D

03:14 _whitelogger has joined #panfrost

04:10 <alyssa> Oops just got distracted and rewrote a STK shader to use Phong shading (rather than flat) :P

04:10 <alyssa> ...Worth it

04:11 <HdkR> lol

04:43 <alyssa> Alrighty! I just sent off the branching patch series. Lots of goodies in there :)

04:46 <alyssa> I think I'll merge earlier stuff in the meantime

04:46 <alyssa> And after this, uh, tbh I'm feeling sick of the compiler so I'll move back to cmdstream stuff ;P

04:47 * alyssa gulps

05:01 <cyrozap> urjaman: "maybe define the bits and then define an enum of the 4 logically sensible things you could set it to using them"

05:04 <cyrozap> urjaman: I would just `#define REGNAME_DITHER_DOWN_SEL (1 << 0)`, `#define REGNAME_DITHER_DOWN_MODE (1 << 1)`, etc. and then just add a comment above the list of #defines that explain what each bit does. Then you just set or clear each bit in the code by or-ing/and-ing the #define'd bit names.

05:07 <cyrozap> And of course you can always explain why certain bits are set or cleared in a comment near where you do so.

05:12 <cyrozap> But having a magic enum that has a set of pre-defined "modes that make logical sense" would annoy me, personally, since ideally you shouldn't care what the exact value of items in an enum are. If you really want to have a set of "known-good presets", you could still use an enum, but have that enum only be used as an input to a function that returns a const uint of the register value and uses a switch

05:12 <cyrozap> statement to map the enum to the register values.

05:33 <cyrozap> e.g., https://hastebin.com/emasiruviz.c

05:37 <cyrozap> The compiler will optimize out the function call, so it won't impact performance at all.

07:50 <urjaman> cyrozap: does "bits |= DITHER_DOWN_EN | DITHER_DOWN_MODE;" actually tell you anything about what mode was picked?

07:51 <urjaman> like yes if you know the hardware, but not otherwise... you'd need to go look the bit definitions up to see that "MODE" means RGB666 when set, and that there's a "SEL" bit that wasnt set so is Allegro...

07:54 <urjaman> i considered naming the bits "ROCKCHIP_DITHER_RGB666" and "ROCKCHIP_DITHER_FRC" but that still only tells you what the "1" state does, and doesnt help you find them in the TRM

07:56 <urjaman> also yeah the longness of these bit names would make using them (especially if you picked FRC and RGB666) a longer affair than the 80 character line limit :P

08:02 <urjaman> so basically to me either the line has to be sufficiently self-documenting that i can figure out how to change RGB666 to 565 (and switch to FRC if i know it's a thing) just by looking at it, or otherwise it's not really better than 0x6;
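
A minimal sketch of how the two positions above could be combined: raw bit #defines as cyrozap suggests, plus an enum of presets that is only ever passed through a switch, so the call site names the mode as urjaman wants. The bit positions, the DITHER_* names and the preset names are assumptions made for illustration; they are not taken from the TRM or from the hastebin paste.

```c
#include <stdint.h>

/* Hypothetical names and bit positions, for illustration only. */
#define DITHER_DOWN_EN   (UINT32_C(1) << 0)
#define DITHER_DOWN_MODE (UINT32_C(1) << 1) /* set: RGB666, clear: RGB565 */
#define DITHER_DOWN_SEL  (UINT32_C(1) << 2) /* set: FRC, clear: Allegro */

/* The four "logically sensible" combinations. The enum values themselves
 * carry no meaning; only the switch below maps them to register bits. */
enum dither_preset {
    DITHER_RGB565_ALLEGRO,
    DITHER_RGB666_ALLEGRO,
    DITHER_RGB565_FRC,
    DITHER_RGB666_FRC,
};

static inline uint32_t dither_preset_to_bits(enum dither_preset preset)
{
    switch (preset) {
    case DITHER_RGB565_ALLEGRO:
        return DITHER_DOWN_EN;
    case DITHER_RGB666_ALLEGRO:
        return DITHER_DOWN_EN | DITHER_DOWN_MODE;
    case DITHER_RGB565_FRC:
        return DITHER_DOWN_EN | DITHER_DOWN_SEL;
    case DITHER_RGB666_FRC:
        return DITHER_DOWN_EN | DITHER_DOWN_MODE | DITHER_DOWN_SEL;
    }
    return 0; /* unknown preset: leave dithering disabled */
}
```

A call site would then read something like `bits |= dither_preset_to_bits(DITHER_RGB666_ALLEGRO);`, which says which mode was picked while the bit names still match what one would search for in the TRM.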

08:06 <urjaman> also i gotta say, rockchip didnt put much effort into making these two bits have super clear and informative names either :P ... "mode" and "sel" ... sigh.

08:23 Elpaulo has joined #panfrost

09:08 <cwabbott> alyssa: btw, if you want to work on the compiler some more, something like https://eli.thegreenplace.net/2013/01/03/assembler-relaxation might be of interest to you

10:02 * urjaman reads his emails late as usual...

10:07 <mmind00> urjaman: there should be review comments for your dither-patch in your inbox :-)

10:12 <urjaman> yes that was what that comment was about

10:15 <mmind00> :-D

10:38 raster has joined #panfrost

10:41 <urjaman> and yeah i hadnt looked at the rockchip kernel tree basically at all (just chromeos and mainline)

10:47 <mmind00> yeah, it makes me slightly sad to see that especially in the drm area the vendor kernel has diverged so much

12:08 pH5 has joined #panfrost

12:10 <urjaman> mainline rockchip_drm_vop.c: 1600-something lines... rockchip rockchip_drm_vop.c: 4628 lines.

12:11 <urjaman> no wonder i was like "this file seems really long..."

12:14 <HdkR> Is it that long due to additional features or is the mainline one shorter due to cleanup? :P

12:21 <urjaman> i mean sure there's more features in the rockchip one ... (it'd take me a while to parse this all so i'm not really qualified to say how much of this is necessary etc)

12:40 <mmind00> HdkR: additional features ... with mainline being the stepchild sadly

12:51 <HdkR> Nothing too unusual then

13:02 <mmind00> HdkR: yep, with the ChromeOS projects phasing out it looks like mainline motivations diminished somewhat

13:19 sphalerite has quit [Ping timeout: 268 seconds]

13:26 <tomeu> robher, hanetzer: none of the IP irqs seem to fire, do you know if we need to do anything else to properly power the GPU up?

13:31 sphalerite has joined #panfrost

13:39 <tomeu> hanetzer: I see that in your original code you are busy-waiting for the soft reset completion

13:40 <tomeu> guess you had also trouble getting interrupts from the hw?

13:42 <robher> tomeu: here's a log of register accesses. https://usercontent.irccloud-cdn.com/file/ofMf7s5i/mali-reg-drm.log

13:43 <tomeu> awesome, thanks

13:43 <tomeu> have checked, and there's quite a bit of work left so the startup sequence matches that of ARM's driver

13:43 <tomeu> will work on that next

14:12 <Lyude> mmind00: chromeos projects phasing out?

14:14 <tomeu> robher: no luck, even if the register writes are the same, I don't get the GPU_IRQ_RESET_COMPLETED interrupt

14:14 <tomeu> here's the log with the start annotated: http://paste.debian.net/1068716/

14:14 <tomeu> robher: I guess the regulator and the clock must be working fine as we are able to read stuff from GPU_INT_RAWSTAT?

14:25 <mmind00> Lyude: From the activity in the chromeos repos it looks like there won't be more Rockchip based devices for now

14:26 <Lyude> Huh

14:29 <mmind00> Lyude: additional devices ... aka Kevin, Bob and Scarlet are the current set and amount of changes has been reduced

14:30 <robher> tomeu: the irq mask registers are enables

14:30 <mmind00> Lyude: and most things seem to work on the chromeos side, hence Rockchip's mainline enthusiasm decreases

14:31 <robher> tomeu: so the reset sequence disables the irqs, does reset, enables all irqs.

14:33 <tomeu> robher: yeah, this is how I have hacked the gpu init sequence to match the reg log: http://paste.debian.net/1068720/

14:33 <tomeu> mmind00: no rumours of a soc newer than rk3399pro?

14:35 <mmind00> tomeu: not that I know of ... i.e. there is rk3326 (bifrost-mali but less powerful than rk3399) and rk1808 (also smaller cpu cores)

14:36 <robher> tomeu: does the reset clear the mask? The vendor driver sets the mask after the reset command. Maybe there's enough overhead that they get lucky and set the mask after the reset happened.

14:39 <tomeu> robher: reset doesn't clear the mask

15:14 <tomeu> robher: from reading ARM's code, IRQF_TRIGGER_HIGH would end up being passed to devm_request_irq, but changing that doesn't make any difference

15:15 <robher> tomeu: GIC inputs generally aren't programmable anyways.

15:17 afaerber has quit [Quit: Leaving]

15:20 * tomeu goes back to scratch his head

15:30 afaerber has joined #panfrost

15:38 <tomeu> well, I'm out of clues, so I'm going to clean up what I have and try again another day

15:55 <tomeu> robher: https://gitlab.freedesktop.org/tomeu/linux/commits/panfrost

15:55 <tomeu> job submission seemed to work, but nothing was displayed on the screen with kmscube

15:55 <tomeu> so I tried enabling interrupts to get an idea of where the problem could be

15:57 <robher> tomeu: I really think we need some "hello world" jobs. Simple enough to verify the job ran and something that touches memory.

16:13 <cwabbott> robher: have you seen shader_runner?

16:13 <cwabbott> it's basically that

16:14 <cwabbott> for ARM's kernel, and bifrost, though

16:14 <cwabbott> it'd take some massaging to work on midgard

16:14 <robher> cwabbott: no, I hadn't.

16:15 <cwabbott> basically it takes a compute shader binary (you can extract this from ARM's offline compiler), gives it a pointer, and runs a single instance of it

16:15 <cwabbott> it's in the old panfrost repo

16:15 <cwabbott> I'd been using it to probe the instruction set on bifrost

16:16 <hanetzer> tomeu: very initial code. I think iirc I just dummied the irq code and it doesn't do anything specifically

16:31 ente has joined #panfrost

16:34 Kwiboo has joined #panfrost

16:46 paulk-leonov has quit [Ping timeout: 272 seconds]

16:49 paulk-leonov has joined #panfrost

16:51 <alyssa> A rare cwabbott sighting!

17:01 <alyssa> cwabbott: Aaaaahhhhhh, thank you so much for linking that article. Definitely in the category of "I should've thought of that, d'oh" :)

17:02 <alyssa> robher: So, vertex jobs are compute jobs (essentially) and do writeback to main memory

17:03 <alyssa> So it's easy enough to just submit only a single VERTEX job (comment out the TILER and FRAGMENT pieces) and dump the varyings manually after, which is as close to "Hello world" as you'll get I think

17:08 <urjaman> hmm now i'm reading the article too .. funnily i thought relaxation was from long jump instructions to short ones instead of from short ones to long ones as in the article
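
A rough sketch of relaxation in the direction the article describes: every branch starts out in its short form and is grown to the long form only when its displacement does not fit, repeating until a pass changes nothing. The instruction sizes, the signed 8-bit range and the `struct insn` layout are hypothetical and have nothing to do with the real Midgard encoding.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoding sizes and short-branch range. */
#define SHORT_BRANCH_SIZE 2
#define LONG_BRANCH_SIZE  5
#define SHORT_DISP_MIN    (-128)
#define SHORT_DISP_MAX    127

struct insn {
    uint32_t size;   /* current encoded size in bytes */
    bool is_branch;
    bool is_long;    /* branch already promoted to the long form */
    size_t target;   /* index of the target instruction (branches only) */
};

/* Promote short branches to long ones until every displacement fits.
 * On return, offsets[] holds the final byte offset of each instruction.
 * Returns the number of layout passes that were needed. */
static unsigned relax_branches(struct insn *code, size_t n, uint32_t *offsets)
{
    unsigned passes = 0;
    bool changed;

    do {
        changed = false;
        passes++;

        /* Lay out all instructions using their current sizes. */
        uint32_t pc = 0;
        for (size_t i = 0; i < n; i++) {
            offsets[i] = pc;
            pc += code[i].size;
        }

        /* Grow any short branch whose target is out of range.
         * The displacement is measured from the end of the branch. */
        for (size_t i = 0; i < n; i++) {
            if (!code[i].is_branch || code[i].is_long)
                continue;
            int64_t disp = (int64_t)offsets[code[i].target] -
                           (int64_t)(offsets[i] + code[i].size);
            if (disp < SHORT_DISP_MIN || disp > SHORT_DISP_MAX) {
                code[i].is_long = true;
                code[i].size = LONG_BRANCH_SIZE;
                changed = true;
            }
        }
    } while (changed);

    return passes;
}
```

Because branches only ever grow, the loop is guaranteed to terminate. Starting from long branches and shrinking them, as urjaman expected, is also possible, but shrinking one branch can bring others into range, so the bookkeeping is a little different.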

17:13 rhyskidd_ has joined #panfrost

17:14 rhyskidd has quit [Ping timeout: 255 seconds]

17:14 rhyskidd_ is now known as rhyskidd

17:14 <robher> alyssa: I was looking for "here's some job chain bytes that do X." IOW, someone else's hello world that we need to get to work. Did I mention I'm lazy. :)

17:15 <alyssa> robher: Hehe

17:15 <alyssa> It's not that simple, since jobs contain a _lot_ of pointers nested in pointers nested in..

17:19 <robher> alyssa: okay. I guess we'll take what mesa gives us...

17:30 <alyssa> robher: After a job finishes (or at least after you submit and sleep a while), dump the contents of ctx->varying_mem.cpu

17:30 <alyssa> If it's all zeroes, bad luck. If there's data there, your VERTEX job went through.
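
The check alyssa describes can be sketched as a small helper that scans a CPU-visible buffer for any non-zero byte. The helper name is made up here, and how the pointer and length for ctx->varying_mem.cpu are obtained is left to the caller, since the log does not say.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if any byte in buf[0..len) is non-zero, i.e. something
 * (hopefully the VERTEX job) wrote into the buffer. */
static bool buffer_has_data(const void *buf, size_t len)
{
    const uint8_t *bytes = buf;

    for (size_t i = 0; i < len; i++) {
        if (bytes[i] != 0)
            return true;
    }
    return false;
}
```

This only tells you the job wrote something; it assumes the buffer was zeroed before submission and says nothing about whether the varyings are correct.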

17:32 <hanetzer> robher: thanks for the access logs, that could be of great use :P

17:33 <robher> hanetzer: just needed a '#define DEBUG' in the driver...

17:33 <hanetzer> orly.

17:34 <hanetzer> was it like that on the older kbase?

17:34 <alyssa> Oof

17:35 <robher> hanetzer: no idea. there's also a more complicated reg dumping mechanism (because vendor driver :) ).

17:36 <alyssa> robher: ^_^

17:37 <alyssa> robher: kbase_pm_enable_interrupts is what's doing the IRQ stuff btw

17:38 <alyssa> (I think)

17:39 <alyssa> Actually that whole file (backend/gpu/mali_kbase_pm_driver.c) is full of goodies

17:43 stikonas_ has joined #panfrost

17:43 <robher> alyssa: yeah, I didn't see anything special there other than what tomeu and I discussed above. It's just 8 register writes until an irq should occur.

17:43 <alyssa> Mm

17:44 * alyssa reads your code

17:44 <robher> https://www.irccloud.com/pastebin/6Xg4bXFa/

17:46 <alyssa> "HW_FEATURE_LD_ST_LEA_TEX"

17:46 <alyssa> Gah, every time I see bits of actual Midgard asm (in the kernel) I realise just how much I don't know about the shader core :p

17:48 <alyssa> robher: "kbase_job_write_affinity(pfdev, 0 /*katom->core_req*/, js);"

17:48 <alyssa> Why is core_req commented out?

17:48 <alyssa> I'm not sure how it works in the hardware, but from userspace, if I submit a core_req of 0, *the hw does nothing*

17:49 <alyssa> I'm not sure if the job gets stopped in kernel space or in hardware, tho

17:50 <robher> because things boil down to being either a tiler only job or not. If not, affinity is always 0xf. It looked like you never request a tiler only job.

17:51 <alyssa> Ah

17:51 <alyssa> (Just wanted to ask because that could come up with identical symptoms to what you reported)

17:53 <rtp> alyssa: hi. fwiw, I found out why I can't use panwrap on the blob on my system: different ioctls

17:54 <alyssa> rtp: Yeah, there are way too many kernel versions and none of them work ;)

17:56 <robher> alyssa: the other reason was just I hadn't wired that up and just wanted to push out something to tomeu. But it looks like he has now.

17:56 <alyssa> robher: To rule out the obvious, we're sure it's enabling power/clocking/etc?

17:56 <alyssa> I see that code hasn't been touched in months, so

17:57 <rtp> alyssa: yeah, but that sucks badly. I really want to get logs on my system to understand what's not working :(

17:57 <robher> alyssa: there are some power bits for each core that probably do need touching. But not for reset irq, because the vendor driver doesn't need to.

17:58 <alyssa> robher: Hm

18:00 <alyssa> robher: Alright. Are we sure it's actually request_irq'ing and setting up the handler and so forth, so you're actually catching the IRQ when it fires?

18:00 <robher> just need time to debug it. Someone want to review DT bindings for me? Just have a backlog of 200 patches...

18:01 <alyssa> Oh, I do see it devm_request_irq now uh

18:03 <alyssa> robher: I don't believe panfrost_gpu_init is ever called?

18:03 <alyssa> And that function is in turn what ends up registering IRQs etc..

18:04 * alyssa afk

18:08 <robher> alyssa: That's what prints out the version of the GPU, so I think it is. There's some magic with the whole panfrost_ip struct abstraction (which honestly I don't love).

18:16 pH5 has quit [Quit: bye]

18:23 <Lyude> HdkR: system received

18:49 <alyssa> robher: Hm.

18:56 * alyssa has to implement proper job/BO management and is not happy about it

18:59 * alyssa procrastinates on Panfrost by doing homework ("..Wait..")

19:00 raster has quit [Ping timeout: 246 seconds]

19:06 raster has joined #panfrost

19:07 raster has quit [Remote host closed the connection]

19:15 afaerber has quit [Quit: Leaving]

19:31 raster has joined #panfrost

19:51 jolan has quit [Quit: leaving]

19:53 jolan has joined #panfrost

20:52 afaerber has joined #panfrost

21:02 <urjaman> ... now i'm figuring out the dithering bits on an RK3188 ...

21:03 <alyssa> urjaman: ^_^

21:04 <urjaman> *cough* someone *cough* added support for the RK3188 in the mainline whilst it is not supported by rockchip kernel :P

21:06 <urjaman> but yeah i think i got it (even though the TRM isnt the most helpful one)

21:28 <alyssa> mmind00: Was that you? :3

21:29 <mmind00> alyssa: rk3188? ... maybe :-D

21:29 <alyssa> :D

21:30 <mmind00> 6 years ago I had this nice rk3188 tablet and thought "hmm ... shouldn't take too long to get mainline running on it" ... it only took till december 2018 ;-)

21:45 <urjaman> https://github.com/urjaman/linux/commit/7270dbe77822658a55c9147e50d51821985be4b3 i still gotta like do all the tests, checks, writing of sane messages, etc but this is what the actual patch is kinda looking like atm

22:02 raster has quit [Remote host closed the connection]

22:04 <robher> alyssa, tomeu: the irq problem is very simple. reset was being done before request_irq. On to more interesting interrupts.

22:05 <alyssa> robher: ^_^

22:19 <alyssa> Erg, where is this bottleneck

22:26 <mmind00> urjaman: double empty line in line 210 :-P ... other than that this approach looks very nice on first glance

22:27 <mmind00> (line 210 in the header of course)

22:30 <urjaman> hmmm ... ever noticed that that header defines enum sacle_up_mode ?

22:31 <urjaman> (with the typo that is)

22:33 <urjaman> and yeah i got rid of that empty line ... i was presently trying to judge whether i should wrap some of these lines that are technically a bit too long for the kernel

22:33 <Lyude> sacle_up pardner

22:50 <HdkR> Lyude: Nice!

22:52 <Lyude> HdkR: hopefully I can start doing some serious work on the ra stuff soon

22:52 <HdkR> Hopefully GDC demos will stop burning me out soon

22:52 <Lyude> a note though, I'll probably be on vacation this weekend and next weekend (and probably more than that since I'm going to sweden, but I'm hoping to use the time on the flight)

22:53 <Lyude> HdkR: i feel the pain of burn out :(

22:54 <Lyude> (also, I'm stuck on trying to figure out which crystal gem to name the new board after ):

22:54 <HdkR> :D

23:17 Kwiboo has quit [Ping timeout: 246 seconds]

23:28 * alyssa has been looking through performance counters

23:29 <alyssa> Something obvious I see happening, though, is that late Z testing is being used

23:29 <alyssa> Not sure that explains the performance gap but it could be part

23:34 <HdkR> Going to work on early-z? :P

23:39 <alyssa> HdkR: See, I don't understand why it's not already enabled

23:40 <HdkR> Missing a bit on the command stream side?

23:40 <alyssa> Yeah, I guess

23:40 <alyssa> HdkR: There are a lot of bits in the cmdstream :)

23:40 <HdkR> :D

23:41 <HdkR> Trace the blob with one side relying on vertex depth and the other doing fragment depth, see what bits differ? :)
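
HdkR's suggestion amounts to XOR-ing the same descriptor words from two traces and listing the bits that differ. A minimal sketch follows, assuming the two dumps have already been extracted as equal-length arrays of 32-bit words; the function name and dump layout are hypothetical and are not panwrap's output format.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Compare two equal-length dumps word by word and report every bit that
 * differs. Pass out == NULL to count without printing.
 * Returns the number of differing bits. */
static size_t diff_bits(const uint32_t *a, const uint32_t *b,
                        size_t words, FILE *out)
{
    size_t count = 0;

    for (size_t w = 0; w < words; w++) {
        uint32_t x = a[w] ^ b[w]; /* 1 wherever the dumps disagree */

        for (unsigned bit = 0; bit < 32; bit++) {
            if (!(x & (UINT32_C(1) << bit)))
                continue;
            if (out)
                fprintf(out, "word %zu bit %u: %u -> %u\n", w, bit,
                        (unsigned)((a[w] >> bit) & 1),
                        (unsigned)((b[w] >> bit) & 1));
            count++;
        }
    }
    return count;
}
```

In practice the traces also contain pointers that change from run to run, so only words known to hold flags are worth comparing this way.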

23:41 <alyssa> I might have to, yeah

23:41 <alyssa> Oh, uh, there's blending?

23:42 <HdkR> Does blending disable early-z on Mali? :D

23:44 <alyssa> Well, I just disabled blending rather forcibly, and still LZS

23:47 <alyssa> Oooo this is good

23:47 <alyssa> There's an errata "Fragment frontend heuristic bias to force early-z required", and it applies to this hw

23:47 <alyssa> I have *no* idea what a "heuristic bias" is, but this seems relevant

23:49 <HdkR> Wonder how forcing early-z works in cases where you can't use early-z :P

23:52 <alyssa> Pff

23:53 <Lyude> HdkR: btw, any idea what the baud rate on The Fancy New Board defaults to?

23:54 <HdkR> Uh, I knew but haven't touched it for a while. Let me see if the wiki shows it

23:54 <Lyude> oh-there's already a wiki for this

23:54 <Lyude> ?

23:56 <HdkR> 115200 probably

23:56 <Lyude> hopefully :_

23:56 <Lyude> *:)

23:56 <HdkR> https://wiki.odroid.com/accessory/development/usb_uart_kit :P

23:56 <HdkR> https://wiki.odroid.com/odroid-n2/odroid-n2

23:56 <HdkR> Definitely not entirely filled out

23:56 <Lyude> ahhhh, ok so I can say the name of the board and stuff cool

23:57 <Lyude> anyway-was mostly wondering so i could reuse one of the serial consoles on one of the boards I'm not using atm to get serial console access on it

23:57 <Lyude> because i always forget to buy a serial adapter when someone tells me they're sending me a board

23:57 <HdkR> https://github.com/hardkernel/linux/tree/odroidn2-4.9.y https://github.com/hardkernel/u-boot/tree/odroidn2-v2015.01

23:58 <HdkR> If it is a hardkernel uart then yes

23:58 <HdkR> er, if it is a newer uart that supports the 1.8v

23:58 <Lyude> yeah I've got a hardkernel uart from an odroid xu3

23:58 <Lyude> oh :(, don't think my voltage goes that low

23:59 <Lyude> or high? uart is confusing sometimes

23:59 <HdkR> Oh, it's 3.3v on this one actually, just checked their wiki :P

23:59 <Lyude> aaaah phew

23:59 <Lyude> that I can do

23:59 <HdkR> https://wiki.odroid.com/odroid-n2/hardware#uart_console_connector

23:59 <alyssa> So uh, my Rockpro64 doesn't feel like booting today..