alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
Kwiboo has joined #panfrost
<alyssa> HdkR: Mmhmm
<alyssa> I implemented far jumps and have moved on to debugging `discard`
<alyssa> It's fine in simple shaders, but decidedly broken for complex stuff
<HdkR> Are the specifically broken for shaders that do texture fetching after discard? :D
<alyssa> No(t as far as I can tell)
<alyssa> Specifically broken for shaders that do branching
<HdkR> Conditional discard? :P
<alyssa> Um, no, branching and discard at once
<HdkR> Very common for early shader alpha test or something + discard, then a bunch of heavy things afterwards
<HdkR> There's a good chance that you need to change some state when using discard or fragment depth writing
gtucker has quit [Ping timeout: 246 seconds]
<HdkR> I asked about texture fetches specifically because if you have killed threads then calculating derivatives becomes more difficult and hardware might need some state setup to handle it
Kwiboo has quit [Quit: .]
Kwiboo has joined #panfrost
stikonas has quit [Remote host closed the connection]
Kwiboo has quit [Ping timeout: 245 seconds]
<alyssa> Gr, discard works fine in my test case
<HdkR> :D
_whitelogger has joined #panfrost
<alyssa> Oops just got distracted and rewrote a STK shader to use Phong shading (rather than flat) :P
<alyssa> ...Worth it
<HdkR> lol
<alyssa> Alrighty! I just sent off the branching patch series. Lots of goodies in there :)
<alyssa> I think I'll merge earlier stuff in the meantime
<alyssa> And after this, uh, tbh I'm feeling sick of the compiler so I'll move back to cmdstream stuff ;P
* alyssa gups
<cyrozap> urjaman: "maybe define the bits and then define an enum of the 4 logically sensible things you could set it to using them"
<cyrozap> urjaman: I would just `#define REGNAME_DITHER_DOWN_SEL (1 << 0)`, `#define REGNAME_DITHER_DOWN_MODE (1 << 1)`, etc. and then just add a comment above the list of #defines that explain what each bit does. Then you just set or clear each bit in the code by or-ing/and-ing the #define'd bit names.
<cyrozap> And of course you can always explain why certain bits are set or cleared in a comment near where you do so.
<cyrozap> But having a magic enum that has a set of pre-defined "modes that make logical sense" would annoy me, personally, since ideally you shouldn't care what the exact value of items in an enum are. If you really want to have a set of "known-good presets", you could still use an enum, but have that enum only be used as an input to a function that returns a const uint of the register value and uses a switch
<cyrozap> statement to map the enum to the register values.
<cyrozap> The compiler will optimize out the function call, so it won't impact performance at all.
<urjaman> cyrozap: does "bits |= DITHER_DOWN_EN | DITHER_DOWN_MODE;" actually tell you anything about what mode was picked?
<urjaman> like yes if you know the hardware, but not otherwise... you'd need to go look the bit definitions up to see that "MODE" means RGB666 when set, and that there's a "SEL" bit that wasnt set so is Allegro...
<urjaman> i considered naming the bits "ROCKCHIP_DITHER_RGB666" and "ROCKCHIP_DITHER_FRC" but that still only tells you what the "1" state does, and doesnt help you find them in the TRM
<urjaman> also yeah the longness of these bit names would make using them (especially if you picked FRC and RGB666) a longer affair than the 80 character line limit :P
<urjaman> so basically to me either the line has to be sufficiently self-documenting that i can figure out how to change RGB666 to 565 (and switch to FRC if i know it's a thing) just by looking at it, or otherwise it's not really better than 0x6;
<urjaman> also i gotta say, rockchip didnt put much effort into making these two bits have super clear and informative names either :P ... "mode" and "sel" ... sigh.
Elpaulo has joined #panfrost
<cwabbott> alyssa: btw, if you want to work on the compiler some more, something like https://eli.thegreenplace.net/2013/01/03/assembler-relaxation might be of interest to you
* urjaman reads his emails late as usual...
<mmind00> urjaman: there should be review comments for your dither-patch in your inbox :-)
<urjaman> yes that was what that comment was about
<mmind00> :-D
raster has joined #panfrost
<urjaman> and yeah i hadnt looked at the rockchip kernel tree basically at all (just chromeos and mainline)
<mmind00> yeah, it makes me slightly sad to see that especially in the drm area the vendor kernel has diverged so much
pH5 has joined #panfrost
<urjaman> mainline rockchip_drm_vop.c: 1600-something lines... rockchip rockchip_drm_vop.c: 4628 lines.
<urjaman> no wonder i was like "this file seems really long..."
<HdkR> Is it that long due to additional features or is the mainline one shorter due to cleanup? :P
<urjaman> i mean sure there's more features in the rockchip one ... (it'd take me a while to parse this all so i'm not really qualified to say how much of this is necessary etc)
<mmind00> HdkR: additional features ... with mainline being the stepchild sadly
<HdkR> Nothing too unusual then
<mmind00> HdkR: yep, with the ChromeOS projects phasing out it looks like mainline motivations dimished somewhat
sphalerite has quit [Ping timeout: 268 seconds]
<tomeu> robher, hanetzer: none of the IP irqs seem to fire, do you know if we need to do anything else to properly power the GPU up?
sphalerite has joined #panfrost
<tomeu> hanetzer: I see that in your original code you are busy-waiting for the soft reset completion
<tomeu> guess you had also trouble getting interrupts from the hw?
<robher> tomeu: here's a log of register accesses. https://usercontent.irccloud-cdn.com/file/ofMf7s5i/mali-reg-drm.log
<tomeu> awesome, thanks
<tomeu> have checked, and there's quite a bit of work left so the startup sequence matches that of ARM's driver
<tomeu> will work on that next
<Lyude> mmind00: chromeos projects phasing out?
<tomeu> robher: no luck, even if the register writes are the same, I don't get the GPU_IRQ_RESET_COMPLETED interrupt
<tomeu> here's the log with the start annotated: http://paste.debian.net/1068716/
<tomeu> robher: I guess the regulator and the clock must be working fine as we are able to read stuff from GPU_INT_RAWSTAT?
<mmind00> Lyude: From the activity in the chromeos repos it looks like there won't be more Rockchip based devices for now
<Lyude> Huh
<mmind00> Lyude: additional devices ... aka Kevin, Bob and Scarlet are the current set and amount of changes has been reduced
<robher> tomeu: the irq mask registers are enables
<mmind00> Lyude: and most things seem to work on the chromeos side, hence Rockchip's mainline enthusiasm decreases
<robher> tomeu: so the reset sequence disables the irqs, does reset, enables all irqs.
<tomeu> robher: yeah, this is how I have hacked the gpu init sequence to match the reg log: http://paste.debian.net/1068720/
<tomeu> mmind00: no rumours of a soc newer than rk3399pro?
<mmind00> tomeu: not that I know of ... i.e. there is rk3326 (bifrost-mali but less powerful than rk3399) and rk1808 (also smaller cpu cores)
<robher> tomeu: does the reset clear the mask? The vendor driver sets the mask after the reset command. Maybe there's enough overhead that they get lucky and set the mask after the reset happened.
<tomeu> robher: reset doesn't clear the mask
<tomeu> robher: from reading ARM's code, IRQF_TRIGGER_HIGH would end up being passed to devm_request_irq, but changing that doesn't make any difference
<robher> tomeu: GIC inputs generally aren't programmable anyways.
afaerber has quit [Quit: Leaving]
* tomeu goes back to scratch his head
afaerber has joined #panfrost
<tomeu> well, I'm out of clues, so I'm going to clean up what I have and try again another day
<tomeu> job submission seemed to work, but nothing was displayed on the screen with kmscube
<tomeu> so I tried enabling interrupts to get an idea of where the problem could be
<robher> tomeu: I really think we need some "hello world" jobs. Simple enough to verify the job ran and something that touches memory.
<cwabbott> robher: have you seen shader_runner?
<cwabbott> it's basically that
<cwabbott> for ARM's kernel, and bifrost, though
<cwabbott> it'd take some massaging to work on midgard
<robher> cwabbott: no, I hadn't.
<cwabbott> basically it takes a compute shader binary (you can extract this from ARM's offline compiler), gives it a pointer, and runs a single instance of it
<cwabbott> it's in the old panfrost repo
<cwabbott> I'd been using it to probe the instruction set on bifrost
<hanetzer> tomeu: very initial code. I think iirc I just dummied the irq code and it doesn't do anything specifically
ente has joined #panfrost
Kwiboo has joined #panfrost
paulk-leonov has quit [Ping timeout: 272 seconds]
paulk-leonov has joined #panfrost
<alyssa> A rare cwabbott sighting!
<alyssa> cwabbott: Aaaaahhhhhh, thank you so much for linking that article. Definitely in the category of "I should've thought that, d'oh" :)
<alyssa> robher: So, vertex jobs are compute jobs (essentially) and do writeback to main memory
<alyssa> So it's easy enough to just only a single VERTEX job (comment out the TILER and FRAGMENT pieces) and dump the varyings manually after, which is as close to "Hello world" as you'll get I think
<urjaman> hmm now i'm reading the article too .. funnily i thought relaxation was from long jump instructions to short ones instead of from short ones to long ones as in the article
rhyskidd_ has joined #panfrost
rhyskidd has quit [Ping timeout: 255 seconds]
rhyskidd_ is now known as rhyskidd
<robher> alyssa: I was looking for "here's some job chain bytes that do X." IOW, someone else's hello world that we need to get to work. Did I mention I'm lazy. :)
<alyssa> robher: Hehe
<alyssa> It's not that simple, since jobs contain a _lot_ of pointers nested in pointers nested in..
<robher> alyssa: okay. I guess we'll take what mesa gives us...
<alyssa> robher: After a job finishes (or at least after you submit and sleep a while), dump the contents of ctx->varying_mem.cpu
<alyssa> If it's all zeroes, bad luck. If there's data there, your VERTEX job went through.
<hanetzer> robher: thanks for the access logs, that could be of great use :P
<robher> hanetzer: just needed a '#define DEBUG' in the driver...
<hanetzer> orly.
<hanetzer> was it like that on the older kbase?
<alyssa> Oof
<robher> hanetzer: no idea. there's also a more complicated reg dumping mechanism (because vendor driver :) ).
<alyssa> robher: ^_^
<alyssa> robher: kbase_pm_enable_interrupts is what's doing the IRQ stuff btw
<alyssa> (I think)
<alyssa> Actually that whole file (backend/gpu/mali_kbase_pm_driver.c) is full of goodies
stikonas_ has joined #panfrost
<robher> alyssa: yeah, I didn't see anything special there other than what tomeu and I discussed above. It's just 8 register writes until an irq should occur.
<alyssa> Mm
* alyssa reads your code
<alyssa> "HW_FEATURE_LD_ST_LEA_TEX"
<alyssa> Gah, every time I see bits of actual Midgard asm (in the kernel) I realise just how much I don't know about the shader core :p
<alyssa> robher: "kbase_job_write_affinity(pfdev, 0 /*katom->core_req*/, js);"
<alyssa> Why is core_req commented out?
<alyssa> I'm not sure how it works in the hardware, but from userspace, if I submit a core_req of 0, *the hw does nothing*
<alyssa> I'm not sure if the job gets stopped in kernel space or in hardware, tho
<robher> because things boil down to being either a tiler only job or not. If not, affinity is always 0xf. It looked like you never request a tiler only job.
<alyssa> Ah
<alyssa> (Just wanted to ask because that could come up with identical symptoms to what you reported)
<rtp> alyssa: hi. fwiw, I found out why I can use panwrap on the blob on my system: different ioctls
<alyssa> rtp: Yeah, there are way too many kernel versions and none of them work ;)
<robher> alyssa: the other reason was just I hadn't wired that up and just wanted to push out something to tomeu. But it looks like he has now.
<alyssa> robher: To rule out the obvious, we're sure it's enabling power/clocking/etc?
<alyssa> I see that code hasn't been touched in months, so
<rtp> alyssa: yeah, but that sucks badly. I really want to get logs on my system to understand what's not working :(
<robher> alyssa: there are some power bits for each core that probably do need touching. But not for reset irq, because the vendor driver doesn't need to.
<alyssa> robher: Hm
<alyssa> robher: Alright. Are we sure it's actually request_irq'ing and setting up the handler and so forth, so you're actually catching the IRQ when it fires?
<robher> just need time to debug it. Someone want to review DT bindings for me? Just have a backlog of 200 patches...
<alyssa> Oh, I do see it devm_request_irq now uh
<alyssa> robher: I don't believe panfrost_gpu_init is ever called?
<alyssa> And that function is in turn what ends up registering IRQs etc..
* alyssa afk
<robher> alyssa: That's what prints out the version of the GPU, so I think it is. There's some magic with the whole panfrost_ip struct abstraction (which honestly I don't love).
pH5 has quit [Quit: bye]
<Lyude> HdkR: system received
<alyssa> robher: Hm.
* alyssa has to implement proper job/BO management and is not happy about it
* alyssa procrasinates on Panfrost by doing homework ("..Wait..")
raster has quit [Ping timeout: 246 seconds]
raster has joined #panfrost
raster has quit [Remote host closed the connection]
afaerber has quit [Quit: Leaving]
raster has joined #panfrost
jolan has quit [Quit: leaving]
jolan has joined #panfrost
afaerber has joined #panfrost
<urjaman> ... now i'm figuring out the dithering bits on an RK3188 ...
<alyssa> urjaman: ^_^
<urjaman> *cough* someone *cough* added support for the RK3188 in the mainline whilst it is not supported by rockchip kernel :P
<urjaman> but yeah i think i got it (even though the TRM isnt the most helpful one)
<alyssa> mmind00: Was that you> :3
<mmind00> alyssa: rk3188? ... maybe :-D
<alyssa> :D
<mmind00> 6 years ago I had this nice rk3188 tablet and thought "hmm ... shouldn't take too long to get mainline running on it" ... it only took till december 2018 ;-)
<urjaman> https://github.com/urjaman/linux/commit/7270dbe77822658a55c9147e50d51821985be4b3 i still gotta like do all the tests, checks, writing of sane messages, etc but this is what the actual patch is kinda looking like atm
raster has quit [Remote host closed the connection]
<robher> alyssa, tomeu: the irq problem is very simple. reset was being done before request_irq. On to more interesting interrupts.
<alyssa> robher: ^_^
<alyssa> Erg, where is this bottleneck
<mmind00> urjaman: double empty line in line 210 :-P ... other than that this approach looks very nice on first glance
<mmind00> (line 210 in the header of course)
<urjaman> hmmm ... ever noticed that that header defines enum sacle_up_mode ?
<urjaman> (with the typo that is)
<urjaman> and yeah i got rid of that empty line ... i was presently trying to judge whether i should wrap some of these lines that are technically a bit too long for the kernel
<Lyude> sacle_up pardner
<HdkR> Lyude: Nice!
<Lyude> HdkR: hopefully I can start doing some serious work on the ra stuff soon
<HdkR> Hopefully GDC demos will stop burning me out soon
<Lyude> a note though, I'll probably be on vacation this weekend and next weekend (and probably more then that since I'm going to sweden, but I'm hoping to use the time on the flight)
<Lyude> HdkR: i feel the pain of burn out :(
<Lyude> (also, I'm stuck on trying to figure out which crytal gem to name the new board after ):
<HdkR> :D
Kwiboo has quit [Ping timeout: 246 seconds]
* alyssa has been looking through performance counters
<alyssa> Something obvious I see happening, though, is that late Z testing is being used
<alyssa> Not sure that exploits the performance gap but it could be part
<HdkR> Going to work on early-z? :P
<alyssa> HdkR: See, I don't understand why it's not already enabled
<HdkR> Missing a bit on the command stream side?
<alyssa> Yeah, I guess
<alyssa> HdkR: There are a lot of bits in the cmdstream :)
<HdkR> :D
<HdkR> Trace the blob with one side relying on vertex depth and the other doing fragment depth, see what bits differ? :)
<alyssa> I might have to, yeah
<alyssa> Oh, uh, there's blending?
<HdkR> Does blending disable early-z on Mali? :D
<alyssa> Well, I just disabled blending rather forcibly, and still LZS
<alyssa> Oooo this is good
<alyssa> There's an errata "Fragment frontend heuristic bias to force early-z required", and it applies to this hw
<alyssa> I have *no* idea what a "heuristic bias" is, but this seems relevant
<HdkR> Wonder how forcing early-z works in cases where you can't use early-z :P
<alyssa> Pff
<Lyude> HdkR: btw, any idea what the baud rate on The Fancy New Board defaults to?
<HdkR> Uh, I knew but haven't touched it for a while. Let me see if the wiki shows it
<Lyude> oh-there's already a wiki for this
<Lyude> ?
<HdkR> 115200 probably
<Lyude> hopefully :_
<Lyude> *:)
<HdkR> Definitely not entirely filled out
<Lyude> ahhhh, ok so I can say the name of the board and stuff cool
<Lyude> anyway-was mostly wondering so i could reuse one of the serial consoles on one of the boards I'm not using atm to get serial console access on it
<Lyude> because i always forget to buy a serial adapter when someone tells me they're sending me a board
<HdkR> If it is a hardkernel uart then yes
<HdkR> er, if it is a newer uart that supports the 1.8v
<Lyude> yeah I've got a hardkernel uart from an odroid xu3
<Lyude> oh :(, don't think my voltage goes that low
<Lyude> or high? uart is confusing sometimes
<HdkR> Oh, it's 3.3v on this one actually, just checked their wiki :P
<Lyude> aaaah phew
<Lyude> that I can do
<alyssa> So uh, my Rockpro64 doesn't feel like booting today..