austriancoder changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://freenode.irclog.whitequark.org/etnaviv
pcercuei has quit [Quit: dodo]
T_UNIX has joined #etnaviv
pcercuei has joined #etnaviv
smurray has quit [Ping timeout: 272 seconds]
austriancoder has quit [Ping timeout: 272 seconds]
smurray has joined #etnaviv
austriancoder has joined #etnaviv
berton has joined #etnaviv
lynxeye has joined #etnaviv
<marex> lynxeye: hey
<marex> lynxeye: I have a few fixes for that PGC, but I also noticed one thing about the hang
<marex> lynxeye: it seems like the hangs are clock related, as if there is some weird clock glitch
<marex> lynxeye: it almost seems like it might be particularly bad idea to let the PGC and etnaviv manage the clock at the same time
<lynxeye> marex: I'm all ears
<marex> austriancoder: also, MR6454, that was reproducible only after some weeks of runtime (ew)
<marex> lynxeye: all ears regarding the clock or what ?
<lynxeye> Yep, I would be very intersted to know what you found
<marex> lynxeye: well look at what the TFA does , I suspect they enable all the clock in gpumix for that exact reason
<marex> lynxeye: because if you keep enabling and disabling the clock, the GPU cluster becomes unstable
<lynxeye> driving the clocks from both PGC and etnaviv is what we do on i.MX6 and i.MX8M
<marex> lynxeye: except there you drive the clock for the entire cluster, right ?
<lynxeye> marex: is it the _bus clock that does funky things for you?
<marex> lynxeye: I didnt identify the clock just yet
<marex> lynxeye: btw you should sync the HSK only after you turn the PGC PUP request on
<marex> lynxeye: and tear the HSK down before PDN
<marex> my understanding of that HSK is that it disconnects some bus bridge
<lynxeye> marex: yeah, I thought about splitting power up/down in the gpc driver to better handle this sequencing
<lynxeye> yep, the HSK drives the amba domain bridges
<marex> lynxeye: thats likely what needs to be done
<marex> lynxeye: the current way is awful
<lynxeye> agreed
<marex> lynxeye: but it seems like maybe we should rather do as the imx-scu , with .start / .stop
<marex> because with the current setup, the platform_device_add turns the power domain on and off on boot for no good reason
<marex> but the comment in soc-imx-scu.c explains that all NXP drivers are broken and needs to be fixed first :(
<marex> lemme find that one
<marex> ah, here drivers/firmware/imx/scu-pd.c
<marex> lynxeye: I sent you the patches I have now, still needs work
<marex> back to digging in the clock stuff
<lynxeye> marex: Re patch 2: shouldn't regmap already guard the register access?
<marex> lynxeye: I _think_ I've seen concurrent access there, so I just tossed the mutex in to be sure
<marex> lynxeye: and no, I think regmap only serializes specific accesses, not across the entire function call
<marex> lynxeye: i.e. regmap_update_bits is called with a lock help, but the entire Pxx function isn't called with a lock
<marex> lynxeye: I suspect if you get both gpu-2d and gpu-3d access the HSK at the same time, that might mess things up
<marex> for example
<lynxeye> marex: And patch 3 is wrong from my understanding. This bit needs to be set in before the power down, otherwise the domain won't power down when you trigger the pdn_req
<marex> lynxeye: shouldn't the bit then be set unconditionally ?
<marex> lynxeye: the PCR needs to be set before power up at least, and likely also before power down then
<marex> lynxeye: surely enable_power_control (which = !on) is wrong too
<lynxeye> marex: the doc says the bit should be asserted before pdn_req and should not be changed until the domain is completely powered up again.
<lynxeye> so yep, setting it all the time looks like the right thing to do
<marex> lynxeye: lemme recheck what NXP does in their TFA too
<lynxeye> marex: they switch it before powering up, which is inconsitent with the docs
<marex> lynxeye: yep :)
<marex> lynxeye: I would expect that is the sensible thing to do though, no?
<marex> lynxeye: I mean, if that bit enables "power control"
<lynxeye> marex: I think we should just both enable the CPU mapping and power control on probe of the domain drivers
<lynxeye> I see no reason to ever switch those two things while the driver is active
<marex> lynxeye: I suspect it is TFA defensive programming
<marex> lynxeye: btw do you use NXP TFA fork or upstream one ?
<marex> lynxeye: because they do different PGC init
<lynxeye> marex: I'm still using downstream NXP TFA, because I can't get the DRAM to reclock with upstream TFA
<lynxeye> and too much other fires right now
<marex> yep, very much the reason I got back to you only nowish
<marex> lynxeye: I'll keep digging and let you know if I find something
<marex> lynxeye: lets ask daniel in linux-imx about the PGC
<daniels> lynxeye: speaking of downstream imx, did you ever get a chance to test the last dcss patchset & push to drm-misc?
<marex> daniels: dcss is the DSI stuff ?
<daniels> marex: it's the display controller in imx8mq (and maybe also imx8m for some display types?)
<lynxeye> daniels: Still on my list. I dropped the ball a bit, as there were still discussions ongoing while I was on vacation and I'm still trying to dig out from the usual after-holidays crazyness
<daniels> fair enough, good luck :)
<lynxeye> nope, dcss is only on the i.MX8MQ. All the others only have the simple eLCDIF controller
<daniels> ah, I thought there was one which had elcdif + dcss for different display paths
<daniels> but maybe that's the only bit about imx8 they made even a little bit simple ;)
<lynxeye> the MQ has both DCSS and eLCDIF
<daniels> ah, right
<marex> lynxeye: wasnt lcdif paralel RGB only ?
<lynxeye> apparently DCSS was a bit too big/power hungry for the scaled down variants
<marex> lynxeye: ah duh, that pdn_req, its not asserted only by PGC I think, there might be others
<marex> lynxeye: I need to check again
shoragan has quit [Ping timeout: 240 seconds]
shoragan has joined #etnaviv
_daniel_ has joined #etnaviv
JohnnyonFlame has quit [Read error: Connection reset by peer]
<marex> lynxeye: um, btw, SRC_GPU_RCR is shared between gpu2d and gpu3d , and the reset is only released after both GPU PDs are up
<marex> lynxeye: maybe thats why you cant turn them on/off separately
<lynxeye> marex: Yea, I'm not exactly sure about this thing. We don't really use the SRC reset, exactly because it's shared (even on MX6 the GPUs have a shared reset from the SRC), but I'm not sure what the PGC does to those resets
<marex> lynxeye: well I wouldn't be surprised if it did some de-glitch
<marex> lynxeye: which could explain the odd behavior of gpu2d
<lynxeye> marex: yea, maybe we actually need to smash those domains together, like we did with the PCIe domains on 8mq
<lynxeye> which would be a shame, as now we have 3(!) power domains for the GPUs, but still can't gate them separately
<marex> lynxeye: how does it not surprise me at all
<marex> lynxeye: I think maybe the people who implemented the TFA code can tell us more about that reset
_daniel_ has quit [Quit: Leaving.]
<lynxeye> marex: eLCDIF is just the display controller with parallel output. On i.MX8MM there is a MIPI DSI bridge attached to the controller, on MX8MP they even added the LVDS bridge again.
<marex> ha
JohnnyonFlame has joined #etnaviv
lynxeye has quit [Quit: Leaving.]
<mntmn> on i.MX8MQ, either DCSS or LCDIF can drive MIPI-DSI, but only DCSS can drive HDMI/DP.
<marex> this PGC is total madness
<marex> turn on VPUMIX, PU indicates GPUMIX PD failed to turn on
<marex> wt-actual-f
T_UNIX has quit [Quit: Connection closed for inactivity]
berton_ has joined #etnaviv
berton has quit [Ping timeout: 264 seconds]
berton_ has quit [Client Quit]
_daniel_ has joined #etnaviv
<cphealy> marek: For your MR6454, what i.MX platform was this on?
<mntmn> minetest merged workaround for etnaviv/gc7000 https://github.com/minetest/minetest/pull/10036
<mntmn> now just needs the mesa thing merged...
<mntmn> kicad also merged workaround for etnaviv https://gitlab.com/kicad/code/kicad/-/merge_requests/252
<mntmn> but sadly glamor/xorg needs 2 fixes that will never be accepted i guess
<mntmn> (remove 1 glClear() and add 1 glFinish())
<austriancoder> cphealy: should not matter as it is a general multi-context problem
<cphealy> I was asking as we were doing some work with qtwebengine too and I'm curious which cores it effects. GC2000? GC3000? GC7000L?
<austriancoder> cphealy: all.. as it is a general driver problem
<austriancoder> not connected to any specific core
<cphealy> ack
_daniel_ has quit [Quit: Leaving.]
<marex> austriancoder: thanks for the review, I'll update the patch and then ask for another month of testing (since thats how long the original one was under test)
<austriancoder> marex: hmm.. I know such test from day job and it sucks to no have a faster reproducer
<marex> austriancoder: it's OK, I have a bugfix locally, upstream will have to wait a month or two
<austriancoder> marex: we can land the first patch really fast if thats okay for you
<marex> austriancoder: the first patch (remove whatever function) is useless without the locking, FYI
<austriancoder> marex: I know - thats why we can land it faster
<marex> sounds totally pointless to me :)
<austriancoder> okay
<austriancoder> so you can drop this patch at all :)
<marex> no, because the function has broken locking
<marex> and I'm not gonna fix it, since its unused
<austriancoder> marex: okay.. but if its not used we can drop it now - or not? I am happy with the change when you (or I) remove the stable marker
<marex> its just waiting for someone to use it and break the state, just remove it
<marex> I was really considering adding the same fixes: tag to it
<austriancoder> aha .. so shall we drop it now in master or do you want to wait until you get test feedback?
<marex> it doesn't matter without the locking fix ; with the locking fix, the function is broken
<marex> without it, it was also broken
<marex> I find applying useless half of patchset minus actual bugfixes kinda pointless
<austriancoder> marex: puhh.. in maser etna_resource_get_status(..) has no callers -> dead code -> can be remove with out your other patch. correct?
<marex> austriancoder: I pushed what I think is harmless update to the bugfix patch, I didn't test it, it should address most of your concerns
<marex> I'm rather concerned about the performance of this enormous convoluted locking than about newlines
<austriancoder> marex: okay.. but do you want me to wait ~1 month to get your test feedback before landing it? If yes I would love to land "etnaviv: Remove etna_resource_get_status()" the next 2-3 days.
<marex> austriancoder: I believe with this update, you can land them both
<marex> austriancoder: if you need any more involved changes, then you need to wait
<marex> austriancoder: because putting untested crap upstream isn't good
<austriancoder> marex: thats why I ask
* austriancoder hates untestes crap
<marex> look at the changes, they're newlines + that bool, that should be harmless
<marex> austriancoder: note that the patch also should be easy to backport, so splitting functions isn't what I want to see in a bugfix
<marex> austriancoder: although I do agree that the function you pointed out is horrific
<austriancoder> marex: patch is okay.. but you have now to many new lines . I only wanted them after the if's (as I wrote in the review)
<austriancoder> /* if resource has no pending ctx's reset its status */
<austriancoder> rsc->status &= ~ETNA_PENDING_READ;
<austriancoder> if (_mesa_set_next_entry(rsc->pending_ctx, NULL) == NULL)
<austriancoder> mtx_unlock(&rsc->lock);
<austriancoder> marex: but lets land it
<marex> austriancoder: and backport to stable , should still apply to 20.1.y at least
<marex> I still think the locking should be somehow reworked and it should be possible to make it much simpler ... or ?
<austriancoder> marex: yep
<marex> in fact, how come mesa doesn't have something generic ?
<marex> I mean, this must be a problem with other GPUs too
<austriancoder> marex: but that needs to wait or we have a fix after an other round of testing
<marex> austriancoder: well obviously
<marex> austriancoder: I'm just concerned that this convoluted locking will have other even nastier warts
<marex> austriancoder: and it is just too difficult to reason about the correctness about this
<austriancoder> marex: me too .. but I dont want to block/delay your work that fixes a real world problem. performance wise you should have a feeling if it got much worser or not
<marex> austriancoder: performance-wise its the same
<marex> austriancoder: got any ideas about how to go about the locking rework ?
<austriancoder> marex: I think batching would be the way to look into - like done in other drivers
<marex> austriancoder: iirc robclark explained to me at some point there's a specific reason for batching in freedreno
<marex> but I was too green back then to understand it fully (still am)
<marex> austriancoder: and that it might not be necessary for vivante
<austriancoder> aha
<marex> austriancoder: it had to do with being able to interrupt the command stream and flush it at any point I think
<marex> you should be able to do it on vivante, and not on freedreno
<austriancoder> I am not sure about this statement .. but for the moment I am fine with it as I want to work on other etnaviv stuff :)
<marex> right, because on vivante, we generate the entire BO with the command stream and then flush it into the kernel
<marex> so uh ... the generation would have to be turned into something like ... ummm
<marex> somehow the generation of the command stream would have to be changed so it's not just adding into a BO
<marex> or something
<marex> hmmmm
<marex> austriancoder: I dont think I want to touch the locking for a bit, it makes my stomach turn
<marex> austriancoder: but uh, isn't the ^ some form of batching already ?
<austriancoder> marex: hmm.. na... it is more about describing of draws/clears etc. in a batch that descripes dependencies to resources etc.
<austriancoder> but.. I have put it on my long list of todos
<marex> austriancoder: well right now the draw calls almost directly pipe stuff into the final command stream, so somehow batching them should avoid the locking at the command stream end
<marex> austriancoder: I think we are talking roughly about the same
<mntmn> austriancoder: btw maybe i can pile some more work on top... any news about https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5456 ?
<austriancoder> mntmn: r-b
<mntmn> austriancoder: awesome!
<mntmn> austriancoder: what happens after that? :D
<austriancoder> mntmn: update the change with my r-b, remove the wip and I will do the rest
<marex> mntmn: add the RB to your commit (git commit --amend ...) and repush
<marex> mntmn: that's necessary to increase efficiency of the process ... or something :)
<austriancoder> bed time now
<mntmn> ok, will do, thanks for explaining!
<mntmn> marex: correct like this?
<marex> mntmn: like what ? :)
<marex> mntmn: ah well, I guess
<mntmn> ok
<marex> well where is lynxeye when you need him ...
<marex> the MX8MM GPCv2 can fail because if the gpu2d exits imx_gpc_pu_pgc_sw_pxx_req() in one thread and is just before pm_runtime_put() , and gpu3d enters imx_gpc_pu_pgc_sw_pxx_req() and is just before pm_runtime_get_sync(), then depending on the order, the gpumix might just be enabled and disabled right away, followed by gpu3d PU enabling, which obviously fails
<marex> so PD nesting might need some locking work