austriancoder changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://freenode.irclog.whitequark.org/etnaviv
JohnnyonFlame has quit [Ping timeout: 252 seconds]
chewitt has joined #etnaviv
chewitt has quit [Quit: Adios!]
chewitt has joined #etnaviv
agx_ has quit [Ping timeout: 246 seconds]
agx_ has joined #etnaviv
JohnnyonFlame has joined #etnaviv
JohnnyonFlame has quit [Ping timeout: 260 seconds]
pcercuei has joined #etnaviv
lynxeye has joined #etnaviv
<marex> OK, so yesterday I continued on this MMU stuff ...
<marex> I think I am still confused by it and I didn't manage to find any useful information on this
<marex> but it seems there is a BO that is missing in the MMUv2 tables
<marex> there is apparently an MTLB entry, but no STLB entry, but maybe I'm decoding the information wrong
<marex> lynxeye: austriancoder: ^
<lynxeye> marex: Did you take into account that the devcoredump only contains STLB entries for the populated STLB slots?
<marex> lynxeye: yeah
<marex> lynxeye: there is one 4k page for MTLB and then populated 4k pages for STLB, right ?
<lynxeye> marex: yep
<lynxeye> Okay, so you have a BO referenced by the current job, but no page table entries? This smells like a kernel driver bug, the page table entries should not disappear before the BO...
<lynxeye> Is this just a single BO and all the others have pagetable entries?
<marex> lynxeye: let me decode this again and pastebin it, but yes, that's what it is
<marex> lynxeye: so e.g. I get this
<marex> [ 2263.976187] etnaviv-gpu 59000000.gpu: MMU fault status 0x00000002
<marex> [ 2263.980880] etnaviv-gpu 59000000.gpu: MMU 0 fault addr 0xfe5b0f80
JohnnyonFlame has joined #etnaviv
<marex> and then
<marex> viv-unpack etnaviv-20210426232145.bin . ; hexdump -C mmu.bin
<marex> produces
<marex> and also there is this
<marex> so I suppose, I am looking at MTLB entry
<marex> 00000fe0  01 40 86 fb 01 50 86 fb  01 e0 85 fb 01 f0 85 fb  |.@...P..........|
<marex> 0xfe4                 ^^^^^^^^^^^
<marex> that one is PRESENT
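The lookup marex describes can be reproduced with a short sketch. It assumes the two-level MMUv2 layout as used by the Linux etnaviv driver (10-bit MTLB index in bits 31:22, 10-bit STLB index in bits 21:12, 4 KiB pages, PRESENT/EXCEPTION/WRITEABLE in the low three bits of an entry); the helper names `split_address` and `decode_entry` are hypothetical and exist only for illustration.

```python
import struct

# Assumed MMUv2 layout (not stated in the log): 10-bit MTLB index,
# 10-bit STLB index, 12-bit page offset, flags in the low bits.
MTLB_SHIFT = 22
STLB_SHIFT = 12
INDEX_MASK = 0x3FF
PTE_PRESENT = 1 << 0
PTE_EXCEPTION = 1 << 1
PTE_WRITEABLE = 1 << 2


def split_address(iova):
    """Split a 32-bit GPU virtual address into (MTLB index, STLB index, page offset)."""
    mtlb_index = (iova >> MTLB_SHIFT) & INDEX_MASK
    stlb_index = (iova >> STLB_SHIFT) & INDEX_MASK
    page_offset = iova & 0xFFF
    return mtlb_index, stlb_index, page_offset


def decode_entry(raw):
    """Decode one little-endian 32-bit table entry into its address and flag bits."""
    (value,) = struct.unpack("<I", raw)
    return {
        "value": value,
        "address": value & 0xFFFFF000,
        "present": bool(value & PTE_PRESENT),
        "exception": bool(value & PTE_EXCEPTION),
        "writeable": bool(value & PTE_WRITEABLE),
    }


fault_addr = 0xFE5B0F80  # from the "MMU 0 fault addr" kernel message
mtlb_index, stlb_index, page_offset = split_address(fault_addr)
print(hex(mtlb_index * 4))  # byte offset of the MTLB entry in mmu.bin: 0xfe4
print(hex(stlb_index))      # slot within the second-level table: 0x1b0

# The 16 bytes hexdump printed at offset 0xfe0; bytes 4..8 are the entry at 0xfe4.
row = bytes.fromhex("014086fb015086fb01e085fb01f085fb")
entry = decode_entry(row[4:8])
print(hex(entry["value"]), entry["present"])  # 0xfb865001 True
```

Under these assumptions the fault address lands on the MTLB entry at byte offset 0xfe4, and that entry has the PRESENT bit set, which agrees with the reading above.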
<marex> and since I have 14 MTLB entries in total and this is the 8th, the STLB offset is 0x8000
<marex> and the entry there is 0x2, EXCEPTION
<marex> but there is a BO which is supposed to be there, 5 bo fe4e9000 00100000 1048576
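The last two steps of this analysis can be checked with a small sketch. It assumes the dump layout discussed earlier in the log (one 4 KiB MTLB page followed by one 4 KiB page per populated STLB, in MTLB order) and 4-byte STLB entries indexed by address bits 21:12; the function names are hypothetical, and the rank of 8 is taken from marex's count rather than derived here.

```python
PAGE_SIZE = 0x1000   # 4 KiB pages, as stated in the log
STLB_SHIFT = 12      # assumed: STLB index lives in address bits 21:12
INDEX_MASK = 0x3FF


def stlb_entry_dump_offset(populated_rank, stlb_index):
    """Byte offset in mmu.bin of one STLB entry.

    populated_rank is the 1-based position of the owning MTLB entry among
    the PRESENT MTLB entries; page 0 of the dump is the MTLB itself.
    """
    return populated_rank * PAGE_SIZE + stlb_index * 4


def bo_contains(bo_start, bo_size, addr):
    """True if addr falls inside the half-open range [bo_start, bo_start + bo_size)."""
    return bo_start <= addr < bo_start + bo_size


fault_addr = 0xFE5B0F80
stlb_index = (fault_addr >> STLB_SHIFT) & INDEX_MASK

# 8th populated STLB page starts at 0x8000; the faulting slot is inside it.
print(hex(stlb_entry_dump_offset(8, stlb_index)))        # 0x86c0

# BO 5 from the dump: start 0xfe4e9000, size 0x00100000 (1048576 bytes).
print(bo_contains(0xFE4E9000, 0x00100000, fault_addr))   # True
```

With these inputs the faulting address lies inside the 1 MiB BO, so an STLB slot marked EXCEPTION at that position is consistent with the mapping having gone missing while the BO was still referenced.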
Chewi has quit [Ping timeout: 245 seconds]
chewitt has quit [Quit: Adios!]
<marex> austriancoder: lynxeye: ^
<lynxeye> marex: I agree with your analysis. This looks very suspicious.
<lynxeye> However, I currently can't see how we get into this situation. Looks like I need to do an audit on the kernel driver MMU code...
<marex> lynxeye: got any hints for debugging this ?
berton has joined #etnaviv
pcercuei has quit [Quit: brb]
pcercuei has joined #etnaviv
<marex> lynxeye: got any hints for debugging this ?
<lynxeye> marex: Not really, yet. One thing I wonder, however, is whether you can make it more likely to hit this issue by disabling the userspace BO cache.
<mntmn> marex: sorry but which gpu are you working with atm? gc7000?
<marex> mntmn: gc400
<marex> lynxeye: is there an env var for that or do I have to hack it out ?
<mntmn> ok
<lynxeye> marex: no env var afair
JohnnyonFlame has quit [Ping timeout: 252 seconds]
JohnnyonFlame has joined #etnaviv
lynxeye has quit [Quit: lynxeye]
Chewi has joined #etnaviv
JohnnyonFlame has quit [Read error: Connection reset by peer]
JohnnyonFlame has joined #etnaviv
pcercuei has quit [Quit: dodo]
paulk-leonov has quit [Ping timeout: 252 seconds]
paulk-leonov has joined #etnaviv