austriancoder changed the topic of #etnaviv to: #etnaviv - the home of the reverse-engineered Vivante GPU driver - Logs https://freenode.irclog.whitequark.org/etnaviv
DPA has quit [Ping timeout: 240 seconds]
DPA has joined #etnaviv
mth has quit [Quit: Konversation terminated!]
mth has joined #etnaviv
srk has joined #etnaviv
lynxeye has joined #etnaviv
<marex> lynxeye: hey, morning, so now that I have the devcoredump(s) , what next ? I can run viv-unpack on those and they indicate MMU checks failed with a long list of entries
<austriancoder> marex: you are a really impatient guy
<austriancoder> marex: when doing the viv_unpack it tells you where the fetch engine was stuck .. in your case it is in the ring buffer
<austriancoder> marex: use the cmd stream dumper on the ring buffer and see what the GPU should do at the bad address
<austriancoder> marex: also look at what the MMU fault address is
<austriancoder> marex: this should give you a starting point
<marex> austriancoder: impatient ? I've been at this for a month and a half already with zero progress ...
<lynxeye> marex: I don't think the MMU checker supports MMUv2, so ignore those errors.
<marex> austriancoder: how did you find out it's in the ring ?
<marex> lynxeye: ha, ok
<marex> austriancoder: the cmd stream dumper is this dump_cmdstream.py from etna_viv ?
<austriancoder> marex: viv-unpack tells you that it is in the ring: "* 2 ring 00001000 00001000 4096" -- with a little * at the beginning of the line
<austriancoder> marex: yes
<marex> $ ./tools/dump_cmdstream.py ring.bin gives me 'Magic value 40000005 not recognized'
<marex> maybe the format is different than the FDR files ?
<austriancoder> marex:
<austriancoder> marex: sorry.. should be dump_separate_cmdbuf.py
<marex> austriancoder: ah ok, that crashes on UnicodeDecodeError: 'ascii' codec can't decode byte 0x94 in position 64: ordinal not in range(128)
<marex> austriancoder: I'll take a look into this
<austriancoder> marex: ./dump_separate_cmdbuf.py -b /stuff/ring.bin
<austriancoder> works for me
<marex> austriancoder: oh, yeah, that works
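(For reference, the extraction flow used above as a minimal sketch. It assumes viv-unpack takes the devcoredump file as its only argument and writes ring.bin / bo-*.bin into the current directory, and that dump_separate_cmdbuf.py sits there as well; tool names, flags and output filenames are taken from this conversation, not verified against the repositories.)

    #!/usr/bin/env python3
    # Sketch of the triage flow discussed above: unpack the devcoredump,
    # then decode the extracted ring buffer with etna_viv's command stream dumper.
    # The viv-unpack command line and the ring.bin filename are assumptions.
    import subprocess
    import sys

    def triage(devcoredump_path):
        # Step 1: unpack the devcoredump; the tool is expected to print which
        # buffer the FE was stuck in (the line marked with '*') and to write
        # ring.bin / bo-*.bin files.
        subprocess.run(["viv-unpack", devcoredump_path], check=True)

        # Step 2: decode the ring buffer as a bare command stream.
        subprocess.run(["./dump_separate_cmdbuf.py", "-b", "ring.bin"], check=True)

    if __name__ == "__main__":
        triage(sys.argv[1])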
pcercuei has joined #etnaviv
<marex> austriancoder: all right, the MMU fault address is nowhere in the ring dump
<marex> but there is a BO with that address
<marex> bo-fd945000.bin in this dump here
<lynxeye> marex: What's the thing that is supposed to be executed at that ring address or slightly before?
<lynxeye> A cache flush?
<marex> lynxeye: what address ?
<lynxeye> the ring address where the FE stopped
<marex> lynxeye: that's the MMU fault address? that address is not in the ring dump
<austriancoder> marex: 00000660 = 00000815 Cmd: [stall DMA: idle Fetch: valid] Req idle Cal idle
<austriancoder> 00000664 = 00001138 Command DMA address
<austriancoder> 00001138 is the address
<marex> 1138 is also not in the ring dump
<austriancoder> 00001138 = the ring address where the FE stopped
<lynxeye> marex: 1138 is an address in the ring
<lynxeye> What command is at that address?
<marex> dump_separate_cmdbuf.py -b ring.bin | grep 1138 gives 0 results
<austriancoder> that's not how it works..
<marex> that's not what the documentation says ;-)
* austriancoder is in a video conference the next 15 minutes
<marex> https://paste.debian.net/hidden/6e983c05/ could it be line 85 here ?
<marex> lynxeye: ^
<austriancoder> marex: https://paste.debian.net/1192588/
<austriancoder> lynxeye: some flushing is happening
<marex> austriancoder: so where is the 1138 in that ?
<austriancoder> marex: 00001138 Command DMA address ... ring starts at 00001000 --> offset 138
<marex> austriancoder: how do you get from 0x1000 to offset 138 ?
<marex> austriancoder: how do you get from 0x1000 to offset 0x138 ?
<austriancoder> as I wrote.. the FE DMA engine stopped at 0x1138 .. that address is in the ring (which starts at 0x1000) -> offset = 0x138
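(The lookup being described is just the subtraction below. The register values 0x660/0x664 and the addresses are the ones quoted above; the variable names are only for illustration.)

    # Sketch of the address math above: the FE "Command DMA address" register
    # (0x664) holds a GPU virtual address, while dump_separate_cmdbuf.py prints
    # offsets relative to the start of the buffer, so the two must be reconciled.
    RING_BASE = 0x1000        # GPU VA where the kernel ring buffer is mapped
    FE_DMA_ADDRESS = 0x1138   # register 0x664 from the devcoredump register dump

    ring_offset = FE_DMA_ADDRESS - RING_BASE
    print(f"FE stopped at ring offset {ring_offset:#x}")   # -> 0x138
    # Look at that offset (and the commands just before it, e.g. a cache flush)
    # in the dump_separate_cmdbuf.py output, not for the literal value 0x1138.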
lrusak has quit [Ping timeout: 248 seconds]
<marex> austriancoder: d'oh ...
<austriancoder> next video call is coming quickly
<marex> sigh
chewitt has quit [Quit: Adios!]
<lynxeye> marex: Is there a buffer at the MMU fault address?
<marex> there is bo-fd945000.bin produced by viv_unpack, which matches the MMU fault address
<marex> lynxeye: ^
<lynxeye> hm, maybe now is the time to add MMUv2 support to the checker...
<lynxeye> If the BO is in the dump, it's clearly still alive and there is no MMU context switching going on in your ring dump
<lynxeye> so I'm not sure why this would fault
<lynxeye> What's the fault status?
<marex> 2
<marex> lynxeye: 2
<lynxeye> which is a page not present, so either the pagetables are really wrong, or we are hitting some more obscure GPU bug
<marex> lynxeye: what exactly does "page not present" mean ?
<lynxeye> marex: Some engine of the GPU tried to access an address where no valid VA->PA translation is present in the pagetables.
<marex> lynxeye: the GPU page tables, right ?
<lynxeye> yep
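(A quick sketch of what "fault status 2" decodes to, assuming the usual Vivante MMU fault encoding as reported by the etnaviv kernel driver: one 4-bit nibble per MMU, with the non-zero value indexing a reason table. The nibble layout and the table below are assumptions borrowed from the driver; only the "page not present" meaning of value 2 is taken from this conversation.)

    # Sketch: decoding a Vivante MMU fault status word.
    FAULT_REASONS = [
        "slave not present",
        "page not present",        # <- the status 2 seen in this dump
        "write violation",
        "out of bounds",
        "read security violation",
        "write security violation",
    ]

    def decode_mmu_status(status):
        for mmu in range(4):
            nibble = (status >> (mmu * 4)) & 0xF
            if not nibble:
                continue
            reason = FAULT_REASONS[nibble - 1] if nibble - 1 < len(FAULT_REASONS) else "unknown"
            print(f"MMU {mmu}: fault reason {nibble} = {reason}")

    decode_mmu_status(0x2)   # -> MMU 0: fault reason 2 = page not present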
<marex> lynxeye: could there be some race ?
<marex> i.e. the pagetables are populated too late ?
<lynxeye> I wouldn't rule out that possibility. But what's happening here is that you return from a user command stream and only the forced cache flush hits the MMU fault. I would expect that the user command stream did in fact write something into the buffer already, so the page entries should have been there.
<lynxeye> So from that it seems more likely that the pagetables get depopulated too early, but I'm not sure how this would happen, as long as the BO is alive. We don't depopulate pagetable entries for live BOs on MMUv2 IIRC.
<marex> lynxeye: I am running glmark on weston, I would expect that to be rather linear, i.e. not something that would trigger races
<marex> lynxeye: could it be one of the locking issues we had with BOs ?
<marex> but then, with glmark ... that sounds odd
<lynxeye> I don't think so. The kernel's view of the buffer's alive status seems to match what is programmed into the GPU hw.
<lynxeye> Really the first thing now would be to actually type up that MMU checker for MMUv2, to see if the pagetables look sane.
<marex> lynxeye: is that something that goes into viv_unpack, or does it also need changes in the devcoredump kernel part ?
<lynxeye> marex: The kernel already dumps the pagetables. viv_unpack needs to learn the new dump format for MMUv2.
<marex> oh
<lynxeye> MMUv1 is just a large linear one level pagetable. MMUv2 is a two level pagetable. First page in the dump is first level. Following pages are the populated second level entries.
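(Based on that description, a checker for the MMUv2 part of the dump could look roughly like the sketch below. The two-level walk, the page order in the dump and the 32-bit entries with a present bit in bit 0 are assumptions modelled on the kernel's MMUv2 code; the actual dump format would need to be confirmed against etnaviv_dump / etnaviv_iommuv2 before relying on it.)

    # Rough sketch of an MMUv2-aware check on the viv-unpack MMU dump:
    # page 0 of the dump is the 1024-entry first-level (MTLB) table, the
    # following 4K pages are the populated second-level (STLB) tables, assumed
    # to appear in the order of their MTLB entries. Entry format (32-bit LE,
    # present bit in bit 0, address in the upper bits) is an assumption.
    import struct
    import sys

    PAGE = 4096
    PRESENT = 1 << 0

    def load_pages(path):
        data = open(path, "rb").read()
        return [data[i:i + PAGE] for i in range(0, len(data), PAGE)]

    def entries(page):
        return struct.unpack("<1024I", page)

    def check_address(pages, va):
        mtlb_idx = (va >> 22) & 0x3FF     # which 4MB region
        stlb_idx = (va >> 12) & 0x3FF     # which 4KB page inside it
        mtlb = entries(pages[0])

        if not (mtlb[mtlb_idx] & PRESENT):
            print(f"{va:#010x}: MTLB entry {mtlb_idx} not present")
            return False

        # Assumption: the N-th populated MTLB entry corresponds to dump page N+1.
        stlb_page = 1 + sum(1 for e in mtlb[:mtlb_idx] if e & PRESENT)
        if stlb_page >= len(pages):
            print(f"{va:#010x}: dump has no STLB page for MTLB entry {mtlb_idx}")
            return False

        stlb = entries(pages[stlb_page])
        if not (stlb[stlb_idx] & PRESENT):
            print(f"{va:#010x}: STLB entry {stlb_idx} (dump page {stlb_page}) not present")
            return False

        print(f"{va:#010x} -> PA {stlb[stlb_idx] & ~0xFFF:#010x}")
        return True

    if __name__ == "__main__":
        check_address(load_pages(sys.argv[1]), int(sys.argv[2], 16))

(Run against the MMU portion of the devcoredump and the fault address from the register dump, this would show whether the pagetables really miss the entry or whether the fault points at a different problem.)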
berton has joined #etnaviv
karolherbst has quit [Quit: duh 🐧]
karolherbst has joined #etnaviv
lrusak has joined #etnaviv
chewitt has joined #etnaviv
gbisson has quit [Remote host closed the connection]
flto has quit [Remote host closed the connection]
gbisson has joined #etnaviv
srk has quit [Ping timeout: 260 seconds]
kherbst has joined #etnaviv
karolherbst has quit [Ping timeout: 260 seconds]
kherbst has quit [Client Quit]
karolherbst has joined #etnaviv
pcercuei has quit [Quit: dodo]
berton has quit [Remote host closed the connection]
lynxeye has quit [Quit: lynxeye]