alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
icecream95 has quit [Ping timeout: 260 seconds]
icecream95 has joined #panfrost
stikonas has quit [Remote host closed the connection]
vstehle has quit [Ping timeout: 246 seconds]
yann|work has quit [Ping timeout: 260 seconds]
vstehle has joined #panfrost
yann has joined #panfrost
davidlt has joined #panfrost
buzzmarshall has quit [Remote host closed the connection]
davidlt has quit [Ping timeout: 258 seconds]
chewitt has quit [Quit: Zzz..]
davidlt has joined #panfrost
chewitt has joined #panfrost
_whitelogger has joined #panfrost
cwabbott_ has joined #panfrost
cwabbott has quit [Ping timeout: 246 seconds]
cwabbott_ is now known as cwabbott
<tomeu> alyssa: yeah, I was already there :/
<tomeu> flakiness in power supply could lead to the GPU reading wrong values from the cmdstream or behaving erratically, but I think it's far less likely that it would start overwriting random parts of the cmdstream
<tomeu> that said, I never checked for cmdstream changes in the spurious failures in the gles3 tests on the kevin when running at the highest frequencies
<tomeu> because I don't have a kevin, mainly :p
_whitelogger has joined #panfrost
<tomeu> narmstrong: guess the kbase you are using on the G52 is using the aarch64 page tables?
<tomeu> if so, wonder if it would be too much work to test with the midgard page table format and see if the same erratic behaviour is observed
blaze_cornbread has joined #panfrost
<tomeu> narmstrong: nm, I'm trying to hack aarch64 format support in
<tomeu> may be faster
<tomeu> grr, why am I getting permission faults when the GPU tries to write to the cmdstream?
blaze_cornbread has quit [Quit: blaze_cornbread]
<tomeu> ah, I was using ARM_64_LPAE_S2 instead of ARM_64_LPAE_S1
<tomeu> robher: I get the same erratic behavior with the aarch64 page table format, so it must be something else
bbrezillon has joined #panfrost
<tomeu> robher: narmstrong: robmur01: so, if I check the contents of the command stream *before* submitting to the GPU, in the bad runs it's all zeroes
<tomeu> in the good runs, it's the expected cmdstream
<tomeu> so it has nothing to do with the GPU, and rather with how the panfrost kernel driver allocates those buffers
<narmstrong> tomeu: how is this possible ?
<tomeu> no idea, because it seems to be plain shmem
<tomeu> so maybe mesa is overwriting the cmdstream?
<tomeu> no idea how that could happen, though, in this random fashion
<narmstrong> tomeu: do you dump at panfrost or mesa level ?
raster has joined #panfrost
<narmstrong> tomeu: are you on the Odroid-N2 ?
<narmstrong> tomeu: are you using the Hardkernel U-boot ?
<narmstrong> I hope it doesn't allocate in a reserved memory zone; it can't be, otherwise the kernel and userspace would crash at some point
<narmstrong> tomeu: what do I need to reproduce ? I can try on the VIM3
<narmstrong> I also have the N2 if necessary
<tomeu> narmstrong: yep, on a odroid-n2
<tomeu> haven't tried yet to map the buffer from within the kernel
<tomeu> let me do some sanity checking within mesa before that
<tomeu> narmstrong: afaics, if I make a new mapping, then I read zeroes instead of what I last wrote with a previous mapping
<tomeu> well, not always, maybe 4 out of 5 times
icecream95 has quit [Ping timeout: 240 seconds]
<tomeu> when I read back using the same mapping as when I wrote, then I get the expected contents back fine
cwabbott has quit [Remote host closed the connection]
<narmstrong> wtf
cwabbott has joined #panfrost
<narmstrong> can you share your kernel tree ?
<tomeu> narmstrong: also, using odroid's u-boot
<tomeu> yeah, guess I should go back first to the mali page table format
<narmstrong> if you lose data between maps, it can't be a hw issue
<tomeu> ok, I think it's a hw issue
<tomeu> when it works, the second mmap is at the same address as before
<tomeu> when it doesn't, I get a different address
<tomeu> I suspect some problem with BO caching or so
<narmstrong> all of this is pure sw
<tomeu> yeah, sorry, meant the opposite
<narmstrong> ok
<narmstrong> are you using a stable kernel tree working for midgard ?
<tomeu> it's 5.6-rc5 plus the reset hacks, but it's
<tomeu> ..also the one we use in the Mesa CI
<tomeu> oops, the thing about different mappings and reading zeroes from there was all my fault
<tomeu> too many hacks on top of other hacks
<tomeu> we're back now to zeroes appearing around the fields in the cmdstream that the GPU is expected to be updating
<tomeu> as if writes were spilling around; could it be a problem with the write-back cache not being prepopulated?
cwabbott has quit [Quit: cwabbott]
cwabbott has joined #panfrost
stikonas has joined #panfrost
<tomeu> alyssa: if one allocates a bo for the checksum data, it gets much more reliable
<tomeu> and looks like the bigger the BO, the more reliable it becomes :p
<tomeu> I've been allocating transiently for this experiment, and noticed that if I made it too small, the header descriptor which is allocated next in the transient pool is overwritten with sequences of ff808080 c0008080 :p
<tomeu> which are the values in the new fields in the extra descriptor
<tomeu> alyssa: you should be able to reproduce that with https://gitlab.freedesktop.org/tomeu/mesa/-/commits/bifrost
robmur01_ has joined #panfrost
<robmur01_> tomeu: could it be an insufficient alignment thing? i.e. does the point where the corruption starts look like a rounded-up/rounded-down version of some pointer the GPU was previously given?
maciejjo has quit [Remote host closed the connection]
cwabbott has quit [Ping timeout: 246 seconds]
cwabbott has joined #panfrost
<tomeu> robmur01_: hard to tell because there's a lot of zeroes around the values that change
<tomeu> but there was indeed a clear alignment requirement on the first header descriptor, that I already took care of
<tomeu> hmm, there's a bunch of cache_clean-related functions in mali_kbase_device_hw.c that weren't in the kbase I had before
<tomeu> one more difference: we don't handle BASE_HW_FEATURE_CLEAN_ONLY_SAFE
<tomeu> one more:
<tomeu> +	/* Ensure page-tables reads use read-allocate cache-policy in
<tomeu> +	 * the L2
<tomeu> +	 */
<tomeu> +	transcfg |= AS_TRANSCFG_R_ALLOCATE;
maciejjo has joined #panfrost
Depau_ has quit [Quit: ZNC 1.7.5 - https://znc.in]
Depau has joined #panfrost
cwabbott has quit [Quit: cwabbott]
cwabbott has joined #panfrost
robmur01_1 has joined #panfrost
robmur01_1 has quit [Client Quit]
<narmstrong> the bifrost kbase is slightly different
<narmstrong> kind of astonishing that ARM distributes 2 different versions of kbase...
robmur01_ has quit [Ping timeout: 250 seconds]
robmur01_ has joined #panfrost
robmur01_ has quit [Client Quit]
robmur01_ has joined #panfrost
<chewitt> someone from Amlogic has previously pointed out that you can use the bifrost kbase on midgard too
davidlt has quit [Remote host closed the connection]
enunes has quit [Quit: ZNC 1.7.2 - https://znc.in]
Depau has quit [Quit: ZNC 1.7.5 - https://znc.in]
Depau has joined #panfrost
Depau has quit [Quit: ZNC 1.7.5 - https://znc.in]
Depau has joined #panfrost
enunes has joined #panfrost
yann has quit [Ping timeout: 264 seconds]
<narmstrong> tomeu: https://gitlab.freedesktop.org/narmstrong/mesa/-/jobs/2053874 `Fatal Python error: initfsencoding: unable to load the file system codec`
davidlt has joined #panfrost
enunes has quit [Quit: ZNC - https://znc.in]
<narmstrong> trying to run your stuff on an aarch64 runner
enunes has joined #panfrost
<tomeu> narmstrong: what python version are you using?
<narmstrong> tomeu: the version in the arm_build !
<tomeu> oh, so the same docker image?
<tomeu> well, how could it be? ...
<narmstrong> no idea
<tomeu> hmm, I'm quite lost
<tomeu> let's see if somebody else in this channel has any ideas
<narmstrong> tomeu: what do you use as aarch64 runner ?
<tomeu> narmstrong: I think it's in one of the arm64 servers from packet, that we use to build
<tomeu> daniels: is that right?
<daniels> correct
<daniels> it's one of the fd.o shared runners
Depau has quit [Quit: ZNC 1.7.5 - https://znc.in]
<narmstrong> ok, but what's the soc ? weird it faults on my runner
Depau has joined #panfrost
<daniels> Cavium ThunderX
<narmstrong> ok, can't compete :-p
<daniels> i mean, it's not running Gentoo or anything, it's just a Debian system which should run on any armv8 ...
<daniels> this _shouldn't_ be it, but could you push a script change which executes 'locale' right before it tries to run the Python script which fails?
yann has joined #panfrost
<narmstrong> yeah I know, the system is running ubuntu with a shitload of python already
<narmstrong> I restarted a pipeline, and I'll do that if it still fails
<daniels> on that machine, LANG/LANGUAGE/LC_ALL are all unset, and the rest of the LC_* come out as POSIX
<daniels> you can do this to get a shell in the exact same environment btw: docker run -ti registry.freedesktop.org/narmstrong/mesa/debian/arm_build:2020-03-24 /bin/bash
tomboy64 has quit [Remote host closed the connection]
tomboy64 has joined #panfrost
<narmstrong> ok, python3 faults alone
<narmstrong> ok with registry.freedesktop.org/tomeu/mesa/debian/arm_build:2020-03-24 it's fine :-/
<tomeu> narmstrong: in case you can spot something obvious: https://gitlab.freedesktop.org/tomeu/linux/-/tree/bifrost
<tomeu> it behaves exactly the same with either aarch64 page tables or mali legacy
<daniels> narmstrong: so that's an interesting point of difference then - does 'LC_ALL=C python' work?
<daniels> or 'LANG= python3'
megi has quit [Quit: WeeChat 2.7.1]
megi has joined #panfrost
enunes has quit [Quit: ZNC - https://znc.in]
<narmstrong> daniels: i deleted all my images in the registry and now it's ok, seems something got corrupted
<daniels> narmstrong: bizarre! thanks for working through it though :)
<narmstrong> daniels: tomeu: got the `Serve files for LAVA via separate service` running, but with an internal URL (no https, :8080 port)
<narmstrong> I think the FILES_HOST_NAME and a new FILES_HOST_URL should be runner-specific variables
<daniels> narmstrong: \o/ you should be able to change the URL definition to have your LAVA dispatcher variable pull from there
<daniels> yeah, that would work
<daniels> narmstrong: awesome! thanks a lot!
anarsoul|c has joined #panfrost
cwabbott has quit [Read error: Connection reset by peer]
cwabbott has joined #panfrost
clementp[m] has quit [Ping timeout: 240 seconds]
thefloweringash has quit [Ping timeout: 256 seconds]
thefloweringash has joined #panfrost
clementp[m] has joined #panfrost
mixfix41 has quit [Quit: Leaving.]
robmur01_ has quit [Quit: robmur01_]
robmur01_ has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
rcf has quit [Quit: WeeChat 2.7]
rcf has joined #panfrost
raster has joined #panfrost
chewitt has quit [Quit: Zzz..]
icecream95 has joined #panfrost
chewitt has joined #panfrost
TheKit has quit [Ping timeout: 246 seconds]
icecream95 has quit [Ping timeout: 250 seconds]
icecream95 has joined #panfrost
mias has joined #panfrost
mias_ has quit [Ping timeout: 256 seconds]
adjtm_ has joined #panfrost
adjtm has quit [Ping timeout: 260 seconds]
davidlt has quit [Ping timeout: 250 seconds]
robmur01_ has quit [Quit: robmur01_]
cwabbott has quit [Quit: cwabbott]
raster has quit [Quit: Gettin' stinky!]