alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
tgall_foo has quit [Read error: Connection reset by peer]
stikonas has quit [Remote host closed the connection]
stikonas_ has quit [Remote host closed the connection]
<urjaman> icecream95: because it broke
<icecream95> urjaman: What broke?
<urjaman> that is, it was/is a regression while doing fancy new things
<urjaman> register allocation/spilling for one of X shaders
<icecream95> I haven't used X for a few weeks, so I would have missed any regression for it.
<icecream95> X definitely worked fine on Mesa master just after LCRA was merged, and I'm pretty sure it was still working for me (at least LightDM) on Nov 26, which is after reports of X not working started coming in.
<urjaman> it broke before LCRA was merged (i guess it depends if you use anything that triggers the glamor gradient shader...)
megi has quit [Ping timeout: 268 seconds]
vstehle has quit [Ping timeout: 268 seconds]
tgall_foo has joined #panfrost
vstehle has joined #panfrost
<tomeu> alyssa: ah no, was referring to races in the kernel that manifest when jobs time out
<urjaman> i have a couple of dmesg's of some very random looking kernel failures lately, but my kernel is 5.4.0 + the very first version of bbrezillon's patches from way back so i dunno how relevant they are
<urjaman> eg. this one https://urja.dev/this_ended_badly.txt search for: "BUG: Bad page state in process Compositor"
<urjaman> (i pulled that from journalctl after the laptop just rebooted itself on its own...)
<icecream95> You can recover logs even after kernel panics by looking at /sys/fs/pstore/console-ramoops-0
<urjaman> /sys/fs/pstore doesnt exist
<urjaman> so i guess i dont have that enabled or it's not relevant to how i boot ...
<icecream95> Have you got CONFIG_PSTORE enabled in your kernel config?
<urjaman> it's not mentioned in the kernel .config at all?
<urjaman> i'm guessing you're talking about something new in 5.5 or something in a vendor kernel? ...
* urjaman googles
<icecream95> It's been in mainline for a while, even 5.1 had it.
<urjaman> ah yeah, i've disabled all misc filesystems so that explains why it's not enabled at all
<urjaman> i dunno if i have a platform driver for it tho, lets see
<icecream95> Re "it's not relevant to how i boot": are you not using coreboot?
<urjaman> u-boot
<urjaman> also, that driver looks very x86
<icecream95> On arm, coreboot passes the pstore information to the kernel.
<urjaman> yeah but i'm not using coreboot
<urjaman> hmm looks like i could just specify a spot of ram in the dts, but *shrug* (it'd need to be one that nothing in the reset flow touches and i have no clue right now :P)...
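For anyone following the pstore suggestion above: once pstore and a backend are set up, the saved console is an ordinary file that can be read after the reboot. A minimal C sketch, assuming the path icecream95 gives; `dump_file` is a hypothetical helper written for this illustration, not part of any kernel or Mesa API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: copy the file at `path` to `out`.
 * Returns the number of bytes copied, or -1 if the file cannot be
 * opened (for example when /sys/fs/pstore does not exist). */
long dump_file(const char *path, FILE *out)
{
    assert(path != NULL && out != NULL);

    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;

    char buf[4096];
    long total = 0;
    size_t n;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        fwrite(buf, 1, n, out);
        total += (long)n;
    }
    fclose(in);
    return total;
}
```

Calling `dump_file("/sys/fs/pstore/console-ramoops-0", stdout)` would print the previous boot's console if the file is there, and return -1 on a setup like urjaman's where it is not.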
<HdkR> tlwoerner: Was that assert about the assertions/illegal instructions you had before?
<tlwoerner> HdkR: yes. what a silly assert to not bother printing the value that fell through the switch statement!
<icecream95> urjaman: If you get kernel panics and you want to see what happened, at least you know who to ask to test it for you. :)
<tlwoerner> my plan was to tweak that assert (and others?) to print the failing value so the message could be more helpful
<HdkR> tlwoerner: That assert itself might be an easy fix if the hardware supports the 3 other wrap modes, which the current defines imply it might
<tlwoerner> HdkR: the "illegal instruction" occurred with a "--buildtype plain" and the assert is with "--buildtype debugoptimized"
<HdkR> yea, that happens
<HdkR> This is why in my own project all asserts are string parsing capable so you get some textual output before it explodes :)
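The idea tlwoerner and HdkR describe, printing the value that fell through the switch before aborting, could be sketched in C roughly as follows. Everything here is hypothetical: the `demo_` enum, the macro, and the hardware encodings are made up for illustration and are not Mesa's actual definitions.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical wrap modes; names and values are illustrative only. */
enum demo_wrap {
    DEMO_WRAP_REPEAT,
    DEMO_WRAP_CLAMP_TO_EDGE,
    DEMO_WRAP_MIRRORED_REPEAT,
};

/* Format "unhandled <what>: <value> (0x<hex>)" into buf; returns buf. */
const char *demo_describe_unhandled(char *buf, size_t len,
                                    const char *what, unsigned value)
{
    snprintf(buf, len, "unhandled %s: %u (0x%x)", what, value, value);
    return buf;
}

/* Print the value that fell through, with file and line, then abort. */
#define DEMO_UNHANDLED(what, value)                                     \
    do {                                                                \
        char msg_[96];                                                  \
        fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,              \
                demo_describe_unhandled(msg_, sizeof msg_, (what),      \
                                        (unsigned)(value)));            \
        abort();                                                        \
    } while (0)

/* Translate a wrap mode to a made-up hardware encoding. */
unsigned demo_translate_wrap(enum demo_wrap w)
{
    switch (w) {
    case DEMO_WRAP_REPEAT:          return 0x8;
    case DEMO_WRAP_CLAMP_TO_EDGE:   return 0x9;
    case DEMO_WRAP_MIRRORED_REPEAT: return 0xC;
    default:
        DEMO_UNHANDLED("wrap mode", w);
    }
}
```

With a default case like this, a bug report would carry the offending value along with the file and line, instead of only the location of the failed assert.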
guillaume_g has joined #panfrost
_whitelogger has joined #panfrost
<bbrezillon> urjaman: hm, the implementation hasn't changed much
<bbrezillon> so I suspect the problem is still present with the v2 sent by robher
<bbrezillon> robher: BTW, thanks a lot for sending this v2
yann has quit [Ping timeout: 240 seconds]
jernej has quit [Ping timeout: 246 seconds]
davidlt has joined #panfrost
jernej has joined #panfrost
pH5 has joined #panfrost
<tomeu> alyssa: regarding the state leaks, I have compared the traces of a good run and a bad run, and only two differences seem like they could be relevant:
<tomeu> sampler_count remains 1 in the bad case if a previous test left it like that
<tomeu> haven't found where we should reset it, as panfrost_bind_sampler_states doesn't get called when there's no samplers
<tomeu> the other difference is:
<tomeu> -rgba32f attribute_0[48];
<tomeu> -rgba32f attribute_1[48];
<tomeu> +rgba32f attribute_1[32];
<tomeu> +rgba32f attribute_0[32];
<tomeu> does it ring a bell?
icecream95 has quit [Ping timeout: 246 seconds]
yann has joined #panfrost
raster has joined #panfrost
megi has joined #panfrost
chewitt has joined #panfrost
warpme_ has joined #panfrost
davidlt has quit [Ping timeout: 268 seconds]
chewitt has quit [Quit: Zzz..]
chewitt has joined #panfrost
abordado has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
<alyssa> icecream95: As urjaman pointed out, the spilling happens on a *particular* shader in xorg/glamor (gradients). If you're using something like i3, you won't hit that codepath and it's all good.
<alyssa> tlwoerner: I recall adding some wrap mode emulation for someone at some point uhhh
davidlt has joined #panfrost
<alyssa> that would've been GL_CLAMP
<alyssa> Oh, I wooooonder
<alyssa> It's totally poss-- hm. Ok. I can make a total conjecture but I have *no* way to test these things.
* alyssa really needs piglit and/or CTS for desktop
<alyssa> You said this was on tunnel, though? tunnel seems okay to me?
<alyssa> tomeu: sampler_count being greater than it needs to be shouldn't really matter, except for pantrace throwing a warning
<alyssa> Pretty sure the blob has done that, even
<alyssa> The attribute bit seems potentially more significant ... if the attribute buffer is downsizing, well, that's going to cause issues since we'll read back all zeroes for the higher vertices, probably resulting in crazy visual corruption
<tomeu> alyssa: cool, any reason why state leak could affect that?
<alyssa> tomeu: who knows ...
<alyssa> Let's see, I might be doing some funky calculation
<alyssa> tomeu: Theoretically size is calculated around L312 in pan_instancing.c
<alyssa> But I don't see any state to leak exactly, unless we're just straight up getting different pointers
<alyssa> In which case .. okay, sure. But then there should also be a corresponding difference in src_offset or something between the two cases?
<tlwoerner> alyssa: yes, both tunnel and tunnel2 (as well as a bunch of others) are funky for me
<tlwoerner> is there a bug tracker? maybe i need to specify board, versions, etc?
<alyssa> tlwoerner: mesa/mesa on gitlab.freedesktop.org issues, I guess
<tlwoerner> (and most will probably be closed with: "use a different version!")
<tlwoerner> interestingly enough, remember how i said the shadows were missing on glmark2 [ideas] test? if i run the test by itself the shadows appear. but when run as part of the whole glmark2 benchmark they *very often* disappear (but not always). so probably some resource getting exhausted, but only sometimes
<alyssa> Hmm
<alyssa> What mesa version is this, btw?
<HdkR> https://www.hardkernel.com/shop/odroid-mc1-solo/ If anyone needs Midgard 1st gen boards. They are $9 apiece until sold out :P
<alyssa> HdkR: Tempting :p
<HdkR> Good for SFBD CI farm :D
raster has joined #panfrost
<chewitt> ping Justin (CEO) and ask for some
<chewitt> if they're "giving them away" they might as well give some away :)
<HdkR> :D
chewitt has quit [Quit: Zzz..]
megi has quit [Ping timeout: 250 seconds]
megi has joined #panfrost
guillaume_g has quit [Ping timeout: 268 seconds]
<tomeu> alyssa: found some UBs in panfrost: https://gitlab.freedesktop.org/tomeu/mesa/-/jobs/1119667/raw
<tomeu> lots of spurious stuff though
<tomeu> looks like during BO destruction, at some point we reach ref count 0 when we're still doing stuff on it
<tomeu> 2019-12-11T14:21:33 ../src/gallium/drivers/panfrost/pan_resource.c:889:16: runtime error: member access within null pointer of type 'struct panfrost_resource'
<tomeu> 2019-12-11T14:21:33 #0 0xad05a902 in panfrost_resource_get_stencil ../src/gallium/drivers/panfrost/pan_resource.c:889
<tomeu> 2019-12-11T14:21:33 #1 0xad331894 in u_transfer_helper_resource_destroy ../src/gallium/auxiliary/util/u_transfer_helper.c:141
<tomeu> will look for something more obviously problematic
<tomeu> gah, then started OOMing
<alyssa> tomeu: Oh my.
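The hazard tomeu describes, a reference count reaching 0 while the object is still being used, follows from the usual ownership rule for refcounted objects. A minimal single-threaded C sketch of that rule, under the assumption that it resembles the situation in the trace; `demo_bo` and its functions are hypothetical and are not panfrost's BO or resource code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer object. */
struct demo_bo {
    int refcnt;
    int *destroyed_flag; /* set to 1 on destruction, for demonstration */
};

/* Create an object holding one reference for the caller. */
struct demo_bo *demo_bo_create(int *destroyed_flag)
{
    struct demo_bo *bo = calloc(1, sizeof *bo);
    if (!bo)
        return NULL;
    bo->refcnt = 1;
    bo->destroyed_flag = destroyed_flag;
    return bo;
}

/* Take an additional reference on a live object. */
void demo_bo_ref(struct demo_bo *bo)
{
    assert(bo && bo->refcnt > 0);
    bo->refcnt++;
}

/* Drop one reference. Returns true if this was the last one and the
 * object was freed; the caller must not touch `bo` afterwards. */
bool demo_bo_unref(struct demo_bo *bo)
{
    assert(bo && bo->refcnt > 0);
    if (--bo->refcnt > 0)
        return false;
    if (bo->destroyed_flag)
        *bo->destroyed_flag = 1;
    free(bo);
    return true;
}
```

Any teardown work that still needs the object has to run before the final `demo_bo_unref`, or hold its own reference while it runs. The sketch is not thread-safe; a real driver would need atomic counters.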
guillaume_g has joined #panfrost
<tomeu> alyssa: at the end of this job, the output of the deq run for that single failing test should be printed along with any ubsan and asan warnings: https://lava.collabora.co.uk/scheduler/job/2103790
<alyssa> :+1:
yann has quit [Ping timeout: 252 seconds]
yann has joined #panfrost
* rtp_ would be happy to see T628 in CI :)
yann has quit [Ping timeout: 245 seconds]
robher has quit []
robher has joined #panfrost
daniels has quit []
daniels has joined #panfrost
guillaume_g has quit [Quit: Konversation terminated!]
raster has quit [Quit: Gettin' stinky!]
pH5 has quit [Quit: bye]
pH5 has joined #panfrost
jschwart has joined #panfrost
rhyskidd has quit [Quit: rhyskidd]
yann has joined #panfrost
warpme_ has quit [Quit: Connection closed for inactivity]
stikonas has joined #panfrost
sravn has quit [Quit: WeeChat 2.6]
sravn has joined #panfrost
nerdboy has joined #panfrost
warpme_ has joined #panfrost
jschwart has quit [Ping timeout: 250 seconds]
raster has joined #panfrost
robertfoss has quit [Ping timeout: 250 seconds]
robertfoss has joined #panfrost
icecream95 has joined #panfrost
raster has quit [Quit: Gettin' stinky!]
raster has joined #panfrost
davidlt has quit [Ping timeout: 246 seconds]
pH5 has quit [Ping timeout: 252 seconds]
pH5 has joined #panfrost
abordado has quit [Ping timeout: 246 seconds]
raster has quit [Quit: Gettin' stinky!]