alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
vstehle has quit [Ping timeout: 246 seconds]
megi has quit [Ping timeout: 246 seconds]
NeuroScr has quit [Quit: NeuroScr]
chewitt has quit [Quit: Adios!]
vstehle has joined #panfrost
<tomeu> shadeslayer: someone should look at what's going on with that assert
<tomeu> I find it quite weird that by using the safe iterator, some tests start failing
_whitelogger has joined #panfrost
<bbrezillon> the list_for_each_entry() macro is not expecting the body of the loop to update the pos parameter manually
<bbrezillon> tomeu, shadeslayer: something like that should fix the problem http://code.bulix.org/nxu6z4-833288
<tomeu> nice!
<bbrezillon> or simply use LIST_FOR_EACH_ENTRY() instead of list_for_each_entry()
<bbrezillon> (seems like we have the same problem in various places)
<bbrezillon> the use of mir_foreach_instr_in_block_safe() in midgard_pair_load_store() is not so safe
pH5 has quit [Quit: bye]
<bbrezillon> if c is equal to mir_next_op(ins), we're removing the next element saved by the _safe logic to keep track of its next elem
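The failure mode described above can be reproduced with a tiny stand-alone list. This is only a sketch: `struct node`, `list_for_each_safe()` and both `drop_successors_*()` helpers are hypothetical stand-ins, not Mesa's list helpers or the MIR iterators, and `list_del()` deliberately leaves stale pointers in the removed node so the misbehaviour can be counted instead of crashing. The "fixed" variant shows one possible shape of a fix (advance the saved cursor before deleting it); it is not the patch from the paste.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical minimal circular doubly-linked list (not Mesa's list helpers). */
struct node {
    struct node *prev, *next;
    int value;
    int removed; /* set by list_del() so stale visits can be detected */
};

static void list_init(struct node *head)
{
    head->prev = head->next = head;
    head->removed = 0;
}

static void list_add_tail(struct node *head, struct node *n)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
    n->removed = 0;
}

static void list_del(struct node *n)
{
    assert(!n->removed);
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->removed = 1; /* n's own pointers are left stale on purpose */
}

/* "Safe" iteration: the successor is saved in tmp before the body runs,
 * so the body may delete pos, but nothing protects tmp itself. */
#define list_for_each_safe(pos, tmp, head) \
    for (pos = (head)->next, tmp = pos->next; pos != (head); \
         pos = tmp, tmp = pos->next)

/* Deletes the successor of every negative node.
 * Returns how many already-removed nodes the loop ended up visiting. */
static int drop_successors_naive(struct node *head)
{
    struct node *pos, *tmp;
    int stale_visits = 0;

    list_for_each_safe(pos, tmp, head) {
        if (pos->removed) {
            stale_visits++;
            continue;
        }
        if (pos->value < 0 && pos->next != head)
            list_del(pos->next); /* this is exactly the saved tmp */
    }
    return stale_visits;
}

/* Same loop, but the saved cursor is advanced before it gets deleted. */
static int drop_successors_fixed(struct node *head)
{
    struct node *pos, *tmp;
    int stale_visits = 0;

    list_for_each_safe(pos, tmp, head) {
        if (pos->removed) {
            stale_visits++;
            continue;
        }
        if (pos->value < 0 && pos->next != head) {
            struct node *victim = pos->next;
            if (victim == tmp)
                tmp = victim->next;
            list_del(victim);
        }
    }
    return stale_visits;
}
```

With a list helper that poisons or NULLs the removed node's pointers, the naive loop would dereference garbage instead of merely visiting a dead node, which is consistent with tests starting to fail after the switch to the _safe iterator.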
pH5 has joined #panfrost
davidlt has joined #panfrost
megi has joined #panfrost
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
NeuroScr has joined #panfrost
davidlt has quit [Read error: Connection reset by peer]
davidlt has joined #panfrost
davidlt has quit [Ping timeout: 258 seconds]
Elpaulo1 has joined #panfrost
Elpaulo has quit [Read error: Connection reset by peer]
Elpaulo1 is now known as Elpaulo
afaerber_ has quit [Quit: Leaving]
marvs has quit [Ping timeout: 258 seconds]
NeuroScr has quit [Quit: NeuroScr]
raster has joined #panfrost
megi has quit [Ping timeout: 246 seconds]
<bbrezillon> shadeslayer: can you give that branch a try https://gitlab.freedesktop.org/tomeu/mesa/commits/mir-for-each ?
davidlt has joined #panfrost
<alyssa> bbrezillon: I have no idea what's going on but it sounds like you know what you're talking about :+1:
<bbrezillon> alyssa: I wouldn't be so sure if I was you :)
<alyssa> bbrezillon: I don't ever really feel like I know what I'm doing
<bbrezillon> but yes, there's something fishy in those list helpers
<bbrezillon> the safe attribute is limited to insertion/removal of the current item, if you touch the next item you're screwed
<alyssa> ....Ah.
herbmilleriw has joined #panfrost
<alyssa> tomeu: BTW, are you planning to review the XFB stuff?
<alyssa> If no, I'll push now; if yes, I can wait.
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
davidlt_ has joined #panfrost
davidlt has quit [Read error: Connection reset by peer]
<tomeu> alyssa: don't think I will have time today :/
<tomeu> maybe bbrezillon?
<bbrezillon> tomeu: I know nothing about XFB :-/
<alyssa> bbrezillon: Not with that attitude! :p
<tomeu> me neither :p
<tomeu> since when has OpenGL knowledge become a prerequisite for implementing a gallium driver? :p
<bbrezillon> alyssa: okay okay, I'll have a look, but don't expect anything but stupid comments :P
<alyssa> :P
<alyssa> Most comments I make are stupid!
<alyssa> Including this one!
adjtm_ has quit [Remote host closed the connection]
adjtm_ has joined #panfrost
<robher> tomeu: comments on my madvise mesa patch?
<tomeu> robher: do you mind if I look at it tomorrow?
<robher> tomeu: sure
davidlt_ has quit [Ping timeout: 272 seconds]
<alyssa> Oh weiiiiiiiiird
<alyssa> The blob can promote "uniform expressions" to a single-invocation compute shader run as a dependency writing out to a uniform register
megi has joined #panfrost
<bnieuwenhuizen> wait, more shader runs?
<alyssa> bnieuwenhuizen: Weeee
<alyssa> It's theoretically a good opt
<alyssa> The idea is if you have:
<alyssa> gl_FragColor = vColor + sqrt(uniform);
<alyssa> It can do a prepass to compute:
<alyssa> uniform' = sqrt(uniform)
<alyssa> and that only runs once per frame
<alyssa> and then per fragment you have:
<alyssa> gl_FragColor = vColor + uniform'
<alyssa> Why not run it on the CPU, not sure
<bnieuwenhuizen> alyssa: because the uniform buffer might be populated from the GPU?
<bnieuwenhuizen> so if you want to do the opt on the CPU you have a stall maybe
<bnieuwenhuizen> but yes, it is a pretty cool opt (though I'm curious what the savings is vs. the cost of another compute dispatch)
<alyssa> bnieuwenhuizen: Yeah \shrug/
raster has quit [Remote host closed the connection]
<alyssa> (Actually they do it with a 1 pixel fragment shader but w/e)
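A CPU-side sketch of the idea, for illustration only. `uniform_expr()` is a hypothetical stand-in for any expression that depends only on uniforms (such as the `sqrt(uniform)` in the example), and the call counter exists just to show how often it runs. Nothing here reflects how the blob actually emits its prepass.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how often the uniform-only expression is evaluated. */
static int expr_evaluations;

/* Stand-in for a uniform-only expression such as sqrt(uniform). */
static float uniform_expr(float u)
{
    expr_evaluations++;
    return u * u + 1.0f;
}

/* Unoptimised: the expression is recomputed for every fragment. */
static void shade_naive(const float *v_color, float *out, size_t n, float uniform)
{
    assert(v_color != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = v_color[i] + uniform_expr(uniform);
}

/* Promoted: a one-off "prepass" computes uniform', then each fragment
 * only performs the add. */
static void shade_hoisted(const float *v_color, float *out, size_t n, float uniform)
{
    assert(v_color != NULL && out != NULL);
    float uniform_prime = uniform_expr(uniform); /* runs once per draw */
    for (size_t i = 0; i < n; i++)
        out[i] = v_color[i] + uniform_prime;
}
```

Both functions produce the same output; the saving is the number of `uniform_expr()` evaluations, which has to outweigh the cost of the extra dispatch.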
<shadeslayer> bbrezillon: your branch works btw :)
<bbrezillon> shadeslayer: I guess it doesn't solve the unbalanced ref/unref issue
<shadeslayer> bbrezillon: no, I'm trying to dump my BO's right now, but need to figure out a way to get the debug output not to overwrite each other
<shadeslayer> bbrezillon: https://paste.ubuntu.com/p/3skCDNYRWF/ warning, huge log
megi has quit [Ping timeout: 272 seconds]
<shadeslayer> aha, I spot 2 unreferences
<shadeslayer> so it unreferences 0xffff7866ecd0 at line 71261 and then again at line 71279
<bbrezillon> shadeslayer: you should also track allocation/de-allocation and the type of BO (who allocated the BO)
<shadeslayer> bbrezillon: yeah, doing that now
davidlt has joined #panfrost
herbmilleriw has quit [Remote host closed the connection]
<shadeslayer> bbrezillon: by type of BO do you mean tiled/linear?
<bbrezillon> shadeslayer: no, I mean what this BO is used for
<bbrezillon> polygon BO, FB BO, ...
<shadeslayer> *nod*
<bbrezillon> BTW, the ref/unref on 0xffff5c0bebd0 seem to be balanced
herbmilleriw has joined #panfrost
<bbrezillon> you don't see the first ref because the ref is set to 1 when the object is allocated
<shadeslayer> bbrezillon: 0xffff7866ecd0 not 0xffff5c0bebd0
<bbrezillon> there's 2 assertions
<shadeslayer> yeah, I'm looking at the first one :)
Elpaulo has quit [Quit: Elpaulo]
<bbrezillon> shadeslayer: can you also print the ref counter?
<shadeslayer> sure, let me get that for you
herbmilleriw has quit [Remote host closed the connection]
<shadeslayer> bbrezillon: FWIW I think the second assert is just plasma restarting
<shadeslayer> bbrezillon: https://paste.ubuntu.com/p/GgftZ87gYM/
<shadeslayer> hm, how come the ref count doesn't go up for every bo_reference call
<shadeslayer> bbrezillon: http://paste.ubuntu.com/p/FhXjh5q3Q7/ is my instrumentation
raster has joined #panfrost
<shadeslayer> I guess instrumentation is too early for the unreference
raster has quit [Ping timeout: 245 seconds]
<bbrezillon> I think printing it before incrementing/decrementing is fine
<shadeslayer> bbrezillon: but I'm printing it after ref, before unref ;)
<bbrezillon> ok, then choose one (ideally before) and stick to it
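A minimal sketch of the kind of instrumentation being discussed, assuming a made-up `struct demo_bo` rather than the real Panfrost BO type. It logs the counter before every change, records what the BO is for, and shows why the first reference never appears as a `ref` line: creation starts the count at 1. The counter here is a plain `int`; a real driver would need atomic operations.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical BO for illustration; not the Panfrost structure. */
struct demo_bo {
    int refcnt;
    const char *purpose; /* e.g. "polygon list", "framebuffer" */
};

static struct demo_bo *demo_bo_create(const char *purpose)
{
    struct demo_bo *bo = calloc(1, sizeof(*bo));
    if (!bo)
        return NULL;
    bo->refcnt = 1; /* creation holds the first reference */
    bo->purpose = purpose;
    fprintf(stderr, "alloc %p (%s) refcnt=1\n", (void *)bo, purpose);
    return bo;
}

static void demo_bo_reference(struct demo_bo *bo)
{
    /* Always log the value before the change, for ref and unref alike. */
    fprintf(stderr, "ref   %p (%s) before=%d\n",
            (void *)bo, bo->purpose, bo->refcnt);
    bo->refcnt++;
}

/* Returns 1 if this call dropped the last reference and freed the BO. */
static int demo_bo_unreference(struct demo_bo *bo)
{
    fprintf(stderr, "unref %p (%s) before=%d\n",
            (void *)bo, bo->purpose, bo->refcnt);
    assert(bo->refcnt > 0); /* catches an unbalanced unref early */
    if (--bo->refcnt == 0) {
        fprintf(stderr, "free  %p\n", (void *)bo);
        free(bo);
        return 1;
    }
    return 0;
}
```

Logging the same side of the operation everywhere makes each address in the log a monotonic story: `alloc` at 1, every `ref` showing the old value, and the final `unref` showing `before=1`.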
<shadeslayer> yep already done, next step is to add info on what allocated it
<shadeslayer> however, I gtg for a dinner appointment soon, so I'll take a look at that tomorrow
<bbrezillon> sure, I'm on vacation for one week starting tomorrow
<bbrezillon> but I'm sure you'll figure it out
<shadeslayer> bbrezillon: you'll have to find out when you're back ;)
<shadeslayer> bbrezillon: also, have fun! :D
<bbrezillon> thx
<alyssa> Passed: 1432/2165 (66.1%)
<alyssa> Failed: 733/2165 (33.9%)
<alyssa> for the UBO section
<alyssa> That's... *barely* passing :p
pH5 has quit [Quit: bye]
herbmilleriw has joined #panfrost
herbmilleriw has quit [Remote host closed the connection]
<alyssa> Well I ran this entire shader manually, like, on paper tracking everything.. it's correct per the NIR so uh what am I missing here
<alyssa> Oh, messed up the linkage
<alyssa> No..
<alyssa> Needed to check an image in a paint editor for dEQP stuff and uh
<alyssa> Maybe got totally distracted and TL;DR krita works with Panfrost
<alyssa> Almost out of the box - it needs logic op support (I have it stubbed out)
<alyssa> Also grabbed a profile..
<alyssa> 11% of time is in st_TexSubImage (6.5% in panfrost_store_tiled_image)
<alyssa> Also auditing the shaders a bit
<alyssa> Fragment shaders are essentially optimal so w/e
<jrb> does any sort of support matrix exist documenting which mali cores are best supported?
<alyssa> jrb: Mali T860 is best supported
<alyssa> T760 is also fairly well supported but a little buggier
<alyssa> T820 seems to work until it doesn't
<alyssa> Other Txxx models may or may not work
<alyssa> No Gxx models work yet
<jrb> righto, thanks
Elpaulo has joined #panfrost
stikonas has joined #panfrost
megi has joined #panfrost
stikonas has quit [Read error: Connection reset by peer]
stikonas_ has joined #panfrost
<alyssa> Somewhat surprisingly panfrost_add_dependency is slow.
<alyssa> When we consider that it's poking at mapped memory it does make sense, somewhat sadly
<alyssa> Let's try to fix this the right way
<alyssa> ..Or not, too much logic and I'm slightly drowning in abstraction
NeuroScr has joined #panfrost
<alyssa> Note to self:
<alyssa> dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.0
<alyssa> The regions marked as bad have the same rgb except with non-zero red values, whereas reference has 0 red
<alyssa> They're very much different; you just need to check in Krita since your eyes don't see the difference
<daniels> jrb: anything in particular you're interested in?
<HdkR> alyssa: Interesting uniform indexed output. I didn't believe that to be possible but maybe there was something special for that case
stikonas_ is now known as stikonas
<bbrezillon> alyssa: hm, am I missing something or are we really leaking the ins and block objects allocated in midgard_compile.c?
<alyssa> Bwap?
<alyssa> HdkR: Or maybe it's just dEQP being silly
afaerber has joined #panfrost
<alyssa> If it is legal and works as it appears, there's a bug higher up in the stack
<alyssa> I don't think it does.
<alyssa> bbrezillon: We probably are, yes.
<HdkR> alyssa: Trying to find the documentation now. It is definitely still mandated to be constant index in ES 320
<HdkR> alyssa: Wow. Apparently it was never restricted explicitly in the GLSL API
<HdkR> and there isn't even a note about it in spec that it is allowed
<HdkR> ESSL is nicer and states. "Fragment shader outputs declared as arrays may only be indexed by a constant integral expression."
<HdkR> AMD shader compiler generates some very nasty code around it
<alyssa> HdkR: Allowed by default? Ninth Amendment and all?
* alyssa or is it the tention
<alyssa> *tenth
* HdkR shruggies
<HdkR> I couldn't find anything explicitly banning it in any of the GLSL desktop specs
<alyssa> Anyway, compiler woe of the day:
<alyssa> I need to rewrite all the liveness analysis data structs
<alyssa> Instead of just a range, it has to be live in/out on a per block basis, with a range within some blocks
stikonas has quit [Remote host closed the connection]
<alyssa> So, the new algorithm will be as follows:
<alyssa> On each midgard_block, maintain bitfields (size = number of nodes):
<alyssa> - defs: Is this node written somewhere in this block?
<alyssa> - uses: Is this node read somewhere in this block?
<alyssa> - live_in: Is this node live at the beginning of this block?
<alyssa> - live_out: Is this node live at the end of this block?
<alyssa> Also on each midgard_block, maintain arrays of live_start and live_end, size = number of nodes, for the index within a block that it is written/finally used.
<alyssa> Given two nodes i and j, we can decide if they interfere by checking:
<alyssa> - For each block:
<alyssa> - If both nodes are live_in or both nodes are live_out, they interfere.
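The per-block bitfields listed above map onto the textbook backward dataflow iteration. The sketch below is an assumption-laden illustration, not the Midgard or v3d code: `struct demo_block` is hypothetical, node sets are limited to 64 entries so they fit in a `uint64_t`, and `uses` is taken to mean "read before any write in this block", which is slightly stricter than "read somewhere in this block". The interference test implements only the block-boundary rule stated so far; overlaps that exist purely inside one block still need the live_start/live_end ranges.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_SUCCS 2

/* Hypothetical basic block; one bit per node in each set. */
struct demo_block {
    uint64_t defs;     /* nodes written in this block */
    uint64_t uses;     /* nodes read before being written in this block */
    uint64_t live_in;  /* nodes live at the start of the block */
    uint64_t live_out; /* nodes live at the end of the block */
    int succs[MAX_SUCCS];
    int nr_succs;
};

/* Iterates to a fixed point:
 *   live_out(b) = union of live_in(s) over successors s
 *   live_in(b)  = uses(b) | (live_out(b) & ~defs(b)) */
static void compute_liveness(struct demo_block *blocks, int nr_blocks)
{
    bool progress;

    for (int b = 0; b < nr_blocks; b++)
        blocks[b].live_in = blocks[b].live_out = 0;

    do {
        progress = false;
        /* Walking backwards converges faster but is not required. */
        for (int b = nr_blocks - 1; b >= 0; b--) {
            struct demo_block *blk = &blocks[b];
            uint64_t out = 0;

            for (int s = 0; s < blk->nr_succs; s++)
                out |= blocks[blk->succs[s]].live_in;

            uint64_t in = blk->uses | (out & ~blk->defs);

            if (in != blk->live_in || out != blk->live_out) {
                blk->live_in = in;
                blk->live_out = out;
                progress = true;
            }
        }
    } while (progress);
}

/* Coarse check: true if both nodes are live across the same block edge. */
static bool interfere_coarse(const struct demo_block *blocks, int nr_blocks,
                             unsigned i, unsigned j)
{
    assert(i < 64 && j < 64);
    uint64_t both = (UINT64_C(1) << i) | (UINT64_C(1) << j);

    for (int b = 0; b < nr_blocks; b++) {
        if ((blocks[b].live_in & both) == both ||
            (blocks[b].live_out & both) == both)
            return true;
    }
    return false;
}
```

The loop terminates because the sets only ever grow and are bounded by the number of nodes. A block that branches back to itself simply lists its own index in `succs`.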
<alyssa> --------Ack I'm drowning in not knowing anything about dataflow analysis
<alyssa> HdkR: Help? :f
<HdkR> Welcome to analysis passes :D
<alyssa> HdkR: I'm so confused
<alyssa> I just read v3d's liveness analysis pass top to bottom
<alyssa> Doesn't seem too hard but still sort of magic
Elpaulo has quit [Ping timeout: 272 seconds]
<HdkR> I had a friend who had to spend months doing different analysis passes and it made him want to quit his job. It can be mentally deteriorating at times :)
<alyssa> HdkR: Thanks for the encouragement
<HdkR> Take your time, expect problems to crop up. It isn't impossible
<alyssa> Hm
<HdkR> Complex control flow will break most assumptions
<alyssa> HdkR: Should I have some sort of better infra built-up to make this tractable?
<alyssa> I know good utilities/data structures can make a world of difference for preserving sanity with hard algorithms..
<HdkR> It can do. Tooling to visualize the CF with all this data was really helpful
<alyssa> Hmm reading freedreno's RA as well, this is slowly clicking
<alyssa> HdkR: I think I'll get this but definitely not a 5pm task
<HdkR> definitely
<HdkR> One of those things you can spend multiple weeks worth of effort on :P
<HdkR> (Depending on what exactly you need to get)
<alyssa> HdkR: (Symptom being a Krita shader failing RA because of an unfortunate interaction of texture pipeline registers + complex control flow)
<HdkR> ah, fun
<alyssa> HdkR: That's a GL3 krita shader, tho
<alyssa> The GL2 renderer (which I guess is fixed-function) seems to work :)
<HdkR> Nice
<alyssa> Perf is fine but not obviously better than sw, but sw is fast enough half the time?
<alyssa> I guess it's kinda bw bound anyway
<HdkR> Lowering CPU load would be a benefit at least
<alyssa> HdkR: CPU load is still pretty high but not in GL related stuff
<alyssa> I think the GL2 renderer is still doing all the hard stuff in sw and just using GL for blits and stuff
<HdkR> Yea, just means you can throw more CPU at something else rather than llvmpipe :D
<HdkR> Interesting
<alyssa> HdkR: sw meaning Krita's internal software renderer, which is like 30x faster than Krita's GL renderer with llvmpipe
<HdkR> aaaah
<HdkR> I see
<alyssa> Almost all the Krita I've done has been with that sw renderer on other chromebooks and it's ...
<alyssa> Krita is drastically faster than Firefox rendering the Discord web client.
<HdkR> If you're doing GL 3 things then you can start adding panfrost to features.txt :D
<alyssa> HdkR: For GLES 3.0, we're currently at where Panfrost was for GLES 2.0 at the beginning of 2019
<HdkR> But with a much more robust shader compiler
<alyssa> ("All the features are there but they're all half-broken but it's good enough for some stuff to work but not anything real world also I forgot about a bunch of minor features and corner - and not so corner - cases of big features")
<alyssa> That's true, we're doing overwhelmingly well on the shaders.* tests specifically since I designed the shader compiler to be ES3-class from early on
<alyssa> Also, somebody broke GALLIUM_HUD, it was probably me
* alyssa bisects
<alyssa> Trying to find a good commit >.<