alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
<alyssa> I've been working on the compiler, specifically to improve our implementation of branching, which is currently lacking in many respects
<alyssa> Doing so will let us handle more sophisticated shaders, including a large one in STK that I had to nop out for the previous demo (once I fix these bugs, STK ES 2.0 should work unpatched)
<alyssa> I do worry about performance of branching, though..
_whitelogger has joined #panfrost
<HdkR> alyssa: Theoretically the performance of branching on midgard and bifrost should be quite a bit better than desktop equivalents
<alyssa> HdkR: Hope so
<alyssa> I was incurring some major perf issues but then I realised the compiled shader was buggy and it was doing like 10 texture lookups, so the branching isn't necessarily the issue here ;)
<HdkR> :D
_whitelogger has joined #panfrost
<alyssa> I resolved the issue in if/else chains, but then realised -- this shader is so long that we're overflowing the compact branch offset
<alyssa> So I've been implementing far jumps in the compiler, which should be easy enough, but there are bugs in the disassembler and I don't think the original ISA notes we have are even totally right here
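The compact-versus-far choice described above can be sketched as a range check on the branch offset. This is a minimal sketch, not the Panfrost compiler's code: the field width `COMPACT_OFFSET_BITS` and the names `fits_signed` and `pick_branch_form` are hypothetical, and the real Midgard encoding is not given in this log.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical width of the compact branch offset field; the actual
 * Midgard encoding is not stated in the discussion. */
#define COMPACT_OFFSET_BITS 7

/* True if `offset` fits in a signed two's-complement field of `bits` bits. */
static bool fits_signed(int32_t offset, unsigned bits)
{
    int32_t min = -(1 << (bits - 1));
    int32_t max = (1 << (bits - 1)) - 1;
    return offset >= min && offset <= max;
}

enum branch_form { BRANCH_COMPACT, BRANCH_FAR };

/* Use the compact form when the offset fits, otherwise fall back to a
 * far jump with a wider offset field. */
static enum branch_form pick_branch_form(int32_t offset)
{
    return fits_signed(offset, COMPACT_OFFSET_BITS) ? BRANCH_COMPACT
                                                    : BRANCH_FAR;
}
```

With the assumed 7-bit field, offsets from -64 to 63 stay compact and anything outside that range needs the far form.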
<alyssa> Unrelated: aside from some minor rendering glitches, neverball works with Panfrost
* alyssa may have just played through the first 10 levels :blush:
<alyssa> (I did need to if 0 primconvert)
<alyssa> Anywho
* alyssa is really happy with driver progress, now that we can play some actual games
<davidlt> I'm amazed how good that sounds :)
<cyrozap> HdkR: Out of curiosity, why should the Midgard/Bifrost branching performance be better than on desktop GPUs?
<HdkR> cyrozap: Baby warp sizes so divergence doesn't hurt as much
<cyrozap> Sorry, what's a "warp size"?
<HdkR> The number of threads that operate coherently together when the PC is the same but completely serialize once PC diverges
<HdkR> Nvidia is 32, AMD is 64, Intel is 8/16/32
<HdkR> Bifrost is 4wide or 8wide depending on model :P
<HdkR> On Midgard, each thread has an independent PC and threads execute round-robin style. Number of independent threads = number of multiprocessors
<HdkR> Midgard shares occupancy concerns just like AMD/Nvidia, which is something Intel doesn't worry about
<HdkR> (Not sure if Bifrost has that problem but it is likely)
<HdkR> Hm. I bet Big GEN has to worry about occupancy
<HdkR> Would be even more of a waste of die space if they made every EU have the same design of RF that they currently have
<cyrozap> Hmm, I'm still not sure I completely understand, but thanks, anyways!
<HdkR> Basically GPUs run code in lockstep and if you need to branch that lockstep is broken and throughput falls through the floor :P
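A toy model of why small warps soften divergence: for a single if/else, a warp whose threads all agree takes one pass, while a mixed warp runs both sides serially. This is a simplified sketch, not a model of any real GPU scheduler, and the name `divergent_passes` is made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Count the passes needed to run one if/else over `n` threads grouped
 * into warps of `warp_size`. A uniform warp needs 1 pass; a warp with
 * threads on both sides needs 2 (if side, then else side). */
static size_t divergent_passes(const bool *takes_if, size_t n,
                               size_t warp_size)
{
    size_t passes = 0;
    for (size_t base = 0; base < n; base += warp_size) {
        bool any_if = false, any_else = false;
        for (size_t i = base; i < n && i < base + warp_size; i++) {
            if (takes_if[i])
                any_if = true;
            else
                any_else = true;
        }
        passes += (any_if ? 1 : 0) + (any_else ? 1 : 0);
    }
    return passes;
}
```

Multiplying the pass count by the warp size gives the lane slots issued. For 32 threads where only the first 4 take the if side, a 32-wide warp issues 64 slots for 32 threads of work, while 4-wide warps issue exactly 32.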
<cyrozap> Oh, I see.
<HdkR> Not to mention you don't have fancy things like branch prediction and code prefetching to save your butt like CPUs
<cyrozap> So what's the tradeoff for making a GPU that doesn't run things in lockstep? Increased die area?
<HdkR> larger die size, larger power consumption, starts becoming a CPU and you end up with something like Larrabee
chewitt has quit [Quit: Adios!]
<HdkR> You also get a guarantee of forward progress as long as your thread scheduler is fair/even/round-robin
<HdkR> But Volta proved that was possible on GPU-esque hardware
<HdkR> (Volta changed the threading model)
vakkov_ has joined #panfrost
vakkov_ has quit [Ping timeout: 244 seconds]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
<urjaman> i sent out my 6-bit dithering enable (for rockchip drm driver) patch while updating my things to 4.20... it's kinda short so maybe i could get it off my shoulders :P
_whitelogger has joined #panfrost
<hanetzer> yarg
<urjaman> yeah sorry for the highlight, figured out my missing 2GB already :P
<urjaman> okay got my redshift (gamma tables) and HDMI patches sorted for 4.20
<alyssa> HdkR: TBF, ideally your shaders are small enough you fit in i-cache so the prefetch/prediction stuff is less important..
<alyssa> urjaman: Might need that, I didn't appreciate just how bad the C201's color depth is (by default, at least) until I went back to it after using Kevin for 9 months :)
<urjaman> i also have a patch that gets very close to 60Hz mode out of the panel and correct timing info for userspace, but i'm not so sure how well this would go over
<alyssa> urjaman: On the first patch linked, please give the constants meaningful names
<alyssa> Are they from the SoC manual? If so, add a #define to the relevant header with the name from the manual
<alyssa> Are they reverse-engineered? Do exactly the above, but make your own reasonable name in line with what you might expect
<alyssa> As a reader, "|= 0x6" is totally magic
<urjaman> a moment
<alyssa> (You'll notice that, for the most part, I try to stick to that with Panfrost. There are some magic numbers left in random places but that's something I want to fix, it's not an endorsement :P)
<alyssa> [Most gfx drivers do this. A few do not and it drives me up a wall trying to make sense of their code :P]
<urjaman> none of the rest of the driver does that really ... but otoh there arent many enumerations there
<urjaman> they're usually fields like "htotal" where okay that is the amount of horizontal pixels yup
<urjaman> or "thing_en" which is 1 or 0 ...
<alyssa> I mean, "thing_en = 1/0" is pretty clear by itself
<alyssa> But when it's a full-blown bit field, unless the number "0x6" is meaningful by itself (in which case it wouldn't have a 0x in front of it..), it's just a little confusing, yeah?
<urjaman> but anyways this is why there is a comment saying what it does
<urjaman> the driver had really no ... template of a practice i could find that i could copy for defining the field so yeah that's how it ended up... but yeah might make sense to do it better
<alyssa> Mm, that sets it above a lot of code I've seen ;)
<alyssa> (I.e. no comment, no #define..)
<alyssa> urjaman: Out of curiosity, how well does dithering work on such a low-res display?
<alyssa> IIRC dithering is a tradeoff of spatial density for colour quality, and C201 does not exactly have a surplus of either
<urjaman> if you look close it looks a little bit noisy maybe, but i mostly dont notice it
<urjaman> compared to the bands that are really noticeable
<alyssa> Fair point :p
<alyssa> Anyway, super cool stuff, kudos ^_^
<alyssa> :+1:
stikonas has quit [Remote host closed the connection]
<urjaman> yeah that field has 4 bits really (dither_down_sel, dither_down_mode, dither_down_en and pre_dither_down_en) but i feel like it would be quite messy to set them individually ... and it was already in the registers as a 4-bit field in the driver so it was fast to do like this
<urjaman> (sel is allegro (0) or FRC (1). mode is to 565 (0) or to 666 (1), and enable makes sense... the pre_dither is 10 to 8 bit allegro)
stikonas has joined #panfrost
<urjaman> sorry i'm thinking out loud how it would make sense to describe these ... maybe define the bits and then define an enum of the 4 logically sensible things you could set it to using them (allegro or frc, 565 or 666)
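One way the "define the bits, then an enum of the sensible combinations" idea could look. This is only a sketch: the bit positions inside the 4-bit field are an assumption, chosen so that enable + 666 mode + allegro comes out as the 0x6 from the patch, and every name here is hypothetical rather than taken from the Rockchip driver or TRM.

```c
#include <stdint.h>

/* ASSUMED bit positions within the 4-bit dither field (illustrative). */
#define PRE_DITHER_DOWN_EN   (1u << 0) /* 10-bit to 8-bit pre-dither */
#define DITHER_DOWN_EN       (1u << 1) /* enable dither down */
#define DITHER_DOWN_MODE_666 (1u << 2) /* set = 666, clear = 565 */
#define DITHER_DOWN_SEL_FRC  (1u << 3) /* set = FRC, clear = allegro */

/* The four logically sensible dither-down settings, plus off. */
enum dither_down_cfg {
    DITHER_OFF         = 0,
    DITHER_ALLEGRO_565 = DITHER_DOWN_EN,
    DITHER_ALLEGRO_666 = DITHER_DOWN_EN | DITHER_DOWN_MODE_666,
    DITHER_FRC_565     = DITHER_DOWN_EN | DITHER_DOWN_SEL_FRC,
    DITHER_FRC_666     = DITHER_DOWN_EN | DITHER_DOWN_MODE_666 |
                         DITHER_DOWN_SEL_FRC,
};

/* Replace the dither-down bits of `field`, keeping the pre-dither bit. */
static uint32_t set_dither_down(uint32_t field, enum dither_down_cfg cfg)
{
    return (field & PRE_DITHER_DOWN_EN) | (uint32_t)cfg;
}
```

A caller then writes `set_dither_down(field, DITHER_ALLEGRO_666)` instead of `|= 0x6`, so the comment is no longer needed to decode the value.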
<daniels> urjaman: fwiw, the veyron panel freq was pretty heavily hand-tweaked to avoid interference with the tablet input
<urjaman> there's no tablet on the C201... or do you mean the touch panel?
<urjaman> and the C100 panel is a different thing
<daniels> ah, wrong codename then. carry on.
<urjaman> (that CrOS people had tuned the modeline for ... but for the C201 they had only changed the frequency to what was available from that PLL and ran it at 58Hz or something like that)
<urjaman> but linux picked the same frequency but still told the userspace the frequency from that panel info so was saying 60Hz while running at 58Hz ...
<urjaman> then i thought maybe i could make it 60Hz and got carried away :P
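The 58Hz-versus-60Hz mismatch follows from how refresh rate is derived: pixel clock divided by total pixels per frame (htotal * vtotal, including blanking). A small sketch of that arithmetic follows; the function name is made up, and the timing numbers in the note below are illustrative assumptions, not values taken from this log or checked against the panel datasheet.

```c
#include <stdint.h>

/* Refresh rate in millihertz from a pixel clock in kHz and the total
 * (active + blanking) horizontal and vertical sizes. Truncates. */
static uint32_t refresh_millihertz(uint32_t clock_khz, uint32_t htotal,
                                   uint32_t vtotal)
{
    /* kHz -> Hz is one factor of 1000, Hz -> mHz is the other. */
    uint64_t num = (uint64_t)clock_khz * 1000u * 1000u;
    uint64_t den = (uint64_t)htotal * vtotal;
    return (uint32_t)(num / den);
}
```

With an assumed 1592 x 800 total frame, a 76420 kHz clock gives about 60.0 Hz, while a PLL that can only produce 74250 kHz gives about 58.3 Hz. Reporting the nominal clock to userspace while running the slower one produces exactly the kind of mismatch described above.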
<urjaman> daniels: but yeah you're right the commit shouldnt say "veyrons" then
<urjaman> i think the c201/veyron speedy is the only one with n116bge ... but not sure right now
<urjaman> anyways i did look up the c100 panel stuff because when i found out about this frequency thing i wanted to see if they had had the same problem there ... was a yes :P
<urjaman> hmm device trees say veyron jaq,jerry,pinky and speedy have n116bge
stikonas has quit [Remote host closed the connection]
<urjaman> these code names are not very helpful in figuring out what these things actually are tbh :P
stikonas has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
vakkov_ has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
<mmind00> urjaman: there is not necessarily a 1:1 mapping between codenames and products ... some boards were actually used in multiple different devices :-)
<urjaman> yeah i knew jaq is such a thing ... but yeah more like "is this even a consumer product?" is usually the question (and what form factor)
vakkov_ has quit [Ping timeout: 240 seconds]
* mmind00 now reads the dithering documentation in the trm
<urjaman> i still do not actually know the algorithm for "allegro" dithering except that it is based on the pixel position and pixel value (duh) :P
<urjaman> it was just the one that needed less thinking so i turned it on and it looked good enough to me :P
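The allegro algorithm itself is not known here. For comparison, a generic ordered (Bayer) dither is also a function of pixel position and pixel value, and shows how 8-bit levels can be approximated on a 6-bit panel. This is a sketch of that well-known technique, not of what the Rockchip hardware does; `bayer2` and `dither_8_to_6` are illustrative names.

```c
#include <stdint.h>

/* 2x2 Bayer thresholds covering the two low bits being dropped (0..3). */
static const uint8_t bayer2[2][2] = {
    { 0, 2 },
    { 3, 1 },
};

/* Reduce an 8-bit channel value to 6 bits. The position-dependent
 * threshold decides which pixels round up, so a 2x2 block averages
 * out to the original level. */
static uint8_t dither_8_to_6(uint8_t v, unsigned x, unsigned y)
{
    unsigned t = bayer2[y & 1][x & 1];
    unsigned q = (v + t) >> 2; /* add threshold, drop 2 low bits */
    return (uint8_t)(q > 63 ? 63 : q);
}
```

For an input of 129 (one step above 6-bit level 32), three pixels of each 2x2 block come out as 32 and one as 33, which averages to 32.25 and trades a little spatial noise for the missing level.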
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
rhyskidd has quit [Quit: rhyskidd]
rhyskidd has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
mifritscher has quit [Ping timeout: 252 seconds]
mifritscher has joined #panfrost
rhyskidd has quit [Quit: rhyskidd]
rhyskidd has joined #panfrost
stikonas has quit [Remote host closed the connection]
stikonas has joined #panfrost
<urjaman> mmind00, alyssa: https://github.com/urjaman/linux/commit/18827e7cb0f3316abc902f69bbbddd2a944eb741 would this be more or less readable than before? (not yet build tested but like ... should.)
Kwiboo has quit [Ping timeout: 268 seconds]
<HdkR> alyssa: Ideal shader sizes don't always happen of course :P