alyssa changed the topic of #panfrost to: Panfrost - FLOSS Mali Midgard & Bifrost - https://gitlab.freedesktop.org/panfrost - Logs https://freenode.irclog.whitequark.org/panfrost - <daniels> avoiding X is a huge feature
_whitelogger has joined #panfrost
<hanetzer> robher: I'm half convinced that my initial direction of splitting up the mali
<hanetzer> *driver into 'ip' blocks like lima was a bad direction, since each panfrost-capable mali gpu is much more similar to each other compared to the lima-capable mali gpus
<hanetzer> robher: and that's a fairly 'stock' mali_kbase, just with DEBUG defined ?
<robher> hanetzer: and there's only one of any subblock.
<hanetzer> robher: yeah. and the three (gpu, mmu, and job scheduler) are always present, no?
<robher> hanetzer: yes, just debug in the register accessor file.
<robher> Right.
<hanetzer> robher: do you have the means to test a patch for me?
<hanetzer> 's not done cooking yet tho
<robher> hanetzer: tomorrow I can.
<hanetzer> kk. I should have something readyish by then. you're a 'bigwig' with kernel dev right?
<alyssa> hanetzer: Look at his email.. ;P
<hanetzer> yeah, I know :P
<hanetzer> with linaro iirc
<robher> patch monkey
<alyssa> Ooo ooo aaa aaa!
<alyssa> robher: Any chance we can get a VIP pass for code review when we upstream? You know, just flashing your email? ;p
<alyssa> ("I've reviewed 400 of your device tree patches, least you can do is return the favour, wink wink")
<robher> Upstream won't be a problem.
<alyssa> hanetzer: robh _IS_ a ~~big-shot~~ patch monkey!
<alyssa> ;P
<hanetzer> hehe
* HdkR slides patches under the door and leaves coffee out front
* hanetzer passes robher a big tankard of coffee
<hanetzer> anywho, so I'll be frank, I'm cribbing some new stuff I learned from the dwc3 driver while researching usb gadgets for my kevin chromebook, but this may be very useful in debugging panfrost in-vitro
<hanetzer> or is it in-vivo?
<robher> hanetzer: either way, debugging closed-up systems is always a problem.
<hanetzer> robher: closed as in physically or as in software?
<robher> Usually those aren't far apart... But physically closed and no serial port.
<hanetzer> ah. yeah. I'm fortunate in that I was able to acquire a google servo debug board :D
<hanetzer> https://ptpb.pw/2Oc0 << someone give this patch a go? it's not done but should cause /sys/kernel/debug/tracing/events/panfrost to exist
<hanetzer> if so, 'echo 1 > /sys/kernel/debug/tracing/events/panfrost/enable' and 'cat /sys/kernel/debug/tracing/trace' while doing panfrostian stuff with the module inserted should give register reads and writes for the gpu block
<hanetzer> oh, and debugfs must be mounted at that point of course
<hanetzer> or apparently tracefs :P
<hanetzer> seems to work. need to enable early boot tracing :P
rhyskidd has joined #panfrost
<HdkR> alyssa: Looks like on bifrost glCopy* for SSBOs end up being a CPU side memcpy
<hanetzer> https://ptpb.pw/zqht YUS!
* HdkR can't tell what is happening here
<hanetzer> that's all the iorw done during the insertion of panfrost.ko during bootup
<hanetzer> well, all the iorw in the gpu region, at least.
* hanetzer investigates
<hanetzer> yep, if you look at line 34, that's it reading out the gpu id register. id 0x860 major 0x2 minor 0x0 status 0x0
<hanetzer> https://ptpb.pw/JI3A/sh there we go. better formatted, since we know the gpu registers are at offset 0x0 from the base of the iomem region
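The decode hanetzer quotes (id 0x860 major 0x2 minor 0x0 status 0x0) can be sketched as a quick bitfield helper. The field layout below is an assumption inferred from the values in the trace: product id in bits [31:16], version major in [15:12], version minor in [11:4], version status in [3:0].

```python
def decode_gpu_id(raw):
    """Split a raw GPU_ID register value into its fields.

    Layout assumed from the trace above:
    [31:16] product id, [15:12] version major,
    [11:4] version minor, [3:0] version status.
    """
    return {
        "id":     (raw >> 16) & 0xffff,
        "major":  (raw >> 12) & 0xf,
        "minor":  (raw >> 4)  & 0xff,
        "status": raw & 0xf,
    }

# Under that layout, a T860 r2p0 would read back as 0x08602000:
fields = decode_gpu_id(0x08602000)
```

This matches the values hanetzer reads out: `fields` is `{"id": 0x860, "major": 2, "minor": 0, "status": 0}`.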
<hanetzer> robher: alyssa tomeu ^
<HdkR> lol, dang it bifrost compiler, stop doing cross stage optimizations. Trying to see how you work
<alyssa> HdkR: Heh
rhyskidd has quit [Remote host closed the connection]
<HdkR> Interesting. It looks like bifrost can fill GPRs in fragment from the vertex side. Sort of direct varying passoff
<alyssa> HdkR: Whaaa??
rhyskidd has joined #panfrost
<HdkR> Trying to see how it is doing this now...
<HdkR> `ADD.f32 {R7, T0}, R3, U5` <-- this is varying * uniform, R3 hadn't been touched prior in the shader(It's actually the first instruction)
<HdkR> er, varying + uniform
<alyssa> Neat
<alyssa> Keep in mind I'm Bifrost-illiterate
<hanetzer> hmm... I'm gonna have to come up with better names tho. apparently I have to pass `trace_events=gpu_read,gpu_write' to the kernel cmdline because generic names :P
<alyssa> mali_trace_events=mali_gpu_read,mali_gpu_write
<alyssa> :P
<HdkR> alyssa: Yea, it's just a bit quirky since theoretically there should be an LD_VAR to load it in
<alyssa> HdkR: ...I should probably read Bifrost docs, huh
<alyssa> What's T0?
<hanetzer> yep. but it's good to have similar stuff in the real driver for comparison's sake. plus we're using official in-kernel stuff and not vendorware :P
<hanetzer> in any case, gpu_{read,write} are too generic for 'real' usage in-kernel
<HdkR> alyssa: Means that instruction writes to both R7 and T0. T0 being a temp that can be passed off to the next instruction
<alyssa> HdkR: Interesting
<HdkR> Each of the FMA and ADD subops can write to an independent temp location (T0 and T1) which can be directly passed to the next instruction
<HdkR> Improves power efficiency since you don't need to touch the RF if you don't need to
<alyssa> HdkR: Midgard has that too, but you can't write to _both_ a register and a temp
<alyssa> So, I've been rewriting the FBO related code (based on v3d's job implementation)
<HdkR> Ah yea. I think it just always writes to the temp and if it has a register backing it then it'll store to that RF as well during the write stage
<alyssa> Once done this should fix a _LOT_ of issues
<HdkR> woo
<alyssa> But first, sleep!
<HdkR> Oh hey, I found a bug in the bifrost compiler. fun
<HdkR> in the blob*
<HdkR> Happens when there is an interpolation qualifier mismatch between stages
<HdkR> (Unless the language happens to be different between GLSL and ESSL....)
<HdkR> Huh. This silly thing must live outside of shader code
<HdkR> Because vertex shader just does the expected LD_VAR + LD_VAR_ADDRESS + ST_VAR I am expecting
<HdkR> and fragment side just...uses the GPRs for the data
<HdkR> There has to be some flag somewhere that preloads fragment registers with perspective correct varying data
<HdkR> cwabbott: Got any information on this neat feature? :D
_whitelogger has joined #panfrost
<HdkR> Looks like a specialization where the first vec8 is dumped in to R0-R7
rhyskidd has quit [Quit: rhyskidd]
<HdkR> Wonder if this is a hardcoded optimization on dvalin
<HdkR> `LD_VAR.32.reuse.per_frag.v4 T1, 1, 0.x, R12` Looks like this can load from a varying location without address calculation before it
<HdkR> While vertex side still needs to do the LD_VAR_ADDR + ST_VAR side
<HdkR> very very interesting....
indy has quit [Quit: ZNC - http://znc.sourceforge.net]
indy has joined #panfrost
<tomeu> alyssa: alyssa/fbo-rewrite is looking great, btw
BenG83 has joined #panfrost
raster has joined #panfrost
cwabbott has quit [Remote host closed the connection]
_whitelogger has joined #panfrost
sphalerite has quit [Ping timeout: 252 seconds]
sphalerite has joined #panfrost
sphalerite has quit [Ping timeout: 264 seconds]
sphalerite has joined #panfrost
raster has quit [Ping timeout: 246 seconds]
raster has joined #panfrost
<alyssa> tomeu: So far it's just code lifted from v3d so
pH5 has joined #panfrost
raster has quit [Remote host closed the connection]
ezequielg has quit [Quit: leaving]
cwabbott has joined #panfrost
pH5 has quit [Quit: bye]
<cwabbott> hdkr: first off, if you haven't already found it, there's a paragraph in the ISA docs that explains what the data register is
<cwabbott> I'd really recommend reading that page top to bottom before reading any disassembly
<cwabbott> and for the second thing, yeah, I've seen a patent somewhere that mentions some varying pre-loading feature, although the driver on Lyude's device never did it so it's not documented
<cwabbott> also, I don't know how far you've gotten, but on the train ride I started to write a header file for how I would structure the IR for a Bifrost backend
<cwabbott> I think for the varying preloading thing, that there should be something in the shader_meta which has the same information as an LD_VAR instruction
<cwabbott> you'll have to look at the actual trace to see how it works
stikonas has joined #panfrost
pH5 has joined #panfrost
<Lyude> hoping I can work on bifrost ra during tomorrow and the flight to Sweden :3
<hanetzer> :P
<cwabbott> Lyude: my plan was to do an SSA-based allocator
<cwabbott> mainly because I wanted to experiment with it, and Bifrost is regular enough that it's not that hard to do it
<Lyude> cwabbott: what is that? You will have to excuse me since this is my first time doing this kinda stuff
<cwabbott> Lyude: well, if it's your first time doing RA, you'll want to read up
<Lyude> any resources you would suggest btw?
<Lyude> (also-whatever design you want I'm fine with trying :)
<cwabbott> Lyude: first off, you'll have to get an understanding of liveness
<cwabbott> wikipedia gives the classic definition as a dataflow problem: https://en.wikipedia.org/wiki/Live_variable_analysis although that's not how "real" compilers represent it
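The dataflow formulation cwabbott links can be sketched in a few lines: iterate live-out(b) = union of successors' live-in, live-in(b) = use(b) ∪ (live-out(b) − def(b)) until nothing changes. The three-block CFG in the example is invented, not Midgard IR.

```python
def liveness(cfg, use, defs):
    """Classic backward dataflow liveness analysis.

    cfg:  block -> list of successor blocks
    use:  block -> vars read before any write in the block
    defs: block -> vars written in the block
    Returns (live_in, live_out) dicts of sets.
    """
    live_in = {b: set() for b in cfg}
    live_out = {b: set() for b in cfg}
    changed = True
    while changed:  # iterate to a fixpoint
        changed = False
        for b in cfg:
            out = set().union(*(live_in[s] for s in cfg[b])) if cfg[b] else set()
            inn = use[b] | (out - defs[b])
            if inn != live_in[b] or out != live_out[b]:
                live_in[b], live_out[b] = inn, out
                changed = True
    return live_in, live_out

# invented CFG: entry -> loop, loop -> {loop, exit}
cfg  = {"entry": ["loop"], "loop": ["loop", "exit"], "exit": []}
use  = {"entry": set(), "loop": {"i", "n"}, "exit": {"s"}}
defs = {"entry": {"i", "n", "s"}, "loop": {"i", "s"}, "exit": set()}
li, lo = liveness(cfg, use, defs)
```

The loop's back-edge is what makes `i` and `n` live across the whole loop body, which is exactly the information an allocator needs.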
<cwabbott> then, I'd read the classic original paper on graph-based register allocation that started it all: https://cs.gmu.edu/~white/CS640/p98-chaitin.pdf
<cwabbott> next up, the linear scan paper: http://web.cs.ucla.edu/~palsberg/course/cs132/linearscan.pdf
<cwabbott> and finally, one of the papers that introduced SSA-based register allocation: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.204.2844&rep=rep1&type=pdf
<cwabbott> it's easy to fall into a trap of doing something inefficiently
<cwabbott> for example, i965 approximates live ranges by getting rid of holes, but then uses a graph-based allocator where a linear scan allocator would give the same/better results in linear time
<cwabbott> I think alyssa may have copied that inefficiency
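cwabbott's point that linear scan gives the same result in linear time once live ranges are hole-free can be illustrated with the textbook algorithm from the linear scan paper he links. The intervals and register count below are made up for the example.

```python
def linear_scan(intervals, num_regs):
    """Textbook linear scan over hole-free live intervals.

    intervals: list of (name, start, end), assumed precomputed
    Returns name -> register index, or "spill".
    """
    alloc, active, free = {}, [], list(range(num_regs))
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # expire intervals that ended before this one starts
        for iv in [iv for iv in active if iv[2] < start]:
            active.remove(iv)
            free.append(alloc[iv[0]])
        if free:
            alloc[name] = free.pop()
            active.append((name, start, end))
        else:
            # spill whichever active interval ends furthest away
            victim = max(active, key=lambda iv: iv[2])
            if victim[2] > end:
                alloc[name] = alloc[victim[0]]
                alloc[victim[0]] = "spill"
                active.remove(victim)
                active.append((name, start, end))
            else:
                alloc[name] = "spill"
    return alloc

# made-up intervals: "b" dies before "c" starts, so they can share
regs = linear_scan([("a", 0, 10), ("b", 2, 4), ("c", 5, 7)], 2)
```

One pass over sorted intervals, no interference graph: with hole-free ranges this is where graph coloring buys nothing over linear scan.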
<Lyude> cwabbott: awesome!!! This should be perfect
<Lyude> Erm
<Lyude> The resources I mean :P
<cwabbott> so, if you don't know what you're doing and just copy and paste some stuff from another driver, you might be sorry later
<cwabbott> (/register allocation rant over)
Prf_Jakob has quit [Remote host closed the connection]
hanetzer has quit [Quit: ZNC 1.7.1 - https://znc.in]
hanetzer has joined #panfrost
raster has joined #panfrost
ezequielg has joined #panfrost
raster has quit [Remote host closed the connection]
rhyskidd has joined #panfrost
raster has joined #panfrost
raster has quit [Remote host closed the connection]
raster has joined #panfrost
raster has quit [Remote host closed the connection]
raster has joined #panfrost
<alyssa> cwabbott: So, the RA I do for Midgard is based on what vc4/v3d's RA does
<alyssa> Though it's totally possible that in turn copied the inefficiency from i965
<alyssa> "if you don't know what you're doing and just copy and paste some stuff from another driver, you might be sorry later"
* alyssa apologises profusely
rhyskidd has quit [Quit: rhyskidd]
BenG83 has quit [Quit: Leaving]
<hanetzer> hehe
<hanetzer> robher: working on adding tracepoints to the panfrost kmod.
raster has quit [Remote host closed the connection]
<robher> hanetzer: great! In case you don't know, the gpu-scheduler has some already and you can place tracepoints on any function dynamically. And there's the whole issue of tracepoints being an ABI. So we probably want to be selective in what we add.
<hanetzer> robher: currently just the readl/writel stuff
<hanetzer> mostly so it can be seen if panfrost.ko is acting like mali_kbase.ko in the right spots
<robher> hanetzer: I thought there was a generic way to do that now.
<hanetzer> honestly dunno if its fit for inclusion in the final bits
<robher> hanetzer: that may be by faulting on accesses though, not tracepoints.
<hanetzer> you mean like x86's mmiotrace?
<robher> hanetzer: I was just going to say I'm happy with adding anything at this point.
<hanetzer> heh.
<hanetzer> progress right?
<robher> hanetzer: yeah, that's probably what I'm thinking of.
<hanetzer> I need to cook up some less generic names for the tracepoints. prolly sommat like panfrost_{gpu,mmu,job}_{read,write}
<hanetzer> or mayhaps just pan_