ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
<anarsoul> MoeIcenowy: I may have found how to support rect textures
<anarsoul> see lima_pack_reload_plbu_cmd()
<anarsoul> td->unknown_1_1 = 0x80;
<anarsoul> and then see reload_varying() - this is actual FB size, not [0-1]
<anarsoul> so my guess is that this bit enables rect textures
<anarsoul> or it's unknown_2_2 which is set only here
<MoeIcenowy> anarsoul: 1_1 is the lowest bits of td, right?
<MoeIcenowy> of the 1st word of td
yuq825 has joined #lima
<MoeIcenowy> anarsoul: add a hack for GNOME to allow 1x1 RECT texture first?
kaspter has quit [Ping timeout: 265 seconds]
camus has joined #lima
camus is now known as kaspter
<MoeIcenowy> anarsoul: did a simple test -- commenting out unknown_2_2 does not hurt, but commenting out unknown_1_1 breaks reloading (tested with weston w/o partial damage)
<MoeIcenowy> the whole background becomes the same color when unknown_1_1 is commented out...
<MoeIcenowy> looks right?
<MoeIcenowy> anarsoul: is there any test program for proper rect textures?
<MoeIcenowy> anarsoul: BTW interestingly Allwinner seems to forget to strip symbols from libMali
jrmuizel has quit [Remote host closed the connection]
<MoeIcenowy> anarsoul: IT WORKS!!!!!!!!!!!!!!!!!!!!!!!
<MoeIcenowy> the 1_1 0x80 bit is really for rect texture
<yuq825> \o/
<anarsoul|c> Great
<MoeIcenowy> yuq825: does your gfx repo accept PR?
<yuq825> fine
<MoeIcenowy> I'm going to PR the rect texture test program
<yuq825> ok
<anarsoul|c> Are you going to make MR for mesa as well?
<MoeIcenowy> anarsoul: already sent
<MoeIcenowy> !2131
<anarsoul|c> I'll check it later tonight
<MoeIcenowy> anarsoul: btw what's the S and T in texture?
<MoeIcenowy> are they axis inside the texture?
<anarsoul> MoeIcenowy: I think you should check "sampler->base.normalized_coords" instead
<anarsoul> and set rect bit if it's false
<anarsoul> that's what other drivers do
<MoeIcenowy> oh thanks for this tip
<MoeIcenowy> oops it seems to be not working?!
<MoeIcenowy> oops it should be false when rect
<anarsoul|c> normalized coords is true for regular textures
<anarsoul|c> False for rect
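A minimal sketch of the suggestion above, assuming the 0x80 value lives in an 8-bit descriptor field named unknown_1_1 as quoted from lima_pack_reload_plbu_cmd(). The struct layouts, the sketch_ names, and the choice to clear the bit for regular textures are hypothetical; only unknown_1_1, the 0x80 value, and normalized_coords come from the discussion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the texture descriptor: only the field name
 * unknown_1_1 comes from the discussion, the layout is assumed. */
struct sketch_tex_desc {
    uint8_t unknown_1_1;
};

/* Hypothetical stand-in for the sampler state. */
struct sketch_sampler_state {
    bool normalized_coords; /* true for regular textures, false for rect */
};

#define SKETCH_TD_RECT_BIT 0x80 /* value seen in the reload path */

/* Set the suspected rect bit when the sampler uses unnormalized
 * coordinates, and clear it otherwise. */
static inline void sketch_set_rect_bit(struct sketch_tex_desc *td,
                                       const struct sketch_sampler_state *sampler)
{
    if (!sampler->normalized_coords)
        td->unknown_1_1 |= SKETCH_TD_RECT_BIT;
    else
        td->unknown_1_1 &= (uint8_t)~SKETCH_TD_RECT_BIT;
}
```

Keying the bit off the sampler state rather than the texture target follows the remark that other drivers do the same.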
<MoeIcenowy> okay, now the next target is 3D texture...
<MoeIcenowy> but I need to learn what it is first
<anarsoul|c> That's gonna be a tough one
<MoeIcenowy> learning OpenGL by hacking into the driver sounds... ridiculous
<anarsoul|c> Ha, that's the most fun way
<anarsoul|c> Btw, mipmapping is broken for linear textures
<anarsoul|c> Noticeable in q3a
<anarsoul|c> But it works fine for tiled
<anarsoul|c> I thought that it's wrong filtering settings, but now I think that we use wrong size for mipmap levels
<anarsoul|c> Since even with filtering disabled there's no such artifacts as with linear textures
<MoeIcenowy> BTW is rectangle texture a GL-only feature?
<anarsoul|c> I think there's an extension for gles
<MoeIcenowy> looks like we implemented a feature that is not supported in blob
mardikene193 has joined #lima
<mardikene193> it much appears that sm4.0 first introduced register indirect addressing of temporaries for pixel shader.
<mardikene193> as 16 texture units and 4texture mapping units are for specialized 4pixel shader hw like r300, than the work is slightly more complex on such hw, adjusting offsets of the clamping unit, i would not know how to optimize vertex shaders since it does not use textures.
<anarsoul|c> Nope
<anarsoul|c> Qiang dumped reload from blob
dddddd has quit [Remote host closed the connection]
<anarsoul|c> So blob definitely does it, but probably they're not exposing it for some reason?
<yuq825> mipmap break? I remember I fixed it before and tested with glmark texture mipmap option
<anarsoul|c> Try q3a
<anarsoul|c> With opengl1 rendering
<anarsoul|c> I can show screenshot later tonight
<mardikene193> this might still be possible in vertex shader though afterall, since they allow constant indexing perhaps for uniforms also
<yuq825> I didn't try q3a, only glmark. If glmark does not work now, then it must be a regression
<mardikene193> in other words, instead of texture fetches , uniform fetches can be done
<mardikene193> and instead of clamping the offsets, one can do indirections
<anarsoul|c> I'm not sure how to check it with glmark
<anarsoul|c> I.e. I'm not sure how it should look
<MoeIcenowy> anarsoul: blob doesn't expose EXT_texture_rectangle
<MoeIcenowy> oh BTW GLES2 really doesn't have it
megi has quit [Ping timeout: 245 seconds]
mardikene193 has quit [Quit: Leaving]
<anarsoul|c> I see
<anarsoul> yuq825: ^^
<anarsoul> it looks like one of levels is really off
<yuq825> glmark2-es2 -b texture:texture-filter=mipmap
<anarsoul> yuq825: so far q3a is the only way I know to reproduce the issue
<anarsoul> yuq825: maybe transfer is wrong?
<yuq825> no idea...
<anarsoul> see comment in setup_miptree()
<anarsoul> oh
<MoeIcenowy> anarsoul: is R the 3rd dimension of a 3D texture?
<anarsoul> MoeIcenowy: yes
<MoeIcenowy> how is it applied to a surface?
<anarsoul> it's 3d array, not 2d array now, so when you sample you have to supply 3 coordinates
<MoeIcenowy> how do I define the 3rd coordinate?
<MoeIcenowy> a surface itself is 2D
<anarsoul> texture is not a surface anymore
<anarsoul> it's volume :)
<MoeIcenowy> oh I just need to pass a vec3 in texture3D() ?
<anarsoul> yes
<anarsoul> MoeIcenowy: I have a feeling that one of the bits near texture_rect and texture_2d is responsible for enabling texture_3d
<anarsoul> but there's another issue
<anarsoul> we don't know format for sampler3D instruction
<MoeIcenowy> will it get interpolation on the z component of the texture3D function?
<MoeIcenowy> for example, when z=0 is full black and z=1 is full white, will z=0.5 make a gray?
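For reference, a sketch of what standard linear filtering along the third coordinate would compute for the black/white example, assuming a single-channel 1x1 texture with clamp-to-edge wrapping and texel centres at (i + 0.5) / depth. The sketch_ names are hypothetical, and whether Mali-400 hardware behaves this way is not established here.

```c
#include <assert.h>

/* Clamp a layer index into [0, depth - 1] (clamp-to-edge behaviour). */
static inline int sketch_clamp_layer(int i, int depth)
{
    if (i < 0)
        return 0;
    if (i >= depth)
        return depth - 1;
    return i;
}

/* Linearly filter a 1x1xdepth single-channel volume along r in [0, 1]. */
static inline float sketch_sample_r_linear(const float *layers, int depth, float r)
{
    float u = r * (float)depth - 0.5f; /* texel centres sit at (i + 0.5) / depth */
    int i0 = (int)u;
    if ((float)i0 > u)
        i0--;                          /* make the truncation a floor for negative u */
    float frac = u - (float)i0;
    float a = layers[sketch_clamp_layer(i0, depth)];
    float b = layers[sketch_clamp_layer(i0 + 1, depth)];
    return a + (b - a) * frac;
}
```

With two layers, black (0.0) and white (1.0), r = 0.5 falls halfway between the two texel centres, so this model yields mid-gray; r = 0 and r = 1 clamp to the nearest layer.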
<anarsoul> yuq825: setup_miptree() should probably use offset 0x400 and 0x800 for last 2 levels, not only for level 11 and level 12
<MoeIcenowy> anarsoul: doesn't any offline compiler support sampler3D?
<anarsoul> nope :(
<anarsoul> tried that
<anarsoul> it just crashes if you try to enable extension for 3D textures and use it in the shader
<MoeIcenowy> BTW I remember offline compiler is available for download for public, right?
<anarsoul> yes
<anarsoul> and you can disassemble mbs files with lima now, there's a standalone disassembler tool
<MoeIcenowy> oops, there's such a large distance between 2d and cube
<anarsoul> note that cubemap is not 3D texture
<anarsoul> cubemap is just 6 faces of cube
<anarsoul> 3D texture is actually 3D texture
<MoeIcenowy> BTW do we support 1D texture well now?
<anarsoul> no :)
<anarsoul> we can emulate it I guess
<MoeIcenowy> I think we hardcoded 2D somewhere
<anarsoul> unless you figure out descriptor bit to mark 1D textures
<anarsoul> (and probably sampler instruction format however I'm not sure here)
<MoeIcenowy> just before where I set rect
<MoeIcenowy> anarsoul: however, if 2D sampler instruction is 0x00, is it possible that 1D/2D/3D share the same sampler instruction?
<anarsoul> try it?
<anarsoul> I have no idea :)
<anarsoul> I just accidentally noticed that with reload we didn't set this bit in texture descriptor anywhere
<anarsoul> and that varyings had non-normalized coordinates
<MoeIcenowy> BTW if it really supports 3D texture
<MoeIcenowy> the 3 bits after S/T clamping configuration might be R clamping
<anarsoul> alyssa suggested that unknown_3_1 can be depth
<anarsoul> it goes right after height
Barada has joined #lima
<anarsoul> MoeIcenowy: btw, can you check whether setting should_tile to true at the beginning of _lima_resource_create_with_modifiers() fixes ppmmu fault for you?
<anarsoul> MoeIcenowy: btw, you can also try clearing texture_2d bit and check what it does
adjtm has quit [Ping timeout: 265 seconds]
<MoeIcenowy> anarsoul: setting should_tile doesn't fix the pp mmu fault
<MoeIcenowy> anarsoul: shouldn't the depth be 13bit?
jailbox has quit [Ping timeout: 276 seconds]
<MoeIcenowy> anarsoul: by unsetting texture_2d, we get texture_1d
<MoeIcenowy> the y of the texture2D() input seems to be just ignored
adjtm has joined #lima
<MoeIcenowy> this may indicate we can get the 3rd dimension?
jailbox has joined #lima
hellsenberg has quit [Quit: CPU triple-faulted.]
hellsenberg has joined #lima
<MoeIcenowy> 3rd dimension seems to be ignored...
yuq825 has quit [Quit: Leaving.]
Da_Coynul has joined #lima
<MoeIcenowy> anarsoul: I think maybe Mali-400 doesn't support 3D texture...
<MoeIcenowy> even VC4 doesn't support it
Da_Coynul has quit [Client Quit]
<MoeIcenowy> BTW the offline compiler only crashes for Mali-400 on sampler3D, it performs well for Midgard/Bifrost
dddddd has joined #lima
<rellla> with cubemap and 2drect
<MoeIcenowy> rellla: where's the cubemap patchset?
<rellla> no MR yet
<rellla> i did a comment on your MR about the word1 bits ...
<rellla> it's run with that https://gitlab.freedesktop.org/rellla/piglit/commits/gles piglit setup btw, which contains a hack to increase tolerance
<Tofe> rellla: interesting, as the cubemap is currently needed by the QtWebEngine component
megi has joined #lima
jrmuizel has joined #lima
jrmuizel has quit [Ping timeout: 276 seconds]
adjtm has quit [Ping timeout: 240 seconds]
yuq825 has joined #lima
adjtm has joined #lima
jrmuizel has joined #lima
mardikene193 has joined #lima
<mardikene193> The major difference between solutions that have register indirect addressing and the one that does not is:
<mardikene193> when clamping to schedule register based duplets or single instructions based of the results of in-line loads
<mardikene193> you gotta make writebacks and readbacks manually, since scoreboard on SIMD and bypass networks on VLIW obviously no longer can do that
<mardikene193> this complicates quite a lot of stuff, but still can be done.
Barada has quit [Remote host closed the connection]
Barada has joined #lima
<mardikene193> So actually that such code has not been materialized for Mali, freedreno or r300 yet is totally understandable, i could not make it back then, no one else could, however after i have studied everything we can still attempt it!
<mardikene193> this is pretty difficult code, both to explain and to make, and if you can not explain it to anyone, it is therefore also very difficult to maintain.
niceplace has quit [Quit: ZNC 1.7.3 - https://znc.in]
yuq825 has quit [Ping timeout: 240 seconds]
niceplace has joined #lima
yuq825 has joined #lima
deesix has quit [Ping timeout: 240 seconds]
dddddd has quit [Ping timeout: 245 seconds]
yuq825 has quit [Client Quit]
dddddd has joined #lima
deesix has joined #lima
Elpaulo has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
yuq825 has joined #lima
jrmuizel has quit [Remote host closed the connection]
<MoeIcenowy> anarsoul: for the PP MMU fault, I commented out the code in the shader that calculates ratio and multiplies it into gl_FragColor, then it works
Barada has quit [Quit: Barada]
yuq825 has quit [Ping timeout: 246 seconds]
<mardikene193> The calculation for mali 400mp goes either like this, you have 128/4/4 = 8 bundles, or 128/4=32 bundles depending whether they include the vector length as separate stage
<mardikene193> now bundles needs to be multiplied with 2 to get threadgroup count
<mardikene193> since vliw works as with dual scheduler
<mardikene193> so the arbiter is probably according to the first calculation 64*64 would be too big
yuq825 has joined #lima
<mardikene193> 16*16=128 queue entries according to calculations, which makes it equal to 2pixel shaders r300 on only single core, wau
<mardikene193> something must be wrong, it can not be so powerful
<mardikene193> yeah well 16*16=128/2 actually is 64*4 is 256
<mardikene193> two core version should accommodate 512 entries and should be compatible with sm3.0 spec
<mardikene193> that means mali200 and mali300 utgard gpus can not be sm3.0
<mardikene193> minor peek at forgotten info ..need to look at multi2sim to see how the arbitration is done, whether it was fundamentally 8x8 in patches of four instructions or 16*16 within queues
<MoeIcenowy> anarsoul: found the problem of the pp mmu fault
<MoeIcenowy> the texture list needs to be zeroed
jrmuizel has joined #lima
<MoeIcenowy> otherwise old data inside it might be recognized as VA
<anarsoul> MoeIcenowy: I see, send an MR?
<MoeIcenowy> oh I'm not sure now
<anarsoul> I'm not sure whether Mali4x0 supports 3D textures, I guess we'll never know
<MoeIcenowy> oh I think it's not the reason now...
<anarsoul> at least we have rect and 1d :)
<MoeIcenowy> anarsoul: fake 3D textures like vc4?
<cwabbott> I heard through the grapevine that it does support them, it was just never turned on in the driver
<cwabbott> who knows what the magic combination of descriptor bits + layout + shader bits is though
<MoeIcenowy> oh the old data in texture list seems to be not the reason
<anarsoul> cwabbott: that's unfortunate :(
yuq825 has quit [Quit: Leaving.]
<enunes> MoeIcenowy: yeah I mentioned this garbage data in the texture descriptor a couple days ago, it doesn't seem to cause any bug though
<anarsoul> MoeIcenowy: please do s/abnorm_coords/unnorm_coords and remove braces for a single line block
<MoeIcenowy> anarsoul: BTW, is v[0] equal to v.x when v is a vec2?
<anarsoul> yes
<MoeIcenowy> render->uniforms_address |= ((ctx->buffer_state[lima_ctx_buff_pp_uniform].size) / 4 - 1);
<MoeIcenowy> what's the use of this?
<anarsoul> sets size of uniforms?
<anarsoul> -1 does look suspicious though
<MoeIcenowy> however... isn't it offsetting the uniforms_address?
<anarsoul> no
<anarsoul> lower bits of this register is uniform address
<anarsoul> likely lower 6 bits
<anarsoul> everything seems to be aligned to 64 bytes on Mali4x0
<MoeIcenowy> is uniform count?
<anarsoul> ?
<MoeIcenowy> do you mean that the lowest 6 bits are reused for uniform count?
<anarsoul> yes
<anarsoul> didn't you notice that they use every single bit pretty much everywhere? :)
<MoeIcenowy> BTW the code seems to be directly sourced from limare
<MoeIcenowy> render->uniforms_address |=(ALIGN(plbu->uniform_size, 4) / 4) - 1;
<MoeIcenowy> this is what's in limare
<anarsoul> most of command stream generation was taken from there
<MoeIcenowy> anarsoul: the whole pp uniform code looks strange
<anarsoul> why?
<MoeIcenowy> why is there a one-item array?
<MoeIcenowy> the array dumped with "add pp uniform info at va XXXXXXXX" is always only one item, containing the pointer to the real uniform array
<anarsoul> right
<anarsoul> ask ARM
<anarsoul> :)
<anarsoul> I'm not sure why they need double indirection for uniforms
<MoeIcenowy> but... as we have no size info for real uniform array length
<MoeIcenowy> maybe the lowest 6 bits of uniform address should be the length of the uniform address array (which is 1) ?
<anarsoul> MoeIcenowy: try setting it to 1?
<MoeIcenowy> ah, 0, because it's subtracted with 1
<MoeIcenowy> looks like the pp mmu fault is solved, however I don't know whether it affects the render result...
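A sketch of the two readings of the low bits discussed above, assuming a 6-bit field in a 64-byte-aligned address. The sketch_ names and the mask are hypothetical, and the masking itself is an addition of this sketch: the quoted lima and limare lines OR the value in unmasked.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))
#define SKETCH_COUNT_MASK  0x3fu /* "likely lower 6 bits" per the chat; assumed */

/* Field value as the quoted limare line computes it: the byte size in
 * 4-byte units, minus one. size_bytes must be non-zero. */
static inline uint32_t sketch_count_from_bytes(uint32_t size_bytes)
{
    return SKETCH_ALIGN(size_bytes, 4u) / 4u - 1u;
}

/* Field value under the other hypothesis: number of entries in the
 * pointer array, minus one. entries must be non-zero. */
static inline uint32_t sketch_count_from_entries(uint32_t entries)
{
    return entries - 1u;
}

/* OR the count field into the low bits of a 64-byte-aligned address. */
static inline uint32_t sketch_pack_uniforms_address(uint32_t va, uint32_t count_field)
{
    return (va & ~SKETCH_COUNT_MASK) | (count_field & SKETCH_COUNT_MASK);
}
```

For a one-entry pointer array the second reading gives a field value of 0, which matches the value that made the fault go away in the experiment above.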
<mardikene193> yeah i remember now, it was sort of asymmetrical arbiter, 80x32 or 64x32 depending on width of bundle
<anarsoul> OK, so we know that something's wrong with uniform setup and ppmmu fault was coming from uniform read
<anarsoul> try making a test with a lot of uniforms, run it on blob and capture what it does?
<MoeIcenowy> I cannot capture the behavior of the blob now
mardikene193 has quit [Quit: Leaving]
<MoeIcenowy> and compare with mesa master to check regressions?
<MoeIcenowy> thanks
megi has quit [Ping timeout: 265 seconds]
deesix has quit [Ping timeout: 245 seconds]
dddddd has quit [Ping timeout: 245 seconds]
deesix has joined #lima
dddddd has joined #lima
<MoeIcenowy> anarsoul: could I add your R-b after changing these?
dddddd has quit [Ping timeout: 276 seconds]
deesix has quit [Ping timeout: 265 seconds]
<anarsoul> yes
deesix has joined #lima
megi has joined #lima
dddddd has joined #lima
adjtm has quit [Ping timeout: 240 seconds]
<MoeIcenowy> strange... some change in mesa makes the title bar of GNOME control center colorful
<MoeIcenowy> looks like some glamor-related issue again...
<anarsoul> :(
<anarsoul> MoeIcenowy: what about wayland session?
<anarsoul> does it work?
<anarsoul> it should have better performance for sure than X11
<MoeIcenowy> yes, wayland works here
<anarsoul> OK, so we have issue with PP uniforms
<MoeIcenowy> BTW faking 3D texture is still necessary for GNOME Shell
<anarsoul> fortunately it's easy enough to dump
<MoeIcenowy> although they seem to never really use it in shaders
<MoeIcenowy> and according to apitrace, it's still some 1x1 strange placeholder
<MoeIcenowy> what the hell is GNOME doing...
<anarsoul> MoeIcenowy: if you have a test for sampler3D, try clearing texture_2d in descriptor and setting 1 bit at time in unknown_1_2 and unknown_1_3
<MoeIcenowy> I tried it
<anarsoul> also let's assume that unknown_3_1 and part of unknown_3_2 is depth
<anarsoul> doesn't work?
<MoeIcenowy> yes, I assumed 13 bits
<MoeIcenowy> doesn't work.
<anarsoul> :(
<MoeIcenowy> ooooops
<anarsoul> it was such a nice hypothesis
<MoeIcenowy> I may have forgotten to set depth
<MoeIcenowy> however I killed all the test code by a `git reset --hard`
<anarsoul> I always keep experiments like these in separate branches
<anarsoul> branches are cheap in git, so why not
<anarsoul> MoeIcenowy: btw if you committed the code it's still there
<MoeIcenowy> I didn't commit it
<anarsoul> you'll just need to do some git archeology to extract it
<MoeIcenowy> I think reflog is enough
dddddd has quit [Ping timeout: 240 seconds]
deesix has quit [Ping timeout: 265 seconds]
<MoeIcenowy> anarsoul: interesting
<MoeIcenowy> after setting depth
<MoeIcenowy> things changed
adjtm has joined #lima
<anarsoul> so sampler3D works?
deesix has joined #lima
<anarsoul> I guess you can use texture with depth 2, and just use one image for 1st layer and another for 2nd
<MoeIcenowy> I used a 1x1 depth 3 texture
<MoeIcenowy> layer 1 is red, 2 is green and 3 is blue
<anarsoul> that'd also work
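A sketch of how the texel data for such a test could be laid out in client memory, assuming tightly packed RGBA8 with layers stored one after another (the layout a 3D texture upload call reads under default unpack state). The sketch_ names are hypothetical, layers are indexed from 0 here, and how the GPU itself expects the layers in memory is a separate, open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SKETCH_W = 1, SKETCH_H = 1, SKETCH_D = 3, SKETCH_BPP = 4 };

/* Byte offset of texel (x, y, z) in a tightly packed RGBA8 volume. */
static inline size_t sketch_texel_offset(int x, int y, int z)
{
    return (((size_t)z * SKETCH_H + (size_t)y) * SKETCH_W + (size_t)x) * SKETCH_BPP;
}

/* Fill the 1x1x3 volume: layer 0 red, layer 1 green, layer 2 blue. */
static inline void sketch_fill_rgb_layers(uint8_t *texels)
{
    static const uint8_t colors[SKETCH_D][SKETCH_BPP] = {
        { 255, 0, 0, 255 },
        { 0, 255, 0, 255 },
        { 0, 0, 255, 255 },
    };
    for (int z = 0; z < SKETCH_D; z++)
        for (int c = 0; c < SKETCH_BPP; c++)
            texels[sketch_texel_offset(0, 0, z) + (size_t)c] = colors[z][c];
}
```

Using one distinct primary colour per layer makes it easy to tell from the rendered output which layer a given r coordinate actually hit.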
<MoeIcenowy> however the result is still not the same as the one on my PC
<MoeIcenowy> maybe the texture descriptor needs a tweak
<anarsoul> try setting different bits for marking it as 3d
<anarsoul> and try it with texture_2d set and cleared
<MoeIcenowy> anarsoul: can texture3D be mipmapped?
dddddd has joined #lima
<MoeIcenowy> anarsoul: now the problem is that we totally don't know how the remaining layers should be placed
<MoeIcenowy> at least now the r coordinate starts to be honored
gaulishcoin has joined #lima
mardikene193 has joined #lima
<mardikene193> OK so let's reach to the point, what is needed for you is to chill out a bit, and start thinking, this will be something that you would not regret.
<mardikene193> you know technically there are not many options to call 32warps on 16bundles which include 4-5 words...
<mardikene193> at least as the final result should be a constant output which is similar to any of the methods
<mardikene193> but instead of placing a single instruction into the queue, it reorders and into the single instruction spot it adds 4-5 and fetches also 4-5
<mardikene193> from queues
<anarsoul> MoeIcenowy: sure it can :)
<anarsoul> for remaining layers you'll have to experiment
<anarsoul> and yeah, we need to implement explicit LOD to debug mipmapping
<anarsoul> MoeIcenowy: what bit enabled 3d textures?
<anarsoul> please commit it and push it somewhere so your work isn't lost
<anarsoul> MoeIcenowy: btw you also need to consider tiling :)
<anarsoul> however I'd expect it just to work if you align it to 16x16 (and probably one more x16?) boundaries
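A sketch of the alignment guess above, assuming a 16-texel tile edge for width and height and treating the extra x16 on depth as an optional, unconfirmed step. The sketch_ names are hypothetical and nothing here is verified against hardware.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_TILE 16u /* 16x16 tile edge, as suggested in the chat */

/* Round v up to the next multiple of a (a must be non-zero). */
static inline uint32_t sketch_align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1u) / a * a;
}

/* Texel count of a w x h x d volume after padding width and height to the
 * tile edge; depth is padded too only when align_depth is non-zero. */
static inline uint32_t sketch_padded_texels(uint32_t w, uint32_t h, uint32_t d,
                                            int align_depth)
{
    uint32_t pd = align_depth ? sketch_align_up(d, SKETCH_TILE) : d;
    return sketch_align_up(w, SKETCH_TILE) * sketch_align_up(h, SKETCH_TILE) * pd;
}
```

For the 1x1 depth 3 test texture this gives 768 texels with only width and height padded, and 4096 if depth is padded as well, so the two guesses are easy to tell apart by allocation size.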
<mardikene193> hmm, well sorta maybe yeah. Actually, tiling is calculated based on the cacheline size
<mardikene193> it also requires the memory to be physically contiguous
<anarsoul> MoeIcenowy: so after all it's the same instruction for sampler for 1d, 2d and 3d cases?
<mardikene193> For some reason you have chosen the most complex subject to work on, it requires the most changes too
<mardikene193> I thought i wanted to deal more like with other things, but this is inherently big allocators property relying issue
<mardikene193> this is very complex because, the more complex the allocator goes, the more delay it adds on CPU
armessia has joined #lima
<mardikene193> hence on the first subject i talked about, the queues are the easiest to be implemented as 32*32 arbiter, which quits earlier when program ends that is
<mardikene193> it quits then when program ends, rather logically more correct way to put it
drod has joined #lima
niceplace has quit [Quit: ZNC 1.7.3 - https://znc.in]
niceplace has joined #lima
<mardikene193> technically it starts to (and this is my opinion) restart the instruction queue sooner, but ends when the program ends, i.e. it starts to replace queue entries sooner, and very technically this is achieved as:
<mardikene193> in verilog as in the port connection was bigger size of the vector
<mardikene193> than that of it's nested instance
<mardikene193> so it should ultimately quit in the 24th instance and start to wrap around
<mardikene193> i do not think i myself understand that kind of arbiter :D:D
drod has quit [Read error: Connection reset by peer]
drod has joined #lima
<rellla> armessia: imho it'd be worth making an MR with the cubemaps branch... at least a WIP one...
<armessia> rellla: I'm still working on it, the different faces aren't properly aligned yet in all cases
<armessia> rellla: but making a WIP MR already won't hurt indeed, in this way some more feedback can come in early
<armessia> rellla: will create a WIP MR soon
<rellla> now that rect textures have landed and 3d is probably worked on ;)
<rellla> fine
<mardikene193> Latest Midgard GPU is T880. ▫ Maximum of 16 shader cores. ▫ Tile size 16x16 (4x4-32x32 internally).
<mardikene193> I absolutely do not understand what i am looking at.
<mardikene193> does that mean it is configurable from 4x4-32x32
<mardikene193> are those threads?
<armessia> rellla: things are moving fast on the texturing side lately :-)
<mardikene193> i remember reading some talonmies and stackoverflow articles, Mark Harris's, and also this Robert Crovella's from nvidia.
<mardikene193> They claimed something that on such gpus like nvidia ones, when CU registers get used, it should raise an interrupt
<mardikene193> more likely it is that, different contexts can be put to different compute units
<anarsoul> armessia: just some occasional discoveries :)
<anarsoul> blob exposes rect textures only for reload
<anarsoul> and it doesn't expose 3D textures at all, so I'm not convinced that we can make it work yet. MoeIcenowy is working on it though
<anarsoul> rellla: enunes: can you review https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2080 ?
<anarsoul> that's fairly simple one, it just replaces "vec4 ssa1 = load_input(varying); vec2 ssa2 = ssa1.xy; vec2 ssa3 = ssa1.zw" with "vec2 = load_input(varying.xy); vec2 = load_input(varying.zw)"
drod has quit [Ping timeout: 265 seconds]
drod has joined #lima
<armessia> anarsoul: some nice achievements for sure!
<armessia> we already have one feature more than the blob with rect textures, can as well have two :-)
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
gaulishcoin has quit [Ping timeout: 240 seconds]
drod has quit [Remote host closed the connection]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
<mardikene193> yeah sure, well wrapping around the 32x32 arbiter properly is pretty easy, totally unsure when and how it starts to use multiple COREs