#lima on 2019-09-28 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

01:03 drod has quit [Remote host closed the connection]

01:18 adjtm has joined #lima

02:05 <anarsoul> MoeIcenowy: your change actually breaks q3a intro screen for me

02:06 <anarsoul> setting all lower bits to 1 fixes it though

02:06 <anarsoul> so I wonder if we should set it to 0xf?

02:40 jrmuizel has joined #lima

02:43 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

03:33 jrmuizel has quit [Remote host closed the connection]

04:12 dddddd has quit [Remote host closed the connection]

04:23 _whitelogger has joined #lima

04:29 _whitelogger has joined #lima

04:48 <MoeIcenowy> anarsoul: what? how is the intro screen broken?

04:48 <anarsoul> MoeIcenowy: it's plain black

04:49 <MoeIcenowy> did you test other values than 0x0 and 0xf?

04:50 <MoeIcenowy> and did you test 0x0 multiple times?

04:50 <anarsoul> yes, 0x1 doesn't work

04:50 <anarsoul> 0xf works

04:50 <anarsoul> so it somehow depends on number of uniforms?

04:52 <MoeIcenowy> I still don't think so

04:53 <MoeIcenowy> it's quite strange that an indirect reference should keep the info about the real storage's length

04:58 megi has quit [Ping timeout: 240 seconds]

05:15 <anarsoul> MoeIcenowy: I have an idea, let me test it

06:02 <anarsoul> I have strong suspicion that it depends on uniform buffer size

06:02 <anarsoul> 0xf always works

06:03 <anarsoul> but we can specify only 16 values with 4 bits

06:04 <anarsoul> but we can have up to 65535 uniforms

06:06 <anarsoul> I guess it's first set bit?

06:58 <anarsoul> nah, it breaks ~10 tests

07:44 <MoeIcenowy> anarsoul: 2^16 = 65536

07:44 <MoeIcenowy> this might be a tip?

07:46 <MoeIcenowy> so... log2?

07:48 <MoeIcenowy> anarsoul: do you have the log of broken tests?

08:06 <MoeIcenowy> anarsoul: try `render->uniforms_address |= util_last_bit((ctx->buffer_state[lima_ctx_buff_pp_uniform].size) / 4 - 1);` ?

08:14 <MoeIcenowy> anarsoul: or if this value can be higher but not lower, maybe just set it to 0xf ?

08:14 <MoeIcenowy> (a violent choice)

08:41 _whitelogger has joined #lima

09:28 mardikene193 has joined #lima

09:29 <mardikene193> maybe i am a bit sick, but being sick has some advantages than :D

09:30 <mardikene193> That miaow code can not be possibly something unreadable if you'd get your stuff together.

09:40 dddddd has joined #lima

09:51 drod has joined #lima

09:53 <mardikene193> assign next_valid_entry = (valid_entry_out | (decoded_init_instr)) & ~(decoded_issued | decoded_branch_taken);

09:54 <mardikene193> this is only a little bit of tricky, when you drive 40 1s to the decode_wfid and have nothing issued valid is set, which is the case when no fetch is done it drives X which evaluates to 1111111....

09:57 <mardikene193> however when you drive 1 to/as decode_wfid and no issue, it evaluates 10000000000000....

10:00 <mardikene193> as in the case of driving X valid entry does not go down , only the issued ones go down, so vacant will be for instance 0010000000 like i told, in in queue rendering

10:00 <mardikene193> however

10:03 <mardikene193> when you drive 1000000000 and things used to be 111111111 all except one (the first) go down and it evaluates to 011111111

10:05 <mardikene193> it is the case in full pipeline mode or full length of the pipeline mode in miaow if i am not mistaken, but who cares right?

10:06 <mardikene193> in one case like in queue rendering the vacant is predominantly zeros, and it adds +1 every time

10:06 <mardikene193> in another case at full length of the pipeline

10:07 <mardikene193> they are predominantly ones

10:17 megi has joined #lima

10:17 <mardikene193> when you knock in 011111111 | 0 & 1111111111 the next in the vacant line for the full pipeline mode, the result should be that it always changes column, next time it will be reverse

10:18 <mardikene193> 1 | 01111111 etc. & 01111 so it always should change as round robin should do also

10:18 <mardikene193> column i meant

10:22 <mardikene193> so when it adds +1 to in queue rendering obviously when not issued the column remains unchanged and it will stay on the line

10:22 <mardikene193> even if i am wrong it does not matter, just read code please properly

10:34 drod has quit [Remote host closed the connection]

11:36 Da_Coynul has joined #lima

11:40 Da_Coynul has quit [Client Quit]

11:42 jbrown has quit [Ping timeout: 276 seconds]

11:53 Da_Coynul has joined #lima

11:54 jbrown has joined #lima

11:57 Da_Coynul has quit [Client Quit]

12:00 niceplaces has joined #lima

12:00 niceplace has quit [Ping timeout: 265 seconds]

12:01 Da_Coynul has joined #lima

12:04 jbrown has quit [Ping timeout: 276 seconds]

12:19 jbrown has joined #lima

12:44 abelvesa has quit [Ping timeout: 268 seconds]

12:50 abelvesa has joined #lima

13:01 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

13:03 Da_Coynul has joined #lima

13:24 jrmuizel has joined #lima

14:19 jrmuizel has quit [Remote host closed the connection]

14:34 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

14:47 jrmuizel has joined #lima

14:52 jrmuizel has quit [Ping timeout: 240 seconds]

14:54 jrmuizel has joined #lima

15:23 <anarsoul> MoeIcenowy: that's what I did (using util_last_bit())

15:23 <MoeIcenowy> anarsoul: doesn't work?

15:23 <MoeIcenowy> did you run any test with it just set to 0xf?

15:23 <anarsoul> seems to work, but I want to check what blob does

15:23 <anarsoul> MoeIcenowy: 0xf also works

15:24 <MoeIcenowy> util_last_bit also seems to work?

15:24 <anarsoul> yes, but I'm not convinced that it's correct solution

16:03 <MoeIcenowy> anarsoul: do you remember the texture issue by roman on Android?

16:03 <MoeIcenowy> I think I met it on glamor

16:03 <anarsoul> yes

16:03 <anarsoul> I don't know how to fix it though

16:03 <MoeIcenowy> currently Gtk+ programs give me a strange color when some UI element is changed

16:04 <MoeIcenowy> and with Xephyr I succeeded in recording an apitrace for the strange color

16:05 <MoeIcenowy> It's green to cyan gradient

16:07 <anarsoul> MoeIcenowy: looks like uniform size has log2 dependency, it's 0 for 1x vec4, 1 for 2x vec4, 2 for 3-4x vec4, 3 for 5-8x vec4, 4 for 9x vec4, etc

16:07 <anarsoul> that's values that blob uses
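
[editor's sketch] The values listed above (0 for 1 vec4, 1 for 2, 2 for 3-4, 3 for 5-8, 4 for 9 and up) match ceil(log2(n)) of the vec4 count, which is what util_last_bit(n - 1) yields. The C below is a minimal sketch under assumptions, not lima driver code: last_bit() is a hypothetical stand-in written to mirror Mesa's util_last_bit() (1-based index of the highest set bit, 0 for an input of 0), and uniform_size_field() is an illustrative name.

```c
#include <assert.h>

/* Hypothetical stand-in for Mesa's util_last_bit(): returns the 1-based
 * index of the highest set bit, or 0 if no bit is set. */
static unsigned last_bit(unsigned v)
{
    unsigned n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* Illustrative only: size field for num_vec4 uniform vectors,
 * computed as ceil(log2(num_vec4)). */
static unsigned uniform_size_field(unsigned num_vec4)
{
    assert(num_vec4 >= 1);
    return last_bit(num_vec4 - 1);
}
```

Subtracting 1 before taking the last set bit is what makes exact powers of two (2, 4, 8 vec4) land in the lower bucket, as in the blob's values.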

16:07 <MoeIcenowy> oh interesting

16:07 <MoeIcenowy> so could you make a MR with the log2 code?

16:07 <anarsoul> sure

16:08 <anarsoul> let me put it together and run it through piglit

16:08 <anarsoul> MoeIcenowy: also I think that we can do more than 16 samplers. Sampler index in instruction is 12 bits

16:09 <MoeIcenowy> ah 16 is enough now ;-)

16:09 <anarsoul> but again, we have only 4 bits to specify size of texture descriptors array

16:10 <anarsoul> see where I'm going? :)

16:10 <MoeIcenowy> I don't know

16:11 <anarsoul> MoeIcenowy: likely it's again log2, just like uniforms

16:11 <MoeIcenowy> oops

16:11 <MoeIcenowy> I thought about stolen size bits from another register...

16:12 <MoeIcenowy> anarsoul: will == between two integers work reliably in PP?

16:12 <anarsoul> MoeIcenowy: PP is pretty sane, so I don't expect any surprises here

16:12 <MoeIcenowy> the numbers are quite small, in range of 0~10

16:12 <MoeIcenowy> anarsoul: but it's FP16 only

16:12 <anarsoul> MoeIcenowy: PP doesn't have integers, only floats

16:12 <anarsoul> yeah, it should work

16:12 <MoeIcenowy> because for integers no error should be present?

16:13 <anarsoul> MoeIcenowy: we have 10 bits of precision

16:13 <anarsoul> so it should be accurate enough for small integers
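
[editor's sketch] Why == on small integers is safe with 10 bits of precision: an integer is exactly representable when it fits in the significand after its trailing zero bits are absorbed by the exponent. The C below is a minimal sketch assuming the PP's FP16 follows the IEEE 754 binary16 layout (10 stored mantissa bits plus an implicit leading 1); that layout is an assumption, and fp16_exact_int() is an illustrative helper name.

```c
#include <assert.h>
#include <stdbool.h>

/* True if the non-negative integer n fits in an 11-bit significand
 * (10 stored bits plus the implicit leading 1) once trailing zero bits
 * are moved into the exponent. */
static bool fp16_exact_int(unsigned n)
{
    assert(n <= 65504); /* largest finite IEEE 754 binary16 value */
    while (n != 0 && (n & 1) == 0)
        n >>= 1;
    return n < (1u << 11);
}
```

Under that assumption every integer from 0 to 2048 is exact, so values in the 0~10 range compare without rounding error.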

16:14 <MoeIcenowy> the operations that produces the strange gradient look quite normal...

16:15 <MoeIcenowy> and the shader looks simple

16:17 Da_Coynul has joined #lima

16:19 <anarsoul> MoeIcenowy: likely it uses wrong texture address

16:19 <MoeIcenowy> BTW looks like it's doing a strange operation

16:19 <anarsoul> check what's up with texture descriptor when you replaying trace

16:20 <MoeIcenowy> rendering to an FBO and using the FBO as a texture to be fed in

16:21 <anarsoul> oh

16:21 <anarsoul> it should work though

16:22 <anarsoul> we call lima_submit_add_bo() in lima_texture_desc_set_res()

16:22 <anarsoul> so it adds dependency on it

16:22 <MoeIcenowy> the shader has a lot of useless code though

16:23 <MoeIcenowy> I mean the fragment shader

16:23 <MoeIcenowy> https://pastebin.aosc.io/paste/kzHLH16uXX7YeVFZTngsjg here's the fs

16:23 <MoeIcenowy> source_repeat_mode is 0 in this run

16:24 <anarsoul> MoeIcenowy: try https://gist.github.com/anarsoul/980636fb5f6ba8bcbb30baad7de7cb2a

16:25 <MoeIcenowy> tiled_texture again, right?

16:25 <MoeIcenowy> oh no

16:25 <anarsoul> btw yeah, you can try with tiled textures

16:26 <MoeIcenowy> looks like no mipmap is used on this texture -- mipmap on a UI element is meaningless

16:26 <anarsoul> try this extra lima_bo_wait()

16:27 <MoeIcenowy> anarsoul: no help

16:27 <anarsoul> then I'm out of ideas

16:27 <anarsoul> :)

16:27 <MoeIcenowy> will we specify the same memory area out?

16:28 <MoeIcenowy> or will we reload it and use another memory area?

16:28 <anarsoul> what do you mean?

16:28 <MoeIcenowy> when we use the target FBO as one of the input texture

16:28 jernej_ has joined #lima

16:28 jernej has quit [Read error: Connection reset by peer]

16:28 jernej_ is now known as jernej

16:29 <MoeIcenowy> anarsoul: BTW how can I locate a single draw in lima.dump?

16:29 <anarsoul> good question

16:29 <anarsoul> I don't know :)

16:29 <anarsoul> by analyzing it?

16:29 <anarsoul> or add some extra traces

16:35 jrmuizel has quit [Remote host closed the connection]

16:38 <MoeIcenowy> anarsoul: are there two dummy GP uniforms?

16:38 <anarsoul> they're not dummy

16:38 <anarsoul> see https://gitlab.freedesktop.org/panfrost/mali-isa-docs/blob/master/Utgard-GP.md

16:39 <anarsoul> "output transformation"

16:39 <MoeIcenowy> ah I mean not specified by the user here

16:55 <MoeIcenowy> anarsoul: strange... looks like it's not the same buffer

16:55 <MoeIcenowy> the texture is 85x32, but the target is 87x34

16:55 <MoeIcenowy> But strangely in apitrace dump they're changed altogether

16:56 jrmuizel has joined #lima

16:57 <anarsoul> running piglit with my uniforms fix now

17:00 <MoeIcenowy> anarsoul: strange

17:00 <MoeIcenowy> the texture buffer read out changed when a glBindFramebufferEXT call is performed

17:00 <MoeIcenowy> ?!

17:00 jrmuizel has quit [Remote host closed the connection]

17:01 <MoeIcenowy> oh strange...

17:01 <MoeIcenowy> seems like some memory instability

17:02 <MoeIcenowy> re-lookup the state changes things

17:02 <anarsoul> MoeIcenowy: see https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2160

17:19 <MoeIcenowy> anarsoul: tried to lower PP shader CF

17:19 <MoeIcenowy> and this problem disappeared

17:19 <anarsoul> then likely we have some bug in it?

17:19 <MoeIcenowy> maybe

17:20 <anarsoul> can you show LIMA_DEBUG=pp output for faulty draw?

17:21 <MoeIcenowy> anarsoul: I have a combined log currently

17:21 <MoeIcenowy> should I find the part of this shader and upload?

17:21 <anarsoul> yes, both NIR and ppir parts

17:22 <MoeIcenowy> anarsoul: https://pastebin.aosc.io/paste/BwtTg0lQSM7JFjzR81G5rg

17:27 <anarsoul> source_repeat_mode is 0?

17:27 <MoeIcenowy> all uniforms are 0

17:28 <anarsoul> oh, so it takes long path?

17:32 jrmuizel has joined #lima

17:35 <anarsoul> MoeIcenowy: I don't see anything wrong in shader code...

17:41 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

17:42 jrmuizel has quit [Ping timeout: 240 seconds]

18:04 jrmuizel has joined #lima

18:29 jrmuizel has quit [Remote host closed the connection]

18:29 <MoeIcenowy> anarsoul: strange... although it's possible that the flattened shader will run for longer, your lima_bo_wait fix should be enough to solve this, right?

18:30 <anarsoul> MoeIcenowy: hold on, so does it fix the issue?

18:30 <MoeIcenowy> no

18:30 <MoeIcenowy> only shader flattening solves the issue

18:30 <anarsoul> then something's likely wrong in ppir compiler

18:31 <anarsoul> or maybe we don't know something about PP and generate code that won't work

18:31 <MoeIcenowy> BTW do you need the full trace?

18:31 <anarsoul> sure, but I won't be able to look into it till I return from XDC

18:33 <MoeIcenowy> anarsoul: is data inside a texture kept integer as PP input?

18:35 <anarsoul> MoeIcenowy: it doesn't matter. Sampler instruction output is always vec4 for fp16

18:35 <anarsoul> MoeIcenowy: I think we somehow get coordinates wrong

18:35 <anarsoul> feel free to walk this shader manually

18:35 <anarsoul> maybe you'll find something

18:35 <MoeIcenowy> anarsoul: I don't think it's easy to get a colorful result from a grayscale texture...

18:36 <anarsoul> oh

18:36 <mardikene193> this was quite a crap what i pulled before...quite a trollish shit, even embarrassing .

18:37 <mardikene193> however i simulated the modules, and i offer you soon all the proof, yeah it appears to be still +1 and as i told though, but works slightly differently

18:40 <MoeIcenowy> but it's still a quite strange result that the texture content also changes...

18:44 <mardikene193> I was not sure why did I see this +1 all the time, i ran the best possible way in my head only, it happens to be that designers did the same, i was so obsessively sure that two in consequence needs to be run in queues

18:45 <mardikene193> and what the heck it is so exactly , but my mind plays a virus all the time, thoughts play déjà vu

18:47 <anarsoul> MoeIcenowy: maybe branch conditions are wrong?

18:47 <anarsoul> i.e. we assume that ge.s0 returns 0 for false and non-zero for true

18:47 <anarsoul> but what if it's not?

18:48 <MoeIcenowy> BTW, the malisc result seems to directly use branch.ge

18:49 <MoeIcenowy> and... how do I read the disassembly?

18:49 <MoeIcenowy> is , used to divide different instructions assigned to the same slot?

18:51 <anarsoul> MoeIcenowy: it's easy, single instruction is executed left to right

18:51 <anarsoul> then it goes to next instruction unless branch is taken

18:51 <anarsoul> $[0-5] are registers

18:51 <MoeIcenowy> does ^const0.x, const0 10.000000 0.000000 0.000000 0.000000 mean that ^const0.x=10.0 in this instruction?

18:52 <anarsoul> ^uniform, ^const, ^texture are pipeline registers

18:52 <anarsoul> ^const0.x is pipeline register

18:52 <anarsoul> "const0 10.000000 0.000000 0.000000 0.000000" loads pipeline register with specified values

18:53 <anarsoul> so e.g.: load.u 0, ge.s0 $0.x ^uniform.x ^const0.x, const0 10.000000 0.000000 0.000000 0.000000

18:54 <anarsoul> loads uniform 0 into ^uniform pipeline register, then $0.x = ge(^uniform.x, ^const0.x)
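
[editor's sketch] The instruction walked through above can be mimicked in C to make the data flow explicit. This is a sketch under assumptions, not an emulator: it assumes ge yields 1.0 for true and 0.0 for false (the convention the channel assumes for ge and later observes for eq.s1), and the names vec4, pp_ge() and emulate_load_ge() are illustrative.

```c
#include <assert.h>

typedef struct { float c[4]; } vec4;

/* Assumed comparison result convention: 1.0 for true, 0.0 for false. */
static float pp_ge(float a, float b)
{
    return a >= b ? 1.0f : 0.0f;
}

/* Mimics: load.u 0, ge.s0 $0.x ^uniform.x ^const0.x,
 *         const0 10.000000 0.000000 0.000000 0.000000
 * Returns the value written to $0.x. */
static float emulate_load_ge(const vec4 *uniforms, unsigned num_uniforms)
{
    assert(num_uniforms >= 1);
    vec4 pipe_uniform = uniforms[0];                   /* load.u 0 -> ^uniform */
    vec4 pipe_const0 = {{10.0f, 0.0f, 0.0f, 0.0f}};    /* const0 -> ^const0 */
    return pp_ge(pipe_uniform.c[0], pipe_const0.c[0]); /* ge.s0 -> $0.x */
}
```

The two pipeline registers live only inside the function, which mirrors how ^uniform and ^const0 exist only within the single instruction.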

18:54 <anarsoul> MoeIcenowy: does it make sense?

18:55 <MoeIcenowy> yes

18:55 <MoeIcenowy> load.v means load varying into an internal register?

18:55 <anarsoul> no

18:55 <anarsoul> load.v loads varying to physical register or to special register ^discard

18:56 <anarsoul> texture instruction uses value in ^discard register as coords

18:56 <MoeIcenowy> why is it named ^discard?

18:56 <anarsoul> so "load.v $3.xy 0.xy" loads varying 0.xy into register $3.xy

18:56 <anarsoul> for historical reasons. Also it's lost in next instruction

18:57 <anarsoul> so

18:57 <anarsoul> "load.v ^discard. $3.xyxx" loads ^discard with value in register $3.xy

18:57 <anarsoul> but

18:57 <anarsoul> "load.v ^discard. 3.xyxx" loads ^discard with value in varying 3.xy

18:57 <anarsoul> note missing $ in second case

18:58 <MoeIcenowy> so load.v means "loads varying to physical register or loads varying or physical register to ^discard" ?

18:58 <anarsoul> yes

18:58 <anarsoul> selects are also a bit weird

18:58 <anarsoul> e.g.

18:58 <anarsoul> mov.s0 $0.x, sel.v1 $0.zw $1.xxzw $3.xxxy

18:59 <anarsoul> select uses pipeline register that's output of scalar0 unit for condition

18:59 <anarsoul> so in this case

18:59 <anarsoul> $0.x is condition

18:59 <anarsoul> $0.zw is destination

18:59 <anarsoul> $1.xxzw is first argument

18:59 <anarsoul> $3.xxxy is second argument
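
[editor's sketch] The select breakdown above can be modelled in C as a swizzle on each source followed by a masked write to the destination. This is a sketch under assumptions: which argument is chosen for a non-zero condition is NOT stated in the log, so the "non-zero picks the first argument" rule here is a guess, and vec4, swizzle() and select_masked() are illustrative names.

```c
#include <assert.h>

enum { X, Y, Z, W };

typedef struct { float c[4]; } vec4;

/* Apply a 4-component swizzle such as .xxzw to a source register. */
static vec4 swizzle(vec4 src, const int sw[4])
{
    vec4 out;
    for (int i = 0; i < 4; i++) {
        assert(sw[i] >= X && sw[i] <= W);
        out.c[i] = src.c[sw[i]];
    }
    return out;
}

/* ASSUMED semantics: for each component enabled in write_mask,
 * dest = (cond != 0) ? a : b; other components keep their old value. */
static vec4 select_masked(vec4 dest, unsigned write_mask, float cond,
                          vec4 a, vec4 b)
{
    for (int i = 0; i < 4; i++)
        if (write_mask & (1u << i))
            dest.c[i] = (cond != 0.0f) ? a.c[i] : b.c[i];
    return dest;
}
```

With a .zw destination only the third and fourth swizzle positions matter, so $1.xxzw contributes $1.z and $1.w while $3.xxxy contributes $3.x and $3.y; the leading .xx entries are ignored by the write mask.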

19:00 <MoeIcenowy> oh complex

19:00 <MoeIcenowy> how to assign .xxzw to .zw?

19:01 <mardikene193> it was basically cause the priority encoder returns always 0 or 1 when X was involved in the valid_entry

19:01 <MoeIcenowy> oh lima has no way to load an arbitrary shader...

19:01 <anarsoul> you just specify destination :)

19:02 <MoeIcenowy> s/shader/compiled shader/

19:02 <anarsoul> no, we need shader runner

19:02 <MoeIcenowy> implement the MBS loader by ARM? ;-)

19:04 <anarsoul> something simpler would be nice

19:04 <anarsoul> I don't think we need MBS

19:11 <MoeIcenowy> strange... if I use eq.s1 result as z of gl_FragColor

19:11 <MoeIcenowy> no visible change can be seen if the condition is met or not

19:12 <MoeIcenowy> oh sorry

19:12 <MoeIcenowy> I set a wrong value for the other number

19:13 megi has quit [Quit: WeeChat 2.6]

19:13 <MoeIcenowy> seems that eq.s1 results in 1.0 when met, 0.0 when not met

19:15 <anarsoul> that's what we expect

19:18 <MoeIcenowy> yes

19:20 <anarsoul|c> Try walking the disassembly to check whether it does what you expect

19:28 <MoeIcenowy> anarsoul: maybe I should try https://gitlab.freedesktop.org/rellla/mesa/commits/lima-ppir-swapargs-fix ?

19:29 <anarsoul|c> Nope, it's wrong

19:30 <MoeIcenowy> ah, no change

19:30 <MoeIcenowy> anarsoul: why is it wrong?

19:30 <anarsoul|c> Because le will be ge if you swap args

19:31 <MoeIcenowy> oh okay

19:31 <anarsoul|c> And lt will be gt

19:31 <MoeIcenowy> I got silly

19:31 <MoeIcenowy> swap is not not
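
[editor's sketch] "swap is not not" in code: swapping the operands of a comparison mirrors the opcode (le becomes ge, lt becomes gt), while negating it would give a different opcode (not-le is gt). A minimal C sketch; cmp_op, cmp_eval() and cmp_swap_args() are illustrative names, not ppir identifiers.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { CMP_LT, CMP_LE, CMP_GT, CMP_GE } cmp_op;

/* Evaluate "a op b". */
static bool cmp_eval(cmp_op op, float a, float b)
{
    switch (op) {
    case CMP_LT: return a < b;
    case CMP_LE: return a <= b;
    case CMP_GT: return a > b;
    case CMP_GE: return a >= b;
    }
    assert(!"invalid comparison opcode");
    return false;
}

/* Opcode that keeps the result unchanged once the operands are swapped:
 * (a op b) == (b cmp_swap_args(op) a). */
static cmp_op cmp_swap_args(cmp_op op)
{
    switch (op) {
    case CMP_LT: return CMP_GT;
    case CMP_LE: return CMP_GE;
    case CMP_GT: return CMP_LT;
    case CMP_GE: return CMP_LE;
    }
    assert(!"invalid comparison opcode");
    return op;
}
```

The equal-operands case shows the difference: le(2, 2) is true and the swapped form ge(2, 2) is also true, whereas the negated form gt(2, 2) is false.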

20:06 marcodiego has joined #lima

20:18 drod has joined #lima

21:54 jrmuizel has joined #lima

22:40 jrmuizel has quit [Remote host closed the connection]

22:58 jrmuizel has joined #lima

23:37 jrmuizel has quit [Remote host closed the connection]

23:58 jrmuizel has joined #lima