ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
drod has quit [Remote host closed the connection]
adjtm has joined #lima
<anarsoul> MoeIcenowy: your change actually breaks q3a intro screen for me
<anarsoul> setting all lower bits to 1 fixes it though
<anarsoul> so I wonder if we should set it to 0xf?
jrmuizel has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
jrmuizel has quit [Remote host closed the connection]
dddddd has quit [Remote host closed the connection]
_whitelogger has joined #lima
<MoeIcenowy> anarsoul: what? how is the intro screen broken?
<anarsoul> MoeIcenowy: it's plain black
<MoeIcenowy> did you test values other than 0x0 and 0xf?
<MoeIcenowy> and did you test 0x0 multiple times?
<anarsoul> yes, 0x1 doesn't work
<anarsoul> 0xf works
<anarsoul> so it somehow depends on number of uniforms?
<MoeIcenowy> I still don't think so
<MoeIcenowy> it's quite strange that an indirect reference should keep the info about the real storage's length
megi has quit [Ping timeout: 240 seconds]
<anarsoul> MoeIcenowy: I have an idea, let me test it
<anarsoul> I have strong suspicion that it depends on uniform buffer size
<anarsoul> 0xf always works
<anarsoul> but we can specify only 16 values with 4 bits
<anarsoul> but we can have up to 65535 uniforms
<anarsoul> I guess it's first set bit?
<anarsoul> nah, it breaks ~10 tests
<MoeIcenowy> anarsoul: 2^16 = 65536
<MoeIcenowy> this might be a tip?
<MoeIcenowy> so... log2?
<MoeIcenowy> anarsoul: do you have the log of broken tests?
<MoeIcenowy> anarsoul: try `render->uniforms_address |= util_last_bit((ctx->buffer_state[lima_ctx_buff_pp_uniform].size) / 4 - 1);` ?
<MoeIcenowy> anarsoul: or if this value can be higher but not lower, maybe just set it to 0xf ?
<MoeIcenowy> (a violent choice)
_whitelogger has joined #lima
mardikene193 has joined #lima
<mardikene193> maybe i am a bit sick, but being sick has some advantages then :D
<mardikene193> That miaow code can not be possibly something unreadable if you'd get your stuff together.
dddddd has joined #lima
drod has joined #lima
<mardikene193> assign next_valid_entry = (valid_entry_out | (decoded_init_instr)) & ~(decoded_issued | decoded_branch_taken);
<mardikene193> this is only a little bit of tricky, when you drive 40 1s to the decode_wfid and have nothing issued valid is set, which is the case when no fetch is done it drives X which evaluates to 1111111....
<mardikene193> however when you drive 1 to/as decode_wfid and no issue, it evaluates 10000000000000....
<mardikene193> as in the case of driving X valid entry does not go down , only the issued ones go down, so vacant will be for instance 0010000000 like i told, in in queue rendering
<mardikene193> however
<mardikene193> when you drive 1000000000 and things used to be 111111111 all except one (the first) go down and it evaluates to 011111111
<mardikene193> it is the case in full pipeline mode or full length of the pipeline mode in miaow if i am not mistaken, but who cares right?
<mardikene193> in one case like in queue rendering the vacant is predominantly zeros, and it adds +1 every time
<mardikene193> in another case at full length of the pipeline
<mardikene193> they are predominantly ones
megi has joined #lima
<mardikene193> when you knock in 011111111 | 0 & 1111111111 the next in the vacant line for the full pipeline mode, the result should be that it always changes column, next time it will be reverse
<mardikene193> 1 | 01111111 etc. & 01111 so it always should change as round robin should do also
<mardikene193> column i meant
<mardikene193> so when it adds +1 to in queue rendering obviously when not issued the column remains unchanged and it will stay on the line
<mardikene193> even if i am wrong it does not matter, just read code please properly
drod has quit [Remote host closed the connection]
Da_Coynul has joined #lima
Da_Coynul has quit [Client Quit]
jbrown has quit [Ping timeout: 276 seconds]
Da_Coynul has joined #lima
jbrown has joined #lima
Da_Coynul has quit [Client Quit]
niceplaces has joined #lima
niceplace has quit [Ping timeout: 265 seconds]
Da_Coynul has joined #lima
jbrown has quit [Ping timeout: 276 seconds]
jbrown has joined #lima
abelvesa has quit [Ping timeout: 268 seconds]
abelvesa has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
jrmuizel has joined #lima
jrmuizel has quit [Ping timeout: 240 seconds]
jrmuizel has joined #lima
<anarsoul> MoeIcenowy: that's what I did (using util_last_bit())
<MoeIcenowy> anarsoul: doesn't work?
<MoeIcenowy> did you run any tests with it just set to 0xf?
<anarsoul> seems to work, but I want to check what blob does
<anarsoul> MoeIcenowy: 0xf also works
<MoeIcenowy> util_last_bit also seems to work?
<anarsoul> yes, but I'm not convinced that it's the correct solution
<MoeIcenowy> anarsoul: do you remember the texture issue by roman on Android?
<MoeIcenowy> I think I met it on glamor
<anarsoul> yes
<anarsoul> I don't know how to fix it though
<MoeIcenowy> currently Gtk+ programs give me a strange color when some UI element is changed
<MoeIcenowy> and with Xephyr I succeeded in recording an apitrace for the strange color
<MoeIcenowy> It's green to cyan gradient
<anarsoul> MoeIcenowy: looks like uniform size has log2 dependency, it's 0 for 1x vec4, 1 for 2x vec4, 2 for 3-4x vec4, 3 for 5-8x vec4, 4 for 9x vec4, etc
<anarsoul> those are the values that the blob uses
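The values anarsoul lists (0 for 1x vec4, 1 for 2x, 2 for 3-4x, 3 for 5-8x, 4 for 9x) are consistent with the field holding ceil(log2(count)). A minimal sketch of that mapping follows. `last_bit()` is a hypothetical local stand-in for Mesa's `util_last_bit()`, `uniform_size_field()` is a name invented here for illustration, and the assumption that the field counts vec4 units is taken only from the list above; this is not the code that went into the MR.

```c
#include <assert.h>

/* Hypothetical stand-in for Mesa's util_last_bit(): 1-based position of the
 * most significant set bit, or 0 when no bit is set. */
static unsigned last_bit(unsigned v)
{
    unsigned n = 0;
    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}

/* ASSUMPTION: the 4-bit size field is ceil(log2(number of vec4 uniforms)).
 * For num_vec4 >= 1 that equals last_bit(num_vec4 - 1). */
static unsigned uniform_size_field(unsigned num_vec4)
{
    assert(num_vec4 >= 1);
    return last_bit(num_vec4 - 1);
}
```

With 4 bits the field tops out at 15, which under this assumption would describe up to 2^15 vec4 entries, so always writing 0xf would simply declare the largest possible size.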
<MoeIcenowy> oh interesting
<MoeIcenowy> so could you make a MR with the log2 code?
<anarsoul> sure
<anarsoul> let me put it together and run it through piglit
<anarsoul> MoeIcenowy: also I think that we can do more than 16 samplers. Sampler index in instruction is 12 bits
<MoeIcenowy> ah 16 is enough now ;-)
<anarsoul> but again, we have only 4 bits to specify size of texture descriptors array
<anarsoul> see where I'm going? :)
<MoeIcenowy> I don't know
<anarsoul> MoeIcenowy: likely it's again log2, just like uniforms
<MoeIcenowy> oops
<MoeIcenowy> I was thinking about size bits stolen from another register...
<MoeIcenowy> anarsoul: will == between two integers work reliably in PP?
<anarsoul> MoeIcenowy: PP is pretty sane, so I don't expect any surprises here
<MoeIcenowy> the numbers are quite small, in range of 0~10
<MoeIcenowy> anarsoul: but it's FP16 only
<anarsoul> MoeIcenowy: PP doesn't have integers, only floats
<anarsoul> yeah, it should work
<MoeIcenowy> because for integers no error should be present?
<anarsoul> MoeIcenowy: we have 10 bits of precision
<anarsoul> so it should be accurate enough for small integers
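The precision argument can be checked with a small sketch. It assumes the PP's fp16 behaves like IEEE 754 half precision (10 stored mantissa bits plus one implicit bit), which the log implies but does not state. `fits_fp16_exactly()` and `check_small_integers()` are names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if the non-negative integer n can be stored in an IEEE 754
 * half-precision float without rounding.
 * ASSUMPTION: PP fp16 follows the IEEE 754 binary16 layout. */
static bool fits_fp16_exactly(unsigned n)
{
    if (n == 0)
        return true;
    if (n > 65504)            /* largest finite binary16 value */
        return false;
    while ((n & 1) == 0)      /* trailing zero bits are absorbed by the exponent */
        n >>= 1;
    return n < (1u << 11);    /* at most 11 significant bits may remain */
}

/* The values discussed in the channel are in the range 0..10. */
static void check_small_integers(void)
{
    for (unsigned i = 0; i <= 10; i++)
        assert(fits_fp16_exactly(i));
}
```

Under that assumption every integer up to 2048 is exact, so comparing values in the 0~10 range with == should not suffer from rounding as long as the values themselves were produced exactly.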
<MoeIcenowy> the operations that produce the strange gradient look quite normal...
<MoeIcenowy> and the shader looks simple
Da_Coynul has joined #lima
<anarsoul> MoeIcenowy: likely it uses wrong texture address
<MoeIcenowy> BTW looks like it's doing a strange operation
<anarsoul> check what's up with texture descriptor when you're replaying the trace
<MoeIcenowy> rendering to a FBO and using the FBO as a texture to be fed in
<anarsoul> oh
<anarsoul> it should work though
<anarsoul> we call lima_submit_add_bo() in lima_texture_desc_set_res()
<anarsoul> so it adds dependency on it
<MoeIcenowy> the shader has a lot of useless code though
<MoeIcenowy> I mean the fragment shader
<MoeIcenowy> source_repeat_mode is 0 in this run
<MoeIcenowy> tiled_texture again, right?
<MoeIcenowy> oh no
<anarsoul> btw yeah, you can try with tiled textures
<MoeIcenowy> looks like no mipmap is used on this texture -- mipmap on a UI element is meaningless
<anarsoul> try this extra lima_bo_wait()
<MoeIcenowy> anarsoul: no help
<anarsoul> then I'm out of ideas
<anarsoul> :)
<MoeIcenowy> will we specify the same memory area out?
<MoeIcenowy> or will we reload it and use another memory area?
<anarsoul> what do you mean?
<MoeIcenowy> when we use the target FBO as one of the input texture
jernej_ has joined #lima
jernej has quit [Read error: Connection reset by peer]
jernej_ is now known as jernej
<MoeIcenowy> anarsoul: BTW how can I locate a single draw in lima.dump?
<anarsoul> good question
<anarsoul> I don't know :)
<anarsoul> by analyzing it?
<anarsoul> or add some extra traces
jrmuizel has quit [Remote host closed the connection]
<MoeIcenowy> anarsoul: are there two dummy GP uniforms?
<anarsoul> they're not dummy
<anarsoul> "output transformation"
<MoeIcenowy> ah I mean not specified by the user here
<MoeIcenowy> anarsoul: strange... looks like it's not the same buffer
<MoeIcenowy> the texture is 85x32, but the target is 87x34
<MoeIcenowy> But strangely in apitrace dump they're changed altogether
jrmuizel has joined #lima
<anarsoul> running piglit with my uniforms fix now
<MoeIcenowy> anarsoul: strange
<MoeIcenowy> the texture buffer read out changed when a glBindFramebufferEXT call is performed
<MoeIcenowy> ?!
jrmuizel has quit [Remote host closed the connection]
<MoeIcenowy> oh strange...
<MoeIcenowy> seems like some memory instability
<MoeIcenowy> looking up the state again changes things
<MoeIcenowy> anarsoul: I tried lowering PP shader CF
<MoeIcenowy> and this problem disappeared
<anarsoul> then likely we have some bug in it?
<MoeIcenowy> maybe
<anarsoul> can you show LIMA_DEBUG=pp output for faulty draw?
<MoeIcenowy> anarsoul: I have a combined log currently
<MoeIcenowy> should I find the part of this shader and upload?
<anarsoul> yes, both NIR and ppir parts
<anarsoul> source_repeat_mode is 0?
<MoeIcenowy> all uniforms are 0
<anarsoul> oh, so it takes long path?
jrmuizel has joined #lima
<anarsoul> MoeIcenowy: I don't see anything wrong in shader code...
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
jrmuizel has quit [Ping timeout: 240 seconds]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
<MoeIcenowy> anarsoul: strange... although there's a possibility that the flattened shader will run for a longer time, your lima_bo_wait fix should be enough to solve this, right?
<anarsoul> MoeIcenowy: hold on, so does it fix the issue?
<MoeIcenowy> no
<MoeIcenowy> only shader flattening solves the issue
<anarsoul> then something's likely wrong in ppir compiler
<anarsoul> or maybe we don't know something about PP and generate code that won't work
<MoeIcenowy> BTW do you need the full trace?
<anarsoul> sure, but I won't be able to look into it till I return from XDC
<MoeIcenowy> anarsoul: is data inside a texture kept integer as PP input?
<anarsoul> MoeIcenowy: it doesn't matter. Sampler instruction output is always vec4 for fp16
<anarsoul> MoeIcenowy: I think we somehow get coordinates wrong
<anarsoul> feel free to walk this shader manually
<anarsoul> maybe you'll find something
<MoeIcenowy> anarsoul: I don't think it's easy to get a colorful result from a grayscale texture...
<anarsoul> oh
<mardikene193> this was quite a crap what i pulled before...quite a trollish shit, even embarrassing .
<mardikene193> however i simulated the modules, and i offer you soon all the proof, yeah it appears to be still +1 and as i told though, but works slightly differently
<MoeIcenowy> but it's still a quite strange result that the texture content also changes...
<mardikene193> I was not sure why did I see this +1 all the time, i ran the best possible way in my head only, it happens to be that designers did the same, i was so obsessively sure that two in consequence needs to be run in queues
<mardikene193> and what the heck it is so exactly , but my mind plays a virus all the time, thoughts play de ja vu
<anarsoul> MoeIcenowy: maybe branch conditions are wrong?
<anarsoul> i.e. we assume that ge.s0 returns 0 for false and non-zero for true
<anarsoul> but what if it's not?
<MoeIcenowy> BTW, the malisc result seems to directly use branch.ge
<MoeIcenowy> and... how do I read the disassembly?
<MoeIcenowy> is , used to divide different instructions assigned to the same slot?
<anarsoul> MoeIcenowy: it's easy, single instruction is executed left to right
<anarsoul> then it goes to next instruction unless branch is taken
<anarsoul> $[0-5] are registers
<MoeIcenowy> does ^const0.x, const0 10.000000 0.000000 0.000000 0.000000 mean that ^const0.x=10.0 in this instruction?
<anarsoul> ^uniform, ^const, ^texture are pipeline registers
<anarsoul> ^const0.x is pipeline register
<anarsoul> "const0 10.000000 0.000000 0.000000 0.000000" loads pipeline register with specified values
<anarsoul> so e.g.: load.u 0, ge.s0 $0.x ^uniform.x ^const0.x, const0 10.000000 0.000000 0.000000 0.000000
<anarsoul> loads uniform 0 into ^uniform pipeline register, then $0.x = ge(^uniform.x, ^const0.x)
<anarsoul> MoeIcenowy: does it make sense?
<MoeIcenowy> yes
<MoeIcenowy> load.v means load varying into an internal register?
<anarsoul> no
<anarsoul> load.v loads varying to physical register or to special register ^discard
<anarsoul> texture instruction uses value in ^discard register as coords
<MoeIcenowy> why is it named ^discard?
<anarsoul> so "load.v $3.xy 0.xy" loads varying 0.xy into register $3.xy
<anarsoul> for historical reasons. Also it's lost in next instruction
<anarsoul> so
<anarsoul> "load.v ^discard. $3.xyxx" loads ^discard with value in register $3.xy
<anarsoul> but
<anarsoul> "load.v ^discard. 3.xyxx" loads ^discard with value in varying 3.xy
<anarsoul> note missing $ in second case
<MoeIcenowy> so load.v means "loads varying to physical register or loads varying or physical register to ^discard" ?
<anarsoul> yes
<anarsoul> selects are also a bit weird
<anarsoul> e.g.
<anarsoul> mov.s0 $0.x, sel.v1 $0.zw $1.xxzw $3.xxxy
<anarsoul> select uses pipeline register that's output of scalar0 unit for condition
<anarsoul> so in this case
<anarsoul> $0.x is condition
<anarsoul> $0.zw is destination
<anarsoul> $1.xxzw is first argument
<anarsoul> $3.xxxy is second argument
<MoeIcenowy> oh complex
<MoeIcenowy> how to assign .xxzw to .zw?
<mardikene193> it was basically cause the priority encoder returns always 0 or 1 when X was involved in the valid_entry
<MoeIcenowy> oh lima has no way to load an arbitrary shader...
<anarsoul> you just specify destination :)
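A rough C model of the select example above (`sel.v1 $0.zw $1.xxzw $3.xxxy` with `$0.x` as condition) may help show how a four-component swizzle ends up in a two-component destination. The write mask simply picks which components of the swizzled result are stored. The `vec4` type and the `swizzle()` and `sel()` helpers are invented for this sketch, and which condition value selects which argument is an assumption, not something confirmed in the channel.

```c
#include <assert.h>
#include <string.h>

typedef struct { float c[4]; } vec4;   /* components x, y, z, w */

/* Apply a 4-character swizzle such as "xxzw": each character names the
 * source component copied into that output position. */
static vec4 swizzle(vec4 v, const char *sw)
{
    vec4 r = { { 0.0f, 0.0f, 0.0f, 0.0f } };
    assert(strlen(sw) == 4);
    for (int i = 0; i < 4; i++) {
        switch (sw[i]) {
        case 'x': r.c[i] = v.c[0]; break;
        case 'y': r.c[i] = v.c[1]; break;
        case 'z': r.c[i] = v.c[2]; break;
        case 'w': r.c[i] = v.c[3]; break;
        default:  assert(0 && "bad swizzle character"); break;
        }
    }
    return r;
}

/* Model of "sel dst.<mask> a.<swz_a> b.<swz_b>" with a scalar condition.
 * Bit i of write_mask enables component i (bit 0 = x ... bit 3 = w).
 * ASSUMPTION: a non-zero condition picks the first argument. */
static void sel(vec4 *dst, unsigned write_mask, float cond,
                vec4 a, const char *swz_a, vec4 b, const char *swz_b)
{
    vec4 src = (cond != 0.0f) ? swizzle(a, swz_a) : swizzle(b, swz_b);
    for (int i = 0; i < 4; i++)
        if (write_mask & (1u << i))
            dst->c[i] = src.c[i];   /* masked-out components keep their value */
}
```

In this model a `.zw` destination is write mask 0xC, so only positions 2 and 3 of the swizzled source (the `z` and `w` of `xxzw`) are written, and the leading `xx` is ignored.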
<MoeIcenowy> s/shader/compiled shader/
<anarsoul> no, we need shader runner
<MoeIcenowy> implement the MBS loader by ARM? ;-)
<anarsoul> something simpler would be nice
<anarsoul> I don't think we need MBS
<MoeIcenowy> strange... if I use the eq.s1 result as z of gl_FragColor
<MoeIcenowy> no visible change can be seen whether the condition is met or not
<MoeIcenowy> oh sorry
<MoeIcenowy> I set a wrong value for the other number
megi has quit [Quit: WeeChat 2.6]
<MoeIcenowy> seems that eq.s1 results in 1.0 when met, 0.0 when not met
<anarsoul> that's what we expect
<MoeIcenowy> yes
<anarsoul|c> Try walking the disassembly to check whether it does what you expect
<anarsoul|c> Nope, it's wrong
<MoeIcenowy> ah, no change
<MoeIcenowy> anarsoul: why is it wrong?
<anarsoul|c> Because le will be ge if you swap args
<MoeIcenowy> oh okay
<anarsoul|c> And lt will be gt
<MoeIcenowy> I got silly
<MoeIcenowy> swapping is not negation
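The swap-versus-negate point can be written out in a few lines of C. The `pp_*` helpers below are invented names that only mimic the behaviour discussed here. They assume that ge and gt, like the eq.s1 result observed above, produce 1.0 for true and 0.0 for false.

```c
#include <assert.h>

/* ASSUMPTION: comparisons yield 1.0 when met and 0.0 otherwise, as observed
 * for eq.s1 in the channel. */
static float pp_ge(float a, float b) { return a >= b ? 1.0f : 0.0f; }
static float pp_gt(float a, float b) { return a >  b ? 1.0f : 0.0f; }

/* le and lt built from ge/gt purely by swapping the operands. */
static float pp_le(float a, float b) { return pp_ge(b, a); }
static float pp_lt(float a, float b) { return pp_gt(b, a); }

/* Logical negation of a 1.0/0.0 result. */
static float pp_not(float r) { return 1.0f - r; }

/* Equal operands expose the difference between swapping and negating. */
static void check_swap_vs_not(void)
{
    assert(pp_ge(2.0f, 2.0f) == 1.0f);                      /* 2 >= 2 holds */
    assert(pp_le(2.0f, 2.0f) == 1.0f);                      /* swapped: still true */
    assert(pp_not(pp_ge(2.0f, 2.0f)) == pp_lt(2.0f, 2.0f)); /* negated: false */
}
```

So a compiler that only has ge and gt can get le and lt by swapping operands, while negating a ge result gives lt (for non-NaN inputs), not le.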
marcodiego has joined #lima
drod has joined #lima
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima