#lima on 2019-08-18 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

00:11 ninolein_ has joined #lima

00:11 ninolein has quit [Ping timeout: 264 seconds]

00:33 tlwoerner has joined #lima

00:53 jrmuizel has joined #lima

01:14 yuq825 has joined #lima

01:38 jrmuizel has quit [Remote host closed the connection]

01:39 jrmuizel has joined #lima

02:42 jrmuizel has quit [Remote host closed the connection]

03:01 yuq8251 has joined #lima

03:01 yuq825 has quit [Remote host closed the connection]

03:23 <anarsoul> yuq8251: hi

03:24 <anarsoul> have you noticed that X11 has large latency when glamor is in use with lima?

03:24 <anarsoul> i.e. if I move cursor it moves 2x-3x slower than I move mouse

03:24 <yuq8251> hi

03:24 <anarsoul> I'm not sure where it comes from since CPU load isn't high

03:25 <yuq8251> which x11 application?

03:26 <yuq8251> or desktop?

03:26 <anarsoul> yuq8251: no applications, just plain X11 with xterm

03:27 <anarsoul> just try moving cursor

03:28 <yuq8251> I didn't see it on my amlogic s905x

03:28 <anarsoul> yuq8251: amlogic s905x may have cursor plane

03:29 <anarsoul> you can try 'Option "SWcursor" "true"' in your xorg config to enable sw cursor

03:29 <yuq8251> let me use software cursor

03:32 <yuq8251> same result

03:35 <anarsoul> do you want me to make video?

03:35 <anarsoul> can you try on some board with allwinner soc?

03:38 <yuq8251> yeah, a video would help. I can try on allwinner, but it will take a while to set one up

03:43 <anarsoul> https://www.youtube.com/watch?v=rRln8pBwNgs

03:46 <anarsoul> watch my hand moving mouse and see how cursor lags

03:53 <anarsoul> yuq8251: I think it's somehow related to GPU, if I lower resolution to 1024x768 there's no lag

03:53 <anarsoul> but it's here in 1920x1080

03:54 <anarsoul> I mean to GPU load

03:56 <anarsoul> yuq8251: you can also try lowering GPU frequency, IIRC s905x has mali450 that runs at pretty high freq

03:58 <anarsoul> yuq8251: or try launching mpv playing some video and glxgears

03:58 <anarsoul> and move glxgears window

03:58 <anarsoul> or better some other window (xterm)

03:58 <anarsoul> you'll see that gears spin slower

04:02 <anarsoul> https://www.youtube.com/watch?v=AE1nPO3OHg8

04:02 <anarsoul> it looks like it couldn't render it in time and tries to catch up :)

04:12 <yuq8251> looks like tasks from multiple contexts are not interleaved

04:12 <yuq8251> since glxgears stops while you are moving a window

04:13 <yuq8251> or the window manager does not update other windows while a window is being moved

04:15 <anarsoul> yuq8251: maybe, but I get similar result with no apps and just a cursor

04:16 <anarsoul> however I'm not sure how many contexts are there

04:17 <yuq8251> you could add some prints or a debugfs entry in the kernel driver to monitor it

04:17 <yuq8251> btw, glamor does not call eglSwapBuffers

04:18 <anarsoul> and what are the consequences?

04:18 <yuq8251> it uses glFlush to update the render target

04:18 <yuq8251> so if it does not call glClear, a tile-based GPU like lima has to reload the screen on every glFlush

04:19 <anarsoul> I see

04:19 <anarsoul> and that's expensive

04:19 <yuq8251> yes

04:20 <yuq8251> I RE'd the mali blob before; it handles the glFlush case by continuing to use the same GP PLBU buffer

04:20 <anarsoul> *sigh* looks like there's a lot to fix before we can use lima with X11

04:21 <yuq8251> and it will overflow after many glFlush calls

04:21 <yuq8251> so I use the reload method

04:23 <yuq8251> https://gitlab.freedesktop.org/lima/mesa/commit/5e8fa3800da5707c7bba66cab8b01b8b719a197d

04:23 <yuq8251> this is the revert commit

04:25 <yuq8251> I think it would be much better for composite window manager

04:26 <yuq8251> and wayland desktop

04:28 <anarsoul> hm

04:29 <anarsoul> I can try xcompmgr

04:31 <anarsoul> it doesn't help

04:31 <anarsoul> and moving cursor is enough to get glxgears to stutter

04:48 <yuq8251> xserver has a GL context, glxgears has one, composite WM has one

04:49 <anarsoul> I see

04:50 <anarsoul> yuq8251: can you reproduce the issue on your side?

04:51 <yuq8251> Oh, I can see now with glxgears running, even without WM

04:53 <anarsoul> btw, please review my pp cf branch when you have some time

04:56 <yuq8251> ok

04:59 <anarsoul> it doesn't regress in piglit, so at least it generates correct code

05:03 <yuq8251> that's nice

05:04 <yuq8251> have you tested some desktop?

05:04 <yuq8251> like xfce

05:05 <anarsoul> nope

05:05 <anarsoul> due to this latency issue

05:05 <anarsoul> anything in X11 isn't really usable for me

05:09 <yuq8251> I can see similar lag with weston, but much better

05:17 <anarsoul> yuq8251: it gets a lot worse with something GPU-heavy

05:18 <anarsoul> I tried starting ioquake3 and lag in menu is tens of seconds

05:31 dddddd has quit [Remote host closed the connection]

06:30 mardestan has joined #lima

06:36 <mardestan> plaes: actually libv pointed out correctly that it isn't possible to deal with freedesktop guys, it's definitely allready enough of reason why some smaller branches of people would need to get along, i have no interest to fight with you cause i did allready declare that the med institution run was a huge scam. fdo evil people did get a sniff at it and continue to scam, not possible to deal with them.

06:39 <mardestan> And i did respond to my sister that i do not quite know what you are about, it seems like you have what it takes but something is still missing, it could be this bad fdo team for you too maybe which causes issues like this

06:40 <mardestan> because i remember talking with you when you did say something about wallace tree multipliers or whatever was it, it seemed like you did have a bit clue in what you do.

06:45 <mardestan> IF someone were to even offer to work with such collective by a company, cause they caused a lot of braindamage in every episode i have been telling them how to do things.

06:45 <mardestan> i would refuse working with such scammers in the same team, that is a big blow pranksters scammers and violators are not needed in such areas.

06:54 <mardestan> all the people who got irritated cause of FDO morans are tried to be reasoned and awarded to me, that i am the devil, yet those guys do not seem to understand what is two complement system, which is something that is described in every programming book, and i am pretty sure that it is even something that plaes knows about.

06:57 <mardestan> Yes so when you negate or subtract in twos complement system, this is going to be fast, cause it can ignore carries and hence generate only couple of gate delays.

06:58 <mardestan> it was not complex to reinvent all the logic behing it, cause that is just common sense entirely, but it is something that fdo people entirely lack.

06:59 <mardestan> and i responded to my sister, even though plaes seems to be doing pretty mad stuff when it comes to me, i baasically think he is bigger man then fdo scammers, I favor him more to understand how things work.

07:00 <mardestan> it is purely elementary school stuff, nothing complex about electronic circuits

07:07 <mardestan> libv: i am really sorry that this powershow demonstration was done on you, and largely your everyday life was screwd by scammers like this, my trohbles started a bit earlier i got pretty sick feeling too from fdo people pretty much all the time.

07:09 <mardestan> when i expressed my opinion that such an idiot like Dave Airlie should be put off, i got 17black hawk helicopters circulating above my apartement and bunch of deluded nasty cockblockers annoying me everywhere in new zealand.

07:10 <mardestan> those guys are idiots, i just do not seem to get how SAS people did not understand it early enough to have been allowing such demonstration to take places, and ruin the momementum of luc too in life.

07:15 <mardestan> it is a standard procedure that some authorities will dispatch a chopter when someone is attacked with cold weapon actually in british societies at least

07:17 <mardestan> now what i tell is most important here, i was assaulted by some fuckers who are ordered by people who airlied has be allied with

07:18 <mardestan> and in the backround those people are hated in some communities quite a lot, and known to be mad injustice type of terror scammers.

07:18 cwabbott has quit [Ping timeout: 245 seconds]

07:24 <mardestan> her operand modulis in my country was to go humiliate people like me with her sex comments, where each and every one of those occasions shortly after some fuckers assaulted me and i was framed, i also said that this was the case and warned others, that one nutter like that seems to fill her days with such activity

07:27 <mardestan> I have three of such women terrorists, my head explodes when i need to think about, like wtf. is wrong with them, which way it works, do men manipulate them or vice versa.

07:27 <mardestan> however the result has been huge row of assaults towards me always, untolerable humiliative comments etc.

07:29 <mardestan> In other words, those cunts have been entirely nuts, and even police knows that but some instances were influenced to take a decision against me instead.

07:34 <mardestan> Plaes and going on , on the road of Viktor Kingisepp betrayding estonians consistently...i try to memorise what happened to this guy, he was known to kill himself later on, yeah it seems actually so instead of kapo killing him off, it is because the new friends after killing bunch of estonians off were more annoying even

07:37 <mardestan> he just did not expect and evaluate how bad this is going to be to betray his people to substitute them with lot worse foreigners, once he had done this and saw airlied type of guys take over he just killed himself

07:37 mardestan has quit [Remote host closed the connection]

07:37 raimo has joined #lima

07:38 raimo has quit [Client Quit]

08:11 _whitelogger has joined #lima

11:41 yuq8251 has quit [Remote host closed the connection]

12:09 mardestan has joined #lima

12:12 <mardestan> I mentioned a little that i worked on the theory of different standards of and ontop of jtag stuff, those documents are allready quite large, it is with my paranoias present about people obsess compulsively banning me a biggest complexitiy to start with

12:13 <mardestan> it is possible to also target the in-order flip-flops of issue modules when the queues are not present which is minor set of in-order cpus which are designed not have such queues

12:14 <mardestan> frankly this might be too much due to me being victimized to handle that my own, i would want to do it, but it is how it is

12:18 <mardestan> those kinds of cores are very cheap, so it would be beneficial to pimp them up and not investing a lot of money for the hw hence, but even when finally tracing those buffers is carried out and ready for filling in those buffers from caches

12:19 <mardestan> even then there are some security issues probably or safety issues to do like that on mainstream or update undergoing systems

12:26 <mardestan> I know the specification very well and understand it, but since i've been falling ill due to perverse activity of the channel mods, i i have so much anger in me, that i can barely recognize myself as human being, which of course has been part of the plan for those terror scammers.

12:27 <mardestan> paranoia says: DO not do those jobs, so they can put you off with their critics.

12:31 <mardestan> So the easiest to talk with is libv, who does not have much time, we have some mutual understanding which would turn out to be useful for this type of projects, cause scientists say wrong bits on such debug pipeline may damage the hardware even user should not be given 100percent warranty that it will also be entirely safe

12:32 <mardestan> not to mention the people who program the chip in such way, they must be aware that some mistake can be fatal to the health of the chip on debug pipeline issue module fills with wrong bits

12:38 <mardestan> so hence the ATSP standard also says allready in it's name read as -- advanced test pattern program generator, in other words patterns that you send to the logic must be something safe , however yeah they can be traced in a way when they match then they will be safe too

12:41 <mardestan> so other probably understand that there is no evaluating pipeline like in decode stage in issue debug stage, any bits you send will be accepted and tried to be executed

12:47 <mardestan> if the debug pipeline on ARM in-order processors shifts in those regs enough fast like with adaptive async mode as sw pipelining method would do

12:48 <mardestan> then of course such method is incredibly fast on low-end hw too, prolly some millions of times faster

12:51 <mardestan> I assume one of such method is BYPASS reg filled in with BSR content instead and shifted in DR-SHIFT to TDI

12:53 jrmuizel has joined #lima

13:02 mardestan has left #lima ["Leaving"]

13:07 cwabbott has joined #lima

13:22 jrmuizel has quit [Remote host closed the connection]

13:44 dddddd has joined #lima

13:54 jrmuizel has joined #lima

13:56 <enunes> anarsoul: hey, sure we can rework the register selection for spilling, do you have some ideas?

13:57 <enunes> I'm more concerned with fixing the infinite loop case you hit first; maybe I should pick up your branch and remove that attempts implementation

13:57 <enunes> to try to reproduce it and propose an improvement in marking registers unspillable

13:58 <enunes> and then we can also improve the register selection algorithm

13:59 <enunes> with shaderdb that is much easier now

14:14 ninolein_ has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

15:17 jrmuizel has quit [Remote host closed the connection]

15:22 jrmuizel has joined #lima

15:39 jrmuizel has quit [Remote host closed the connection]

16:28 <anarsoul> enunes: yeah, I have one idea

16:30 <anarsoul> enunes: we can do two passes: 1st pass: calculate the maximum register pressure, 2nd pass: choose one register that is in the block where max reg pressure is reached
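
The two-pass idea above could be sketched roughly as follows. This is only an illustration, not lima's actual code: `block_info`, `find_max_pressure_block` and `choose_spill_reg` are hypothetical names, and pressure is approximated as the number of registers live anywhere in a block, which is coarser than a per-instruction count.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_REGS 32

/* Hypothetical per-block summary: live[r] is nonzero if register r is
 * live somewhere in the block. */
struct block_info {
    unsigned char live[MAX_REGS];
};

/* Count the registers live in one block (a coarse pressure estimate). */
static int block_pressure(const struct block_info *b, int num_regs)
{
    int pressure = 0;
    for (int r = 0; r < num_regs; r++)
        if (b->live[r])
            pressure++;
    return pressure;
}

/* Pass 1: return the index of the block with the highest pressure, or -1
 * if there are no blocks. The pressure found is stored in *max_pressure. */
int find_max_pressure_block(const struct block_info *blocks, int num_blocks,
                            int num_regs, int *max_pressure)
{
    assert(num_regs >= 0 && num_regs <= MAX_REGS);
    int best = -1;
    int best_pressure = -1;
    for (int i = 0; i < num_blocks; i++) {
        int p = block_pressure(&blocks[i], num_regs);
        if (p > best_pressure) {
            best_pressure = p;
            best = i;
        }
    }
    *max_pressure = best < 0 ? 0 : best_pressure;
    return best;
}

/* Pass 2: pick the first spillable register that is live in the
 * max-pressure block. Returns -1 if no such register exists. */
int choose_spill_reg(const struct block_info *blocks, int num_blocks,
                     int num_regs, const unsigned char *spillable)
{
    int pressure;
    int b = find_max_pressure_block(blocks, num_blocks, num_regs, &pressure);
    if (b < 0)
        return -1;
    for (int r = 0; r < num_regs; r++)
        if (blocks[b].live[r] && spillable[r])
            return r;
    return -1;
}
```

Taking the first spillable register is an arbitrary tie-break chosen for brevity; a real implementation would presumably rank the candidates within that block.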

16:43 jrmuizel has joined #lima

17:06 jrmuizel has quit [Remote host closed the connection]

19:35 jrmuizel has joined #lima

20:17 <enunes> anarsoul: I have to look at how we can calculate this, but seems better than what we have

20:17 <enunes> I noticed that the mesa register allocator has "ra_get_best_spill_node"

20:35 jrmuizel has quit [Read error: Connection reset by peer]

20:37 jrmuizel has joined #lima

20:52 jrmuizel has quit [Remote host closed the connection]

21:00 <anarsoul> enunes: yeah, maybe it's better to use it

21:01 jrmuizel has joined #lima

21:06 jrmuizel has quit [Remote host closed the connection]

21:10 <enunes> anarsoul: quick attempt to use it seems to have marginal gain and inconclusive results, https://gitlab.freedesktop.org/snippets/652/raw

21:11 <enunes> slightly better in the spills though, maybe it is worth it

21:13 <anarsoul> enunes: try glmark2 -b ideas

21:13 <enunes> with your branch?

21:14 <anarsoul> yes

21:26 <enunes> anarsoul: ideas works, resolves spilling with exactly 10 attempts

21:26 <enunes> without this change it just aborted as it took >10

21:26 <enunes> shadow renders a bit strange in ideas

21:27 <enunes> ah wait, no, it also aborted the compilation, but still worked?

21:28 <enunes> let me start over

21:33 jrmuizel has joined #lima

21:34 <anarsoul> enunes: yeah, it's weird, it aborts compilation but continues to work

21:34 <anarsoul> however rendering is incorrect

21:34 <enunes> anarsoul: well yeah I can reproduce the infinite loop with the current master, with this local change switching to ra_get_best_spill_node it doesn't resolve spilling either, but regalloc runs out of registers and correctly aborts quickly

21:41 matv1 has joined #lima

21:42 <anarsoul> enunes: OK, so can you send an MR with your local change to my branch?

21:42 <anarsoul> however I'm not sure if it's possible...

21:43 <anarsoul> if you just point me to your branch I can just pull this change

21:43 <enunes> anarsoul: I will do that, first I will try to see what it is doing to see if we can optimize it in some way

21:43 <anarsoul> I think vectorization should help ideas

21:46 <enunes> anarsoul: still fails regalloc even with vectorize

21:55 <anarsoul> :(

21:55 <anarsoul> well, then we'll have to look into it later

21:56 <anarsoul> blob compiles it just fine, so it should be possible

21:58 <anarsoul> enunes: does vectorize help if you use vector select? (just fake it for now)

21:58 <enunes> lets see

21:59 <enunes> still regalloc fail

21:59 <anarsoul> :(

22:00 <enunes> seems that many registers should still be spillable, I'm wondering why it gives up

22:00 <anarsoul> enunes: it should also help if we fuse branch condition into branch

22:02 <anarsoul> I'll play with it after cf branch merges

22:04 <anarsoul> it shouldn't be too difficult

22:07 <anarsoul> enunes: what are we missing in ppir besides cf?

22:09 <anarsoul> I think all the other sampler types

22:09 <anarsoul> and that's probably it?

22:10 <enunes> then bugfixes I guess

22:12 <anarsoul> and optimizations

22:12 <anarsoul> also X11 is not really usable, glamor works but we have some issue with job queue

22:13 <anarsoul> see https://youtu.be/AE1nPO3OHg8

22:14 <anarsoul> glxgears freezes when I move another window and then tries to catch up

22:22 <enunes> anarsoul: I saw the discussion... yeah that seems hard to debug

22:22 <enunes> is this a build without debugs?

22:22 <anarsoul> yes

22:23 <enunes> job queue you mean the drm sched one?

22:23 <anarsoul> I guess you can reproduce it since you're using pine64

22:23 <anarsoul> I'm not sure how it's implemented

22:33 <enunes> hmm apparently mesa's register allocator is marking many nodes as "in_stack" and those are not spilling candidates, need to figure out what that means

22:42 <anarsoul> enunes: I doubt there's a bug in it

22:42 <anarsoul> it's used by vc4, v3d and i965

22:43 <enunes> anarsoul: yeah I'm sure it's not a bug in it, I wonder if we should set something different so that it doesn't do that, or just what it means

22:45 <anarsoul> enunes: there's an explanation what in_stack means in register_allocate.c

22:45 <anarsoul> see comment at the top of file

22:47 <enunes> sure I read that, still not clear to me why it stays set after the algorithm executes and why it is a condition to select the best spillable node

22:56 <anarsoul> enunes: anyway, don't spend too much time on it, fusing condition into branch will save one reg for each branch

22:56 <anarsoul> N regs for nested branches :)

22:58 <enunes> anarsoul: yeah there is an explanation for that stuff in the commit logs, I don't think we can do anything about it

22:58 <enunes> especially if branching takes registers away, maybe it is indeed unresolvable

22:58 <enunes> I suppose I will submit a MR to switch to ra_get_best_spill_node anyway since it solves the infinite loop issue

22:59 <enunes> and it seems that this is what everyone else uses

22:59 <anarsoul> enunes: just point me to the branch and I'll cherry pick the commit

23:07 <enunes> anarsoul: I guess I can submit it anyway and we can possibly merge it before cf gets merged?

23:07 <enunes> not sure if you already intend to merge the current cf iteration

23:08 <anarsoul> enunes: I do, waiting for some review :)

23:09 <anarsoul> it causes not regression in piglit and fixes 41 test

23:10 <anarsoul> s/not/no

23:11 <anarsoul> also we can actually run X11 now

23:14 <enunes> mostly out of curiosity, why do we need ppir_op_dummy ?

23:15 <enunes> also I would appreciate some more verbose commit messages for this as it's +616 -248 lines :)

23:16 <anarsoul> I'll try to add more to commit message, but there's nothing interesting in implementation

23:17 <anarsoul> enunes: ppir_op_dummy is used as a placeholder for a ppir_dest that is a reg

23:18 matv1 has quit [Quit: Leaving]

23:18 <enunes> it gets removed eventually?

23:18 <anarsoul> basically we can get nir where register is read before it's assigned, it's totally fine, but compiler expects non-NULL value in comp->var_nodes

23:18 <anarsoul> it's just ignored

23:19 <enunes> this is the nir undef value?

23:20 <anarsoul> no

23:20 <anarsoul> it's not undef

23:21 <anarsoul> basically we can have something like: loop { r1 = r2; if (somecond) break; r2 = someothervalue }

23:22 <anarsoul> it's a read from uninitialized register

23:23 <anarsoul> but it gets initialized on next iteration :)
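
The placeholder idea can be illustrated with a small sketch. None of this is the real ppir code: `struct compiler`, `get_reg_producer` and `set_reg_producer` are hypothetical, and the only assumption taken from the discussion is that a per-register table must never hand back NULL, even when a register is read before anything has written it.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 4

enum node_op { NODE_DUMMY, NODE_MOV };

struct node {
    enum node_op op;
    int dest_reg;
};

/* Hypothetical compiler state: var_nodes[r] points at the node that last
 * wrote register r, or is NULL if no write has been seen yet. */
struct compiler {
    struct node *var_nodes[NUM_REGS];
    struct node dummies[NUM_REGS]; /* storage for placeholder nodes */
};

/* Look up the producer of a register. If the register is read before any
 * write (as r2 is on the first loop iteration), install a dummy node so
 * that callers always get a non-NULL result. */
struct node *get_reg_producer(struct compiler *comp, int reg)
{
    assert(reg >= 0 && reg < NUM_REGS);
    if (comp->var_nodes[reg] == NULL) {
        struct node *d = &comp->dummies[reg];
        d->op = NODE_DUMMY;
        d->dest_reg = reg;
        comp->var_nodes[reg] = d;
    }
    return comp->var_nodes[reg];
}

/* Record a real write. Registers are not SSA values, so this may happen
 * many times for the same register and simply replaces the old entry. */
void set_reg_producer(struct compiler *comp, int reg, struct node *n)
{
    assert(reg >= 0 && reg < NUM_REGS);
    comp->var_nodes[reg] = n;
}
```

In this sketch the dummy carries no operation of its own; it exists only so that the lookup has something to return until the real write to the register is processed.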

23:24 <enunes> I see, and nir doesnt create that undef assignment for it in this case?

23:24 <anarsoul> no

23:24 <anarsoul> (and it makes no sense - it's redundant)

23:25 <anarsoul> it's not ssa

23:25 <anarsoul> it's a reg

23:25 <anarsoul> it can be assigned multiple times

23:26 <enunes> hmm so thats the difference then, its not ssa

23:41 <anarsoul> enunes: I think we should assign different spill cost for regs with different number of components

23:41 <anarsoul> IIRC we're using vec4 temporary regardless of number of used components

23:42 <enunes> yes

23:42 <anarsoul> so it's beneficial to spill regs with more components

23:42 <enunes> ok, I can try that

23:48 <anarsoul|c> Even if we stored floats as floats it's more beneficial to spill vec4 regs
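
A rough sketch of weighting spill cost by component count follows. It is not the change from the snippet linked in the log: `reg_info`, `spill_cost` and `pick_spill_candidate` are hypothetical, and the formula is only one plausible choice. Mesa's register allocator does expose `ra_set_node_spill_cost` for supplying per-node costs, but the sketch does not call it and instead follows a simple "lowest positive cost wins, non-positive cost means unspillable" convention of its own.

```c
#include <assert.h>

/* Hypothetical description of one register. */
struct reg_info {
    int num_components; /* 1..4 */
    int num_accesses;   /* reads + writes that would become memory ops */
    int spillable;      /* 0 if the register must stay in a register */
};

/* Cost of spilling a register: more accesses make it more expensive,
 * more components make it cheaper, because a spilled register occupies a
 * vec4 temporary regardless of how many components it uses.
 * A cost of 0 marks the register as unspillable. */
float spill_cost(const struct reg_info *r)
{
    assert(r->num_components >= 1 && r->num_components <= 4);
    assert(r->num_accesses >= 0);
    if (!r->spillable)
        return 0.0f;
    return (float)(r->num_accesses + 1) / (float)r->num_components;
}

/* Return the index of the spillable register with the lowest cost,
 * or -1 if nothing can be spilled. */
int pick_spill_candidate(const struct reg_info *regs, int num_regs)
{
    int best = -1;
    float best_cost = 0.0f;
    for (int i = 0; i < num_regs; i++) {
        float cost = spill_cost(&regs[i]);
        if (cost <= 0.0f)
            continue; /* unspillable */
        if (best < 0 || cost < best_cost) {
            best = i;
            best_cost = cost;
        }
    }
    return best;
}
```

With equal access counts, a vec4 register ends up with a quarter of the cost of a scalar one, so it is chosen first.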

23:49 <enunes> anarsoul: hah, very nice https://gitlab.freedesktop.org/snippets/653

23:55 <enunes> anything else we might want to favour, some type of instruction maybe?

23:56 <enunes> anarsoul: btw, this reminds me: not duplicating the use of uniforms was also something that greatly affected register pressure

23:56 <enunes> we might want to do that again, I think I recall even the blob does it

23:57 <enunes> right now one uniform used by the entire program basically takes away 1 register which will likely be spilled anyway, so we don't really save memory accesses by not doing that