ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
ninolein_ has joined #lima
ninolein has quit [Ping timeout: 264 seconds]
tlwoerner has joined #lima
jrmuizel has joined #lima
yuq825 has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
yuq8251 has joined #lima
yuq825 has quit [Remote host closed the connection]
<anarsoul> yuq8251: hi
<anarsoul> have you noticed that X11 has high latency when glamor is in use with lima?
<anarsoul> i.e. if I move the cursor, it moves 2x-3x slower than I move the mouse
<yuq8251> hi
<anarsoul> I'm not sure where it comes from since CPU load isn't high
<yuq8251> which x11 application?
<yuq8251> or desktop?
<anarsoul> yuq8251: no applications, just plain X11 with xterm
<anarsoul> just try moving cursor
<yuq8251> I didn't see it on my amlogic s905x
<anarsoul> yuq8251: amlogic s905x may have a cursor plane
<anarsoul> you can try 'Option "SWcursor" "true"' in your xorg config to enable the software cursor
<yuq8251> let me use software cursor
<yuq8251> same result
<anarsoul> do you want me to make video?
<anarsoul> can you try on some board with allwinner soc?
<yuq8251> yeah, a video would help. I can try on allwinner, but it will take a lot of time to set one up
<anarsoul> watch my hand moving mouse and see how cursor lags
<anarsoul> yuq8251: I think it's somehow related to GPU, if I lower resolution to 1024x768 there's no lag
<anarsoul> but it's here in 1920x1080
<anarsoul> I mean to GPU load
<anarsoul> yuq8251: you can also try lowering GPU frequency, IIRC s905x has mali450 that runs at pretty high freq
<anarsoul> yuq8251: or try launching mpv playing some video and glxgears
<anarsoul> and move glxgears window
<anarsoul> or better some other window (xterm)
<anarsoul> you'll see that gears spin slower
<anarsoul> it looks like it couldn't render it in time and tries to catch up :)
<yuq8251> looks like tasks from multiple contexts are not interleaved
<yuq8251> when you move a window, glxgears stops
<yuq8251> or the window manager does not update other windows while a window is being moved
<anarsoul> yuq8251: maybe, but I get similar result with no apps and just a cursor
<anarsoul> however I'm not sure how many contexts are there
<yuq8251> you could add some prints or a debugfs entry in the kernel driver to monitor it
<yuq8251> btw, glamor does not call eglSwapBuffers
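To make "tasks from multiple contexts are not interleaved" concrete, here is a toy model comparing two hypothetical policies: draining one context's whole queue before moving to the next, versus round-robin between contexts. This is not how drm_sched or the lima kernel driver is implemented; the function names and policies are illustrative assumptions only.

```python
from collections import deque
from typing import Dict, List


def run_drain(queues: Dict[str, List[str]]) -> List[str]:
    """Run every queued job of one context before touching the next one."""
    order: List[str] = []
    for ctx, jobs in queues.items():
        order.extend(f"{ctx}:{job}" for job in jobs)
    return order


def run_round_robin(queues: Dict[str, List[str]]) -> List[str]:
    """Take one job from each non-empty context in turn."""
    pending = {ctx: deque(jobs) for ctx, jobs in queues.items()}
    order: List[str] = []
    while any(pending.values()):
        for ctx, queue in pending.items():
            if queue:
                order.append(f"{ctx}:{queue.popleft()}")
    return order


def wait_slots(order: List[str], ctx: str) -> int:
    """How many jobs run before the first job of `ctx` gets the GPU."""
    return next(i for i, job in enumerate(order) if job.startswith(ctx + ":"))
```

With four queued X server jobs and a single glxgears job, draining makes glxgears wait behind all four, while round-robin lets it run second.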
<anarsoul> and what are the consequences?
<yuq8251> it uses glFlush to update the render target
<yuq8251> so if it does not call glClear, a tile-based GPU like lima has to reload the whole screen on every glFlush
<anarsoul> I see
<anarsoul> and that's expensive
<yuq8251> yes
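A rough model of why the missing glClear hurts a tile-based GPU: if a frame does not start with a clear, every tile has to be reloaded from memory before rendering and written back afterwards. The 16x16 tile size and 4 bytes per pixel are assumptions, all other memory traffic is ignored, and the function names are hypothetical.

```python
import math

TILE = 16  # assumed tile edge in pixels
BPP = 4    # assumed bytes per pixel (XRGB8888 scanout buffer)


def tiles(width: int, height: int, tile: int = TILE) -> int:
    """Number of tiles needed to cover a width x height render target."""
    return math.ceil(width / tile) * math.ceil(height / tile)


def flush_traffic(width: int, height: int, cleared: bool,
                  tile: int = TILE, bpp: int = BPP) -> int:
    """Approximate bytes moved for one full-screen flush.

    Every tile is written back at the end of the job; without a clear,
    each tile must also be reloaded first, doubling the traffic.
    """
    tile_bytes = tile * tile * bpp
    per_tile = tile_bytes if cleared else 2 * tile_bytes
    return tiles(width, height, tile) * per_tile
```

Under these assumptions 1920x1080 needs 8160 tiles against 3072 at 1024x768, about 2.7x the per-flush traffic. That fits the observation that the lag disappears at the lower resolution, but it does not prove this is the cause.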
<yuq8251> I reverse-engineered the mali blob before; it handles the glFlush case by continuing to use the GP PLBU buffer
<anarsoul> *sigh* looks like there's a lot to fix before we can use lima with X11
<yuq8251> and that overflows after many glFlush calls
<yuq8251> so I use the reload method
<yuq8251> this is the revert commit
<yuq8251> I think it would be much better for a compositing window manager
<yuq8251> and for a wayland desktop
<anarsoul> hm
<anarsoul> I can try xcompmgr
<anarsoul> it doesn't help
<anarsoul> and moving cursor is enough to get glxgears to stutter
<yuq8251> xserver has a GL context, glxgears has one, composite WM has one
<anarsoul> I see
<anarsoul> yuq8251: can you reproduce the issue on your side?
<yuq8251> Oh, I can see it now with glxgears running, even without a WM
<anarsoul> btw, please review my pp cf branch when you have some time
<yuq8251> ok
<anarsoul> it doesn't regress in piglit, so at least it generates correct code
<yuq8251> that's nice
<yuq8251> have you tested some desktop?
<yuq8251> like xfce
<anarsoul> nope
<anarsoul> due to this latency issue
<anarsoul> anything in X11 isn't really usable for me
<yuq8251> I can see similar lag with weston, but much better
<anarsoul> yuq8251: it gets a lot worse with something GPU-heavy
<anarsoul> I tried starting ioquake3 and lag in menu is tens of seconds
dddddd has quit [Remote host closed the connection]
mardestan has joined #lima
<mardestan> plaes: actually libv pointed out correctly that it isn't possible to deal with freedesktop guys, it's definitely allready enough of reason why some smaller branches of people would need to get along, i have no interest to fight with you cause i did allready declare that the med institution run was a huge scam. fdo evil people did get a sniff at it and continue to scam, not possible to deal with them.
<mardestan> And i did respond to my sister that i do not quite know what you are about, it seems like you have what it takes but something is still missing, it could be this bad fdo team for you too maybe which causes issues like this
<mardestan> because i remember talking with you when you did say something about wallace tree multipliers or whatever was it, it seemed like you did have a bit clue in what you do.
<mardestan> IF someone were to even offer to work with such collective by a company, cause they caused a lot of braindamage in every episode i have been telling them how to do things.
<mardestan> i would refuse working with such scammers in the same team, that is a big blow pranksters scammers and violators are not needed in such areas.
<mardestan> all the people who got irritated cause of FDO morans are tried to be reasoned and awarded to me, that i am the devil, yet those guys do not seem to understand what is two complement system, which is something that is described in every programming book, and i am pretty sure that it is even something that plaes knows about.
<mardestan> Yes so when you negate or subtract in twos complement system, this is going to be fast, cause it can ignore carries and hence generate only couple of gate delays.
<mardestan> it was not complex to reinvent all the logic behing it, cause that is just common sense entirely, but it is something that fdo people entirely lack.
<mardestan> and i responded to my sister, even though plaes seems to be doing pretty mad stuff when it comes to me, i baasically think he is bigger man then fdo scammers, I favor him more to understand how things work.
<mardestan> it is purely elementary school stuff, nothing complex about electronic circuits
<mardestan> libv: i am really sorry that this powershow demonstration was done on you, and largely your everyday life was screwd by scammers like this, my trohbles started a bit earlier i got pretty sick feeling too from fdo people pretty much all the time.
<mardestan> when i expressed my opinion that such an idiot like Dave Airlie should be put off, i got 17black hawk helicopters circulating above my apartement and bunch of deluded nasty cockblockers annoying me everywhere in new zealand.
<mardestan> those guys are idiots, i just do not seem to get how SAS people did not understand it early enough to have been allowing such demonstration to take places, and ruin the momementum of luc too in life.
<mardestan> it is a standard procedure that some authorities will dispatch a chopter when someone is attacked with cold weapon actually in british societies at least
<mardestan> now what i tell is most important here, i was assaulted by some fuckers who are ordered by people who airlied has be allied with
<mardestan> and in the backround those people are hated in some communities quite a lot, and known to be mad injustice type of terror scammers.
cwabbott has quit [Ping timeout: 245 seconds]
<mardestan> her operand modulis in my country was to go humiliate people like me with her sex comments, where each and every one of those occasions shortly after some fuckers assaulted me and i was framed, i also said that this was the case and warned others, that one nutter like that seems to fill her days with such activity
<mardestan> I have three of such women terrorists, my head explodes when i need to think about, like wtf. is wrong with them, which way it works, do men manipulate them or vice versa.
<mardestan> however the result has been huge row of assaults towards me always, untolerable humiliative comments etc.
<mardestan> In other words, those cunts have been entirely nuts, and even police knows that but some instances were influenced to take a decision against me instead.
<mardestan> Plaes and going on , on the road of Viktor Kingisepp betrayding estonians consistently...i try to memorise what happened to this guy, he was known to kill himself later on, yeah it seems actually so instead of kapo killing him off, it is because the new friends after killing bunch of estonians off were more annoying even
<mardestan> he just did not expect and evaluate how bad this is going to be to betray his people to substitute them with lot worse foreigners, once he had done this and saw airlied type of guys take over he just killed himself
mardestan has quit [Remote host closed the connection]
raimo has joined #lima
raimo has quit [Client Quit]
_whitelogger has joined #lima
yuq8251 has quit [Remote host closed the connection]
mardestan has joined #lima
<mardestan> I mentioned a little that i worked on the theory of different standards of and ontop of jtag stuff, those documents are allready quite large, it is with my paranoias present about people obsess compulsively banning me a biggest complexitiy to start with
<mardestan> it is possible to also target the in-order flip-flops of issue modules when the queues are not present which is minor set of in-order cpus which are designed not have such queues
<mardestan> frankly this might be too much due to me being victimized to handle that my own, i would want to do it, but it is how it is
<mardestan> those kinds of cores are very cheap, so it would be beneficial to pimp them up and not investing a lot of money for the hw hence, but even when finally tracing those buffers is carried out and ready for filling in those buffers from caches
<mardestan> even then there are some security issues probably or safety issues to do like that on mainstream or update undergoing systems
<mardestan> I know the specification very well and understand it, but since i've been falling ill due to perverse activity of the channel mods, i i have so much anger in me, that i can barely recognize myself as human being, which of course has been part of the plan for those terror scammers.
<mardestan> paranoia says: DO not do those jobs, so they can put you off with their critics.
<mardestan> So the easiest to talk with is libv, who does not have much time, we have some mutual understanding which would turn out to be useful for this type of projects, cause scientists say wrong bits on such debug pipeline may damage the hardware even user should not be given 100percent warranty that it will also be entirely safe
<mardestan> not to mention the people who program the chip in such way, they must be aware that some mistake can be fatal to the health of the chip on debug pipeline issue module fills with wrong bits
<mardestan> so hence the ATSP standard also says allready in it's name read as -- advanced test pattern program generator, in other words patterns that you send to the logic must be something safe , however yeah they can be traced in a way when they match then they will be safe too
<mardestan> so other probably understand that there is no evaluating pipeline like in decode stage in issue debug stage, any bits you send will be accepted and tried to be executed
<mardestan> if the debug pipeline on ARM in-order processors shifts in those regs enough fast like with adaptive async mode as sw pipelining method would do
<mardestan> then of course such method is incredibly fast on low-end hw too, prolly some millions of times faster
<mardestan> I assume one of such method is BYPASS reg filled in with BSR content instead and shifted in DR-SHIFT to TDI
jrmuizel has joined #lima
mardestan has left #lima ["Leaving"]
cwabbott has joined #lima
jrmuizel has quit [Remote host closed the connection]
dddddd has joined #lima
jrmuizel has joined #lima
<enunes> anarsoul: hey, sure we can rework the register selection for spilling, do you have some ideas?
<enunes> I'm more worried about fixing the infinite loop case you hit first; maybe I should pick your branch and remove the attempts implementation
<enunes> to try to reproduce it and propose an improvement in marking registers unspillable
<enunes> and then we can also improve the register selection algorithm
<enunes> with shaderdb that is much easier now
ninolein_ has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
<anarsoul> enunes: yeah, I have one idea
<anarsoul> enunes: we can do two passes: 1st pass: calculate maximum register pressure, 2nd pass: choose one register that is in block where max reg pressure is reached
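The two-pass idea can be sketched on a toy representation where each block just records which registers are live in it. Counting live registers per block (instead of per instruction) and breaking ties by fewest uses are simplifying assumptions; `Block` and `pick_spill_candidate` are hypothetical names, not ppir code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Block:
    name: str
    # Registers live somewhere in this block. Counting them per block is a
    # simplification; real pressure is measured per instruction.
    live: Set[str] = field(default_factory=set)


def pick_spill_candidate(blocks: List[Block],
                         unspillable: Set[str],
                         uses: Dict[str, int]) -> Optional[str]:
    """Two-pass spill choice: spill something live where pressure peaks."""
    # Pass 1: find the block where register pressure is highest.
    hottest = max(blocks, key=lambda b: len(b.live))
    # Pass 2: only registers live at that peak can relieve it; among the
    # spillable ones, prefer the one with the fewest uses (cheapest spill).
    candidates = sorted(r for r in hottest.live if r not in unspillable)
    if not candidates:
        return None  # nothing useful to spill: let the caller give up
    return min(candidates, key=lambda r: uses.get(r, 0))
```

Spilling a register that is not live at the peak cannot lower the maximum pressure, which is why the second pass restricts the search to the hottest block.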
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
<enunes> anarsoul: I have to look at how we can calculate this, but seems better than what we have
<enunes> I noticed that mesa's register allocator has "ra_get_best_spill_node"
jrmuizel has quit [Read error: Connection reset by peer]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
<anarsoul> enunes: yeah, maybe it's better to use it
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
<enunes> anarsoul: quick attempt to use it seems to have marginal gain and inconclusive results, https://gitlab.freedesktop.org/snippets/652/raw
<enunes> slightly better in the spills though, maybe it is worth it
<anarsoul> enunes: try glmark2 -b ideas
<enunes> with your branch?
<anarsoul> yes
<enunes> anarsoul: ideas works, resolves spilling with exactly 10 attempts
<enunes> without this change it just aborted, as it took more than 10
<enunes> shadow renders a bit strange in ideas
<enunes> ah wait, no, it also aborted the compilation, but still worked?
<enunes> let me start over
jrmuizel has joined #lima
<anarsoul> enunes: yeah, it's weird, it aborts compilation but continues to work
<anarsoul> however rendering is incorrect
<enunes> anarsoul: well yeah, I can reproduce the infinite loop with current master; with this local change switching to ra_get_best_spill_node it doesn't resolve spilling either, but regalloc runs out of registers and correctly aborts quickly
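The lima regalloc loop itself is not shown in the log, so this is a generic sketch of a color/spill retry loop with the two exits that keep it from spinning forever: no spill candidate left, or the attempt limit reached. The callback names and the limit of 10 are illustrative assumptions.

```python
from typing import Callable, Optional


def allocate_with_spilling(try_color: Callable[[], bool],
                           pick_spill: Callable[[], Optional[str]],
                           spill: Callable[[str], None],
                           max_attempts: int = 10) -> Optional[int]:
    """Retry coloring, spilling one node per round.

    Returns the number of spill rounds needed, or None on failure.
    """
    for rounds in range(max_attempts + 1):
        if try_color():
            return rounds
        if rounds == max_attempts:
            break  # give up instead of spilling yet another node
        node = pick_spill()
        if node is None:
            return None  # nothing spillable left: abort quickly
        spill(node)
    return None
```

If the spill picker keeps returning a node that does not help, only the attempt limit stops the loop; a picker that returns "no candidate" lets it fail immediately.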
matv1 has joined #lima
<anarsoul> enunes: OK, so can you send an MR with your local change to my branch?
<anarsoul> however I'm not sure if it's possible...
<anarsoul> if you just point me to your branch I can just pull this change
<enunes> anarsoul: I will do that, first I will try to see what it is doing to see if we can optimize it in some way
<anarsoul> I think vectorization should help ideas
<enunes> anarsoul: still fails regalloc even with vectorize
<anarsoul> :(
<anarsoul> well, then we'll have to look into it later
<anarsoul> blob compiles it just fine, so it should be possible
<anarsoul> enunes: does vectorize help if you use vector select? (just fake it for now)
<enunes> lets see
<enunes> still regalloc fail
<anarsoul> :(
<enunes> seems that many registers should still be spillable, I'm wondering why it gives up
<anarsoul> enunes: it should also help if we fuse the branch condition into the branch
<anarsoul> I'll play with it after cf branch merges
<anarsoul> it shouldn't be too difficult
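A toy peephole for fusing the branch condition into the branch, assuming the branch can evaluate the comparison itself as the suggestion implies: `t = cmp a, b; branch t` becomes `branch_cmp a, b` when `t` has no other use, so the register holding `t` is never needed. The IR, opcode names and `fuse_branch_conditions` are hypothetical, not ppir.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical comparison opcodes the branch could evaluate directly.
COMPARES = {"lt", "ge", "eq", "ne"}


@dataclass(frozen=True)
class Instr:
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def fuse_branch_conditions(prog: List[Instr]) -> List[Instr]:
    """Fold a single-use comparison into the branch that consumes it."""
    uses = Counter(s for ins in prog for s in ins.srcs)
    out: List[Instr] = []
    i = 0
    while i < len(prog):
        ins = prog[i]
        nxt = prog[i + 1] if i + 1 < len(prog) else None
        if (ins.op in COMPARES and nxt is not None and nxt.op == "branch"
                and nxt.srcs == (ins.dest,) and uses[ins.dest] == 1):
            out.append(Instr("branch_" + ins.op, None, ins.srcs))
            i += 2  # skip the compare and the branch we just replaced
            continue
        out.append(ins)
        i += 1
    return out
```

Each fused branch frees one register, and with nested branches whose conditions would otherwise stay live, the saving grows accordingly.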
<anarsoul> enunes: what are we missing in ppir besides cf?
<anarsoul> I think all the other sampler types
<anarsoul> and that's probably it?
<enunes> then bugfixes I guess
<anarsoul> and optimizations
<anarsoul> also X11 is not really usable, glamor works but we have some issue with job queue
<anarsoul> glxgears freezes when I move another window and then tries to catch up
<enunes> anarsoul: I saw the discussion... yeah that seems hard to debug
<enunes> is this a release build, without debug enabled?
<anarsoul> yes
<enunes> by job queue, do you mean the drm sched one?
<anarsoul> I guess you can reproduce it since you're using pine64
<anarsoul> I'm not sure how it's implemented
<enunes> hmm, apparently mesa's register allocator is marking many nodes as "in_stack" and they are not spilling candidates; need to figure out what that means
<anarsoul> enunes: I doubt there's a bug in it
<anarsoul> it's used by vc4, v3d and i965
<enunes> anarsoul: yeah, I'm sure it's not a bug in it; I wonder if we should set something differently so that it doesn't do that, or what it means exactly
<anarsoul> enunes: there's an explanation what in_stack means in register_allocate.c
<anarsoul> see comment at the top of file
<enunes> sure I read that, still not clear to me why it stays set after the algorithm executes and why it is a condition to select the best spillable node
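A toy model of how the best-spill-node selection in mesa's register allocator behaves, as I understand it: nodes with a spill cost of 0 or less are unspillable, nodes still marked in_stack are skipped because coloring failed before they were considered (so spilling them would not make progress), and the winner maximizes benefit / cost. Here the benefit is simplified to the number of interfering nodes; mesa weights each neighbour by register-class q/p values. `best_spill_node` is a hypothetical Python name, not the C API.

```python
from typing import Dict, Optional, Set


def best_spill_node(cost: Dict[str, float],
                    interference: Dict[str, Set[str]],
                    in_stack: Set[str]) -> Optional[str]:
    """Return the node with the best benefit/cost ratio, or None."""
    best: Optional[str] = None
    best_ratio = 0.0
    for node in sorted(cost):  # sorted only to make ties deterministic
        c = cost[node]
        if c <= 0.0 or node in in_stack:
            continue  # unspillable, or spilling it would not help
        ratio = len(interference.get(node, ())) / c
        if ratio > best_ratio:
            best, best_ratio = node, ratio
    return best
```

In this model a node that interferes with many others but is still in_stack can never be chosen, which matches the symptom of many seemingly spillable registers being ignored.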
<anarsoul> enunes: anyway, don't spend too much time on it, fusing condition into branch will save one reg for each branch
<anarsoul> N regs for nested branches :)
<enunes> anarsoul: yeah there is an explanation for that stuff in the commit logs, I don't think we can do anything about it
<enunes> especially if branching takes registers away, maybe it is indeed unresolvable
<enunes> I suppose I will submit an MR to switch to ra_get_best_spill_node anyway, since it solves the infinite loop issue
<enunes> and it seems that this is what everyone else uses
<anarsoul> enunes: just point me to the branch and I'll cherry pick the commit
<enunes> anarsoul: I guess I can submit it anyway, and we could possibly merge it before cf gets merged?
<enunes> not sure if you already intend to merge the current cf iteration
<anarsoul> enunes: I do, waiting for some review :)
<anarsoul> it causes no regressions in piglit and fixes 41 tests
<anarsoul> also we can actually run X11 now
<enunes> mostly out of curiosity, why do we need ppir_op_dummy ?
<enunes> also I would appreciate some more verbose commit messages for this as it's +616 -248 lines :)
<anarsoul> I'll try to add more to commit message, but there's nothing interesting in implementation
<anarsoul> enunes: ppir_op_dummy is used as a placeholder for a ppir_dest that is a reg
matv1 has quit [Quit: Leaving]
<enunes> it gets removed eventually?
<anarsoul> basically we can get nir where a register is read before it's assigned; that's totally fine, but the compiler expects a non-NULL value in comp->var_nodes
<anarsoul> it's just ignored
<enunes> this is the nir undef value?
<anarsoul> no
<anarsoul> it's not undef
<anarsoul> basically we can have something like: loop { r1 = r2; if (somecond) break; r2 = someothervalue }
<anarsoul> it's a read from uninitialized register
<anarsoul> but it gets initialized on next iteration :)
<enunes> I see, and nir doesn't create that undef assignment for it in this case?
<anarsoul> no
<anarsoul> (and it makes no sense - it's redundant)
<anarsoul> it's not ssa
<anarsoul> it's a reg
<anarsoul> it can be assigned multiple times
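The placeholder idea can be modelled on the loop body `r1 = r2; ...; r2 = someothervalue`: because r2 is a non-SSA register, its first read comes before any write in program order, so the lookup table would have no node for it. A dummy node keeps the lookup non-NULL and is dropped when emitting. `Node`, `build_nodes` and `emitted` are hypothetical stand-ins for ppir_op_dummy and comp->var_nodes.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Node:
    op: str
    reg: str
    srcs: Tuple["Node", ...] = ()


def build_nodes(body: List[Tuple[str, str, Sequence[str]]]) -> List[Node]:
    """body: (dest_reg, op, src_regs) in program order, registers not SSA."""
    var_nodes: Dict[str, Node] = {}
    nodes: List[Node] = []
    for dest, op, src_regs in body:
        srcs = []
        for reg in src_regs:
            if reg not in var_nodes:
                # Read before any write (e.g. a loop-carried value): insert
                # a placeholder so the lookup never comes back empty.
                var_nodes[reg] = Node("dummy", reg)
                nodes.append(var_nodes[reg])
            srcs.append(var_nodes[reg])
        node = Node(op, dest, tuple(srcs))
        var_nodes[dest] = node  # later reads see the most recent write
        nodes.append(node)
    return nodes


def emitted(nodes: List[Node]) -> List[Node]:
    """Placeholders generate no code; they only satisfied the lookups."""
    return [n for n in nodes if n.op != "dummy"]
```

Since r1 and r2 end up in real registers, the read of r2 at run time simply sees whatever the previous iteration wrote; the dummy only exists to satisfy the compiler's bookkeeping.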
<enunes> hmm, so that's the difference then: it's not ssa
<anarsoul> enunes: I think we should assign a different spill cost to regs with different numbers of components
<anarsoul> IIRC we're using a vec4 temporary regardless of the number of used components
<enunes> yes
<anarsoul> so it's beneficial to spill regs with more components
<enunes> ok, I can try that
<anarsoul|c> Even if we stored floats as floats it's more beneficial to spill vec4 regs
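One possible component-aware weighting, assuming (as stated above) that each spill store and fill moves a whole vec4 temporary regardless of how many components are used: divide the memory traffic by the number of components the spill frees. The formula and function names are hypothetical, not the contents of the linked snippet; a value like this could be passed to mesa's ra_set_node_spill_cost(), which, combined with a benefit/cost selection, favours vec4 registers.

```python
from typing import Dict, Tuple


def spill_cost(num_uses: int, num_defs: int, num_components: int) -> float:
    """Hypothetical spill cost for a register with 1 to 4 components.

    Every def needs a store and every use a load; each moves a full vec4
    temporary, so the traffic is divided by the components freed.
    """
    if not 1 <= num_components <= 4:
        raise ValueError("expected 1 to 4 components")
    return (num_uses + num_defs) / num_components


def cheapest(regs: Dict[str, Tuple[int, int, int]]) -> str:
    """Register with the lowest cost; ties go to the alphabetically first."""
    return min(sorted(regs), key=lambda name: spill_cost(*regs[name]))
```

For example, a scalar with 2 uses and 1 def costs 3.0, while a vec4 with 3 uses and 1 def costs 1.0, so the vec4 is preferred despite its extra use.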
<enunes> anarsoul: hah, very nice https://gitlab.freedesktop.org/snippets/653
<enunes> anything else we might want to favour, some type of instruction maybe?
<enunes> anarsoul: btw, this reminds me: not duplicating the use of uniforms was also something that greatly affected register pressure
<enunes> we might want to do that again, I think I recall even the blob does it
<enunes> right now one uniform used by the entire program basically takes away 1 register which will likely be spilled anyway, so we don't really save memory accesses by not doing that
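The register-pressure effect of sharing versus duplicating uniform loads can be shown on straight-line SSA code: a uniform loaded once stays live across the whole region between its uses, while reloading it right before each use keeps each copy's live range short. The programs and `max_pressure` are hypothetical illustrations.

```python
from typing import Dict, List, Sequence, Tuple

Instr = Tuple[str, Sequence[str]]  # (destination value, source values)


def max_pressure(prog: List[Instr]) -> int:
    """Max number of values live at once in straight-line SSA code.

    A value counts as live after instruction i if it was defined at or
    before i and still has a use after i.
    """
    defined = {dest: i for i, (dest, _) in enumerate(prog)}
    last_use: Dict[str, int] = {}
    for i, (_, srcs) in enumerate(prog):
        for src in srcs:
            last_use[src] = i
    best = 0
    for i in range(len(prog)):
        live = sum(1 for v, d in defined.items() if d <= i < last_use.get(v, d))
        best = max(best, live)
    return best


# Uniform "u" loaded once and kept live across the busy middle section.
shared = [("u", ()), ("a", ()), ("b", ("a", "u")), ("c", ()), ("d", ()),
          ("e", ("c", "d")), ("f", ("e", "u")), ("g", ("b", "f"))]
# Same computation, but the uniform is reloaded right before each use.
dup = [("a", ()), ("u0", ()), ("b", ("a", "u0")), ("c", ()), ("d", ()),
       ("e", ("c", "d")), ("u1", ()), ("f", ("e", "u1")), ("g", ("b", "f"))]
```

Here max_pressure(shared) is 4 and max_pressure(dup) is 3: the extra load costs one instruction, but it is cheaper than a spill and fill of the long-lived uniform.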