#lima on 2019-09-07 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

00:05 drod has joined #lima

00:17 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

00:29 Da_Coynul has joined #lima

00:34 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

00:35 Da_Coynul has joined #lima

00:42 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

00:45 <anarsoul> rellla: enunes: I did some profiling of glmark2-es2 running in weston and guess where it spends most of the time?

00:45 <anarsoul> :)

00:45 <anarsoul> hint: we don't have BO cache yet

00:48 <anarsoul> anyway, I attached flamegraph to the issue: https://gitlab.freedesktop.org/lima/mesa/issues/110

00:58 <anarsoul> I'll work on it tomorrow

01:19 Da_Coynul has joined #lima

01:36 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

02:17 anarsoul|c has quit [Quit: Connection closed for inactivity]

02:22 drod has quit [Remote host closed the connection]

03:14 _whitelogger has joined #lima

03:30 dddddd has quit [Remote host closed the connection]

03:47 _whitelogger has joined #lima

04:10 <anarsoul> I implemented naive bo cache, it's racy (so crashes sometimes), but it doesn't solve lagginess :(

04:26 _whitelogger has joined #lima

05:08 _whitelogger has joined #lima

08:06 Elpaulo has quit [Quit: Elpaulo]

08:07 Elpaulo has joined #lima

08:17 dddddd has joined #lima

09:16 zombah has quit [Quit: leaving]

09:21 zombah has joined #lima

10:18 hoijui has joined #lima

10:36 hoijui has quit [Quit: Leaving]

10:51 Da_Coynul has joined #lima

10:55 Da_Coynul has quit [Client Quit]

10:55 hoijui has joined #lima

11:03 jernej has quit [Ping timeout: 252 seconds]

11:11 jernej has joined #lima

11:15 Da_Coynul has joined #lima

11:24 mardestan has joined #lima

11:25 <mardestan> so linking the binary blobs in either shared contexts or without seems to be possible on opengl es binry drivers and prolly with lima too right.

11:26 <mardestan> but what is the more crucial part is command buffer stream compression, the problem was raised by one yankee, who is a friendly guy with me, he stated that no matter what you do how fast in the shader

11:26 <mardestan> the command processor would start to bottleneck, which is even partly correct

11:27 <mardestan> to be able to change that i promptly developed one twos complement based uniform distribution of the individudal commands to the storage buffers like cache slots

11:28 <mardestan> but in that cheme since perhaps the command stream pipes do not have ALUs right, it needs a shared accessible storage communicated or filled in with shaders.

11:29 <mardestan> so basically shaders are workers who queue commands to consumers so the consumer is a command processor

11:29 anarsoul|c has joined #lima

11:30 <mardestan> to implement this, i need to sniff out some details on mali GPUs and other, since there may not me many extensions in vulkan or such to deal with this

11:31 <mardestan> depending wether khronos has delt with such issues before, i have not looked and i would not know, if someone does then be brave to speak up.

11:32 <mardestan> so far i have looked that for instance AMD and NVIDIA gpus are supported almost transparently in this method, as they allow to accessing command buffers via tlbs in hw for nvidia, and l2 for amd

11:32 <mardestan> i do not know much about mali gpus

11:34 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

11:36 Da_Coynul has joined #lima

11:37 Da_Coynul has quit [Client Quit]

11:38 Da_Coynul has joined #lima

11:42 <mardestan> this compression itself is rather easy type of elementary school material though based of two complement addition in electronics , works the best only on this type of integer format or floaing point too probably

11:43 <mardestan> ones complement does not work for instance, but of course there are piles of other compression methods, but i stress out the easiest is the mentioned.

11:52 <mardestan> I think i am going to first author the shaders and then i need to look at the mali command processor my own, the only mali gpu i have is the vintage low-end mp-400

11:54 <mardestan> thos shitheads here do not want to talk about real issues anyways, i need to recruite my own crew to deal with this stuff

11:55 <mardestan> it appears what listen on the streets was that my reputation was ruined by some girls who had too many fuckers, some porn rats started to talk about my addictions effectively ruining my life in the world, projecting their heorics in the field to me

11:55 hoijui has quit [Ping timeout: 245 seconds]

11:57 <mardestan> I mean really, who would voluenteer as ex-sportsman to swim and get awarded by someones sex heroics, and get his life entirely ruined that way, stalked by absolute intriguants daily basis and facing assaults by such outsider fuck experts.

12:00 <mardestan> a known sluts who i whitewash in any real event aside from theirs expertise which happens to be screwing with random quasimodos and yelling on streets and bragging about that, and humiliating native estonian towntown ex sporstmen

12:03 <mardestan> i allready spend six-times in the institution over this that i myself got humiliated and assaulted, and i express my opinion about the situation that people who organized this can't be bigger trashish outsiders than they are possibly

12:03 <mardestan> their resilliance in life even though i got all the charges are perpendicular to mickey mouse

12:06 <mardestan> in particular the suffering does not seem to end to me, 25 years and still going, lots of injuries got and body is almost ruined that way, i can not see leftovers to compete with me still

12:07 <mardestan> and it's pointless to talk to me about this, since i have clearly expressed that all those years i fell as victim into other problems over a complete fraud.

12:08 kaspter has joined #lima

12:10 hoijui has joined #lima

12:12 <mardestan> that is how things are, when you grow up with a potential to change something and acheive, you start get paranoid blockings, hence when i describe what happened everything falls into balancing theories, i was not given much chanche to benefit from my good genetics

12:12 hoijui has quit [Remote host closed the connection]

12:12 hoijui has joined #lima

12:14 <mardestan> the bigger men who started to bloodily fight against their terror which happened oftenly and daily basis for the similar reasons , they died pretty fast anyways

12:18 <mardestan> so theoretical possibilities can be rounded to, you accept the terror and underdevelop every day, getting weaker 100folds what you were suppose to be, and you get live, due to jelousy against you by suicidal leftovers, or you fight back and pretty much die anyways on the run

12:19 <mardestan> i chose to suffer and live, cause i saw those who behaved the other way when i was young and they were talents, they were killed

12:24 <mardestan> by the way that is sad, that i can not never ever take their pain away, and have to act as ground to all their shit basically every day, but that is the world we live in, healthy people suffer all the time

12:28 <mardestan> Take is hamlets dilemma or prisoners dilemma mixed, last states that innocent and good people just will have same odds with any other cruel violating born leftoverish outsider

12:29 <mardestan> and what the balance means good people score only violations at them, so during the lifetime most things they gather from others is --- minuses

12:30 <mardestan> this is why the odds are the same, leftovers are fed through this way with violations, this is the total sum of the balance

12:31 <mardestan> you grow with resources more +++ then minuses they add the same amount of minuses at you in any event mostly

12:32 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

12:33 Da_Coynul has joined #lima

12:39 drod has joined #lima

12:50 raimo has joined #lima

12:50 mardestan has quit [Read error: Connection reset by peer]

12:53 raimo has quit [Read error: Connection reset by peer]

12:54 drod has quit [Remote host closed the connection]

12:55 raimo has joined #lima

12:57 raimo has quit [Read error: Connection reset by peer]

12:58 mardestan has joined #lima

13:00 raimo has joined #lima

13:00 mardestan has quit [Read error: Connection reset by peer]

13:03 raimo has quit [Read error: Connection reset by peer]

13:05 mardestan has joined #lima

13:07 <mardestan> and hardware can not be designed much better than it is allready, I've been given time to conclude something my own, i think for karolherbst a little time yet needs to be given, for him to understand that GPUs are actually all models are out-of-order execution chips, but in-order pipeline chips per instructions, but execution is out-of-order in between the instructions

13:08 <mardestan> i am also not sure why those vendors play with wordings the way they do

13:08 <mardestan> it can however confuse people, you gotta access a brain silently to find out the logics of truth

13:10 <mardestan> also VLIW was never a mistake, cause actually infact that is the sanest thing along with CGRA how to do things

13:10 hoijui has quit [Ping timeout: 268 seconds]

13:13 <mardestan> the best thing how pipelined processors work is described ontop of berkley dlx arch on a webpage that i have several times referred to allready

13:14 <mardestan> it is the base arch of the most famous os book also, Patterson David and this crew, right.

13:15 <mardestan> on CPU a pipeline works such that there are pipeline stages and pipelines per length of the pipeline, if you manage to understand it

13:15 <mardestan> so compared to non-pipelined processors one stage is pipeline length of non-pipelined full length/pipelined stages

13:16 <mardestan> in other words, short cycles on pipelined processors and

13:16 <mardestan> long cycles for non-pipelined processor

13:16 <mardestan> every new pipeline is lock-step started after one stage finishes

13:20 <mardestan> on VLIW a slight different scheme is used, it is a dual wave scheduler, per ALU bundle

13:22 <mardestan> so when one instruction in the bundle can not be scheduled this one stalls in the arbiter, and all but this one are scheduled

13:22 <mardestan> pipeline is started from even wave, then odd wave, then again even, then again odd etc.

13:25 <mardestan> the win on VLIW comes in a way that they can put more ALUs on the chip, which work entirely in parallel

13:25 <mardestan> this is due to the scheduling being cheaper, and no scoreboard, cause bypass networks are also cheaper

13:25 <mardestan> that lowers power and also consumes less resources

13:33 <mardestan> realworld tech cypress thread and some amd devgurus forums links and toms hardware pages all describe it enough detailistically how VLIW works

13:34 <mardestan> pipeline is at even warp executing, then odd warp is reading back, and all the things are restarted when something can not be scheduled it is parked or stalling and coming back after full pipeline start

13:34 <mardestan> back at

13:39 anarsoul|c has quit [Quit: Connection closed for inactivity]

13:41 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

13:47 <mardestan> is there something you yet do not understand, one wave just starts another wave continuously without blocking and stalls on dependency stalling the whole subsequent instruction slots

13:47 <mardestan> i.e the rest of the bundle

13:48 <mardestan> all vliw bundles r600 for instance have 16 of them, can work in parallel per CU, on mali exactly the same

13:48 <mardestan> but the vliw bundle seems to be termed a core there

13:49 <mardestan> compiler determines vliw dependencies and asynchronous or non-dependent instructions good guys have many terms :)

13:50 <mardestan> are put first by reordering them on cpu

13:50 Da_Coynul has joined #lima

13:50 <mardestan> on SIMD compiler architects do not have to do that, cause the chip replays everything

13:51 <mardestan> then karol also was dealing with SIMT/SIMD differences

13:52 <mardestan> SIMT is extension to SIMD

13:52 <mardestan> so cores are SIMT on vliw, and lanes are SIMD

13:53 <mardestan> or vice versa actually, sorry

13:53 <mardestan> and on true SIMD, SIMD units are SIMD on GCN and lanes of 64 are SIMT

13:55 <mardestan> vector length is SIMD lane width SIMT or vice versa, it does not absolutely matter how you call them

14:04 <mardestan> hence i told you are not going to need a full compiler reordering to run an elbrus chip, all you need is a x86 thin checksum for the binaries, that reorders registers, if you ever wanted to run many instructions at time from the queues

14:06 <mardestan> so elbrus CPU which seems to be father for VLIW

14:06 <mardestan> dunno those newer ones have word of 32instructions probably

14:06 <mardestan> if they were not any way dependent on each other theyd be scheduled in parallel for execution

14:07 <mardestan> it is my favorite chip so to speak , very easy to program

14:43 Marex has joined #lima

14:43 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

14:46 hoijui has joined #lima

14:48 Da_Coynul has joined #lima

15:09 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

15:11 hoijui has quit [Ping timeout: 244 seconds]

15:21 Da_Coynul has joined #lima

15:56 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

16:05 anarsoul|c has joined #lima

16:28 Da_Coynul has joined #lima

16:58 <anarsoul> enunes: do you understand how drm scheduler works?

17:01 drod has joined #lima

17:03 <enunes> anarsoul: not much honestly, but I spent some time staring at it a while ago while working on a memory leak bug related to it

17:04 <anarsoul> enunes: it looks like it doesn't provide equal opportunity for different tasks to run

17:04 <anarsoul> e.g. glmark2 running in weston gets a lot more gpu time than weston itself

17:05 <enunes> is weston submitting any rendering?

17:06 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

17:12 <anarsoul> I assume so

17:12 <mardestan> It has been described on mailing list, that r300 chips can not do dest register redirections or indirections, again not sure what mali and freedreno yet allow there.

17:12 <anarsoul> why wouldn't it?

17:12 <anarsoul> it has to render glmark surface

17:14 hoijui has joined #lima

17:14 <anarsoul> even cursor stutters in weston, so I assume it waits for some job to complete

17:18 <enunes> I enabled a 500ms timeout in drm_sched for lima, assuming it works I think you would be getting errors if some job was stuck for longer than that

17:18 <enunes> so I honestly don't know how it all works togethet, but is it possible that if nothing is moving in weston, just glmark renders and weston composites it without having to render anything itself?

17:18 mardestan has quit [Quit: Leaving]

17:20 <anarsoul> enunes: if some job was working for longer than that

17:22 <enunes> hmm ok that would make sense too

17:22 <anarsoul> but looks like it can sit in queue for seconds

17:23 <anarsoul> making X or wayland unusable

17:25 <enunes> so in the cases where wayland doesn't execute a job for seconds, the entire screen is frozen for that duration?

17:25 <anarsoul> yes

17:25 <anarsoul> (try it)

17:26 <anarsoul> but glmark is working like crazy rendering something into memory

17:26 <anarsoul> that's never displayed

17:27 <enunes> I'm looking into those other reports now

17:27 <anarsoul> I have strong suspicion that they all have the same root cause

17:27 <enunes> the most interesting thing I got is:

17:28 <enunes> [ 121.716110] Fence drm_sched:gp:e:3ed released with pending signals!

17:28 <enunes> but only once

17:28 <enunes> sounds pretty bad

17:28 <anarsoul> there's a race somewhere in kernel driver

17:28 <enunes> yeah it's where I'm looking now

17:29 <anarsoul> and that's something related to multiple contexts

17:29 <enunes> strange that it doesn't reproduce with e.g. parallel piglit with glx or egl_x11 platform

17:30 <enunes> but it's some missing reference counting in lima bo allocation/freeing

17:30 <anarsoul> enunes: I thought that piglit doesn't run parallel jobs

17:30 <enunes> it does by default unless you use --no-concurrency

17:31 <enunes> I used to run with that but dropped it a while ago, works well

17:31 <enunes> so the race condition seems to require some pattern that doesn't happen with that

17:31 <enunes> but glmark2 on Xorg reproduces it all the time

17:34 <anarsoul> enunes: sounds like something's wrong with dependencies

17:35 <anarsoul> weston job depends on glmark2 bo

17:37 <anarsoul> yeah, and when glmark2 terminates weston *sometimes* gets ppmmu fault

17:37 <anarsoul> which probably means that bo created by glmark2 is gone

17:37 <anarsoul> but weston still tries to use it

17:38 <enunes> if you add some prints to lima_ctx_buff_alloc it doesn't reproduce

17:39 <enunes> because somewhere else it will have enough time to count the references correctly

17:40 jernej has quit [Ping timeout: 264 seconds]

17:42 <anarsoul> enunes: ppmmu fault can't come from ctx_buff_alloc

17:44 <enunes> I know, but the ppmmu fault happens right after some failure in the allocation path at lima_bo_create

17:48 <anarsoul> not really

17:49 <anarsoul> you can't get a VA for non-existent BO

17:49 adjtm_ has quit [Ping timeout: 245 seconds]

17:49 <enunes> I always get this before the fault:

17:49 <enunes> [ 48.276222] glmark2-es2: page allocation failure: order:0, mode:0x4(GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0

17:49 <anarsoul> it'll crash right away

17:49 <enunes> then the backtrace and

17:49 <enunes> [ 48.610830] lima 1c40000.gpu: mmu page fault at 0x22f6cbc0 from bus id 0 of type read on ppmmu0

17:51 <anarsoul> hm

17:52 <anarsoul> OK, I'll continue working on BO cache

17:52 <anarsoul> in theory it should fix some of allocation failures

17:57 hoijui has quit [Ping timeout: 268 seconds]

17:58 adjtm has joined #lima

18:21 <anarsoul> enunes: hm, looks like my BO cache fixed stutter for me :\

18:21 <anarsoul> wanna try it?

18:22 <anarsoul> at least for -b build and -b shadow, -b texture still stutters a lot

18:25 <anarsoul> as well as X11

18:29 <anarsoul> enunes: see https://gitlab.freedesktop.org/mesa/mesa/merge_requests/1903

18:29 <anarsoul> it essentially serializes pipeline, maybe that's why it fixes the issue

18:35 <anarsoul> yet I don't understand why it doesn't help with -b textures

18:35 <anarsoul> flamegraph doesn't show anything suspicious

18:40 <anarsoul> enunes: since serialization fixes it, it's definitely something with job dependencies

18:41 <anarsoul> well, partially fixes it

19:28 kaspter has quit [Quit: kaspter]

19:49 Da_Coynul has joined #lima

20:18 jernej has joined #lima

20:19 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

20:32 Da_Coynul has joined #lima

20:58 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

20:59 Da_Coynul has joined #lima

21:03 Da_Coynul has quit [Client Quit]

21:07 Da_Coynul has joined #lima

21:40 <marex-cloud> anarsoul: heh, if you want nasty parallel test, build qtwebengine and open blossom webgl demo

21:40 <marex-cloud> anarsoul: that really messed etnaviv up

21:40 StarfishPrime__ has joined #lima

21:45 <anarsoul> weston + 'glmark2-es2-wayland -b textures' blows it up for now

21:48 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

21:49 <anarsoul> it doesn't crash, but weston stutters a lot

22:03 dri-logger has quit [Ping timeout: 244 seconds]

22:03 Da_Coynul has joined #lima

22:04 glisse has quit [Ping timeout: 245 seconds]

22:16 dri-logger has joined #lima

22:20 <marex-cloud> anarsoul: including out of order rendered frames?

22:21 dri-logger has quit [Ping timeout: 268 seconds]

22:23 dri-logger has joined #lima

22:29 _whitelogger has joined #lima

22:35 dri-logger has quit [Ping timeout: 252 seconds]

22:41 glisse has joined #lima

22:43 dri-logger has joined #lima

22:45 Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]

22:48 ltucker has joined #lima

22:49 ltucker has quit [Remote host closed the connection]

23:04 dri-logger has quit [Ping timeout: 240 seconds]

23:04 glisse has quit [Ping timeout: 248 seconds]

23:12 dri-logger has joined #lima

23:19 glisse has joined #lima

23:46 glisse has quit [Remote host closed the connection]

23:48 dri-logger has quit [Ping timeout: 268 seconds]

23:54 glisse has joined #lima

23:56 dri-logger has joined #lima