ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
drod has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<anarsoul> rellla: enunes: I did some profiling of glmark2-es2 running in weston and guess where it spends most of the time?
<anarsoul> :)
<anarsoul> hint: we don't have BO cache yet
<anarsoul> anyway, I attached flamegraph to the issue: https://gitlab.freedesktop.org/lima/mesa/issues/110
<anarsoul> I'll work on it tomorrow
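For context on the BO cache anarsoul mentions: without one, every transient buffer allocation goes through a kernel GEM-create ioctl and page allocation, which is what dominates a flamegraph like the one linked above. A BO cache keeps freed buffer objects in per-size buckets and recycles them on the next allocation. Below is a minimal single-threaded sketch in C, with hypothetical names (bo, bo_cache, bo_alloc, bo_free) rather than lima's actual API; a real cache also needs locking, which is exactly the race anarsoul hits further down:

    #include <stddef.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct bo {
        size_t size;
        struct bo *next;                      /* free-list link while cached */
    };

    #define NUM_BUCKETS 16

    struct bo_cache {
        struct bo *bucket[NUM_BUCKETS];       /* bucket[i]: BOs of size <= 4096 << i */
    };

    static int bucket_index(size_t size)
    {
        int i = 0;
        while ((size_t)4096 << i < size && i < NUM_BUCKETS - 1)
            i++;
        return i;
    }

    /* Try to recycle a cached BO before asking the kernel for fresh pages. */
    struct bo *bo_alloc(struct bo_cache *cache, size_t size)
    {
        int i = bucket_index(size);
        if (cache->bucket[i]) {
            struct bo *hit = cache->bucket[i];
            cache->bucket[i] = hit->next;
            return hit;                          /* reused, no ioctl */
        }
        struct bo *bo = calloc(1, sizeof(*bo)); /* stand-in for the GEM-create ioctl */
        if (bo)
            bo->size = (size_t)4096 << i;       /* round up to the bucket size */
        return bo;
    }

    /* "Free": park the BO in its bucket instead of destroying the handle. */
    void bo_free(struct bo_cache *cache, struct bo *bo)
    {
        int i = bucket_index(bo->size);
        bo->next = cache->bucket[i];
        cache->bucket[i] = bo;
    }

    int main(void)
    {
        struct bo_cache cache = { 0 };
        struct bo *a = bo_alloc(&cache, 8192);
        bo_free(&cache, a);
        struct bo *b = bo_alloc(&cache, 8192);  /* the same BO comes back */
        printf("recycled: %s\n", a == b ? "yes" : "no");
        return 0;
    }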
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
anarsoul|c has quit [Quit: Connection closed for inactivity]
drod has quit [Remote host closed the connection]
_whitelogger has joined #lima
dddddd has quit [Remote host closed the connection]
<anarsoul> I implemented a naive BO cache; it's racy (so it crashes sometimes), but it doesn't solve the lagginess :(
Elpaulo has quit [Quit: Elpaulo]
Elpaulo has joined #lima
dddddd has joined #lima
zombah has quit [Quit: leaving]
zombah has joined #lima
hoijui has joined #lima
hoijui has quit [Quit: Leaving]
Da_Coynul has joined #lima
Da_Coynul has quit [Client Quit]
hoijui has joined #lima
jernej has quit [Ping timeout: 252 seconds]
jernej has joined #lima
Da_Coynul has joined #lima
mardestan has joined #lima
<mardestan> so linking the binary blobs in either shared contexts or without seems to be possible on opengl es binary drivers, and probably with lima too, right.
<mardestan> but the more crucial part is command buffer stream compression; the problem was raised by one yankee, a friendly guy with me, who stated that no matter how fast you make the shader
<mardestan> the command processor would start to bottleneck, which is at least partly correct
<mardestan> to be able to change that i promptly developed a two's-complement based uniform distribution of the individual commands to the storage buffers, like cache slots
<mardestan> but in that scheme, since the command stream pipes perhaps do not have ALUs, it needs a shared accessible storage communicated or filled in by shaders.
<mardestan> so basically shaders are workers who queue commands to a consumer, and the consumer is the command processor
anarsoul|c has joined #lima
<mardestan> to implement this, i need to sniff out some details on mali GPUs and others, since there may not be many extensions in vulkan or suchlike to deal with this
<mardestan> depending on whether khronos has dealt with such issues before; i have not looked and would not know, so if someone does know, be brave and speak up.
<mardestan> so far i have seen that, for instance, AMD and NVIDIA gpus support this method almost transparently, as they allow accessing command buffers via TLBs in hw on nvidia, and via L2 on amd
<mardestan> i do not know much about mali gpus
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
Da_Coynul has quit [Client Quit]
Da_Coynul has joined #lima
<mardestan> this compression itself is rather easy, elementary school material even, based on two's-complement addition in electronics; it works best only on this type of integer format, or probably on floating point too
<mardestan> one's complement does not work, for instance; of course there are piles of other compression methods, but i stress that the easiest is the one mentioned.
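One concrete reading of the two's-complement compression claim is plain delta encoding: unsigned machine arithmetic wraps modulo 2^32, which is exactly what two's-complement addition gives you, so the encode/decode round trip is exact for any input, whereas one's-complement arithmetic, with its two zero representations, breaks this. A small C sketch of that interpretation, not a description of whatever mardestan actually built:

    #include <stdint.h>
    #include <stddef.h>

    /* Replace each command word with its difference from the previous word.
     * Subtraction wraps mod 2^32, so no information is lost. */
    void delta_encode(uint32_t *words, size_t n)
    {
        uint32_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            uint32_t cur = words[i];
            words[i] = cur - prev;   /* two's-complement wraparound */
            prev = cur;
        }
    }

    /* Prefix-sum the deltas back; addition wraps the same way, so
     * delta_decode(delta_encode(x)) == x exactly. */
    void delta_decode(uint32_t *words, size_t n)
    {
        uint32_t prev = 0;
        for (size_t i = 0; i < n; i++) {
            prev += words[i];
            words[i] = prev;
        }
    }

Command streams with regular address strides produce many small deltas, which a variable-length integer code can then shrink; the delta pass by itself compresses nothing.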
<mardestan> I think i am going to author the shaders first, and then i need to look at the mali command processor on my own; the only mali gpu i have is the vintage low-end Mali-400 MP
<mardestan> those shitheads here do not want to talk about real issues anyway, i need to recruit my own crew to deal with this stuff
<mardestan> it appears that what i hear on the streets is that my reputation was ruined by some girls who had too many fuckers; some porn rats started to talk about my addictions, effectively ruining my life in the world, projecting their heroics in the field onto me
hoijui has quit [Ping timeout: 245 seconds]
<mardestan> I mean really, who would volunteer, as an ex-sportsman, to swim and get awarded by someone's sex heroics, and get his life entirely ruined that way, stalked by absolute intriguants on a daily basis and facing assaults by such outsider fuck experts.
<mardestan> known sluts who i whitewash in any real event aside from their expertise, which happens to be screwing with random quasimodos and yelling on streets and bragging about that, and humiliating native estonian downtown ex-sportsmen
<mardestan> i already spent six times in the institution over this, that i myself got humiliated and assaulted, and i express my opinion about the situation: the people who organized this couldn't possibly be bigger trashy outsiders than they are
<mardestan> their resilience in life, even though i got all the charges, is perpendicular to mickey mouse
<mardestan> in particular the suffering does not seem to end for me, 25 years and still going; lots of injuries taken and my body is almost ruined that way, yet i still can not see leftovers able to compete with me
<mardestan> and it's pointless to talk to me about this, since i have clearly expressed that all those years i fell victim to other problems over a complete fraud.
kaspter has joined #lima
hoijui has joined #lima
<mardestan> that is how things are: when you grow up with a potential to change something and achieve, you start to get paranoid blockings, hence when i describe what happened everything falls into balancing theories; i was not given much chance to benefit from my good genetics
hoijui has quit [Remote host closed the connection]
hoijui has joined #lima
<mardestan> the bigger men who started to bloodily fight against their terror, which happened often and on a daily basis for similar reasons, they died pretty fast anyway
<mardestan> so the theoretical possibilities can be rounded to: you accept the terror and underdevelop every day, getting weaker 100-fold from what you were supposed to be, and you get to live, due to jealousy against you by suicidal leftovers; or you fight back and pretty much die anyway on the run
<mardestan> i chose to suffer and live, because i saw those who behaved the other way when i was young, and they were talents; they were killed
<mardestan> by the way that is sad, that i can never ever take their pain away, and have to act as ground for all their shit basically every day, but that is the world we live in, healthy people suffer all the time
<mardestan> Take it as hamlet's dilemma and the prisoner's dilemma mixed; the latter states that innocent and good people will just have the same odds as any other cruel, violating, born leftoverish outsider
<mardestan> and what the balance means is that good people score only violations against them, so during a lifetime most things they gather from others are minuses
<mardestan> this is why the odds are the same; leftovers are fed this way with violations, this is the total sum of the balance
<mardestan> if you grow with more plus resources than minuses, they add the same amount of minuses to you in any event, mostly
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
drod has joined #lima
raimo has joined #lima
mardestan has quit [Read error: Connection reset by peer]
raimo has quit [Read error: Connection reset by peer]
drod has quit [Remote host closed the connection]
raimo has joined #lima
raimo has quit [Read error: Connection reset by peer]
mardestan has joined #lima
raimo has joined #lima
mardestan has quit [Read error: Connection reset by peer]
raimo has quit [Read error: Connection reset by peer]
mardestan has joined #lima
<mardestan> and hardware can not be designed much better than it already is. I've been given time to conclude something on my own; i think karolherbst needs to be given a little more time yet, for him to understand that GPUs of all models are actually out-of-order execution chips: in-order pipeline chips per instruction, but execution is out-of-order in between the instructions
<mardestan> i am also not sure why those vendors play with wordings the way they do
<mardestan> it can however confuse people; you gotta quietly engage your brain to find out the logic of the truth
<mardestan> also VLIW was never a mistake, because in fact that is the sanest way to do things, along with CGRA
hoijui has quit [Ping timeout: 268 seconds]
<mardestan> the best description of how pipelined processors work is built on top of the Berkeley DLX arch, on a webpage that i have already referred to several times
<mardestan> it is also the base arch of the most famous computer architecture book, by David Patterson and his crew, right.
<mardestan> on a CPU a pipeline works such that there are pipeline stages, with as many instructions in flight as the pipeline is deep, if you manage to understand it
<mardestan> so compared to non-pipelined processors, one stage takes the non-pipelined full-length cycle divided by the number of pipeline stages
<mardestan> in other words, short cycles on pipelined processors and
<mardestan> long cycles on a non-pipelined processor
<mardestan> every new instruction is started in lock step after one stage finishes
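A worked example of the timing claim above, using the classic 5-stage DLX pipeline (IF/ID/EX/MEM/WB); the numbers are illustrative, not measurements of any real chip:

    #include <stdio.h>

    int main(void)
    {
        double full_latency_ns = 10.0;  /* non-pipelined: one long cycle per instruction */
        int stages = 5;                 /* IF, ID, EX, MEM, WB */
        double stage_ns = full_latency_ns / stages;   /* the short pipelined cycle */

        int n = 1000;                   /* instructions to run */
        double non_pipelined_ns = n * full_latency_ns;
        /* pipelined: fill the pipe once, then retire one instruction per short cycle */
        double pipelined_ns = (stages + (n - 1)) * stage_ns;

        printf("non-pipelined: %.0f ns, pipelined: %.0f ns, speedup %.2fx\n",
               non_pipelined_ns, pipelined_ns, non_pipelined_ns / pipelined_ns);
        return 0;
    }

The speedup approaches the stage count (here ~4.98x of the ideal 5x) as the instruction count grows, which is the "short cycles vs long cycles" point made above.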
<mardestan> on VLIW a slightly different scheme is used: it is a dual-wave scheduler, per ALU bundle
<mardestan> so when one instruction in the bundle can not be scheduled, that one stalls in the arbiter, and all but that one are scheduled
<mardestan> the pipeline is started from the even wave, then the odd wave, then even again, then odd again, etc.
<mardestan> the win on VLIW comes from being able to put more ALUs on the chip, which work entirely in parallel
<mardestan> this is due to the scheduling being cheaper and there being no scoreboard, and the bypass networks are also cheaper
<mardestan> that lowers power and also consumes fewer resources
<mardestan> the RealWorldTech Cypress thread, some AMD devgurus forum links, and Tom's Hardware pages all describe in enough detail how VLIW works
<mardestan> the pipeline has the even warp executing while the odd warp is reading back, and everything is restarted; when something can not be scheduled it is parked, stalling and coming back after the next full pipeline start
anarsoul|c has quit [Quit: Connection closed for inactivity]
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<mardestan> is there something you do not yet understand? one wave just starts another wave continuously without blocking, and on a dependency it stalls the whole set of subsequent instruction slots
<mardestan> i.e. the rest of the bundle
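A toy model of the even/odd dual-wave arbitration as described in the lines above; this is a sketch of the scheme as mardestan explains it, with made-up names, not a documented r600 or Mali mechanism:

    #include <stdbool.h>
    #include <stdio.h>

    #define SLOTS 4   /* ALU slots in one VLIW bundle */

    struct bundle {
        bool ready[SLOTS];   /* operands available for each slot this pass? */
    };

    /* Issue one bundle: ready slots go out, unready slots park in the
     * arbiter and retry after the next full pipeline start. */
    static void issue(const struct bundle *b, const char *wave)
    {
        for (int s = 0; s < SLOTS; s++)
            printf("%s wave, slot %d: %s\n", wave, s,
                   b->ready[s] ? "issued" : "parked (dependency stall)");
    }

    int main(void)
    {
        struct bundle even = { .ready = { true, true, false, true } };
        struct bundle odd  = { .ready = { true, true, true,  true } };

        /* lock-step alternation: even, odd, even, odd... */
        for (int pass = 0; pass < 2; pass++) {
            issue(&even, "even");
            issue(&odd, "odd");
        }
        return 0;
    }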
<mardestan> all vliw bundles (r600 for instance has 16 of them) can work in parallel per CU; on mali it is exactly the same
<mardestan> but the vliw bundle seems to be termed a core there
<mardestan> the compiler determines vliw dependencies, and asynchronous or non-dependent instructions (good guys have many terms :))
<mardestan> are put first by reordering them on the cpu
Da_Coynul has joined #lima
<mardestan> on SIMD, compiler architects do not have to do that, because the chip replays everything
<mardestan> then karol also was dealing with SIMT/SIMD differences
<mardestan> SIMT is an extension to SIMD
<mardestan> so cores are SIMT on vliw, and lanes are SIMD
<mardestan> or vice versa actually, sorry
<mardestan> and on true SIMD: on GCN the SIMD units are SIMD, and the lanes of 64 are SIMT
<mardestan> vector length is SIMD and lane width is SIMT, or vice versa; it does not absolutely matter what you call them
<mardestan> hence i told you: you are not going to need full compiler reordering to run an elbrus chip; all you need is a thin x86 checksum for the binaries that reorders registers, if you ever wanted to run many instructions at a time from the queues
<mardestan> so the elbrus CPU, which seems to be the father of VLIW
<mardestan> dunno, those newer ones have a word of 32 instructions probably
<mardestan> if they were not in any way dependent on each other, they'd be scheduled in parallel for execution
<mardestan> it is my favorite chip so to speak, very easy to program
Marex has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
hoijui has joined #lima
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
hoijui has quit [Ping timeout: 244 seconds]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
anarsoul|c has joined #lima
Da_Coynul has joined #lima
<anarsoul> enunes: do you understand how drm scheduler works?
drod has joined #lima
<enunes> anarsoul: not much honestly, but I spent some time staring at it a while ago while working on a memory leak bug related to it
<anarsoul> enunes: it looks like it doesn't provide equal opportunity for different tasks to run
<anarsoul> e.g. glmark2 running in weston gets a lot more gpu time than weston itself
<enunes> is weston submitting any rendering?
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<anarsoul> I assume so
<mardestan> It has been described on the mailing list that r300 chips can not do dest register redirections or indirections; again, not sure what mali and freedreno allow there.
<anarsoul> why wouldn't it?
<anarsoul> it has to render glmark surface
hoijui has joined #lima
<anarsoul> even cursor stutters in weston, so I assume it waits for some job to complete
<enunes> I enabled a 500ms timeout in drm_sched for lima, assuming it works I think you would be getting errors if some job was stuck for longer than that
<enunes> so I honestly don't know how it all works together, but is it possible that if nothing is moving in weston, just glmark renders and weston composites it without having to render anything itself?
mardestan has quit [Quit: Leaving]
<anarsoul> enunes: if some job was working for longer than that
<enunes> hmm ok that would make sense too
<anarsoul> but looks like it can sit in queue for seconds
<anarsoul> making X or wayland unusable
<enunes> so in the cases where wayland doesn't execute a job for seconds, the entire screen is frozen for that duration?
<anarsoul> yes
<anarsoul> (try it)
<anarsoul> but glmark is working like crazy rendering something into memory
<anarsoul> that's never displayed
<enunes> I'm looking into those other reports now
<anarsoul> I have a strong suspicion that they all have the same root cause
<enunes> the most interesting thing I got is:
<enunes> [ 121.716110] Fence drm_sched:gp:e:3ed released with pending signals!
<enunes> but only once
<enunes> sounds pretty bad
<anarsoul> there's a race somewhere in kernel driver
<enunes> yeah it's where I'm looking now
<anarsoul> and that's something related to multiple contexts
<enunes> strange that it doesn't reproduce with e.g. parallel piglit with glx or egl_x11 platform
<enunes> but it's some missing reference counting in lima bo allocation/freeing
<anarsoul> enunes: I thought that piglit doesn't run parallel jobs
<enunes> it does by default unless you use --no-concurrency
<enunes> I used to run with that but dropped it a while ago, works well
<enunes> so the race condition seems to require some pattern that doesn't happen with that
<enunes> but glmark2 on Xorg reproduces it all the time
<anarsoul> enunes: sounds like something's wrong with dependencies
<anarsoul> weston job depends on glmark2 bo
<anarsoul> yeah, and when glmark2 terminates weston *sometimes* gets ppmmu fault
<anarsoul> which probably means that bo created by glmark2 is gone
<anarsoul> but weston still tries to use it
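The failure mode anarsoul describes fits a missing reference on BOs held by in-flight jobs: if the producing process (glmark2) exits and its BO is unmapped before the consuming job (weston's composite) signals its fence, the GPU reads a stale address and the MMU faults. A sketch of the refcounting idea in C, with illustrative names (bo_get/bo_put), not the lima kernel driver's actual code:

    #include <stdatomic.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct bo {
        atomic_int refcount;
        /* ... GPU VA, backing pages ... */
    };

    static struct bo *bo_create(void)
    {
        struct bo *bo = calloc(1, sizeof(*bo));
        atomic_store(&bo->refcount, 1);   /* creator's reference */
        return bo;
    }

    static void bo_get(struct bo *bo)
    {
        atomic_fetch_add(&bo->refcount, 1);
    }

    static void bo_put(struct bo *bo)
    {
        if (atomic_fetch_sub(&bo->refcount, 1) == 1) {
            printf("last ref gone: safe to unmap GPU VA and free pages\n");
            free(bo);
        }
    }

    int main(void)
    {
        struct bo *surface = bo_create();   /* glmark2's render target */
        bo_get(surface);                    /* weston's job submit pins it */
        bo_put(surface);                    /* glmark2 exits, drops its ref... */
        /* ...but the pages stay mapped until the consuming job's fence signals */
        bo_put(surface);                    /* job completion: now it is freed */
        return 0;
    }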
<enunes> if you add some prints to lima_ctx_buff_alloc it doesn't reproduce
<enunes> because somewhere else it will have enough time to count the references correctly
jernej has quit [Ping timeout: 264 seconds]
<anarsoul> enunes: ppmmu fault can't come from ctx_buff_alloc
<enunes> I know, but the ppmmu fault happens right after some failure in the allocation path at lima_bo_create
<anarsoul> not really
<anarsoul> you can't get a VA for non-existent BO
adjtm_ has quit [Ping timeout: 245 seconds]
<enunes> I always get this before the fault:
<enunes> [ 48.276222] glmark2-es2: page allocation failure: order:0, mode:0x4(GFP_DMA32), nodemask=(null),cpuset=/,mems_allowed=0
<anarsoul> it'll crash right away
<enunes> then the backtrace and
<enunes> [ 48.610830] lima 1c40000.gpu: mmu page fault at 0x22f6cbc0 from bus id 0 of type read on ppmmu0
<anarsoul> hm
<anarsoul> OK, I'll continue working on BO cache
<anarsoul> in theory it should fix some of allocation failures
hoijui has quit [Ping timeout: 268 seconds]
adjtm has joined #lima
<anarsoul> enunes: hm, looks like my BO cache fixed stutter for me :\
<anarsoul> wanna try it?
<anarsoul> at least for -b build and -b shadow, -b texture still stutters a lot
<anarsoul> as well as X11
<anarsoul> it essentially serializes the pipeline, maybe that's why it fixes the issue
<anarsoul> yet I don't understand why it doesn't help with -b textures
<anarsoul> flamegraph doesn't show anything suspicious
<anarsoul> enunes: since serialization fixes it, it's definitely something with job dependencies
<anarsoul> well, partially fixes it
kaspter has quit [Quit: kaspter]
Da_Coynul has joined #lima
jernej has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
Da_Coynul has joined #lima
Da_Coynul has quit [Client Quit]
Da_Coynul has joined #lima
<marex-cloud> anarsoul: heh, if you want nasty parallel test, build qtwebengine and open blossom webgl demo
<marex-cloud> anarsoul: that really messed etnaviv up
StarfishPrime__ has joined #lima
<anarsoul> weston + 'glmark2-es2-wayland -b textures' blows it up for now
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
<anarsoul> it doesn't crash, but weston stutters a lot
dri-logger has quit [Ping timeout: 244 seconds]
Da_Coynul has joined #lima
glisse has quit [Ping timeout: 245 seconds]
dri-logger has joined #lima
<marex-cloud> anarsoul: including out of order rendered frames?
dri-logger has quit [Ping timeout: 268 seconds]
dri-logger has joined #lima
_whitelogger has joined #lima
dri-logger has quit [Ping timeout: 252 seconds]
glisse has joined #lima
dri-logger has joined #lima
Da_Coynul has quit [Quit: My MacBook Air has gone to sleep. ZZZzzz…]
ltucker has joined #lima
ltucker has quit [Remote host closed the connection]
dri-logger has quit [Ping timeout: 240 seconds]
glisse has quit [Ping timeout: 248 seconds]
dri-logger has joined #lima
glisse has joined #lima
glisse has quit [Remote host closed the connection]
dri-logger has quit [Ping timeout: 268 seconds]
glisse has joined #lima
dri-logger has joined #lima