#lima on 2019-10-23 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

00:13 <Ntemis> how i use apitrace? any help?

00:22 <Ntemis> nvm

00:22 <Ntemis> apitrace trace --api [egl] /path/to/application [args...]

00:35 <anarsoul> yep

00:50 Danct12[m] has joined #lima

01:23 _whitelogger has joined #lima

02:00 kaspter has quit [Ping timeout: 265 seconds]

02:00 kaspter has joined #lima

02:11 camus has joined #lima

02:12 kaspter has quit [Read error: Connection reset by peer]

02:12 camus is now known as kaspter

02:42 camus has joined #lima

02:42 kaspter has quit [Ping timeout: 240 seconds]

02:42 camus is now known as kaspter

02:59 camus has joined #lima

03:00 kaspter has quit [Ping timeout: 240 seconds]

03:00 camus is now known as kaspter

03:01 Ntemis has quit [Read error: Connection reset by peer]

03:07 dddddd has quit [Remote host closed the connection]

03:33 camus has joined #lima

03:35 kaspter has quit [Ping timeout: 276 seconds]

03:35 camus is now known as kaspter

04:02 kaspter has quit [Ping timeout: 268 seconds]

04:04 kaspter has joined #lima

04:26 kaspter has quit [Ping timeout: 240 seconds]

04:27 kaspter has joined #lima

04:51 megi has quit [Ping timeout: 265 seconds]

05:37 kaspter has quit [Ping timeout: 240 seconds]

05:37 camus has joined #lima

05:40 camus is now known as kaspter

05:45 ecloud_wfh is now known as ecloud

05:47 camus has joined #lima

05:48 kaspter has quit [Ping timeout: 240 seconds]

05:48 camus is now known as kaspter

05:51 mastermart has joined #lima

05:54 <mastermart> www3.uji.es/~figual/Tesis/tesis.pdf cuplapack, CUBLAS, libflame , supermatrix algorithms-by-blocks, you appear to not be capable of reading their out of order execution ideas, this pdf has everything needed to generate the deps of regs to be used for arbitrary instructions

06:00 kaspter has quit [Ping timeout: 240 seconds]

06:00 kaspter has joined #lima

06:15 <mastermart> first you pin the queues with their methods, then you destroy the object and start to recursively run the logic pinned into queues, depending on how many rows you select from the selector, that is how many operations will run, if you select only one and block the others, it will run one writeback instruction for instance until you unblock the other blocks rows randomly the executed instructions will linearly grow with every block released

06:15 <mastermart> from scoreboard

07:01 UnivrslSuprBox has quit [Quit: ZNC 1.6.6+deb1ubuntu0.2 - http://znc.in]

07:02 UnivrslSuprBox has joined #lima

07:13 <mastermart> so 4x4 matrix has 4+3+3=10 instructions in schedule, when you want to use all 16 you need not square but rectangular 4x8 matrix, from 4x4 4 is diagonal and 3's are upper and lower triangulars

07:23 <mastermart> the instruction selector finds that you need to schedule 4th instruction in the matrix, this corresponds to row2 instruction1 syrk4, when 7th instruction needs to be scheduled this is tile3/row3 syrk7

07:24 <mastermart> so you can schedule separately 1 each or, several at a time, depending what loads graduate and which not

07:29 <mastermart> 5th instruction is tile2 inst2 in the table, then you need to schedule two instructions back to back, from the selector the 1st and the second, that will run gemm5

07:29 paulk-leonov has quit [Ping timeout: 240 seconds]

07:36 kaspter has quit [Quit: kaspter]

07:37 kaspter has joined #lima

07:38 <mastermart> so i was a little incosistent but for performance reasons, gemm8 tile2 inst3 comes from selector 1 and 3

07:38 <mastermart> inconsistent

07:44 <mastermart> sorry gemm6, only one is issued the last in different order then back to back two of them, this is going to be slower to schedule

07:47 <mastermart> so the selector combinations are 1 alone, 1 and 2, 2 and 3 and finally 1 and 4 final one is the only one that schedules differently than back to back and that is also consuming more power and latency

07:53 paulk-leonov has joined #lima

08:16 camus has joined #lima

08:18 kaspter has quit [Ping timeout: 264 seconds]

08:18 camus is now known as kaspter

08:19 <mastermart> it'd be to confusing, but generally you want to substitute gemm6 and gemm6 from the table with eachother

08:19 <mastermart> to/too

08:20 <mastermart> cause this will end up being faster

08:24 <mastermart> hence the final algorithm is modified version of algorithms-by-blocks, very easy modification

08:26 <mastermart> you do not want to compute from a line 1 and 8 instructions, you want to compute 3and4 or 1 and 2 or 7 and8, cause if you go 1 to 7, the arbiter does additional work in hw, it wants to reschedule 2 3 4 5 6 and the power budget grows with latency, cause those will cause additional bitwise gate delays

08:29 <mastermart> I read many of those versions, i think however not entirely sure, that i already saw this modification done somewhere

08:52 <mastermart> you'd consistently achieve 1TB/s of floating point issue bandwidth with fermi card, as outlined there, this is insane as you see, meaning insanely performant

09:02 <mastermart> On a GPU with Nvidia Cublas, the sustained performance achieved by the gemm implementation (373.4 GFLOPS) represents roughly a 40% of the peak performance of the architecture, that was for tesla

09:02 <mastermart> it is huge performance it will be doubled when the code is corrected a bit

09:17 <rellla> anarsoul: https://gitlab.freedesktop.org/rellla/mali-syscall-tracker/commits/streamdecode

09:18 <rellla> first version ...

09:32 megi has joined #lima

09:44 <mastermart> anyhow, i would also exchange syrk and the first gemm on the line, and things should be top notch, only for the pinned FU lines , not the procedure lines which always run

09:45 <mastermart> and change it to be dependent on the outcome, this will just make emulated writebacks from the FUs

09:45 <mastermart> https://inf.ethz.ch/personal/markusp/teaching/252-2600-ETH-fall11/slides/16-Humair.pdf I may do new slides, how that stuff works, i sent my laptops to repair

09:46 <mastermart> page 22 table is what i talked about.

09:47 <mastermart> those slides are well done, cause they are marked and big and easy to follow hence

09:49 dddddd has joined #lima

10:10 warpme_ has joined #lima

10:15 <mastermart> ouh righty i was wrong, actually the last thing i would not do, cause RAW deps would get blocked, while WAR deps would not

10:20 <mastermart> syrk4 becomes A22 - At1,2 A2,3 and is export instruction, this commits the stuff to the readback procedure, and gemms on that line will write both to A2,3

10:20 <mastermart> because on that block if it is a fu line, only one can commit at time

10:26 <mastermart> whole shader needs to be in SSA, yes in hw too, cause that will end up being easier, once the gemm commits, the export will unstall and commit the data , where it checks which line committed it , does calculations and runs the next one, cause in ssa all reads of previous writes come in sequence

11:15 BenG83 has quit [Ping timeout: 252 seconds]

11:39 BenG83 has joined #lima

11:48 BenG83 has quit [Ping timeout: 240 seconds]

12:37 <MoeIcenowy> anarsoul: tried retroarch xmb menu on lima on PineTab

12:37 <MoeIcenowy> works here on 64-bit system

12:39 <MoeIcenowy> but looks that the performance is bad (it gets better when "Icon Shadows" are disabled)

12:45 <MoeIcenowy> anarsoul: BTW trying to set "Menu Shader Pipeline" to "Simple Snow" triggers a certain pp error

12:47 <MoeIcenowy> and it looks that it hangs the GPU

12:49 <MoeIcenowy> oh this problem happens only on KMS...

13:08 kaspter has quit [Quit: kaspter]

13:11 kaspter has joined #lima

13:15 mastermart has quit [Ping timeout: 240 seconds]

13:18 mastermart has joined #lima

13:19 <mastermart> plaes: snitch/agent/betraydor fuck off and do it fast! If you don't stop stalking me and my family we send you to hospital with head concussion .

13:27 <mastermart> My friend who is a patriot of our country, has a whole book about people like plaes and the earlier who made the run up, they hardly respond to anything face to face, their job is to spread lies in foreign countries and to some in our country, last is short, cause this is small country, but foreign countries provide possibilities to scam forward.

13:28 <mastermart> i had several girls doing it in portugal and other countries, so i made it to worlds notorious one.

13:29 <mastermart> such are called agents who you meet several times later in life, but they do not talk with you, cause otherwise they would be exposed, they choose their victims

13:29 <mastermart> who are dumb enough to belive what they say.

13:31 <mastermart> We have infrastructure to deal with snitches and agents whos job is to betray estonian citizens without any legal reasons, and this structure is called Eeesti Kaitsepolitsei, in short KAPO.

13:32 <mastermart> 100years old authority during 2 republics

13:34 <mastermart> Agent life is such, that as said, one chooses his victims from groups of people, and the real victim is not being notified who one betraydes, and another thing is one does not do public writings

13:35 <mastermart> because then one is exposed to, he does behind the back conspiracy among selected audiences.

13:38 <mastermart> IN such way my grandfathers uncle was killed, and my own granny saw right away knowing the history, it may happen to me too, so i have had more then 100s of snitches in various countries not only in estonia, but regularly those are indeed estonians.

13:45 <mastermart> back times during the first estonian republic, agents were trained to do that, they had special preparation in special schools to betray their nations citizens, obviously i know that cause my grandfather was a criminal investigator

13:46 <mastermart> who investigated all that stuff , jesuiitide kool eesti keeles.

13:52 <mastermart> so from us at least i do not care about the diagnosis, i am paranoid for a reason, and my family is too, cause i carry too much genetics with my relatives who were betrayded by other estonians and who were killed.

13:59 <mastermart> my dad is rich and moral, but those ones who went against me, the betrayders, are even richer but amoral.

14:05 <mastermart> I can't say anymore do not go to portugal telling fairytales about me, not a single letter i got response from, they have betrayal in their veins and nerves, father was a rich soviet communist, it was all done 10years in estonia, couple years in england, many years in portugal, where i did not act the same

14:05 <mastermart> i contacted her directly before going to talk behind the back

14:13 mastermart has quit [Read error: Connection reset by peer]

15:19 enunes has quit [Read error: Connection reset by peer]

15:20 enunes has joined #lima

16:18 kaspter has quit [Ping timeout: 268 seconds]

16:19 kaspter has joined #lima

16:25 kaspter has quit [Ping timeout: 240 seconds]

16:37 kaspter has joined #lima

16:51 <anarsoul> rellla: nice work!

18:30 adjtm has quit [Ping timeout: 252 seconds]

18:45 enunes has quit [Ping timeout: 268 seconds]

19:33 armessia has joined #lima

19:42 enunes has joined #lima

19:57 adjtm has joined #lima

21:41 armessia has quit [Remote host closed the connection]

22:01 armessia50 has joined #lima

22:01 armessia50 has quit [Remote host closed the connection]

22:02 armessia has joined #lima

22:12 <armessia> anarsoul: hi

22:12 <anarsoul> hi

22:12 <armessia> anarsoul: regarding the disabled ci for lima and the cubemaps branch

22:12 <armessia> I rebased because of a recent change in the lima fail list

22:13 <anarsoul> oh, right

22:13 <armessia> Wanted to get everything nicely passing but now I'm a bit blocked I guess

22:13 <armessia> Until the hardware gets fixed which runs the lima tests

22:13 <armessia> Any idea when this would be fixed?

22:14 <anarsoul> armessia: you can just re-enable it temporarily in your branch, get pipeline passing, disable it back

22:15 <armessia> anarsoul: ok, will try that

22:15 <armessia> anarsoul: will be for tomorrow though

22:15 <armessia> Before the rebase the pipeline was passing, but still need to look into what is fixed by which commit

22:15 <armessia> But that will be for tomorrow :-)

22:16 <anarsoul> :)

22:16 <armessia> anarsoul: another question, can I add your reviewed by in the commits which were already reviewed? Or will gitlab do that automatically?

22:16 <anarsoul> you have to add it

22:17 <armessia> Ok, good to know

22:17 <anarsoul> 'git rebase -i' is your friend :)

22:18 <armessia> Yup :-)

22:21 armessia has quit [Remote host closed the connection]

22:59 <anarsoul> enunes: we definitely have some issue with BO refcounting

23:00 <enunes> anarsoul: so far all I have discovered is that if I disable bo cache, it doesn't leak

23:00 <anarsoul> interesting

23:01 <anarsoul> enunes: also looks like we're freeing ctx BO somewhere where we should not

23:01 <anarsoul> I'm getting occasional gpmmu read failures when running q3a in loop

23:04 <enunes> it seems that we do the last unref right after submitting to the kernel, I don't understand enough about it to know if it will be held by something else

23:04 <enunes> until the job finishes

23:05 <anarsoul> ouch

23:05 <anarsoul> I see at least one issue

23:07 <anarsoul> well, it's probably not it

23:08 <anarsoul> enunes: can you check if lima_bo_cache_get bails out due to lima_bo_wait() returning false?

23:10 <enunes> let me try

23:15 <anarsoul> also BO cache shouldn't leak more than 5s of BOs, however that can be a lot

23:20 <anarsoul> you can try reducing that with https://gist.github.com/anarsoul/0705b66043ee0bf3f0a2a94b3d1ec2ab

23:20 <anarsoul> but I'm not sure why it leaks in first place

23:32 <anarsoul> enunes: you also need to add support for indexed draws for your MR

23:32 <enunes> anarsoul: possibly, I didn't test that yet, I guess I need to write an app for it

23:32 mrueg has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]

23:33 <enunes> also quads, I wonder if we even get quads

23:35 <enunes> though indexed draws should use the same indices that I'm modifying, so I expected to just use a test app to ensure it works

23:35 <anarsoul> ideas seems to use glDrawElements() but it's unlikely that it has enough vertices to trigger the issue

23:36 mrueg has joined #lima

23:37 <enunes> I would just use something like gbm-surface and fill the screen with 100000 small triangles or something like that

23:37 <enunes> but yeah I need to do that before removing wip

23:39 <enunes> anarsoul: I think it's not bailing out on lima_bo_wait at lima_bo_cache_get

23:40 <anarsoul> so it actually gets BO from cache?

23:40 <enunes> yes

23:42 <enunes> I'm testing this with glmark2-es2-drm -b bump:bump-render=high-poly which allocates quite large buffers

23:43 <enunes> in less than 10 seconds it runs out of 2gb of memory

23:43 <anarsoul> ouch

23:43 <anarsoul> how much does it allocate per frame?

23:45 <anarsoul> so I think I understand where issue may be coming from

23:46 <anarsoul> let's say we have 2 BOs, one is 4kb, another is 8kb

23:46 <anarsoul> let's assume both go into the same bucket

23:46 <anarsoul> we allocate them in this order: 4kb, 8kb

23:46 <anarsoul> but free them in opposite order

23:46 <anarsoul> i.e. 8kb, 4kb

23:46 <enunes> it allocates 2 * 1048512 and 1 * 206976 per frame

23:47 <anarsoul> for next frame we allocate again 4kb, 8kb

23:47 <anarsoul> bo_cache grabs 8kb for 4kb allocation since it's big enough

23:47 <anarsoul> but when it tries to grab bo for 8kb it gets nothing since 4kb is not big enough and it allocates new bo

23:48 <anarsoul> so we end up with 2 8kb BOs that are live and 1 4kb BO in cache

23:50 <enunes> still it should just allocate one more 8kb and end up with 3 live 8kb buffers

23:51 <enunes> well 2 in that case
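The 4kb/8kb scenario above can be played out with a small toy model. This is only a sketch under stated assumptions, not the Mesa lima code: the BucketCache class and run_frames helper are hypothetical, a single bucket is modelled as a list, freed BOs are appended to the tail, and allocation takes the first cached BO that is big enough. The 5 second cache timeout and lima_bo_wait() are not modelled.

```python
class BucketCache:
    """Toy model of one BO cache bucket (hypothetical, not the Mesa code)."""

    def __init__(self):
        self.free_list = []  # sizes of cached (idle) BOs, oldest first
        self.created = []    # sizes of every BO that had to be newly created

    def alloc(self, size):
        # Assumption: first fit, scanning the bucket from the head.
        for i, cached in enumerate(self.free_list):
            if cached >= size:
                return self.free_list.pop(i)
        # Nothing in the bucket is big enough, so create a new BO.
        self.created.append(size)
        return size

    def free(self, bo):
        # Assumption: freed BOs are appended to the tail of the bucket.
        self.free_list.append(bo)


def run_frames(cache, frames, sizes=(4096, 8192)):
    """Allocate `sizes` in order each frame, then free in the opposite order."""
    for _ in range(frames):
        live = [cache.alloc(s) for s in sizes]
        for bo in reversed(live):
            cache.free(bo)
    return cache.created


cache = BucketCache()
created = run_frames(cache, frames=10)
print(created)
```

Under these assumptions the model creates one extra 8kb BO in the second frame and then stops growing, which matches the point that this pattern alone wastes memory but does not leak without bound.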

23:53 <anarsoul> yeah, but what if it's something bigger like 1mb, 1.5mb, 1.9mb?

23:54 <anarsoul> 206kb and 1mb should go into different buckets though
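For scale, the per-frame sizes quoted above can be put next to the 2gb figure with some quick arithmetic. This is a rough sketch: it assumes "2gb" means 2 GiB and that nothing is ever reused, and bucket_index is a hypothetical power-of-two bucketing (floor of log2 of the size) used only for illustration, not a statement about how the lima BO cache actually picks buckets.

```python
# Per-frame allocation sizes quoted in the log, in bytes.
per_frame = 2 * 1048512 + 1 * 206976

# Assumption: "2gb" is read as 2 GiB.
total = 2 * 1024**3

# Frames needed to use up that much memory if no BO were ever reused.
frames_to_exhaust = total // per_frame


def bucket_index(size):
    # Hypothetical bucketing: floor(log2(size)) for a positive integer size.
    return size.bit_length() - 1


print(per_frame)               # bytes allocated per frame
print(frames_to_exhaust)       # whole frames until 2 GiB is used up
print(bucket_index(1048512))   # bucket of the ~1mb buffers
print(bucket_index(206976))    # bucket of the ~206kb buffer
```

With power-of-two buckets the two sizes land in different buckets, in line with the last remark above. Note that 1048512 is just under 1 MiB, so under this bucketing it sits in the bucket below the 1 MiB boundary.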