ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
<suurnoku2> I have, during the past 25 years, lived a highly silent and private life, not giving too many agents real chances; when I have slipped and allowed some scammer agent around me, I have been punished hard.
<suurnoku2> even in foreign countries, some mad lady managed to make me famous, and there were several protesters all the way through to annoy my fans and the ladies who had only come to see me, and their dads, along with sportsmen, were the only potential ones to defend me, because ladies cannot defend me from cockblockers.
<suurnoku2> as for where they came from, well, one sexual intriguer involved with many quasimodos around the world pretty much, chipsies kilis no matter which ones, lots of English bullies
<suurnoku2> basically a big fucktard lady, but clueless enough in science and real events
<suurnoku2> her fucktard fans seemed to form most of those protesters.
<suurnoku2> brainless individuals without any logic involved in any of the legal events I have seen, who think with dicks and pussies only.
<suurnoku2> the outcome is often very similar if people lack hobbies and interests besides intrigues and orgies.
<suurnoku2> and the result is sexual and violative comments, present daily at my name from terroristic quasimodos, only because those retards lack other interests.
<suurnoku2> (in Estonian:) so one shitty person fucked her way from the company of various quasimodos at Estonian backstreet events into Portuguese glamour, and all my fans around the world suffered, as if they cared whom that slut has fucked, just so they would circle around me and terrorize my fans.
suurnoku2 has quit [Quit: Leaving]
megi has quit [Ping timeout: 240 seconds]
Barada has joined #lima
aureliusmarcus has joined #lima
<aureliusmarcus> the whole thing is: when the destination register is not available for a store, it is a bit harder to block the registers while reusing the stores for writebacks; the texture unit has an arbiter on loads and buffers them, but there is a method which works without buffering as well.
<aureliusmarcus> It is because the GPU hardware does not check for WAR hazards
<aureliusmarcus> the whole thing works like this: you have a selector diagonal with LSU stores and vector loads; first a procedure allows two of those loads to graduate, everything is issued, and it comes back to the diagonal, but this time around, since 38 of the registers did not graduate, only 2 of the instructions are issued
<aureliusmarcus> so if you skip most of the instructions from the diagonal, the hardware keeps the last diagonal in the in-flight buffers, so it runs the one, two, or more of those instructions that did graduate, and then gets a stall event from the dispatcher
<aureliusmarcus> which means 1111 is driven as the decode wfid on the chip
<aureliusmarcus> which is a wildcard that matches the last issued wfid from the SIMD arbiter
<aureliusmarcus> and hence you branch the queues to that particular column, selected after the last instruction scheduled on the selector diagonal
<aureliusmarcus> the logic is always the same kind: first the selector issues everything but graduates only some of it, based on the dependent load contents and toggled via the corresponding offsets
<aureliusmarcus> when it wraps around back to 0, the next time all but some are blocked, then it branches
<aureliusmarcus> issues certain instructions from the tile, frees the dependencies, and the whole thing starts again from the first diagonal
<aureliusmarcus> again it issues everything but graduates only some, then it selects the tile to run, etc.
<aureliusmarcus> this is like a pipeline of instructions and it is best run SSA-interpreted
dddddd has quit [Read error: Connection reset by peer]
<aureliusmarcus> according to the litmus test suite, as I anticipated myself long ago, repeating the very same last address on the texture unit skips the writes
<aureliusmarcus> this is how you do not allow the load to graduate
<aureliusmarcus> and as I already told you, this is not something that can be fixed in hardware, because it is inherent to how RTLs function
<aureliusmarcus> they function based on re-evaluation events, and if nothing changed from the previous time,
<aureliusmarcus> re-evaluation events are not driven, and so the data is not forwarded, since nothing changed
dllud has quit [Ping timeout: 276 seconds]
<aureliusmarcus> non-blocking assignments in the respective place on the chip are handled in the address calculator in the following way:
<aureliusmarcus> the assignments do not block, they are executed in parallel, but the commit is deferred until it is certain that all assignments got re-evaluated
<aureliusmarcus> in other words, if one did not, the chip stalls in the address calculator, and hence misses the write and leaves the scoreboard blocking
<aureliusmarcus> it is not possible to overcome this issue; even if you were to reset the state, the only possible value to reset to is 0
<aureliusmarcus> then obviously you post address zero and are still able to skip; you cannot drive random values to the address calculator for a reset
<aureliusmarcus> because that would produce a very unexpected bug if the address happened to collide
<aureliusmarcus> the meaning of all of that is: in the current hardware paradigm, this method should always work
<aureliusmarcus> you cannot raise exceptions on zero register values, because zero is a legal value in registers, and the spec has allowed this at least long enough historically that all chips would be covered
aureliusmarcus has quit [Ping timeout: 276 seconds]
megi has joined #lima
monstr has joined #lima
kaspter has quit [Quit: kaspter]
kaspter has joined #lima
gcl has quit [Ping timeout: 276 seconds]
gcl has joined #lima
yann|work has quit [Ping timeout: 240 seconds]
gcl has quit [Ping timeout: 240 seconds]
gcl has joined #lima
ecloud_ is now known as ecloud_wfh
Barada has quit [Quit: Barada]
yann|work has joined #lima
cwabbott has quit [Quit: cwabbott]
<enunes> rellla: I'll try that, though I don't have a list of passing deqp tests handy anymore, can you share the one you've been using?
<rellla> i'm running the test with your branch and the fix now, but i also have to skip 'dEQP-GLES2.functional.texture.size.2d.2048x2048_rgba888' due to OOM crash
<enunes> oom crash with or without the MR?
<rellla> up to this test (~8100) it seems i have the same result as master
<rellla> OOM with the MR
<rellla> the others in here http://imkreisrum.de/deqp/master_8fc8e8e8_tiled/deqp-lima-skips.txt fail with master
<enunes> I'd try checking if the memory consumption is growing during execution, I saw that before running glmark2 in a loop
<enunes> just running something like htop on a separate window
<rellla> i guess the different behaviour when executing the tests singly or as a set indicates something like that ...
<enunes> so it's possible the new bo I introduced to allow larger varying buffers is still leaking
<enunes> it didn't show with piglit though, probably because of separate binaries
<rellla> enunes: be aware i enabled should_tile = true in the tests :)
<rellla> otherwise you will get new_fails :p
<enunes> as long as it doesn't crash and abort the run with master I guess it's ok
maccraft123 has joined #lima
cwabbott has joined #lima
maccraft123 has quit [Client Quit]
maccraft123 has joined #lima
maccraft123 has quit [Client Quit]
maccraft has joined #lima
warpme_ has joined #lima
maccraft has quit [Quit: WeeChat 2.6]
maccraft has joined #lima
maccraft has quit [Client Quit]
megi has quit [Quit: WeeChat 2.6]
maccraft has joined #lima
maccraft has quit [Client Quit]
maccraft has joined #lima
kaspter has quit [Quit: kaspter]
maccraft has quit [Quit: WeeChat 2.6]
dddddd has joined #lima
gcl has quit [Ping timeout: 265 seconds]
gcl has joined #lima
maccraft has joined #lima
gcl has quit [Ping timeout: 240 seconds]
monstr has quit [Remote host closed the connection]
<anarsoul> rellla: enunes: sounds like we still have an issue with BO cache and/or BO management?
<anarsoul> BO cache relies on a working lima_bo_wait()
<enunes> anarsoul: I think it's refcount of the BO I allocate now in lima_draw_vbo
<enunes> I think it should deallocate automatically with the deref after submitting to the kernel
<anarsoul> but what will happen if BO is added to both submits (GP and PP)?
<enunes> but it doesn't
<enunes> (I think)
<anarsoul> enunes: refcnt is re-initialized in BO-cache later
<anarsoul> I think we're just reusing BOs too early
<anarsoul> varyings BO is added to both submits
<enunes> unfortunately I have a long trip today and will be away for 2 weeks
<anarsoul> but if lima_bo_wait() signals completion when only GP is done then we're screwed
<enunes> I do have remote access but not sure about time, maybe I can take a look while waiting today
<anarsoul> have a safe trip
<enunes> I think if I added the BOs to a list and deref them at the end, there would be no leak, I tried this once
<enunes> but I think I shouldn't need that with the automatic deref on what is on the submits
<anarsoul> I'll try to check the lima_bo_wait() logic this weekend, but I have a strong suspicion that it's broken :(
gcl has joined #lima
<anarsoul> hm
<anarsoul> I think lima_bo_wait() should work fine
gcl has quit [Ping timeout: 265 seconds]
<anarsoul> probably we're not adding a BO to submit somewhere?
<anarsoul> enunes: rellla: ^^
<anarsoul> rellla: can you try with BO cache disabled?
maccraft has quit [Ping timeout: 245 seconds]
BenG83 has joined #lima
yann|work has quit [Ping timeout: 265 seconds]
gcl has joined #lima
maccraft123 has joined #lima
<anarsoul> rellla: also I guess we can now use your new shiny cmd stream parser to see what's going on :)
dllud has joined #lima
yann|work has joined #lima
dllud has quit [Remote host closed the connection]
piggz has joined #lima
dllud has joined #lima
BenG83 has quit [Remote host closed the connection]
piggz has quit [Ping timeout: 246 seconds]
piggz has joined #lima
<enunes> anarsoul: I think it will also leak without BO cache
<anarsoul> I don't think it's a leak
<enunes> anarsoul: well I think there is clearly some form of leak, if you run glmark2-es2-drm --run-forever -b refract:duration=1s and watch memory consumption it leaks something like 100MB per run
<anarsoul> hm
<anarsoul> I did something like that a while ago (when I debugged BO cache leaks) and it didn't leak for me with the fixes
<enunes> I'm just not sure if it's something wrong I'm doing with refcounting, or if it's some more general bug in allocation
<anarsoul> even if it leaks it doesn't explain why we get gp mmu fault
<enunes> I don't get gp mmu fault
<anarsoul> rellla does with deqp
<enunes> yeah that sounds more serious and I don't think it's explained by my MR
<anarsoul> that's what I was talking about -- looks like we're reusing BO too early
<enunes> sounds like the possible race condition in deref right after submit that we talked about some time ago
<anarsoul> and that only can happen if lima_bo_wait() said that BO is not busy
<anarsoul> enunes: I don't remember a talk about this race
<anarsoul> any links?
<enunes> maybe we were talking about two different things
<enunes> but it's still not clear to me
<anarsoul> enunes: I believe BOs are also referenced in kernel
<enunes> yeah that would make sense
<enunes> so if lima_bo_wait is wrong, it is a kernel bug?
<enunes> I guess I don't even have to deref myself to ensure the refcount even if I add to the two submits, lima_submit_start should just take care of it
<enunes> so, I don't know why it leaks
<anarsoul> I think there can be an issue with a BO that's added to multiple submits
<anarsoul> i.e. varying BO
<anarsoul> let's say we added it to GP submit that writes into it and then into PP submit that reads
<anarsoul> the question is: would completion of the GP submit signal that the BO is ready, and what would lima_bo_wait() return in this case?
<enunes> we had that before, just it was on the ctx buff
<anarsoul> yes, but ctx buff is 1MB and it doesn't get reused too often
<anarsoul> yet I've seen GP MMU faults when running q3a for 20-120mins
<anarsoul> I guess I have to ask Qiang, I don't really understand how it works in kernel
<enunes> one thing that I kept from the previous implementation that might impact this is the LIMA_SUBMIT_BO_READ parameter
<anarsoul> so varying BO has to be added as LIMA_SUBMIT_BO_WRITE to GP submit and as LIMA_SUBMIT_BO_READ to PP submit
<enunes> yeah that makes sense to me, though I just kept the previous behaviour of LIMA_SUBMIT_BO_READ to both
<anarsoul> nah, that won't work
<anarsoul> you need exclusive fence for it
<enunes> lima_ctx_buff_va
<enunes> let me try with WRITE
<enunes> ok that seems a bit better, but I still see leaks
<enunes> maybe some other place needs the same
<enunes> (even without bo cache so not cached BOs)
<anarsoul> likely ctx buffer if there's anything that writes into it
<enunes> yeah maybe need to double check all places using lima_ctx_buff_va
maccraft123 has quit [Quit: WeeChat 2.6]
maccraft123 has joined #lima
maccraft123 has quit [Ping timeout: 240 seconds]
gcl has quit [Ping timeout: 240 seconds]
gcl has joined #lima