ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
<suurnoku2> I have, during the past 25 years, lived a highly silent and private life, not giving too many agents real chances; when I have slipped and allowed some scammer agent around me, I have been punished hard.
<suurnoku2> even in foreign countries, some mad lady managed to make me famous, and there were several protesters all the way through to annoy my fans and the ladies who had only come to see me, and their dads, along with sportsmen, were the only potential ones to defend me, because ladies cannot defend me from cockblockers.
<suurnoku2> as for where they came from, well, one sexual intriguer involved with many quasimodos around the world pretty much, chipsies kilis no matter which ones, lots of English bullies
<suurnoku2> basically a big fucktard lady, but clueless enough in science and real events
<suurnoku2> her fucktard fans seemed to form most of those protesters.
<suurnoku2> brainless individuals without any logic involved in any of the legal events I have seen, who think with dicks and pussies only.
<suurnoku2> the outcome is often very similar if people lack hobbies and interests besides intrigues and orgies.
<suurnoku2> and the result is sexual and violative comments, present daily at my name from terroristic quasimodos, only because those retards lack other interests.
<suurnoku2> (in Estonian:) so one shitty person fucked her way from the company of various quasimodos at Estonian backstreet events into Portuguese glamour, and all my fans around the world suffered, as if they cared whom that slut has fucked, just so they would circle around me and terrorize my fans.
suurnoku2 has quit [Quit: Leaving]
megi has quit [Ping timeout: 240 seconds]
Barada has joined #lima
aureliusmarcus has joined #lima
<aureliusmarcus> the whole thing is: when the destination register is not available for a store, it is a bit harder to block the registers while reusing the stores for writebacks; the texture unit has an arbiter on loads and buffers them, but there is a method which works without buffering as well.
<aureliusmarcus> It is because the GPU hardware does not check for WAR hazards
<aureliusmarcus> the whole thing works like this: you have a selector diagonal with LSU stores and vector loads; first a procedure allows two of those loads to graduate, everything is issued, and it comes back to the diagonal, but this time around, since 38 of the registers did not graduate, only 2 of the instructions are issued
<aureliusmarcus> so if you skip most of the instructions from the diagonal, the hardware keeps the last diagonal in the in-flight buffers, so it runs the one, two, or more of those instructions that did graduate, and then gets a stall event from the dispatcher
<aureliusmarcus> which means 1111 is driven as the decode wfid on the chip
<aureliusmarcus> which is a wildcard that matches the last issued wfid from the SIMD arbiter
<aureliusmarcus> and hence you branch the queues to that particular column, selected after the last instruction scheduled on the selector diagonal
<aureliusmarcus> the logic is always the same kind: first the selector issues everything but graduates only some of it, based on the dependent load contents and toggled via the corresponding offsets
<aureliusmarcus> when it wraps around back to 0, the next time all but some are blocked, then it branches
<aureliusmarcus> issues certain instructions from the tile, frees the dependencies, and the whole thing starts again from the first diagonal
<aureliusmarcus> again it issues everything but graduates only some, then it selects the tile to run, etc.
<aureliusmarcus> this is like a pipeline of instructions and it is best run SSA-interpreted
dddddd has quit [Read error: Connection reset by peer]
<aureliusmarcus> according to the litmus test suite, as I anticipated myself long ago, repeating the very same last address on the texture unit skips the writes
<aureliusmarcus> this is how you do not allow the load to graduate
<aureliusmarcus> and as I already told you, this is not something that can be fixed in hardware, because it is inherent to how RTLs function
<aureliusmarcus> they function based on re-evaluation events, and if nothing changed from the previous time,
<aureliusmarcus> re-evaluation events are not driven, and so the data is not forwarded, since nothing changed
dllud has quit [Ping timeout: 276 seconds]
<aureliusmarcus> non-blocking assignments in the respective place on the chip are handled in the address calculator in the following way:
<aureliusmarcus> the assignments do not block, they are executed in parallel, but the commit is deferred until it is certain that all assignments got re-evaluated
<aureliusmarcus> in other words, if one did not, the chip stalls in the address calculator, and hence misses the write and leaves the scoreboard blocking
<aureliusmarcus> it is not possible to overcome this issue; even if you were to reset the state, the only possible value to reset to is 0
<aureliusmarcus> then obviously you post address zero and are still able to skip; you cannot drive random values to the address calculator for a reset
<aureliusmarcus> because that would produce a very unexpected bug if the address happened to collide
<aureliusmarcus> the meaning of all of that is: in the current hardware paradigm, this method should always work
<aureliusmarcus> you cannot raise exceptions on zero register values, because zero is a legal value in registers, and the spec has allowed this at least long enough historically that all chips would be covered
aureliusmarcus has quit [Ping timeout: 276 seconds]
megi has joined #lima
monstr has joined #lima
kaspter has quit [Quit: kaspter]
kaspter has joined #lima
gcl has quit [Ping timeout: 276 seconds]
gcl has joined #lima
yann|work has quit [Ping timeout: 240 seconds]
gcl has quit [Ping timeout: 240 seconds]
gcl has joined #lima
ecloud_ is now known as ecloud_wfh
Barada has quit [Quit: Barada]
yann|work has joined #lima
cwabbott has quit [Quit: cwabbott]
<enunes> rellla: I'll try that, though I don't have a list of passing deqp tests handy anymore, can you share the one you've been using?
<rellla> i'm running the test with your branch and the fix now, but i also have to skip 'dEQP-GLES2.functional.texture.size.2d.2048x2048_rgba888' due to OOM crash
<enunes> oom crash with or without the MR?
<rellla> up to this test (~8100) it seems i have the same result as master
<rellla> OOM with the MR
<rellla> the others in here http://imkreisrum.de/deqp/master_8fc8e8e8_tiled/deqp-lima-skips.txt fail with master
<enunes> I'd try checking if the memory consumption is growing during execution, I saw that before running glmark2 in a loop
<enunes> just running something like htop on a separate window
<rellla> i guess the different behaviour when executing the tests singly or as a set indicates something like that ...
<enunes> so it's possible the new bo I introduced to allow larger varying buffers is still leaking
<enunes> it didn't show with piglit though, probably because of separate binaries
<rellla> enunes: be aware i enabled should_tile = true in the tests :)
<rellla> otherwise you will get new_fails :p
<enunes> as long as it doesn't crash and abort the run with master I guess it's ok
maccraft123 has joined #lima
cwabbott has joined #lima
maccraft123 has quit [Client Quit]
maccraft123 has joined #lima
maccraft123 has quit [Client Quit]
maccraft has joined #lima
warpme_ has joined #lima
maccraft has quit [Quit: WeeChat 2.6]
maccraft has joined #lima
maccraft has quit [Client Quit]
megi has quit [Quit: WeeChat 2.6]
maccraft has joined #lima
maccraft has quit [Client Quit]
maccraft has joined #lima
kaspter has quit [Quit: kaspter]
maccraft has quit [Quit: WeeChat 2.6]
dddddd has joined #lima
gcl has quit [Ping timeout: 265 seconds]
gcl has joined #lima
maccraft has joined #lima
gcl has quit [Ping timeout: 240 seconds]
monstr has quit [Remote host closed the connection]
<anarsoul> rellla: enunes: sounds like we still have an issue with BO cache and/or BO management?
<anarsoul> BO cache relies on a working lima_bo_wait()
<enunes> anarsoul: I think it's refcount of the BO I allocate now in lima_draw_vbo
<enunes> I think it should deallocate automatically with the deref after submitting to the kernel
<anarsoul> but what will happen if BO is added to both submits (GP and PP)?
<enunes> but it doesn't
<enunes> (I think)
<anarsoul> enunes: refcnt is re-initialized in BO-cache later
<anarsoul> I think we're just reusing BOs too early
<anarsoul> varyings BO is added to both submits
<enunes> unfortunately I have a long trip today and will be away for 2 weeks
<anarsoul> but if lima_bo_wait() signals completion when only GP is done then we're screwed
<enunes> I do have remote access but not sure about time, maybe I can take a look while waiting today
<anarsoul> have a safe trip
<enunes> I think if I added the BOs to a list and deref them at the end, there would be no leak, I tried this once
<enunes> but I think I shouldn't need that with the automatic deref on what is on the submits
<anarsoul> I'll try to check the lima_bo_wait() logic this weekend, but I have a strong suspicion that it's broken :(
gcl has joined #lima
<anarsoul> hm
<anarsoul> I think lima_bo_wait() should work fine
gcl has quit [Ping timeout: 265 seconds]
<anarsoul> probably we're not adding a BO to submit somewhere?
<anarsoul> enunes: rellla: ^^
<anarsoul> rellla: can you try with BO cache disabled?
maccraft has quit [Ping timeout: 245 seconds]
BenG83 has joined #lima
yann|work has quit [Ping timeout: 265 seconds]
gcl has joined #lima
maccraft123 has joined #lima
<anarsoul> rellla: also I guess we can now use your new shiny cmd stream parser to see what's going on :)
dllud has joined #lima
yann|work has joined #lima
dllud has quit [Remote host closed the connection]
piggz has joined #lima
dllud has joined #lima
BenG83 has quit [Remote host closed the connection]
piggz has quit [Ping timeout: 246 seconds]
piggz has joined #lima
<enunes> anarsoul: I think it will also leak without BO cache
<anarsoul> I don't think it's a leak
<enunes> anarsoul: well I think there is clearly some form of leak, if you run glmark2-es2-drm --run-forever -b refract:duration=1s and watch memory consumption it leaks something like 100MB per run
<anarsoul> hm
<anarsoul> I did something like that a while ago (when I debugged BO cache leaks) and it didn't leak for me with the fixes
<enunes> I'm just not sure if it's something wrong I'm doing with refcounting, or if it's some more general bug in allocation
<anarsoul> even if it leaks it doesn't explain why we get gp mmu fault
<enunes> I don't get gp mmu fault
<anarsoul> rellla does with deqp
<enunes> yeah that sounds more serious and I don't think it's explained by my MR
<anarsoul> that's what I was talking about -- looks like we're reusing BO too early
<enunes> sounds like the possible race condition in deref right after submit that we talked about some time ago
<anarsoul> and that only can happen if lima_bo_wait() said that BO is not busy
<anarsoul> enunes: I don't remember a talk about this race
<anarsoul> any links?
<enunes> maybe we were talking about two different things
<enunes> but it's still not clear to me
<anarsoul> enunes: I believe BOs are also referenced in kernel
<enunes> yeah that would make sense
<enunes> so if lima_bo_wait is wrong, it is a kernel bug?
<enunes> I guess I don't even have to deref myself to ensure the refcount even if I add to the two submits, lima_submit_start should just take care of it
<enunes> so, I don't know why it leaks
<anarsoul> I think there can be an issue with a BO that's added to multiple submits
<anarsoul> i.e. varying BO
<anarsoul> let's say we added it to GP submit that writes into it and then into PP submit that reads
<anarsoul> the question is: would completion of the GP submit signal that the BO is ready, and what would lima_bo_wait() return in this case?
<enunes> we had that before, just it was on the ctx buff
<anarsoul> yes, but ctx buff is 1MB and it doesn't get reused too often
<anarsoul> yet I've seen GP MMU faults when running q3a for 20-120mins
<anarsoul> I guess I have to ask Qiang, I don't really understand how it works in kernel
<enunes> one thing that I kept from the previous implementation that might impact this is the LIMA_SUBMIT_BO_READ parameter
<anarsoul> so varying BO has to be added as LIMA_SUBMIT_BO_WRITE to GP submit and as LIMA_SUBMIT_BO_READ to PP submit
<enunes> yeah that makes sense to me, though I just kept the previous behaviour of LIMA_SUBMIT_BO_READ to both
<anarsoul> nah, that won't work
<anarsoul> you need exclusive fence for it
<enunes> lima_ctx_buff_va
<enunes> let me try with WRITE
<enunes> ok that seems a bit better, but I still see leaks
<enunes> maybe some other place needs the same
<enunes> (even without bo cache so not cached BOs)
<anarsoul> likely ctx buffer if there's anything that writes into it
<enunes> yeah maybe need to double check all places using lima_ctx_buff_va
maccraft123 has quit [Quit: WeeChat 2.6]
maccraft123 has joined #lima
maccraft123 has quit [Ping timeout: 240 seconds]
gcl has quit [Ping timeout: 240 seconds]
gcl has joined #lima