ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
<Ntemis> how do I use apitrace? any help?
<Ntemis> nvm
<Ntemis> apitrace trace --api egl /path/to/application [args...]
<anarsoul> yep
Danct12[m] has joined #lima
_whitelogger has joined #lima
kaspter has quit [Ping timeout: 265 seconds]
kaspter has joined #lima
camus has joined #lima
kaspter has quit [Read error: Connection reset by peer]
camus is now known as kaspter
camus has joined #lima
kaspter has quit [Ping timeout: 240 seconds]
camus is now known as kaspter
camus has joined #lima
kaspter has quit [Ping timeout: 240 seconds]
camus is now known as kaspter
Ntemis has quit [Read error: Connection reset by peer]
dddddd has quit [Remote host closed the connection]
camus has joined #lima
kaspter has quit [Ping timeout: 276 seconds]
camus is now known as kaspter
kaspter has quit [Ping timeout: 268 seconds]
kaspter has joined #lima
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #lima
megi has quit [Ping timeout: 265 seconds]
kaspter has quit [Ping timeout: 240 seconds]
camus has joined #lima
camus is now known as kaspter
ecloud_wfh is now known as ecloud
camus has joined #lima
kaspter has quit [Ping timeout: 240 seconds]
camus is now known as kaspter
mastermart has joined #lima
<mastermart> www3.uji.es/~figual/Tesis/tesis.pdf CUPLAPACK, CUBLAS, libflame, SuperMatrix algorithms-by-blocks, you appear to not be capable of reading their out of order execution ideas, this pdf has everything needed to generate the deps of regs to be used for arbitrary instructions
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #lima
<mastermart> first you pin the queues with their methods, then you destroy the object and start to recursively run the logic pinned into queues, depending on how many rows you select from the selector, that is how many operations will run, if you select only one and block the others, it will run one writeback instruction for instance until you unblock the other blocked rows randomly, the executed instructions will linearly grow with every block released
<mastermart> from scoreboard
UnivrslSuprBox has quit [Quit: ZNC 1.6.6+deb1ubuntu0.2 - http://znc.in]
UnivrslSuprBox has joined #lima
<mastermart> so 4x4 matrix has 4+3+3=10 instructions in schedule, when you want to use all 16 you need not square but rectangular 4x8 matrix, from 4x4 4 is diagonal and 3's are upper and lower triangulars
<mastermart> the instruction selector finds that you need to schedule 4th instruction in the matrix, this corresponds to row2 instruction1 syrk4, when 7th instruction needs to be scheduled this is tile3/row3 syrk7
<mastermart> so you can schedule separately 1 each or, several at a time, depending on which loads graduate and which do not
<mastermart> 5th instruction is tile2 inst2 in the table, then you need to schedule two instructions back to back, from the selector the 1st and the second, that will run gemm5
paulk-leonov has quit [Ping timeout: 240 seconds]
kaspter has quit [Quit: kaspter]
kaspter has joined #lima
<mastermart> so i was a little inconsistent but for performance reasons, gemm8 tile2 inst3 comes from selector 1 and 3
<mastermart> sorry gemm6, only one is issued the last in different order then back to back two of them, this is going to be slower to schedule
<mastermart> so the selector combinations are 1 alone, 1 and 2, 2 and 3 and finally 1 and 4, the final one is the only one that schedules differently than back to back and that is also consuming more power and latency
paulk-leonov has joined #lima
camus has joined #lima
kaspter has quit [Ping timeout: 264 seconds]
camus is now known as kaspter
<mastermart> it'd be too confusing, but generally you want to substitute gemm6 and gemm6 from the table with each other
<mastermart> cause this will end up being faster
<mastermart> hence the final algorithm is modified version of algorithms-by-blocks, very easy modification
<mastermart> you do not want to compute from a line 1 and 8 instructions, you want to compute 3 and 4 or 1 and 2 or 7 and 8, cause if you go 1 to 7, the arbiter does additional work in hw, it wants to reschedule 2 3 4 5 6 and the power budget grows with latency, cause those will cause additional bitwise gate delays
<mastermart> I read many of those versions, i think however not entirely sure, that i already saw this modification done somewhere
<mastermart> you'd consistently achieve 1TB/s of floating point issue bandwidth with fermi card, as outlined there, this is insane as you see, meaning insanely performant
<mastermart> On a GPU with Nvidia Cublas, the sustained performance achieved by the gemm implementation (373.4 GFLOPS) represents roughly a 40% of the peak performance of the architecture, that was for tesla
<mastermart> it is huge performance it will be doubled when the code is corrected a bit
<rellla> first version ...
megi has joined #lima
<mastermart> anyhow, i would also exchange syrk and the first gemm on the line, and things should be top notch, only for the pinned FU lines , not the procedure lines which always run
<mastermart> and change it to be dependent on the outcome, this will just make emulated writebacks from the FUs
<mastermart> https://inf.ethz.ch/personal/markusp/teaching/252-2600-ETH-fall11/slides/16-Humair.pdf I may do new slides, how that stuff works, i sent my laptops to repair
<mastermart> page 22 table is what i talked about.
<mastermart> those slides are well done, cause they are marked and big and easy to follow hence
dddddd has joined #lima
warpme_ has joined #lima
<mastermart> ouh righty i was wrong, actually the last thing i would not do, cause RAW deps would get blocked, while WAR deps would not
<mastermart> syrk4 becomes A22 - At1,2 A2,3 and is export instruction, this commits the stuff to the readback procedure, and gemms on that line will write both to A2,3
<mastermart> because on that block if it is a fu line, only one can commit at time
<mastermart> whole shader needs to be in SSA, yes in hw too, cause that will end up being easier, once the gemm commits, the export will unstall and commit the data, where it checks which line committed it, does calculations and runs the next one, cause in ssa all reads of previous writes come in sequence
BenG83 has quit [Ping timeout: 252 seconds]
BenG83 has joined #lima
BenG83 has quit [Ping timeout: 240 seconds]
<MoeIcenowy> anarsoul: tried retroarch xmb menu on lima on PineTab
<MoeIcenowy> works here on 64-bit system
<MoeIcenowy> but it looks like the performance is bad (it gets better when "Icon Shadows" are disabled)
<MoeIcenowy> anarsoul: BTW trying to set "Menu Shader Pipeline" to "Simple Snow" triggers a certain pp error
<MoeIcenowy> and it looks that it hangs the GPU
<MoeIcenowy> oh this problem happens only on KMS...
kaspter has quit [Quit: kaspter]
kaspter has joined #lima
mastermart has quit [Ping timeout: 240 seconds]
mastermart has joined #lima
<mastermart> plaes: snitch/agent/betraydor fuck off and do it fast! If you don't stop stalking me and my family we send you to hospital with head concussion .
<mastermart> My friend who is a patriot of our country, has a whole book about people like plaes and the earlier who made the run up, they hardly respond to anything face to face, their job is to spread lies in foreign countries and to some in our country, last is short, cause this is small country, but foreign countries provide possibilities to scam forward.
<mastermart> i had several girls doing it in portugal and other countries, so i made it to worlds notorious one.
<mastermart> such are called agents who you meet several times later in life, but they do not talk with you, cause otherwise they would be exposed, they choose their victims
<mastermart> who are dumb enough to believe what they say.
<mastermart> We have infrastructure to deal with snitches and agents whose job is to betray estonian citizens without any legal reasons, and this structure is called Eesti Kaitsepolitsei, in short KAPO.
<mastermart> 100years old authority during 2 republics
<mastermart> Agent life is such, that as said, one chooses his victims from groups of people, and the real victim is not being notified who one betraydes, and another thing is one does not do public writings
<mastermart> because then one is exposed to, he does behind the back conspiracy among selected audiences.
<mastermart> IN such way my grandfathers uncle was killed, and my own granny saw right away knowing the history, it may happen to me too, so i have had more then 100s of snitches in various countries not only in estonia, but regularly those are indeed estonians.
<mastermart> back times during the first estonian republic, agents were trained to do that, they had special preparation in special schools to betray their nations citizens, obviously i know that cause my grandfather was a criminal investigator
<mastermart> who investigated all that stuff, "jesuiitide kool" (Jesuit school) in Estonian.
<mastermart> so from us at least i do not care about the diagnosis, i am paranoid for a reason, and my family is too, cause i carry too much genetics with my relatives who were betrayded by other estonians and who were killed.
<mastermart> my dad is rich and moral, but those ones who went against me, the betrayders, are even richer but amoral.
<mastermart> I can't say anymore do not go to portugal telling fairytales about me, not a single letter i got response from, they have betrayal in their veins and nerves, father was a rich soviet communist, it was all done 10years in estonia, couple years in england, many years in portugal, where i did not act the same
<mastermart> i contacted her directly before going to talk behind the back
mastermart has quit [Read error: Connection reset by peer]
enunes has quit [Read error: Connection reset by peer]
enunes has joined #lima
kaspter has quit [Ping timeout: 268 seconds]
kaspter has joined #lima
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #lima
<anarsoul> rellla: nice work!
adjtm has quit [Ping timeout: 252 seconds]
enunes has quit [Ping timeout: 268 seconds]
armessia has joined #lima
enunes has joined #lima
adjtm has joined #lima
armessia has quit [Remote host closed the connection]
armessia50 has joined #lima
armessia50 has quit [Remote host closed the connection]
armessia has joined #lima
<armessia> anarsoul: hi
<anarsoul> hi
<armessia> anarsoul: regarding the disabled ci for lima and the cubemaps branch
<armessia> I rebased because of a recent change in the lima fail list
<anarsoul> oh, right
<armessia> Wanted to get everything nicely passing but now I'm a bit blocked I guess
<armessia> Until the hardware gets fixed which runs the lima tests
<armessia> Any idea when this would be fixed?
<anarsoul> armessia: you can just re-enable it temporarily in your branch, get pipeline passing, disable it back
<armessia> anarsoul: ok, will try that
<armessia> anarsoul: will be for tomorrow though
<armessia> Before the rebase the pipeline was passing, but still need to look into what is fixed by which commit
<armessia> But that will be for tomorrow :-)
<anarsoul> :)
<armessia> anarsoul: another question, can I add your reviewed by in the commits which were already reviewed? Or will gitlab do that automatically?
<anarsoul> you have to add it
<armessia> Ok, good to know
<anarsoul> 'git rebase -i' is your friend :)
<armessia> Yup :-)
armessia has quit [Remote host closed the connection]
<anarsoul> enunes: we definitely have some issue with BO refcounting
<enunes> anarsoul: so far all I have discovered is that if I disable bo cache, it doesn't leak
<anarsoul> interesting
<anarsoul> enunes: also looks like we're freeing ctx BO somewhere where we should not
<anarsoul> I'm getting occasional gpmmu read failures when running q3a in loop
<enunes> it seems that we do the last unref right after submitting to the kernel, I don't understand enough about it to know if it will be held by something else
<enunes> until the job finishes
<anarsoul> ouch
<anarsoul> I see at least one issue
<anarsoul> well, it's probably not it
<anarsoul> enunes: can you check if lima_bo_cache_get bails out due to lima_bo_wait() returning false?
<enunes> let me try
<anarsoul> also BO cache shouldn't leak more than 5s of BOs, however that can be a lot
<anarsoul> but I'm not sure why it leaks in the first place
<anarsoul> enunes: you also need to add support for indexed draws for your MR
<enunes> anarsoul: possibly, I didn't test that yet, I guess I need to write an app for it
mrueg has quit [Quit: http://quassel-irc.org - Chat comfortably. Anywhere.]
<enunes> also quads, I wonder if we even get quads
<enunes> though indexed draws should use the same indices that I'm modifying, so I expected to just use a test app to ensure it works
<anarsoul> glmark2's ideas scene seems to use glDrawElements() but it's unlikely that it has enough vertices to trigger the issue
mrueg has joined #lima
<enunes> I would just use something like gbm-surface and fill the screen with 100000 small triangles or something like that
<enunes> but yeah I need to do that before removing wip
<enunes> anarsoul: I think it's not bailing out on lima_bo_wait at lima_bo_cache_get
<anarsoul> so it actually gets BO from cache?
<enunes> yes
<enunes> I'm testing this with glmark2-es2-drm -b bump:bump-render=high-poly which allocates quite large buffers
<enunes> in less than 10 seconds it runs out of 2gb of memory
<anarsoul> ouch
<anarsoul> how much does it allocate per frame?
<anarsoul> so I think I understand where issue may be coming from
<anarsoul> let's say we have 2 BOs, one is 4kb, another is 8kb
<anarsoul> let's assume both go into the same bucket
<anarsoul> we allocate them in this order: 4kb, 8kb
<anarsoul> but free them in opposite order
<anarsoul> i.e. 8kb, 4kb
<enunes> it allocates 2 * 1048512 and 1 * 206976 per frame
<anarsoul> for next frame we allocate again 4kb, 8kb
<anarsoul> bo_cache grabs 8kb for 4kb allocation since it's big enough
<anarsoul> but when it tries to grab bo for 8kb it gets nothing since 4kb is not big enough and it allocates new bo
<anarsoul> so we end up with 2 8kb BOs that are live and 1 4kb BO in cache
<enunes> still it should just allocate one more 8kb and end up with 3 live 8kb buffers
<enunes> well 2 in that case
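The 4kb/8kb scenario above can be replayed with a toy model. This is only a sketch under stated assumptions: a single bucket, first-fit lookup in free order, and no timeout-based eviction. The names `toy_bucket`, `toy_bo_alloc`, `toy_bo_free` and `toy_simulate_frames` are hypothetical and are not the lima driver's code.

```c
#include <stddef.h>

#define TOY_CACHE_SLOTS 16

/* Hypothetical toy model: one cache bucket that remembers the sizes of
 * freed BOs in the order they were freed. Not the real lima BO cache. */
struct toy_bucket {
    size_t sizes[TOY_CACHE_SLOTS];
    int count;
};

/* Put a freed BO at the tail of the bucket (dropped if the toy bucket is full). */
static void toy_bo_free(struct toy_bucket *b, size_t size)
{
    if (b->count < TOY_CACHE_SLOTS)
        b->sizes[b->count++] = size;
}

/* First fit: hand out the first cached BO that is big enough.
 * If none fits, count a fresh allocation of exactly the requested size.
 * Returns the real size of the BO that was handed out. */
static size_t toy_bo_alloc(struct toy_bucket *b, size_t size, int *fresh)
{
    for (int i = 0; i < b->count; i++) {
        if (b->sizes[i] >= size) {
            size_t got = b->sizes[i];
            for (int j = i; j < b->count - 1; j++)
                b->sizes[j] = b->sizes[j + 1];
            b->count--;
            return got;
        }
    }
    (*fresh)++;
    return size;
}

/* Each frame allocates 4kb then 8kb and frees them in the opposite order.
 * Returns how many fresh (non-cached) allocations were needed in total. */
int toy_simulate_frames(int frames)
{
    struct toy_bucket b = { .count = 0 };
    int fresh = 0;

    for (int f = 0; f < frames; f++) {
        size_t small = toy_bo_alloc(&b, 4096, &fresh);
        size_t large = toy_bo_alloc(&b, 8192, &fresh);
        toy_bo_free(&b, large); /* the 8kb request is freed first */
        toy_bo_free(&b, small); /* then the 4kb request */
    }
    return fresh;
}
```

In this model the second frame needs one extra 8kb allocation because the 4kb request takes the cached 8kb BO, and after that the count stops growing. That matches the point that this ordering alone costs one extra buffer rather than an unbounded leak.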
<anarsoul> yeah, but what if it's something bigger like 1mb, 1.5mb, 1.9mb?
<anarsoul> 206kb and 1mb should go into different buckets though
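Whether two sizes share a bucket depends on how the bucket index is computed. A minimal sketch, assuming power-of-two buckets indexed by floor(log2(size)) and clamped to an assumed 4KB..4MB range; the bounds and the names `TOY_MIN_BUCKET`, `TOY_MAX_BUCKET` and `toy_bucket_index` are illustrative assumptions, not taken from this discussion.

```c
#include <stddef.h>

/* Assumed bucket range for illustration only: 2^12 (4KB) .. 2^22 (4MB). */
#define TOY_MIN_BUCKET 12
#define TOY_MAX_BUCKET 22

/* floor(log2(v)) for v > 0; returns 0 for v == 0. */
static unsigned toy_log2_floor(size_t v)
{
    unsigned r = 0;
    while (v >>= 1)
        r++;
    return r;
}

/* Map a BO size to a bucket index, clamping to the assumed range. */
unsigned toy_bucket_index(size_t size)
{
    unsigned idx = toy_log2_floor(size);

    if (idx < TOY_MIN_BUCKET)
        idx = TOY_MIN_BUCKET;
    if (idx > TOY_MAX_BUCKET)
        idx = TOY_MAX_BUCKET;
    return idx - TOY_MIN_BUCKET;
}
```

Under these assumptions the per-frame sizes 1048512 and 206976 map to different buckets, while 1mb, 1.5mb and 1.9mb all share one bucket, which is the case where a larger cached BO can be handed out for a smaller request.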