ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
<MoeIcenowy> anarsoul: looks like when the PP MMU fault is triggered, the fragment shader has an if control flow
<anarsoul> what's the shader?
<MoeIcenowy> ?
<anarsoul> what's shader source?
<MoeIcenowy> I think you can see it in qapitrace
<MoeIcenowy> because I pointed out which call is failing
<MoeIcenowy> BTW the comments mentioned a file name "st-scroll-view-fade.glsl"
<anarsoul> MoeIcenowy: standalone compiler fails on this shader in NIR validation
<anarsoul> anyway, there's nothing special in it, just some ifs
UnivrslSuprBox has quit [Remote host closed the connection]
UnivrslSuprBox has joined #lima
<MoeIcenowy> anarsoul: why can it fail?!
<anarsoul> MoeIcenowy: standalone compiler? probably someone broke something for standalone
<anarsoul> it fails before it even reaches lima-specific parts
mardikene193 has joined #lima
<anarsoul> that's unfortunate but I don't have time to fix it atm
<mardikene193> hehee joss193s sentences are still ignored by the logger.
<mardikene193> yes, those guys who made it a reality actually did not make overly much sense back times too
<mardikene193> even though i struggled in greater detail back then.
<mardikene193> the quality of a professional read must be according to the fact which we can not deny, anything legal and doing that in more expert kind of way becomes with a real effort involved
<mardikene193> just like high level sport in a good level goes already a bit more complex so does programming
<mardikene193> you can mostly only achieve results if you make some effort to test your resilience
yuq825 has joined #lima
<mardikene193> and the fact that you can not tolerate critics while flooding shit onto another personality demonstrates enough of the fact that you lack this kind of ability to resist to tension or pressure.
kaspter has quit [Ping timeout: 276 seconds]
<mardikene193> all that begun with you dirtying my reputation when i started testing you and put you into trouble to doubt in your claims of SHIT!
<mardikene193> in every logical way the insitution people have lost every conversation/debate against me, but those morans thought relying on foreign people we can scam forward, until i started to clearly own foreign ones too.
<mardikene193> it ever grows in complexity when someone tells you gotta win the game , even though appearingly some do that with ease this all requires an effort, which scammers are unable to materialize hence they are known scammers.
<mardikene193> for instance when i realize i am dealing with a scammer of such degree who does not even have potential to make any sense
<mardikene193> I start to temporarily show off the heavy scoring i am used to with, when my life and welfare is put on the line
kaspter has joined #lima
jrmuizel has quit [Remote host closed the connection]
dddddd has quit [Remote host closed the connection]
camus has joined #lima
kaspter has quit [Ping timeout: 240 seconds]
camus is now known as kaspter
kaspter has quit [Ping timeout: 245 seconds]
yuq825 has quit [Quit: Leaving.]
<mardikene193> most the problems back then were none knew what i was sentenced for, the major outsiders had complaints about me, which did not seem legal to the police either. and violations were mostly outstanding and primarly with directions towards me not the other way around
<mardikene193> when you say you are scared of me, there is a reason originating from your own bad behavior more likely other then my behavior, which have to me justified, i never attended in criminal or nonjustified activity at innocent people
jernej has joined #lima
jernej has quit [Client Quit]
<mardikene193> when you come to state crap because of some betrayding total fucktard outsider hailing from estonia , none of the real personalities are happy about it in our country, you get into trouble for doing unnatural things like this.
<bshah> btw : re: i am not quite sure if it is same thing or not : https://gitlab.freedesktop.org/lima/mesa/issues/122 but, QML/Qt have similar-ish font rendering bug, which seems to be worked around by QT_ENABLE_GLYPH_CACHE_WORKAROUND ... potentially useless for the specific bug though I guess
<bshah> what does mythtv use?
<mardikene193> I was at getting the names who scam acting as columbian gangsters and doing harm to me, do you think it is my dream job to deal with, you see some nutters organised several assaults and attempts to me, i am no more fine that outsiders do that, i would like to get those violator names and my crew including me will treat them "fine"
<mardikene193> It is not easy for me to deal with such subjects and also try to make sense performing studies every day.
<mardikene193> When you try to give decent effort some other area suffers too, i have sacrificed hence also quite a lot.
<anarsoul> IIRC they don't use Qt
<bshah> okay
<mardikene193> I have read miaow now for say 4-5 years i think actively, and as i am on a set of meds, it seems i still do occasional mistakes, for instance yesterday i revealed one of them.
<mardikene193> i still did not look to closely about the mechanism of the even and odd path, only partial evaluation of valid_entry.v was done, and this is classics how major screws happen
<mardikene193> could be more convenient at times to present my results of research if all wasn't discouraging and envying me in a style go fuckyourself and stop spamming
<mardikene193> so teeth crossed always suffering to bring out the last details of the code while staying days and days at home, and i want to see an outcome in form of change to your thinking the least
_whitelogger has joined #lima
<mardikene193> all those paths have to be discovered successively when we talk about professional programming, you can not turn back on hardware you see, as sort of berkleys people even have dlx chip simulator , schematic simulator shown what states graphically the signals are at
<mardikene193> when you are going to compete or try conversating with such people you gotta know stuff too.
Barada has joined #lima
jernej has joined #lima
yuq825 has joined #lima
_whitelogger has joined #lima
kaspter has joined #lima
<mardikene193> the speedup relies on something that is several times described in forums by different people, fallback memory on biggest gear that of dedicated GDDR5 at its very best gives you 300GB/S bandwidth while 6T register read or reg array parallel read 8TB/s roughly on GCN kind of chips
<mardikene193> but of course it's not only this, however instruction fetch can be fallback memory bound
<mardikene193> Cause GCN generations are already in quite small process with faster transistor conduction gate delays switch on/off speeds register reads are in fraction of femtoseconds
<mardikene193> and on VLIW chips there also is a replay mechanism like on SIMD ones, but that happens between different bundles not within a bundle
<mardikene193> so yeah on terascale when you open the wikipedia page similar to mali it has 16 bundles per CORE they have so called simd arbiter
<mardikene193> the difference is in vector width while on GCN it is 16lanes also on MALI this is smaller like 4
<mardikene193> but i was going to talk a little bit about verilog, although the procedures are repeating albeit in the beginning showing up as totally unique, there are some requirements, namely you need to study cont. assignment and sensitivity list specs, rather than full event loop where the last is also somewhat recommended
<mardikene193> those two mentioned are very commonly and often used, cont assignment works so that RHS can not evaluate to the same value as before
<mardikene193> any one of the LHS -- left hand side of the assignment needs to change, only one for it to evaluate right hand side to different value
<mardikene193> until the state is not met when nothing changes wire is continuously assigned, it is a simple trick how to read verilog procedures that use it properly
<mardikene193> cause of maybe you were unable to trust me as well as read those files in miaow, i described in words just in case what is happening on the chip itself.
<mardikene193> one of the main dilemmas i can see you still have maybe is , you see simd arbiter can arbitrate based of the scoreboard entries, even though a cuda program can give you feedback as basereg wfids are 0 1 2 3 4 etc.
<mardikene193> inside the chip it is handled that wfids can go also 0 to 7 if 1-6 ones skipped that is extreme example can happen but....
<mardikene193> that 7 is given a warp registers of the 1st warp and feedback according to that, but it's queue entry is in column 7 instead of 1
<mardikene193> this is primarly designed of course with a thought that in this way, you can seek the whole queue faster
<mardikene193> so when i understood bcos arguments on #osdev , he is some aussie os hacker what i have gathered
<mardikene193> then they do not apply and it is also documented, you can emulate branches very fast instead of slow
<mardikene193> it is some type of PISO and SIPO methods in the scoreboard and queues
<mardikene193> every entry of the wfid line the maximum on SIMD which is 40 on GCN are taken in parallel
<mardikene193> actually that is not true when there are very many async ones hw caps it to 16
<mardikene193> the main point is that such way hw has fast capability to skip 40waves inside a row
<mardikene193> the underlying design in miaow is entirely correct once again in that department
<mardikene193> i do not know if zwabbit is just an incredible genius to have been mainly programming this great stuff on his own when normally this is done with teams
<mardikene193> or did he use some hw generator to get closer there
<mardikene193> in both cases i believe both me and also AMD recognize this guy as a real programmer.
<mardikene193> yeah yeah we have some like this in scandinavia but i have not at least yet managed to meet them, some in estonia probably too, even falanx was norwegian one, miaow is also absolutely cracking code
mardikene193 has quit [Quit: Leaving]
jernej has quit [Ping timeout: 245 seconds]
Barada has quit [Quit: Barada]
jernej has joined #lima
enunes has joined #lima
<cwabbott> rellla: it's probably different because shader_runner creates a rgba8 surface, which means that each pixel consists of 4 8-bit channels where 0 means "0.0" and 255 means "1.0"
<cwabbott> the blending unit rounds the value you send with gl_FragColor, so the value you see when reading it back will only have 8 bits of precision
<cwabbott> also, the PP uses half-floats which only have 11 bits of precision, which isn't much more
<cwabbott> if I had to guess, it's failing because half floats aren't accurate enough, and implementing distance(a, b) on a scalar as sqrt((a - b)^2) causes extra rounding errors which make the computed difference happen to be closer to 0 when it shouldn't be
<rellla> cwabbott: thanks for the explanation, though i don't understand it all :) i think i have to get the basics about half float and precision first.
<rellla> so the read back value then is also 4 x 8bit?
<cwabbott> it's the 8-bit value divided by 255, so 0 is shown as 0 and 255 as 1.0
<rellla> i understand your last post about the additional rounding errors, but in the posted example the sqrt path is not used, but fabs(fadd(a, -b))
<cwabbott> writing gl_FragColor does the inverse thing, so 1.0 is rounded to 255
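The rgba8 round trip cwabbott describes can be sketched in a few lines of Python. This is a minimal illustration, not the Mali blending unit's actual behaviour: the function names are hypothetical, and round-to-nearest is an assumption (the log only says the value is "rounded").

```python
def encode_unorm8(value: float) -> int:
    """Clamp to [0, 1] and round to the nearest of 256 steps.

    Assumed model of what writing gl_FragColor to an rgba8 target does.
    """
    clamped = min(max(value, 0.0), 1.0)
    return int(clamped * 255.0 + 0.5)


def decode_unorm8(byte: int) -> float:
    """Readback: the stored 8-bit value divided by 255."""
    return byte / 255.0


# Writing 0.5 and reading it back does not return exactly 0.5,
# because 0.5 is not a multiple of 1/255.
roundtrip = decode_unorm8(encode_unorm8(0.5))
```

Under this model any value read back from the surface can differ from the value the shader wrote by up to half of 1/255, regardless of how accurate the shader arithmetic was.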
<cwabbott> I meant to explain why it would pass if you use sqrt
<cwabbott> using abs(a - b) only has one rounding step (the subtraction) but sqrt((a - b)^2) has three (subtract, then square, then sqrt)
<cwabbott> so the second will always be less accurate (in addition to being slower!)
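The one-rounding versus three-roundings point can be tried out on a CPU with Python's struct module, which can pack IEEE 754 half floats. This is only a sketch under assumptions: it rounds after every operation with round-to-nearest and keeps subnormals, which may not match how the Mali PP actually rounds, and the helper names are made up for illustration.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def distance_abs(a: float, b: float) -> float:
    """abs(a - b): one rounding step (the subtraction)."""
    return abs(to_half(to_half(a) - to_half(b)))


def distance_sqrt(a: float, b: float) -> float:
    """sqrt((a - b)^2): three rounding steps (subtract, square, sqrt)."""
    diff = to_half(to_half(a) - to_half(b))
    squared = to_half(diff * diff)
    return to_half(math.sqrt(squared))


# A small difference survives abs(), but its square is too small for a
# half float, so the sqrt form collapses it to zero.
small = 2.0 ** -13
via_abs = distance_abs(small, 0.0)
via_sqrt = distance_sqrt(small, 0.0)
```

Here via_abs keeps the difference while via_sqrt comes out as 0.0, which is one way the sqrt form can end up "closer to 0 when it shouldn't be" and let a comparison against a tolerance pass by accident.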
<rellla> ah ok. so i should try to lower that abs to sqrt also?
<cwabbott> no, it's just not something you can solve
<cwabbott> the test is written expecting that the GPU is using regular 32-bit floats
<rellla> so it's just that piglit doesn't respect half floats in this case or in general at all.
<cwabbott> it's that exposing classic OpenGL on mali-400 at all is a hack
<cwabbott> since desktop GL requires that you use normal 32-bit floats
<rellla> so then should i lower the abs modifier at all? is this an accuracy issue in the (now) succeeding test as well or is it an issue, that some ops can't deal with abs() sources?
<cwabbott> piglit is certainly within its rights to check that something exposing desktop GL calculates its results accurately enough
<cwabbott> and we just can't guarantee that
<cwabbott> no, it's not that it can't deal with abs sources
<cwabbott> replacing abs with sqrt just makes it calculate the difference incorrectly
<cwabbott> well, *more* incorrectly
<cwabbott> which in this case happens to mean that the calculated difference is smaller, and it passes
<cwabbott> I would check that increasing the tolerance makes it pass, and if that's the case, there's not much we can do
joss193 has joined #lima
joss193 is now known as mardikene193
<mardikene193> I basically need to do the implementation, i could do some more simulations like interactive tutorials how modern hardware works, due to time deficiency i find better use in doing all the bits of code in driver by my own.
maciejjo has quit [Remote host closed the connection]
<mardikene193> But when you listened to me carefully than there should not be anything so important missing anymore. I covered several time the most important paths.
<mardikene193> *times
<mardikene193> Some of the GL specific or spec specific things are that likely on SM4.0 and forward the instruction limits grew limitless or limited to available registers and hence vastly beyond the queue limits
<rellla> cwabbott: ok, i will "tune" the tests with bigger tolerance and see if they pass. if so, we would have to accept, that not every line gets a green pass...
<mardikene193> plus the register reuse code that never was implemented extensively is not allowed, hence on 4chains deep indirections they use address register to avoid replay of earlier instructions.
<rellla> though i'm wondering, why the blob does it that way ...
<cwabbott> probably laziness, and/or "why would anyone use distance() on scalars?"
<cwabbott> or they were trying to hew closer to the spec, which doesn't mention that optimization in its description of length()
<mardikene193> https://inf.ethz.ch/personal/markusp/teaching/252-2600-ETH-fall11/slides/16-Humair.pdf about the flame project which mathematically displays my ideas which hence were not as innovative as you may think, i have not extensively looked into that lib, although i have fair bit of expectations how they do it
<mardikene193> they still can run that stuff on single-gpu too, but i still confirm those guys understand things absolutely fine and correctly.
<mardikene193> because it is one to one consistent as to how i understand!
<cwabbott> either way, length() and distance() are lowered fairly early in mesa, and it's not worth changing just for one driver
<mardikene193> they concentrate on expressing it via mathematical formulas and so called supermatrix algorithms-by-blocks methods, which are absolutely correct in this link, i can tell fine that.
<rellla> cwabbott: i did not mean distance, i mean, why does the blob lower abs to sqrt(mul(x,x)) when it can use the modifier, use the abs() op or do just max(x,-x)?
<cwabbott> it doesn't
<cwabbott> it lowers "float x; ... = length(x);" to "... = sqrt(x *x);"
<cwabbott> whereas mesa lowers it to "... = abs(x);"
<cwabbott> the original shader uses distance() which is defined in terms of length()
<rellla> ah, then i misinterpreted it.
<rellla> but then the blob does it more expensive than needed.
<cwabbott> yes, it does
megi has joined #lima
<mardikene193> a lot can be done in highly better way, in most part i see integrated GPUs needing arithmetic on register values generated by compilers as compressed checksum boolean toggles as indexes of certain arithmetic instructions. wow that became long.
<mardikene193> it's because constant fetches sometimes can be too expensive. But maybe you already pack or compress stuff, and have engines to hide their latency by seeking them ahead or tiling them properly before the actual use.
<mardikene193> remember the two's complement uniform distributed indexes on 32bit or 64bit variable toggling booleans of the index with very fast two gate delays two's complement bitflips
<mardikene193> this is a silver bullet to store stuff, since to unpack the stream or values on whole the address space is very cheap on twos complement.
<mardikene193> divide and reciprocals is something that a skilled programmer would not do most the time, this can be done minorly better i assume.
<mardikene193> *those
<mardikene193> but on correct pipelining i.e the queues in hw, even if there are 1/3 instructions divide it will still run flawlessly in good performance
yuq825 has quit [Quit: Leaving.]
<mardikene193> compounded operations like repeated operations can be done in parallel only to some degree, never entirely parallel but the more parallel the more resources wasted too on the die
<mardikene193> all the cost of more used arithmetics is in DLX base arch cost and delay evaluation book done by some germans, it squeezes a lot of stuff into one book, which is good, otherwise all this is known on the web extensively from other sources too.
<mardikene193> some refuse to read them thinking about it being nuclear science , but really just remember the elementary school material a bit or memorise and this is a must know and comes easily when you went to school somewhere wherever.
<mardikene193> world works in a system of balance to some degree, if there are creative guys like me, who sometimes think brilliant have capabilities to achieve much can occasionally show good performance in sports and other events, and should have a position amongst the community, such men as seen are nudged or trimmed into a balance
<mardikene193> mostly for them it seems like the community works against them and society, all the time everything is given in favor to others and violations seems like most the time you run against the major wind
<mardikene193> this is why idealistically i like some concepts of communism, but in reality this order never materializes
raimo has joined #lima
mardikene193 has quit [Read error: Connection reset by peer]
raimo is now known as mardikene193
<mardikene193> hence life made me to hate tolerance politics what is largely in power and in predominant hotspot use at the moment!
<mardikene193> if you do not deserve it in anyway just because you are zombie or freak it's mostly not good idea to encourage such, mainly cause they do not perform sane activity.
<mardikene193> its just the tradeoff of me suffering throughout the life cause someone suffered only at birth , is not something I personally am after.
maciejjo has joined #lima
<mardikene193> they have cruel understanding that person looking a potential one does not never need have any rights and actually do not deserve the life that was given to them, it is so to speak not the handicaps but still in hidden way capped people tend to think so about superior personalities, some lost their lives because community did not think they deserved what they were given.
<mardikene193> it is enough of injustice
<mardikene193> I am and would be willing to share my assets into social programs, i am unwilling to face that every privilege about me is given to a born zombie who most of the time snatches all resources and momentum off me and violates daily basis too
<mardikene193> and communism states that in fixed manner, there is equal rights to anyone
Barada has joined #lima
<mardikene193> My dad has a sense of justice a little bit, mom does not a sense of justice comes from capabilities to think, delusion people do not understand what justice and fairness is
jernej has quit [Ping timeout: 250 seconds]
jernej has joined #lima
<mardikene193> may i ask what is your rationale of thinking about the need to sanctionize me any further, when i carry world record events and sequence of violations done to my particular stereotype or identity?
<mardikene193> I went to knee surgery which in troublesome estonian situation due to playing in hard surface i developed osgood-schlatter , which is 100percent surgery in recovery sense done most parts of the world, i knew i get hurt but chances were either the institution or taking on the surgery, shortly after i was vastly injured and saw doctors and estonians to celebrate laugh and they get their payback
<mardikene193> shortly after all turned into madness
<mardikene193> they did not think it was never enough what they achieved, and all of that went into a loop
jernej has quit [Ping timeout: 250 seconds]
<mardikene193> when tiger woods needed 15 surgeries to be in any top form , i needed also only one or couple to be in form, i knew exactly how will it turn out though in limited choices given that punishers will take charge there
<mardikene193> people who punish others for their mistakes , others noted as stereotypes who were never responsible for their lack of success
jernej has joined #lima
<mardikene193> and i still think in right direction, i encourage people and stereotypes like me to do practice sessions as early as possible, cause overall why i am still living after that much of violations is: that i eagerly take part of physical training, i am badly injured but hence only because of that still capable
<mardikene193> nothing compensates the lack of training when you get into a position where there is injury first one present
<mardikene193> and in terms of taking care of your body without any classification needed to be classified as sportsmen a stupid one like i was.
<mardikene193> this does not mean you can not program in free time or have good use of your time in science too
deesix has quit [Ping timeout: 245 seconds]
deesix has joined #lima
yuq825 has joined #lima
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
<mardikene193> and what is osgood-schlatter overall, everyone who have had it actually know what this condition is about, it's nothing dangerous, but in sports you feel your muscles in less tense situation and store less pressure in the leg overall if a minor tip of the cone inside the double knee is resected out
<mardikene193> some call those things as cysts
yuq825 has quit [Ping timeout: 245 seconds]
<mardikene193> a decent orthopedic surgeon should know that, how to enter without injuring tendons, nerves the target, and perform that type of resection with say laser
<rellla> cwabbott: setting the tolerance to tolerance*10 in piglit seems to get them to pass. tested it for fs-tan-float on my lima-next branch without any abs/distance lowering. so it definitely seems to be an accuracy issue. thanks!
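The effect of scaling the tolerance can be sketched as a plain comparison. This assumes the test boils down to checking that the distance between the shader result and the expected value is within a tolerance. The function name and all numbers below are hypothetical and chosen only to show the mechanism; they are not taken from piglit or from the fs-tan-float test.

```python
def within_tolerance(result: float, expected: float, tolerance: float) -> bool:
    """Pass when the scalar distance to the expected value is within tolerance."""
    return abs(result - expected) <= tolerance


# Hypothetical values: a result that is off by roughly the error one might
# expect from reduced-precision arithmetic.
expected = 0.5450
result = 0.5455
base_tolerance = 1e-4

strict = within_tolerance(result, expected, base_tolerance)
relaxed = within_tolerance(result, expected, base_tolerance * 10)
```

With these made-up numbers the strict check fails and the relaxed one passes, which is the pattern that points to an accuracy limit rather than a logic bug in the shader compiler.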
<mardikene193> and it was not to be any way possible to get injured like this along the way
<mardikene193> because i could perform those surgeries without any type of issues
yuq825 has joined #lima
* rellla will look into how the generated tests are generated and maybe do a piglit version with more tolerant tolerance values...
<rellla> anyway, good to know, that we can skip trying to hack sth into lima to get piglit tests passing.
<enunes> rellla: one time I made this, it is still in my stash, but I didn't conclude much from it https://paste.fedoraproject.org/paste/YmZEgr7x~TvtAD-09nWq1w/raw
<mardikene193> so having said all that, now i go, and i still have chances to participate in some events, when i get some modifications done, rely a little bit on luck on the run, i earn some money and find person who fixes some stuff for me.
<mardikene193> cause with time i have also gathered some fans and stuff.
<rellla> enunes: thanks, i will try that
mardikene193 has left #lima ["Leaving"]
dllud has quit [Ping timeout: 265 seconds]
yuq825 has quit [Quit: Leaving.]
jrmuizel has joined #lima
<rellla> enunes, anarsoul: as it's not labeled ~lima, you may have missed that https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2047
dllud has joined #lima
yuq825 has joined #lima
Barada has quit [Quit: Barada]
dddddd has joined #lima
yuq825 has quit [Quit: Leaving.]
<anarsoul> rellla: yeah, sorry
<anarsoul> rellla: you need to tick "Allow commits from members who can merge to the target branch" so I can rebase and merge it
<rellla> oh sorry, done
<cwabbott> anarsoul: btw, the way indirect uniform loads work doesn't have anything to do with registers or register latencies
<cwabbott> there are four address registers, which you can write to directly using some special complex ops
<cwabbott> there is some latency between when the address register gets written and when you can use it, iirc only for stores
<cwabbott> I mean, there's a latency between when you write the address register and when you can use it
<anarsoul> cwabbott: honestly I don't remember what was conversation about :)
drod has joined #lima
<anarsoul> these dmesg-fails look suspicious
<rellla> this is with enunes' piglit patch. increasing the tolerance makes 40 tests pass... though i'll have to prepare a reference test set still
<anarsoul> yeah, something's wrong with textures
<anarsoul> fragment shader is trivial
<anarsoul> if it fails then either texture descriptor is wrong or allocated buffer for texture is too small
<rellla> the dmesg-fails also occur with master 88b8922
<anarsoul> :(
<rellla> (left one)
<enunes> anarsoul: I did notice something weird with the texture descriptor while working on https://gitlab.freedesktop.org/lima/mesa/issues/122 but not sure if it's a bug yet
<anarsoul> wanna look into it?
<anarsoul> enunes: dump it?
<enunes> anarsoul: yeah I did of course
<enunes> basically we skip the first descriptor for some reason, and it stays with uninitialized stuff
<anarsoul> :\
<enunes> but my patch to actually use it, caused pp mmu fault
<enunes> so I'm still looking into it
<enunes> today I will have more time to get back into that
<anarsoul> I'm using this small tool to decode descriptors dumped from mali blob: https://gist.github.com/anarsoul/e30f8b1cbe9c3a2630faa7e3c821af8f
<anarsoul> enunes: btw I've prepared a pinebook with weston, glmark2 and q3a
<anarsoul> for demoing purpose on XDC :)
<enunes> ah, sounds great
<enunes> any way to set multiple monitors with it to try to actually present from it?
<anarsoul> nah, dual screen won't work. It's either HDMI or LCD. Also it's mini-hdmi
armessia has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima
drod has quit [Ping timeout: 265 seconds]
<anarsoul> cwabbott: do I understand correctly that it means if we want to use indirect uniform loads we have to load the address register 2 instructions prior to using it?
drod has joined #lima
armessia has quit [Quit: Leaving]
jrmuizel has quit [Remote host closed the connection]
drod has quit [Remote host closed the connection]
jrmuizel has joined #lima
jrmuizel has quit [Remote host closed the connection]
jrmuizel has joined #lima