#lima on 2019-07-11 — irc logs at freenode.irclog.whitequark.org

2019-07-03 10:24 ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!

00:18 jrmuizel has quit [Remote host closed the connection]

00:22 deesix has quit [Ping timeout: 245 seconds]

00:23 dddddd has quit [Ping timeout: 268 seconds]

00:24 deesix has joined #lima

00:33 jrmuizel has joined #lima

00:36 dddddd has joined #lima

01:04 mardinator_ has joined #lima

02:38 jrmuizel has quit [Remote host closed the connection]

03:32 _whitelogger has joined #lima

05:07 dddddd has quit [Remote host closed the connection]

05:13 Barada has joined #lima

06:26 guillaume_g has joined #lima

07:20 jonkerj has quit [Ping timeout: 248 seconds]

07:25 jonkerj has joined #lima

07:46 ecloud is now known as ecloud_wfh

08:07 mardinator_ has quit [Ping timeout: 252 seconds]

08:07 mardinator_ has joined #lima

08:29 <rellla> anarsoul: https://pastebin.com/raw/9GMMXXuP

08:31 <rellla> i did a manual disassembly of the binaries and compared both, lima and offline compiler. i didn't find any unknown bits set. the only difference which *could* have an effect is that the offline compiler combines the dFdx and dFdy instructions...

08:31 <rellla> though i only checked the control words for now...

08:33 <rellla> but cwabbott said, that 2 instructions should not be the problem...

09:01 <enunes> rellla: again I don't know if it makes sense... but can you do only one of fddx or fddy at a time rather than both, to make it simpler?

09:53 Elpaulo has joined #lima

10:03 <rellla> uahah

10:05 <rellla> passing 4|6 now

10:08 <rellla> To get dFdx(-x, x) as described in "Note: dFdx(x) is actually implemented as dFdx(-x, x) (same for dFdy)" we need to !negate alu->src[1] and not alu->src[0]

10:15 <rellla> argument 0 here https://gitlab.freedesktop.org/rellla/mesa/blob/ppir_fddxy/src/gallium/drivers/lima/ir/pp/codegen.h#L196

10:16 <rellla> is what https://github.com/cwabbott0/mali-isa-docs/blob/master/Utgard-PP.md describes as argument 1 which differs in the term

10:19 <rellla> so this seems to be the issue because dFdx doesn't seem to be commutative

10:20 <rellla> for the other ones, glsl-derivs-abs-sign smells like some precision issue and for glsl-derivs-swizzle i have to check the code

10:31 <rellla> ok, so both seem to be precision issues. i would be glad for any hints ...

10:31 <rellla> http://imkreisrum.de/piglit/glsl-derivs/

10:33 jbrown has quit [Read error: Connection reset by peer]

10:36 <rellla> cwabbott: fyi, as soon as i re-enable nir_lower_wpos_ytransform, all tests fail again

10:38 jbrown has joined #lima

10:38 Wizzup has quit [Ping timeout: 248 seconds]

10:45 Wizzup has joined #lima

11:38 <enunes> rellla: when I had issues with precision I tried to run the same tests with the blob, and they also failed, so I noted it in the MR and submitted it anyway as it is not lima's fault

11:44 <rellla> enunes: ok, i have no blob setup here :)

11:44 <rellla> does this sound like a precision issue to you: https://pastebin.com/raw/Yd9i76BK

11:59 dddddd has joined #lima

12:09 jrmuizel has joined #lima

12:41 <mardinator_> I remember now why i described things differently at first. Most other chips besides miaow do bring in more instructions at a time via the fetch module.

12:43 <mardinator_> then they will work as described before probably, if one replaces https://github.com/VerticalResearchGroup/miaow/blob/master/src/verilog/rtl/wavepool/scbd_feeder.v valid_wf with multivalue there, then it is like out of order fetching

12:46 <mardinator_> then yeah if you go out of order, to be able to change the queue column, two instruction on the in question line should be scheduled to switch into the other column

12:49 jrmuizel has quit [Remote host closed the connection]

12:52 <mardinator_> but this can be demonstrated with a little simulation of that module, giving values to the logic of this module, and it is seen that single values get eliminated while duplets won't

12:59 <mardinator_> it's like a two-level recursive procedure as it seems, when only 4 is scheduled, it will eliminate 4 in the upper long bitwise line, when 4 and 5 are scheduled by simd vacant=00001100..., it will eliminate 4 and then 5 will be passed with valid and f_decode_wfid=4

12:59 <mardinator_> err f_decode_wfid=5

13:05 <mardinator_> pretty sure that verilog in the spec schedules writes or rewrites before subsequent reads, worth to confirm

13:09 <mardinator_> I do not think that anyone is in particular a full idiot in this crew, so by far am I not, it is just that you have been told to stop violating me already by many people, you just pee on yourself continuing to do that.

13:12 <mardinator_> skills generally develop only by doing practice sessions, when you instead go violating someone that is not a good model, and you are not able to capitalize on your possible talent this way.

13:16 <mardinator_> those ideas possibly only with minor drawbacks or occasional faults in case of me, they accumulate not via natural intelligence, but because i spend time every day in practicing stuff, i am not even a gamer but i consider it better than drinking all the time with my fucked up life

13:45 jrmuizel has joined #lima

13:46 chewitt has quit [Quit: Zzz..]

13:50 libv_ has joined #lima

13:52 libv has quit [Disconnected by services]

13:52 libv_ is now known as libv

14:14 <mardinator_> maybe there are some patents involved for this, maybe fd.o guys are afraid of something, however as libv said obsess compulsive and excess power demonstration instead of ignoring or working instead, is just never something useful, the likes of stalking someone absolutely wasted time, likes of conspiring, there are much better ways to spend time more wisely.

14:24 <cwabbott> rellla: sorry for that! We had the source order backwards in the original lima project, which made a number of things more awkward, but the mesa driver + disassembler has it correct

14:25 chewitt has joined #lima

14:25 chewitt has quit [Client Quit]

14:28 <cwabbott> and yeah, dFdx and dFdy are not commutative -- they're basically an add where one of the sources comes from the pixel's horizontal or vertical neighbor

14:29 <cwabbott> the negate turns the sum into a difference
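
A minimal sketch of the behaviour cwabbott describes here, written in Python and not taken from the driver. It assumes, purely for illustration, that source 0 is read from the left pixel of a horizontal pair and source 1 from the right pixel; the log does not say which source the hardware fetches from the neighbour. `quad_add` and `negate` are hypothetical helper names.

```python
# Toy model of a two-source "add across neighbouring pixels" operation.
# Assumption (not from the log): src0 is taken from the left pixel and
# src1 from the right pixel of a horizontal pixel pair.

def negate(values):
    """Apply a source negate modifier to every pixel's value."""
    return [-v for v in values]

def quad_add(src0, src1):
    """Add src0 of the left pixel to src1 of the right pixel.

    Both pixels of the pair receive the same result.
    """
    left, right = 0, 1
    total = src0[left] + src1[right]
    return [total, total]

# Value of x in the left and right pixel of one pair.
x = [1.0, 4.0]

# dFdx(-x, x): negate on source 0 turns the sum into right - left.
forward = quad_add(negate(x), x)

# Swapping the operands gives left - right, i.e. the opposite sign,
# which is why the operation is not commutative.
swapped = quad_add(x, negate(x))

print(forward, swapped)
```

With this model the negate has to sit on the right operand for the result to be a right-minus-left difference; putting it on the other operand flips the sign of every derivative.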

14:32 <mardinator_> yeah i remember derivative showing the gain in respect to some time interval

14:33 <mardinator_> i was not able to read the code in greater detail, but looks like all know that it is done by summing up the lanes

14:38 <mardinator_> interlane communication indeed does this type of thing the fastest, it can also be emulated, but this is quite bad performance then

14:40 chewitt has joined #lima

14:46 Barada has quit [Quit: Barada]

15:04 <mardinator_> cwabbott: look almost ok, but you can also take derivative in respect to the first coordinate in four lane or 64 lane or whatever setup, or can't you? not only from the last element pixel

15:05 <mardinator_> since division is with higher latency the arithmetics are add and subtract

15:05 <mardinator_> on gcn this appears to be done on the image unit, the compiler puts the indices properly and makes the arithmetic based off them

15:15 <mardinator_> imo just the summing up needs to be done, but just the in respect sum needs to be discarded, like it was subtracted from the result

15:23 <rellla> cwabbott: you do not have to apologize. any opinion on the nir_lower_wpos_ytransform?

15:24 <rellla> i still do not understand, what this is needed for anyway :)

15:26 <cwabbott> rellla: iirc it's for window system buffers, which for historical reasons are rendered upside-down, and for whatever reason the way gallium flips the framebuffer means that you need to flip y derivatives

15:27 <cwabbott> that pass is part of gallium hiding the flipping from the driver so you don't have to worry about it

15:28 <cwabbott> I think mali has a different way of flipping rendering that doesn't require flipping derivatives

15:28 <cwabbott> hence why the blob doesn't do it

15:31 <cwabbott> so avoiding that pass would require (a) reverse-engineering how the blob flips rendering and (b) adding, and setting, a cap + driver interface that says "I'll flip rendering myself" and wiring it up in lima

15:34 <rellla> cwabbott: ok thanks, but when this doesn't cause a problem, i wonder why it breaks my tests again. the fddy ones for example...

15:35 <cwabbott> no idea on that one :)

15:36 <mardinator_> I would do derivatives with a sampler and clamping either with mirrored repeat or based of the virtual address

15:36 <cwabbott> go through the assembly, take a look at the uniform values submitted by the driver, one of them should be wrong

15:37 <mardinator_> so you accumulate the results which are needed to two locations and later sum them together

15:37 <cwabbott> unless you're rendering to a system window it should be the same as if the pass didn't exist

15:38 <cwabbott> (I mean, as long as you're rendering to an FBO and not rendering to the implicit window-system window)
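
A small sketch of why upside-down rendering changes the sign of y derivatives, as discussed in this exchange. This is not the code of nir_lower_wpos_ytransform; it only models one pixel column as a Python list, and the names `dfdy` and `flip` are hypothetical.

```python
# Toy model: one column of pixel values, row 0 first in memory.

def dfdy(column, row):
    """Difference between the value one row further in memory and `row`."""
    return column[row + 1] - column[row]

def flip(column):
    """Store the same column upside-down, as for a flipped framebuffer."""
    return column[::-1]

column = [0.0, 1.0, 3.0, 6.0]
flipped = flip(column)

row = 0
# The same pair of pixels sits at these rows once the image is flipped.
mirrored_row = len(column) - 2 - row

plain = dfdy(column, row)
upside_down = dfdy(flipped, mirrored_row)

# Multiplying by -1 (the role a y-transform scale would play in this
# model) restores the derivative the shader expects.
corrected = -1.0 * upside_down

print(plain, upside_down, corrected)
```

In this model the correction is only needed when the target is stored flipped; for an unflipped target, such as an FBO, the scale would be +1 and the result is unchanged.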

15:48 mardinator_ has left #lima ["Leaving"]

16:12 mardinator_ has joined #lima

16:19 <mardinator_> all this mentioned scbd_feeder.v does is it detects when simd arbiter schedules two instructions in program order, in other words the equivalent in that case, is last simd scheduled wfid +1, only difference is that you have to schedule the next one too for it to work to switch the column there, otherwise single scheduled instruction gets cleaved

16:19 guillaume_g has left #lima ["Konversation terminated!"]

16:23 <mardinator_> if you don't, i.e. like when the next one is dependent, valid_entry gets a zero, wavefront turns off the vacant right away cause issued_wfid goes in but issued_valid did not due to dependency, hence valid_entry gets zero, and scbd_feeder.v eliminates the last scheduled instruction, but not giving any of the f_decode_valid signals forward

16:25 <mardinator_> from there on it will try to schedule the subsequent instructions with giving plus 1 to the last simd wfid

16:26 <mardinator_> if none of them schedule it will get X from the round_robin.v from fetch module in other words on a complete stall it switches in the end

16:28 gtucker has quit [Ping timeout: 252 seconds]

16:28 tomeu has quit [Ping timeout: 252 seconds]

16:28 gtucker has joined #lima

16:29 xexaxo1 has quit [Ping timeout: 252 seconds]

16:29 xexaxo1 has joined #lima

16:30 tomeu has joined #lima

17:02 <mardinator_> so in case of round robin fetch it will go back to the first wavefront, in case of greedy-then-oldest, it will take the next one in the priority list

17:04 jrmuizel has quit [Remote host closed the connection]

17:04 jrmuizel has joined #lima

17:05 jrmuizel has quit [Remote host closed the connection]

17:06 <mardinator_> hence on the full pipeline you can not ever get freeze with round-robin neither greedy then oldest, on short-pipeline you can get a freeze on both if you do not do things properly, but easier to get the freeze with round-robin

17:06 <mardinator_> there on fast pipeline

17:08 <mardinator_> if you manage to look into the code, you may notice about my talks, that we are not talking about a troll here, but nonetheless a complete expert

17:15 drod has joined #lima

17:16 mardinator_ has quit [Quit: Leaving]

18:17 mardinator_ has joined #lima

18:19 <mardinator_> it does not particularly matter if there is opencl based scheduling abstraction available, such chips though are very rare that does not have Opencl EP even, powervr535 is one i know though! It does not also matter which type of scheduling is used, both will work but round robin with branching is faster

18:20 <mardinator_> you can do some type of mix too on GCN since priorities can be altered in shader, however there is not much point, since round robin the default will do nicely

18:21 <mardinator_> that was about embedded world, but desktop world has lots of desktop gpus that did not have opencl but were programmed entirely incorrectly.

18:24 <mardinator_> not only under mesa, since mesa hackers use reverse engineered proprietary methods though don't forget NVIDIA branched their OpenGL from mesa times ago, Brian Paul's stack that time, but same case for propr. stacks they implemented the driver in a wrong way.

18:25 <mardinator_> same goes for all kernels in CPU world, the schedulers are incorrectly programmed not taking advantage of sw based tomasulo derivatives

18:26 <mardinator_> in other words, maybe the hw is scripted, designers did their work properly and there was not much chance to avoid doing it correctly either, but sw developers haven't yet taken all under control

18:26 <mardinator_> properly

18:27 jrmuizel has joined #lima

18:37 <mardinator_> Little rant over this recently i discovered vmware publishing their verilog simulator on github, the company who acquired tungsten graphics, those guys coded like real men under pressure but there is room for very large enhancements.

18:38 <mardinator_> and i dunno maybe some of them are dealing with hw those days, it seems to be a great way of finally doing all correctly, when knowing hw you also know how to run the very last bit.

18:47 <mardinator_> Maybe there are risks in making such code available for general public, but i really in some areas will practice further with being on those directions, cause in the end i need money on my bank account too.

19:02 <mardinator_> no one lets you scam in a way that cause mart spammed the channel we utterly failed cause of the distraction, one in the past was talking how my energy waves from distance distracted him, so he did not tolerate me living too on my own.

19:05 <mardinator_> you receive very critical staring and views when trying to do such thing, you only think you pull utter crap and hope me to die right, after scoring ten thousand bans are you seeing that someone succeeded in doing it?

19:19 <mardinator_> so i do not have much more to say too, every time i have been attacked there are interfering people, what looks to be one sided fight for nutters, isn't at all like this, conflict has two sides, and there are supporters who adore me too.

19:20 <mardinator_> bye

19:20 mardinator_ has quit [Quit: Leaving]

20:54 drod has quit [Read error: Connection reset by peer]

20:55 drod has joined #lima

21:53 jbrown has quit [Remote host closed the connection]

21:58 jbrown has joined #lima

22:02 libv has quit [Ping timeout: 245 seconds]

22:03 libv has joined #lima

22:18 jrmuizel has quit [Remote host closed the connection]

22:38 drod has quit [Remote host closed the connection]

23:25 jrmuizel has joined #lima

23:47 jrmuizel has quit [Remote host closed the connection]

23:52 dddddd has quit [Ping timeout: 246 seconds]