ChanServ changed the topic of #lima to: Development channel for open source lima driver for ARM Mali4** GPUs - Kernel has landed in mainline, userspace driver is part of mesa - Logs at https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=lima and https://freenode.irclog.whitequark.org/lima - Contact ARM for binary driver support!
camus has joined #lima
kaspter has quit [Ping timeout: 265 seconds]
camus is now known as kaspter
camus has joined #lima
kaspter has quit [Ping timeout: 246 seconds]
camus is now known as kaspter
camus has joined #lima
kaspter has quit [Ping timeout: 265 seconds]
camus is now known as kaspter
kaspter has quit [Ping timeout: 265 seconds]
kaspter has joined #lima
cp has quit [Ping timeout: 245 seconds]
piggz has quit [Read error: Connection reset by peer]
piggz has joined #lima
cp has joined #lima
kaspter has quit [Remote host closed the connection]
kaspter has joined #lima
Barada has joined #lima
kaspter has quit [Ping timeout: 265 seconds]
kaspter has joined #lima
cp has quit [Ping timeout: 265 seconds]
Barada has quit [Quit: Barada]
cp has joined #lima
cp has quit [Quit: Disappeared in a puff of smoke]
cp- has joined #lima
maccraft123 has joined #lima
maccraft123 has quit [Quit: WeeChat 2.6]
kaspter has quit [Ping timeout: 240 seconds]
kaspter has joined #lima
maccraft123 has joined #lima
maccraft123 has quit [Client Quit]
maccraft123 has joined #lima
dddddd has quit [Remote host closed the connection]
maccraft123 has quit [Read error: Connection reset by peer]
maccraft123 has joined #lima
yann|work has quit [Ping timeout: 240 seconds]
maccraft123 has quit [Quit: WeeChat 2.6]
<rellla> anarsoul: with your branch i get 4|4 passes but still a kernel error. i'm starting a complete run now.
warpme_ has joined #lima
maccraft123 has joined #lima
maccraft123 has quit [Client Quit]
_whitelogger has joined #lima
yann|work has joined #lima
<rellla> anarsoul: use.index_array.array still screws sth up, so i will do a run without the test
<rellla> i wonder if that could be some issue with 32/64bit?
<rellla> glGenBuffers(1, 0x0000ffffd83e1174); seems to deal with a 64bit pointer, and i don't know if we handle that correctly!?
<rellla> the resulting picture is kind of correct in the lower right area...
<rellla> can be bogus finding, too :p
<rellla> maybe we should ignore that issue for now and solve it later?
ecloud_wfh is now known as ecloud
maccraft123 has joined #lima
maccraft123 has quit [Quit: WeeChat 2.6]
camus has joined #lima
kaspter has quit [Remote host closed the connection]
camus is now known as kaspter
maccraft123 has joined #lima
maccraft123 has quit [Client Quit]
maccraft123 has joined #lima
smaeul has quit [Ping timeout: 240 seconds]
maccraft123 has quit [Read error: Connection reset by peer]
megi has joined #lima
smaeul has joined #lima
monstr has joined #lima
adjtm_ has joined #lima
adjtm has quit [Ping timeout: 276 seconds]
kaspter has quit [Quit: kaspter]
kaspter has joined #lima
maccraft123 has joined #lima
maccraft123 has quit [Client Quit]
maccraft123 has joined #lima
maccraft123 has quit [Quit: WeeChat 2.6]
<rellla> hm, no. we should solve that now :)
<rellla> my guess is that the buffer size in glBufferData() is also relevant. http://imkreisrum.de/deqp/vf_1/result_single.xml uses 763.
<rellla> if i change that to 80, i get fewer lines and the test passes. 128 results in a blue square for both result and reference picture, and 763 displays the right lines at the beginning of the draw but then sth goes wrong.
<rellla> so imho it's some buffer/memory related issue...
maccraft123 has joined #lima
maccraft123 has quit [Quit: WeeChat 2.6]
monstr has quit [Remote host closed the connection]
<rellla> heyo, i think i found sth related.
<rellla> info->count always arrives in lima_draw with a MAX of 129. the above test uses a count of 763, and this might be the reason why it breaks.
<rellla> the last working number of vertices is 126: http://imkreisrum.de/deqp/vf_3/result_single.xml
<rellla> whereas 127 shows wrong output: http://imkreisrum.de/deqp/vf_4/result_single.xml
<anarsoul> rellla: kernel error is definitely not good
<rellla> anarsoul: the first thing i want to solve is why i get that maximum of 129 vertices !?
dddddd has joined #lima
<rellla> ok, so i stop for now. somewhere between deqp and mesa the nr is shrunk to 129 :(
<anarsoul> what number are you talking about?
<rellla> hooray, they pass. found it.
<anarsoul> what was the issue?
<rellla> analyzing the issue is up to you :p
<rellla> let me explain what i did:
<rellla> setting this value high enough to do all in one draw solves the tests (but not the issue probably :)
<rellla> mesa seems to correctly split the draws (for testing purpose i set max_verts to 50)
<anarsoul> is it a regression?
<rellla> it seems, the loop is executed only once. in mesa debug, i get one glDrawElements and one glDrawArrays
yann|work has quit [Ping timeout: 252 seconds]
<rellla> for one draw or to do all at once.
<anarsoul> rellla: issue should be fixed in lima, not in deqp
<rellla> yeah
<rellla> i'll look into that later. at least i know, that the issue is somehow related to nr of vertices
<rellla> it seems lima is not able to draw more than 125 vertices correctly in one draw.
<anarsoul> that's probably for line strips
<anarsoul> not sure if it's correct though since enunes' branch actually fixes glmark2 -b refract which has more than 65k vertices
<anarsoul> maybe we under-allocate some buffer?
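The draw splitting discussed above (a vertex limit such as max_verts = 50, and a 763-vertex line strip) can be illustrated with a minimal sketch. This is not mesa's or lima's actual code: the function name split_line_strip and its signature are hypothetical, and it only assumes that a line strip is split so that consecutive sub-draws share one vertex, which keeps the segment crossing each chunk boundary.

```python
def split_line_strip(start, count, max_verts):
    """Split one line-strip draw into sub-draws of at most max_verts vertices.

    Hypothetical helper for illustration only. Consecutive sub-draws share
    one vertex so that the segment crossing a chunk boundary is still drawn.
    Returns a list of (first_vertex, vertex_count) tuples.
    """
    if max_verts < 2:
        raise ValueError("a line strip needs at least 2 vertices per draw")
    draws = []
    end = start + count
    first = start
    while end - first >= 2:
        n = min(max_verts, end - first)
        draws.append((first, n))
        if first + n >= end:
            break
        # Re-use the last vertex of this chunk as the first of the next one.
        first += n - 1
    return draws


if __name__ == "__main__":
    for first, n in split_line_strip(0, 763, 50):
        print(first, n)
```

If every sub-draw is rendered correctly, the number of line segments summed over all sub-draws equals count - 1, which is a cheap sanity check when only the first part of a long strip looks right.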
megi has quit [Ping timeout: 250 seconds]
yann|work has joined #lima
rembrandt83 has joined #lima
<rembrandt83> Finally got my laptop from repair, power button was replaced.
<anarsoul> rembrandt83: great
<rembrandt83> So I looked at proper lookup table indexing from bitfields , this is one of the most sophisticated problems to me so far.
<rembrandt83> how to remap roots and logarithms to make them work with low latency
<rembrandt83> the tables seem easy to me, and probably one can index into them with filtering units or texture mapping units, but i haven't got much experience and knowhow on vertex shaders.
<rembrandt83> so you have some bitfield combination with filtering some power of two out from the modulus operation, now you have a base and remainder of that operation, this should be the index, how to clamp it into correct register is the problem.
<rembrandt83> the mirror odd even filtering mode seems good, but is there something similar in hw for vertex programs too, if texture lookup is missing on the vertex shader?
<rembrandt83> glpointsize has some hw clamping stuff, but i yet have not identified how that works
<rembrandt83> so the issue is, the coord normalization methods are quite often used as divisions and logarithms and roots probably
<rembrandt83> you may have several twos complement cache buffers to get the info from, i am not sure how i redirect it cheaply to the correct one
megi has joined #lima
<rembrandt83> transform_feedback isn't kinda there for mali 400mp i assume, or is there a chance? some describe something like vertex texture fetch (VTF). without PBOs this is going to put the passthrough fragment program texture info back to CPU and then into VBOs
<rembrandt83> what i think is that either some intelligent rounding instruction is needed on vertex shaders, or every divisor of power of two needs to have a priority encoder to the constant cache for instance, to fetch the magic number from there. since there are 32 powers, then 32 buffers
<rembrandt83> anyhow all this is very complex. i am not entirely in woods with this one, having read literature in stacks too
<rembrandt83> but i do not like that fast memory as data cache is wasted on page memory while they can be iteratively pinned to store precomputed values of long latency ops
<rembrandt83> page tables are not very intelligent to waste cache on
<rembrandt83> most those theorems are bit overkill to me too though, highly hard to follow especially if i can not rely on very good math skills, which i never had.
<rembrandt83> so after having cracked a single one like lagrange quadratic interpolation with resilient reading and following, and it feels like a rather unoptimized one, all the time is wasted since there are literally gazillions of theorems on the net
<rembrandt83> the data path is always going to be coarse grain sort of, but hw makes a lot of 32bit 4byte cache entries and regs available to be used, so it isn't much of a problem to waste them on priority encoder
<rembrandt83> since accessing those coarse grain regs will be always very fast on short pipeline mode
maccraft123 has joined #lima
<rembrandt83> effective address calculator has enough components but it all is pretty perverse to think about those solutions every day with no possibility to negotiate with anyone to cooperate
<rembrandt83> alone doing it is pretty mad, but this is all i got at the moment. injuries took my out of other events long time already, i think i will branch out doing all my own anyways
<rembrandt83> my/me
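The idea of indexing a lookup table from float bitfields and refining the result with quadratic Lagrange interpolation can be sketched on the CPU as follows. This is only an illustration of the general technique, not anything Mali-specific: the table size (32 intervals), the function name approx_log2 and the restriction to normal, positive float32 inputs are all assumptions made for the example.

```python
import math
import struct

TABLE_BITS = 5                      # assumed: 32 intervals over the mantissa range [1, 2)
TABLE_SIZE = 1 << TABLE_BITS
FRAC_BITS = 23 - TABLE_BITS         # mantissa bits left below the table index
# Two extra entries so that three neighbouring samples always exist.
LOG2_TABLE = [math.log2(1.0 + i / TABLE_SIZE) for i in range(TABLE_SIZE + 2)]


def approx_log2(x):
    """Approximate log2(x) for a normal, positive float32 value (hypothetical helper)."""
    if not x > 0.0:
        raise ValueError("x must be positive")
    # Reinterpret the float32 encoding as an integer to reach its bitfields.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    exp_field = (bits >> 23) & 0xFF
    if exp_field == 0 or exp_field == 0xFF:
        raise ValueError("denormal, infinite and NaN inputs are not handled")
    exponent = exp_field - 127
    mantissa = bits & 0x7FFFFF

    index = mantissa >> FRAC_BITS                          # top bits select the table entry
    t = (mantissa & ((1 << FRAC_BITS) - 1)) / (1 << FRAC_BITS)  # position in the interval, [0, 1)

    # Quadratic Lagrange interpolation through the nodes t = 0, 1, 2.
    y0, y1, y2 = LOG2_TABLE[index], LOG2_TABLE[index + 1], LOG2_TABLE[index + 2]
    l0 = (t - 1.0) * (t - 2.0) / 2.0
    l1 = t * (2.0 - t)
    l2 = t * (t - 1.0) / 2.0
    return exponent + y0 * l0 + y1 * l1 + y2 * l2


if __name__ == "__main__":
    for v in (1.0, 3.0, 10.0, 0.1):
        print(v, approx_log2(v), math.log2(v))
```

The exponent field supplies the integer part of the logarithm directly, so the table only has to cover one octave; making the table larger trades memory for accuracy.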
rembrandt83 has quit [Quit: CGI:IRC (EOF)]
maccraft123 has quit [Quit: sleep]
cwabbott has quit [Remote host closed the connection]
cwabbott has joined #lima