<anarsoul>
I have a strong suspicion that writing the buffer and then reading it back could be faster than traversing a large tile heap
<anarsoul>
but I don't think we have any benchmarks that hit the tile heap size limit
<enunes>
anarsoul: yeah, I agree. if it would be too much effort to benchmark that, it might not be worth it now and we should just go with what you proposed
<enunes>
I experimented with the mali driver some days ago. I think it does split the draw at some point too, but I don't have the data now
<enunes>
from what I saw, the heap also seems to be allocated and maintained by user space (?), and if it runs out of heap it just notifies user space to try again
<enunes>
so in the kernel there was no hard limit until the system runs out of memory that can be allocated to the gpu
<enunes>
don't any of the glmark samples do a lot of draw calls, or couldn't one at least be modified to do a lot of draw calls just for a quick sanity check?
<anarsoul>
enunes: I just set the limit to 2 for a sanity check and fixed the deqp failures
<anarsoul>
basically we need to propagate resolve if we split the job, since we can't rely on whether subsequent draws will write anything into the depth/stencil or color buffers
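(A minimal sketch of one reading of the point above, not the driver's actual code. It assumes the split limit counts draws per job, and the names `Job`, `submit_draws`, `COLOR`, `DEPTH` and `STENCIL` are hypothetical. When a job is split, the continuation starts with the resolve mask accumulated so far, so it still writes those buffers back even if none of its own draws touch them.)

```python
from dataclasses import dataclass, field

# Hypothetical buffer bits, for illustration only.
COLOR = 1 << 0
DEPTH = 1 << 1
STENCIL = 1 << 2


@dataclass
class Job:
    draws: list = field(default_factory=list)
    resolve: int = 0  # buffers to write back when this job is flushed


def submit_draws(draws, max_draws_per_job):
    """Group (name, writes_mask) draws into jobs of at most max_draws_per_job."""
    jobs = [Job()]
    for name, writes in draws:
        job = jobs[-1]
        if len(job.draws) == max_draws_per_job:
            # Split point: the continuation inherits the resolve mask,
            # because we cannot know whether later draws will write
            # to the color or depth/stencil buffers at all.
            job = Job(resolve=job.resolve)
            jobs.append(job)
        job.draws.append(name)
        job.resolve |= writes
    return jobs
```

(With a limit of 2, three draws where only the first writes color and depth produce two jobs, and the second job keeps the color and depth bits in its resolve mask.)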
<enunes>
the horse results in 1400 draw calls, and it drops from 18 to 14 fps with the MR
<enunes>
and it looks like the depth buffer is not working properly, don't know if you are still working on that
<anarsoul>
hmm
<anarsoul>
do you have a screenshot?
<anarsoul>
it passed all deqp tests except GLES2.functional.fragment_ops.depth_stencil.stencil_depth_funcs.stencil_gequal_no_depth, which seems to be related to stencil