<alyssa>
Woah woah that was something about the hardware I didn't need to learn.
<urjaman>
o.O?
<alyssa>
urjaman: The polygon list BO has to be larger than reported in `polygon_list_size`
<urjaman>
okay good, nothing exploded :P
<alyssa>
:p
<alyssa>
urjaman: How goes the Chromebooking and Panfrosting these days?
<urjaman>
i guess i'm still kinda on a break ... will get back to it soon(TM) but have had both other things to attend to and also the kernel grind started feeling too much like work
<urjaman>
i mean i am running the C201 obviously but havent touched the software in a while :P
<alyssa>
Relatable :p
<alyssa>
// Plist BO size 14E000
<alyssa>
.polygon_list_size = 0x13fe00,
<alyssa>
// body offset 20992
<alyssa>
0x14E000 != 0x13FE00
<urjaman>
oh i was at Assembly 2019 (computer festival, demoparty, lan party, whatever) and my C201 photobombed in an "official" (by assembly photo people) picture of my 3D printer
<alyssa>
Okay, 30 bytes per tile... seems arbitray
<alyssa>
---And not even right either hrmph
<urjaman>
very arbi tray indeed
<alyssa>
Never such a thing in hw..
<alyssa>
Also it's possible the blob overallocates somewhat
<urjaman>
that is a random amount to overallocate by ...
<alyssa>
urjaman: I mean it might not have anything to do with the tile count
davidlt has quit [Ping timeout: 244 seconds]
_whitelogger has joined #panfrost
davidlt has joined #panfrost
_whitelogger has joined #panfrost
_whitelogger has joined #panfrost
megi has quit [Ping timeout: 246 seconds]
davidlt_ has joined #panfrost
davidlt has quit [Ping timeout: 272 seconds]
bshah has joined #panfrost
vstehle has joined #panfrost
davidlt__ has joined #panfrost
davidlt_ has quit [Ping timeout: 244 seconds]
davidlt__ has quit [Read error: Connection reset by peer]
davidlt_ has joined #panfrost
davidlt_ has quit [Remote host closed the connection]
davidlt__ has joined #panfrost
davidlt__ has quit [Read error: Connection reset by peer]
davidlt has joined #panfrost
krh has quit [Ping timeout: 248 seconds]
pH5 has quit [Quit: bye]
davidlt has quit [Ping timeout: 245 seconds]
anarsoul has quit [Remote host closed the connection]
anarsoul has joined #panfrost
pH5 has joined #panfrost
jernej has quit [Ping timeout: 264 seconds]
davidlt has joined #panfrost
davidlt has quit [Remote host closed the connection]
davidlt has joined #panfrost
<tomeu>
Prf_Jakob: is there a profiling mode that can tell me which are the tests that take most of the time?
afaerber has quit [Quit: Leaving]
davidlt_ has joined #panfrost
davidlt has quit [Read error: Connection reset by peer]
megi has joined #panfrost
davidlt_ has quit [Ping timeout: 246 seconds]
davidlt has joined #panfrost
<tomeu>
Prf_Jakob: also, how can I print regressions and improvements but not already-known failures?
<daniels>
alyssa, urjaman: i don't think it is related to the tile count - 0x13fe00 is only aligned to 512 bytes, whereas 0x14e000 is aligned up to 4096 bytes
<daniels>
which makes sense - no sense allocating BOs which aren't aligned to page size
<urjaman>
yes, but it's not the next 4k
<urjaman>
(that'd be at 0x140000, so there's at least 56k extra space)
<daniels>
good point, and that would also be the next 64k boundary
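The alignment arithmetic in this exchange can be checked with a few lines of Python. This is only a sketch: the helper names `alignment` and `align_up` are made up for illustration, and the two constants are the values quoted earlier in the log.

```python
def alignment(value: int) -> int:
    """Return the largest power of two that divides value (value != 0)."""
    return value & -value


def align_up(value: int, boundary: int) -> int:
    """Round value up to the next multiple of boundary (a power of two)."""
    return (value + boundary - 1) & ~(boundary - 1)


reported = 0x13FE00  # .polygon_list_size as quoted in the log
bo_size = 0x14E000   # polygon list BO size as quoted in the log

print(hex(alignment(reported)))          # alignment of the reported size
print(hex(alignment(bo_size)))           # alignment of the BO size
print(hex(align_up(reported, 0x1000)))   # next 4k boundary
print(hex(align_up(reported, 0x10000)))  # next 64k boundary
print((bo_size - align_up(reported, 0x1000)) // 1024, "KiB past the next 4k boundary")
```

The reported size comes out 512-byte aligned, the BO size is page-aligned, and both roundings land on 0x140000, which leaves 56 KiB unexplained by page or 64k rounding alone.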
<urjaman>
but i suppose alyssa will figure it out, at least if there's more examples ... tho i have one maybe silly thought
<urjaman>
does the blob perform its own memory allocation for these things? maybe it just padded upwards to the next thing instead of leaving a tiny hole
raster has joined #panfrost
raster has quit [Remote host closed the connection]
raster has joined #panfrost
davidlt has quit [Read error: Connection reset by peer]
davidlt has joined #panfrost
davidlt has quit [Read error: Connection reset by peer]
<shadeslayer>
could someone shed some light on what CANARY means in a ralloc context?
<cwabbott>
shadeslayer: if you hit that assert, it probably means you passed a mem_ctx to a ralloc function that isn't actually a ralloc context or NULL
<cwabbott>
or there's some kind of memory corruption
<cwabbott>
it's a constant stored in every ralloc context and checked when you pass in a context
<shadeslayer>
aha I see
<shadeslayer>
thank you! :)
tgall_foo has quit [Read error: Connection reset by peer]
<daniels>
right - generally with allocators, any kind of failure like that means that you're using memory after free, or have double-freed, or overflowed an allocation
<shadeslayer>
what I'm confused by is the assert(info->canary == CANARY)
<daniels>
shadeslayer: is this still the Plasma thing? if so, it's probably quicker to run it under valgrind than to spend two days trying to figure out what's going on without valgrind :P
<shadeslayer>
so it will assert when the known value is found?
<daniels>
err, other way around
<daniels>
it will assert when the 'canary' member is not the static value CANARY
<shadeslayer>
uhhh ...
<shadeslayer>
assert(info->canary == CANARY);
<daniels>
yeah
<daniels>
if info->canary == CANARY, then it will continue
<daniels>
if info->canary != CANARY, it will abort
<shadeslayer>
oh right, ofcourse
<shadeslayer>
right, what I see in gdb :
<shadeslayer>
(gdb) print info->canary
<shadeslayer>
$1 = 5902598
<shadeslayer>
which is the decimal form of the hex value (0x5A1106)
<daniels>
usually you use them to implement a pretty weak form of memory-overrun detection - for instance, struct { char my_buffer[128]; uint32_t canary = 0xdeadbeef; }
<daniels>
doing that, if you write past 128 bytes of my_buffer, you'll overwrite the 'canary
<shadeslayer>
so the value seems correct, but it still asserts?
<daniels>
' member with whatever you were going to write
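The canary idea described here can be simulated in Python with a flat byte array standing in for the C struct. This is a rough sketch, not ralloc's code: the layout (a 128-byte buffer followed by a 4-byte little-endian sentinel), the helper names, and the 0xDEADBEEF value are all assumptions taken from the illustrative struct in the message above.

```python
import struct

CANARY = 0xDEADBEEF  # arbitrary sentinel, as in the illustrative struct
BUF_SIZE = 128       # size of my_buffer in the illustrative struct


def make_block() -> bytearray:
    """Build a 128-byte buffer immediately followed by a 4-byte canary."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<I", CANARY))


def canary_intact(block: bytearray) -> bool:
    """Equivalent of assert(info->canary == CANARY), returned as a bool."""
    (value,) = struct.unpack_from("<I", block, BUF_SIZE)
    return value == CANARY


def unchecked_write(block: bytearray, offset: int, data: bytes) -> None:
    """Write with no check against BUF_SIZE, like a careless memcpy."""
    block[offset:offset + len(data)] = data


block = make_block()
unchecked_write(block, 0, b"A" * BUF_SIZE)        # fills the buffer exactly
print(canary_intact(block))                       # still intact
unchecked_write(block, 0, b"A" * (BUF_SIZE + 4))  # runs 4 bytes past the buffer
print(canary_intact(block))                       # canary overwritten
```

The check passes silently while the sentinel is intact and only fails once something has scribbled over it, which matches the assert semantics spelled out above.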
<daniels>
is that what you see at the assert point?
<alyssa>
I have given up pretending I know what I'm doing right now
<daniels>
shadeslayer: well, there you have it - the block at line 846 tells you that we're trying to access a pan_job after it's been free()d
<daniels>
seems like quite a mess inside panfrost_drm_force_flush_fragment()
<daniels>
for some reason we flush and free the job, but then we later end up with that job still being current and being flushed again
<daniels>
shadeslayer: which bits can I help you step through?
<alyssa>
daniels: "seems like..." You sound surprised!
<shadeslayer>
heh
<shadeslayer>
daniels: so this is essentially 2 different threads trying to free the same job?
<shadeslayer>
one of them being the blitting and the other one a eglswap?
<daniels>
shadeslayer: that shouldn't be possible, since they're within the same EGLContext, and you cannot have the same context current in multiple threads
<daniels>
shadeslayer: i would just start by printf tbh: every time you create a job, every time you assign a job to screen->last_job, every time you change screen->last_fragment_flushed (the condition which controls whether or not we try to free the job inside panfrost_drm_force_flush_fragment!), every time we free a job - print that out including the job pointer
<daniels>
and then eventually unravel why it is that we free a job that we end up later trying to use
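Once the printf output exists, a small script can replay it and flag the first suspicious event. The sketch below assumes a hypothetical log format of `(action, job pointer)` pairs with the actions `create`, `free`, and anything else counting as a use; none of these names come from the Panfrost source.

```python
def find_use_after_free(events):
    """Replay (action, job) events and report uses or frees of freed jobs."""
    freed = set()
    problems = []
    for index, (action, job) in enumerate(events):
        if action == "create":
            # The allocator may hand out the same address again.
            freed.discard(job)
        elif action == "free":
            if job in freed:
                problems.append((index, "double free", job))
            freed.add(job)
        elif job in freed:
            problems.append((index, f"{action} after free", job))
    return problems


# Hypothetical trace: a job is freed and then flushed again.
trace = [
    ("create", "0x55aa10"),
    ("assign_last_job", "0x55aa10"),
    ("free", "0x55aa10"),
    ("flush", "0x55aa10"),
]
for problem in find_use_after_free(trace):
    print(problem)
```

The index of the first reported event points at the place in the log to start reading backwards from.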
<alyssa>
daniels: What is the deal with multithreading in GL?
<daniels>
alyssa: btw, you'd be surprised how unsurprised I can sound sometimes :P the more-numerous-by-the-week grey hairs in my beard didn't come a) from nowhere or b) without a well-practiced wary tone
jernej has quit [Remote host closed the connection]
<daniels>
anyway, assuming 'what's the deal with ... ?' wasn't a Seinfeld-style lead-in, basically you can have multiple EGLContexts created from a single EGLDisplay, but each context can only be current in at most one thread simultaneously
<daniels>
so if you have a per-FD BO cache, for example (which you do need), that needs to be mutexed because it's entirely possible for multiple contexts to be working simultaneously on the same device
<alyssa>
Aaahh ok
<daniels>
however, pretty much all the GL state and objects are context-local, so you don't have to e.g. mutex every single FBO access
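The split between shared and context-local state can be sketched in Python, with threads standing in for contexts. Everything here is hypothetical: `BOCache`, its `get`/`put` methods, and the integer handles are invented for illustration and do not mirror the real driver's cache.

```python
import threading


class BOCache:
    """Hypothetical per-device cache shared by every context on one FD."""

    def __init__(self):
        self._lock = threading.Lock()
        self._free = {}        # size -> list of idle buffer handles
        self._next_handle = 1

    def get(self, size):
        """Reuse an idle buffer of this size, or hand out a new handle."""
        with self._lock:
            bucket = self._free.get(size)
            if bucket:
                return bucket.pop()
            handle = self._next_handle
            self._next_handle += 1
            return handle

    def put(self, size, handle):
        """Return a buffer to the cache for later reuse."""
        with self._lock:
            self._free.setdefault(size, []).append(handle)


def context_thread(cache, out, count):
    """One 'context': its own list is context-local and needs no lock."""
    local = [cache.get(4096) for _ in range(count)]
    out.extend(local)


cache = BOCache()
handles = []
threads = [threading.Thread(target=context_thread, args=(cache, handles, 100))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(handles), len(set(handles)))
```

Only the shared cache takes a lock. Each thread's `local` list plays the role of context-local GL state and is never touched by another thread.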
pH5 has quit [Quit: bye]
davidlt_ has joined #panfrost
jernej has joined #panfrost
<shadeslayer>
daniels: re printf, lovely
<daniels>
shadeslayer: (and from there you can start to drill through the call tree and surrounding context to find out _why_ those jobs are being allocated/assigned/flushed/freed when they are, and thus where the logic error is that leads us to be using a job we've already freed)
<shadeslayer>
roger, I'll try to spend some time on this, though I'm on vacation starting tomorrow
* alyssa
tries to figure out unk0 encoding
<alyssa>
It's very nearly linear
<alyssa>
I'm guessing they have some alignments thrown in or something
<shadeslayer>
so might really only get to it by mid next week
<daniels>
shadeslayer: oh, so you are - we can pick it up next week then
<shadeslayer>
aye
<shadeslayer>
daniels: alyssa do you reckon these BO Cache + MIR patches can be merged?
raster has quit [Remote host closed the connection]
<shadeslayer>
maybe I can finish polishing them up, since this use after free is a separate issue?
<shadeslayer>
or do you reckon it's better to fix it all up?
<daniels>
well, if the MIR-iterator patch is positively reviewed then we could definitely stick that in
<daniels>
but is the pan_job UAF definitely not related to the BO cache?
raster has joined #panfrost
<shadeslayer>
daniels: I mean ... it probably is, just wasn't sure if that would block things since without that BO Cache you'd still hit similar issues with importing bo's
<alyssa>
(Actually the X axis is off-by-one, and the Y should be in hex to be legible, but meh)
<daniels>
shadeslayer: tbh I think it's best to just leave the BO cache until we at least understand what the failure is
<shadeslayer>
daniels: ack
<daniels>
but pushing Boris's MIR iterator patches sounds like a good idea
<daniels>
alyssa: ^?
<alyssa>
daniels: Pretty sure I r-b'd it
<alyssa>
And if I didn't I'm pretty sure I had a reason not to
<daniels>
heh
<daniels>
shadeslayer doesn't have commit rights tho, and Boris is on holiday this week
<alyssa>
daniels: Oh, yeah. I was waiting on a v2 from Boris.
<daniels>
(i love the sublinear behaviour around multiples of four! beautiful)
<alyssa>
(and the v2 would include a list.h change so I couldn't push myself anyway, would need a review)
davidlt_ is now known as davidlt
<daniels>
fair enough
<alyssa>
(Aside: matplotlib is stupidly easy to use. Like I already had my data in my Python notebook... it was just a few lines later to get a pretty graph)
<daniels>
'notebook' like jupyter, or?
<alyssa>
notebook like vim, random paper, and a pencil
<alyssa>
notebook like vim, random paper, and a pencil
<alyssa>
oops
<alyssa>
So the nice thing about the sublinear behaviour is that I can compute forward differences to get something more legible:
<alyssa>
er, I guess quasi-linear? piecewise linear?
<alyssa>
And so I can pretty easily fit a curve to the forward differences --
<alyssa>
clearly (32 if last of 4, 64 otherwise) | (1 if ??? else 0)
<alyssa>
And then manipulate it that way
<alyssa>
Although I *strongly* suspect the graph we're looking at is from rounding an unaligned product
<alyssa>
That's alright. We'll see soon enough.
<alyssa>
s/|/^/
<alyssa>
Er, not even.... hm
<alyssa>
Okay, so if we take the table and & ~1, we can ignore that bit for now
<alyssa>
Oh hum huh
<alyssa>
Alright. Have stuff in Python, now onto the notebook :p
<alyssa>
daniels: Re the "unaligned rounding" --- just as a visualization exercise, imagine zooming in on an oblique line drawn with Bresenham's and no antialiasing, and then drawing smooth lines between connected pixels
<alyssa>
It would look something like that graph, yeah?
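To see how rounding an unaligned product can yield that kind of staircase, here is a toy Python sketch. The constants (a step of 56 rounded up to a multiple of 32) are invented purely to produce a similar shape. They are not the real unk0 encoding, which the log does not give.

```python
def align_up(value: int, boundary: int) -> int:
    """Round value up to the next multiple of boundary (a power of two)."""
    return (value + boundary - 1) & ~(boundary - 1)


def forward_differences(values):
    """Return values[i + 1] - values[i] for each adjacent pair."""
    return [b - a for a, b in zip(values, values[1:])]


STEP = 56    # hypothetical unaligned per-unit size, not from the hardware
ALIGN = 32   # hypothetical rounding granularity, not from the hardware

table = [align_up(n * STEP, ALIGN) for n in range(9)]
print(table)
print(forward_differences(table))
```

The table itself looks nearly linear, while its forward differences alternate between two values with a period of four, which is the sort of pattern a rounded product leaves behind.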
<daniels>
heh
<alyssa>
Granted I'm not much of a visual person so I'm not sure why I'm doing it this way but you know :p
<alyssa>
I don't usually use graphs; it's nice to change things up sometimes :)
tgall_foo has joined #panfrost
pH5 has joined #panfrost
unoccupied has quit [Ping timeout: 268 seconds]
<alyssa>
2 pages of maths (by hand) later and I have a nice closed form expression ... I think ...
* alyssa
is going to have to try symbolic computing for RE at some point
raster has quit [Remote host closed the connection]
raster has joined #panfrost
<alyssa>
Okay, yes, my closed form expression is correct (Pythonified it)