<Lyude|PTOish>
that sounds like it was painful to debug
<tomeu>
I think I was lucky :)
<davidlt>
wow, finally
<davidlt>
Intel released the Thunderbolt 3 spec, which becomes USB 4
<HdkR>
So all of our USB 4 controllers are going to be huge and overpriced? :p
<raster>
they will contain an i7 just in the controller :)
<HdkR>
Hopefully once someone other than Intel releases a controller then the firmware will be able to be updated
<raster>
(though what's new, just about every hdd has an arm cpu in it... and tonnes of raid controllers are little arm machines, and bmc's etc.) :)
<HdkR>
aye
<HdkR>
Samsung's SSDs have like...five CPU cores in them now
<raster>
not surprising...
<raster>
moar cores!
<raster>
:)
<HdkR>
Cortex-R4s or something last I knew
<raster>
cortex r's?
<raster>
they are really being paranoid
<HdkR>
Back in their three-core designs, I knew they separated the logic so that one core handled reads, one handled writes, and one handled misc things. Not sure what happens with the new additional cores
<raster>
i wonder if gpus will ever just become extra cpu cores with different scheduling (and some texel fetch/write/blend instructions etc.)
<HdkR>
You mean Larrabee? :P
<raster>
well larrabee was a discrete gpu right?
<HdkR>
Which was effectively a ton of Pentium 3 CPUs with some additional GPU-esque instructions
<raster>
just happened to be x86
<raster>
well i was more thinking regular cpu cores just with insane SMT
<HdkR>
Or pentium? I forget now
<raster>
i think they were ye olde pentium
<HdkR>
But you can see how that project failed
<raster>
but imagine a cpu that, instead of running until a stall then switching to another vcpu (diff reg bank etc.),
<raster>
it would just run 1 instruction then hw bank switch to another context
<raster>
and it could hold like 32, 64, 128+ of these per core
<raster>
so you'd hide stalls in ctx switches
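What raster is describing is essentially a barrel processor. Here is a minimal C sketch of the idea; every name and the toy "pipeline" are invented for illustration, not real hardware:

```c
/* Minimal sketch of the "barrel processor" scheduling described above:
 * run one instruction, then hardware-switch to the next register bank.
 * All names and the toy pipeline are hypothetical. */
#include <stdint.h>
#include <stdio.h>

#define NUM_CONTEXTS 64          /* the 32x/64x/128x+ banks per core */

struct hw_context {
    uint64_t regs[32];           /* per-context register bank */
    uint64_t pc;                 /* per-context program counter */
    int      stalled;            /* e.g. waiting on a memory access */
};

static struct hw_context ctx[NUM_CONTEXTS];

/* One core cycle: issue a single instruction from the current context,
 * then unconditionally rotate to the next bank.  A stalled context
 * simply gives up its slot, so memory latency is hidden as long as
 * enough other contexts are runnable. */
static void core_cycle(unsigned *current)
{
    struct hw_context *c = &ctx[*current];

    if (!c->stalled)
        c->pc += 4;              /* stand-in for executing one insn */

    *current = (*current + 1) % NUM_CONTEXTS;
}

int main(void)
{
    unsigned current = 0;

    for (int cycle = 0; cycle < 256; cycle++)
        core_cycle(&current);

    /* 256 cycles / 64 contexts = 4 instructions each */
    printf("ctx0 pc: %llu\n", (unsigned long long)ctx[0].pc);
    return 0;
}
```

With 64 resident contexts, a stall in one context costs nothing as long as the other 63 have an instruction ready, which is exactly how GPUs hide memory latency.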
<HdkR>
You're just describing a GPU at that point
<raster>
yup
<HdkR>
:P
<raster>
it's not a very big leap between gpus and cpus these days
<raster>
so why really have them be so different?
<HdkR>
and if you were mad enough, you could use ARM's SVE instruction set as a GPU. Make a GPU using AArch64 + 2048bit SVE
<raster>
it's missing texel fetch/interpolation stuff in hw
<raster>
no concept of tiled mem layout etc.
<HdkR>
Glue some additional texture fetch pipelines on
<HdkR>
:D
<raster>
yup
<raster>
that's all it really needs... :)
<raster>
with just a lot of hw "SMT" switching (instead of 2 or 4 as you see today - 32x, 64x etc.)
<raster>
your gpu is really just some daemon you ipc to :)
<raster>
(that daemon is scheduled exclusively on these wide smt cores)
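To make the SVE tangent concrete, here is the kind of vector-length-agnostic loop such a core would run, sketched with the ACLE SVE intrinsics; the blend_span function and its role as "shader code" are invented for illustration:

```c
/* Illustrative only: a vector-length-agnostic inner loop a
 * hypothetical SVE-based "GPU" core could run, here a simple blend
 * between two spans of float pixels.  Compile with e.g.
 * -march=armv8-a+sve.  Not real GPU or Panfrost code. */
#include <arm_sve.h>
#include <stddef.h>

void blend_span(float *dst, const float *a, const float *b,
                float alpha, size_t n)
{
    for (size_t i = 0; i < n; i += svcntw()) {
        /* predicate covers the (possibly partial) tail automatically */
        svbool_t    pg = svwhilelt_b32_u64(i, n);
        svfloat32_t va = svld1_f32(pg, &a[i]);
        svfloat32_t vb = svld1_f32(pg, &b[i]);
        /* dst = b + (a - b) * alpha */
        svfloat32_t vm = svmla_f32_x(pg, vb,
                                     svsub_f32_x(pg, va, vb),
                                     svdup_n_f32(alpha));
        svst1_f32(pg, &dst[i], vm);
    }
}
```

The same loop works unchanged whether the hardware implements 128-bit vectors or the full 2048 bits SVE allows, which is what makes the idea plausible for the ALU side; the texel fetch and tiled-layout hardware raster mentions would still have to be bolted on.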
<raster>
i don't know why larrabee failed but rumors were that intel underestimated the sw work needed to actually write the "gpu"
<HdkR>
I figured perf/w wasn't anywhere near where it needed to be to be competitive
<raster>
i wonder if that was a sw or hw problem tho...
<raster>
was it they just were taking too long on the sw to maximize the hw utilisation...
<HdkR>
Could be both :P
<raster>
or the hw just wasn't capable enough...
<davidlt>
Xeon Phi (KNC, KNL, etc) also didn't fly
<raster>
so i hear - xeon phi was an offshoot of larrabee
<davidlt>
yes
<davidlt>
KNL was the 1st true product, KNC was kinda beta-testing
<davidlt>
but it was complicated, two types of RAM on the board
<davidlt>
SMT4, AVX-512 (but not all extensions), SSE4, AVX were considered legacy IIRC
<davidlt>
we always struggled to efficiently use it
<raster>
tho to go against gpu's you'd need to seriously up the core *AND* smt count
<raster>
i thought they only did 4 way smt on phi
<davidlt>
Yes, 4 SMT
<raster>
you'd need more like 32x
<davidlt>
and really complicated memory setup
<raster>
or 64x
<raster>
as you'd probably just make a very dumb scheduler that does 1 instruction then switches, so to hide stalls you'd need a lot of contexts to fill in the cycles
<davidlt>
soon we will know what Intel is doing with GPUs, Gen 11 stuff will be discussed at GDC IIRC
<raster>
let's see
<davidlt>
Gen 11 is like a base for their Xe
<raster>
i wonder if this ended up coming from their gpu group or the larrabee guys (where they finally managed it and it just took longer)
<davidlt>
come on, they hired loads of people incl. some legendary folks from AMD
<raster>
what i have on my desk is a bit xeon-phi like
<davidlt>
I think, it's at least 3-4 years for them on this project
<raster>
tho i suspect its still beefier per core
<raster>
so i can imagine how you'd build a gpu out of this many cores :)
<alyssa>
mifritscher: You are the 0.1%! :P
<alyssa>
tomeu: I don't have a script to check the code style, no
<alyssa>
Not sure what G-S is missing, probably just a ton of debugging
<alyssa>
tomeu: Oh, fwoosh, yes, you're right. Good catch :) Send a patch? :P
<raster>
davidlt: well i wonder what they based it on - their existing gpu designs just dialed up to 11, or a larrabee-like design... :)
<davidlt>
raster, Gen 11 is a new design
<raster>
like from scratch?
<davidlt>
from my understanding
<davidlt>
the execution units are significantly smaller IIRC
<raster>
hmmm then i wonder what it looks like
<davidlt>
and they are doubling the count, from 24 to 48 on the iGPU side
<raster>
smaller than their gpu designs or their larrabee design?
<davidlt>
there was a leaked benchmark and it outperforms any integrated AMD solutions based on Vega
<raster>
hmmm
<davidlt>
smaller compared to Gen 9.5
<raster>
but this is discrete
<davidlt>
this is iGPU
<raster>
hmmm ok so that means they can get more cores on
<raster>
oh i thought they were doing a discrete gpu?
<davidlt>
there was a leaked benchmark for Ice Lake
<davidlt>
They are starting with Gen 11 as major step, that's iGPU
<raster>
hmm then what's the relation to the discrete gpu rumors?
<davidlt>
but it's base for them for their 2020 Xe project (which scales from iGPU all the way to datacenter)
<davidlt>
it's a starting point for them
<raster>
hmmm
<davidlt>
one step before the big thing
<raster>
will be interesting to see
<davidlt>
and Gen 11 is like 100-130% faster (or more in some cases) according to the leaked benchmark
<davidlt>
(compared to previous generation iGPU)
<alyssa>
Y'all are noisy :p
<raster>
yup
<raster>
:)
<tomeu>
alyssa: I'm unsure how this should be fixed, as I'm missing some knowledge of what the design is like
<raster>
davidlt: well double+ is a good leap for team blue.
<tomeu>
alyssa: is the overwritten pointer supposed to be stored somewhere else?
<alyssa>
tomeu: Let me see
<alyssa>
It's been a while since I touched that code
<tomeu>
guess my first doubt is why we are allocating the levels with malloc, then allocating a BO for the AFBC
<alyssa>
tomeu: The overwritten pointer should be freed, I guess, and then we should check for afbc to decide whether to free() or not
<alyssa>
tomeu: Essentially though this is a bigger design issue, I guess -- the way it's set up, anything texture-like defaults to a tiled texture, but if you try to render into it, it turns itself into an AFBC resource
<alyssa>
That's.... probably wrong :p
<tomeu>
ok, I can do that, though I wonder if we shouldn't be delaying the allocation so we don't allocate something that doesn't end up being used
<tomeu>
ah, thought everything would be AFBC except possibly some render targets
<alyssa>
We probably should be, but how late do we delay?
<alyssa>
tomeu: The problem with making everything AFBC is we have no way to compress into AFBC from software
<tomeu>
probably until the same point in time where we allocate the AFBC buffer now?
<alyssa>
AFBC buffer is allocated from set_framebuffer_state
<alyssa>
You're uncovering a massive kludge of hacks right now :p
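For readers following along, the fix alyssa sketches would look roughly like the following; every name here is guessed for illustration and the real panfrost_resource layout differs:

```c
/* Rough sketch of the fix being discussed; every name is guessed and
 * the real panfrost_resource layout differs.  The point: release the
 * malloc'd storage when the resource is converted to AFBC, and make
 * the destroy path check which case it is in. */
#include <stdbool.h>
#include <stdlib.h>

struct fake_resource {
    bool  has_afbc;    /* set once it is converted to a render target */
    void *cpu_levels;  /* malloc'd tiled-texture storage */
    void *afbc_bo;     /* GPU buffer object, owned by the BO API */
};

static void resource_convert_to_afbc(struct fake_resource *rsrc,
                                     void *new_bo)
{
    /* free the overwritten pointer instead of leaking it */
    if (!rsrc->has_afbc) {
        free(rsrc->cpu_levels);
        rsrc->cpu_levels = NULL;
    }
    rsrc->afbc_bo = new_bo;
    rsrc->has_afbc = true;
}

static void resource_destroy(struct fake_resource *rsrc)
{
    /* check for AFBC to decide whether to free() or release the BO */
    if (rsrc->has_afbc) {
        /* release rsrc->afbc_bo through the real BO API here */
    } else {
        free(rsrc->cpu_levels);
    }
}
```

tomeu's alternative, deferring the allocation until the resource's role is known, would sidestep this dual ownership entirely.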
<HdkR>
I forget, is AFBC lossless or lossy?
<raster>
A
<HdkR>
and don't tell me "visually lossless" is lossless :P
<HdkR>
"The format preserves original image exactly (bit exact), and compression ratios are comparable to other lossless compression standards."
<alyssa>
HdkR: lossless :p
<HdkR>
Totally fair if you're converting resources to AFBC when they become RTs, as long as you can later sample from AFBC formats :D
<HdkR>
More of an issue if you couldn't sample from it later...
<tomeu>
alyssa: ok, maybe we should do some refactoring before fixing this
<tomeu>
alyssa: sorry if it looks like I'm patch-bombing too much today :)
<HdkR>
tomeu: You have a typo in your latest blog post. You called Bifrost Bitfrost :P
<raster>
frosty bits...
<raster>
:)
<raster>
alyssa: any reason you abort a clear if color is NULL?
<raster>
or well, complain about it and try to clear if it wasn't cleared during a partial render?
<raster>
it may be that it never clears at all for any reason :)
<raster>
so "!ctx->frame_cleared" in panfrost_flush() doesn't mean it'
<raster>
it's partial rendering
<raster>
if it never cleared, then panfrost_clear() will be a nop and so won't be doing what the comment above says :)
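A toy model of the ambiguity raster is pointing at, with invented names (in the real driver the flag is presumably reset when a partial render consumes the clear):

```c
/* Toy model of the failure mode described above; all names invented,
 * not the real panfrost code. */
#include <stdbool.h>
#include <stddef.h>

struct fake_ctx {
    bool         frame_cleared;    /* set by the clear path */
    const float *last_clear_color; /* NULL if no clear was ever issued */
};

static void fake_clear(struct fake_ctx *ctx, const float *color)
{
    if (!color)
        return;                    /* the "abort" raster asks about */
    ctx->last_clear_color = color;
    ctx->frame_cleared = true;
}

static void fake_flush(struct fake_ctx *ctx)
{
    /* !frame_cleared is ambiguous: it may mean "a partial render
     * already consumed the clear", or it may mean "this frame was
     * never cleared at all".  In the second case the replay below is
     * a silent nop, contradicting a comment that claims otherwise. */
    if (!ctx->frame_cleared)
        fake_clear(ctx, ctx->last_clear_color);

    /* ...build and submit the frame... */
}
```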
<alyssa>
tomeu: Flooding my inbox with patches is more than welcome! :)
<alyssa>
raster: Old bug, the code I pushed last night should fix that