ChanServ changed the topic of #linux-rockchip to: Rockchip development discussion | Wiki at | Logs at | ML at
jwerner has quit [Client Quit]
jwerner has joined #linux-rockchip
nighty has joined #linux-rockchip
wzyy2 has joined #linux-rockchip
cnxsoft has joined #linux-rockchip
indy has quit [Ping timeout: 256 seconds]
indy has joined #linux-rockchip
JohnDoe_71Rus has joined #linux-rockchip
beeno has joined #linux-rockchip
geekerlw has joined #linux-rockchip
lkcl has joined #linux-rockchip
IgorPec11 has joined #linux-rockchip
JohnDoe0 has joined #linux-rockchip
JohnDoe_71Rus has quit [Ping timeout: 240 seconds]
lkcl has quit [Ping timeout: 264 seconds]
IgorPec11 has quit [Ping timeout: 260 seconds]
lkcl has joined #linux-rockchip
geekerlw has quit [Quit: Page closed]
<LongChair> stdint: i made some further investigations
<LongChair> and i'm wondering something
<LongChair> if vpu decodes 4K frames in NV12
<LongChair> that makes roughly 3840*2160 * 1.5 bytes per frame, right ?
<stdint> right
<LongChair> which is roughly 12.4 Megs per frames
<LongChair> at 60 fps that would make 746 Megs /s
<LongChair> and the Tinker mem bandwidth is roughly 850 megs /s
<LongChair> that doesn't include the input data bandwidth
<LongChair> given that i would say that there is no chance it can decode & render that as VOP would also consume about the same bandwidth
<LongChair> is that a valid thinking ?
wadim_ has joined #linux-rockchip
<stdint> LongChair, you forget the MV data
<LongChair> MV data ?
<stdint> it is not 12.4Megs per frame
<LongChair> it's more you mean ?
<stdint> it is about 16.5 Megs per frame
<LongChair> that exceeds the mem bandwidth ... 990 M/s
<stdint> also the vop won't consume so much, it depends on the resolution of the target screen
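The frame-size arithmetic in the exchange above can be reproduced with a short Python sketch. It assumes 8-bit NV12 (1.5 bytes per pixel) and decimal megabytes; the 16.5 MB per-frame figure that includes MV data is taken from stdint's message and is not derived here.

```python
# NV12 stores a full-resolution 8-bit luma plane plus a half-size
# interleaved chroma plane, which works out to 1.5 bytes per pixel.
WIDTH, HEIGHT = 3840, 2160
FPS = 60
FRAME_WITH_MV_MB = 16.5  # per-frame size quoted by stdint (includes MV data)


def nv12_frame_bytes(width, height):
    """Size in bytes of one 8-bit NV12 frame, pixel data only."""
    return width * height * 3 // 2


def write_rate_mb_per_s(frame_bytes, fps):
    """Sustained write rate in decimal MB/s needed to store fps frames."""
    return frame_bytes * fps / 1e6


pixel_bytes = nv12_frame_bytes(WIDTH, HEIGHT)
print(f"pixels only : {pixel_bytes / 1e6:.2f} MB/frame, "
      f"{write_rate_mb_per_s(pixel_bytes, FPS):.1f} MB/s at {FPS} fps")
print(f"with MV data: {FRAME_WITH_MV_MB:.2f} MB/frame, "
      f"{write_rate_mb_per_s(FRAME_WITH_MV_MB * 1e6, FPS):.1f} MB/s at {FPS} fps")
```

These numbers only cover the decoder writing finished frames; reference-frame reads and the compressed input stream are not counted.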
<LongChair> so i have two possibilities
<LongChair> 1 - find a way to OC the ram ... i looked yesterday, but could not find where that clock is set
<LongChair> 2 - decode to 1080p frames ... but dunno if vpu output frame size can be specified
<LongChair> stdint : any clues on any of those two topics
<stdint> LongChair, about the topic 2, yes it certainly is
paulk-collins has joined #linux-rockchip
<stdint> you could set the output resolution in drm
<LongChair> well that won't change the memory bandwidth
_whitelogger has joined #linux-rockchip
<LongChair> that would reduce the bandwidth by like 4
<LongChair> because i don't see how drm would influence the frame size that vpu will output ...
<stdint> LongChair, no, the vop scaling won't read all the stride
<stdint> so the bandwidth would decrease
<LongChair> at this point i am not using output, i still have problem with decoder and the memory bandwidth it will use
<LongChair> so VOP is not in the picture yet
<LongChair> i'm talking VPU & mpp
<stdint> "render that as VOP would also consume about the same bandwidth"
<LongChair> yeah i understand
<LongChair> but if vpu already consumes all the memory bandwidth that will not change anything
<LongChair> and i think that is what happens
mac-l1 has joined #linux-rockchip
<stdint> that is why there is a output queue
<stdint> the display and decoding are async
<LongChair> so can i with mpp specify the size of the frames that vpu will generate ?
<stdint> LongChair, you can't
<stdint> or you can't decode all the MB at once
<LongChair> MB ?
<LongChair> do you agree that VPU will generate 16.5 MB frames in memory and will then be limited by the memory bandwidth ? (even without doing any display, so no VOP)
<LongChair> the only way to overcome this would then be to allow to specify the size of the frames that are coming out of VPU. you could read a 4K stream and decode it into 1080p frames
_whitelogger has joined #linux-rockchip
<stdint> LongChair, Macroblock
<LongChair> i have no idea what that is
<stdint> LongChair, you can't do that
<LongChair> ok then 4K playback will be limited in terms of fps
<LongChair> roughly to 24/30 fps
<LongChair> it's not possible to get over this because of memory bandwidth limitation
_whitelogger has joined #linux-rockchip
<mac-l1> stdint: hi guys. i believe the vpu hw has postprocessing capabilities that can downscale the output frame.
<mac-l1> those are not implemented in current mpp: see
<stdint> mac-l1, yes it is
<LongChair> because the mem bandwidth i'm getting on a S905 device is even lower ... like 700 M /s
<stdint> mac-l1, but it is not used for decoding
<stdint> just for display
<mac-l1> ah, ok.
<stdint> and only JPEG decoder may use it
<LongChair> mac-l1: i think i understood what was happening with 4K decoding
<stdint> as I am writing it
<LongChair> and that's not super good news ...
<stdint> if you scale the output frame, it would break the frame prediction
<stdint> but I know vp9 supports it, but the rk3288 doesn't support vp9
<LongChair> stdint: AFBC is not something that is supported right ?
<stdint> AFBC?
<LongChair> ARM FrameBuffer Compression
<stdint> I am not sure about that, it is about display system
<LongChair> i know Mali supports it .. but in our case, would require VPU and VOP to support this
<LongChair> it divides the frame size by about 2
<stdint> no it doesn't
_whitelogger has joined #linux-rockchip
<LongChair> like can you run `hdparm -T /dev/mmcblk0` on RK3399 ?
<stdint> I don't have rk3399 running Linux currently
<LongChair> ok
paulk-collins has quit [Remote host closed the connection]
<stdint> you may ask paulk-collins later
<stdint> LongChair, also all the rk3399 are using a 32 bit userspace now
<amstan> stdint: archlinuxarm uses 64 bit userspace on 3399
<amstan> works pretty well
<stdint> anyway, what I got at rk3288 tinker is Timing cached reads: 1794 MB in 2.00 seconds = 897.21 MB/sec
<amstan> haven't tried much graphics, though i hear it works too
<stdint> amstan, just internally don't use it
<LongChair> stdint: yeah i'm getting the same roughly here
<LongChair> but 897M / 16.5 = 54 fps
<LongChair> only decoding
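The 54 fps estimate above can be sketched in Python as follows. The sketch parses the `hdparm -T` result line quoted by stdint and divides by the 16.5 MB per-frame figure. It assumes, as LongChair does at this point, that the hdparm number is the ceiling for decoder writes; later in the log wzyy2 argues that it only reflects the CPU's path to memory, so treat the result as that assumption's consequence rather than a hardware limit.

```python
import re

# Result line quoted in the log for the rk3288 Tinker board.
HDPARM_LINE = "Timing cached reads: 1794 MB in 2.00 seconds = 897.21 MB/sec"
FRAME_MB = 16.5  # per-frame size including MV data, as quoted by stdint


def parse_hdparm_rate(line):
    """Extract the MB/sec figure from an hdparm timing line."""
    match = re.search(r"=\s*([\d.]+)\s*MB/sec", line)
    if match is None:
        raise ValueError("no MB/sec figure found in: " + line)
    return float(match.group(1))


def max_fps(rate_mb_per_s, frame_mb):
    """Frames per second that fit in the given write rate."""
    return rate_mb_per_s / frame_mb


rate = parse_hdparm_rate(HDPARM_LINE)
print(f"{rate:.2f} MB/s / {FRAME_MB} MB per frame = "
      f"{max_fps(rate, FRAME_MB):.1f} fps")
```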
wzyy2 has quit [Ping timeout: 240 seconds]
<stdint> LongChair, 1190.81 MB/sec in 32 bit system
<LongChair> pretty nice
<LongChair> stdint: by the way, could you ask someone if there is a way to try OCing the RAM. I found some stuff on older boards in the forums, but the tinker dts files don't seem to have the same stuff
<stdint> LongChair, the clock parameters are in a dts file
<stdint> in the u-boot
<LongChair> i looked both at the miniarm dts and the rk3288.dtsi but didn't find anything
<stdint> LongChair, search rockchip,pctl-timing
<stdint> LongChair, not only clock all the parameters have to be adjusted
<LongChair> do we know what they are ?
cnxsoft1 has joined #linux-rockchip
cnxsoft has quit [Ping timeout: 246 seconds]
cnxsoft1 is now known as cnxsoft
ganbold has joined #linux-rockchip
nighty has quit [Quit: Disappears in a puff of smoke]
matthias_bgg has joined #linux-rockchip
wzyy2 has joined #linux-rockchip
<ayaka> LongChair, it depends on whether you are familiar with DMC
<LongChair> DMC ... not sure what that is
<wzyy2> HI, Longchair, which kernel do you use? tinker or rockchip-linux?
<LongChair> rockchip linux is what we're one
<LongChair> on*
<wzyy2> Tinker kernel have ddr dvfs and rockchip-linux don't have.
<LongChair> what is that ?
<LongChair> googling says it's ram freq adjustment or something like that right ?
lkcl has quit [Ping timeout: 240 seconds]
<LongChair> @wzyy2 : so that allows dynamic voltage and frequency scaling ... hmmm
<LongChair> so that would allow to lower ram freq & voltage in certain situation ... can the lpddr3 run higher than 533Mhz ?
<wzyy2> I haven't tried overclocking ddr.
<wzyy2> You can change the value in dts.
<wzyy2> - - Maybe it can work....
nighty has joined #linux-rockchip
<LongChair> how is the operating point determined ?
<wzyy2> = =
<LongChair> yeah i mean there are 3 freqs, what does each one of these freqs correspond to ? it's usually associated to some state
<LongChair> like when is 200Mhz used ... when is 333Mhz used, and 533Mhz ?
<wzyy2> 200000 is freq, 1000000 is voltage(In tinker board, it's fixed).
<LongChair> yes i got that
<LongChair> but there are 3 operating points
<LongChair> 200/333/533 Mhz for ram freq
<LongChair> how does it switch between these points
<wzyy2> just like cpu.
<wzyy2> If system is busy, it will switch to high freq.
<wzyy2> If I remember correctly, it uses the "simple_ondemand" policy.
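One way to watch which operating point is active is to read the devfreq sysfs attributes while a video is decoding. The Python sketch below is only an illustration: `simple_ondemand` is a standard Linux devfreq governor name, but whether the Tinker kernel exposes its DDR DVFS through `/sys/class/devfreq`, and under which device name, is an assumption not confirmed in this log.

```python
from pathlib import Path

# Standard devfreq sysfs attributes; the DDR device name is board specific.
ATTRS = ("governor", "cur_freq", "available_frequencies")


def read_devfreq(root="/sys/class/devfreq"):
    """Return {device: {attribute: value or None}} for each devfreq device."""
    result = {}
    base = Path(root)
    if not base.is_dir():
        return result  # kernel built without devfreq, or nothing registered
    for dev in sorted(base.iterdir()):
        info = {}
        for attr in ATTRS:
            try:
                info[attr] = (dev / attr).read_text().strip()
            except OSError:
                info[attr] = None  # attribute missing or unreadable
        result[dev.name] = info
    return result


if __name__ == "__main__":
    for name, info in read_devfreq().items():
        print(name, info)
```

Polling `cur_freq` during playback would show whether the memory clock actually reaches the highest of the three operating points.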
<LongChair> ok
wzyy2 has quit [Ping timeout: 260 seconds]
paulk-collins has joined #linux-rockchip
stdint has quit [Ping timeout: 260 seconds]
stdint has joined #linux-rockchip
wzyy2 has joined #linux-rockchip
<wzyy2> I read your chat log.
<wzyy2> I think you have confused cpu ram write speed with ddr bandwidth.
<wzyy2> cpu ram write speed : 990 MiB/s
<wzyy2> ddr bandwidth: 533 * 2 * 8 = 8.32 GiB/s
<wzyy2> Usually 4k video decode use up to 2 GiB/s memory bandwidth.
<wzyy2> I think ddr bandwidth is not the limitation.
<wzyy2> If you suspect ddr bandwidth, you can test performance with xserver closed which will stop using gpu and leave some bandwidths.
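wzyy2's peak figure can be checked with the Python sketch below. It uses the factors exactly as given (533 * 2 * 8) and assumes the product is read as MiB/s, which is what makes it come out at about 8.32 GiB/s. The 2 GiB/s decode budget is the rule of thumb from the same message. This is a theoretical peak; it says nothing about latency or arbitration between bus masters.

```python
# Factors as quoted by wzyy2: clock in MHz, then "* 2 * 8".
DDR_CLOCK_MHZ = 533
FACTOR_A = 2
FACTOR_B = 8
DECODE_BUDGET_GIB = 2.0  # "up to 2 GiB/s" for 4K decode, per wzyy2


def peak_mib_per_s(clock_mhz, a, b):
    """Theoretical peak, treating MHz * bytes per transfer as MiB/s."""
    return clock_mhz * a * b


peak = peak_mib_per_s(DDR_CLOCK_MHZ, FACTOR_A, FACTOR_B)
peak_gib = peak / 1024
share = DECODE_BUDGET_GIB / peak_gib

print(f"peak: {peak} MiB/s = {peak_gib:.2f} GiB/s")
print(f"4K decode budget uses about {share:.0%} of that peak")
```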
<LongChair> @wzyy2 : those figures are not what we're getting with hdparm -T /dev/mmcblk0
<LongChair> so that might be cpu limitation then
<wzyy2> It's cpu memory read/write speed, not vpu.
<phh> that doesn't really feel like rk3288 ;)
<phh> but on there, the CPU memory width is as large as VPU's
<phh> oh, but RAMs are 4x128 and cpu is only 1x128
<phh> ok
<phh> but then, you need to have the VPU hit a different DMC than VOP, how is this done?
<wzyy2> - - 1 GiB/s is reasonable, i think.
<phh> well there is no board with 4k60 decoding there ;)
<phh> wzyy2: but according to the scheme you gave, CPU, VPU and VOP has the same bandwidth. it could indeed not be shared, so it would be ok. still, for 4k60, it needs ~ 950MB/s on the bus, and CPU on the same bus has 880
<phh> LongChair: can you check with mbw?
<phh> possibly actual bandwidth is higher, just hdparm is not reliable enough
<wzyy2> It's not rk3288...
<wzyy2> It's arm's a73 reference design.
<phh> ok
<phh> (when do we get firefly-rk3400 with cortex-a73, mali g71? ;) )
LongWork has joined #linux-rockchip
<wzyy2> CPU have more stages to access memory than other ip, which limit its speed.
<wzyy2> It seems rk3399 is not selling well, so i guess rockchip will not be interested in high performance chips for now.. ; )
<phh> pfff, I've ordered two rk3399 (firefly and chromebook plus) and still haven't received anything
<wzyy2> Market capacity is too small, high-performance chips are mostly used for mobile phones, not tv boxes, tablets, industry.
<LongWork> @wzyy2 : i'm interested in investigating all those performance issues
<LongWork> i suppose it's very late where you are
<wzyy2> not late, 9:30
<LongWork> ok, so you're saying that RAM has a BW of 8 GB/s
<LongWork> then cpu has only 990M/s
<LongWork> but when decoding a 4K video, the only thing that goes from/to cpu is pushing the packets so that should not be significant
<wzyy2> yep
<LongWork> when i do just pure VPU decoding with my code
<LongWork> VPU will generate 4K frames which ayaka said were roughly 16.5 M / frame
<LongWork> i'm trying to reach 60 fps on a few videos
<LongWork> so that would be about 990M/s coming out of VPU->RAM
<LongWork> I have clocked the VPU for HEVC to 600Mhz
<LongWork> and i would expect this to allow to reach that framerate on HEVC/main10/at about 60 mbits
<LongWork> and i have hard time to reach that
<LongWork> for some reason in my setup with last release kernel, there is still the same dw_hdmi issue not being able to set the 4K frequencies
<wzyy2> just revert it - -...
<LongWork> yeah ... but the issue is still there :)
<wzyy2> rockchip-linux kernel don't support 4k 60hz for rk3288..
<LongWork> the weird part is that i don't set any mode, and i have a 1080p AVR in between so i wouldn't expect it to pick a 4K mode as AVR should not return an EDID with 4K modes
<LongWork> most of the other devices i have will remain in 1080p mode when i go thru the avr
<LongWork> they will show 4K only if i plug the device directly to tv
<LongWork> anyways, not sure what happens there, but if i have the hdmi cable plugged in, the hdparm measurement will be 650M/s
<LongWork> if i unplug the cable, it goes up to 850 M/s
<wzyy2> Have you tested it with xserver closed?
<LongWork> yes
<LongWork> we don't have even xserver on our LE build
<wzyy2> ok
<LongWork> then when i do pure decoding, i get a better decoding perf with cable unplugged, than cable plugged
<LongWork> so more likely something is eating the bandwidth, or it's lower than what we think
<wzyy2> ...I think i found the reason.
<wzyy2> check this patch.
<LongWork> i will tonite
<LongWork> but if we did have 8 GB/s 4K@60 wouldn't be a problem
<LongWork> at least for decoding
<wzyy2> It seems nickey set a high qos level for vop.
<LongWork> it would be limited by VPU performance
<wzyy2> 8 GB/s is not that much.
<LongWork> is that GigaBit or GigaBytes ?
<wzyy2> GigaBytes
<LongWork> 4K@60 is anyways 990M/s .. so that is what VPU will use to write the frames into memory, then VOP would need to read those with zero copy, so would be roughly 2GB/s of bandwidth
<LongWork> which is 1/4th of what we should have available so should be easy ;)
<LongWork> what would this patch do btw ?
<LongWork> i mean it seems to set some priority to VOP
<LongWork> but what would that change
<LongWork> in my case i don't even do any display yet
<phh> new
<phh> fail
<wzyy2> It's more complex than you think... I'm not familiar with those bandwidth things, it's the SOC architect's job, when they design the SOC, they will consider it.
<phh> wzyy2: yes but perhaps we're missing some information that might show that something is not done properly somewhere
<wzyy2> VOP might use a fixed bandwidth even if you don't do any display.
cnxsoft has quit [Quit: cnxsoft]
<ayaka> the future SoC used for tv box have a better performance in video
<phh> rk3328? +10?
<ayaka> not like wzyy2 said, they just having a weak cpu and less interfaces
<wzyy2> But less performance in gpu and cpu.
<ayaka> rk3228
<ayaka> video doesn't need powerful GPU or VPU
<ayaka> cup
<ayaka> cpu
<phh> yes but it's better to have both ;)
<phh> (ok, not really for a tvbox)
<ayaka> if you don't play game, dynamic drawing capability is not necessary
pro777 has joined #linux-rockchip
pro777 has quit [Client Quit]
andoma_ is now known as andoma
<LongWork> wzyy2: thanks i'll start with removing that
<LongWork> wzyy2: like phh said, we might be missing something. From what you say with such bandwidth that should be no problem.
<phh> perhaps it's not ram :s
<LongWork> and ayaka said that the VPU was more powerful than i though :)
<LongWork> thought
<LongWork> so if ram is alright and vpu is powerful that should be no problem ... but that's not exactly what i'm seeing
<phh> well, I think VPU has a mode where it just outputs a test target, perhaps using it would help there?
<ayaka> I don't know what you mean
<phh> I mean a mode where it takes no input at all, but does output to video buffer.
<phh> well i don't see anything like this in the trm
<phh> LongChair: perhaps you can try to make a file from a black screen?
<LongWork> i dunno ... but there are probably some already existing test files
<LongWork> i wish there was a way to measure VPU usage
<LongWork> or some profiling tools to see what's happening
<ayaka> it is not possible
<ayaka> there are two kinds of things that would affect the VPU
<ayaka> Macroblock prediction and memory
<ayaka> increasing the frequency of the VPU would speed up the logic work
<LongWork> i'm already at 600M there
<LongWork> that's the last enum value ... not sure if i can go above that :)
<ayaka> but now the memory becomes the other problem
<ayaka> if the MB or I would say the sequence of frames are not too complex
<LongWork> yeah what makes me wonder is that even if we measure a lower bandwidth from CPU, it's not meaningful from what wzyy2 said. CPU has a lower bandwidth to ram than VPU
<ayaka> increasing the frequency won't help a lot
<LongWork> if VPU has 8GB/s roughly that shouldn't be a problem
<LongWork> unless something else on the BUS uses the memory bandwidth
<ayaka> who said that
<LongWork> who said what ?
<LongWork> the 8GB/s ?
<LongWork> wzyy2 did
<LongWork> he said 533Mhz * 2 * 8. (2 channels, 8 bytes 64 bits data width)
<LongWork> he also said that the CPU bandwidth to ram was lower .. around 990 M/s
<LongWork> but that shouldn't matter for decoding
<ayaka> that number is not correct
lkcl has joined #linux-rockchip
<ayaka> it is the bandwidth to ram controller
<ayaka> but there are some latency coefficient
<LongWork> i'm not an hardware expert
<LongWork> regarding VPU, is rockchip making the code that goes into it to decode or is it like built in the hardware ?
ganbold has quit [Ping timeout: 246 seconds]
ganbold has joined #linux-rockchip
<ayaka> which code
<LongWork> ayaka: i'm wondering how the guys in charge of that deal with such things
<LongWork> the vpu run some code to decode the streams right ?
<LongWork> like the codes that handles HEVC / H264 stream decoding
<ayaka> oh, it is completely run in cpu
<ayaka> I also write a part of them
<ayaka> you would see those commit in mpp
<ayaka> the rest is hardware
<ayaka> there is no firmware, all the thing is open source
<phh> LongChair: yes rockchip is amongst the last people to do real hw decoding, not dsp decoding
<ayaka> no, not just us
<LongWork> so how you guys check the vpu performance then ?
<ayaka> nothing beyond what we have done but it in android
<LongWork> hmmm
<LongWork> i dunno really where to look at ....
<LongWork> phh: you did mention another tool earlier ?
<phh> LongChair: well, that's mbw, but that's to better measure the actual cpu bandwidth
wzyy2_ has joined #linux-rockchip
wzyy2 has quit [Ping timeout: 258 seconds]
wzyy2_ has quit [Read error: Connection reset by peer]
bertje__ has quit [Quit: bertje__]
wzyy2_ has joined #linux-rockchip
wzyy2_ has quit [Read error: Connection reset by peer]
lkcl has quit [Ping timeout: 260 seconds]
wadim_ has quit [Remote host closed the connection]
wzyy2 has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]
wzyy2 has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]
wzyy2 has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]
wzyy2 has joined #linux-rockchip
fireglow has quit [Quit: Gnothi seauton; Veritas vos liberabit]
fireglow has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]
Mine has joined #linux-rockchip
Mine has quit [Client Quit]
wzyy2 has joined #linux-rockchip
wzyy2 has quit [Read error: Connection reset by peer]
beeno has quit [Ping timeout: 260 seconds]
beeno has joined #linux-rockchip
wzyy2 has joined #linux-rockchip
wzyy2 has quit [Read error: No route to host]
JohnDoe0 has quit [Quit: KVIrc 4.9.2 Aria]
beeno has quit [Ping timeout: 240 seconds]
afaerber has joined #linux-rockchip
akaizen has joined #linux-rockchip
vagrantc has joined #linux-rockchip
afaerber has quit [Remote host closed the connection]
BenG83 has joined #linux-rockchip
bertje__ has joined #linux-rockchip
bertje__ has quit [Client Quit]
paulk-collins has quit [Quit: Leaving]
nighty has quit [Remote host closed the connection]