Wizzup has quit [Ping timeout: 248 seconds]
kloczek has quit [Remote host closed the connection]
Wizzup has joined #linux-exynos
kloczek has joined #linux-exynos
nighty- has joined #linux-exynos
TheSeven has quit [Ping timeout: 255 seconds]
TheSeven has joined #linux-exynos
mszyprow has joined #linux-exynos
aballier has joined #linux-exynos
snawrocki has quit [Remote host closed the connection]
snawrocki has joined #linux-exynos
nighty- has quit [Quit: Disappears in a puff of smoke]
<memeka> mszyprow: ping
<mszyprow> memeka: pong
<memeka> mszyprow: so any idea why is that on kernel 3.10 playing video is so much faster than on 4.x?
<memeka> same rootfs
<memeka> issit MFC driver, or the V4L2 subsystem?
<mszyprow> memeka: have you checked cpu usage?
<mszyprow> memeka: maybe for some unknown reasons kernel is doing something completely insane like cache flushing on every frame
<mszyprow> memeka: there was such issue some time ago
<memeka> mszyprow: on 4.x kernel?
<mszyprow> when user ptr v4l2 mode was used
<mszyprow> yea
<memeka> ok, so... decoding is FAST
<memeka> i can't argue with that... it's not the decoding but the copying of buffers
<mszyprow> the other idea I have is to check if cpu freq is configured to the same values
<memeka> I traced gstreamer, and it was taking 20ms to decode 1 frame, and 24ms to copy
<memeka> so CPU usage is huge
<memeka> in 4.x
<memeka> single thread CPU 100% with 1080p frame
<mszyprow> okay, so you lose time mainly on copying the frames?
<memeka> and dropping frames, can't cope
<memeka> yup
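[editor's note] The two timings memeka quotes (20 ms to decode a frame, 24 ms to copy it) would be enough on their own to explain the dropped frames. The sketch below works through that arithmetic. It assumes the decode wait and the copy run back to back in a single thread, which the "single thread CPU 100%" remark suggests but the log does not confirm, and the 24 and 30 fps targets are illustrative assumptions, not values from the log.

```python
# Per-frame costs quoted in the log, in milliseconds (from tracing gstreamer).
DECODE_MS = 20.0
COPY_MS = 24.0


def max_fps(*stage_ms):
    """Upper bound on frame rate if the stages run serially in one thread."""
    return 1000.0 / sum(stage_ms)


def frame_budget_ms(fps):
    """Time available per frame at a given playback rate."""
    return 1000.0 / fps


serial_ms = DECODE_MS + COPY_MS
serial_fps = max_fps(DECODE_MS, COPY_MS)
print(f"decode + copy: {serial_ms:.0f} ms per frame, at most {serial_fps:.1f} fps")

# Assumed target rates (not stated in the log).
for target in (24, 30):
    budget = frame_budget_ms(target)
    verdict = "fits" if serial_ms <= budget else "too slow, frames get dropped"
    print(f"{target} fps needs <= {budget:.1f} ms per frame: {verdict}")
```

Under these assumptions the copy alone costs more than the decode, so removing or speeding up the copy is the larger win, which matches where the conversation goes next.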
<memeka> then i found some optimized memcpy in arm asm
<mszyprow> maybe the 3.10 kernel used some tricks to enable cache on the buffers
<memeka> LD_PRELOAD that, and it's ok
<mszyprow> this heavily improves cpu copying
<memeka> hm
<memeka> probably that's what memcpy does
<memeka> that optimized one i mean
<memeka> issit possible to enable that cache on 4.x?
<memeka> mszyprow: and i found another interesting thing
<memeka> profiling gstreamer, kodi and ffmpeg+mpv
<memeka> so both gstreamer & kodi were losing time on memcpy
<mszyprow> memeka: frankly I would first like to fix the zero copy path (dma buf issues) instead of hacking for enabling cpu cache on dma buffers
<memeka> like 80% CPU time = memcpy
<memeka> mszyprow: yeah assuming arm will ever publish a wayland driver with dmabuf :((
<mszyprow> copying data to uncached buffer IS time consuming
<memeka> so most of the other time, like 7% or something, was spent by some tiling function in the mali driver
<memeka> something like cobjp_neon_linear_to_block_8b_8x8
<memeka> now, profiling ffmpeg .... the results were opposite
<memeka> 70% CPU time in cobjp_neon_linear_to_block_8b_8x8
<memeka> 10% CPU time in cobjp_neon_linear_to_block_16b_8x8
<memeka> so that's 80% CPU time in the mali driver
<memeka> then just under 10% in memcpy
<mszyprow> it really depends how the memory is mapped to userspace
<memeka> so this means that the mali driver was importing the buffer, as opposed to gstreamer exporting the buffer?
<memeka> something like that?
<mszyprow> and different drivers / kernel versions might use different flags
<memeka> well here is the same kernel, same drivers, different userspace programs...
<memeka> i mean gst vs ffmpeg
<mszyprow> it looks then that ffmpeg is doing de-tiling internally, while gst does it by mali
<mszyprow> if I got it right
<memeka> the overall result being that using that optimized memcpy helped gstreamer, because it was copying the buffer with the "memcpy" function, and ffmpeg was not optimized because it was somehow relegating the memcpy to the driver
<memeka> i think the other way?
<mszyprow> gsc was copying tiled buffer to mali texture
<mszyprow> while ffmpeg was de-tiling it (during the copying?)
<memeka> looks like
<mszyprow> that's why memcpy replacement had no effect
<memeka> yup
<memeka> ok makes sense
<memeka> so basically you think that the reason 3.10 is better with videos is because it caches the buffers, and basically that's the same thing done by that optimized memcpy?
<memeka> CPU usage is a bit lower in 3.10 i think, but not by much
<mszyprow> nope optimized memcpy != using cache
<mszyprow> that's something completely different
<mszyprow> probably both can be even used together to have even higher boost
<mszyprow> I assume that you have compared the hardkernel's v3.10 kernel?
<memeka> "a prefetch distance of 4 cache-lines works best experimentally"
<mszyprow> but the buffers are still not cached
<memeka> interesting :)
<mszyprow> but neon with some hacks might be close to simple cached access
<memeka> well they are close
<memeka> so i get 70-80% total A15 (out of 400) on 4.x with this thing
<memeka> and in 3.10 it's something like 60% without it
<memeka> (on 1080p video => 2MB buffers i think)
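[editor's note] memeka's "2MB buffers" estimate can be checked with a quick calculation. The sketch below assumes the decoder output is NV12 (a full-resolution luma plane plus an interleaved CbCr plane subsampled 2x2), which is the format mentioned later in the log but is not confirmed for this test. It ignores any alignment padding or tiling the driver may add, and the 30 fps rate is an illustrative assumption.

```python
WIDTH, HEIGHT = 1920, 1080  # 1080p frame
ASSUMED_FPS = 30            # illustrative playback rate, not from the log


def nv12_plane_sizes(width, height):
    """Return (luma_bytes, chroma_bytes) for an unpadded NV12 frame."""
    luma = width * height          # one byte per pixel
    chroma = width * height // 2   # interleaved CbCr at quarter resolution
    return luma, chroma


luma, chroma = nv12_plane_sizes(WIDTH, HEIGHT)
total = luma + chroma

print(f"luma plane:   {luma} bytes ({luma / 1e6:.2f} MB)")
print(f"chroma plane: {chroma} bytes ({chroma / 1e6:.2f} MB)")
print(f"whole frame:  {total} bytes ({total / 1e6:.2f} MB)")
print(f"copy traffic at {ASSUMED_FPS} fps: {total * ASSUMED_FPS / 1e6:.1f} MB/s")
```

With these assumptions the luma plane alone is about 2 MB, which is consistent with the figure in the log if it refers to one plane, and every copied frame moves roughly 3 MB through memory.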
<memeka> so to allocate the memory as cached .... this needs to be done by v4l2? or by MFC driver?
<memeka> the difference is huge anyway .... from unplayable 1080p with 1 A15 core @ 100% to smooth 1080p with total A15 cores summed up to 80% :)
<memeka> or to 60% :P
<mszyprow> v4l2
<mszyprow> but still, this is only my hypothesis
<mszyprow> I didn't look at the source code
<memeka> mszyprow: i think you are very correct
<mszyprow> if I remember correctly v3.10 kernel used custom memory handling for any video/media drivers
<mszyprow> which in the end might enable caching on userspace buffers
<memeka> so MFC driver was not using the v4l2 mem alloc functions?
<memeka> should i look there and try to port the way it's doing memory allocation?
<mszyprow> memeka: it makes no sense to port that mess, really
<mszyprow> memeka: maybe adding proper support for cached buffers would be a better idea
<memeka> well, it would make a lot of people happy to have caching enabled :D
<mszyprow> my_todo_list++
<memeka> mszyprow: if you would add that on your to-do list i would be very grateful and offer to test every change on a LOC :)
<memeka> i know from the odroids forums there are a lot of 3.10 users that don't update to 4.x for this exact reason :(
<mszyprow> fixing IOMMU support and switching to malloc+user ptr should give the same result
<memeka> and i spend quite some time debugging if it's MFC, mali drivers, or something else :))
<memeka> after fixing the perf counters i could do some proper debugging :))
<memeka> i thought IOMMU is working :P
<mszyprow> well, IOMMU is working, but MFC causes page faults ;)
<mszyprow> you have already reported it
<memeka> well i thought it's the DRM dmabuf-import
<memeka> since MFC can export and GSC can import dmabuf ok
<memeka> and it was GSC to DRM actually, since DRM can't import the output of MFC
<memeka> (there was a patch to allow NV12/NV21 on DRM, i can test to export from MFC but i think the result will be similar)
<mszyprow> DRM in Exynos5422 can use NV12/21 at all
<mszyprow> the patch was for Exynos4210/4412
<memeka> nope it can't
<memeka> it reports that it can, then fails
<mszyprow> in case of Exynos5422 one has to add support for the so-called local path between GSC and Mixer/HDMI
<mszyprow> not yet implemented
<mszyprow> sorry, I mean Exynos5422 CAN'T use NV12 yet
<mszyprow> that's also on our mainline todo list...
<memeka> yes that's what i'm saying :P
<memeka> need to send MFC -> GSC -> DRM
<memeka> and i could do dmabuf import in GSC from MFC
<memeka> but then crash on dmabuf import in DRM from GSC
<mszyprow> yes, but there is a special mode in which GSC becomes a part of display path (so basically a DRM)
<mszyprow> but not yet supported :(
<memeka> ah
<memeka> is that really useful, for windowed environments?
<mszyprow> one can easily implement typical video overlay plane with it
<memeka> "easily" :)
<mszyprow> sadly, we are busy with something not mainline related now :(
<memeka> :(
<memeka> mszyprow: i have to be off to bed, have a very early start tmr
<memeka> thanks for your thoughts
<memeka> and pls put caching buffers on your list (on top :P) :)
<mszyprow> have a good night then!
<memeka> mszyprow: https://patchwork.kernel.org/patch/9392453/ would this be the way?
<mszyprow> memeka: yes, and one more hack for DMA_ATTR_NON_CONSISTENT attribute support in arch/arm/dma-mapping.c
<memeka> what hack? if it's really fast, i can try before i can go to slp :D
<mszyprow> and one has to set DMA_ATTR_NON_CONSISTENT in buf->attrs
<memeka> yeah in MFC, that's easy
<memeka> what's the issue with arch/arm/dma-mapping.c ?
<memeka> mszyprow: q->dma_attrs = DMA_ATTR_ALLOC_SINGLE_PAGES | DMA_ATTR_NON_CONSISTENT | DMA_ATTR_NO_KERNEL_MAPPING;
<memeka> would this do
<memeka> ?
<mszyprow> in mfc? yes
* mszyprow is looking for the changes needed in dma-mapping.c
<memeka> thanks
<mszyprow> https://review.tizen.org/gerrit/gitweb?p=platform/kernel/linux-exynos.git;a=commit;h=b3094bac66a84ade60c62909735d09626a75edc3
<mszyprow> (not sure if it works without login)
<memeka> doesn't work w/o login :(
<memeka> Not Found
<mszyprow> memeka: check your mail then
<mszyprow> memeka: this is hack
<mszyprow> memeka: but you might easily check if it helps to reduce cpu usage
<mszyprow> memeka: arm maintainer rejected this approach many times
<memeka> thanks
<mszyprow> btw, it might be a good idea to register on tizen.org
<mszyprow> there is quite a lot of our exynos related stuff there
<memeka> i actually have account it seems
<memeka> from a few years back :))
<mszyprow> and it looks that anonymous git access is working, git clone git://git.tizen.org/platform/kernel/linux-exynos
<memeka> but initially i got "Not Found"
<memeka> now it works
<mszyprow> maybe it needs login for the first access
<mszyprow> to set cookies, etc
<memeka> it works now.... it took a while to load all the repos
<memeka> testing now if it reduces cpu :D
<mszyprow> which gst plugin provides "v4l2video0dec" element?
<mszyprow> I got ERROR GST_PIPELINE grammar.y:816:priv_gst_parse_yyparse: no element "v4l2video0dec"
<memeka> YAY
<memeka> it's gst-plugins-good
<memeka> 60% CPU usage on 1080p video, and what looks like good framerate (well, reported via SSH)
<memeka> yay yay
<memeka> thanks mszyprow! I'll check tomorrow to see stuff is actually displayed :) but it looks like it's working!
<mszyprow> :)
<memeka> time to sleep now, it's almost tomorrow
<mszyprow> have a good night then! now you can sleep peacefully ;)
<memeka> yeah :D
genii has joined #linux-exynos
nighty- has joined #linux-exynos
_whitelogger_ has joined #linux-exynos
_whitelogger_ has quit [Ping timeout: 246 seconds]
_whitelogger has joined #linux-exynos
_whitelogger has joined #linux-exynos
_whitelogger has quit [Ping timeout: 258 seconds]
_whitelogger_ has joined #linux-exynos
mszyprow has quit [Ping timeout: 248 seconds]
willmore has quit [Read error: Connection reset by peer]
willmore has joined #linux-exynos
Putti has joined #linux-exynos
<Putti> Hi, thought to come here to ask whether anyone knows how to get a serial console showing up on the Samsung Galaxy I9305, which has the Exynos 4412 SoC? I'm using the linux-next tree. And I actually tried searching this channel's IRC logs but found just me asking this very same question one year ago :D
<Putti> If I understood right wiewo got the UART working with this git tree: https://code.fossencdi.org/kernel_i9300_mainline.git but I'm having the I9305 version instead of I9300 and it doesn't seem to work on it.
Putti has quit [Remote host closed the connection]
Putti has joined #linux-exynos
putti_ has joined #linux-exynos
Putti has quit [Ping timeout: 248 seconds]
putti_ has quit [Remote host closed the connection]
putti_ has joined #linux-exynos
putti_ is now known as Putti
Putti has quit [Ping timeout: 248 seconds]
Putti has joined #linux-exynos
Vasco_O is now known as Vasco
prahal_odroid has quit [Remote host closed the connection]
Putti has quit [Ping timeout: 248 seconds]
Putti has joined #linux-exynos
Putti has quit [Ping timeout: 248 seconds]
Putti has joined #linux-exynos
Putti has quit [Remote host closed the connection]
Putti has joined #linux-exynos
Putti has quit [Ping timeout: 248 seconds]
genii has quit [Quit: Beer has been poured, and I have been summoned. https://youtu.be/dwpqiaWKPkQ]