<oliv3r>
ssvb: i'll try lcd + hdmi dual-head then; but at least that means the hardware is capable
<oliv3r>
ssvb: what I wonder is, with these tablets showing off 'hdmi port included!!', does it work on any of the devices?
<ssvb>
oliv3r: well, it only worked for me if vga was assigned to /dev/fb0 and hdmi to /dev/fb1, but not the other way around
<ssvb>
the code is quite fragile and buggy
<ssvb>
also I don't know what kind of hdmi port is included in these supposed a13 tablets (if you are using one), maybe it needs its own support code in the kernel driver
<ssvb>
but at least dual-head seems to work in some configurations on a10 if configured via fex file
<wingrime>
I found 2 things I can fix with dma
<oliv3r>
ssvb: and extremely messy :p
<oliv3r>
ssvb: A10
<oliv3r>
ssvb: do you think the video driver can be 'fixed' or will it really need a complete rewrite?
<oliv3r>
also, while going over the disp_mode section in the fex guide, initially I just copied the original part of the wiki and had no clue what it meant. Now I think I at least know mode 0 and mode 1: both use /dev/fb0, and you either enable output 0 or 1. Mode 4 sounds like 'clone' mode
<oliv3r>
mode 2, 'dual head'
<oliv3r>
mode 3 is 'mysterious', as it's probably not documented properly there
<ssvb>
hmm, I have not checked mode 3, but it might be one big framebuffer which spans both monitors
<ssvb>
I guess it would be reasonable to first try it with both monitors having the same resolution
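For context, the dual-head setup being discussed is selected through the [disp_init] section of the fex file. A sketch of what such a configuration might look like (the values are illustrative, not taken from any particular device; the disp_mode meanings follow this discussion, with mode 3 still uncertain):

```
[disp_init]
disp_init_enable = 1
disp_mode = 2             ; 0 = screen0 only, 1 = screen1 only, 2 = dual head, 4 = clone
screen0_output_type = 1   ; 1 = LCD
screen1_output_type = 3   ; 3 = HDMI
screen1_output_mode = 4   ; HDMI mode index; see the fex guide for the resolution table
```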
vicenteH has quit [Ping timeout: 248 seconds]
<wingrime>
oliv3r: funny
<wingrime>
oliv3r: sdhost uses its own strange dma
<wingrime>
ssvb: I need your help
<wingrime>
ssvb: I need someone who can measure nand speed with/without my patch for dma
<wingrime>
ssvb: same for ethernet
<wingrime>
oliv3r: you wanted
vicenteH has joined #linux-sunxi
<wingrime>
hramrach:
mdfe has joined #linux-sunxi
hipboi has joined #linux-sunxi
<oliv3r>
ssvb: i was just running outside, and was thinking the same thing. I'll document that in the wiki
<oliv3r>
ssvb: i don't have ethernet on my tablet :)
<wingrime>
the patch affects ethernet, usb, sound and nand
<oliv3r>
well i can run simple tests on my tablet; i'm booting stage/3.4
<oliv3r>
i'll pull in that patch then and see what it does
<oliv3r>
this against stage/3.4?
<oliv3r>
gimme a few, need prob an hour
<oliv3r>
testing hdmi stuff for hansg first
<oliv3r>
ssvb: i assume hdmi support is non-hotplug; it's based on what's in script.bin and edid?
<wingrime>
oliv3r: I want some measurements before and after
<wingrime>
oliv3r: the patch is for 3.4/stage
<oliv3r>
how do you want me to measure?
<wingrime>
something like time and dd
<oliv3r>
on nand? i can test that
<oliv3r>
read or write?
<wingrime>
both
<oliv3r>
ok
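The time-and-dd measurement agreed on here can be scripted; a minimal sketch (the file location /tmp/test.zero and the sizes are assumptions — for a real nand test the file must live on the nand filesystem):

```shell
#!/bin/sh
# Rough throughput check with dd: write a test file, then read it back.
# NOTE: put TESTFILE on the filesystem you want to measure (nand, mmc, ...);
# /tmp is only a placeholder here.
TESTFILE=/tmp/test.zero

# write test: 16 MiB of zeroes, conv=fsync so the data actually reaches the device
dd if=/dev/zero of="$TESTFILE" bs=64k count=256 conv=fsync 2>&1 | tail -n 1

# read test; to defeat the page cache, first run (as root):
#   echo 3 > /proc/sys/vm/drop_caches
dd if="$TESTFILE" of=/dev/null bs=64k 2>&1 | tail -n 1

rm -f "$TESTFILE"
```

The last summary line of each dd run gives the throughput figure to compare before and after the patch.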
vinifm has joined #linux-sunxi
<ssvb>
wingrime: is cache flushing done in hard irq context after your patch?
<wingrime>
yes
<wingrime>
I tested, no dmesg messages, all works fine
<wingrime>
ssvb: the irq handler needs rework
<wingrime>
ssvb: see my repo
<wingrime>
ssvb: I sent one more patch
<wingrime>
ssvb: the cpu actually doesn't care about irq context for a dcache flush
<techn_>
you could enable dma debug stuff.. I tried that once and it gave some warnings/errors
<wingrime>
techn_: I want to move the dma irq handler to a worker thread
<wingrime>
It will affect sound, nand, ethernet and usb speed
<ssvb>
wingrime: cache flushing takes time, and you don't normally want the irq handler doing heavy work
<wingrime>
ssvb: I actually will try to make the irq lighter for dma
<ssvb>
maybe it would be better to flush cache in the beginning of 'sw_dma_enqueue'?
<wingrime>
ssvb: actually sw_dma_enqueue doesn't always submit to dma
<wingrime>
ssvb: it may save the request and do it later
<wingrime>
ssvb: the code needs a rewrite to use linux queues
<wingrime>
ssvb: also the cpu is definitely stopped during a cache flush
<wingrime>
ssvb: so there is no difference in when we do it
<wingrime>
ssvb: the next patch is more interesting for optimisation
<ssvb>
you want to have dma transfers always running for best performance
<ssvb>
if there is a long delay (doing cache flush) between the completion of previous dma transfer and the start of a new dma transfer, then this is not good
<wingrime>
for a new dma transfer you must call sw_dma_enqueue
<wingrime>
ssvb: you need to flush the cache for the dest-addr
<techn_>
why is that cache flushed?
<ssvb>
but you don't need to delay the cache flush until the very last moment
<wingrime>
techn_: the CPU saves some frequently-used data to SRAM
<wingrime>
techn_: dcache
<wingrime>
techn_: when you use DMA, the cpu doesn't know that the data changed
<techn_>
but why does dma require a cache flush?
<wingrime>
techn_: the CPU doesn't know that the data in the cache changed
<techn_>
oh.. so when you use dma you should disable cache
<wingrime>
techn_: actually you need to drop the cached data
<wingrime>
techn_: if that data is not in the cache then it costs no time
<wingrime>
techn_: looks like arm can check whether the data is in the cache and flush it using a command
<ssvb>
wingrime: so is it a transfer from DMA to CPU (for example NAND read)?
<wingrime>
ssvb: dma can do nand->ram
<wingrime>
ssvb: supported on a13: IR, uart, audio, sram, sdram, spi, usb
<vinifm>
I've been having problems with DMA, when using sockets
<ssvb>
wingrime: well, for this direction of transfer, invalidating the cache before dma transfer is complete seems wrong
<wingrime>
ssvb: maybe, but I don't know how to do it afterwards
<wingrime>
ssvb: but maybe not
<wingrime>
ssvb: because we need the CPU to reread dram on the first access
<wingrime>
ssvb: 14606475 branch-misses # 7.96% of all branches
<wingrime>
ssvb: yesterday it was more
<wingrime>
or I don't know whether
<wingrime>
this is a side effect or something else
<ssvb>
branch prediction misses should be totally unrelated to data cache
<wingrime>
ssvb: it's related
<ssvb>
how so?
<wingrime>
ssvb: I think that a D-cache clean will also clean the I-cache
<wingrime>
ssvb: at least you must reset the pipeline when you drop data
<ssvb>
somehow this does not make much sense to me
<ssvb>
but in any case, I can confirm the high rate for branch prediction misses
<wingrime>
ssvb: that's why I asked you to help me with that stuff
<wingrime>
ssvb: I-cache and D-cache are linked together
<ssvb>
ok, let's see what can be done
<wingrime>
ssvb: I asked for performance testing with/without my patches
ganbold_ has quit [Remote host closed the connection]
<ssvb>
but I-cache and D-cache are not linked in ARM, for example for JIT you need to explicitly clean D-cache and then invalidate I-cache before executing the modified chunk of code
<fra79Wii>
hi, so what's the situation with CedarX... Is someone keeping up the reverse engineering? I've tried to make the current version work on android 4.2 but there is no way..
<fra79Wii>
Or should we wait until allwinner releases a new SDK for 4.2?
fra79Wii has quit [Remote host closed the connection]
fra79Wii has joined #linux-sunxi
<oliv3r>
wingrime: what branch is your dma test on? i added your github as remote, but can't find it :)
<oliv3r>
i guess i can cherry-pick 384a649b09f928fd2065068cb40b73b52f724210 on stage/3.4
<wingrime>
wingrime-wip
<wingrime>
oliv3r
<wingrime>
wait
<wingrime>
test this
<wingrime>
please test general performance
<wingrime>
please test nand speed
<wingrime>
and someone should test audio and ethernet
<wingrime>
oliv3r: I made a new interesting patch
<wingrime>
oliv3r: are you using a13 ?
n01 has joined #linux-sunxi
<wingrime>
oliv3r: I made a patch that moves IRQ handling to a workqueue
<wingrime>
oliv3r: It generally should change performance
<ssvb>
wingrime: does not look too bad, considering that atom and exynos5 executed roughly twice more branches total (apparently trivially predictable)
eebrah_ has joined #linux-sunxi
<ssvb>
wingrime: the absolute number of mispredicted branches is quite comparable
<wingrime>
ssvb: I've done some interesting stuff
<wingrime>
ssvb: I moved the irq handler to a workqueue
<wingrime>
and will push it soon
<wingrime>
it generally will have a performance impact (positive or negative)
<ssvb>
:)
<ssvb>
but in any case, this whole dma irq handler looks very suspicious
<ssvb>
if anyone is up to fixing it, the fixed implementation probably should be clean and correct
<ssvb>
I mean reshuffling code and only fixing parts of it may have unpredictable effects (triggering some latent bugs)
<wingrime>
wait, I'll send it soon
<wingrime>
see my github
<wingrime>
ssvb: try using my 3.4 head
eebrah_ has quit [Ping timeout: 252 seconds]
<wingrime>
I have some patch cleanups
<wingrime>
branch wingrime-wip
<wingrime>
ssvb: and test the performance difference with the irq-to-workqueue patch
<ssvb>
wingrime: I surely can, but I would prefer if you could initially benchmark your code yourself ;)
ganbold__ has quit [Ping timeout: 256 seconds]
<ssvb>
if you expect performance improvements, then I can try to confirm them
<wingrime>
ssvb: good/bad performance is secondary, not important; long irq handlers must be moved to workqueues
<wingrime>
ssvb: this fixes strange bugs where dma_callback functions are called in irq context
<wingrime>
ssvb: for example wemac will send a message in this context (in the callback)
<wingrime>
ssvb: that's totally unacceptable for responsiveness reasons
<wingrime>
ssvb: I don't have ethernet (a13) so I can only predict
<wingrime>
ssvb: I want someone else to test this
<wingrime>
ssvb: because dma is used for audio, ethernet, usb, nand
Dave77 has joined #linux-sunxi
<paulk-desktop>
HI
<paulk-desktop>
so it seems that my patches were sent after all
torqu3e has quit [Quit: torqu3e]
eebrah has quit [Ping timeout: 255 seconds]
Guest60022 has joined #linux-sunxi
Dave77 has quit [Ping timeout: 256 seconds]
fra79Wii has quit [Remote host closed the connection]
fra79Wii has joined #linux-sunxi
Guest60022 has quit [Quit: Leaving]
Guest60022 has joined #linux-sunxi
simosx has joined #linux-sunxi
simosx has joined #linux-sunxi
<vinifm>
hi, what is the difference between linux drivers and u-boot drivers?
<oliv3r>
wingrime: ah, the wip branch; ok, well i cherry-picked it for now; booting 3.4 now to see its performance
Guest60022 has quit [Quit: Leaving]
eebrah has joined #linux-sunxi
eebrah is now known as Guest79196
Guest79196 has quit [Client Quit]
ZaEarl has quit [Ping timeout: 245 seconds]
gzamboni has quit [Ping timeout: 240 seconds]
gzamboni has joined #linux-sunxi
eebrah_ has joined #linux-sunxi
ZaEarl has joined #linux-sunxi
Dave77 has joined #linux-sunxi
bsdfox has quit [Ping timeout: 256 seconds]
rz2k has quit []
ZaEarl has quit [Ping timeout: 245 seconds]
<wingrime>
ssvb: oliv3r: I tested and rebuilt 4 times and can say that there is a small gain
<wingrime>
ssvb: oliv3r: I'm talking about the first patch, for "flush"
<wingrime>
ssvb: oliv3r: without: Throughput 0.711998 MB/sec, with: 0.806631 MB/sec
<wingrime>
ssvb: oliv3r: without: 9.48% of all branches, with: 7.95% of all branches
<wingrime>
so I can say that it "mystically" changes the branch-miss count
<wingrime>
but the results are at least stable
<wingrime>
the last patch (move to workqueue) is unstable
<wingrime>
I get hangs with it; it looks like hidden bugs or similar
<oliv3r>
i'm running time dd if=/dev/zero of=test.zero bs=64k count=8192
<oliv3r>
on the sd card to start with, then on nand
<oliv3r>
then i'll do it with your patches; i'll pastebin the results
<wingrime>
oliv3r: I use "dbench 1 -t 20 -s -S -F --directory=/media/0000-006F/"
<wingrime>
for disk
<wingrime>
and sudo perf_3.2.0-39 stat -B dd if=/dev/zero of=/dev/null count=1000000
<wingrime>
for branches
<wingrime>
but it needs performance counting enabled in the kernel config
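The kernel-side switch wingrime refers to is presumably the perf events support under "General setup"; a guess at the relevant .config fragment (option names as in mainline; whether a13_defconfig already sets them is exactly what's in question here):

```
# Kernel Performance Events And Counters (General setup)
CONFIG_PERF_EVENTS=y
# ARM hardware performance counter support
CONFIG_HW_PERF_EVENTS=y
```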
<oliv3r>
i haven't gotten all that installed :p
<wingrime>
apt-get ))
<oliv3r>
but i did remember to put performance gov. on
<wingrime>
oliv3r: this is not the performance governor
<wingrime>
oliv3r: "performance counting tools"
<wingrime>
oliv3r: something similar
<wingrime>
oliv3r: in General menuconfig
<oliv3r>
i know :)
<oliv3r>
but my kernel boots ondemand by default
<ssvb>
wingrime: performance counters should be already enabled by default
<wingrime>
ssvb: a13 has a different config
<ssvb>
which one?
<wingrime>
ssvb: a13_defconfig
<wingrime>
ssvb: it's really strange to see branch-prediction impact here
<ssvb>
hmm, I think a13_defconfig should also enable the performance counters
<ssvb>
if not, then IMHO it would make sense to update the configs
Dave77 has quit []
<ssvb>
regarding branch prediction impact, I think the biggest problem there might be that the BTB is too small on Cortex-A8
<ssvb>
and the old entries are just evicted when running large code with huge number of branches
<wingrime>
ssvb: is there something we can do?
<ssvb>
and because of associativity and aliasing effects, even the minor shifts in branch addresses because of unrelated code insertion/deletion could affect average prediction rate
n01 has quit [Ping timeout: 255 seconds]
<wingrime>
?
<wingrime>
can we do magic alignment?
<ssvb>
if it's associativity and collisions problem for BTB entries, then it's kind of random and hard to control
<oliv3r>
/silo/build/sunxi-bsp/linux-sunxi/arch/arm/mach-sun4i/dma/dma.c: In function 'sw_dma_loadbuffer':
<oliv3r>
/silo/build/sunxi-bsp/linux-sunxi/arch/arm/mach-sun4i/dma/dma.c:526:4: error: implicit declaration of function '__cpuc_flush_dcache_area' [-Werror=implicit-function-declaration]
<oliv3r>
i ran both tests on mmc and nand, i cut off the bs=1k dsync version after a few minutes as it was horribly slow :p
<oliv3r>
your dma patches do improve performance quite a bit.
<oliv3r>
interestingly, overall mmc is faster, but with 1k blocks and dsync, nand is faster
<oliv3r>
i cherry-picked the dma fix on stage/sunxi-3.4 + the header change. and you're right, it was unstable. while waiting initially (after about 10 minutes) i had at least one reboot. but it was only the one
torqu3e has joined #linux-sunxi
vinifm has quit [Remote host closed the connection]