nighty- has quit [Quit: Disappears in a puff of smoke]
wwilly has joined #linux-exynos
libv_ is now known as libv
nighty- has joined #linux-exynos
wwilly has quit [Quit: Leaving]
afaerber has quit [Quit: Leaving]
kloczek has quit [Remote host closed the connection]
kloczek has joined #linux-exynos
afaerber has joined #linux-exynos
wwilly has joined #linux-exynos
wwilly has quit [Client Quit]
wwilly has joined #linux-exynos
<wwilly>
Hi, I'm seeing some strange behaviour on an Exynos-5422 Odroid-XU3 with mainline 4.11. This is me thinking aloud; if you have spare time, please think along with me.
<wwilly>
I'm compiling on the board itself, with cmake+clang.
<wwilly>
When I use multiple threads, sometimes I get clang-6.0: error: unable to execute command: Bus error [...] Error 254; other times a dependent compilation step fails because the files generated by the previous stage are incomplete.
<wwilly>
When I use only 1 thread, it compiles smoothly.
<wwilly>
Is it possible that the Linux memory sync mechanism is wrong somewhere, so that a read occurs before the write?
<wwilly>
Or the bus driver doing something wrong in some corner case?
<wwilly>
Another idea: is it possible that the power supply is not strong enough to deliver the power the CPU needs, so the memory doesn't get enough juice either and, boom, data corruption?
<willmore>
wwilly, sounds like you are having memory errors. Likely brought upon you by the many parallel memory intensive compiler processes.
<krzk>
wwilly: memory sync - rather unlikely... I think insufficient power might be more likely
<willmore>
It's also entirely possible that it's power.
<krzk>
and try without devfreq
<wwilly>
DVFS is on performance
<krzk>
wwilly: devfreq scales multiple busses (also memory) and intensive compilation is not only CPU-memory but also storage related
<krzk>
Hmmm... then devfreq should not scale anything
<wwilly>
the board is in a room at 10°C
<krzk>
wwilly: do you have anything in dmesg?
<wwilly>
fan running, and without the plastic case around it
<wwilly>
nope, no oom killer
<krzk>
enough free space?
<wwilly>
the power supply is the stock one from hardkernel European reseller
<wwilly>
yep, 16GB, and swap space
<wwilly>
but the files to compile aren't so huge
<wwilly>
compiled in release without debug symbols
<krzk>
go try older kernels :) ... it's weird because if it were a memory or bus error, something should be in dmesg as well
<wwilly>
if it's power supply related, does Linux check this kind of thing?
<krzk>
wwilly: nope, if there is not enough power then quite probably random silent issues might happen
<krzk>
HW (PMIC) should deal with it but usually nothing is reported
<wwilly>
should I try with another power supply maybe?
<wwilly>
I don't really care that it fails to compile with 4 threads but not with 1; it's more about the results of my benchmarking and experiments, knowing that there is "something" wrong
<krzk>
wwilly: you can try. the stock supplies are 4A so they should be enough.
<wwilly>
... should ...
<wwilly>
:)
<wwilly>
yep I will try
<krzk>
you can also try to limit the frequency of cpu or devfreq (with the scaling governor it is possible to set min/max)
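A minimal sketch of the frequency cap krzk suggests, assuming the standard cpufreq sysfs file `scaling_max_freq` (values in kHz) and root privileges. The helper names, the 1.7GHz value, and the assumption that cpu4 to cpu7 are the A15 cores are illustrative only and do not come from the conversation.

```python
from pathlib import Path

# Default location of the per-CPU cpufreq directories on Linux.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")


def set_max_freq_khz(cpu, khz, root=CPUFREQ_ROOT):
    """Cap one CPU's maximum frequency (kHz) via scaling_max_freq."""
    path = Path(root) / f"cpu{cpu}" / "cpufreq" / "scaling_max_freq"
    path.write_text(f"{khz}\n")
    return path


def cap_cpus(cpus, khz, root=CPUFREQ_ROOT):
    """Apply the same cap to several CPUs and return the files written."""
    return [set_max_freq_khz(cpu, khz, root) for cpu in cpus]


# Hypothetical usage (needs root; assumes cpu4..cpu7 are the big cores):
#   cap_cpus(range(4, 8), 1_700_000)
```

The `root` parameter exists only so the helpers can be exercised against a scratch directory before touching the real sysfs tree.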
<wwilly>
uhm also, maybe I've hit a corner case where running the board at 2GHz causes issues...
<wwilly>
yep, I changed the dts definition to reach this frequency
<wwilly>
never saw an issue before so I keep it
<wwilly>
kept*
<wwilly>
uhm, that's really funny: by putting a sync + sleep 1 after each chunk of compiling multiple files, clang never goes wrong
<wwilly>
I don't know the way cmake generates makefiles, but maybe it doesn't put a sync properly somewhere?
<willmore>
It shouldn't have to...
<wwilly>
ok so artefact of waiting then?
<willmore>
Yeah, that's my guess.
<willmore>
A chance to cool or something.
wwilly has quit [Quit: Leaving]
wwilly has joined #linux-exynos
<krzk>
wwilly: maybe you hit the thermal issue you were mentioning some time ago?
<krzk>
that some cores are busy but thermal is not slowing them down?
<wwilly>
nope, this is fixed
<wwilly>
99.9999% confident
<willmore>
cpuburn confident?
<wwilly>
haven't tried this one
<willmore>
Unless a board can run cpuburn for an hour or so, I don't consider it proven stable. But, that's a personal standard. Like running memtest86 and prime95 on a PC. Unless they can run those loads error free for a few days, I don't trust them.
wwilly has quit [Quit: Leaving]
wwilly has joined #linux-exynos
wwilly_ has joined #linux-exynos
wwilly has quit [Read error: Connection reset by peer]
<wwilly_>
willmore, once the thermal governor starts throttling the cooling device at some point, what exactly are you testing after that? what do you have in mind?
<willmore>
Power delivery circuitry.
<wwilly_>
willmore, do you have a good program to test the A15?
<willmore>
wwilly_, https://github.com/ssvb/cpuburn-arm doesn't have an a15 version. I don't know if ssvb recommends one of the other versions for that core. I'll see if they're around.
<willmore>
Nope, they don't seem to be on right now.
<willmore>
Try the A9 version to start.
<wwilly_>
the A7 is software compatible with the A15, right?
genii has joined #linux-exynos
<willmore>
They all should be.
<willmore>
The trick with these programs is that they run sets of instructions that have been shown to draw the most power in their respective cores.
<willmore>
So they need to be tuned to the specific core to get the best draw. But, they're pretty demanding regardless, so try each and see which is worst/best.
<wwilly_>
yep ok
<wwilly_>
I'm trying a9
<wwilly_>
the frequency drops stepwise to 0.6GHz rather quickly, and the thermal sensor shows a limit of ~75°C, as per my dts definition
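For watching the temperature plateau described here, a rough sketch that polls the generic Linux thermal sysfs interface, assuming each `thermal_zone*/temp` file reports millidegrees Celsius. The function name is hypothetical, and how zones map to individual cores on a given board is not something this sketch knows.

```python
from pathlib import Path


def read_thermal_zones(root="/sys/class/thermal"):
    """Return {zone_name: temperature in degrees C} for every readable zone."""
    temps = {}
    for zone in sorted(Path(root).glob("thermal_zone*")):
        try:
            raw = (zone / "temp").read_text().strip()
            temps[zone.name] = int(raw) / 1000.0  # millidegrees -> degrees
        except (OSError, ValueError):
            # Skip zones that cannot be read or report garbage.
            continue
    return temps
```

Calling it in a loop while the burn test runs, and logging the values next to the current CPU frequency, would show whether throttling holds the reading near the configured trip point.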
<wwilly_>
fan off
<wwilly_>
I can keep running that all night to be 99.9999% cpuburn confident
<willmore>
That seems pretty severe. Yikes.
<willmore>
How many cores is it using?
<willmore>
You may need to do something special to make sure it runs on the A15's and something else to get the A7 version running on the other four cores.
<wwilly_>
all cores
<willmore>
All eight?
<wwilly_>
yep taskset
<willmore>
That's the one.
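A sketch of the taskset pinning discussed here, under stated assumptions: that CPUs 0-3 are the A7 cluster and CPUs 4-7 the A15 cluster, and that the cpuburn-arm sources were built into binaries named `./cpuburn-a7` and `./cpuburn-a9` (hypothetical names). Only the standard `taskset -c <cpu-list> <command>` form is used.

```python
import subprocess


def taskset_command(binary, cpus):
    """Build a `taskset -c` command line restricting `binary` to `cpus`."""
    cpu_list = ",".join(str(c) for c in cpus)
    return ["taskset", "-c", cpu_list, binary]


def launch_burners(plan):
    """Start one pinned process per (binary, cpus) pair; return the Popen handles."""
    return [subprocess.Popen(taskset_command(binary, cpus)) for binary, cpus in plan]


# Assumed cluster layout and binary names, for illustration only:
PLAN = [
    ("./cpuburn-a7", [0, 1, 2, 3]),  # LITTLE cluster (assumption)
    ("./cpuburn-a9", [4, 5, 6, 7]),  # big cluster (assumption)
]

# Hypothetical usage:
#   procs = launch_burners(PLAN)
#   ... measure power and temperature ...
#   for p in procs: p.terminate()
```

Whether each binary keeps all four cores of its cluster busy on its own is not checked here; if it does not, launching one pinned process per CPU is the obvious variation.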
<willmore>
Okay, good luck. Can you measure power draw?
<wwilly_>
yep will put it on my "everything"-tracer
<wwilly_>
willmore, will give you my EDF invoice :)
<memeka>
mszyprow_: is there a chance for exynos5 drm to support NV12 format?
<mszyprow_>
memeka: yes, one would need to add support for so called local bus between GSC and Mixer
<mszyprow_>
memeka: this way Mixer will get one more plane with all formats that are supported by GSC
<mszyprow_>
memeka: but sadly we have more urgent things on todo list...
<memeka>
right ok :D
<memeka>
mszyprow_: there was a patch for kodi + drm prime which uses 2 layers, one drm layer for video and one egl layer for interface
<memeka>
but it's not working because video drm layer needs nv12 :D
<mszyprow_>
memeka: there is a little chance that we will take a look into this next year, but I can't promise anything
mszyprow_ has quit [Ping timeout: 260 seconds]
Vasco_O is now known as Vasco
<wwilly_>
willmore, with my tracer, cpuburn-a9 draws roughly 10W instantaneous while running 4 threads on the big cores
<wwilly_>
the temperature increases drastically and the thermal manager kicks in
<wwilly_>
I will retrace a bit more and share the graphs if you're interested
<wwilly_>
thanks for your advice for that "bench"
<willmore>
wwilly_, I am always interested in seeing results for such things. You have an XU3, right?
<wwilly_>
and the xu4 yes
<wwilly_>
I also have a pine64 a64, but because mainline didn't support it well last time I checked, I haven't played much with it
paulk-gagarine-s has joined #linux-exynos
<wwilly_>
by the way, do other boards have a thermal sensor per core like the exynos5422?
<wwilly_>
would be interesting to try my stuff on other boards
paulk-gagarine has quit [Ping timeout: 260 seconds]
<javier___>
wwilly_: if it is an Exynos5422, then I think it may be the issue reported when scaling the A15 to 2GHz
<javier___>
since you said that you changed that. IIRC we had a maximum of 1.9GHz in mainline due to the missing regulator changes in mainline
<wwilly_>
javier___, but why do I get errors like this only now, and only when using cmake+clang with multiple threads? I think it could be a producer/consumer case, where the consumer starts eating before the produced values are fully written
<wwilly_>
which is why, if I put a sync+sleep, the data gets written out
<javier___>
wwilly_: I can't remember what triggered the reported issue, just that the cores weren't stable at higher frequencies
<wwilly_>
but then there is a barrier missing somewhere, either in the userspace software stack or in the kernel
<wwilly_>
what kind of instability are we talking about exactly?
<javier___>
wwilly_: I don't have a full context either, just from the commits in the ChromeOS tree
<javier___>
on a call now, give me some mins and I'll refer you to those
<wwilly_>
thank you
<wwilly_>
when I took a break earlier, I thought about a barrier missing somewhere, with cpu and memory running at different frequencies and the linux scheduler not flushing memory properly
<wwilly_>
but I'm not properly aware of how it works internally
<javier___>
wwilly_: sorry, that was longer than I thought. Let me look at the commits now
<javier___>
basically, the problem that the regulator-locker fixes is making sure that the voltages of two rails stay within a maximum spread
<javier___>
my understanding is that you need to scale the INT rail voltage for some A15 high frequencies since the ARM rail voltage is scaled up
<javier___>
but we don't have infrastructure currently in mainline to do that
<javier___>
wwilly_: you could try limiting your A15 to at most 1.9GHz (actually I see that in mainline the maximum operating point is 1.7GHz)
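The rail-spread constraint javier___ describes can be illustrated with a toy calculation. This is not the regulator-locker code from the ChromeOS tree; the function names and every voltage below are made-up values, chosen only to show the idea that raising the ARM rail may force the INT rail up as well.

```python
def within_spread(arm_uv, int_uv, max_spread_uv):
    """True if the two rail voltages (in microvolts) differ by at most max_spread_uv."""
    return abs(arm_uv - int_uv) <= max_spread_uv


def required_int_uv(arm_uv, int_uv, max_spread_uv):
    """Lowest INT voltage >= int_uv that keeps ARM - INT within the allowed spread."""
    return max(int_uv, arm_uv - max_spread_uv)


# Hypothetical numbers, not taken from any datasheet or device tree:
MAX_SPREAD_UV = 300_000
INT_UV = 1_000_000

# A moderate ARM voltage leaves INT untouched ...
assert required_int_uv(1_250_000, INT_UV, MAX_SPREAD_UV) == 1_000_000
# ... while a higher one pushes INT up to stay within the spread.
assert required_int_uv(1_362_500, INT_UV, MAX_SPREAD_UV) == 1_062_500
```

Without something enforcing this coupling, an operating point that only raises the ARM rail could leave the two rails further apart than the hardware tolerates, which matches the instability described for the highest A15 frequencies.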