#armbian on 2020-05-17 — irc logs at freenode.irclog.whitequark.org

00:13 archetech has quit [Quit: Textual IRC Client: www.textualapp.com]

00:16 ChriChri_ has joined #armbian

00:17 ChriChri has quit [Ping timeout: 264 seconds]

00:17 ChriChri_ is now known as ChriChri

00:19 dani has joined #armbian

02:18 sunshavi has quit [Ping timeout: 265 seconds]

02:39 sunshavi has joined #armbian

02:45 sunshavi has quit [Ping timeout: 265 seconds]

03:21 Tenkawa has quit [Quit: Leaving.]

03:39 sunshavi has joined #armbian

04:00 xecutertool has joined #armbian

04:02 xec has quit [Ping timeout: 244 seconds]

04:07 dddddd has quit [Ping timeout: 256 seconds]

04:23 _whitelogger has joined #armbian

04:58 oida has quit [Remote host closed the connection]

04:59 oida has joined #armbian

05:03 skiboy has quit [Quit: Leaving]

05:25 IgorPec has joined #armbian

05:25 IgorPec has quit [Changing host]

05:25 IgorPec has joined #armbian

05:29 IgorPec has quit [Read error: Connection reset by peer]

05:29 IgorPec has joined #armbian

05:29 IgorPec has quit [Changing host]

05:33 NeuroScr has quit [Quit: NeuroScr]

05:51 ScrumpyJack has quit [Ping timeout: 264 seconds]

05:57 ScrumpyJack has joined #armbian

06:11 archetech has joined #armbian

06:30 _whitelogger has joined #armbian

06:50 sassinak-work has quit [Ping timeout: 260 seconds]

06:51 sassinak-work has joined #armbian

06:54 <IgorPec> Tony_mac32: hi

08:07 macc24 has joined #armbian

09:18 <Werner> npi-a64-only-audio-usb.patch and board-h6-orangepi-lite2-fix-missing-all.patch fail to apply. Don't know why yet.

09:21 <IgorPec> I think we should open a Jira with a topic "Cleanup sunxi patch mess" :) you are welcome to do that

09:22 <IgorPec> not sure if we can do much prior to this release

09:22 <Werner> I do not have a Lite2 which may be broken due to this

09:23 <IgorPec> aha, I will check it later. Lite2 in fact has some issues... must go now, later in the evening

09:24 <Werner> Not that much important though. It is dev branch

09:24 <IgorPec> aha, then later later ;)

09:24 <Werner> Yep. Have fun

09:24 <IgorPec> tnx, u2

09:24 <Werner> ty

09:53 Strykar has quit [Ping timeout: 240 seconds]

09:53 Strykar has joined #armbian

10:19 dddddd has joined #armbian

11:36 macc24 has quit [Quit: WeeChat 2.8]

11:46 macc24 has joined #armbian

12:39 macc24 has quit [Ping timeout: 272 seconds]

12:52 macc24 has joined #armbian

13:34 Tenkawa has joined #armbian

13:37 ichernev has joined #armbian

13:38 <ichernev> hello. I'm having some issues with Helios4 running ubuntu 18.04 with armbian kernel. For some reason my raid10 raid says all devices are busy, and it can't assemble them

13:40 <ichernev> https://paste.ubuntu.com/p/bpGhCB8DHc/ https://paste.ubuntu.com/p/QBcX4GSsty/ https://paste.ubuntu.com/p/cch7ctdC4b/

13:41 <ichernev> there are no messages in dmesg. Is there a simple way to increase dmesg verbosity? Or recompile with DEBUG enabled for specific devices?

13:43 <plntyk> dmesg -n <level> , see man page

13:44 <plntyk> 9 should be more verbose than default for example

13:46 <plntyk> also kernel boot parameter "loglevel" could be changed eh... 7 is the max dmesg/kernel loglevel

13:47 <plntyk> see https://elinux.org/Debugging_by_printing

13:47 <ichernev> plntyk, I don't know how to change the kernel command line

13:47 <ichernev> I tried less /proc/kmsg but I think it hung...

13:49 <ichernev> plntyk, well, the problem is that I get this device busy, and I can't figure out why. lsof doesn't show anything, fuser doesn't show anything...

13:56 <ichernev> plntyk, I did dmesg -n 8 (9 was not supported), and it didn't really help (no more logs in dmesg when I try to assemble the raid)
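
Per the kernel's `loglevel=` documentation, message priorities run from 0 (KERN_EMERG) to 7 (KERN_DEBUG), and a message reaches the console only if its priority is numerically lower than the console loglevel. A threshold of 8 therefore already shows every priority, which fits `dmesg -n 9` being refused. Note that `dmesg -n` only changes what is printed to the console, not what is stored in the ring buffer that plain `dmesg` reads, which may be why nothing new appeared there. The minimal sketch below only encodes that threshold rule; the function name `shown_on_console` is hypothetical.

```python
# Kernel message priorities, as listed in the kernel's `loglevel=` documentation.
KERNEL_LEVELS = {
    0: "KERN_EMERG",
    1: "KERN_ALERT",
    2: "KERN_CRIT",
    3: "KERN_ERR",
    4: "KERN_WARNING",
    5: "KERN_NOTICE",
    6: "KERN_INFO",
    7: "KERN_DEBUG",
}


def shown_on_console(console_loglevel: int) -> list:
    """Return the priority names printed to the console at this threshold.

    A message is printed only if its priority is numerically lower than the
    console loglevel, so a threshold of 8 already covers 0..7 (everything).
    """
    return [name for prio, name in sorted(KERNEL_LEVELS.items())
            if prio < console_loglevel]


if __name__ == "__main__":
    for level in (1, 4, 7, 8):
        print(level, shown_on_console(level))
```

At 7 everything except KERN_DEBUG is shown; at 8 nothing is filtered.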

13:59 <plntyk> the kernel cmdline is probably set in u-boot bootloader configuration file / environment variable so you have to edit that

14:00 <plntyk> https://forum.armbian.com/topic/6033-helios4-support/page/7/ shows one example - but that involves having serial connection set up

14:01 <plntyk> don't know if Helios4 supports a user-editable text-based bootscript in /boot

14:01 <ichernev> under /boot/armbianEnv.txt there is verbosity=1. If I change it to 8, do I need to rebuild /boot/boot.cmd?

14:02 macc24 has quit [Ping timeout: 258 seconds]

14:04 <ichernev> so /boot/armbianEnv.txt is "sourced" in /boot/boot.cmd, which contains setenv bootargs which loglevel=${verbosity}. I just need to recompile it then

14:07 <plntyk> yes

14:08 <ichernev> hm 'mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr' was listed at the end of boot.cmd, but it doesn't fetch the new value from armbianEnv.txt. boot.cmd says explicitly not to modify it directly ...

14:12 <ichernev> armbianEnv.txt is the current values of a working system (if I understand correctly), not the proper place to change values for boot.cmd. Anyway. I modified the file that should not be modified and I have verbosity=12 now
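
The `verbosity` edit discussed above can be sketched as a small text transform, assuming `/boot/armbianEnv.txt` is a plain list of `key=value` lines as described in the conversation. The helper name `set_env_value` is hypothetical, and it only rewrites the text; it does not regenerate `boot.scr`. Since ichernev notes that `boot.cmd` reads `armbianEnv.txt` and passes `loglevel=${verbosity}`, the file appears to be read at boot time, so changing it may be enough for the next boot without touching `boot.cmd`; check your own `boot.cmd` to confirm.

```python
def set_env_value(text: str, key: str, value: str) -> str:
    """Set `key=value` in armbianEnv.txt-style text, appending it if absent."""
    lines = text.splitlines()
    found = False
    for i, line in enumerate(lines):
        # Compare only the part before the first "=", ignoring stray spaces.
        name, sep, _ = line.partition("=")
        if sep and name.strip() == key:
            lines[i] = f"{key}={value}"
            found = True
    if not found:
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "verbosity=1\nconsole=both\n"  # hypothetical file contents
    print(set_env_value(sample, "verbosity", "7"), end="")
```

Other lines, including comments, pass through unchanged.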

14:30 <ichernev> https://termbin.com/h4v5 -- this is (part of) dmesg. It has a few errors around sata, not sure how critical they are

14:33 <Tenkawa> what is on sdb?

14:34 <Tenkawa> there's an I/O error on it too

14:34 <Tenkawa> or is that the sata drive?

14:39 <ichernev> it has 4 sata devices (sda-sdd), they are supposed to run in raid. The / is on mmc, so it is not critical for boot, but my mdadm assemble gives "DEVICE BUSY"

14:39 <Tenkawa> can they be polled individually?

14:41 <ichernev> Tenkawa, I don't understand the question. What command should I run to poll them? I do have /dev/sda /dev/sdb .. devices. I can run smartctl commands on them, and they respond well. The LEDs flash when polled...

14:42 <Tenkawa> what about fdisk -l /dev/sda

14:42 <Tenkawa> does it come back at least to a prompt

14:43 <Tenkawa> (there may be no table on the drive but this will check communications)

14:43 <Tenkawa> I'll brb.. I need to go outside for 5 minutes and plug up lawnmower battery

14:45 <ichernev> they are part of a raid array, so no partitions, but it finds basic info: https://termbin.com/hcne

14:54 <Tenkawa> I know that; however, fdisk -l still sends a call to the controller and drive to make sure it's readable

14:54 <Tenkawa> if you dont even get feedback at the os level the raid isnt going tp matter anyway

14:54 <Tenkawa> er to

14:55 <Tenkawa> for this test

14:55 Hokedli has joined #armbian

14:56 <Tenkawa> afk again.. brb (doing a lot of stuff today while our weather is half way decent)

15:07 <ichernev> Tenkawa, well, I can dd if=/dev/sdX and read as much as I want to, so in that regard they are accessible

15:07 <Tenkawa> back

15:08 <Tenkawa> that's good

15:08 <Tenkawa> so you have "direct" access

15:08 <Tenkawa> hmm

15:08 <Tenkawa> let me look up something

15:13 <[TheBug]> Helios4 is Marvell, much like the ESPRESSObin

15:13 <Tenkawa> hmm.. do you have smartctl installed?

15:14 <[TheBug]> you're using the onboard SATA channels, or have you added more via a card?

15:14 <ichernev> https://paste.ubuntu.com/p/cch7ctdC4b/

15:14 <ichernev> this is smartctl on all devices. I ran short tests and they are all good

15:14 <[TheBug]> ^^

15:14 <ichernev> [TheBug], what do you mean? It is a Marvell 380, and it is a Helios4, yes

15:15 <[TheBug]> so are you using the 4 built-in ports only

15:15 <[TheBug]> or did you add another sata via mPCIe

15:15 <[TheBug]> I can't remember if they provided a mPCIe there

15:15 <[TheBug]> or used it for the 4 ports they supply

15:15 <ichernev> [TheBug], ah, sorry. I'm using "onboard" -- the ones that came in with the board. I'm not sure how the board itself is configured, but there is a mainline DTS for inspiration :)

15:15 <[TheBug]> k, just checking, because for my ESPRESSObins I use a 4-port SATA card in mPCIe

15:16 <Tenkawa> hmmm, that's not good that they won't even register

15:16 <[TheBug]> which I think Helios uses similar just on board using the mpcie lanes

15:16 <[TheBug]> Do you have any partition table on the drives you're using?

15:16 <[TheBug]> when you first assembled the raid10

15:17 <[TheBug]> did you do it on direct device

15:17 <[TheBug]> or on partitions?

15:17 <ichernev> [TheBug], I used whole devices. I read it is the "best" way? ... :-/

15:17 <[TheBug]> or is this the first time trying to assemble?

15:17 <[TheBug]> that's fine

15:17 <[TheBug]> just trying to get a better idea of your setup

15:17 <ichernev> Tenkawa, what do you mean "wont even register". Register where

15:17 <Tenkawa> actually using partitions is

15:17 <[TheBug]> can you pastebin the command you're running that returns the error?

15:18 <[TheBug]> (if its too long to paste here)

15:18 <Tenkawa> these drives aren't in the SMART db

15:18 <Tenkawa> harder to diag

15:18 <ichernev> https://termbin.com/fx858 -- assemble gives "busy"

15:19 <Tenkawa> line 324 of your smartctl is one example

15:19 <ichernev> Tenkawa, that is weird, they are a relatively new model: 8TB Seagate SkyHawk

15:19 <[TheBug]> when you do a cat /proc/mdstat

15:19 <[TheBug]> what do you see?

15:19 <[TheBug]> it sounds to me like it already auto assembled

15:20 <[TheBug]> most of the time mdadm runs on boot and auto-assembles

15:20 <[TheBug]> so unless you did something to cause it not to during boot you may be fighting a game you already won

15:21 <ichernev> I had been running this setup for 1.2 years without any issues. One day I received an email that it was degraded (one disk out), but when I logged in I saw the "missing" drive. For some reason it had a different name (sde), but it is all configured with UUID (I hope). And now after a reboot it can't attach any drives -- they are all busy

15:21 <[TheBug]> the new name means it likely lost connectivity to the drive for some reason like hot-swap

15:21 <ichernev> https://termbin.com/lhgx

15:21 <[TheBug]> and when the controller came back up gave it a new name

15:21 Hokedli has quit [Quit: Konversation terminated!]

15:21 <[TheBug]> ya bro

15:21 <[TheBug]> just look what you just sent me

15:21 <[TheBug]> ohh

15:22 <[TheBug]> I see

15:22 <[TheBug]> now I better understand

15:22 <[TheBug]> it won't activate

15:22 <[TheBug]> it put them all together but won't activate

15:22 <[TheBug]> so it is assembled already

15:22 <[TheBug]> but something not allowing it to activate

15:22 <ichernev> the (S) stands for spare, as far as I read?
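
The `/proc/mdstat` check above can be sketched as a parser for one array line: the md device, its state (`active` or `inactive`), an optional personality such as `raid10`, and members written as `name[slot]` with flags like `(S)` (spare) or `(F)` (faulty). In an inactive, not-yet-started array, members are often all listed as `(S)` because they have no active role yet. The sample lines in the usage are hypothetical, not ichernev's actual output, and the sketch ignores the status lines that follow each array line; `MdArray`, `parse_mdstat_line` and `spares` are illustrative names.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# A member looks like "sda[0]", optionally followed by flags such as "(S)" or "(F)".
MEMBER_RE = re.compile(r"(\w+)\[(\d+)\]((?:\([A-Z]\))*)")


@dataclass
class MdArray:
    name: str
    state: str                      # "active" or "inactive"
    level: Optional[str] = None     # e.g. "raid10"; usually absent when inactive
    members: dict = field(default_factory=dict)  # device -> (slot, flags)


def parse_mdstat_line(line: str) -> MdArray:
    """Parse one array line of /proc/mdstat, e.g. 'md0 : inactive sda[0](S) ...'."""
    name, _, rest = line.partition(" : ")
    tokens = rest.split()
    array = MdArray(name=name.strip(), state=tokens[0])
    for tok in tokens[1:]:
        m = MEMBER_RE.fullmatch(tok)
        if m:
            dev, slot, flags = m.groups()
            array.members[dev] = (int(slot), flags)
        elif not tok.startswith("("):
            # Not a member and not a parenthesised note: the personality, e.g. "raid10".
            array.level = tok
    return array


def spares(array: MdArray) -> list:
    """Devices flagged (S), sorted by name."""
    return sorted(dev for dev, (_, flags) in array.members.items() if "(S)" in flags)


if __name__ == "__main__":
    arr = parse_mdstat_line("md0 : inactive sdd[3](S) sdc[2](S) sdb[1](S) sda[0](S)")
    print(arr.state, arr.level, spares(arr))
```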

15:23 <[TheBug]> ichernev: so

15:23 <[TheBug]> two things I would do

15:23 <[TheBug]> actually a few

15:23 <ichernev> can I tell it to reread and see if it's OK? I'm pretty sure the problem is a few blocks in the device that got dropped

15:23 <[TheBug]> this will help you to figure

15:23 <[TheBug]> first

15:23 <[TheBug]> mdadm -D /dev/md0

15:23 <ichernev> delete?

15:23 <[TheBug]> this will give you better stats about what's going on in your raid

15:24 <[TheBug]> no, that is not delete

15:24 <[TheBug]> just dumb use of a D flag on a program

15:24 <[TheBug]> lol seems scary but that is not it

15:24 Hokedli has joined #armbian

15:24 <[TheBug]> mdadm --help if you need

15:24 <[TheBug]> anyways

15:24 <[TheBug]> check that

15:24 <ichernev> https://termbin.com/6s47

15:24 <[TheBug]> it will describe current state more

15:25 <ichernev> yes, I checked :) it is detail -- I've run it. I had a long conversation in the ubuntu channel, but when they figured I was running armbian kernel they got kind of angry :|

15:25 archetech has quit [Quit: Leaving]

15:25 <[TheBug]> um has nothing to do with kernel

15:25 <[TheBug]> things are just out of whack and you will need to spend time on it

15:25 <[TheBug]> you will need to stop the mdadm raid

15:26 <[TheBug]> you will need to examine the drives to make sure they are still in sync

15:26 <[TheBug]> then you will want to likely force assemble the raid again

15:27 <[TheBug]> This article isn't so bad: https://andersbrownworth.com/cms/411/Linux/Software.RAID/inactive/mdadm

15:27 <[TheBug]> you will probably want to follow similar case

15:27 <[TheBug]> stop the raid

15:27 <[TheBug]> examine drives

15:27 <[TheBug]> then force re-assemble if you need

15:29 redentor has joined #armbian

15:29 <ichernev> [TheBug], ok, that explains most of it. So should I let mdadm figure out the "test", or I need to test the drives myself? It would make sense that mdadm can do that

15:29 <[TheBug]> not sure what you mean by test, I didn't use that word, I said examine which is a function of the tool also described in that reference

15:30 <ichernev> I've run examine -- they all look "good"

15:30 <ichernev> let me paste it again (it was at the top)

15:30 <[TheBug]> ichernev: http://fibrevillage.com/storage/676-how-to-fix-linux-mdadm-inactive-array may also be a help

15:31 <ichernev> https://termbin.com/rv98

15:31 <[TheBug]> it shows better how to examine the volumes

15:32 <[TheBug]> so

15:32 <[TheBug]> to me it looks like you need to stop the array

15:32 <[TheBug]> and probably force assemble it

15:32 <[TheBug]> right now

15:32 <[TheBug]> array thinks it is missing one drive though

15:32 <[TheBug]> http://fibrevillage.com/storage/676-how-to-fix-linux-mdadm-inactive-array

15:32 <[TheBug]> er

15:32 <[TheBug]> Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)

15:33 <[TheBug]> mdadm --stop /dev/md0

15:33 <ichernev> yes, so if I force assemble it, will it think that it is "correct" or it will start checking?

15:34 <ichernev> this is the only worry I have about force assembly

15:34 <[TheBug]> I mean there is some risk here, cause I don't know how you got the drives into that weird state

15:34 <[TheBug]> where they 'assembled' but can't decide how to activate

15:34 <[TheBug]> but all docs show you examine

15:34 <[TheBug]> stop

15:34 <[TheBug]> then if it won't assemble manually, you would force

15:34 <ichernev> :)

15:35 <[TheBug]> um

15:35 <[TheBug]> do you know which volume was the one that failed previously

15:35 <[TheBug]> what's really weird is all 4 drives believe 1 drive is missing

15:35 <[TheBug]> they are all AAA.

15:36 <ichernev> no, the last one is AAAA

15:36 <ichernev> sdd

15:36 <[TheBug]> ahh okay

15:36 <ichernev> at least it was when I ran it a few hours ago

15:36 <[TheBug]> you said this is raid10?

15:37 <ichernev> yes, raid10 with 4 drives

15:37 <[TheBug]> okay so

15:37 <[TheBug]> if I were you, I would do like the following maybe..

15:37 <[TheBug]> I would first stop and try to force assemble with all drives

15:37 <[TheBug]> barring that

15:38 <[TheBug]> I would assemble /dev/sda,sdb,sdc and then I would re-add sdd to the array once it has started with only those 3 drives

15:38 <[TheBug]> as it seems that the array itself only thinks it has 3 right now

15:38 <[TheBug]> though forcing assemble may resolve that

15:39 <ichernev> I think re-add is for replacing

15:39 <[TheBug]> in fact sdc is also a bit out of sync; if you look, sda and sdb are on event 91238

15:39 <[TheBug]> the other two drives are a bit off

15:39 <[TheBug]> probably why in this state

15:40 <[TheBug]> ichernev: if you check, sdd doesn't understand it's been dropped from the array; that's why it shows AAAA while the others show AAA.
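
The `Array State` strings discussed here can be decoded using the legend that `mdadm --examine` prints next to the field (`A` active, `.` missing, `R` replacing). Each member's superblock stores its own copy of this string, which is how sdd can still report `AAAA` while the others report `AAA.`. A minimal sketch; the function name is hypothetical, and any character outside that legend is treated as an error rather than guessed at.

```python
# Legend printed by `mdadm --examine` next to "Array State".
STATE_LEGEND = {"A": "active", ".": "missing", "R": "replacing"}


def decode_array_state(state: str) -> dict:
    """Count active / missing / replacing slots in a string such as 'AAA.'."""
    counts = {meaning: 0 for meaning in STATE_LEGEND.values()}
    for ch in state:
        if ch not in STATE_LEGEND:
            raise ValueError(f"unexpected Array State character {ch!r}")
        counts[STATE_LEGEND[ch]] += 1
    return counts


if __name__ == "__main__":
    print(decode_array_state("AAA."))  # the view held by sda, sdb and sdc
    print(decode_array_state("AAAA"))  # the view held by sdd
```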

15:41 <ichernev> ok, after force, it shows the 3 drives only. I guess I have to --readd the 4th one now

15:41 <[TheBug]> so either by force assemble it will need to be placed back or

15:41 <[TheBug]> you can try assemble array with only 3 drives

15:41 <[TheBug]> then re-add the sdd and let it rebuild

15:41 <[TheBug]> it's already a bit off in events

15:41 <[TheBug]> that may be a better idea than force assembling

15:41 <[TheBug]> see on examine

15:41 <[TheBug]> Events : 91238

15:41 <[TheBug]> this counter should technically all be the same

15:41 <[TheBug]> they should all be on same event

15:42 <[TheBug]> but in this case based on your pastebin, they are not

15:42 <[TheBug]> sdc is close

15:42 <[TheBug]> sdd seems a lot more off

15:42 <[TheBug]> so you can probably assemble a, b , c and then re-add d

15:42 <[TheBug]> as long as a, b, c come up in an assembled device

15:42 <[TheBug]> re-adding d should be nominal
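
The event-counter comparison above can be sketched as follows, assuming `mdadm --examine` output for several devices in its usual layout (a `/dev/sdX:` header followed by `Field : value` lines) and v1.x metadata, where `Events` is a plain integer. Only 91238 comes from the channel; the other counts in the sample are hypothetical. The member with the largest count is taken as the freshest, and every other member is reported with how far it trails.

```python
import re

DEVICE_RE = re.compile(r"^(/dev/\S+):\s*$")
EVENTS_RE = re.compile(r"^\s*Events\s*:\s*(\d+)\s*$")


def events_by_device(examine_output: str) -> dict:
    """Map each device in `mdadm --examine` output to its superblock event count."""
    events = {}
    current = None
    for line in examine_output.splitlines():
        header = DEVICE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        ev = EVENTS_RE.match(line)
        if ev and current is not None:
            events[current] = int(ev.group(1))
    return events


def lagging_devices(events: dict) -> dict:
    """How many events each member trails the freshest one by (up-to-date ones omitted)."""
    newest = max(events.values())
    return {dev: newest - n for dev, n in events.items() if n < newest}


# Hypothetical excerpt: only 91238 is taken from the conversation.
SAMPLE = """\
/dev/sda:
         Events : 91238
/dev/sdb:
         Events : 91238
/dev/sdc:
         Events : 91230
/dev/sdd:
         Events : 90000
"""

if __name__ == "__main__":
    print(lagging_devices(events_by_device(SAMPLE)))
```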

15:43 <[TheBug]> so something like

15:43 <[TheBug]> mdadm --stop /dev/md0

15:43 <[TheBug]> mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc

15:43 <[TheBug]> then mdadm -D /dev/md0

15:43 <[TheBug]> assuming it comes up

15:43 <[TheBug]> should show an active array lacking 1 drive

15:43 <[TheBug]> then re-add volume to array (sdd)

15:44 <[TheBug]> and it will rebuild

15:44 <[TheBug]> if you use a write-intent bitmap, recovery should be fast

15:44 <[TheBug]> since not super behind
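
The stop / assemble / detail / re-add sequence above can be written out as a dry run. This sketch only builds and prints the `mdadm` command lines and deliberately does not execute them, since assembling with the wrong member set, or with `--force`, can make things worse. The helper name `recovery_plan` is hypothetical, and the device names mirror this conversation; treat them as assumptions for any other system.

```python
import shlex


def recovery_plan(md: str, good: list, stale: str, force: bool = False) -> list:
    """Build (but do not run) the stop / assemble / detail / re-add commands."""
    assemble = ["mdadm", "--assemble"]
    if force:
        assemble.append("--force")
    assemble += [md, *good]
    return [
        ["mdadm", "--stop", md],
        assemble,
        ["mdadm", "--detail", md],              # check it came up with 3 of 4 members
        ["mdadm", "--manage", md, "--re-add", stale],
    ]


if __name__ == "__main__":
    plan = recovery_plan("/dev/md0", ["/dev/sda", "/dev/sdb", "/dev/sdc"], "/dev/sdd")
    for argv in plan:
        print(shlex.join(argv))
```

If `--re-add` is refused, `mdadm --manage /dev/md0 --add /dev/sdd` typically falls back to a full rebuild rather than a bitmap-based catch-up.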

15:44 <ichernev> ok, so after force assembly a b and c showed up. Now I re-added d and it started rebuilding, but then it stopped and marked sdd as (F)

15:45 <[TheBug]> then maybe your drive's bad

15:45 <[TheBug]> what's dmesg saying

15:45 <[TheBug]> or

15:45 <[TheBug]> actually

15:45 <[TheBug]> ichernev: I'll make you a bet

15:45 <[TheBug]> I could be wrong

15:45 <[TheBug]> but I bet your SATA cable went bad on that drive

15:46 <[TheBug]> or isn't connected tightly

15:46 <[TheBug]> causing some issue with connectivity

15:46 <[TheBug]> it's either that or the SATA controller is having issues

15:46 <[TheBug]> would seem

15:46 <ichernev> https://termbin.com/qjvn

15:46 <[TheBug]> woo

15:46 <ichernev> this is dmesg, something is off

15:46 <[TheBug]> yep

15:46 <[TheBug]> Either: A. drive is dead, B. SATA cable is bad

15:46 <[TheBug]> those are my primary though

15:47 <[TheBug]> thought*

15:47 <[TheBug]> the last would be C. SATA controller has issues, but let's hope not that

15:48 <[TheBug]> [ 5609.510650] md/raid10:md0: Disk failure on sdd, disabling device.

15:48 <[TheBug]> md/raid10:md0: Operation continuing on 3 devices.

15:48 <[TheBug]> it even says
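
Spotting the failure lines quoted above can be automated with two regular expressions that match the md/raid10 messages as they appear in this log; accepting other `raidN` personalities with the same wording is an assumption. The function names are hypothetical, and the sample reuses the two quoted lines.

```python
import re

FAILURE_RE = re.compile(
    r"md/(?P<level>raid\d+):(?P<md>md\d+): Disk failure on (?P<dev>\S+?), disabling device"
)
CONTINUE_RE = re.compile(
    r"md/(?P<level>raid\d+):(?P<md>md\d+): Operation continuing on (?P<n>\d+) devices"
)


def md_failures(dmesg_text: str) -> list:
    """Return (array, device) pairs for every md 'Disk failure' line."""
    return [(m["md"], m["dev"]) for m in FAILURE_RE.finditer(dmesg_text)]


def remaining_devices(dmesg_text: str) -> dict:
    """Last reported number of devices still in use, per array."""
    counts = {}
    for m in CONTINUE_RE.finditer(dmesg_text):
        counts[m["md"]] = int(m["n"])
    return counts


SAMPLE = (
    "[ 5609.510650] md/raid10:md0: Disk failure on sdd, disabling device.\n"
    "md/raid10:md0: Operation continuing on 3 devices.\n"
)

if __name__ == "__main__":
    print(md_failures(SAMPLE), remaining_devices(SAMPLE))
```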

15:48 <[TheBug]> if you have another drive I would just throw it in and try to rebuild

15:48 <[TheBug]> if it rebuilds... awesome you had a bad drive

15:48 <[TheBug]> if it fails or does something similar, replace SATA cable (or do both at once if you have a spare)

15:48 <[TheBug]> if it fails after that... well... maybe have a good cry into your Cheerios

15:49 <[TheBug]> 'cause it would sound like the controller has a problem and you may no longer be able to use port 4 (if you try the rest and it keeps failing)

15:49 <ichernev> I don't have an 8TB drive lying around... but it's good to figure out the cable situation before getting a new drive. Will reconnecting to different ports be OK? It should be

15:50 <[TheBug]> port shouldn't matter, technically I believe it should even be hot plug

15:50 <[TheBug]> but I am not normally that brave

15:50 <[TheBug]> hehe

15:50 <ichernev> also it could be the controller. I have 2 identical boards (but due to the COVID lockdown they are not with me ATM)

15:50 <[TheBug]> well those are the things I would do

15:50 <[TheBug]> sounds like at least your array is online in the meantime now

15:50 <[TheBug]> which is a plus

15:50 <[TheBug]> but yeah

15:51 <[TheBug]> you need to try replacing the drive / SATA cable

15:51 <[TheBug]> that's where you are

15:51 <[TheBug]> okay time for a cancer stick, bbiab

15:52 <ichernev> [TheBug], thanks a lot!

15:52 <[TheBug]> np

15:53 <[TheBug]> pay it forward :)

15:58 macc24 has joined #armbian

16:01 redentor has quit [Quit: Leaving]

16:28 dani has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

16:36 Hokedli has quit [Quit: Konversation terminated!]

16:58 dani has joined #armbian

17:05 dddddd has quit [Ping timeout: 240 seconds]

17:59 xecutertool has quit [Remote host closed the connection]

18:17 archetech has joined #armbian

18:38 dani has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

18:39 dani has joined #armbian

18:40 sunshavi has quit [Quit: nil]

18:43 xecuter has joined #armbian

18:47 sunshavi has joined #armbian

18:53 xec has joined #armbian

18:57 xecuter has quit [Ping timeout: 260 seconds]

19:32 dani has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]

19:34 dani has joined #armbian

20:16 dddddd has joined #armbian

20:40 macc24 has quit [Quit: WeeChat 2.8]

20:40 macc24 has joined #armbian

21:23 Tenkawa has left #armbian [#armbian]

21:43 toketin has quit [Ping timeout: 260 seconds]

22:11 macc24 has quit [Quit: sleep]

23:30 NeuroScr has joined #armbian