Werner changed the topic of #armbian to: armbian - Linux for ARM development boards | www.armbian.com | Github: github.com/armbian | Commits: #armbian-commits | Forums Feed: #armbian-rss | This channel is logged -> irc.armbian.com
archetech has quit [Quit: Textual IRC Client: www.textualapp.com]
ChriChri_ has joined #armbian
ChriChri has quit [Ping timeout: 264 seconds]
ChriChri_ is now known as ChriChri
dani has joined #armbian
sunshavi has quit [Ping timeout: 265 seconds]
sunshavi has joined #armbian
sunshavi has quit [Ping timeout: 265 seconds]
Tenkawa has quit [Quit: Leaving.]
sunshavi has joined #armbian
xecutertool has joined #armbian
xec has quit [Ping timeout: 244 seconds]
dddddd has quit [Ping timeout: 256 seconds]
_whitelogger has joined #armbian
oida has quit [Remote host closed the connection]
oida has joined #armbian
skiboy has quit [Quit: Leaving]
IgorPec has joined #armbian
IgorPec has quit [Changing host]
IgorPec has joined #armbian
IgorPec has quit [Read error: Connection reset by peer]
IgorPec has joined #armbian
IgorPec has joined #armbian
IgorPec has quit [Changing host]
NeuroScr has quit [Quit: NeuroScr]
ScrumpyJack has quit [Ping timeout: 264 seconds]
ScrumpyJack has joined #armbian
archetech has joined #armbian
_whitelogger has joined #armbian
sassinak-work has quit [Ping timeout: 260 seconds]
sassinak-work has joined #armbian
<IgorPec> Tony_mac32: hi
macc24 has joined #armbian
<Werner> npi-a64-only-audio-usb.patch and board-h6-orangepi-lite2-fix-missing-all.patch fail to apply. Dont know why yet.
<IgorPec> I think we should open a Jira with a topic "Cleanup sunxi patch mess" :) you are welcome to do that
<IgorPec> not sure if we can do much prior to this release
<Werner> I do not have a Lite2 which may be broken due to this
<IgorPec> aha, I will check it later. Lite2 in fact has some issues ... must go now, later in the evening
<Werner> Not that much important though. It is dev branch
<IgorPec> aha, then later later ;)
<Werner> Yep. Have fun
<IgorPec> tnx, u2
<Werner> ty
Strykar has quit [Ping timeout: 240 seconds]
Strykar has joined #armbian
dddddd has joined #armbian
macc24 has quit [Quit: WeeChat 2.8]
macc24 has joined #armbian
macc24 has quit [Ping timeout: 272 seconds]
macc24 has joined #armbian
Tenkawa has joined #armbian
ichernev has joined #armbian
<ichernev> hello. I'm having some issues with Helios4 running ubuntu 18.04 with armbian kernel. For some reason my raid10 raid says all devices are busy, and it can't assemble them
<ichernev> there are no messages in dmesg. Is there a simple way to increase dmesg verbosity? Or recompile with DEBUG enabled for specific devices?
<plntyk> dmesg -n <level> , see man page
<plntyk> 9 should be more verbose than default for example
<plntyk> also kernel boot parameter "loglevel" could be changed eh... 7 is the max dmesg/kernel loglevel
<ichernev> plntyk, I don't know how to change the kernel command line
<ichernev> I tried less /proc/kmsg but I think it hung...
<ichernev> plntyk, well, the problem is that I get this device busy, and I can't figure out why. lsof doesn't show anything, fuser doesn't show anything...
<ichernev> plntyk, I did dmesg -n 8 (9 was not supported), and it didn't really help (no more logs in dmesg when I try to assemble the raid)
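A possible reason `dmesg -n 8` changed nothing: `dmesg -n` only sets the console log level, that is, which messages get echoed to the console. It does not make drivers write extra messages into the ring buffer. The current console level is the first field of /proc/sys/kernel/printk. A minimal sketch for decoding it follows; the function names are hypothetical, not part of any tool.

```shell
#!/bin/sh
# Hypothetical helpers for reading the kernel console log level.

# Map a kernel message severity (0-7) to its conventional name.
loglevel_name() {
    case "$1" in
        0) echo emerg ;;
        1) echo alert ;;
        2) echo crit ;;
        3) echo err ;;
        4) echo warning ;;
        5) echo notice ;;
        6) echo info ;;
        7) echo debug ;;
        *) echo unknown ;;
    esac
}

# Print the console log level from the contents of /proc/sys/kernel/printk.
# $1 is the whole line (four numbers); only the first field is kept.
console_loglevel() {
    set -- $1
    echo "$1"
}
```

Usage: `console_loglevel "$(cat /proc/sys/kernel/printk)"`. Messages whose severity is numerically lower than that value are shown on the console, so a value of 8 already lets everything through, including debug.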
<plntyk> the kernel cmdline is probably set in u-boot bootloader configuration file / environment variable so you have to edit that
<plntyk> https://forum.armbian.com/topic/6033-helios4-support/page/7/ shows one example - but that involves having serial connection set up
<plntyk> dont know if Helios4 supports a user editable text-based bootscript in /boot
<ichernev> under /boot/armbianEnv.txt there is verbosity=1. If I change it to 8, do I need to rebuild /boot/boot.cmd?
macc24 has quit [Ping timeout: 258 seconds]
<ichernev> so /boot/armbianEnv.txt is "sourced" in /boot/boot.cmd, which contains setenv bootargs with loglevel=${verbosity}. I just need to recompile it then
<plntyk> yes
<ichernev> hm 'mkimage -C none -A arm -T script -d /boot/boot.cmd /boot/boot.scr' was listed at the end of boot.cmd, but it doesn't fetch the new value from armbianEnv.txt. boot.cmd says explicitly not to modify it directly ...
<ichernev> armbianEnv.txt is the current values of a working system (if I understand correctly), not the proper place to change values for boot.cmd. Anyway. I modified the file that should not be modified and I have verbosity=12 now
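According to the messages above, boot.cmd pulls in /boot/armbianEnv.txt and passes loglevel=${verbosity} to the kernel. If the script imports that file on every boot (an assumption worth checking in your own boot.cmd), then changing the value in armbianEnv.txt and rebooting should be enough, and mkimage is only needed when boot.cmd itself changes. A minimal sketch of editing such a key=value file follows; `set_env_value` is a hypothetical helper, and it assumes the value contains no `|`, `&` or backslash.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a key=value file such as
# /boot/armbianEnv.txt, replacing an existing line or appending a new one.
set_env_value() {
    file=$1
    key=$2
    value=$3
    if grep -q "^${key}=" "$file"; then
        # Rewrite the existing line via a temporary file, then copy the
        # result back so the original file keeps its owner and mode.
        tmp=$(mktemp) || return 1
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        status=$?
        rm -f "$tmp"
        return $status
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Usage, as root and after taking a backup of the file: `set_env_value /boot/armbianEnv.txt verbosity 7`.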
<ichernev> https://termbin.com/h4v5 -- this is (part of) dmesg. It has a few errors around sata, not sure how critical they are
<Tenkawa> what is on sdb?
<Tenkawa> theres an io error on it too
<Tenkawa> or is that the sata drive?
<ichernev> it has 4 sata devices (sda-sdd), they are supposed to run in raid. The / is on mmc, so it is not critical for boot, but my mdadm assemble gives "DEVICE BUSY"
<Tenkawa> can they be polled individually?
<ichernev> Tenkawa, I don't understand the question. What command should I run to poll them? I do have /dev/sda /dev/sdb .. devices. I can run smartctl commands on them, and they respond well. The LEDs flash when polled...
<Tenkawa> what about fdisk -l /dev/sda
<Tenkawa> does it come back at least to a prompt
<Tenkawa> (there may be no table on the drive but this will check communications)
<Tenkawa> I'll brb.. I need to go outside for 5 minutes and plug up lawnmower battery
<ichernev> they are part of a raid array, so no partitions, but it finds basic info: https://termbin.com/hcne
<Tenkawa> I know that however fdisk -l still sends a call to the controller and drive to make sure its readable
<Tenkawa> if you don't even get feedback at the OS level the raid isn't going to matter anyway
<Tenkawa> for this test
Hokedli has joined #armbian
<Tenkawa> afk again.. brb (doing a lot of stuff today while our weather is half way decent)
<ichernev> Tenkawa, well, I can dd if=/dev/sdX and read as much as I want to, so in that regard they are accessible
<Tenkawa> back
<Tenkawa> thats good
<Tenkawa> so you have "direct" access
<Tenkawa> hmm
<Tenkawa> let me look up something
<[TheBug]> Helios4 is Marvell much like ESPRESSOBin
<Tenkawa> hmm.. do you have smartctl installed?
<[TheBug]> are you using the onboard SATA channels or have you added additional ones via a card?
<ichernev> this is smartctl on all devices. I ran short tests and they are all good
<[TheBug]> ^^
<ichernev> [TheBug], what do you mean? It is Marvell 380, and it is Helios4, yes
<[TheBug]> so are you using the built-in 4 ports only
<[TheBug]> or did you add another sata via mPCIe
<[TheBug]> I can't remember if they provided a mPCIe there
<[TheBug]> or used it for the 4 ports they supply
<ichernev> [TheBug], ah, sorry. I'm using "onboard" -- the ones that came in with the board. I'm not sure how the board itself is configured, but there is a mainline DTS for inspiration :)
<[TheBug]> k just checking because for my ESPRESSOBin's I use a 4 port sata card in mPCIe
<Tenkawa> hmmm thats not good that they wont even register
<[TheBug]> which I think Helios uses similar just on board using the mpcie lanes
<[TheBug]> Do you have any partition table on the drives you're using?
<[TheBug]> when you first assembled the raid10
<[TheBug]> did you do it on direct device
<[TheBug]> or on partitions?
<ichernev> [TheBug], I used whole devices. I read it is the "best" way? ... :-/
<[TheBug]> or is this the first time trying to assemble?
<[TheBug]> thats fine
<[TheBug]> just trying to get a better idea of your setup
<ichernev> Tenkawa, what do you mean "wont even register". Register where
<Tenkawa> actually using partitions is
<[TheBug]> can you pastebin the command you're running that returns the error?
<[TheBug]> (if its too long to paste here)
<Tenkawa> these drives arent in the smart db
<Tenkawa> harder to diag
<ichernev> https://termbin.com/fx858 -- assemble gives "busy"
<Tenkawa> line 324 of your smartctl is one example
<ichernev> Tenkawa, that is weird, they are relatively new model. 8GB Seagate Skyhawk
<[TheBug]> when you do a cat /proc/mdstat
<[TheBug]> what do you see?
<[TheBug]> it sounds to me like it already auto assembled
<[TheBug]> most of time mdadm runs on boot and auto assembles
<[TheBug]> so unless you did something to cause it not to during boot you may be fighting a game you already won
<ichernev> I was running this setup for 1.2 years, without any issues. One day I received an email that it is degraded (one disk out), but when I logged in I saw the "missing" drive. For some reason it had a different name (sde), but it is all configured with UUID (I hope). And now after a reboot it can't attach any drives -- they are all busy
<[TheBug]> the new name means it likely lost connectivity to the drive for some reason like hot-swap
<[TheBug]> and when the controller came back up gave it a new name
Hokedli has quit [Quit: Konversation terminated!]
<[TheBug]> ya bro
<[TheBug]> just look what you just sent me
<[TheBug]> ohh
<[TheBug]> I see
<[TheBug]> now I better understand
<[TheBug]> it won't activate
<[TheBug]> it put them all together but won't activate
<[TheBug]> so it is assembled already
<[TheBug]> but something not allowing it to activate
<ichernev> the (S) stands for spare, as far as I read?
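For reference, (S) in /proc/mdstat does mark a spare. An array that is assembled but inactive typically lists every member that way, because no device has taken an active role yet. A minimal sketch for spotting this state from mdstat text follows; the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers that read /proc/mdstat-style text on stdin.

# Print the name of every md array reported as inactive.
inactive_arrays() {
    awk '$1 ~ /^md/ && $2 == ":" && $3 == "inactive" { print $1 }'
}

# Print every member flagged (S), with the "[index](S)" suffix stripped.
spare_members() {
    awk '$1 ~ /^md/ && $2 == ":" {
        for (i = 3; i <= NF; i++)
            if ($i ~ /\(S\)$/) { sub(/\[.*/, "", $i); print $i }
    }'
}
```

Usage: `inactive_arrays < /proc/mdstat` and `spare_members < /proc/mdstat`.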
<[TheBug]> ichernev: so
<[TheBug]> two things I would do
<[TheBug]> actually a few
<ichernev> can I tell it to reread and see if it's OK? I'm pretty sure the problem is a few blocks in the device that got dropped
<[TheBug]> this will help you to figure
<[TheBug]> first
<[TheBug]> mdadm -D /dev/md0
<ichernev> delete?
<[TheBug]> this will give you better stats about whats going on in your raid
<[TheBug]> no, that is not delete
<[TheBug]> just dumb use of a D flag on a program
<[TheBug]> lol seems scary but that is not it
Hokedli has joined #armbian
<[TheBug]> mdadm --help if you need
<[TheBug]> anyways
<[TheBug]> check that
<[TheBug]> it will describe current state more
<ichernev> yes, I checked :) it is detail -- I've run it. I had a long conversation in the ubuntu channel, but when they figured I was running armbian kernel they got kind of angry :|
archetech has quit [Quit: Leaving]
<[TheBug]> um has nothing to do with kernel
<[TheBug]> things are just out of whack and you will need to spend time on it
<[TheBug]> you will need to stop the mdadm raid
<[TheBug]> you will need to examine the drives to make sure they are still in sync
<[TheBug]> then you will want to likely force assemble the raid again
<[TheBug]> you will probably want to follow similar case
<[TheBug]> stop the raid
<[TheBug]> examine drives
<[TheBug]> then force re-assemble if you need
redentor has joined #armbian
<ichernev> [TheBug], ok, that explains most of it. So should I let mdadm figure out the "test", or I need to test the drives myself? It would make sense that mdadm can do that
<[TheBug]> not sure what you mean by test, I didn't use that word, I said examine which is a function of the tool also described in that reference
<ichernev> I've run examine -- they all look "good"
<ichernev> let me paste it again (it was at the top)
<[TheBug]> it shows better how to examine the volumes
<[TheBug]> so
<[TheBug]> to me it looks like you need to stop the array
<[TheBug]> and probably force assemble it
<[TheBug]> right now
<[TheBug]> array thinks it is missing one drive though
<[TheBug]> er
<[TheBug]> Array State : AAA. ('A' == active, '.' == missing, 'R' == replacing)
<[TheBug]> mdadm --stop /dev/md0
<ichernev> yes, so if I force assemble it, will it think that it is "correct" or it will start checking?
<ichernev> this is the only worry I have about force assembly
<[TheBug]> I mean there is some risk here, cause I don't know how you got the drives into that weird state
<[TheBug]> where they 'assembled' but can't decide how to activate
<[TheBug]> but all docs show you examine
<[TheBug]> stop
<[TheBug]> then if it won't assemble manually, you would force
<ichernev> :)
<[TheBug]> um
<[TheBug]> do you know which volume was the one that failed previously
<[TheBug]> whats really weird is all 4 drives believe 1 drive is missing
<[TheBug]> they are all AAA.
<ichernev> no, the last one is AAAA
<ichernev> sdd
<[TheBug]> ahh okay
<ichernev> at least it was when I ran it a few hours ago
<[TheBug]> you said this is raid10?
<ichernev> yes, raid10 with 4 drives
<[TheBug]> okay so
<[TheBug]> if I were you, I would do like the following maybe..
<[TheBug]> I would first stop and try to force assemble with all drives
<[TheBug]> barring that
<[TheBug]> I would assemble /dev/sda, sdb, sdc and then re-add sdd to the array once it has started with only those 3 drives
<[TheBug]> as it seems that the array itself only thinks it has 3 right now
<[TheBug]> though forcing assemble may resolve that
<ichernev> I think re-add is for replacing
<[TheBug]> in fact sdc is also a bit desync, if you look sda and sdb are on event 91238
<[TheBug]> the other two drives are a bit off
<[TheBug]> probably why in this state
<[TheBug]> ichernev: if you check it doesn't understand its been dropped from array (sdd) thats why it shows AAAA while others AAA.
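The comparison described above (event counters and Array State per device) can be pulled out of `mdadm --examine` output with a small filter. This is a minimal sketch that assumes the usual layout, in which each device section starts with a "/dev/...:" line and contains "Events :" and "Array State :" lines. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for comparing `mdadm --examine` output across devices.

# Read examine output for one or more devices on stdin and print
# "device events array_state", one line per device.
examine_summary() {
    awk '
        /^\/dev\// { dev = $1; sub(/:$/, "", dev) }
        $1 == "Events" && $2 == ":" { events = $3 }
        $1 == "Array" && $2 == "State" { print dev, events, $4 }
    '
}

# Read examine_summary output on stdin and print the devices whose event
# count is below the highest one seen (likely out of sync).
stale_devices() {
    awk '
        { dev[NR] = $1; ev[NR] = $2 + 0; if (ev[NR] > max) max = ev[NR] }
        END { for (i = 1; i <= NR; i++) if (ev[i] < max) print dev[i] }
    '
}
```

Usage, as root: `mdadm --examine /dev/sd[abcd] | examine_summary`, optionally piped into `stale_devices`.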
<ichernev> ok, after force, it shows the 3 drives only. I guess I have to --re-add the 4th one now
<[TheBug]> so either by force assemble it will need to be placed back or
<[TheBug]> you can try assemble array with only 3 drives
<[TheBug]> then re-add the sdd and let it rebuild
<[TheBug]> its already a bit off in events
<[TheBug]> that may be a better idea than force assembling
<[TheBug]> see on examine
<[TheBug]> Events : 91238
<[TheBug]> this counter should technically all be the same
<[TheBug]> they should all be on same event
<[TheBug]> but in this case based on your pastebin, they are not
<[TheBug]> sdc is close
<[TheBug]> sdd seems a lot more off
<[TheBug]> so you can probably assemble a, b , c and then re-add d
<[TheBug]> as long as a, b, c come up in an assembled device
<[TheBug]> re-adding d should be nominal
<[TheBug]> so something like
<[TheBug]> mdadm --stop /dev/md0
<[TheBug]> mdadm --assemble /dev/md0 /dev/sda /dev/sdb /dev/sdc
<[TheBug]> then mdadm -D /dev/md0
<[TheBug]> assuming it comes up
<[TheBug]> should show active array lacking 1 drive
<[TheBug]> then re-add volume to array (sdd)
<[TheBug]> and it will rebuild
<[TheBug]> if you use a write-intent internal bitmap, recovery should be fast
<[TheBug]> since not super behind
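The sequence suggested above can be written down as a dry-run script, so the exact commands can be reviewed before anything touches the array. This is a minimal sketch. `md_run` and `md_recover` are hypothetical names, and `--detail` is the long form of `-D`. Depending on the array state, mdadm may refuse to start with fewer members than expected unless `--run` or `--force` is added to the assemble step.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless RUN=1 is set,
# in which case the command is executed (needs root; at your own risk).
md_run() {
    if [ "${RUN:-0}" = 1 ]; then "$@"; else echo "$*"; fi
}

# md_recover ARRAY LATE_DEVICE IN_SYNC_DEVICE...
# Stop the array, assemble it from the in-sync members, show its state,
# then re-add the device that fell behind.
md_recover() {
    array=$1
    late=$2
    shift 2
    md_run mdadm --stop "$array" &&
    md_run mdadm --assemble "$array" "$@" &&
    md_run mdadm --detail "$array" &&
    md_run mdadm "$array" --re-add "$late"
}
```

Usage: `md_recover /dev/md0 /dev/sdd /dev/sda /dev/sdb /dev/sdc` prints the four commands; prefix it with `RUN=1` only after checking them.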
<ichernev> ok, so after force assembly a b and c showed up. Now I re-added d and it started rebuilding, but then it stopped and marked sdd as (F)
<[TheBug]> then maybe your drive's bad
<[TheBug]> what's dmesg saying
<[TheBug]> or
<[TheBug]> actually
<[TheBug]> ichernev: I'll make you a bet
<[TheBug]> I could be wrong
<[TheBug]> but I bet your SATA cable went bad on that drive
<[TheBug]> or isn't connected tightly
<[TheBug]> causing some issue with connectivity
<[TheBug]> its either that or the sata controller is having issues
<[TheBug]> would seem
<[TheBug]> woo
<ichernev> this is dmesg, something is off
<[TheBug]> yep
<[TheBug]> Either: A. drive is dead, B. SATA cable is bad
<[TheBug]> those are my primary thoughts
<[TheBug]> the last would be C. sata controller has issues, but lets hope not that
<[TheBug]> [ 5609.510650] md/raid10:md0: Disk failure on sdd, disabling device.
<[TheBug]> md/raid10:md0: Operation continuing on 3 devices.
<[TheBug]> it even says
<[TheBug]> if you have another drive I would just throw it in and try to rebuild
<[TheBug]> if it rebuilds... awesome you had a bad drive
<[TheBug]> if it fails or does something similar, replace SATA cable (or do both at once if you have a spare)
<[TheBug]> if it fails after that... well... maybe have a good cry in your Cheerios
<[TheBug]> cause it would sound like the controller has a problem and you may no longer be able to use port 4 (if you try the rest and it keeps failing)
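While swapping the drive and cable, it helps to see which device md is dropping each time. The kernel line quoted above names both the array and the device. A minimal sketch of a filter for it follows; it assumes the message keeps the wording shown in this log, and `md_disk_failures` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "array device" for every md "Disk failure" message, e.g. "md0 sdd".
md_disk_failures() {
    sed -n 's|.*md/raid[0-9]*:\(md[0-9]*\): Disk failure on \([^,]*\),.*|\1 \2|p'
}
```

Usage: `dmesg | md_disk_failures`. If the same port keeps showing up after the drive and the cable have both been swapped, that points towards the controller.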
<ichernev> I don't have an 8GB drive lying around ... but it's good to figure out the cable situation before getting a new drive. Will reconnecting to different ports be OK? It should be
<[TheBug]> port shouldn't matter, technically I believe it should even be hot plug
<[TheBug]> but I am not normally that brave
<[TheBug]> hehe
<ichernev> also it could be the controller. I have 2 identical boards (but due to the COVID lockdown they are not with me ATM)
<[TheBug]> well those are the things I would do
<[TheBug]> sounds like at least your array is online in the meantime now
<[TheBug]> which is a plus
<[TheBug]> but yeah
<[TheBug]> you need to try replacing the drive / sata cable
<[TheBug]> thats where you are
<[TheBug]> okay time for a cancer stick, bbiab
<ichernev> [TheBug], thanks a lot!
<[TheBug]> np
<[TheBug]> pay it forward :)
macc24 has joined #armbian
redentor has quit [Quit: Leaving]
dani has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
Hokedli has quit [Quit: Konversation terminated!]
dani has joined #armbian
dddddd has quit [Ping timeout: 240 seconds]
xecutertool has quit [Remote host closed the connection]
archetech has joined #armbian
dani has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
dani has joined #armbian
sunshavi has quit [Quit: nil]
xecuter has joined #armbian
sunshavi has joined #armbian
xec has joined #armbian
xecuter has quit [Ping timeout: 260 seconds]
dani has quit [Quit: https://quassel-irc.org - Chat comfortably. Anywhere.]
dani has joined #armbian
dddddd has joined #armbian
macc24 has quit [Quit: WeeChat 2.8]
macc24 has joined #armbian
Tenkawa has left #armbian [#armbian]
toketin has quit [Ping timeout: 260 seconds]
macc24 has quit [Quit: sleep]
NeuroScr has joined #armbian