dwradcliffe changed the topic of #rubygems-aws to: RubyGems.org Ops | Log: http://irclog.whitequark.org/rubygems-aws | https://github.com/rubygems/rubygems-aws
seanlinsley has joined #rubygems-aws
seanlinsley has quit [Quit: …]
vertis has quit [Quit: vertis]
mocara has joined #rubygems-aws
mocara has quit [Client Quit]
seanlinsley has joined #rubygems-aws
seanlinsley has quit [Quit: …]
seanlinsley has joined #rubygems-aws
seanlinsley has quit [Quit: …]
sferik has joined #rubygems-aws
seanlinsley has joined #rubygems-aws
seanlinsley has quit [Read error: Connection reset by peer]
seanlinsley has joined #rubygems-aws
mocara has joined #rubygems-aws
hbeaver has joined #rubygems-aws
almostwhitehat has joined #rubygems-aws
mocara has quit [Quit: Leaving.]
mocara has joined #rubygems-aws
hbeaver has quit [Quit: hbeaver]
dwradcliffe_ is now known as dwradcliffe
hbeaver has joined #rubygems-aws
vertis has joined #rubygems-aws
vertis has quit [Client Quit]
vertis has joined #rubygems-aws
<vertis> dwradcliffe: hey
<dwradcliffe> vertis: hey!
<vertis> happy new year
<vertis> was trying to get hold of you or any of the other ops yesterday
<dwradcliffe> saw that, sorry I wasn't online. spent the day painting the nursery :)
<dwradcliffe> problems?
<vertis> just the memory pct alert
<vertis> on the dbmaster
<vertis> and not much to be done about it
<vertis> or not much I can do about it
<dwradcliffe> redis again?
<dwradcliffe> oh wrong node
<vertis> which node is redis on…app?
<vertis> yeah
almostwhitehat has quit [Remote host closed the connection]
<dwradcliffe> vertis: there it goes again
<vertis> indeed
<vertis> and I can't see anything wrong? Can you?
hbeaver has quit [Quit: hbeaver]
<dwradcliffe> I can't figure out why the other metrics don't show the memory usage the same
<dwradcliffe> seems like it might be console-kit that is using all the memory
<dwradcliffe> and I don't think we need that
<dwradcliffe> but I could be wrong
<dwradcliffe> vertis samkottler
<vertis> oops
<vertis> sorry, doing work
<vertis> yeah I saw that the other day, couldn't work out why it would be needed
<vertis> "ConsoleKit is a framework for defining and tracking users, login sessions, and seats"
evan_ has joined #rubygems-aws
<evan_> yo dudes
seanlinsley has quit [Quit: …]
<evan_> what's up with these datadog alerts?
<dwradcliffe> hey evan_
<dwradcliffe> we were just talking about it
<evan_> k
<dwradcliffe> evan_: I think it might be console-kit using up all the memory. From my quick research, I don't think we should even have console-kit installed
<evan_> it's dbmaster again, yes?
<dwradcliffe> yes
<evan_> I looked yesterday
<evan_> memory looked fine.
<vertis> evan_: was trying to ping you guys yesterday
<evan_> no prob
<vertis> couldn't see anything broken
<vertis> not sure that check is a helpful one
<vertis> too many false positives
<evan_> right
<vertis> might be better to just have it alert if redis/postgres falls over
<vertis> or something
mocara has quit [Quit: Leaving.]
<dwradcliffe> yeah, I was just hoping to find out before it falls over
<dwradcliffe> but if it's not reporting correctly, (seems like the case)...
<vertis> yeah it's difficult with only one server
<vertis> *
<vertis> you can't do cluster checks
<vertis> i.e. memory is funny on >2 boxes
<vertis> I typically get on the boxes pretty quickly after an alert
<vertis> the only reason I didn't silence that one, was I wasn't sure enough
<dwradcliffe> I muted it for now
<dwradcliffe> until we can figure out a better check
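One possible reason a memory-percentage alert and other metrics disagree (not confirmed anywhere in this log) is that one of them counts the kernel's buffers and page cache as used memory and the other does not. The sketch below is only an illustration of that difference: it parses text in the format of Linux's /proc/meminfo and computes the used percentage both ways. The function names `parse_meminfo` and `memory_used_pct` are hypothetical, and nothing here reflects how the Datadog check discussed above is actually configured.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def memory_used_pct(meminfo, count_cache_as_used=False):
    """Return used memory as a percentage of MemTotal.

    With count_cache_as_used=False, Buffers and Cached are treated as
    reclaimable and therefore as free. This is an assumption about what
    a less noisy check might do, not a description of the existing check.
    """
    total = meminfo["MemTotal"]
    free = meminfo["MemFree"]
    if not count_cache_as_used:
        free += meminfo.get("Buffers", 0) + meminfo.get("Cached", 0)
    return 100.0 * (total - free) / total


if __name__ == "__main__":
    # Reads the real file on a Linux host; the path is standard on Linux.
    with open("/proc/meminfo") as handle:
        info = parse_meminfo(handle.read())
    print("counting cache as used:", memory_used_pct(info, True))
    print("treating cache as free:", memory_used_pct(info, False))
```

On a database host with a large page cache the two numbers can be far apart, so an alert based on the first figure could fire while the box still has plenty of reclaimable memory.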
<vertis> what are the symptoms if redis goes down
<vertis> will it come through pager duty?
<dwradcliffe> I don't think we have a separate check, the app will just start failing in some way I think
<dwradcliffe> probably gem pushes :(
<vertis> hmmm
<dwradcliffe> the original problem that prompted me to set up the memory check was when redis didn't fall over, but stopped working because of lack of memory
<vertis> right
<vertis> maybe we should isolate the memory check to just the box with redis on it for now
<vertis> that or move to elasticache
<vertis> hint hint
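A check aimed at Redis itself, rather than at whole-box memory, could look at what Redis reports about its own usage. The sketch below parses the text returned by the Redis INFO command and compares `used_memory` against `maxmemory`. It is a rough illustration only: the function names and the 0.9 threshold are hypothetical, it assumes the server reports a `maxmemory` field (some versions may not, and a value of 0 means no limit is set), and it would not by itself catch the host running out of memory, which is the failure described above.

```python
def parse_redis_info(text):
    """Parse the "key:value" lines of Redis INFO output into a dict of strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and "# Section" headers
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


def redis_memory_status(info, warn_ratio=0.9):
    """Classify Redis memory use as "ok", "warn", or "no-limit".

    warn_ratio is a hypothetical threshold, not a recommended value.
    """
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    if limit == 0:
        # No configured limit (or the field is absent): nothing to compare against.
        return "no-limit"
    return "warn" if used / limit >= warn_ratio else "ok"
```

The INFO text would have to be fetched separately, for example with a Redis client library or `redis-cli info memory`, and the result fed to whatever alerting is in place.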
seanlinsley has joined #rubygems-aws
<vertis> evan_: what are your thoughts on RDS/HerokuPostgres && Elasticache (Redis)
<evan_> sure, a few things
<evan_> 1) I'd be fine moving to RDS Postgres
gazoombo has quit [Ping timeout: 240 seconds]
<evan_> no problem with doing that
<evan_> would simplify things for sure
<vertis> the only problem I can see is doing it without an outage
<evan_> Elasticache wise, the issue is much more how/why redis is used than redis itself.
<vertis> right
<evan_> vertis: a windowed outage is fine.
<evan_> our stats are a mess
<vertis> okay
<evan_> we can't use redis to store them.
<evan_> we need to come up with a system that manages them properly
<vertis> I haven't looked into why we use redis
<vertis> I might do that instead
<evan_> I recently shut off the daily stats
<evan_> which buys us time
<evan_> but I want to get them back
<evan_> but to do that, we have to manage them properly.
<vertis> I've done quite a bit of stats collection
<vertis> okay
<vertis> might take that approach instead
<vertis> what do you need from me to set up a time for a planned outage
<vertis> re postgres
<evan_> just information about how long the window is
<evan_> and when you'd like to do it.
<vertis> okay, I'll do some testing
<vertis> on RDS and heroku
<vertis> with our stack, and migration
<vertis> and I'll let you know
<evan_> thanks bud!
<dwradcliffe> ok, now cpu alert
<vertis> yeah
<dwradcliffe> rsyslog is using 100%
<dwradcliffe> on balancer02
<vertis> hmmm
<vertis> I'll leave that one to you
<vertis> I have a standup at work in 10 minutes
qrush has quit [Ping timeout: 252 seconds]
qrush has joined #rubygems-aws
seanlinsley has quit [Quit: …]