siruf has quit [*.net *.split]
dwknoxy has quit [*.net *.split]
vlad_starkov has quit [*.net *.split]
ckrailo has quit [*.net *.split]
blowmage has quit [*.net *.split]
dwknoxy has joined #rubygems
blowmage has joined #rubygems
ckrailo has joined #rubygems
vlad_starkov has joined #rubygems
siruf has joined #rubygems
jitendravyas has joined #rubygems
jitendravyas has quit [Ping timeout: 272 seconds]
Hanmac has quit [Ping timeout: 260 seconds]
huoxito has quit [Remote host closed the connection]
huoxito has joined #rubygems
huoxito has quit [Ping timeout: 265 seconds]
havenwood has quit [Remote host closed the connection]
huoxito has joined #rubygems
huoxito has quit [Ping timeout: 245 seconds]
havenwood has joined #rubygems
huoxito has joined #rubygems
tenderlove has quit [Quit: Leaving...]
Hanmac has joined #rubygems
jhass is now known as jhass|off
havenwood has quit [Remote host closed the connection]
huoxito has quit [Remote host closed the connection]
lsegal has joined #rubygems
huoxito has joined #rubygems
huoxito has quit [Ping timeout: 245 seconds]
tbuehlmann has joined #rubygems
_redmenace has joined #rubygems
redmenace has quit [Ping timeout: 244 seconds]
_redmenace has quit [Read error: Connection reset by peer]
redmenace has joined #rubygems
redmenace has quit [Ping timeout: 250 seconds]
redmenace has joined #rubygems
_redmenace has joined #rubygems
redmenace has quit [Ping timeout: 258 seconds]
elia has joined #rubygems
seanlinsley has quit [Ping timeout: 272 seconds]
seanlinsley has joined #rubygems
dangerousdave has joined #rubygems
tbuehlmann has quit [Remote host closed the connection]
dangerousdave has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
workmad3 has joined #rubygems
sferik has joined #rubygems
<sferik> evan qrush drbrain samkottler: anyone awake? we’ve got problems
<dwradcliffe> Just woke up
<sferik> dwradcliffe: good morning
<sferik> dwradcliffe: there appears to be a situation
lsegal has quit [Quit: Quit: Quit: Quit: Stack Overflow.]
<sferik> dwradcliffe: I haven’t had much time to diagnose the problem but it seems we’re getting DDoS’d
<dwradcliffe> yeah pagerduty woke me up :)
<sferik> dwradcliffe: my internet connection sucks and I need to go give a conference talk in about an hour
<dwradcliffe> give me a second to sort out what's happening
<sferik> dwradcliffe: I noticed that app01-aws.rubygems.org was missing some security updates, so I decided to apply those
<sferik> dwradcliffe: I hope that’s not a mistake
<sferik> dwradcliffe: it seemed prudent since the site was already down
<dwradcliffe> that's still running?
<dwradcliffe> that's not in use
<sferik> dwradcliffe: yes (please don’t reboot)
<sferik> dwradcliffe: what's not in use?
<dwradcliffe> that server
<sferik> oh, I am very confused then
<sferik> which are the production servers?
sferik has joined #rubygems
redmenace has joined #rubygems
<dwradcliffe> ok looks like redis is down again
<sferik> dwradcliffe: okay, the update to app01-aws.rubygems.org is complete
<sferik> dwradcliffe: which servers are involved?
_redmenace has quit [Ping timeout: 265 seconds]
jhass|off is now known as jhass
tcopeland has quit [Quit: Leaving.]
rossgeesman has joined #rubygems
rossgeesman has quit [Remote host closed the connection]
<qrush> Seems like things are ok?
<qrush> Apparently texts do not wake me always
dangerousdave has joined #rubygems
dwknoxy has joined #rubygems
dangerousdave has quit [Client Quit]
dangerousdave has joined #rubygems
rossgeesman has joined #rubygems
bbrowning_away is now known as bbrowning
rossgeesman has quit [Ping timeout: 265 seconds]
tcopeland has joined #rubygems
willywos has joined #rubygems
workmad3 has quit [Ping timeout: 250 seconds]
rossgeesman has joined #rubygems
dangerousdave has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
huoxito has joined #rubygems
tbuehlmann has joined #rubygems
workmad3 has joined #rubygems
bradland has joined #rubygems
rossgeesman has quit [Remote host closed the connection]
tenderlove has joined #rubygems
rossgeesman has joined #rubygems
tbuehlmann has quit [Quit: Leaving]
seanlinsley has quit [Ping timeout: 244 seconds]
havenwood has joined #rubygems
rossgeesman has quit [Remote host closed the connection]
rossgeesman has joined #rubygems
rossgeesman has quit [Remote host closed the connection]
rossgeesman has joined #rubygems
seanlinsley has joined #rubygems
rossgeesman has quit [Remote host closed the connection]
havenwood has quit [Remote host closed the connection]
havenwood has joined #rubygems
rossgeesman has joined #rubygems
dvu has joined #rubygems
dvu has quit [Remote host closed the connection]
dvu has joined #rubygems
rossgeesman has quit [Ping timeout: 260 seconds]
bradland has quit [Quit: bradland]
bbrowning is now known as bbrowning_away
drbrain has quit [Ping timeout: 240 seconds]
drbrain has joined #rubygems
tbuehlmann has joined #rubygems
elia has quit [Quit: Computer has gone to sleep.]
elia has joined #rubygems
bbrowning_away is now known as bbrowning
workmad3 has quit [Ping timeout: 272 seconds]
dwknoxy is now known as dknox-lunch
thumpba_ has joined #rubygems
havenwood has quit [Remote host closed the connection]
thumpba has quit [Ping timeout: 246 seconds]
thumpba has joined #rubygems
thumpba_ has quit [Ping timeout: 255 seconds]
bbrowning has quit [Remote host closed the connection]
havenwood has joined #rubygems
bbrowning has joined #rubygems
djbkd has joined #rubygems
havenwood has quit [Remote host closed the connection]
<samkottler> qrush: we need to move to a new redis box
<samkottler> because we're evicting too quickly
<qrush> evicting what
<samkottler> to the disk
<evan> we also need to get off redis
<qrush> i thought the dependency API is off us anyway
<evan> it's still on my todo.
<qrush> which is the primary redis issue
<evan> we've still got stats in redis
<samkottler> evan: CRDTs
<samkottler> just sayin
<qrush> had to google that -_-
<samkottler> while we're at it stat-update or whatever that thing is called should get rewritten
<dwradcliffe> hey guys, already moved redis this morning
<samkottler> oh awesome dwradcliffe, to which instance type?
<dwradcliffe> m3-large
<qrush> samkottler: ah you were responding to my earlier question?
<qrush> i'm really surprised that just doing increments in redis is problematic
<qrush> is it because we're on AOF too?
<samkottler> it's not doing increments themselves
<samkottler> it's that heap frag means we need a huge amount of free memory
<dwradcliffe> I had to reboot redis02 about a dozen times. it was only lasting about 10 minutes before freezing.
<samkottler> like 40% overhead
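[note] The fragmentation overhead described above can be estimated from the `used_memory` and `used_memory_rss` fields that Redis reports in `INFO memory`. The following is a minimal sketch, not the team's tooling: the sample numbers are hypothetical and were picked only to reproduce the "like 40%" figure, and the helper names `parse_info` and `fragmentation_overhead` are made up for illustration.

```python
def parse_info(text: str) -> dict:
    """Parse `key:value` lines from Redis INFO output, skipping section headers."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or a "# Memory" style section header
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def fragmentation_overhead(info: dict) -> float:
    """Return extra resident memory as a fraction of the memory Redis allocated.

    0.4 means the process holds 40% more RSS than the data it asked for.
    """
    used = int(info["used_memory"])
    rss = int(info["used_memory_rss"])
    return rss / used - 1.0


# Hypothetical sample, NOT taken from the rubygems.org servers.
SAMPLE_INFO = """\
# Memory
used_memory:2500000000
used_memory_rss:3500000000
"""

info = parse_info(SAMPLE_INFO)
overhead = fragmentation_overhead(info)
print(f"fragmentation overhead: {overhead:.0%}")
```

Redis also reports this ratio directly as `mem_fragmentation_ratio` (RSS divided by used memory), so the calculation above is mostly useful for seeing where the number comes from.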
<samkottler> AWS should make a redis instance type
<samkottler> 1 core and a bunch of memory
<samkottler> trololol
<evan> what does ElastiCache use?
<qrush> we run a shitload of redis at basecamp and it's not an issue
<qrush> i can bring one of our guys in here, maybe we can check our config against theirs?
<samkottler> this is a known issue with redis
<samkottler> it just requires more hardware
<samkottler> evan: cache.r3.large 2 CPUs and 13.5 GB of RAM
<qrush> samkottler: sure as in yes, let's double check?
<qrush> just trying to help :)
<samkottler> qrush: I'm totally open to chatting with folks about it :)
<dwradcliffe> our setup is fairly vanilla so maybe there's something we can tweak
mr_ndrsn has joined #rubygems
mkent has joined #rubygems
* qrush summoned a few to take a look
<qrush> samkottler: can you gist the config for mr_ndrsn and mkent? also *waves*
<mr_ndrsn> Hey Sam!
<qrush> or evan dwradcliffe :)
<johnmwilliams___> Basecamp Ops REPRESENT!
<samkottler> hey hey mr_ndrsn!
<dwradcliffe> howdy mr_ndrsn
<dwradcliffe> I can gist it unless someone else has it already
<samkottler> I'm gisting right now
<samkottler> one second
<dwradcliffe> gist race!
<samkottler> high level overview:
<samkottler> maxmemory-policy means we use it like a LRU cache
<samkottler> but _never_ evict key permanently, just to rdb
<samkottler> you can see the write policy
havenwood has joined #rubygems
<samkottler> we fsync every second
<samkottler> not particularly aggressive about BGREWRITEAOF
<samkottler> and then activerehashing
<samkottler> that's mostly it
<samkottler> other than some setting around data types
<samkottler> which are generally unimportant
<johnmwilliams___> Just to be clear, this is a disk space issue not a disk IO issue, correct?
<samkottler> mr_ndrsn: do you have lots of issues at basecamp around heap presusre?
<samkottler> pressure**
<samkottler> johnmwilliams___: the disk isn't the issue at all
<johnmwilliams___> Ok, getting mixed things from here and internal chat.
<samkottler> internal chat where?
<dwradcliffe> johnmwilliams___: memory issue
<samkottler> oh work
<mr_ndrsn> No idea SK. mkent/johnmwilliams are better candidates for that question.
<samkottler> so here's what it looks like to me and past experience has shown this problem before
<samkottler> extreme pressure around heap alloc/dealloc
<samkottler> it's hard to actually prove that
<samkottler> other than throwing more memory at it and then hoping it works better
<johnmwilliams___> Already verified that you are not hitting max open files or anything like that?
<johnmwilliams___> (We have done that before)
<samkottler> this issue is pretty well isolated the memory pressure
<samkottler> to memory pressure**
<mr_ndrsn> What are they symptoms you’re seeing?
<mr_ndrsn> err, the.
<qrush> if it helps i do have ssh access and we're all in the same place today
<qrush> so they could poke around with my box
<qrush> wow that sounds awful
<mkent> don't see maxmemory set in the gist
<mkent> wonder if it'll even observe the policy
<samkottler> mkent: maxmemory itself isn't set, but the policy is
<samkottler> mkent: it's possible this is a bug in redis
<samkottler> where it doesn't start using the policy when it's just putting pressure on system memory
<dwradcliffe> kernel: [13900.835381] Out of memory: Kill process 4162 (redis-server) score 888 or sacrifice child
<dwradcliffe> kernel: [13900.839891] Killed process 4162 (redis-server) total-vm:3635904kB, anon-rss:3590092kB, file-rss:0kB
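[note] The second kernel line above carries the memory figures that matter for sizing. A small sketch of pulling them out, assuming the log line has exactly the shape pasted above (other kernel versions print additional fields, which this regular expression does not try to cover); `parse_oom_kill` is a made-up helper name.

```python
import re

# Matches the "Killed process" line format seen in the pasted kernel log.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, "
    r"anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kill(line: str):
    """Return the process and memory fields of an OOM kill line, or None."""
    match = KILLED_RE.search(line)
    if match is None:
        return None
    return {
        "pid": int(match["pid"]),
        "name": match["name"],
        "total_vm_kb": int(match["total_vm"]),
        "anon_rss_kb": int(match["anon_rss"]),
        "file_rss_kb": int(match["file_rss"]),
    }


LINE = (
    "kernel: [13900.839891] Killed process 4162 (redis-server) "
    "total-vm:3635904kB, anon-rss:3590092kB, file-rss:0kB"
)

event = parse_oom_kill(LINE)
anon_rss_gib = event["anon_rss_kb"] / (1024 * 1024)  # kB -> GiB
print(f"{event['name']} (pid {event['pid']}) held {anon_rss_gib:.2f} GiB resident when killed")
```

For the line above this works out to roughly 3.42 GiB of anonymous resident memory at the moment redis-server was killed.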
<samkottler> oom killer is a whole other thing
<samkottler> the problem should stop before oom-killer kicks in
<samkottler> maybe we should try to set maxmemory statically
<mr_ndrsn> statically == “uncomment the entry in the gist?”
<johnmwilliams___> If you don't set maxmemory there is a chance it will just eat up all available memory.
<johnmwilliams___> Including swap.
<samkottler> alright lemme try setting the maxmemory to 3gb
<johnmwilliams___> I'd say set it in the config to 85% of system memory.
<samkottler> or actually, 6.5GB on the new box
<johnmwilliams___> 5.5 would be alright.
<mkent> worth a try, we set one to 3G, seems to keep it at about 3.3G of rss
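[note] The sizing rules of thumb above (85% of system memory, and a 3G limit settling at about 3.3G of RSS) can be combined into a quick check. This is only a sketch under stated assumptions: the 7.5 GiB total is assumed for the new box and is not stated in the channel, the 1.1 RSS ratio is extrapolated from mkent's single data point, and both function names are hypothetical.

```python
GIB = 1024 ** 3


def suggest_maxmemory(total_ram_bytes: int, fraction: float = 0.85) -> int:
    """Return a maxmemory value in bytes as a fraction of system RAM."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return round(total_ram_bytes * fraction)


def expected_rss(maxmemory_bytes: int, rss_ratio: float = 3.3 / 3.0) -> int:
    """Estimate resident memory for a given limit (ratio is an assumption)."""
    return round(maxmemory_bytes * rss_ratio)


total = int(7.5 * GIB)  # ASSUMPTION: RAM on the new box
limit = suggest_maxmemory(total)
rss = expected_rss(limit)

print(f"maxmemory {limit}")  # value in bytes, usable as a redis.conf line
print(f"estimated RSS: {rss / GIB:.2f} GiB of {total / GIB:.2f} GiB")
print("fits in RAM" if rss < total else "does NOT fit in RAM")
```

Under these assumptions the 85% rule leaves only a few hundred MiB of headroom once the RSS overhead is counted, which is one argument for the more conservative 5.5 figure suggested above.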
<samkottler> I'm somewhat scared about the policy
<samkottler> which is why this hasn't been set before
<samkottler> because the data in our redis instance is not cache, it's real long-term data that needs to be persisted at all costs
<johnmwilliams___> It should get persisted to disk.
<samkottler> which is a big reason in and of itself to get rid of redis in its current form for us
<samkottler> johnmwilliams___: not necessarily, some of the policies evict keys like an LRU
<samkottler> an LRU in which a population means the data is gone
<samkottler> pollution
<samkottler> brb, it's like 3:05pm and I haven't had lunch yet
<mkent> well, volatile-lru should only dump stuff set with an expiration
<mkent> unless there's data set with an absurdly high value
<mr_ndrsn> Or could try noeviction and handle it in the application code? ¯\_(ツ)_/¯
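[note] The difference between the three policies mentioned above can be shown with a toy model. This is an illustration only, not how Redis is implemented: real Redis uses an approximated (sampled) LRU and limits by bytes, while this sketch uses an exact LRU and limits by key count. The class `ToyStore` and the key names are hypothetical.

```python
from collections import OrderedDict


class ToyStore:
    """Tiny key-count-limited store modelling three maxmemory policies."""

    def __init__(self, max_keys: int, policy: str):
        self.max_keys = max_keys
        self.policy = policy
        # key -> (value, has_ttl); order runs from least to most recently used
        self.data = OrderedDict()

    def get(self, key):
        value, _ = self.data[key]
        self.data.move_to_end(key)  # mark as most recently used
        return value

    def set(self, key, value, ttl=None):
        if key not in self.data and len(self.data) >= self.max_keys:
            self._evict_one()
        self.data[key] = (value, ttl is not None)
        self.data.move_to_end(key)

    def _evict_one(self):
        if self.policy == "noeviction":
            raise MemoryError("write refused: store is full and nothing is evicted")
        victim = None
        for key, (_, has_ttl) in self.data.items():
            # allkeys-lru may drop any key; volatile-lru only keys with a TTL
            if self.policy == "allkeys-lru" or has_ttl:
                victim = key
                break
        if victim is None:
            raise MemoryError("write refused: no evictable keys")
        del self.data[victim]


# volatile-lru: the key without an expiration survives, the TTL key is dropped.
volatile = ToyStore(2, "volatile-lru")
volatile.set("downloads:pry", 100)          # long-term data, no TTL
volatile.set("cache:page", "html", ttl=60)  # cache entry with a TTL
volatile.set("downloads:rake", 200)
print(list(volatile.data))

# allkeys-lru: the least recently used key is dropped even without a TTL.
allkeys = ToyStore(2, "allkeys-lru")
allkeys.set("downloads:pry", 100)
allkeys.set("downloads:rake", 200)
allkeys.set("downloads:rails", 300)
print(list(allkeys.data))

# noeviction: the write fails and the application has to handle it.
strict = ToyStore(1, "noeviction")
strict.set("downloads:pry", 100)
try:
    strict.set("downloads:rake", 200)
    refused = False
except MemoryError:
    refused = True
print("write refused:", refused)
```

In this model only `allkeys-lru` silently loses long-term counters, which matches the concern raised above about data that is not a cache.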
drbrain has quit [Quit: Goodbye]
drbrain has joined #rubygems
tbuehlmann has quit [Remote host closed the connection]
mr_ndrsn has quit [Quit: mr_ndrsn]
mkent has quit [Quit: Leaving.]
tenderlove has quit [Remote host closed the connection]
drbrain has quit [Ping timeout: 255 seconds]
dknox-lunch is now known as dknox
drbrain has joined #rubygems
seanlinsley has quit [Quit: seanlinsley]
seanlinsley has joined #rubygems
djbkd has quit [Remote host closed the connection]
dangerousdave has joined #rubygems
<qrush> hey evan samkottler
<qrush> what if we just had a separate redis per year
<qrush> this isn't too stupid right? :)
<qrush> and then only one is written to per year
willywos has quit [Ping timeout: 272 seconds]
tcopeland has quit [Ping timeout: 244 seconds]
djbkd has joined #rubygems
dangerousdave has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
djbkd has quit [Ping timeout: 245 seconds]
djbkd has joined #rubygems
dvu has quit [Remote host closed the connection]
<evan> qrush: I dump the year stats anyway
<evan> that's not fine grained enough.
djbkd has quit [Remote host closed the connection]
djbkd has joined #rubygems
mkent has joined #rubygems
mr_ndrsn has joined #rubygems
huoxito has quit [Remote host closed the connection]
mr_ndrsn has quit [Client Quit]
elia has joined #rubygems
tenderlove has joined #rubygems
tenderlove has quit [Read error: Connection reset by peer]
tenderlove has joined #rubygems
lsegal has joined #rubygems
tcopeland has joined #rubygems
tenderlove has quit [Quit: Leaving...]
mkent has quit [Quit: Leaving.]
mkent has joined #rubygems
bbrowning is now known as bbrowning_away
tenderlove has joined #rubygems
jhass is now known as jhass|off
tenderlove has quit [Client Quit]
mkent has left #rubygems [#rubygems]
tenderlove has joined #rubygems
jhass|off is now known as jhass
dvu has joined #rubygems
<Rennex> ugh... i tried "gem install pry -v", "gem -v install pry", "gem --verbose install pry", and "gem install -v pry" before finally landing on the winning combination of "gem install --verbose pry"
seanlinsley has quit [Quit: seanlinsley]
huoxito has joined #rubygems
seanlinsley has joined #rubygems
tenderlove has quit [Read error: Connection reset by peer]