kentonv changed the topic of #sandstorm to: Welcome to #sandstorm: home of all things sandstorm.io. Say hi! | Have a question but no one is here? Try asking in the discussion group: https://groups.google.com/group/sandstorm-dev | Public logs at https://botbot.me/freenode/sandstorm/
pie_ has quit [Ping timeout: 252 seconds]
cbaines_ has quit [Quit: bye]
cbaines has joined #sandstorm
_whitelogger has joined #sandstorm
pie_ has joined #sandstorm
simpson has quit [Ping timeout: 252 seconds]
simpson has joined #sandstorm
ripdog has quit [Ping timeout: 250 seconds]
<Zarutian> kentonv: just out of sheer curiosity, roughly how much compute and memory does Oasis consume on a weekly basis?
<kentonv> Zarutian, there are currently 8 VMs that make up the system: 5x n1-standard-1, 2x n1-highmem-2, 1x g1-small
<kentonv> (that's for Oasis itself; Sandstorm's web site, app store, and update downloads are served elsewhere)
TC01 has quit [Ping timeout: 240 seconds]
isd has joined #sandstorm
<TimMc> so capped at about 10 vCPU and 46.5 GB RAM (but not *using* all of that)
<TimMc> and adding vCPU across machine types, well...
digitalcircuit has quit [Ping timeout: 250 seconds]
digitalcircuit has joined #sandstorm
TC01 has joined #sandstorm
<kentonv> TimMc, almost all the VMs are in single-digit percent utilization of CPU. The system was designed to be a lot more scalable than it needed to be, I guess. >_>
<kentonv> in fact self-hosted (single-machine) sandstorm on a beefy instance would probably have handled the load fine. Oops.
<simpson> On GCE, it doesn't matter quite as much. I suppose it depends on what's on each machine.
<TimMc> kentonv: That happens. :-)
<kentonv> Oasis has: master, storage, mongo, 2x worker, 2x shell
<kentonv> oh, and gateway
<kentonv> the gateway is a g1-small, the workers are n1-highmem-2, and the rest are n1-standard-1
<kentonv> master could probably be reduced to a g1-small, and storage probably could too.
<kentonv> but I worry about subtle performance loss
<kentonv> we could also probably go to just one shell
<simpson> Mm. Are you running full systemd? As I've containerized, I've found that that's actually one of the costs, and that there's been a modest savings from running more stuff on k8s.
<kentonv> these are full VMs. Some of the things could maybe run in containers but the workers definitely can't since they do a lot of root-only stuff, like setting up nbd devices.
<simpson> Mm, makes sense. It was only recently that I was able to get my Tahoe-LAFS storage servers off of VMs, and for similar reasons: Wiring up storage is non-trivial.
<mokomull> ooh, nbd? that's kind of my life these days :)
<kentonv> mokomull, yeah Blackrock makes heavy use of nbd in order to give each grain its own virtual volume that's actually maintained on the remote storage server.
<kentonv> it's my favorite crazy systems hack
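A minimal sketch of what the client side of that hack can look like, using the classic ioctl interface from <linux/nbd.h>. The function name, server address, and port are hypothetical; protocol negotiation with the server and all error handling are elided:

    /* Sketch (not Blackrock's actual code): attach /dev/nbdN to a socket
     * connected to a remote storage server via the classic ioctl API. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <arpa/inet.h>
    #include <linux/nbd.h>

    int attach_grain_volume(const char *dev, uint64_t size_bytes) {
        int sock = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = {0};
        addr.sin_family = AF_INET;
        addr.sin_port = htons(10809);                    /* assumed port */
        inet_pton(AF_INET, "10.0.0.2", &addr.sin_addr);  /* hypothetical server */
        connect(sock, (struct sockaddr *)&addr, sizeof addr);

        int nbd = open(dev, O_RDWR);                     /* e.g. "/dev/nbd7" */
        ioctl(nbd, NBD_SET_BLKSIZE, 4096UL);
        ioctl(nbd, NBD_SET_SIZE_BLOCKS, size_bytes / 4096);
        ioctl(nbd, NBD_SET_SOCK, sock);
        return ioctl(nbd, NBD_DO_IT);   /* blocks until the device detaches */
    }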
<mokomull> haha you're in good company, though ... ISTR someone big was using Ceph via a userspace NBD translator too.
<kentonv> nbd is basically fuse at the block layer
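To make that analogy concrete: like a FUSE daemon answering filesystem calls, a userspace NBD server is just a loop answering block read/write requests on a socket. A rough sketch of that loop, using the classic (pre-netlink) wire structs from <linux/nbd.h>; the helper, buffer bound, and error handling are illustrative only:

    #include <stdint.h>
    #include <string.h>
    #include <unistd.h>
    #include <endian.h>
    #include <sys/types.h>
    #include <arpa/inet.h>
    #include <linux/nbd.h>

    static int read_full(int fd, void *buf, size_t n) {
        for (char *p = buf; n > 0;) {
            ssize_t r = read(fd, p, n);
            if (r <= 0) return -1;
            p += r; n -= (size_t)r;
        }
        return 0;
    }

    void serve(int sock, int backing_fd) {
        struct nbd_request req;
        static char buf[1 << 20];
        while (read_full(sock, &req, sizeof req) == 0 &&
               ntohl(req.magic) == NBD_REQUEST_MAGIC) {
            uint64_t off = be64toh(req.from);
            uint32_t len = ntohl(req.len);
            if (len > sizeof buf) return;  /* real code would negotiate limits */

            /* Each request carries a cookie ("handle") echoed in the reply. */
            struct nbd_reply reply = { .magic = htonl(NBD_REPLY_MAGIC) };
            memcpy(reply.handle, req.handle, sizeof reply.handle);

            switch (ntohl(req.type)) {
            case NBD_CMD_READ:   /* reply header, then the data */
                pread(backing_fd, buf, len, (off_t)off);
                write(sock, &reply, sizeof reply);
                write(sock, buf, len);
                break;
            case NBD_CMD_WRITE:  /* data follows the request header */
                read_full(sock, buf, len);
                pwrite(backing_fd, buf, len, (off_t)off);
                write(sock, &reply, sizeof reply);
                break;
            case NBD_CMD_DISC:   /* client asked to disconnect */
                return;
            }
        }
    }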
<mokomull> Do you preallocate a gajiggaton of /dev/nbd* devices, or are you using the netlink API?
<kentonv> gajiggaton of devices
<kentonv> didn't know you could use netlink for this
<mokomull> it's relatively new
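The netlink interface mokomull mentions is the generic-netlink "nbd" family from <linux/nbd-netlink.h> (added in 2017-era kernels); unlike the ioctl path, it can also ask the kernel to allocate a free device rather than requiring a preallocated pool. A rough, untested sketch using libnl-3; the function name and error handling are assumptions, and sock_fd is an already-negotiated connection to the NBD server:

    #include <stdint.h>
    #include <netlink/netlink.h>
    #include <netlink/genl/genl.h>
    #include <netlink/genl/ctrl.h>
    #include <linux/nbd-netlink.h>

    int nbd_netlink_connect(uint32_t index, int sock_fd, uint64_t size_bytes) {
        struct nl_sock *nl = nl_socket_alloc();
        genl_connect(nl);
        int family = genl_ctrl_resolve(nl, NBD_GENL_FAMILY_NAME);

        struct nl_msg *msg = nlmsg_alloc();
        genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, family, 0, 0,
                    NBD_CMD_CONNECT, 0);
        nla_put_u32(msg, NBD_ATTR_INDEX, index);  /* omit to let the kernel pick */
        nla_put_u64(msg, NBD_ATTR_SIZE_BYTES, size_bytes);

        /* The server connection is passed as a nested list of socket fds. */
        struct nlattr *socks = nla_nest_start(msg, NBD_ATTR_SOCKETS);
        struct nlattr *item  = nla_nest_start(msg, NBD_SOCK_ITEM);
        nla_put_u32(msg, NBD_SOCK_FD, (uint32_t)sock_fd);
        nla_nest_end(msg, item);
        nla_nest_end(msg, socks);

        int err = nl_send_sync(nl, msg);  /* sends and waits for the ACK */
        nl_socket_free(nl);
        return err;
    }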
<kentonv> I think I create 4096 devices at startup and then I have some code for locking them to grains.
<kentonv> and it's really easy for devices to get permanently stuck, so I have some logic to route around stuck ones. It's gross
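A hypothetical sketch of that kind of device-pool management: the nbd module can preallocate device nodes at load time (its nbds_max parameter), and each worker then claims a free node, skipping wedged ones. The flock()-based locking and stuck-device handling here are simplified stand-ins, not Blackrock's actual logic:

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/file.h>

    int claim_nbd_device(int pool_size, char *path_out, size_t path_len) {
        for (int i = 0; i < pool_size; i++) {         /* e.g. pool_size = 4096 */
            snprintf(path_out, path_len, "/dev/nbd%d", i);
            int fd = open(path_out, O_RDWR);
            if (fd < 0) continue;                     /* missing or wedged node */
            if (flock(fd, LOCK_EX | LOCK_NB) == 0)    /* advisory lock between */
                return fd;                            /* cooperating processes; */
            close(fd);                                /* released when fd closes */
        }
        return -1;                                    /* pool exhausted */
    }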
<kentonv> hahaha, before I even clicked I was wondering if the patch came from Facebook
<kentonv> sure enough
<mokomull> kentonv: if you still end up with devices permanently stuck, I would absolutely love to hear about it. We've hit some of that after the blk-mq migration because the kyber and deadline schedulers somehow manage to mess with request IDs enough to confuse nbd.
<kentonv> (I talked to some people there who seemed excited about nbd recently)
<mokomull> I am one of those people there :)
<kentonv> oh hah
<mokomull> although my crazy patchset hasn't progressed beyond the "rewrite it before you publish this or you're gonna get skewered" stage
<kentonv> it's been years since I wrote the code, I'm sure it has gotten better
<mokomull> There was quite the onslaught of XFS fixes when we started this, that's for sure
<kentonv> mokomull, I've been using ext4 and it's been remarkably solid. I don't think I ever saw an instance of an unrecoverable volume or data loss caused by ext4, even though we disconnect mid-stream all the time.
<mokomull> that might say some things about our design choices :)
<kentonv> I'm sure you push a hell of a lot more bits though
<mokomull> I don't even know how many bits anymore. It's kind of mindblowing.