kentonv changed the topic of #sandstorm to: Welcome to #sandstorm: home of all things sandstorm.io. Say hi! | Have a question but no one is here? Try asking in the discussion group: https://groups.google.com/group/sandstorm-dev
xet7 has joined #sandstorm
wings has joined #sandstorm
wings has quit [Client Quit]
<abliss> pty update: I spent a couple days staring at the gvisor pty code, trying to understand how it handles job control (e.g. translating ^C into SIGINT, ^Z into SIGTSTP). Stared hard at the code, right where I thought it should be implemented, couldn't find it. Grepped all over. Turns out: it's not implemented at all yet. So, popping way back to the earlier discussion, where kentonv speculated that GCP might be using gvisor for
<abliss> its web shell: it must not be (at least not the public version of gvisor), because those web shells (I'm pretty sure) do provide job control.
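(As an aside, a toy TypeScript sketch of the line-discipline behavior in question: mapping termios control bytes to the job-control signals sent to the foreground process group. Illustrative only; gvisor itself is written in Go.)

```typescript
// Toy sketch of tty line-discipline job control (not gvisor's code).
// POSIX termios defaults: VINTR is ^C (0x03), VSUSP is ^Z (0x1a).
const VINTR = 0x03; // ^C
const VSUSP = 0x1a; // ^Z

// Given one input byte, return the signal the line discipline would
// deliver to the foreground process group, or null for ordinary input.
function signalForByte(byte: number): 'SIGINT' | 'SIGTSTP' | null {
  switch (byte) {
    case VINTR: return 'SIGINT';  // interrupt the foreground job
    case VSUSP: return 'SIGTSTP'; // stop (suspend) the foreground job
    default:    return null;
  }
}
```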
<abliss> yep, just double checked with one on https://console.cloud.google.com/home/dashboard?cloudshell=true , running `sleep 5` and then hitting ctrl-C works.
<abliss> i guess it must be a real emulated kernel. (Interestingly, `dmesg` claims it's Chromium OS)
<abliss> dmesg also claims "Hypervisor detected: KVM"
<abliss> Google left a socket inside the container pointing to the Docker instance that runs on the external host. This dude stumbled across it, used it to break out of the container, and earned $100k
<abliss> that is some pretty low-hanging fruit, which makes me think that there are probably other privilege-escalation bugs in Cloud Shell waiting to be found...
<isd> So I'm starting to think about cgroups again.
<isd> One snag with using them is that we can't really rely on their availability on an arbitrary linux system without asking the user to tweak some system config. I guess we could try to do this in the install script, but it would involve some complexity and I'm not sure how portably we could write that. And there would be the matter of upgrading existing installations.
<isd> I suppose we could have it be the sort of thing where we use it if available, like the way we treat user namespaces.
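(Sandstorm itself is C++, but as a rough sketch of the "use it if available" probe, assuming cgroup v2 exposes cgroup.controllers at the root of /sys/fs/cgroup:)

```typescript
// Sketch (Node/TypeScript, not Sandstorm code) of probing for cgroup support.
import { existsSync } from 'fs';

function cgroupsAvailable(): boolean {
  // cgroup v2: the unified hierarchy exposes cgroup.controllers at its root.
  if (existsSync('/sys/fs/cgroup/cgroup.controllers')) return true;
  // cgroup v1 fallback: look for a mounted controller hierarchy.
  return existsSync('/sys/fs/cgroup/memory');
}
```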
xet7 has quit [Quit: Leaving]
xet7 has joined #sandstorm
<abliss> sandstorm works without user namespaces? does it just require the sandstorm daemon to be run as root?
<kentonv> abliss, yes, it supports two modes, one where it runs as root and one where it uses user namespaces
<kentonv> we did that because a few distros disallowed unprivileged user namespaces for a long time (and maybe still do?)
<kentonv> also wanted to support some ancient CentOS version that used kernel 3.10 where user namespaces didn't fully work yet
<JacobWeisz[m]> I sent a typo PR this morning for node-capnp. Unrelated to the issue someone opened this morning, but found while browsing because of it.
<JacobWeisz[m]> github.com/kenton is not kentonv :P
<kentonv> funny that no one ever noticed that in 6 years of existence
<abliss> yeah. (we don't know for sure it was there the whole time, but I bet it was). Reminiscent of the first Android device, the T-Mobile G1, which shipped with a root console listening to the physical keyboard.
<kentonv> slightly less severe than that... :)
<abliss> i dunno, that required physical access to exploit. running amok with root access on a google container host could attack sideways at other users in the cloud, right?
<JacobWeisz[m]> It's the sort of thing you'd only hit if you explicitly copy pasted that line. If you were looking at the repo name or whatever, you'd not hit it. And since it moved to the capnproto org, most people probably would glance at that line and go "oh, that's the old location".
<JacobWeisz[m]> But it's an invalid old location.
<JacobWeisz[m]> He means this typo is slightly less severe than that. Especially since it just led to a 404.
<kentonv> yeah I was talking about the readme typo
<kentonv> which I guess Mr. Kenton Newby could have exploited maybe, but he didn't.
<abliss> oh haha, yeah, sorry. i was still thinking of the cloud shell docker bug.
<kentonv> ahh
<kentonv> that bug I'd say is pretty embarrassing considering how many times the docker socket being exposed has led to security bugs in other systems... people should be looking for that
<kentonv> also I guess this implies that they aren't running the shell inside a VM which is very surprising to me
<kentonv> and yes, it's more severe than the G1 bug
frigginglorious has joined #sandstorm
frigginglorious1 has joined #sandstorm
frigginglorious has quit [Ping timeout: 246 seconds]
frigginglorious1 is now known as frigginglorious
<JacobWeisz[m]> kentonv: I think you didn't sync 0.267
frigginglorious has quit [Read error: Connection reset by peer]
<kentonv> fixed
<JacobWeisz[m]> Woo. Probably unlikely to cause conflicts but likely nice to have updated. ;)
frigginglorious has joined #sandstorm
<abliss> random thought of the day: would it be possible to implement the matrix protocol (for a sparsely-used server) atop google cloud functions / aws lambda / cloudflare workers? (is there any difference between them?) -- the one big problem I see right away is that clients will do long-polling with hanging GET requests, and it will be uneconomical (or downright forbidden) to keep the backend VM alive the whole time.
<abliss> Which makes me wonder, again, whether there could be some kind of middleware that can detect when a process is stuck waiting for IO (accept/read/poll/select), and freeze the whole thing to disk, and then later, thaw it all out (including restoring TCP connections) when IO becomes possible.
<kentonv> definitely should not be pushing releases containing non-public code (unless maybe if it's a critical security fix and we want everyone to update before revealing)
<kentonv> sorry about that
<kentonv> abliss, CF Workers doesn't mind long polling (as long as you're OK with the occasional random cancellation)
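(A minimal sketch of a long-polling Workers endpoint in the service-worker style; the 30-second hold and the response shape are illustrative assumptions, and the types assume @cloudflare/workers-types:)

```typescript
// Minimal long-polling Worker sketch (illustrative; timeout is arbitrary).
addEventListener('fetch', (event: FetchEvent) => {
  event.respondWith(handle(event.request));
});

async function handle(request: Request): Promise<Response> {
  // Hold the hanging GET open for up to 30 seconds, then tell the
  // client to poll again. A real server would resolve early when an
  // event arrives, which is the hard part without shared state.
  await new Promise((resolve) => setTimeout(resolve, 30_000));
  return new Response(JSON.stringify({ events: [] }), {
    headers: { 'content-type': 'application/json' },
  });
}
```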
<abliss> do i get charged per-hour for how long the worker stays resident?
<abliss> looks like google charges for each 100ms for each GB of ram and GHz of cpu provisioned ( https://cloud.google.com/functions/pricing )
<kentonv> nope, workers is billed on request count only
<abliss> oh, nice, looks like CF workers are "$0.50 per million requests, with a $5 monthly minimum"
<abliss> so as long as i can keep the hanging get alive for at least 259 msec on average, i can be resident 24/7 for that $5 minimum
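(The arithmetic behind that 259 ms figure: $5 buys 10 million requests at $0.50 per million, and a 30-day month is about 2.59 billion milliseconds.)

$$\frac{30 \times 86400 \times 1000\ \text{ms}}{(\$5 / \$0.50) \times 10^{6}\ \text{requests}} = \frac{2.592 \times 10^{9}\ \text{ms}}{10^{7}\ \text{requests}} \approx 259\ \text{ms per request}$$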
<abliss> AWS lambda seems like GCF: charges for duration, but with a free tier. I wonder why CFW pricing is so different, and whether it'll eventually converge? I guess the backend implementation model is quite different from goog/amazon's, and maybe CF itself is a fundamentally different kind of company?
<kentonv> our implementation is massively more efficient than Lambda and GCF
<kentonv> we don't allocate a VM to run your function. We allocate a V8 isolate. Overhead is very low.
<kentonv> Lambda basically reserves a minimum of 128MB of RAM on one of their machines for each concurrent request. We don't reserve anything, we just allocate a V8 isolate wherever the request landed and run it. A typical isolate takes a couple megs of RAM.
<kentonv> and we let one isolate handle multiple concurrent requests
<abliss> and storage/ingress/egress/cpu/ram is all free/unmetered?
<abliss> that $5/mo seems like it might be competitive with matrix's own pay-for-hosting, Modular ($1.50 / active-user / month)
<abliss> if you have around 4-50 users, which describes most chat backends, I'd guess
<isd> I mean, you can get a vps for around that.
<isd> ...and not have to write a custom matrix server to run on it...
<kentonv> abliss, keep in mind CPU time per request is limited to 50ms, and RAM per instance is limited to 128MB
<isd> Unless your workload is very bursty or having to manage the OS is a big burden I think a VPS is probably a better fit.
<kentonv> FWIW I don't think you could write a chat app entirely on Workers today... there's no way to coordinate state between requests
<kentonv> (but I'm working on it...)
<abliss> there's no r/w persistent state at all? i guess i could use CF workers as the frontend to hold open the hanging gets, and do all the actual logic in google/amazon
<kentonv> there's an eventually-consistent KV store, but since it's eventually-consistent you can't use it for coordination...
<abliss> is there really no limit /metering on how much i store in the KV or how often i access it? what's a realistic propagation time for a write?
<kentonv> you are charged for KV operations and KV storage
<kentonv> propagation time can be several seconds. More importantly, there's no way to prevent concurrent writes from clobbering each other.
<abliss> i could manually shard keys to prevent concurrent writes ... or else fetch out to a lockserver somewhere, or figure out how to elect a leader internally. if one worker can poll the KV every few hundred ms and block until it detects a change, then another worker could write to that key to trigger an event. several-second latency might be acceptable for a chat app...
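(A sketch of that polling idea, with a hypothetical KV binding named EVENTS; as kentonv warns below, eventual consistency means this offers no real coordination guarantee:)

```typescript
// Sketch: poll a KV key until its value changes.
// EVENTS is a hypothetical KV namespace binding (KVNamespace type
// comes from @cloudflare/workers-types).
declare const EVENTS: KVNamespace;

async function waitForChange(key: string, last: string | null): Promise<string | null> {
  // Poll every few hundred ms, bounded so the request eventually
  // returns and the client re-issues its hanging GET.
  for (let i = 0; i < 20; i++) {
    const value = await EVENTS.get(key);
    if (value !== last) return value; // something (appears to have) changed
    await new Promise((r) => setTimeout(r, 500));
  }
  return last; // timed out with no visible change
}
```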
<abliss> why the $5 minimum, i wonder? is there a forum somewhere where 5 people who each have $1/month of workload can meet up and split costs?
<JacobWeisz[m]> There's probably some point where it's silly to charge a credit card for something. Arguably every account has a support cost burden of "this guy might have questions" too.
<kentonv> We don't like billing people less than $5 at a time. The payment fees become excessive.
<JacobWeisz[m]> I wonder if you could get around that with a rechargeable account system like my toll pass...
<kentonv> Trying to synchronize on KV will lead to tears and hair loss, trust me.
<abliss> why not offer a no-support $5/yr prepaid level?
<kentonv> we want to work on the product, not on the billing. :)
<JacobWeisz[m]> I still find it hilarious that AWS bills me like 79 cents a month or something. They probably get nothing.
<abliss> makes sense :)
<kentonv> that said there may be some changes in the works...
sknebel has quit [Ping timeout: 240 seconds]
<abliss> i don't suppose there's some secret header i can send to force my request to be routed to a particular worker?
sknebel has joined #sandstorm
<abliss> if I can manage to force two requests to hit the same ipv4 address, is it guaranteed (or at least likely) that they'll be routed to the same workers (or workers that are close-by, as measured in 99pctile KV write-propagation time)?
<kentonv> nope, not in the slightest
<kentonv> all our colos advertise the same addresses.
<kentonv> so a request to some IP could land on absolutely any machine in our fleet
<abliss> i see 3 different ipv4 addys when resolving my playground worker... it's the same 3 for all of them?
<kentonv> we have a lot of addresses, but your hostname will resolve to the same addresses no matter where you are in the world, and those same addresses are advertised by all of our locations
<abliss> gotcha, that's what i figured
<kentonv> it's intentional that you can't send a request to a specific location since we want DDoSes to be spread out. :)
<abliss> if a worker tries to fetch its own url, will it be routed back to itself?
<kentonv> no, it will be routed to your back-end.
<kentonv> the thing you're trying to do here is exactly the thing that I'm building a whole system to do so I'm pretty sure you can't do it yet. :)
<abliss> aw, no free tier for the KV. guess i'll pony up the $5 just to play around with it for a month
<abliss> i don't doubt you at all. but it sounds like a (doomed!) attempt to prove you wrong would be fun and educational for me :)
<kentonv> yeah unfortunately we kind of have to charge for KV because we use third-party storage backends that charge us...
<abliss> (and my alternative is to go try to contribute job-control to gvisor's pty impl, which sounds kinda dull and i'd rather wait a bit to see if they're about to implement it themselves :)
<abliss> (alternative in the sense of "my other current for-fun project", not that the functionality is at all related)
<kentonv> just don't burn yourself out on Workers right before I go and ship the thing that makes your project super-easy... :)
<abliss> oh, the playground doesn't let one bind a KV namespace anyway, so now i'll have to take the plunge and actually install the tooling...
<kentonv> it does actually.
<kentonv> assuming you made an account and you're on dash.cloudflare.com (not cloudflareworkers.com)
<abliss> oh! yeah i guess i had to do a hard refresh after buying the plan.
<kentonv> you can back out of the editor and then look at your worker, and there's a place where you can add bindings
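(Once a namespace is bound, it shows up as a global inside the Worker; the write side of the earlier polling sketch, using the same hypothetical EVENTS binding, is just a put:)

```typescript
// The trigger side of the earlier polling sketch: bump a key so waiters see a change.
declare const EVENTS: KVNamespace;

async function publish(key: string): Promise<void> {
  // Writing a fresh value is what the polling loop eventually observes.
  await EVENTS.put(key, Date.now().toString());
}
```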
<kentonv> ohhh, I see
<kentonv> I, uh, have a little bit of a debate with my team, where they really want to just delete the online editor and tell people to use the tooling because writing code online is Bad Software Engineering and we shouldn't let people do it, or something
<abliss> thanks for chatting about it. i'm gonna go have a think about what i could do with this architecture. this ends my hijacking of the sandstorm channel. :)
<abliss> (the online editor is crap for Software Engineering, but awesome for fast-onramp fooling around!)
<kentonv> yeah I keep trying to explain that...
<kentonv> also for trivial things like my e-mail catch-all implementation from last friday
<kentonv> sometimes you just need a really simple HTTP endpoint
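(The "really simple HTTP endpoint" case is roughly this much code; a sketch, not the actual catch-all implementation:)

```typescript
// A trivial Worker endpoint (illustrative; route and responses are made up).
addEventListener('fetch', (event: FetchEvent) => {
  const url = new URL(event.request.url);
  // Answer a health check, 404 everything else.
  const response = url.pathname === '/ping'
    ? new Response('pong')
    : new Response('not found', { status: 404 });
  event.respondWith(response);
});
```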
kentonv has quit [Ping timeout: 260 seconds]
kentonv has joined #sandstorm
frigginglorious1 has joined #sandstorm
frigginglorious has quit [Ping timeout: 246 seconds]
frigginglorious1 is now known as frigginglorious