#sandstorm on 2020-05-14 — irc logs at freenode.irclog.whitequark.org

2019-09-28 22:33 kentonv changed the topic of #sandstorm to: Welcome to #sandstorm: home of all things sandstorm.io. Say hi! | Have a question but no one is here? Try asking in the discussion group: https://groups.google.com/group/sandstorm-dev

01:04 _whitelogger has joined #sandstorm

01:21 <JacobWeisz[m]> Some of the docs rewrite stuff that scares me is just straight architectural... Sandcats has both an HTTPS page and a Dynamic DNS page. Both are potentially useful and valuable info, but realistically a lot of that should be simplified down into the general HTTPS stuff...

01:21 <JacobWeisz[m]> Sandcats now is only really distinct in its providing dynamic DNS now.

01:39 <JacobWeisz[m]> I kinda want to eliminate the Sandcats HTTPS page entirely under that concept.

02:11 kentonv has quit [Quit: Leaving]

02:23 frigginglorious has quit [Ping timeout: 256 seconds]

03:13 _whitelogger has joined #sandstorm

03:55 _whitelogger has joined #sandstorm

05:15 frigginglorious has joined #sandstorm

05:30 <isd> So I started fussing with porting the WIP filesystem LD_PRELOAD library I'd started over to Rust. I think we could sensibly include logic to handle the pty stuff in the same library.

06:08 frigginglorious has quit [Remote host closed the connection]

06:08 frigginglorious has joined #sandstorm

06:17 frigginglorious has quit [Ping timeout: 265 seconds]

06:21 frigginglorious has joined #sandstorm

06:26 frigginglorious has quit [Ping timeout: 260 seconds]

06:31 frigginglorious has joined #sandstorm

06:52 wings has joined #sandstorm

07:47 frigginglorious has quit [Ping timeout: 265 seconds]

11:10 <vertigo_38> Hi again! I'm just inviting friends to test my sandstorm installation -- which brings me to a point, which is completely different from a 'normal UI' (subject to be proven wrong ;)). What I'd like to do would be 2 things. The first is some kind of space quota (I see that the concept of sandstorm alone drastically reduces storage needs, but I at least want to control, how much my people can put into davros eg.). The second is,

11:10 <vertigo_38> authentication tries are being logged -- neither in nginx, nor sandstorm itself. Authentication itself is done via a docker-ldap-container on the sandstorm machine itself and works nicely in any other regard.

11:11 <vertigo_38> (Sandstorm login reacts to wrong credentials with 'wrong user/password' at the login interface -- that's what I'd like to track with fail2ban)

11:39 <vertigo_38> (more exactly -- if the tried username does not exist, I get 'User not found in LDAP', if I enter a proper username but the wrong password, I get 'invalid credentials')

11:41 <xet7> vertigo 38: Sandstorm shows space usage. If you limit for example Wekan grain space usage, so Wekan disk space gets full, it does corrupt Wekan MongoDB database.

11:42 <xet7> vertigo 38: You have some disk space monitoring script that checks there is enough total free disk space

11:44 <xet7> vertigo 38: So your docker-ldap-container does not log login attempts? You could check with "docker logs CONTAINER-ID"

11:44 <xet7> vertigo 38: And check your docker-ldap-container software docs how to add logging

11:44 <xet7> vertigo 38: Also check /var/log/syslog

11:47 <xet7> vertigo 38: for davros space usage, probably that would be davros feature request in davros issues

11:47 <xet7> vertigo 38: usually in davros there is static files, not databases that can corrupt

11:50 <xet7> vertigo 38: Maybe it is related to this https://github.com/mnutt/davros/issues/2

11:51 <xet7> vertigo 38: But here is something about quota feature https://github.com/sandstorm-io/sandstorm/pull/567

11:54 <xet7> hmm, if seems that reqular Sandstorm (non-Blackrock) does not have good way for checking quota yet

11:55 <xet7> Actually, there are many existing quota issues https://github.com/sandstorm-io/sandstorm/issues?q=is%3Aissue+is%3Aopen+quota

11:55 <xet7> maybe some of those is most appropriate for this

11:57 <xet7> vertigo 38: You can read those issues, and ask there about status

11:57 <xet7> on one issue that is most appropriate for this

11:58 <vertigo_38> xet7: thank you for the links & sorry for not digging enough!

11:59 <xet7> No problem, thanks for asking :D

11:59 <xet7> It's not required to dig so much

11:59 <xet7> Anyone is welcome to as anything

12:00 <xet7> ask

12:18 <vertigo_38> I think I'm coming off the idea running ldap as docker container -- if I tail -f its logs I see with which username somebody wants to log in, but I cannot link this directly to nginx's access log. I think at least in this regard my setup is quirked ;)

12:27 <JacobWeisz[m]> vertigo_38 displaying storage usage of users to the admin is a pretty straightforward feature, one we've talked about a lot. I think it probably isn't far off.

12:38 <vertigo_38> JacobWeisz[m]: thanks for the outlook!

12:40 <JacobWeisz[m]> I think Sandstorm might try to enforce quotas set in LDAP, but I don't have experience with that.

12:42 <JacobWeisz[m]> Sandstorm Oasis definitely had quota management but Blackrock uses a different storage backend than normal Sandstorm, so the functionality isn't 100% identical.

12:53 <vertigo_38> Quota is not that dramatic for now, if I tell my friends not to upload their movies into davros, I trust them not to do so ;), and if so, it is so. For now I'd mainly like to get fail2ban watch over the login interface to sleep better... In case we implement sandstorm in our lab, quota would be really nice (as it's most likely me watching over storage ;)).

12:55 <JacobWeisz[m]> Yeah, I have some users on my Sandstorm instance and I just feel like I don't know if they're using a bunch of storage stuff or not.

12:55 <JacobWeisz[m]> Nobody uses it as a major resource but me, but I dislike the current invisibility of that info.

12:57 <JacobWeisz[m]> Generally, Sandstorm's position has been that it doesn't do logins. Especially if you're doing LDAP or the like, it's assumed your login provider should be managing account lockout or logging or whatever.

13:00 <JacobWeisz[m]> Email login assumes you're well-managing the email accounts, someone who can access email can access Sandstorm. Google and GitHub both have features to show login failures and logged in sessions and such. LDAP and SAML generally are used with platforms that can do lockouts and logging and such.

13:00 <JacobWeisz[m]> Sandstorm avoids writing a lot of account security features by only permitting login strategies that are capable of that externally. It's why it's never offered a straight username/password option.

13:04 <vertigo_38> I like that approach, actually! I'm just hunting the connection where sandstorm transmits the login-info to my LDAP on the sandstorm localhost.

13:08 <vertigo_38> And wonder whether I can there find the attempting IP together with which username was tried...

13:27 wings has quit [Ping timeout: 260 seconds]

14:57 <vertigo_38> A resource-wise question -- do you think it's feasable thinking sandstorm as user interface for <= 100 simultanous users (students, teachers, staff)? We currently think about setting up a FOSS VM server in our institute and discuss designs...

14:59 <abliss> That's a great question and I would also love to know the answer. I wonder if kentonv has any insights he can share about the largest installs, what kind of hardware they used, and how well they performed?

15:00 <abliss> My uninformed guess is that 100 simultaneous users would probably make you the largest install ever (or possibly 2nd largest behind alpha.sandstorm.io)

15:00 <vertigo_38> abliss: you scare me

15:01 <vertigo_38> :)

15:02 <kentonv> alpha.sandstorm.io isn't actually used by many people. oet.sandcats.io might be the biggest self-hosted instance

15:04 <kentonv> I believe Oasis got up to 300 concurrent users at times.

15:04 <kentonv> of course, Oasis was using a "more scalable" architecture, but TBH it would have been fine on a beefy machine

15:04 <kentonv> get a VM with like 128GB of RAM and you're probably good

15:11 <vertigo_38> Thanks, that sounds like sane numbers. I doubt that we will have the 100 concurrent users very often, but that's would we have to be able to offer. I could imagine 70 students using jupyternotebooks during lessons to be the max load.

15:12 <JacobWeisz[m]> It might not hurt to clock how much RAM Jupyter is using in a grain and do some napkin math.

15:12 <JacobWeisz[m]> Obviously RAM usage of Sandstorm is heavily dependent on what apps people are running.

15:15 <vertigo_38> How can I clock a single grain's RAM usage?

15:54 <abliss> that's another great question. and the napkin math may get tricky because some of a grain's RSS should be shared with other grains of the same app.

15:55 <abliss> Simplest might be to log into the server and run "free", then start the grain and run "free" again, then start a second copy and run "free" again. Maybe repeat up to n=5, then graph and try to extrapolate?

16:01 <vertigo_38> I just tried to 'crash' the system with some python code I found on stackoverflow ~['A'*1024 for _ in xrange(0, 1024*1024*1024)]~ from within the jupyter notebook, but did not succeed... As soon as memory limit (including swap) on the vm is reached the 'python-kernel' dies. If that's the result of an overload I can live with it. Still it feels a little on the edge

16:01 <kentonv> yes, when the system runs out of memory, the kernel chooses something to kill. If there's one process eating all the memory, it's probably going to kill that.

16:03 <abliss> try this one too: https://www.reddit.com/r/tinycode/comments/iqaub/one_line_fork_bomb_in_python/

16:04 <abliss> might also be fun to try to open an infinite number of filehandles to see what happens on the machine.

16:05 <kentonv> you can probably DoS the machine if you try. Sandstorm doesn't do a whole lot to prevent this.

16:11 <isd> ...we should probably at least be creating cgroups for grains

16:13 <vertigo_38> that forkbomb seems to work nicely. rebooting the machine now through my vps panel

16:13 <isd> Yeah, cgroups might help with that, as it would allow the kernel to understand that the whole grain should be treated as a unit.

16:15 <kentonv> as of Linux 4.6 we finally have cgroup namespaces so we may actually be able to use cgroups without being root now?

17:03 vertigo_38 has quit [Ping timeout: 256 seconds]

17:17 vertigo_38 has joined #sandstorm

17:48 frigginglorious has joined #sandstorm

18:29 <abliss> I'm continuing to read up on pseudoterminals. I still don't see why they are handled by the kernel. it seems like it's just a fancy bidirectional socket with a bunch of special ioctls. (handling of ctrl-z and ctrl-c are often mentioned as special but i don't see why). It seems like it would be possible to implement one using cuse (character device in userspace, similar to fuse), but that probably has its own security

18:29 <abliss> issues.

18:31 <abliss> doing a glibc hack and/or a LD_PRELOAD seems straightforward but a bit unpleasant. I suppose you'd create a socketpair to act as the master and slave devices, then intercept every ioctl() to check if it's one of your special FDs, and then you'll have to implement all the weird terminal ioctls yourself (either in-band, by wrapping each message that gets sent through the socket, or out-of-band by setting up a separate

18:31 <abliss> control-plane socketpair)

18:34 <abliss> oops wait, ioctl is a system call, not a glibc function. so custom glibc and ld_preload seem impossible, and you'd have to do some ptrace or BPF?

18:35 <isd> I mean, most programs will call into the glibc wrapper, same as with open() and friends.

18:35 <kentonv> most people use the glibc wrapper, so usually you can intercept it with LD_PRELOAD -- as long as the call isn't coming from glibc itself, and as long as the app isn't statically linked

18:37 <isd> It might make sense to integrate this into the existing LD_PRELOAD thing that I started working on for the filesystem stuff. This way we can share a lot of the "shadow fd table" logic and such.

18:37 <abliss> yes indeed

18:38 <abliss> though i'm pretty curious why nobody seems to have tried this (or even considered it) before... though perhaps my google-fu is lacking; there are a bunch of adjacent concepts with overlapping keywords

18:39 <isd> I mean, how common is it for ptys to not actually be available, yet still needed?

18:40 <isd> I'm not totally surprised no one has done this given the likelyhood of needing it and the annoyance of building it.

18:41 <abliss> given the history of security issues in the tty layer, and the trend towards containerization as a security boundary to allow shared compute resources, i would think that a google or amazon would want to look at moving ptys out of the kernel and into userspace?

18:42 <abliss> actually that make me wonder what microsoft has done with ptys in the the linux subsystem for windows

18:42 <isd> I mean, it's rare for server apps to even need them?

18:44 <abliss> in the google cloud platform you can click a button in your browser and be insantly dropped into a disposable shell session on a small temporary machine. it seems to have a very capable pseudoterminal (e.g. emacs and screen work fine). so is that a "real" devpts running in a real kernel on a real machine somewhere? or is it in an entirely emulated kernel?

18:46 <kentonv> I don't think Google trusts containers for security. It probably runs in gvisor.

18:46 <abliss> though i guess, now that i think about it, kubernetes doesn't support ssh into containers. you have to do 'kubectl exec' to get a shell, and it seems to lack a lot of pty functions...

18:48 <abliss> looks like docker uses a real pty for "docker attach" https://iximiuz.com/en/posts/linux-pty-what-powers-docker-attach-functionality/

18:48 <isd> Yeah, I haven't talked to anyone who seriously thinks it's a good idea to rely on docker for security isolation. At best it's an extra layer to punch through, defense in depth and all that.

18:49 <isd> Even in its own namespace, the kernel's default attack surface is just way too big.

18:54 <abliss> would gvisor work in a grain, or does it require cap_sys_admin to set up a gvisor sandbox? if the latter, perhaps gvisor contains a pty impl that we could steal...

18:54 <kentonv> gvisor is a virtual machine engine. It probably doesn't even work inside other VMs.

19:00 <abliss> here's a bunch of go code that claims to implement 'line discipline' (which i still don't understand what it is) https://github.com/google/gvisor/blob/696feaf10c9339a57d177a913e847ddb488ece69/pkg/sentry/fs/tty/line_discipline.go

19:04 <isd> We could potentially just yank that whole-sale and wrap it in a capnp server.

19:05 <isd> and just have the preload lib redirect calls to it.

19:06 <abliss> yeah, i'm wondering which impl of line discipline and terminal codes will be easier to tease out of its parent project: the C code out of the linux kernel, or this go code out of gvisor

19:09 <abliss> (looks like gvisor can actually attach using ptrace, as an alternative to KVM, so maybe it could work inside a grain?)

19:09 <isd> I think we block ptrace?

19:10 <isd> Ideally we'd find a way to do this that doesn't involve giving grains more attack surface.

19:10 <kentonv> we definitely block ptrace, it has had all kinds of security issues :)

19:11 <abliss> doesn't look like it

19:12 <abliss> i seem to be able to run 'strace -f' inside a grain though? or maybe it only works in 'spk dev'

19:12 <isd> IIRC we relax things a bit for dev mode.

19:13 <isd> Yeah, we allow some ptrace stuff in dev mode. Have a look at supervisor.c++

19:16 <mokomull> abliss: ctrl-Z and ctrl-C are "special" because the terminal end of the pipe puts a byte 0x03 or 0x1a in the pipe, but that byte doesn't come out the other end: a signal is sent instead. Likewise with, e.g., backspace, ctrl-U in cooked-mode.

19:18 <abliss> yep, confirmed, it works in spk dev but not after pack/upload. is there an easy way to get a 'non-relaxed' sandbox to play around in?

19:19 <abliss> kentonv: you mentioned seccomp-bpf in the original email thread. is that allowed inside prod grains? (it's hard for me to believe that seccomp-bpf could be less of a risk than ptrace, but i'm quite ignorant of both)

19:20 <isd> We block that too. Indeed, the configuration end of seccomp-bpf has had some vulnerabilities in the past.

19:22 <kentonv> we of course *use* seccomp to implement the sandbox, but yeah, we don't let grains use it

19:23 <isd> Wouldn't it be nice if the kernel had sandboxing mechanisms that compose...

19:27 <abliss> https://gvisor.dev/docs/user_guide/filesystem/ seems similar in spirit to your filesystem-over-capnp design. and gvisor also has some checkpoint-restore stuff which could be useful for the quick grain startup you mentioned on the last call.

19:29 <isd> I'm sure we couldn't adapt the checkpoint-restore stuff from gvisor without loosening sandstorm's sandbox.

19:29 <isd> ...and I still think it's just not worth the extra complexity

19:30 <isd> One thing I hit working on the fs bits of the LD_PRELOAD is that it's a bit hard to figure out what to set errno to in some cases.

19:31 <isd> I wish capnp either had structured data in exceptions or some way to check in-band error codes without giving up pipelining.

19:32 <isd> kentonv: ^

19:33 <isd> How would you feel about adding a field to rpc.capnp's exception that could be use to attach non-Text data?

19:35 <kentonv> https://github.com/capnproto/capnproto/blob/master/style-guide.md#exceptions

19:35 <kentonv> TBH I'd feel better about extending pipelining to support conditional results in some way.

19:36 <isd> Yeah, I kinda like that better too... It's just a lot less obvious to me what it would look like.

19:37 <kentonv> oh but you're specifically trying to tunnel an errno code?

19:37 <isd> Or just enough information to reconstruct one, yeah.

19:37 <kentonv> so it's really about being able to convert the error to a different format, not about being able to handle errors

19:38 <kentonv> I mean, not about being able to trigger different logic based on different errors

19:38 <kentonv> I actually do think KJ exceptions need to support that

19:39 <kentonv> but I've never been able to come up with an approach I liked

19:39 <kentonv> in the Workers runtime, we literally prefix error strings with e.g. "cfjs.TypeError: " to say "if this gets thrown to JavaScript, turn it into a JavaScript TypeError"

19:40 <kentonv> which is an awful hack

19:40 <isd> Yeah, that occured to me; I could just put e.g. ENOENT at the front of the string. But yes. that's horrible.

19:41 <isd> What about just adding two fields to the rpc exception, a typeId and an AnyPointer, the former indicating the type of the latter?

19:43 <isd> I guess this is a little less clear for the C++ implementation because kj::Exception is theoretically not dependent on capnp?

19:43 <kentonv> I think to be convinced of any design here, I would need to look at a lot of use cases to verify that it fits and that it doesn't lead to abuse.

19:44 <isd> Yeah, I think fundamentally this is the kind of thing where if you were doing a synchronous API, it should just be an error code.

19:45 <isd> So figuring out how to do pipelining on conditionals is "The Right Thing."

19:47 <kentonv> if you pipeline on a pointer, and it turns out to be null, then the pipelined call will fail, and upon detecting that failure you can perhaps go back and check the result of the earlier call to see if it had an error code?

19:48 <isd> I mean I guess that would work. But it feels both more error prone and like even more of a hack than abusing exceptions for "everyday" errors...

19:50 <isd> I dunno, maybe it wouldn't be that bad. I should experiment.

19:50 <isd> But it seems like it doesn't really solve the conceptual wrongness of using exceptions for error handling; you're still doing that, just having to check the data for it out of band...

19:52 <kentonv> it's difficult because errnos are *usually* used for reporting only but *occasionally* people trigger logic based on them. So they *usually* map to KJ exceptions but... not quite always

19:54 <isd> So I'm envisioning an api where you can pipeline on a union variant, and it fails if it's the wrong variant.

19:54 <isd> This would also allow you to pipeline different branches, and be sure only one of them will actually execute. You just make the calls and then wait for the original union and check which result you should use...

19:55 <isd> Haven't thought it through deeply though.

19:56 <kentonv> hmm, "ideally" you'd make one call that somehow returns a union result, but I don't know what the API would look like

19:56 <isd> The protocol upgrade for that seems hairy though; what do you do if the receiver doesn't know how to dispatch like that?

19:57 <isd> (for my thing that is)

19:58 <kentonv> well, what we're talking about here is adding new operations to `PromisedAnswer.Op` (in rpc.capnp)

19:58 <kentonv> I'm trying to see what the RPC code does if it receives an unknown op

19:59 <isd> The Haskell implementation returns an exception.

19:59 <isd> Maybe it should return an unimplemented message instead?

20:00 * isd looks to see what the Go implementation does

20:01 <kentonv> unfortunately it appears the C++ implementation throws an exception. I was hoping for an Unimplemented message, yeah.

20:01 <isd> ...and that brings us back to square one of not being able to tell why an error happend...

20:02 <isd> The Go implementation also throws an exception.

20:03 <kentonv> well... that was clearly a mistake on my part, not creating a way to introduce new pipeline ops with the ability to fall back if unimplemented

20:06 <isd> I hate this idea, but it would probably at least work: we could add a new _message_ type, that's basically just 'Call, but you understand what to do with unknown Ops. <long paragraph about historical design mistake>.'

20:07 <isd> ...and we'd say regular "call" is only allowed to use the existing options...

20:10 <kentonv> eh, I don't know if it's worth working that hard to accommodate existing implementations, vs. just saying "you gotta upgrade Cap'n Proto before you can use this protocol"...

20:12 <isd> Yeah, I guess we could just document it with the caveat that you shouldn't use newer pipeline ops with implementations that don't do the the right thing here.

20:12 <isd> Just expect a message.unimplemented for unimplemented things and if the receiver throws an exception we just treat it as normal.

20:13 <kentonv> I mean it may be worth fixing the implementations so that they treat unknown pipeline ops reasonably going forward

20:14 <isd> Yeah. But we should also probably mention in docs somewhere that older versions of implementations might have this problem.

20:14 <kentonv> right

20:15 <isd> I guess the good thing about having a fairly small number of prod quailtiy rpc implementations is that tracking them all down is doable.

20:15 <isd> I'll put updating the haskell implementation on my TODO list then.

20:15 <kentonv> I mean... it's ugly, but we _could_ also string-match the known error messages for unknown pipeline ops from the known implementations... there's not many of them

20:16 <isd> Yeah, but we'd have to impose that burden on every implementation.

20:17 <isd> It also means you could trigger the library re-trying a method call erroneously by throwing an appropriate exception from user code

20:18 <isd> Like, I'm imagining bugs where somebody shoves a malcious string in a long and it ends up in an exception messsage, and causes some side-effect to happen twice...

20:18 <isd> s/long/log message/

20:24 <isd> Going afk for a bit.

20:47 <abliss> I wonder if User-Mode Linux could be configured to export/provide its pty implementation, while allowing other syscalls to go directly to the host OS for performance.

21:25 frigginglorious has quit [Read error: Connection reset by peer]

22:22 vertigo_38 has quit [Remote host closed the connection]

22:23 vertigo_38 has joined #sandstorm

23:33 prompt-laser has quit [Quit: Connection closed for inactivity]

23:47 xet7 has quit [Quit: Leaving]

23:51 xet7 has joined #sandstorm