SeanTAllen changed the topic of #wallaroo to: Welcome! Please check out our Code of Conduct -> https://github.com/WallarooLabs/wallaroo/blob/master/CODE_OF_CONDUCT.md | Public IRC Logs are available at -> https://irclog.whitequark.org/wallaroo
_whitelogger has joined #wallaroo
aturley has quit [Ping timeout: 272 seconds]
aturley has joined #wallaroo
fabianski has joined #wallaroo
<fabianski> Heya! Is there any documentation on deploying or updating wallaroo and wallaroo applications? Especially in terms of data persistence when restarting wallaroo?
pzel has joined #wallaroo
<pzel> fabianski: We have some examples of deploying with pulumi+ansible under https://github.com/WallarooLabs/wallaroo_blog_examples/tree/master/provisioned-classifier but that project sets up a one-shot computation pipeline and then shuts it down.
<pzel> So likely not what you're after.
<pzel> Which uses the relatively fresh resilience functionality to recover from induced crashes.
<jtfmumm> The existing documentation for resilience is here: https://docs.wallaroolabs.com/operators-manual/resilience-crash/.
<pzel> Maybe if you could describe your use case a bit more closely, we could suggest something more specific?
<pzel> Usually, if you're changing internal protocols, you're likely better off starting a new cluster with the new version of your data and switching over the data sources.
<fabianski> thanks for the links :)
<fabianski> rn we're thinking about replacing our own in-house framework with a public one. Our use case is parsing information out of emails and aggregating different information.
<pzel> fabianski: sounds interesting! By use-case here I was hoping to find out more about your setup: are you running on hardware, containers, etc? And what are the uptime requirements?
<pzel> because if you want to upgrade Wallaroo workers in-place, you'll have to consider backward and forward compatibility in your processing functions
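A minimal sketch of what backward and forward compatibility in a processing function could look like, assuming JSON-encoded messages. This is plain Python and does not use the Wallaroo API; the function name `decode_email_event` and the fields `version`, `sender`, and `attachment_count` are hypothetical and only illustrate the idea.

```python
import json


def decode_email_event(raw: bytes) -> dict:
    """Decode a JSON-encoded event while tolerating older and newer senders.

    All field names here are hypothetical, chosen only for illustration.
    """
    data = json.loads(raw.decode("utf-8"))
    return {
        # Messages written before versioning existed are treated as version 1.
        "version": data.get("version", 1),
        # Required in every version of the (hypothetical) schema.
        "sender": data["sender"],
        # Backward compatibility: a field added later gets a default
        # when an older message does not carry it.
        "attachment_count": data.get("attachment_count", 0),
        # Forward compatibility: keys this code does not know about
        # are simply not copied, so newer messages still decode.
    }
```

With a decoder like this, old and new workers can read each other's messages during an in-place upgrade, as long as required fields are never removed or repurposed.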
<fabianski> ah sorry! rn I do not have any setup :D just local try-outs. One approach that crossed my mind is containers (aws ecs/fargate clustering). We can accept downtime in between deployments since we have an (almost) infinite queueing mechanism (aws sqs/s3) in front of the data processing.
<pzel> Oh, in that case do take a look at the `provisioned-classifier` examples with ansible. They're a bit ugly because of `pip-install`ling on the production nodes, but apart from that they can set up a cluster in single-digit minutes.
<pzel> We also use pulumi (www.pulumi.io) to provision machines, but this is orthogonal to deploying the cluster w/ansible. You can deploy on top of existing infrastructure by changing ansible/inventory.yml
<fabianski> Cool, yeah I was looking at it just now. Thanks that might help :)
<pzel> Good luck and keep us in the loop. A lot of this stuff is work-in-progress, and we definitely appreciate open source contributions :)
<fabianski> And another question :) There is nothing like pipeline configuration management in wallaroo right now, is there? Like where I can predefine workers and replace them in, or append them to, a certain pipeline without reconfiguring/restarting everything?
<SeanTAllen> Correct. Pipelines are currently static and can't be changed at runtime.
<fabianski> Can you tell if sth like that is on the roadmap? :)
<SeanTAllen> It's not something there has been great demand for and is a rather large undertaking. So, at the moment it isn't on the roadmap. We do work with folks on a commercial basis to add features that they want, if that was a route you wanted to go.
<fabianski> That's fair, thanks :) Probably not, but I'll get back in case! thanks for the help!
<SeanTAllen> you're welcome
pzel has quit [Read error: Connection reset by peer]
fabianski has quit [Ping timeout: 256 seconds]