nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
nemosupremo has joined #wallaroo
nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
aturley_ has quit [Quit: aturley_]
nemosupremo has joined #wallaroo
puzza007 has quit [Ping timeout: 256 seconds]
nisanharamati has joined #wallaroo
aturley has joined #wallaroo
aturley_ has joined #wallaroo
aturley has quit [Ping timeout: 256 seconds]
nemosupremo has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
nemosupremo has joined #wallaroo
nemosupremo has quit [Ping timeout: 240 seconds]
nemosupremo has joined #wallaroo
nisanharamati has quit []
<nemosupremo>
Does the kafka connector start at OFFSET_NEWEST or OFFSET_OLDEST? And is there anyway to configure this?
<slfritchie>
nemosupremo: It starts at 0 or whatever is oldest
<nemosupremo>
gah that explains so much
<nemosupremo>
is there any way to configure this? or is it hard coded?
<slfritchie>
The wallaroo master branch may have a command line arg to change it, or it may be waiting on an unmerged branch, sorry, not quite sure
<slfritchie>
machida/.deps/WallarooLabs/pony-kafka/pony-kafka/kafka_config.pony looks like the place where such a setting would be, but a quick look at git tag 0.3.3 of that file doesn't have an obvious setting.
<slfritchie>
I was far to hasty, sorry.
<slfritchie>
too
<nemosupremo>
yeah I was trying to see how I could change, but I'm not even sure how pony works
<slfritchie>
I'm not sure which version of Wallaro you're using, a tagged release or master branch or other. If your local dependency repo for pony-kafka is tagged 0.3.3, then my very naive suggestion is to try a hard-coded hack to specify another offset, a bit like https://gist.github.com/slfritchie/6050408a3363ad301c1214921634f601
<slfritchie>
but choosing your own 4321 offset
<SeanTAllen>
nemosupremo: dipin is working on that now to add support for it not starting from 0 on restart. are you looking to do? we can work that into a version coming soon...
<SeanTAllen>
if you could open an issue for what you need to do, we can address it.
<SeanTAllen>
an issue is better than IRC. much eaier to keep track of.
<nemosupremo>
SeanTAllen I want to start at "offset newest" - I thought it may have been configurable.
<nemosupremo>
Not sure what the best issue is here, I think you would have to decide on how you wish for users to track offsets with the Kafka spout
<SeanTAllen>
so if i understand, this is when you first start up right? the topic has messages and you want to be able to pick a specific point to start from if you arent doing something like recovery from a crash or something like that, yes?
dipin has joined #wallaroo
<dipin>
nemosupremo: Hi... I'm catching up on the conversation but I can explain what's possible in the release version of wallaroo/kafka client and what has already been implemented in them that is unreleased if it will help.. also, do you need more control beyond being able to specify which offset (or newest) to start from?
<nemosupremo>
dipin: Today, all I want to to do is set to newest, for testing. I'm reading a live stream that likely has data that is weeks old. However, I've brought this up before, without any offset tracking there is no safe way to "upgrade" my wallaroo pipeline. With more control I could store my offsets somewhere else and restore them when I restart the pipeline
<dipin>
Completely understood.. let me take a look if I can come up with a quick patch/diff for you to apply that might be able to accomplish what you need at the moment...
<dipin>
I'll read through the archives to catch up on your previously raised concerns..
<dipin>
I'm currently working on kafka source recovery for wallaroo which might overlap with your need to be able to "upgrade" a wallaroo pipeline and restore offsets for when you restart the pipeline
<dipin>
nemosupremo: I have an idea for a hack that might work but it will likely take me a couple of hours until I'm able to test it out due to life stuff. I'll let you know once I'm done later tonight.
<nemosupremo>
dipin: I wouldn't make this an urgent priority because most of my work is still exploratory - so you can avoid hacks. I've mentioned this before but I'm evaluating alternatives to Spark for my team and while wallaroo seems like its 90% there, I don't think I will be deploying it seriously soon
<SeanTAllen>
nemosupremo: would love to know what the 10% is, you have my email, whenever you have time if you could send the 10% to me, it would be helfpul for us planning new work.
<dipin>
nemosupremo: Gotcha. Sorry if you've already mentioned this, but do you have a timeframe for your evaluation? Additionally, are there any specific features related to Wallaroo and Kafka that you need for your use case? The Pony Kafka client is slowly gaining all the features that the Java and C clients have but your feedback would be valuable to ensure we're prioritizing the appropriate features first.
<nemosupremo>
dipin: I don't really have a timeframe - spark "works" today, but I doubt it will work tomorrow so I've been researching alternatives. As far as features go, I can put those together, but given my needs and what I see in the API, I really just think its missing some documentation or actual production use cases (which I think gives it a lack of actual polish). For example, I think the fact that I can't read the kafka key, nor
<nemosupremo>
set the offsets is a little half baked. In a production setup, if wallaroo today were to crash and restart, or if someone deployed a change, I'd have to reprocess the entire history of the kafka topic.
<nemosupremo>
Even if the Kafka source didn't exist, I wouldn't actually mind writing my own mini service to feed data to TCP, but I think there would need to be a way to pass arguments directly to the TCP source, so theres a way to solve that problem
<SeanTAllen>
Pass what sort of arugments to the TCP source nemosupremo ?
<nemosupremo>
SeanTAllen: Anything really. Let's say instead of the kafkasource, I used a TCP source which was a program that read from kafka and streamed to wallaroo. If wallaroo could send me a pipeline-id, the kafka program could say "hey pipeline-foo, you are at offset 200, so here are messages 200+"
<nemosupremo>
Like an optional handshake from wallaroo to the tcp source
<SeanTAllen>
O so, a way to have a recoverable TCPSource? Yeah we've had discussions about that.
<nemosupremo>
That way people who are willing to get dirty could write their own sources until non-pony source API is ready
<SeanTAllen>
what i was thinking was a way to extract the "message id" from incoming messages and be able to return when its acknowledged to the sender.
<SeanTAllen>
not every message ack'd but something like "everyhting up to id X" has been handled.
<SeanTAllen>
thoughts on that?
<SeanTAllen>
acking every message would be painful