<headius[m]>
Gitter is semi-official but we aren't huge fans because it's not clear that it will survive
<headius[m]>
and obviously matrix is more OSS friendly
<rtyler>
did you see that blog post a couple weeks ago about mozilla looking to move away from IRC?
<rtyler>
I think Matrix is a very safe bet, I just wish there were more clients running around, Riot is really the only sensible game AFAICT
<headius[m]>
yeah I guess Pidgin also works
<headius[m]>
enebo is using it
<rtyler>
I'll take a crack at it
<rtyler>
to give you an update on Spark, I've not had the time to really dive back into it, but I've got a couple different places where either a `RubyObject` doesn't serialize and deserialize properly, or it _appears_ that Ruby<->Java invocation is not mapping correctly
<headius[m]>
Ok maybe we can sort of pair on it some time
<headius[m]>
the readObject side of serialization will use Ruby.getThreadLocalRuntime() to get the runtime in which to create objects
<headius[m]>
hmmm
<headius[m]>
it occurs to me now that you need to set that
<headius[m]>
Ruby.setThreadLocalRuntime
<headius[m]>
if none is set it raises a Java IOException from readObject
<headius[m]>
for some reason I was thinking it used getGlobalRuntime
<headius[m]>
which should be the first one created if not set
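The failure mode headius is describing can be sketched in plain Java. This is an illustrative stand-in, not JRuby's actual code: the `ThreadLocal` slot and the helper names here just mimic the `Ruby.getThreadLocalRuntime()`/`Ruby.setThreadLocalRuntime()` pattern, with readObject throwing an IOException when the receiving thread never set a runtime:

```java
import java.io.*;

// Sketch of the pattern: readObject pulls a runtime out of a thread-local
// and fails with an IOException when the deserializing thread has none set.
public class ThreadLocalRuntimeDemo implements Serializable {
    private static final long serialVersionUID = 1L;

    // Stand-in for org.jruby.Ruby's thread-local runtime slot.
    static final ThreadLocal<Object> RUNTIME = new ThreadLocal<>();

    static void setThreadLocalRuntime(Object runtime) {
        RUNTIME.set(runtime);
    }

    private void readObject(ObjectInputStream in)
            throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        if (RUNTIME.get() == null) {
            // Mirrors the behavior described: no runtime -> IOException
            throw new IOException("no runtime set on deserializing thread");
        }
    }

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static Object deserialize(byte[] bytes)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    static String demo() throws Exception {
        byte[] bytes = serialize(new ThreadLocalRuntimeDemo());
        String first;
        try {
            deserialize(bytes);
            first = "ok";
        } catch (IOException e) {
            first = "no runtime";
        }
        setThreadLocalRuntime(new Object()); // pretend this is the Ruby runtime
        deserialize(bytes);                  // now succeeds
        return first + ", then ok";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```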
<rtyler>
hrm, I'll have to do some tinkering later to figure out how to set that. At the moment I don't really have any hook into the deserialization side of the equation
<headius[m]>
somewhere, someone on the receiving side needs to set that
<headius[m]>
you can set it from within Ruby if you like, as long as that runs before you start receiving objects on that end
<headius[m]>
so just org.jruby.Ruby.thread_local_runtime = JRuby.runtime
<rtyler>
well, that snippet I can give a try right now
<rtyler>
no dice, still ClassNotFoundException. I'm not sure the approach you describe would work anyway in a scenario where the master and worker JVMs are two different processes
<headius[m]>
well any JVM receiving serialized Ruby objects will need to do this on the receiving threads
<headius[m]>
master or worker
<headius[m]>
this also has not been revisited in years...it may be there's a better compromise for Java serialization of Ruby objects these days
<rtyler>
my difficulty is in understanding why setting the thread-local runtime would matter, since the JVM doing the deserializing is an entirely different process
<headius[m]>
that JVM needs to set it too
<headius[m]>
any JVM that's deserializing needs to set this to get Ruby objects back
<headius[m]>
any/all
<rtyler>
I thought the problem was that these objects are being tagged with these generated class names, e.g. java.lang.ClassNotFoundException: org.jruby.gen.BeeForeach_601729979 and that generated class name simply wouldn't exist in the deserializing JVM?
<headius[m]>
for that case it's failing to find the class to even construct it...that would be happening before the deserialization of the object's data
<headius[m]>
answer me a question about those objects... why do data objects in Spark need to implement interfaces?
<headius[m]>
or whatever these objects are for
<headius[m]>
that's essentially the problem here...because the data object you're sending has to implement a Java interface, it's not just data...it's also got code associated with it
<rtyler>
in a meeting sorry, be back in 30
<rtyler>
headius: these objects are what are referred to as RDDs in Spark, which is described in great detail here: https://spark.apache.org/docs/latest/rdd-programming-guide.html Basically it's like a chunk of data and the code to do some partition of the work on that data, such that this can be distributed across the spark cluster
<headius[m]>
rtyler: ok so the easiest way to do this then would be to have some normal Java class that implements the interfaces and lives on both sides, and have that aggregate the ruby stuff
<headius[m]>
I mean easiest where "best" would be getting the Ruby impls of Java interfaces to serialize
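The "normal Java class on both sides" idea might look something like this hypothetical wrapper. The class name, the idea of carrying Ruby source as a plain String, and the commented-out evaluation step are all assumptions for illustration, not an actual JRuby-on-Spark API; the point is that the class that implements the interface is ordinary Java available on every JVM, so only serializable data crosses the wire:

```java
import java.io.*;

// Hypothetical wrapper: a plain Java class that lives on every JVM in the
// cluster. It implements whatever (Serializable) interface Spark requires,
// but carries only data -- here, Ruby source text -- rather than a
// generated org.jruby.gen.* class that workers won't have.
public class RubyScriptFunction implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String rubyScript;

    public RubyScriptFunction(String rubyScript) {
        this.rubyScript = rubyScript;
    }

    public String script() {
        return rubyScript;
    }

    // On a worker this would hand the script to the local JRuby runtime,
    // e.g. via an embedding API (omitted here to stay stdlib-only).
    public Object apply(Object input) {
        throw new UnsupportedOperationException(
            "evaluate rubyScript against the local JRuby runtime");
    }

    static RubyScriptFunction roundTrip(RubyScriptFunction f) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(f);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (RubyScriptFunction) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        RubyScriptFunction f = new RubyScriptFunction("->(x) { x * 2 }");
        System.out.println(roundTrip(f).script());
    }
}
```

The round trip works on any JVM that has this one ordinary class, which is the whole point of the compromise.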
<headius[m]>
presumably "chunk of data and the code" would still mean the worker has to have a copy of that code for Java, no?
<headius[m]>
yeah so if this is using Java lambdas, they don't send the code over either
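For reference, this is the Spark-style serializable-lambda trick in plain Java. What goes over the wire for a lambda is essentially a recipe (capturing class plus method reference), not bytecode, so the receiving JVM must already have the capturing class on its classpath, which matches headius's point:

```java
import java.io.*;
import java.util.function.Function;

public class LambdaDemo {
    // A functional interface that extends Serializable makes the lambda
    // serializable; what is written is a SerializedLambda recipe, not code.
    interface SerFunction<T, R> extends Function<T, R>, Serializable {}

    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    static int demo() throws Exception {
        SerFunction<Integer, Integer> doubler = x -> x * 2;
        byte[] bytes = serialize(doubler);
        // Deserialization only works because LambdaDemo (the capturing
        // class) is present in this JVM; a worker without it would get
        // a ClassNotFoundException instead.
        try (ObjectInputStream ois =
                new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            @SuppressWarnings("unchecked")
            Function<Integer, Integer> back =
                (Function<Integer, Integer>) ois.readObject();
            return back.apply(21);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo());
    }
}
```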
<headius[m]>
whew
<headius[m]>
so you still need to have the code you intend to run against the RDD live on all JVMs
<rtyler>
I'm a bit fuzzy on the details, to be honest, of what might get serialized and sent around. I _believe_ the expectation is that your Spark .jars are available on the classpath for all JVMs in the cluster