<headius[m]>
yeah cl4es messaged me later and we paired on some code spelunking
<headius[m]>
I think this is the root problem
<headius[m]>
you can see the closing in URLClassLoader.close, and via the openConnection logic it eventually uses a JDK-global cache of JarFile instances
<headius[m]>
so blammo, eventually it closes a JarFile that's in the cache
<headius[m]>
this is the cause of all the exceptions in that bug, plus all our sporadic "zip file closed" errors in CI
<enebo[m]>
Atilla mentioned making his own URLCL? Is that too big an idea?
<headius[m]>
I'd much rather find one we could reuse because this isn't simple stuff
<headius[m]>
I'm actually not hating the idea of not closing URLClassLoader
<headius[m]>
I don't really get why it's doing all these gymnastics to cache JarFile instances anyway
<headius[m]>
the right fix, however, is probably to avoid ever using ClassLoader.getResourceAsStream or URL.openStream directly
<headius[m]>
this is really shockingly broken
<enebo[m]>
we have a need to caching presumably because we do not require all files within a jar (like warbler or our kernel files)
<headius[m]>
so there's two caches involved here
<headius[m]>
inside JarResource there's a static reference to JarCache, which was contributed to us by a user
<headius[m]>
that cache takes a jar path string and builds a cache of JarEntry that's faster to search than manually walking entries
<headius[m]>
it's used anywhere we use JarResource.openWhatever
<headius[m]>
that does not appear to be broken, at least not exactly
<headius[m]>
the cache that's broken is internal to OpenJDK... whenever you request a resource from a jar using either a URL or a URLClassLoader, it eventually calls into a global cache mapping jar URLs to JarFile instances
<headius[m]>
to reuse those JarFile for... reasons?
<headius[m]>
but URLClassLoader also adds the jar files it encounters into a list of things to close
<headius[m]>
I just don't understand how this hasn't been reported and fixed years ago
<enebo[m]>
but there are in fact two separate things in a jar for us as well
<enebo[m]>
classfiles which presumably somehow is the reason for some deep cache
<enebo[m]>
and ordinary files (e.g. .rb)
<headius[m]>
the cache we make doesn't get used for classloading
<headius[m]>
it's only used for things like globbing files from a jar
<enebo[m]>
I guess us reading all ordinary ones into memory and closing/not accessing from the jarfile is probably not a solution because it would be time consuming
<enebo[m]>
oh yeah I got that
<headius[m]>
I think classloader resources may at some level also be going through this jar file cache
<enebo[m]>
I am just trying to think about why we actually hit the error and it is to access these resources and not actual classfiles
<headius[m]>
yeah seems like all the errors I saw were loading ruby scripts from inside the jar
<enebo[m]>
I am surprised there is no bug parade on this
<headius[m]>
I think I saw a case where it was looking up a class, but it was opening the class as a resource to see if it was there
<headius[m]>
so again a resource access
<enebo[m]>
yeah if we could know a jar is loaded we could reload it with our own jar code and then all requiers would grab from that...seems far fetched
<headius[m]>
the logic inside URLClassLoader and on down for opening these connections is just stupid complicated
<enebo[m]>
I imagne it ends up being a stream at some point so not super useful...and I doubt we want to read the entire resource stream into memory
<enebo[m]>
I mean a second time
<enebo[m]>
and sequentially
<enebo[m]>
even forget that this is about concurrent interaction
<enebo[m]>
that will not even work because it might be closed by the time you try it
<headius[m]>
yeah single-thread sequential access will also break
<headius[m]>
it's totally broken
<headius[m]>
like if you use two URLClassLoader to read the same jar, and you close them, one will break
<headius[m]>
it's that simple
<enebo[m]>
Too bad there is no concept of dynamic shading
<enebo[m]>
Or not really shading but something which confuses the lowest level into thinking they are in fact different jars
<enebo[m]>
like adding something to the url which is not stripped off via normalization
<headius[m]>
we could copy it to a temp location :-)
<enebo[m]>
possibly it is a good solution for warbler at least
<enebo[m]>
we could try and generalize it but I don't know
<enebo[m]>
So I guess though this is purely a problem with emebdding
<enebo[m]>
I would think most warbler people would not start up n runtime instances in one runtime but startup one with n threads
elia has joined #jruby
elia has quit [Client Quit]
<headius[m]>
warbler's a good example... I have never gotten a good picture of which servers, or which versions of those servers, unpack the war to a temp location
<headius[m]>
but we've gotten a few reports recently about FFI-based libraries that have native lilbs
<headius[m]>
they clearly don't work from within a jar
<headius[m]>
but FFI would have to know the file URL it was given to open as a library is in a jar and needs to be unpacked, etc
<headius[m]>
could try to load the file into shared memory somewhere and dlopen it as a pointer?
<headius[m]>
madness
<headius[m]>
anyway yeah the typical case when you control all the threads would be to use same instance
<headius[m]>
but I think there are a lot of apps out there using JRuby as a scripting language for a larger app, so they separate it by runtime or ScriptingContainer
<enebo[m]>
yeah so we see this in testing via multiple instances in same process and the people reporting are embedding use cases
<enebo[m]>
Just thinking through impact of this mostly
<enebo[m]>
So my comment about warbler maybe is not relevant
<enebo[m]>
but if we could rename jars somehow on each script container load then it might eliminate this problem and create a new problem
<enebo[m]>
since we would have the same jar possibly loaded hundreds of times
<headius[m]>
yup exactly
<headius[m]>
I never put all these cases together until this guy came up with a pocket reproduction
<enebo[m]>
Seems like scripting container with jar resources is just a problem we might be stuck with until JVM actually fixes it
<enebo[m]>
Our position will be to not store resources in a jar due to this I guess?
<enebo[m]>
for some applications that just means exploding those resources to some runtime location
<enebo[m]>
That sucks of course but the way you have painted this I am not seeing a lot of recourse
<enebo[m]>
Another solution would be for apps to not load multiple containers but use them in one of our threaded modes
<enebo[m]>
(as a suggested fix)
<headius[m]>
well, all of jruby-complete is resources in a jar
<enebo[m]>
that really only helps if you do not require isolation
<enebo[m]>
yeah but if you only load it once it is fine right?
<enebo[m]>
finalization of our runtime makes it think it is done even though another comes in
<headius[m]>
it is possible to work around this by always getting a URLConnection and turning off caching
<headius[m]>
rather than using the shortcut methods
<headius[m]>
it's just gross
<enebo[m]>
ok but at least it is a workaround although I guess that will have an impact :)
<headius[m]>
this plus a ClassLoader.getResourceAsString patch should fix it for all JRuby
<headius[m]>
Stream
<headius[m]>
yeah I have no idea
<headius[m]>
hmmm
<headius[m]>
I wonder when we started calling close
<headius[m]>
URLClassLoader.close was added in Java 7
<enebo[m]>
yeah is this a function on n concurrent opens/closes lifecycles from only us
<headius[m]>
yeah
<enebo[m]>
so no close would probably not have them ever close but would be a leak
<headius[m]>
I also figured out in JDK code it is a concurrency bug
<enebo[m]>
although not a leak in practice probably for most uses
<headius[m]>
it does try to remove this JarFile from the cache, but there's no guarantee that someone else isn't using it
<enebo[m]>
So perhaps a property added for this problem saying 'do not close jar resources' which we put in a wiki and they can determine whether that is really a leak for them
<headius[m]>
oh a property for not using the jar file cache maybe?
<headius[m]>
clearly this is going to be needed in enough places I will have to make a utility method
<enebo[m]>
well I was just thinking if we never close the jar file in any way it will never close so we will never see the error
<enebo[m]>
it is about finalization of the whole jar file right?
<headius[m]>
yeah that's the problem though, the jar file gets registered in JRubyClassLoader and we explicitly close it on teardown
<enebo[m]>
or individual resource contention within a jar?
<headius[m]>
URLClassLoader closes these JarFile that it used, which it probably got from the cache, and it does it in such a way that someone might get a stale entry
<enebo[m]>
oh I guess even if we did not close on teardown it may still think they need to close
<headius[m]>
oh I think I get what you're asking... the other fix
<headius[m]>
yeah I have no idea if we don't close URLClassLoader will it leak a ton of stuff?
<headius[m]>
JarFile is a ZipFile which eventually holds a RandomAccessFile
<headius[m]>
huh I just realized Java 9 removed a bunch of finalizers
<headius[m]>
anyway I don't think RandomAccessFile will close itself on finalization
<headius[m]>
it's just so broken every way I think about it
<headius[m]>
I was thinking "oh well the jar file will just stay in the cache and be okay" but then why did they write this code to close jar file instances?
<enebo[m]>
probably an optimization or thought of closing the corner case that you really are done with a jar and you do not want it anymore
<enebo[m]>
seemingly though this is only an implementation problem which can get fixed
<enebo[m]>
not for what most people are using today but the semantics of closing seems reasonable if done right
<headius[m]>
now the problem in JRuby though is that we also use URLClassLoader for nested jars, which get unpacked to unique temp locations
<headius[m]>
ugh we could manually clean those up
<headius[m]>
unique temp locations would leak pretty badly
<headius[m]>
ugh I wonder if his example is unpacking jars for every loop
<headius[m]>
what a can of worms
<enebo[m]>
so if all distinct files always had the same location and no temp file stuff we could potentially just say anything loaded will never be unloaded (with a property perhaps so it can be disabled (or enabled))
<enebo[m]>
time it !
<headius[m]>
right
<headius[m]>
that would be the worst outcome if it's all just filesystem locations
<enebo[m]>
It is possible this is way faster also
<headius[m]>
well we are clearly not benefiting from the JDK cache if they close it when we close the CL
<headius[m]>
we do have in hand a list of the tempfile jars, so I think it should be possible to get a JarURLConnection that has the cached JarFile and kill it
<headius[m]>
we'll know it's ours because it's a tempfile
<headius[m]>
this is relying on some super deep magic though... opening a new connection assuming we are going to get the cached JarFile to close it
<headius[m]>
😬
<enebo[m]>
heh
<enebo[m]>
So unpacking to tempfiles happend for jffi but what else does that?
<enebo[m]>
If we had an extended URLClassL that was TempURLClassL then assuming we do not reuse temp loaded between instances they could just close
<headius[m]>
nested jars
<headius[m]>
like all the jars packed inside jruby-complete.jar
<enebo[m]>
or do we not control how those get loaded out of a jar
<headius[m]>
hmm
<headius[m]>
ugh it's possible
<headius[m]>
we'd have a separate TempCL that wraps JRubyCL
<headius[m]>
TempCL would actually have to become JRubyCL since that's our top-level classloader on every API
<enebo[m]>
top-level jars we never close (maybe on property) and for nested exploded to temp dirs we use different CL and always close?
<headius[m]>
this would be hidden inside it I suppose
<headius[m]>
internally delegate so that only temp jars get closed
<enebo[m]>
yeah
<headius[m]>
that's not terrible
<enebo[m]>
I am just brainstorming of course but it sounds like it might work :)
<enebo[m]>
Still sounds like we need top-level urlclassloaders potentialy not close (not sure about defaulting that on or off by default)
<enebo[m]>
defaulting it on means potentially more memory someone might notice and report an issue. Leaving close on will probably generate an issue report and then they will be told/discover they can use the property
<enebo[m]>
I am all about not getting issue reports :)
<headius[m]>
I think I have a patch for your idea
<headius[m]>
I seems to work with jruby.jar but I will check if it leaks with jruby-complete