#jruby on 2020-06-05 — irc logs at freenode.irclog.whitequark.org

2019-08-12 18:53 ChanServ changed the topic of #jruby to: Get 9.2.8.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:06 KarolBucekGitter has quit [Ping timeout: 240 seconds]

00:10 KarolBucekGitter has joined #jruby

00:11 rg[m] has quit [Ping timeout: 246 seconds]

00:11 rdubya[m] has quit [Ping timeout: 246 seconds]

00:11 kares[m] has quit [Ping timeout: 246 seconds]

00:11 TimGitter[m]1 has quit [Ping timeout: 246 seconds]

00:11 sintel[m] has quit [Ping timeout: 246 seconds]

00:11 venkatkms[m] has quit [Ping timeout: 246 seconds]

00:11 i8her8oat[m] has quit [Ping timeout: 246 seconds]

00:11 multislacker[m] has quit [Ping timeout: 246 seconds]

00:11 smarks[m] has quit [Ping timeout: 246 seconds]

00:11 elisabeth[m] has quit [Ping timeout: 246 seconds]

00:12 rg[m] has joined #jruby

00:12 smarks[m] has joined #jruby

00:12 multislacker[m] has joined #jruby

00:13 i8her8oat[m] has joined #jruby

00:13 kares[m] has joined #jruby

00:13 TimGitter[m]1 has joined #jruby

00:13 rdubya[m] has joined #jruby

00:13 venkatkms[m] has joined #jruby

00:13 sintel[m] has joined #jruby

00:16 elisabeth[m] has joined #jruby

00:21 ur5us has quit [Ping timeout: 256 seconds]

00:43 ur5us has joined #jruby

00:48 jmalves has joined #jruby

00:50 nirvdrum has quit [Ping timeout: 260 seconds]

00:53 jmalves has quit [Ping timeout: 246 seconds]

01:41 bga57 has quit [Ping timeout: 272 seconds]

01:44 bga57 has joined #jruby

01:55 bga57 has quit [Ping timeout: 272 seconds]

01:56 bga57 has joined #jruby

02:06 _whitelogger has joined #jruby

02:08 bga57 has quit [Ping timeout: 272 seconds]

02:14 bga57 has joined #jruby

03:16 Antiarc has quit [Ping timeout: 265 seconds]

03:30 Antiarc has joined #jruby

04:52 ur5us has quit [Remote host closed the connection]

04:53 ur5us has joined #jruby

05:02 ur5us has quit [Ping timeout: 246 seconds]

05:54 ur5us has joined #jruby

06:02 snickers has joined #jruby

06:07 snickers has quit [Ping timeout: 260 seconds]

06:41 ur5us has quit [Ping timeout: 246 seconds]

09:52 drbobbeaty has joined #jruby

11:42 ur5us has joined #jruby

12:15 nirvdrum has joined #jruby

13:10 ur5us has quit [Ping timeout: 246 seconds]

14:47 <headius[m]> enebo: you around?

14:47 <enebo[m]> headius: sure

14:47 <headius[m]> OpenJDK is BUSTE*D

14:47 <headius[m]> wow that formatted nice

14:47 <enebo[m]> I saw your tweet

14:47 <headius[m]> https://github.com/jruby/jruby/issues/6218#issuecomment-639539200

14:47 <enebo[m]> with Atilla

14:47 <headius[m]> yeah cl4es messaged me later and we paired on some code spelunking

14:47 <headius[m]> I think this is the root problem

14:48 <headius[m]> you can see the closing in URLClassLoader.close, and via the openConnection logic it eventually uses a JDK-global cache of JarFile instances

14:48 <headius[m]> so blammo, eventually it closes a JarFile that's in the cache

14:49 <headius[m]> this is the cause of all the exceptions in that bug, plus all our sporadic "zip file closed" errors in CI

14:50 <enebo[m]> Atilla mentioned making his own URLCL? Is that too big an idea?

14:51 <headius[m]> I'd much rather find one we could reuse because this isn't simple stuff

14:51 <headius[m]> I'm actually not hating the idea of not closing URLClassLoader

14:51 <headius[m]> I don't really get why it's doing all these gymnastics to cache JarFile instances anyway

14:52 <headius[m]> the right fix, however, is probably to avoid ever using ClassLoader.getResourceAsStream or URL.openStream directly

14:53 <headius[m]> this is really shockingly broken

14:54 <enebo[m]> we have a need for caching presumably because we do not require all files within a jar (like warbler or our kernel files)

14:54 <headius[m]> so there's two caches involved here

14:55 <headius[m]> inside JarResource there's a static reference to JarCache, which was contributed to us by a user

14:55 <headius[m]> that cache takes a jar path string and builds a cache of JarEntry that's faster to search than manually walking entries

14:55 <headius[m]> it's used anywhere we use JarResource.openWhatever

14:55 <headius[m]> that does not appear to be broken, at least not exactly
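The entry cache described above (jar path in, a fast lookup of JarEntry out) can be pictured roughly as follows. This is a minimal sketch and not JRuby's actual JarCache: the class name `JarEntryIndexSketch` and the method `index` are hypothetical, and it uses only the standard `java.util.jar` API.

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical illustration only; not the JarCache that ships with JRuby.
final class JarEntryIndexSketch {
    // Walk the jar once and index every entry by name, so later lookups
    // are a map access instead of another walk over all entries.
    static Map<String, JarEntry> index(String jarPath) throws IOException {
        Map<String, JarEntry> byName = new HashMap<>();
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                byName.put(entry.getName(), entry);
            }
        }
        return byName;
    }
}
```

Note that the sketch closes its own private JarFile after indexing, so it never touches the JDK-internal cache discussed next.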

14:56 <headius[m]> the cache that's broken is internal to OpenJDK... whenever you request a resource from a jar using either a URL or a URLClassLoader, it eventually calls into a global cache mapping jar URLs to JarFile instances

14:57 <headius[m]> to reuse those JarFile for... reasons?

14:57 <headius[m]> but URLClassLoader also adds the jar files it encounters into a list of things to close

14:58 <headius[m]> I just don't understand how this hasn't been reported and fixed years ago

14:58 <enebo[m]> but there are in fact two separate things in a jar for us as well

14:58 <enebo[m]> classfiles, which presumably are somehow the reason for some deep cache

14:58 <enebo[m]> and ordinary files (e.g. .rb)

14:59 <headius[m]> the cache we make doesn't get used for classloading

14:59 <headius[m]> it's only used for things like globbing files from a jar

14:59 <enebo[m]> I guess us reading all ordinary ones into memory and closing/not accessing from the jarfile is probably not a solution because it would be time consuming

14:59 <enebo[m]> oh yeah I got that

15:00 <headius[m]> I think classloader resources may at some level also be going through this jar file cache

15:01 <enebo[m]> I am just trying to think about why we actually hit the error and it is to access these resources and not actual classfiles

15:01 <headius[m]> yeah seems like all the errors I saw were loading ruby scripts from inside the jar

15:02 <enebo[m]> I am surprised there is no bug parade on this

15:02 <headius[m]> I think I saw a case where it was looking up a class, but it was opening the class as a resource to see if it was there

15:02 <headius[m]> so again a resource access

15:03 <enebo[m]> yeah if we could know a jar is loaded we could reload it with our own jar code and then all requires would grab from that...seems far-fetched

15:04 <headius[m]> the logic inside URLClassLoader and on down for opening these connections is just stupid complicated

15:05 <enebo[m]> I imagine it ends up being a stream at some point so not super useful...and I doubt we want to read the entire resource stream into memory

15:05 <enebo[m]> I mean a second time

15:05 <enebo[m]> and sequentially

15:06 <enebo[m]> even forget that this is about concurrent interaction

15:06 <enebo[m]> that will not even work because it might be closed by the time you try it

15:07 <headius[m]> yeah single-thread sequential access will also break

15:07 <headius[m]> it's totally broken

15:08 <headius[m]> like if you use two URLClassLoader to read the same jar, and you close them, one will break

15:08 <headius[m]> it's that simple
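The two-loader scenario described above can be probed with something like the following. This is a sketch under assumptions, not the reproduction from the bug report: the class `SharedJarCloseSketch` and its methods are hypothetical, the jar is a throwaway temp file, and whether the second read actually fails depends on the JDK version and its internal caching, so the code reports either outcome instead of assuming one.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical probe; names are illustrative only.
final class SharedJarCloseSketch {
    // Build a tiny throwaway jar containing one text entry.
    static Path writeJar(String entryName, String text) throws IOException {
        Path jar = Files.createTempFile("shared", ".jar");
        try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
            out.putNextEntry(new JarEntry(entryName));
            out.write(text.getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return jar;
    }

    // Open the same entry through two loaders, close the first loader,
    // then try to read through the stream obtained from the second.
    static String probe() throws IOException {
        URL[] urls = { writeJar("hello.rb", "puts 'hi'").toUri().toURL() };
        URLClassLoader first = new URLClassLoader(urls, null);
        URLClassLoader second = new URLClassLoader(urls, null);
        // Opened only so that `first` has touched the jar before it is closed.
        InputStream viaFirst = first.getResourceAsStream("hello.rb");
        InputStream viaSecond = second.getResourceAsStream("hello.rb");
        first.close(); // may also close a JarFile that `second` still relies on
        try {
            int firstByte = viaSecond.read();
            return "read ok: first byte " + firstByte;
        } catch (IOException | IllegalStateException e) {
            return "broken: " + e;
        } finally {
            second.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(probe());
    }
}
```

No threads are involved here, which matches the point that plain sequential access is enough to hit the problem when it occurs.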

15:09 <enebo[m]> Too bad there is no concept of dynamic shading

15:10 <enebo[m]> Or not really shading but something which confuses the lowest level into thinking they are in fact different jars

15:10 <enebo[m]> like adding something to the url which is not stripped off via normalization

15:22 <headius[m]> we could copy it to a temp location :-)

15:23 <enebo[m]> possibly it is a good solution for warbler at least

15:23 <enebo[m]> we could try and generalize it but I don't know

15:24 <enebo[m]> So I guess though this is purely a problem with embedding

15:25 <enebo[m]> I would think most warbler people would not start up n runtime instances in one runtime but startup one with n threads

15:25 elia has joined #jruby

15:25 elia has quit [Client Quit]

15:26 <headius[m]> warbler's a good example... I have never gotten a good picture of which servers, or which versions of those servers, unpack the war to a temp location

15:26 <headius[m]> but we've gotten a few reports recently about FFI-based libraries that have native libs

15:27 <headius[m]> they clearly don't work from within a jar

15:27 <headius[m]> but FFI would have to know the file URL it was given to open as a library is in a jar and needs to be unpacked, etc

15:28 <headius[m]> could try to load the file into shared memory somewhere and dlopen it as a pointer?

15:28 <headius[m]> madness

15:29 <headius[m]> anyway yeah the typical case when you control all the threads would be to use same instance

15:29 <headius[m]> but I think there are a lot of apps out there using JRuby as a scripting language for a larger app, so they separate it by runtime or ScriptingContainer

15:30 <enebo[m]> yeah so we see this in testing via multiple instances in same process and the people reporting are embedding use cases

15:30 <enebo[m]> Just thinking through impact of this mostly

15:30 <enebo[m]> So my comment about warbler maybe is not relevant

15:31 <enebo[m]> but if we could rename jars somehow on each script container load then it might eliminate this problem and create a new problem

15:31 <enebo[m]> since we would have the same jar possibly loaded hundreds of times

15:31 <headius[m]> yup exactly

15:31 <headius[m]> I never put all these cases together until this guy came up with a pocket reproduction

15:32 <enebo[m]> Seems like scripting container with jar resources is just a problem we might be stuck with until JVM actually fixes it

15:32 <enebo[m]> Our position will be to not store resources in a jar due to this I guess?

15:33 <enebo[m]> for some applications that just means exploding those resources to some runtime location

15:33 <enebo[m]> That sucks of course but the way you have painted this I am not seeing a lot of recourse

15:34 <enebo[m]> Another solution would be for apps to not load multiple containers but use them in one of our threaded modes

15:34 <enebo[m]> (as a suggested fix)

15:34 <headius[m]> well, all of jruby-complete is resources in a jar

15:34 <enebo[m]> that really only helps if you do not require isolation

15:34 <enebo[m]> yeah but if you only load it once it is fine right?

15:34 <enebo[m]> Or I guess it could still close

15:34 <enebo[m]> Do we know why it closes?

15:34 <headius[m]> did you see this one? https://github.com/jruby/jruby/pull/6269

15:35 <enebo[m]> finalization of our runtime makes it think it is done even though another comes in

15:35 <headius[m]> it is possible to work around this by always getting a URLConnection and turning off caching

15:35 <headius[m]> rather than using the shortcut methods

15:35 <headius[m]> it's just gross
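The workaround mentioned above (get a URLConnection, turn off caching, then open the stream) might look roughly like this as a helper. It is a minimal sketch rather than the code in the pull request: the class `UncachedResourceSketch` and both method names are hypothetical, while `URL.openConnection`, `URLConnection.setUseCaches`, and `ClassLoader.getResource` are standard JDK calls.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;

// Hypothetical helper; names are illustrative only.
final class UncachedResourceSketch {
    // Stand-in for URL.openStream() that asks the connection
    // not to use shared caches before opening the stream.
    static InputStream openUncached(URL url) throws IOException {
        URLConnection conn = url.openConnection();
        conn.setUseCaches(false);
        return conn.getInputStream();
    }

    // Stand-in for ClassLoader.getResourceAsStream() that resolves the URL
    // first and then routes it through openUncached. Returns null when the
    // resource is not found, mirroring the JDK method.
    static InputStream getResourceAsStreamUncached(ClassLoader loader, String name)
            throws IOException {
        URL url = loader.getResource(name);
        return url == null ? null : openUncached(url);
    }
}
```

Callers would still be responsible for closing the returned stream, for example with try-with-resources.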

15:36 <enebo[m]> ok but at least it is a workaround although I guess that will have an impact :)

15:36 <headius[m]> this plus a ClassLoader.getResourceAsString patch should fix it for all JRuby

15:36 <headius[m]> Stream

15:36 <headius[m]> yeah I have no idea

15:36 <headius[m]> hmmm

15:36 <headius[m]> I wonder when we started calling close

15:36 <headius[m]> URLClassLoader.close was added in Java 7

15:37 <enebo[m]> yeah is this a function of n concurrent open/close lifecycles from only us

15:37 <headius[m]> yeah

15:37 <enebo[m]> so no close would probably not have them ever close but would be a leak

15:37 <headius[m]> I also figured out in JDK code it is a concurrency bug

15:37 <enebo[m]> although not a leak in practice probably for most uses

15:37 <headius[m]> it does try to remove this JarFile from the cache, but there's no guarantee that someone else isn't using it

15:38 <enebo[m]> So perhaps a property added for this problem saying 'do not close jar resources' which we put in a wiki and they can determine whether that is really a leak for them

15:39 <headius[m]> oh a property for not using the jar file cache maybe?

15:40 <headius[m]> clearly this is going to be needed in enough places I will have to make a utility method

15:40 <enebo[m]> well I was just thinking if we never close the jar file in any way it will never close so we will never see the error

15:40 <enebo[m]> it is about finalization of the whole jar file right?

15:40 <headius[m]> yeah that's the problem though, the jar file gets registered in JRubyClassLoader and we explicitly close it on teardown

15:40 <enebo[m]> or individual resource contention within a jar?

15:41 <headius[m]> URLClassLoader closes these JarFile that it used, which it probably got from the cache, and it does it in such a way that someone might get a stale entry

15:41 <enebo[m]> oh I guess even if we did not close on teardown it may still think they need to close

15:41 <headius[m]> oh I think I get what you're asking... the other fix

15:41 <headius[m]> yeah I have no idea if we don't close URLClassLoader will it leak a ton of stuff?

15:42 <headius[m]> JarFile is a ZipFile which eventually holds a RandomAccessFile

15:42 <headius[m]> huh I just realized Java 9 removed a bunch of finalizers

15:43 <headius[m]> anyway I don't think RandomAccessFile will close itself on finalization

15:44 <headius[m]> it's just so broken every way I think about it

15:45 <headius[m]> I was thinking "oh well the jar file will just stay in the cache and be okay" but then why did they write this code to close jar file instances?

15:47 <enebo[m]> probably an optimization or thought of closing the corner case that you really are done with a jar and you do not want it anymore

15:47 <enebo[m]> seemingly though this is only an implementation problem which can get fixed

15:47 <enebo[m]> not for what most people are using today but the semantics of closing seems reasonable if done right

15:55 <headius[m]> https://gist.github.com/headius/32416a79faf14f63d660c40d83021bcf

15:55 <headius[m]> that fails immediately

15:57 <enebo[m]> if you remove urlc.close() and do it a lot I wonder how much growth occurs (or if it also fails)

15:57 <headius[m]> in theory it should use the same cached JarFile

15:58 <headius[m]> I'll try i

15:58 <headius[m]> it

15:59 <headius[m]> hmm mac Activity Monitor doesn't seem to show file descriptors

15:59 <headius[m]> java 5193 headius 10r REG 1,5 15440228 27842244 /Users/headius/projects/jruby/lib/jruby.jar

15:59 <headius[m]> only the one entry

16:00 <headius[m]> now the problem in JRuby though is that we also use URLClassLoader for nested jars, which get unpacked to unique temp locations

16:00 <headius[m]> ugh we could manually clean those up

16:00 <headius[m]> unique temp locations would leak pretty badly

16:01 <headius[m]> ugh I wonder if his example is unpacking jars for every loop

16:01 <headius[m]> what a can of worms

16:01 <enebo[m]> so if all distinct files always had the same location and no temp file stuff we could potentially just say anything loaded will never be unloaded (with a property perhaps so it can be disabled (or enabled))

16:01 <enebo[m]> time it !

16:01 <headius[m]> right

16:01 <headius[m]> that would be the worst outcome if it's all just filesystem locations

16:01 <enebo[m]> It is possible this is way faster also

16:02 <headius[m]> well we are clearly not benefiting from the JDK cache if they close it when we close the CL

16:02 <headius[m]> we do have in hand a list of the tempfile jars, so I think it should be possible to get a JarURLConnection that has the cached JarFile and kill it

16:02 <headius[m]> we'll know it's ours because it's a tempfile

16:03 <headius[m]> this is relying on some super deep magic though... opening a new connection assuming we are going to get the cached JarFile to close it

16:03 <headius[m]> 😬

16:03 <enebo[m]> heh

16:04 <enebo[m]> So unpacking to tempfiles happens for jffi but what else does that?

16:05 <enebo[m]> If we had an extended URLClassL that was TempURLClassL then assuming we do not reuse temp loaded between instances they could just close

16:05 <headius[m]> nested jars

16:05 <headius[m]> like all the jars packed inside jruby-complete.jar

16:05 <enebo[m]> or do we not control how those get loaded out of a jar

16:06 <headius[m]> hmm

16:06 <headius[m]> ugh it's possible

16:07 <headius[m]> we'd have a separate TempCL that wraps JRubyCL

16:07 <headius[m]> TempCL would actually have to become JRubyCL since that's our top-level classloader on every API

16:07 <enebo[m]> top-level jars we never close (maybe on property) and for nested exploded to temp dirs we use different CL and always close?

16:07 <headius[m]> this would be hidden inside it I suppose

16:08 <headius[m]> internally delegate so that only temp jars get closed
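The split being brainstormed here (leave top-level jars open, always close the loader for temp-unpacked jars) could be sketched with plain composition as below. This is an illustration under assumptions and not the patch in the gist: the class `SplitLoaderSketch` and the system property `sketch.closeTopLevel` are hypothetical, and the sketch covers resource lookup only, not class loading.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical illustration; not JRubyClassLoader.
final class SplitLoaderSketch implements Closeable {
    private final URLClassLoader topLevel; // jars at stable filesystem locations
    private final URLClassLoader tempJars; // nested jars unpacked to temp files

    SplitLoaderSketch(URL[] topLevelUrls, URL[] tempJarUrls) {
        this.topLevel = new URLClassLoader(topLevelUrls, null);
        this.tempJars = new URLClassLoader(tempJarUrls, null);
    }

    // Try the top-level jars first, then fall back to the temp jars.
    InputStream getResourceAsStream(String name) {
        InputStream in = topLevel.getResourceAsStream(name);
        return in != null ? in : tempJars.getResourceAsStream(name);
    }

    @Override
    public void close() throws IOException {
        // Temp jars belong to this instance alone, so closing them is safe.
        tempJars.close();
        // Top-level jars stay open unless the (hypothetical) property opts in.
        if (Boolean.getBoolean("sketch.closeTopLevel")) {
            topLevel.close();
        }
    }
}
```

Keeping the two loaders separate means each one only tracks the jar files it opened itself, so closing the temp loader does not touch the top-level jars.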

16:08 <enebo[m]> yeah

16:08 <headius[m]> that's not terrible

16:08 <enebo[m]> I am just brainstorming of course but it sounds like it might work :)

16:09 <enebo[m]> Still sounds like we need top-level urlclassloaders to potentially not close (not sure about defaulting that on or off)

16:10 <enebo[m]> defaulting it on means potentially more memory someone might notice and report an issue. Leaving close on will probably generate an issue report and then they will be told/discover they can use the property

16:10 <enebo[m]> I am all about not getting issue reports :)

16:15 <headius[m]> I think I have a patch for your idea

16:19 <headius[m]> It seems to work with jruby.jar but I will check if it leaks with jruby-complete

16:19 <headius[m]> hmm still blew up

16:20 <headius[m]> https://gist.github.com/headius/6130d94eb71763c534917e6f1a9c3955

16:25 <enebo[m]> hmm

16:50 <headius[m]> I'm going to take a breather... been messing with this thing for two days

16:50 <headius[m]> If you see anything obvious in that patch give it a try... the example class and command line are in the bug

16:50 <enebo[m]> ok

18:28 Antiarc has quit [Ping timeout: 246 seconds]

18:37 subbu is now known as subbu|lunch

18:53 Antiarc has joined #jruby

19:41 subbu|lunch is now known as subbu

21:41 ebarrett has quit [Quit: WeeChat 2.8]

22:25 ur5us has joined #jruby

22:38 JohnPhillips3141 has joined #jruby

22:57 nirvdrum has quit [Ping timeout: 272 seconds]

23:23 ur5us has quit [Ping timeout: 272 seconds]