<chrisseaton[m]1> `unloaded signature classes` seems to stop `invokedynamic` doing as much as it could in JRuby
<headius[m]> What's that?
<headius[m]> I do periodic audits of the native code but it has been a while and I have not done it on a recent jdk
<chrisseaton[m]1> I see this for trivial examples I think should inline (`foo` calls `bar`, `bar` just returns a small integer constant.)
<chrisseaton[m]1> I don't really know much about this stuff - possibly a handle's signature not set up properly? I'm guessing.
<headius[m]> That seems like a bug. The only classes that would be unusual in any of our signatures would be the generated code itself
<chrisseaton[m]1> There's some discussion of it for other apps if you google the message
<chrisseaton[m]1> Happens on 8 and 14
<headius[m]> Are you seeing this in print inlining output?
<chrisseaton[m]1> Yeah I'm asking C2 to tell me why it's inlining and why it isn't.
<chrisseaton[m]1> Cos it seemed funny it wasn't.
<headius[m]> If that's no longer inlining there's something wrong, because that's the sort of case I used for auditing call site optimization
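For reference, the shape of the case being discussed (`foo` calls `bar`, `bar` returns a small integer constant) can be approximated in plain Java with a constant call site. This is only a sketch: the class name `FooBarInline`, the constant `14`, and the use of `ConstantCallSite` are illustrative assumptions, not JRuby's actual generated code or bootstrap logic. The point is that when the handle is reachable from a constant, a JIT is expected to inline through it so that `foo` reduces to loading the constant.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class FooBarInline {
    // Stand-in for the Ruby method `bar`: returns a small integer constant.
    static int bar() {
        return 14;
    }

    // Hypothetical binding: a call site that always targets bar().
    private static final MethodHandle BAR_SITE = bindBar();

    private static MethodHandle bindBar() {
        try {
            MethodHandle target = MethodHandles.lookup().findStatic(
                    FooBarInline.class, "bar", MethodType.methodType(int.class));
            CallSite site = new ConstantCallSite(target);
            return site.dynamicInvoker();
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Stand-in for the Ruby method `foo`: calls bar through the call site.
    static int foo() {
        try {
            return (int) BAR_SITE.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        // Call foo() often enough that the JIT compiles it.
        int sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += foo();
        }
        System.out.println(sum);
    }
}
```

Running this with the diagnostic flags mentioned in the chat would be one way to check whether `bar` is inlined into `foo`, though a plain Java class will not necessarily reproduce the JRuby behaviour.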
<chrisseaton[m]1> ```
<chrisseaton[m]1> % bin/jruby -Xcompile.invokedynamic=true -J-XX:+UnlockDiagnosticVMOptions -J-XX:+PrintInlining "-J-XX:CompileCommand=print,*::*foo*" -Xcompile.dump -Xir.print inline.rb
<chrisseaton[m]1> ```
<headius[m]> It's not possible for those classes to be unloaded because I literally use the class objects when generating the code
<chrisseaton[m]1> Oh and I have a patch to disable AOT compilation of the main file
<headius[m]> But our funky classloaders could certainly complicate it
<chrisseaton[m]1> Well let me know if you can reproduce and if you can't I'll see if there's anything else in my environment
<headius[m]> Yeah will be back at my machine shortly
<chrisseaton[m]1> I use this when experimenting, so that the main file is subject to the same compilation logic as the rest (I know it damages benchmarks but I'm looking at compilation behaviour, not performance.)
<chrisseaton[m]1> Oh and I'm on a slightly old release
<headius[m]> mostly minor changes from 9.2.11 on master
<headius[m]> .9 to .10 to .11 have larger ones
<headius[m]> I think there's a few more specialized indy patterns similar to .new on master
<headius[m]> chrisseaton: precompiling the target script shouldn't be too different, but I can see it would be useful to have a pure JIT mode
<chrisseaton[m]1> I was looking at thresholds and things
<headius[m]> sure, and there will be less invalidation early on with AOT mode, which is part of the reason it's on normally
<headius[m]> we're asking a lot of the JVM already
<headius[m]> oh one thing I wanted to mention about the nil thing
<headius[m]> we only have that field on ThreadContext as a shortcut to avoid doing context.runtime.nil
<headius[m]> on master there's a pretty large change to Ruby.java that makes almost everything final, so on JITs that optimize instance finals it may be less valuable now
<headius[m]> Graal's "truly final" optimization works so well I went through the pain of moving most of Ruby initialization into the constructor
<headius[m]> oh btw I don't think this has changed but PrintInlining is not particularly accurate
<headius[m]> It will show things inlining but not show later invalidation of that inlining, so I got a very skewed view of things in the past
<headius[m]> you want to use LogCompilation
<headius[m]> PrintInlining will show early inlining decisions but not show that they were invalidated in favor of later ones, so you're left to sort out a bunch of disjoint information
<headius[m]> LogCompilation combines what you get from PrintCompilation and PrintInlining with information at each inlining level of why it did or did not inline
<headius[m]> but you need a separate command to parse the XML it vomits out
<headius[m]> I do not get the "unloaded" failure locally but haven't tried your AOT disable
<chrisseaton[m]1> I'll try myself on a clean master tomorrow
<headius[m]> ah I am on master too
<headius[m]> I don't think anything in this script should compile differently though
<headius[m]> patch has no effect
<headius[m]> things are different but it still inlines
<headius[m]> hmm actually PrintInlining may be misleading me here
<headius[m]> I did find it elsewhere, which is confusing because it shouldn't have invalidated
<headius[m]> can't see why that would happen in this output
<chrisseaton[m]1> Look at the machine code
<headius[m]> hmm I think kares added a property to change how we classload jitted methods
<headius[m]> I believe there's a mode that uses only one and never unloads
<headius[m]> I see the hs-compiler-dev thread on it
<chrisseaton[m]1> On what sorry? Classloading or the inlining message I'm seeing?
<headius[m]> hmm
<headius[m]> the inlining message
<headius[m]> something we changed in the generated methods may be tripping this up
<headius[m]> I'm still not sure why it deopts though
<headius[m]> disabling tiered compilation seems to help
<headius[m]> I see an invalidation and "unloaded" message, and then shortly after that a new inlining compilation that does go through bar
<headius[m]> the deopt was likely from tiered
<headius[m]> well, we need to look into this in any case
<headius[m]> nothing should be unloaded here
<headius[m]> I need to get LogCompilation output and see what that says
<chrisseaton[m]1> 'unloaded' means like uncommon trap doesn't it? Similar to if the class hadn't been loaded at all yet.
<headius[m]> I'm not sure
<headius[m]> the jit classes are pretty boring, just a bag of static methods
<headius[m]> that's the output I see with -XX:-TieredCompilation
<headius[m]> the unloaded message is there but it's followed by a new compilation that inlines ok
<headius[m]> something's amiss
<chrisseaton[m]1> Do you actually get the right machine code at the end?
<headius[m]> I'll check
<chrisseaton[m]1> That's the underlying thing I see that I don't think I should see, no matter what inlining tells me.
<headius[m]> this appears to be the final code for "foo" but there's other code from some other entry point I'm not sure about
<headius[m]> (that paste scrolls on my display and there's a retq after the fixnum load)
<chrisseaton[m]1> Looks like it's working for you
<headius[m]> try -TieredCompilation
<headius[m]> I will read more about this unloaded message
<headius[m]> it could be some side effect of improvements in tiered compilation since the last time I dug into this... I see PrintInlining output early on that shows it inlining, and then it goes away
<headius[m]> sadly it's unlikely to be our bug, but that's the way she goes
<chrisseaton[m]1> JRuby doesn't support 0 invokedynamic any more does it? The JVM6* classes still use it.
<headius[m]> not directly but work I did for GraalVM AOT provides a mode that will not use any indy at all
<headius[m]> you can hack it there (use AOT mode for JIT)
<headius[m]> the non-indy impls are not going to be great
<chrisseaton[m]1> Away from keyboard for the day
<headius[m]> no problem, it's in here for reference when you're back at it
<headius[m]> I didn't go deeper on customization of this because AOT is not a critical path right now
<headius[m]> that's only on master too I believe
<headius[m]> I guess there's larger changes in 9.3 than I realized... it has been a long gap from 9.2.11
<headius[m]> Thank you
<headius[m]> Who knows, maybe we've been missing out on a bunch of optimization
<chrisseaton[m]1> It's difficult to test this stuff - GraalJS literally write some tests to look at the resulting compiler graph I believe. We have some primitives in TruffleRuby to check things become constants which we use to write tests that could catch this kind of thing, but it's very hard to do.
<chrisseaton[m]1> The `main.rb` -> `inline.rb` indirection is key for reproducing I think - if you experiment with a single-file benchmark, you may miss it as well.
<headius[m]> yeah periodically staring at assembly and inlining dumps is clearly not a good method
<headius[m]> if we had the resources a performance regression suite and dedicated machine to run it on might help, but c'est la vie
<headius[m]> god I hate C++ sometimes
<headius[m]> hmmm
<headius[m]> I bet I know the problem
<headius[m]> seems to be the classloading
<headius[m]> using a shared classloader (had to patch a separate issue with class naming to make it work)
<headius[m]> this is definitely the final code for foo in this output
<chrisseaton[m]1> What is `foo$2`?
<headius[m]> the number indicates the index of the code body being compiled in a given scope
<headius[m]> for these simple methods it should usually be zero in pure JIT mode
<headius[m]> but it will be 0 for main scope, 1 for bar, 2 for foo in your foo bar example from yesterday
<headius[m]> when compiled at once
<headius[m]> it's just for uniqueness when there's multiple method bodies of the same name in a script
<headius[m]> I didn't bother doing it per name so it's per scope
<headius[m]> oh lord
<headius[m]> I hope it's not the $ in the names
<headius[m]> if it's incorrectly interpreting those as inner class delimiters it could think there's an inner class missing
<headius[m]> it better not be something that dumb
<headius[m]> ok, doesn't seem likely
<headius[m]> disabling tiered does indeed still seem to help your example, so perhaps you could confirm that on your end
<headius[m]> I'm getting a feeling this isn't our bug if going straight to C2 works fine but passing through tiers doesn't
<chrisseaton[m]1> Literally just `-J-XX:-TieredCompilation`?
<headius[m]> yeah
<chrisseaton[m]1> Yeah fixes it for me
<headius[m]> I very much doubt this is something we're doing wrong then
<chrisseaton[m]1> Can you try on an alternate JVM like J9?
<headius[m]> I don't recall the flags for J9 jit logs
<headius[m]> did you try graal jit?
<headius[m]> I'm trying PrintAssembly on graal on Java 13 but it doesn't seem to be decoding the instructions?
<headius[m]> 0x0000000120ab9e94: ; {metadata(method data for {method} {0x00000001370eb398} 'RUBY$method$bar$0' '(Lorg/jruby/runtime/ThreadContext;Lorg/jruby/parser/StaticScope;Lorg/jruby/runtime/builtin/IRubyObject;Lorg/jruby/runtime/Block;Lorg/jruby/RubyModule;Ljava/lang/String;)Lorg/jruby/runtime/builtin/IRubyObject;' in 'Users/headius/projects/jruby/foo_bar_inline1')}
<headius[m]> 0x0000000120ab9e94: 48bf b0b6 | 0e37 0100 | 0000 8b9f | dc00 0000 | 83c3 0889 | 9fdc 0000 | 0081 e3f8 | ff7f 0083
<headius[m]> 0x0000000120ab9eb4: fb00 0f84 | b70d 0000
<chrisseaton[m]1> Missing or incompatible hsdis
<chrisseaton[m]1> Where do you get or build your hsdis from?
<headius[m]> ah
<headius[m]> from source, perhaps it's not installed? I would have expected an error like C2
<chrisseaton[m]1> No at some point it changed to print bytes (and that's normal Java as well)
<headius[m]> huh, ok
<headius[m]> I don't see how this is useful but whatever
<chrisseaton[m]1> Well when I look at assembly what I'm usually really looking at is the info points. You may even be able to see if it's inlined or not in this case without the instruction mnemonic.
<headius[m]> once I visually parse out the noise, I suppose
<chrisseaton[m]1> (I wrote a new disassembler for Truffle languages because of the problem redistributing hsdis.)
<headius[m]> it's a lot of garbage
<headius[m]> only works for truffle languages I suppose?
<chrisseaton[m]1> Yeah, which is useful to me as it means it can print things in a way that makes sense for Truffle, and it can be written all in user-space Java code.
<headius[m]> ah, too bad
<chrisseaton[m]1> Doesn't have to deal with mangled names or anything, can print guest-language code source locations, etc.
<headius[m]> well, our bytecode should also reflect guest-language source locations
<chrisseaton[m]1> I don't get the bug on GraalVM
<chrisseaton[m]1> That's without changing any tiering options, but of course JVMCI does mean the tiering isn't quite the same.
<headius[m]> so we have a workaround
<headius[m]> disable tiered
<headius[m]> might not be worth me digging deeper into this myself since -Tiered works
<headius[m]> I'll post something to hs-compiler-dev
<headius[m]> could you add a comment saying it works for you with -Tiered and on Graal?
<chrisseaton[m]1> Done
<headius[m]> Thanks!
<chrisseaton[m]1> Broken on Java 8 and 14, so it's not anything recent
<headius[m]> guess we'll need to rerun some perf measurements with tiered off
<headius[m]> my fault for trusting that running tiered wouldn't break inlining, I guess
<headius[m]> does GraalVM still use C1 for lower tiers?
<headius[m]> if it tiers at all, I haven't kept up
<chrisseaton[m]1> Yes C1 to gather profiling info in the same way, and normally replaces C2 with Graal. I think you can run Graal on top of C2 as a fifth tier, but not sure how.
<headius[m]> interesting
<headius[m]> so that would seem to indicate this is something specific to the combination of C1 and C2 tiered
<chrisseaton[m]1> I can't really help any more so I'll leave this here. I'll disable tiering in the relevant experiments I'm doing and note why.
<headius[m]> note? What is this for?
<chrisseaton[m]1> I'm writing something on optimising Ruby. I'll share it with you before I make it public.
<headius[m]> I see
<headius[m]> well I do hope you'll share any other failed optimizations you find
<headius[m]> or any obvious opportunities
<headius[m]> I also hope you won't be too harsh when discussing JRuby 😉
<chrisseaton[m]1> Well I'll share it first so you can debate it with me, but it's not a competition
<headius[m]> I think some would disagree with that statement
<headius[m]> I appreciate your openness
<headius[m]> posted to hs-compiler-dev
<headius[m]> I'll file an OpenJDK bug if nobody responds by tomorrow
<headius[m]> chrisseaton: were you running 9.2.11 yesterday?
<chrisseaton[m]1> For everything in the issue I was using master.
<chrisseaton[m]1> For what I'm writing I'm using whatever this old version I told you about is.
<headius[m]> it doesn't appear that you told me
<headius[m]> you just said "a slightly older version"
<chrisseaton[m]1> `458ad3ed9cdb18b3e69fb96b947b978a193afeb6`
<chrisseaton[m]1> I think that's 9.2.9.0
<headius[m]> it is
<headius[m]> too bad... .11 can inline some block yields and I believe .10 has similar optimizations
<chrisseaton[m]1> I'm building with some patches and things, and in a complex virtualised environment so wasn't easy to upgrade right now, but I will do later on
<headius[m]> and there's gobs of minor optimizations and allocation cleanup along the way
<headius[m]> ah ok
<chrisseaton[m]1> I've opened a PR for the first of the patches already
<chrisseaton[m]1> I'm not even benchmarking anything!
<headius[m]> well, that's even better
<headius[m]> I look forward to seeing what you find, and thanks for that PR, it looks neat
<headius[m]> I'm always looking for low-hanging fruit since we don't get a lot of time to work on more advanced optimizations
<chrisseaton[m]1> Seems like most obvious things are stuck without a Nashorn-style deoptimisation exception
<headius[m]> could be, but I feel they were far too aggressive and it hurt their results
<headius[m]> there's a lot we can do with an interpreter tier that they had to do with massive failover during bytecode specialization
<chrisseaton[m]1> Or a JVM API to access frame state
<headius[m]> apparently there's sneaky ways to do that, but they seem a bit too sneaky for my tastes
<chrisseaton[m]1> What if a failed `SwitchPoint` received an object that let you access local variables in the method it was triggered from?
<headius[m]> how would it do that?
<chrisseaton[m]1> Like if the method handle for the failure case got an object `JavaFrame` that had methods to read and write the method that contained the check.
<headius[m]> sure, that would be great if it were possible using public APIs
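For context, this is roughly what the public `SwitchPoint` API offers today. A minimal sketch, with the class name `SwitchPointDemo` and the `fast`/`slow` methods invented for illustration: the fallback handle passed to `guardWithTest` receives only the call's own arguments, so it has no view of the caller's local variables. The `JavaFrame` object proposed above is hypothetical and does not exist in `java.lang.invoke`.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.SwitchPoint;

final class SwitchPointDemo {
    // Optimistic path, used while the assumption holds.
    static String fast() {
        return "fast";
    }

    // Fallback path, used after invalidation. Note it gets no frame state.
    static String slow() {
        return "slow";
    }

    private final SwitchPoint switchPoint = new SwitchPoint();
    private final MethodHandle guarded;

    SwitchPointDemo() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType type = MethodType.methodType(String.class);
            MethodHandle fast = lookup.findStatic(SwitchPointDemo.class, "fast", type);
            MethodHandle slow = lookup.findStatic(SwitchPointDemo.class, "slow", type);
            // Delegates to fast until the switch point is invalidated, then to slow.
            guarded = switchPoint.guardWithTest(fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    String call() {
        try {
            return (String) guarded.invokeExact();
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    void invalidate() {
        SwitchPoint.invalidateAll(new SwitchPoint[] { switchPoint });
    }

    public static void main(String[] args) {
        SwitchPointDemo demo = new SwitchPointDemo();
        System.out.println(demo.call());
        demo.invalidate();
        System.out.println(demo.call());
    }
}
```

Anything the fallback needs from the caller has to be passed explicitly as an argument, which is the gap the frame-access idea is trying to close.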
<headius[m]> once we can reliably stack allocate objects we should be able to use a frame struct and pass it around
<headius[m]> I've also wanted to try eliminating our DynamicScope stack altogether and see if escape analysis can eliminate those objects... in theory that would be as good as having a stack-allocated frame and we could just pass it into call sites that might have to deoptimize
<headius[m]> the stack on ThreadContext isn't really necessary anymore
<chrisseaton[m]1> headius: the possibility of maybe having to deal with a `Bignum` also seems to consistently poison simple data-flow
<subbu> headius[m], chrisseaton[m]1 ya .. sorry .. I've fallen off as I sank deeper and deeper into the wikimedia world.
<headius[m]> chrisseaton in what way? I guess you are not using Graal jit for this exploration?
<chrisseaton[m]1> I often get two code paths - the case for when it returns a fixnum, and the case for everything else (really bignum I think) - they're never used, they're just there.
<chrisseaton[m]1> And it means after they merge the types are generic.
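The pattern described here can be illustrated in plain Java. This is a sketch under assumptions: `FixnumOrBignum` and its `add` method are invented for the example and are not JRuby's `RubyFixnum` code. The fast path yields a boxed `long`, the overflow path yields a `BigInteger`, and once the two paths merge the compiler only knows the result is a `Number`, even if the overflow path never runs.

```java
import java.math.BigInteger;

final class FixnumOrBignum {
    // Returns a Long when the sum fits in 64 bits, a BigInteger otherwise.
    // The static type after the merge is just Number.
    static Number add(long a, long b) {
        try {
            return Math.addExact(a, b); // fixnum-like path
        } catch (ArithmeticException overflow) {
            // bignum-like path, rarely or never taken in simple code
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
    }

    public static void main(String[] args) {
        System.out.println(add(1, 2).getClass().getSimpleName());
        System.out.println(add(Long.MAX_VALUE, 1).getClass().getSimpleName());
    }
}
```

Any consumer of the result has to handle the generic type unless the JIT can prove, or speculate, that only the first path is ever taken.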
<headius[m]> I assume you are seeing this in IGV, yeah?
<chrisseaton[m]1> Or in the machine code
<chrisseaton[m]1> I'll create some concrete issues later on
<chrisseaton[m]1> I think at least one may be easily fixable...
<headius[m]> It would be interesting to see how much better Graal does on the cases you're seeing
<chrisseaton[m]1> Root cause usually seems to go back to something coming in from a field read
<chrisseaton[m]1> But as I say, I'll create issues later rather than talking in the abstract
<headius[m]> Ok
<chrisseaton[m]1> When I have a `RubyFloat` as a constant in generated code, Graal seems able to turn the field read from that constant into a constant by itself, where C2 seems to read the field every time even though it's final. Is that expected?
<headius[m]> C2 does not treat instance finals as constant
<headius[m]> There may be a flag to enable "truly final" optimization but by default I believe only Zing and Graal do it today
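A small Java sketch of the distinction, with `FinalFieldDemo` and `Holder` as made-up stand-ins for a constant `RubyFloat`. The reference `CONSTANT` is `static final`, so a JIT can treat the object identity as a constant. Whether the read of the instance field `value` is also folded to `1.5` depends on the compiler: per the discussion above, Graal does this while C2 by default reloads the field.

```java
final class FinalFieldDemo {
    // Stand-in for an immutable boxed value such as a RubyFloat.
    static final class Holder {
        final double value; // instance final: not constant-folded by C2 by default

        Holder(double value) {
            this.value = value;
        }
    }

    // static final reference: the object itself is a compile-time-known constant to the JIT.
    static final Holder CONSTANT = new Holder(1.5);

    // A JIT that trusts instance finals could compile this to "return 1.5";
    // otherwise it emits a field load on every call.
    static double read() {
        return CONSTANT.value;
    }

    public static void main(String[] args) {
        double sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += read();
        }
        System.out.println(sum);
    }
}
```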
<chrisseaton[m]1> I've seen a presentation on truly final in Zing
<chrisseaton[m]1> Seems like it should be like any other deoptimisation event - not sure why it's more complicated
<headius[m]> I agree and it's not entirely clear to me why C2 has not done this
<headius[m]> I think I mentioned it yesterday that I have made some efforts recently to move more jruby runtime state into final fields because it's clearly a valuable optimization and I assume every jit will eventually be doing it
<headius[m]> You're going to find that many of the optimization challenges in jruby are due to C2
<headius[m]> Well, I guess we could just say due to some compilers being better than others
<headius[m]> For my money, jruby is generally doing the right things since graal for example does a much better job on certain patterns
<headius[m]> Any analysis of optimization in jruby necessarily becomes an analysis of the underlying jvm jit
<headius[m]> I suppose this points out one aspect of our approach to optimization in jruby. In cases where it's been clear to me that the jvm should be doing the optimization, we have not pushed hard to do it at the jruby level. When I am able to talk to C2 engineers, I will point out such cases, and sometimes that produces results
<headius[m]> This is one reason we have never been as aggressive with specialization as nashorn, because in their case it was never quite clear if that level of optimization was really bearing the fruit they wanted once it got down to the jvm jit level
<chrisseaton[m]1> I don't understand why JRuby isn't a critical test case for them - it's now (again) the major user of indy
<headius[m]> I'd say other than perhaps nashorn there's no other jvm language even close to our level of Indy usage
<headius[m]> Some of the C2 engineers do use us as an example case, but I get the feeling, and this is just a suspicion, that we make some ugly truths more evident
<headius[m]> I mean really, C2 escape analysis can't even handle a phi
<headius[m]> Because we use indy, basically every operation in jruby has at least one branch point in it... C2 EA does almost nothing for us
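The "can't handle a phi" remark refers to allocations that meet at a control-flow merge. A hedged Java sketch follows, with `EscapePhiDemo` and `Pair` invented for illustration. In `viaMerge`, two allocations flow into one variable after a branch, which is the shape described above as defeating C2's scalar replacement. In `noMerge`, each allocation stays on its own path. Actual behaviour varies by JDK version and is not something this example verifies.

```java
final class EscapePhiDemo {
    static final class Pair {
        final int a;
        final int b;

        Pair(int a, int b) {
            this.a = a;
            this.b = b;
        }
    }

    // Two allocations merge into `p` at a phi after the branch.
    static int viaMerge(boolean flag, int x) {
        Pair p = flag ? new Pair(x, 1) : new Pair(x, 2);
        return p.a + p.b;
    }

    // Same result, but each allocation is consumed on its own branch.
    static int noMerge(boolean flag, int x) {
        if (flag) {
            Pair p = new Pair(x, 1);
            return p.a + p.b;
        }
        Pair p = new Pair(x, 2);
        return p.a + p.b;
    }

    public static void main(String[] args) {
        System.out.println(viaMerge(true, 3));
        System.out.println(noMerge(false, 3));
    }
}
```

With indy-based dispatch nearly every operation contains a guard with a fallback branch, so the merged shape is the common one.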
<headius[m]> subbu: no worries whatsoever... you gave us a huge leg up on building a better compiler for JRuby
<chrisseaton[m]1> subbu: I've been writing about the design of your IR recently, and how each instruction is like a tiny tree, with the rich operand types.
<headius[m]> chrisseaton: regarding using 9.2.9 vs 9.2.11 or master, it occurred to me that you'll still largely see the strategy even if some specializations are not there
<headius[m]> when indy works right, we can do a lot with a tree of handles... and most of my recent optimization work has focused on making better use of those handles
<chrisseaton[m]1> Storage strategies?
<headius[m]> optimization strategies
<headius[m]> like 9.2.9 does not inline yields or optimize method_missing, but hopefully it's evident from other patterns that we can do those things
<headius[m]> I'm just saying that for your analysis the general story isn't really different from 9.2.9 to master, even if specific cases aren't all there
<chrisseaton[m]1> Got it