<headius[m]>
chrisseaton: finally got my apple device, where did you get the openjdk zero build you're using?
<headius[m]>
enebo: I'm looking at IR for def foo(*a); super(*a); end and seeing it splat twice
<headius[m]>
I'd like to get super calls inlining but this is a bit confusing
<headius[m]>
I think for now I will skip trying to inline super sites that have splats... the signature juggling is a distraction right now
<headius[m]>
we may want to look at specializing IR a bit more for the different super argument forms
nirvdrum has quit [Ping timeout: 256 seconds]
<enebo[m]>
yeah that is interesting
<headius[m]>
class Y; def foo(*a); super(*a); end; end
<enebo[m]>
but if a is changed then a splat should happen twice shouldn't it?
<headius[m]>
I see the rest arg receive and then two splats before the call
<headius[m]>
well, whether it changes or not why are there two?
<enebo[m]>
probably because we do not track how a is used
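The double splat makes more sense with a concrete case: `a` can be rebound between the rest-arg receive and the super call, so `super(*a)` has to splat whatever `a` holds at call time. A minimal illustration (class names made up for the example):

```ruby
class X
  def foo(*args)
    args  # return whatever super actually passed up
  end
end

class Y < X
  def foo(*a)
    a = [:changed]  # rebinding a between the receive and the super call
    super(*a)       # must splat the *current* value of a
  end
end

p Y.new.foo(1, 2, 3)  # => [:changed]
```

If the IR doesn't track whether `a` is rebound or mutated, it has to conservatively re-splat at the super call, which is one plausible source of the second splat.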
<enebo[m]>
but I wonder how MRI does this
<headius[m]>
could be
<headius[m]>
getting super to inline will be easy but for this argument logic
<enebo[m]>
well it could just be conservatively ignored in this form
<enebo[m]>
I would like to understand what should happen but this is only so common and other forms would still benefit
<headius[m]>
yeah I am taking a step back and just looking at non-splatted now, because those are passing args through pretty much like regular calls
<headius[m]>
so the remaining logic change is just in looking up and guarding the site
<enebo[m]>
and IR did eliminate a class of calls where it does just decompose to a regular call already
<enebo[m]>
but as far as improvements go there are plenty of unresolved super
<headius[m]>
yeah I'm also only looking at instance super right now
<headius[m]>
we have all the logic in all the right places but figuring out the right way to turn that inside out and cache/inline is tricksy
<headius[m]>
perhaps I should do this as a prototype in the old style non-indy CallSite first
<headius[m]>
then non-indy and interp will cache
<enebo[m]>
it is helpful to have non-indy version regardless
<enebo[m]>
Or at least I am a fan of the longer term idea of only indy'ing hot stuff once we can crack the profiling nut
<headius[m]>
yeah well the simple profiling option is still pretty easy... add a counter at the site and until it reaches N we just use a non-inlining mono cache
<headius[m]>
so very quick to bind to a very simple dispatcher, but if it's continually hit we flip to indy
<headius[m]>
actually this is almost how it works right now... the end of the PIC is a simple monomorphic cache
<headius[m]>
so basically we bind that first and then if we start to see heavily-hit monomorphic targets we start wrapping that monomorphic cache with some PIC layers
<headius[m]>
mostly it just flips the current logic on its head... current is "build pic until we see N targets and then failover to mono cache"... new logic would be "use mono cache until we see that it's monomorphic and hot, and then build PIC"
<headius[m]>
but we'd always use indy, just with a trivial target at first
<enebo[m]>
I was thinking a bit more coarse but sure
<enebo[m]>
For me the largest problem, determining hotness of, let's say, an entire method, has eluded me so I just mention it
<enebo[m]>
I did make that delta time stuff but it lacks any reasonable calibration
<enebo[m]>
we cannot just pick a value and have it work everywhere but I do not know how to determine that value at runtime
<headius[m]>
yeah I am just eager to find a way to have always-on indy without impacting warmup and startup and memory profile
<enebo[m]>
heh yeah me too
<headius[m]>
the benefit of using indy all the time is that by changing call site target the JVM will deopt/reopt for us rather than us trying to do it
<headius[m]>
otherwise we need to re-emit code
<enebo[m]>
yep just emitting the perfect thing once would obviously be best
<headius[m]>
I mean a lot of what we've discussed could be shoved into indy sites, like "here's my frame data, make it lazily if you need to"
<headius[m]>
rather than trying to profile for frame, we just create it on demand as part of the calls that need it
<headius[m]>
if no call needs it it stays on JVM frame
<headius[m]>
but that's beyond this discussion
<enebo[m]>
yeah
<headius[m]>
we need to get supers inlining and kwargs optimized first I think :-)
<enebo[m]>
I remember java dude (name is escaping me but perhaps I shouldn't name him anyways) at JVMLS getting a little nasty at a Scala talk?
<enebo[m]>
He was like what are the cost metrics or something like that on how much time a feature would take
<enebo[m]>
Like somehow he wanted an accounting like "this feature takes 10 clocks"
<headius[m]>
probably Josh Bloch and his "semantic gap" talk
<enebo[m]>
but this is somewhat an issue with JVM itself
<enebo[m]>
well it was Josh Bloch but not his talk
<enebo[m]>
he was attacking someone who gave a talk
<headius[m]>
ah sure
<headius[m]>
talk might have been David Pollack's talk on "wow look at what scala does under the covers"
<enebo[m]>
but the semantic gap or sorts is the JVM itself
<headius[m]>
which had the opposite effect on most of those watching
<enebo[m]>
That actually was it I think
<enebo[m]>
We do not really have much ability to examine anything but small chunks of code and say, "ah it turns out like that"
<enebo[m]>
but once you put it into something large it might not turn out like that
<enebo[m]>
this is not really special to any runtime or compiler for that matter but it is what makes it difficult to reason with
<enebo[m]>
I think qualitatively early invokedynamic suffered from code explosion
<enebo[m]>
I am not fully sure how true it is but we do still see warmup issues (I think anyways) on large apps
<enebo[m]>
Actually if someone who worked on indy would write a blog on changes that would be very enjoyable
<headius[m]>
well I think this has improved somewhat since they fixed how the metrics for inlining depth and size are calculated across indy/MH calls
<headius[m]>
it's also unclear how much this failed inlining is affecting our stuff... it's certainly slow, but it may actually be producing a lot more code since none of the indy stuff inlines either
<headius[m]>
so if we figure out how to fix that and more indy sites start inlining the warmup and memory effects may reduce
<enebo[m]>
ah yeah I do not know which case that is but I saw your email and is that engineer back from vacation? :)
<headius[m]>
looks like vladimir might have a trick to fix this
<enebo[m]>
as I have said in the past I feel like we need something we can just run occasionally to measure impact of changes on warmup
<enebo[m]>
I think it ultimately should be something close to a Rails app but I have been thinking about that and I am concerned that Rails is too much of a moving target to use it as a long term bench
<enebo[m]>
So something with a lot of disparate callsites which are called in a mix like a traditional website, with a few hot paths and a lot of occasional ones
<enebo[m]>
we could vendor lock some version of Rails and it will work for a couple of years or more if we are lucky
<headius[m]>
mmm yeah
<headius[m]>
rails is just a mess to try to use for investigating this stuff
<headius[m]>
perhaps some benchmark that jeremyevans uses on roda or sequel? We know it will be designed to be as fast as Rubily possible
<headius[m]>
there's a kernel of typical Ruby patterns that we always want fast
<headius[m]>
Rails has so much wacky stuff outside that kernel it's hard to see through
<enebo[m]>
yeah fast as Rubily possible may not be the code we should examine though
<enebo[m]>
It does not hurt that we can execute that well, obviously
<headius[m]>
maybe
<enebo[m]>
and perhaps all code will go that way? I don't know
<enebo[m]>
but I do agree with the pattern notion
subbu is now known as subbu|lunch
<enebo[m]>
I am not sure how much of the Rails codebase actually cares about this or not
<enebo[m]>
We are missing some big wins with kwargs no doubt already in rails
<headius[m]>
at this point I feel like we have enough known unknowns to keep us busy, like super inlining, zero-alloc kwargs, and so on
<enebo[m]>
yeah
<headius[m]>
but I think those need us both because there's IR work to do
<headius[m]>
I am proceeding with a simple call site for super stuff to get a feel for it now
subbu|lunch is now known as subbu
nirvdrum has joined #jruby
mistergibson has joined #jruby
byteit101[m] has joined #jruby
<headius[m]>
enebo: I'm calling it a day... super caching is a little bit trickier because we need to know nothing changed below the superclass, or the lookup might change
<headius[m]>
I think it should be possible to just verify bottom class, like we do for normal sites, but the lookup returns a cache entry from the superclass
<headius[m]>
so might need a new way to cache super methods
<enebo[m]>
I think with stabilization of types being generally common it should be fine
<headius[m]>
yeah
<enebo[m]>
singletons/eigenonsense being the exception perhaps
<headius[m]>
I pushed a PR with some other small optimizations
<headius[m]>
hmmm something small regressed though
<enebo[m]>
ok
ur5us has joined #jruby
mistergibson has quit [Quit: Leaving]
<lopex>
numbers
nirvdrum has quit [Ping timeout: 260 seconds]
_whitelogger has joined #jruby
ur5us has quit [Remote host closed the connection]