ur5us has quit [Ping timeout: 244 seconds]
ur5us has joined #jruby
ur5us has quit [Ping timeout: 244 seconds]
quadz_ has quit [*.net *.split]
quadz_ has joined #jruby
ur5us has joined #jruby
ur5us has quit [Ping timeout: 244 seconds]
nirvdrum has joined #jruby
<headius[m]> chrisseaton: finally got my apple device, where did you get the openjdk zero build you're using?
<headius[m]> enebo: I'm looking at IR for def foo(*a); super(*a); end and seeing it splat twice
<headius[m]> I'd like to get super calls inlining but this is a bit confusing
<headius[m]> I think for now I will skip trying to inline super sites that have splats... the signature juggling is a distraction right now
<headius[m]> we may want to look at specializing IR a bit more for the different super argument forms
nirvdrum has quit [Ping timeout: 256 seconds]
<enebo[m]> yeah that is interesting
<headius[m]> class Y; def foo(*a); super(*a); end; end
<enebo[m]> but if a is changed then a splat should happen twice shouldn't it?
<headius[m]> I see the rest arg receive and then two splats before the call
<headius[m]> well, whether it changes or not why are there two?
<enebo[m]> probably because we do not track how a is used
<enebo[m]> but I wonder how MRI does this
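A minimal Ruby example of the semantics being discussed: because the rest arg `a` can be mutated before `super(*a)`, the splat at the super call must re-read the array rather than reuse the originally received arguments (class and method names here are illustrative, not from the chat):

```ruby
class Parent
  def foo(*args)
    args
  end
end

class Child < Parent
  def foo(*a)
    a << :extra   # mutate the rest arg before forwarding it
    super(*a)     # this splat must see the mutated array
  end
end

Child.new.foo(1, 2)  # => [1, 2, :extra]
```

This is presumably why the IR conservatively splats again at the call site when it does not track how `a` is used in between.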
<headius[m]> could be
<headius[m]> getting super to inline will be easy but for this argument logic
<enebo[m]> well it could just be conservatively ignored in this form
<enebo[m]> I would like to understand what should happen but this is only so common and other forms would still benefit
<headius[m]> yeah I am taking a step back and just looking at non-splatted now, because those are passing args through pretty much like regular calls
<headius[m]> so the remaining logic change is just in looking up and guarding the site
<enebo[m]> and IR did eliminate a class of calls where it does just decompose to a regular call already
<enebo[m]> but as far as improvements go there are plenty of unresolved super
<headius[m]> yeah I'm also only looking at instance super right now
<headius[m]> we have all the logic in all the right places but figuring out the right way to turn that inside out and cache/inline is tricksy
<headius[m]> perhaps I should do this as a prototype in the old style non-indy CallSite first
<headius[m]> then non-indy and interp will cache
<enebo[m]> it is helpful to have non-indy version regardless
<enebo[m]> Or at least I am a fan of the longer term idea of only indy'ing hot stuff once we can crack the profiling nut
<headius[m]> yeah well the simple profiling option is still pretty easy... add a counter at the site and until it reaches N we just use a non-inlining mono cache
<headius[m]> so very quick to bind to a very simple dispatcher, but if it's continually hit we flip to indy
<headius[m]> actually this is almost how it works right now... the end of the PIC is a simple monomorphic cache
<headius[m]> so basically we bind that first and then if we start to see heavily-hit monomorphic targets we start wrapping that monomorphic cache with some PIC layers
<headius[m]> mostly it just flips the current logic on its head... current is "build pic until we see N targets and then failover to mono cache"... new logic would be "use mono cache until we see that it's monomorphic and hot, and then build PIC"
<headius[m]> but we'd always use indy, just with a trivial target at first
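A sketch of the profiling idea described above, in plain Ruby for illustration: a call site starts as a cheap monomorphic cache with a hit counter, and only "promotes" itself (which in JRuby would mean rebinding the indy site to a full PIC) once the same target stays hot past a threshold. The class, threshold value, and `promote!` hook are all assumptions for the sketch, not JRuby internals:

```ruby
# Assumed threshold; as noted in the discussion, a real N would need calibration.
PROMOTE_THRESHOLD = 100

class ProfilingCallSite
  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @hits = 0
    @promoted = false
  end

  def call(receiver, *args)
    klass = receiver.class
    if klass == @cached_class
      @hits += 1
      promote! if !@promoted && @hits >= PROMOTE_THRESHOLD
      return @cached_method.bind_call(receiver, *args)
    end
    # monomorphic miss: re-cache for the new class and reset the counter
    @cached_class = klass
    @cached_method = klass.instance_method(@name)
    @hits = 1
    @cached_method.bind_call(receiver, *args)
  end

  def promoted?
    @promoted
  end

  private

  def promote!
    # a real runtime would rebind the call site to an inlining PIC here
    @promoted = true
  end
end
```

Usage: `site = ProfilingCallSite.new(:to_s); 150.times { site.call(42) }` leaves `site.promoted?` true, since the site stayed monomorphic and hot past the threshold.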
<enebo[m]> I was thinking a bit more coarse but sure
<enebo[m]> For me the largest problem, determining hotness of let's say an entire method, has eluded me so I just mention it
<enebo[m]> I did make that delta time stuff but it lacks any reasonable calibration
<enebo[m]> we cannot just pick a value and have it work everywhere but I do not know how to determine that value at runtime
<headius[m]> yeah I am just eager to find a way to have always-on indy without impacting warmup and startup and memory profile
<enebo[m]> heh yeah me too
<headius[m]> the benefit of using indy all the time is that by changing call site target the JVM will deopt/reopt for us rather than us trying to do it
<headius[m]> otherwise we need to re-emit code
<enebo[m]> yep just emitting the perfect thing once would obviously be best
<headius[m]> I mean a lot of what we've discussed could be shoved into indy sites, like "here's my frame data, make it lazily if you need to"
<headius[m]> rather than trying to profile for frame, we just create it on demand as part of the calls that need it
<headius[m]> if no call needs it it stays on JVM frame
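A toy sketch of the "make the frame lazily if you need to" idea: the site hands callees a thunk that only materializes frame data on first access, so calls that never touch it pay nothing beyond the wrapper. Names and structure here are hypothetical, purely to illustrate the shape of the design:

```ruby
class LazyFrame
  def initialize(&builder)
    @builder = builder  # deferred frame construction
    @frame = nil
  end

  def frame
    # only allocated if some callee actually asks for frame data
    @frame ||= @builder.call
  end

  def materialized?
    !@frame.nil?
  end
end
```

A frame-sensitive call (e.g. one needing backtrace or visibility state) would call `frame`; everything else leaves the state on the JVM frame, as described above.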
<headius[m]> but that's beyond this discussion
<enebo[m]> yeah
<headius[m]> we need to get supers inlining and kwargs optimized first I think :-)
<enebo[m]> I remember java dude (name is escaping me but perhaps I shouldn't name him anyways) at JVMLS getting a little nasty at a Scala talk?
<enebo[m]> He was like what are the cost metrics or something like that on how much time a feature would take
<enebo[m]> Like somehow he wanted an accounting like "this feature takes 10 clocks"
<headius[m]> probably Josh Bloch and his "semantic gap" talk
<enebo[m]> but this is somewhat an issue with JVM itself
<enebo[m]> well it was Josh Bloch but not his talk
<enebo[m]> he was attacking someone who gave a talk
<headius[m]> ah sure
<headius[m]> talk might have been David Pollack's talk on "wow look at what scala does under the covers"
<enebo[m]> but the semantic gap of sorts is the JVM itself
<headius[m]> which had the opposite effect on most of those watching
<enebo[m]> That actually was it I think
<enebo[m]> We do not really have a lot of ability to examine all but small chunks of code and say, "ah it turns out like that"
<enebo[m]> but once you put it into something large it might not turn out like that
<enebo[m]> this is not really special to any runtime or compiler for that matter but it is what makes it difficult to reason about
<enebo[m]> I think qualitatively early invokedynamic suffered from code explosion
<enebo[m]> I am not fully sure how true it is but we do still see warmup issues (I think anyways) on large apps
<enebo[m]> Actually if someone who worked on indy would write a blog on changes that would be very enjoyable
<headius[m]> well I think this has improved somewhat since they fixed how the metrics for inlining depth and size are calculated across indy/MH calls
<headius[m]> it's also unclear how much this failed inlining is affecting our stuff... it's certainly slow, but it may actually be producing a lot more code since none of the indy stuff inlines either
<headius[m]> so if we figure out how to fix that and more indy sites start inlining the warmup and memory effects may reduce
<enebo[m]> ah yeah I do not know which case that is but I saw your email and is that engineer back from vacation? :)
<headius[m]> looks like vladimir might have a trick to fix this
<enebo[m]> as I have said in the past I feel like we need something we can just run occasionally to measure impact of changes on warmup
<enebo[m]> I think it ultimately should be something close to a Rails app but I have been thinking about that and I am concerned that Rails is too much of a moving target to use it as a long term bench
<enebo[m]> So something with a lot of disparate callsites which are called in a mix like a traditional website, with a few hot paths and a lot of occasional ones
<enebo[m]> we could vendor lock some version of Rails and it will work for a couple of years or more if we are lucky
<headius[m]> mmm yeah
<headius[m]> rails is just a mess to try to use for investigating this stuff
<headius[m]> perhaps some benchmark that jeremyevans uses on roda or sequel? We know it will be designed to be as fast as Rubily possible
<headius[m]> there's a kernel of typical Ruby patterns that we always want fast
<headius[m]> Rails has so much wacky stuff outside that kernel it's hard to see through
<enebo[m]> yeah fast as Rubily possible may not be the code we should examine though
<enebo[m]> It does not hurt that we can execute that well obviously
<headius[m]> maybe
<enebo[m]> and perhaps all code will go that way? I don't know
<enebo[m]> but I do agree with the pattern notion
subbu is now known as subbu|lunch
<enebo[m]> I am not sure how much of the Rails codebase actually cares or not
<enebo[m]> We are missing some big wins with kwargs no doubt already in rails
<headius[m]> at this point I feel like we have enough known unknowns to keep us busy, like super inlining, zero-alloc kwargs, and so on
<enebo[m]> yeah
<headius[m]> but I think those need us both because there's IR work to do
<headius[m]> I am proceeding with a simple call site for super stuff to get a feel for it now
subbu|lunch is now known as subbu
nirvdrum has joined #jruby
mistergibson has joined #jruby
byteit101[m] has joined #jruby
<headius[m]> enebo: I'm calling it a day... super caching is a little bit trickier because we need to know nothing changed below the superclass, or the lookup might change
<headius[m]> I think it should be possible to just verify bottom class, like we do for normal sites, but the lookup returns a cache entry from the superclass
<headius[m]> so might need a new way to cache super methods
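A hedged Ruby sketch of the super-caching problem described above: the cache stores the method resolved from above the defining class, but since anything defined below the superclass could change the lookup, the cached entry is guarded by a generation counter that a real runtime would bump on method definition or module inclusion. The `SuperSite` class and its invalidation scheme are assumptions for illustration:

```ruby
class SuperSite
  @@generation = 0  # global serial; bumped whenever the hierarchy may have changed

  def self.invalidate!
    @@generation += 1
  end

  def initialize(name, defining_class)
    @name = name
    @defining_class = defining_class
    @cached = nil
    @cached_generation = -1
  end

  def call(receiver, *args)
    if @cached.nil? || @cached_generation != @@generation
      # resolve starting above the defining class, like a real super dispatch
      @cached = @defining_class.superclass.instance_method(@name)
      @cached_generation = @@generation
    end
    @cached.bind_call(receiver, *args)
  end
end
```

Note that a cached `UnboundMethod` keeps pointing at the old definition even after the superclass method is redefined, which is exactly why the guard is needed: without invalidation the site would keep dispatching to a stale target.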
<enebo[m]> I think with stabilization of types being generally common it should be fine
<headius[m]> yeah
<enebo[m]> singletons/eigenonsense being the exception perhaps
<headius[m]> I pushed a PR with some other small optimizations
<headius[m]> hmmm something small regressed though
<enebo[m]> ok
ur5us has joined #jruby
mistergibson has quit [Quit: Leaving]
<lopex> numbers
nirvdrum has quit [Ping timeout: 260 seconds]
_whitelogger has joined #jruby
ur5us has quit [Remote host closed the connection]
ur5us has joined #jruby
neoice has joined #jruby