<headius[m]> works on a regular JDK?
<chrisseaton[m]> It's a Ruby app
<chrisseaton[m]> Oh sorry I see what you mean - ah yes it takes Graal dumps
<headius[m]> GraalVM is much slower on everything I test but it would be good to figure out why
<chrisseaton[m]> I don't know if C2 can still generate the same BGV files or if it's historical?
<headius[m]> BGV?
<chrisseaton[m]> That's the file format IGV accepts
<headius[m]> ah, well I believe it still can
<chrisseaton[m]> I don't see any code for it in their repo, apart from the vendored Graal sources
<headius[m]> it would be strange if they removed it... I have hooked IGV up to Hotspot many times
<headius[m]> looks like it is in debug builds
ur5us_ has joined #jruby
<headius[m]> would be good to try on graal anyway and figure out why everything seems slower
<headius[m]> ok now that is more like it
<headius[m]> so with no arity-splitting of dig...
<headius[m]> normal:
<headius[m]> indy
<headius[m]> I have posted to mlvm-dev... responding to my email there that brought up this problem five years ago
<headius[m]> chrisseaton: FWIW I modified your bench to the "manual loop" form so the eval'ed loop harness is not skewing results so badly
<chrisseaton[m]> What does 'no arity-splitting' mean?
<headius[m]> manually providing unboxed digoverloads
<headius[m]> dig overloads
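For illustration, a rough Ruby sketch of the arity-splitting idea (the real overloads are Java methods inside JRuby; the names below are made up): a varargs entry point branches to fixed-arity helpers so the common one- and two-key cases don't have to go through a boxed argument array.
```ruby
# Rough sketch of arity splitting (illustrative only; the real change lives in
# JRuby's Java sources): the splat entry point immediately branches to
# fixed-arity helpers for the hot small-arity cases.
def my_dig(obj, *keys)
  case keys.length
  when 1 then dig1(obj, keys[0])
  when 2 then dig2(obj, keys[0], keys[1])
  else keys.reduce(obj) { |o, k| o && o[k] }  # general fallback
  end
end

def dig1(obj, k)
  obj && obj[k]
end

def dig2(obj, k1, k2)
  (o = obj && obj[k1]) && o[k2]
end

p my_dig({ a: { b: { c: 1 } } }, :a, :b, :c)  # => 1
```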
<headius[m]> so this is still using arg[] path but much much faster
<headius[m]> there was no perf issue in dig, really
<headius[m]> just bad varargs logic in method handles
<headius[m]> untold perf lost due to this but should be greatly improved across the board now
<chrisseaton[m]> Is this the thing we reported a year or so ago?
<chrisseaton[m]> Ah I see your email
<headius[m]> no that has been fixed in JDK and there was a modest workaround (make sure all types in target method signature get resolved by method's classloader first)
<headius[m]> wish we had some resources to set up perf regression testing
<chrisseaton[m]> Is this an identity thing - does the array have to be copied because it has identity so must exist?
<headius[m]> I don't see any good reason for it
<headius[m]> other than maybe they wanted to reuse logic that is untyped (Object, Object[]) for populating the array
<headius[m]> it all inlines but does not eliminate the middle man
<chrisseaton[m]> Ha, it has Duncan in the email chain!
<headius[m]> yeah and he was using Object[] and didn't see it
<headius[m]> I never dug back into it after that
<headius[m]> if people would just trust me when I say stuff is broken we'd save a lot of time
<chrisseaton[m]> Do you use BIPS with the string version of `report`?
<headius[m]> no, while loop
<headius[m]> or do you mean in general?
<chrisseaton[m]> In general, if you're just whipping up a benchmark?
<headius[m]> depends on what I am measuring
<headius[m]> I don't use BIPS for really small stuff
<headius[m]> for something like this I will use the while loop form
<headius[m]> if it is big I just use it like you did
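Roughly the two harness styles being compared here, with made-up data and iteration counts:
```ruby
# benchmark-ips form vs the "manual loop" form (illustrative numbers only)
require 'benchmark/ips'

DATA = { a: { b: { c: 1 } } }

# benchmark/ips: the generated harness loop and the block can optimize together
Benchmark.ips do |x|
  x.report('dig') { DATA.dig(:a, :b, :c) }
end

# manual loop: the timed while loop is plain, statically visible code
5.times do
  t = Time.now
  i = 0
  while i < 1_000_000
    DATA.dig(:a, :b, :c)
    i += 1
  end
  puts Time.now - t
end
```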
<headius[m]> this BIPS issue is real though... with TR optimizing the loop and the benchmark together you are getting a very different view of perf
<chrisseaton[m]> Yes they should optimise to literally the same machine code
<headius[m]> all that tells you is that doing the exact same thing N times optimizes to less work than N * something
<headius[m]> it doesn't give you a very good picture of the cost of something in isolation
<chrisseaton[m]> Yeah I don't think there's a great solution to that - we've got some blackholes now actually - I think there's even a node for it... but they're also artificial
<chrisseaton[m]> Generally I try to do real IO for input and real IO for output
<headius[m]> there needs to be some way to insert a boundary between the measurement and the work
<chrisseaton[m]> That does things that a blackhole doesn't even do - like flatten strings
<headius[m]> yeah that helps but it still can see the loop working against the same input unless you get new input per loop
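A sketch of that real-IO style, assuming a Unix-ish system: each iteration gets fresh input from outside the process and the result escapes through real output, so the body cannot be treated as loop-invariant or dead code.
```ruby
# Real input in, real output out (illustrative; Unix-only paths)
input = File.open('/dev/urandom', 'rb')
out   = File.open(File::NULL, 'wb')

TABLE = { 0 => { x: 'even' }, 1 => { x: 'odd' } }

100_000.times do
  key = input.read(1).unpack1('C') & 1   # fresh input every iteration
  out.write(TABLE.dig(key, :x))          # result escapes via real IO
end
```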
<headius[m]> this is one reason I was opposed to the BIPS change
<headius[m]> not only does it make it useless for comparing an impl like TR with other impls that can't eliminate the loop, it also gives you a bad view of perf
<headius[m]> the version that did not inline was better
<headius[m]> and for small things like this we won't even compile the loop, you have the added cost of a slowly interpreted loop surrounding the bench body
<chrisseaton[m]> Doesn't bips call the whole method every second?
<chrisseaton[m]> So it doesn't just enter the method and sit there - so you don't need OSR.
<headius[m]> no, the "call_times" method is generated per benchmark and contains the outer loop
<headius[m]> so it is evaluated code called once for every bench body (maybe twice for warmup + bench, but that is still below our JIT threshold)
<chrisseaton[m]> Yes it calls `call_times` with `times` set so it lasts about a second, and then calls it `n` times for your timing value, so `3` or whatever
<chrisseaton[m]> The loop inside `call_times` gets hot, and triggers compilation of `call_times`, which even without OSR, you then re-enter again
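A hedged sketch of that shape; this is not the gem's exact generated source, just the structure being described:
```ruby
# The per-benchmark timing method roughly looks like this: the whole inner
# loop lives in one method, and each measurement interval is a single call,
# so the method itself is entered only a handful of times per run.
def call_times(times)
  i = 0
  while i < times
    yield            # benchmark block body runs here
    i += 1
  end
end

# warmup + timing are just a few big calls:
3.times { call_times(1_000_000) { { a: { b: 1 } }.dig(:a, :b) } }
```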
<headius[m]> so for five seconds you get five calls... but our threshold is 50
<headius[m]> granted we have meant to make that threshold be based more on the work done but that is how it is now
<chrisseaton[m]> I thought the loop back-jumps would add to the threshold but maybe not
<headius[m]> enebo has played with that but not landed anything
<chrisseaton[m]> Maybe BIPS should start with `times = 1` and increase more slowly to a higher number
<headius[m]> well, perhaps warmup should include some heavy calls with low iters to ensure it actually warms up the harness
<chrisseaton[m]> Yes I mean during warmup, start with `times = 1` for the first second
<headius[m]> which may be the same thing you ... yeah
<headius[m]> but still... I don't want the loop to optimize with the bench block
<headius[m]> I don't think any of us really want that because it taints the measurement of the block
<headius[m]> so it looks like this does jit call_times but in the middle of the bench run
<headius[m]> I see the first dozen or so dots (printed when call_times is called) tick up slowly, and then they speed way up
<chrisseaton[m]> I have an approach I called continuous-adaptive-benchmarking... can't find it now - it's sort of designed to be unstable so it doesn't rely on becoming stable
<headius[m]> that is the main bench run with logging niblets added
<chrisseaton[m]> This never picks some `times` and doesn't have separate warmup and measurement - it just runs
<chrisseaton[m]> `times` always varies so it's not a constant
<headius[m]> yeah I think we need to do something about the BIPS numbers because there are a lot of benchmarks out there using it now
<headius[m]> some for really tiny units of work
<headius[m]> and it should probably try to subtract the cost of benchmarking an empty block to normalize across impls that can or cannot inline that
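A sketch of that normalization idea, assuming the report object exposes per-entry ips as recent benchmark-ips versions do:
```ruby
# Measure an empty block with the same harness and subtract its per-iteration
# cost, so implementations that can or cannot inline the harness are compared
# on the work inside the block (illustrative only).
require 'benchmark/ips'

report = Benchmark.ips do |x|
  x.report('empty') { }
  x.report('dig')   { { a: { b: 1 } }.dig(:a, :b) }
end

empty, dig = report.entries
net = (1.0 / dig.ips) - (1.0 / empty.ips)
puts "net seconds per iteration: #{net}"
```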
<chrisseaton[m]> Or some sort of weird debug yield operation that pastes in the code statically
<headius[m]> yeah
<chrisseaton[m]> Tricky not to become something unrepresentative the other way though
<headius[m]> it's also going through proc.call which has more overhead than either yield or a simple method call on JRuby and MRI
<headius[m]> been meaning to get that inlined through but yeah
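A quick way to see that overhead (numbers vary by implementation; the helper method names here are made up):
```ruby
# Dispatching the benchmark body via Proc#call vs a plain yield
require 'benchmark/ips'

def via_yield
  yield
end

def via_call(blk)
  blk.call
end

Benchmark.ips do |x|
  x.report('yield')     { via_yield { 1 + 1 } }
  x.report('proc.call') { via_call(proc { 1 + 1 }) }
  x.compare!
end
```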
<headius[m]> well good news is that the version of MH varargs logic that takes a collector function also fixes the problem, so I don't have to roll my own
justinmcp has quit [Quit: No Ping reply in 180 seconds.]
justinmcp has joined #jruby
<headius[m]> chrisseaton: thanks for bringing this to our attention... I had either forgotten this was a problem or assumed they had fixed it
<headius[m]> should speed up tons of method calls in JRuby
<headius[m]> there will be a separate PR to improve dig itself, but this addresses the issue you found
ur5us_ has quit [Ping timeout: 258 seconds]
<headius[m]> well I think 10x improvement is good enough for today, but I sure wish those boxed call paths optimized better on Graal JIT
<headius[m]> need to double check they are inlining, but with all three PRs in place they should be
<headius[m]> chrisseaton: additional improvements from dig overloading and another indy call tweak, I don't know how much if any of this will land in 9.2.17.0 but it is in the pipeline
valphilnagel has joined #jruby
drbobbeaty has quit [Ping timeout: 250 seconds]
drbobbeaty has joined #jruby
valphilnagel has quit [Quit: Leaving]
snickers has joined #jruby
<chrisseaton[m]> All looks good! I understand what you mean by arity splitting now.
<enebo[m]> chrisseaton: if you look at some generated Scala stuff they will overload some methods out to 30 params to avoid the arg boxing in a primitive array
<enebo[m]> Or they did anyways. I don't look at Scala very much :)
<headius[m]> enebo: some edge cases I have to look into on those PRs but it would be nice to have at least the collect fix in .17
<enebo[m]> collect fix and you fixed that getAndCache already right?
<headius[m]> that is in a PR still as well
<headius[m]> there are four of them 😀
<enebo[m]> ok that one seems like a no-brainer because it is obviously wrong
<headius[m]> that is an easy one yeah
<enebo[m]> dig split I don't personally care for .17 but it looks safe so I am indifferent
<enebo[m]> collectAs looks pretty substantial but as you said there are corners to work out still, but if you can be satisfied it seems like a pretty small change for a nice win
<headius[m]> yeah the whole set combined helps a lot of different configurations
<headius[m]> I will work on making sure everything is super green today and we can decide
<enebo[m]> ok
<headius[m]> enebo: found my issue on the collect PR... I was not adding 1 for a method_missing binding
<headius[m]> added tests and added JI specs with indy because that would have caught it too
<headius[m]> so that is back to 100% again 😀
<headius[m]> there is something in 9.2.16.0 that prevents -X+C -Xjit.threshold=0 from running optcarrot btw
<headius[m]> at least that release, maybe earlier
<headius[m]> visibility checking bug it seems
<enebo[m]> but only visibility on something bytecode compiled
<headius[m]> yeah
<headius[m]> so the compiled call is not properly checking visibility (comes up as private but should be callable)
<enebo[m]> I have been impressed how simple it has been to not have these mismatches since moving to IR
<enebo[m]> Or it feels like we have not had very many
<headius[m]> oh yeah it has been much cleaner that way
<headius[m]> enebo: the hits keep coming
<headius[m]> just ran into this by accident while working on these optimizations
<enebo[m]> hah
<enebo[m]> I suppose we do not need to worry about a java package becoming another object
<headius[m]> interestingly the subpackages gotten from package modules seem fine, as fast as this one is after the fix
<headius[m]> only affected these five that are special
<enebo[m]> makes sense, we probably do not use constants to access them
<headius[m]> I am looking to see if the other ones can be cached more but the cost of accessing them in that loop is almost zero already
<headius[m]> ahh right, the package modules add a method on first access so it is just a get then
<headius[m]> so that is already good
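The chat doesn't spell out which five packages are special, but this is the kind of top-level Java-integration access being discussed (requires JRuby):
```ruby
# Top-level package helpers like `java` and `org` are hit constantly in
# Java-integration code, so caching the package objects they return matters.
require 'java'

puts java.lang.System.currentTimeMillis   # top-level `java` helper, then nested packages
list = java.util.ArrayList.new
list.add('hello')
puts list.size                            # => 1
```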
<headius[m]> enebo: all those PRs are ready to go
<headius[m]> so just a matter of deciding
subbu is now known as subbu|away
<enebo[m]> ok
<enebo[m]> I just merged the getAndCache fix, we already decided on that
<enebo[m]> dig looks ok but I don't see the benefit of the names dig1 and dig2. Feels like the params point that out
<headius[m]> I had to use a different name because the arity splits have the same signature as dig
<headius[m]> I mean as dig1/dig2
<enebo[m]> oh on RubyObject since things are also RubyObjects
<enebo[m]> ah I see
<headius[m]> yeah
<headius[m]> they are static but signature was same so it rejects it
<headius[m]> could make them package private
<headius[m]> they are just unrolled versions of dig
<headius[m]> well sort of unrolled... they recurse but only once or twice
<enebo[m]> So I don't really see much risk on this one so I guess I don't see any harm here. I have been hoping we can reduce the changeset on 9.2 overall
<headius[m]> heh yeah
<headius[m]> it has very little impact on perf compared to the collect PR
<headius[m]> it could be punted to .18
<enebo[m]> Are these static on RubyObject due to history or visibility?
<headius[m]> history... the other dig was
<enebo[m]> Or even 9.3
<headius[m]> CRuby basically just has the one function so we put ours on object as static
<enebo[m]> public elsewhere would remove the naming artifact
<headius[m]> true
<enebo[m]> So I am ok if we land this but it is not part of the brief on 9.2.
<enebo[m]> since I saw chrisseaton was recently over on this tab...do you know of a particular use case where dig is used?
<headius[m]> it looked from your bench like you did some auditing
<chrisseaton[m]> Does the benchmark have the comments on how deep we found it being used? Can't remember.
<enebo[m]> Faster is great but it is more motivating if someone says "oh rails does it all the time"
<headius[m]> at 3 you had a comment "easy to find" and at 7 "possible to find"
<headius[m]> does not say where you found these levels though
<chrisseaton[m]> There wasn't some major obvious problem we were fixing. There are tons of them in Shopify's code-base but I don't think it's some inner-loop operation in Rails.
<chrisseaton[m]> It was also partly based on a conversation about pattern matching I think.
<enebo[m]> ah yeah if TR impl pattern matching in Ruby then dig is probably a reasonable way to search
<enebo[m]> although even for us we could use dig...we tend not to use our public exposed methods as much for that stuff because it usually involves argument parsing and stuff like that.
<enebo[m]> This PR does provide essentially the raw helpers though for us
<chrisseaton[m]> Not about directly using dig necessarily, but about recursion and JIT performance cliffs.
<chrisseaton[m]> There's no massive thing behind this - someone found it was slow, we fixed it, that's about it.
<enebo[m]> ah ok. yeah makes sense to fix stuff when you see it.
<chrisseaton[m]> I Tweeted it to amplify a beginner's contribution.
<enebo[m]> headius: so I guess I am neutral to ambivalent :)
<enebo[m]> chrisseaton: yeah
<headius[m]> on the dig PR?
<enebo[m]> yeah
<enebo[m]> it is not really worth this much so I am fine merging
<enebo[m]> I did not think this had any risk
<headius[m]> ok
<enebo[m]> It really would only have been risky if we both didn't realize we dropped the boxed path
<enebo[m]> or something like that
<headius[m]> ok, so how about collect PR
<enebo[m]> heh
<enebo[m]> I have been looking at both
<headius[m]> I know we wanted a minimal changeset for this one
<headius[m]> but these are nice
<enebo[m]> I am more nervous the more I read but I am also not sure how much more we can do to validate this
<enebo[m]> For something like RG upgrade I was hoping someone would notice an environmental sort of problem
<enebo[m]> It is complicated enough from an env perspective where it made me want some more time in the hope someones env happened to notice it
<enebo[m]> for indy though? Or changing invokers
<headius[m]> kalenp: I don't know if y'all can test out a prerelease easily yet but it would be nice for .17
<enebo[m]> I mean if our test suite does not catch this I feel it is less likely someone will just happen to hit it
<enebo[m]> especially if compile.invokedynamic is enabled
<headius[m]> yeah and workaround is disable it again
<enebo[m]> a large rails app is where I would expect something weird to fall out and I just don't think people roll with that flag in this case
<enebo[m]> like hitting bugs in either of these would be an esoteric enough use of Ruby that nothing we test would hit it
<enebo[m]> Seems like massive codebases are our only chance at that
<enebo[m]> (which is painting stuff in a not completely logical way, but I think it is generally true from a "will someone catch this problem on a dev branch in a week or two" perspective)
<enebo[m]> and also consider I doubt anyone runs any test suites with indy enabled
<enebo[m]> unless it is for perf regressions and they roll with it on in production perhaps?
<headius[m]> I did add spec:ji to indy for one of these PRs because it would have caught a regression
<headius[m]> we have most suites running with and without now
<enebo[m]> yeah I saw that. So that in itself may end up saving us in the future (as well as what you fixed in the PR)
<enebo[m]> So I am somewhat leery that we will add some risk but I don't know how to feel better about the risk
<enebo[m]> Waiting 2 months on jruby-9.2 won't do that either
<enebo[m]> So it is never or now in my mind
<enebo[m]> now == 9.3 no problems :)
<headius[m]> yeah that is true too
<enebo[m]> carrot and stick applies here maybe
<enebo[m]> If 9.3 has it then it is more motivation to move to 9.3
<enebo[m]> then the "never" would be: we want to do whatever we can to harden 9.2, so we decided to put that risk into 9.3
<headius[m]> heh well that is another angle
<headius[m]> we just won't be able to post about it until 9.3 then 😀
<enebo[m]> It is not some artificial carrot in my mind either since I do have some concerns something is broken and we will have no idea
<enebo[m]> It is good marketing whenever it is posted
<enebo[m]> and maybe better for 9.3 than the end of 9.2
<enebo[m]> I am willing to introduce the risk knowing it is adding it but I also think if we wait we will put it into something that people expect some risk with
<headius[m]> I guess I don't care where they land
<headius[m]> if collect doesn't land the rest are not really meaningful
<headius[m]> maybe the block thing
<enebo[m]> ok let's just put them on 9.3 for now
<enebo[m]> I would like 9.2 to be done and only address reported issues
<headius[m]> I will rebase and move them to 9.3
<headius[m]> the java package one is pretty simple but not a reported issue
<enebo[m]> I merged it
<enebo[m]> hahah I guess I didn't
<enebo[m]> The only risk I could possibly see here is if we have something else which is constantly putting new shit onto the package instance and by wiping it out we are wallpapering over some leak
<enebo[m]> but that leak only happens for the top-level method names
<headius[m]> it only does it for these four packages
<enebo[m]> seems extremely unlikely to me
<headius[m]> five
<headius[m]> the others are cached as methods by the package module logic
<enebo[m]> as you pointed out every other package is basically getting saved
<enebo[m]> So the other issue is someone is adding to package instance for whatever reason expecting it to go away
<enebo[m]> In that case I feel we are just fixing a bug rather than breaking a feature
<headius[m]> yeah
<enebo[m]> no one reported this but I don't see much risk and you opened it :)
<headius[m]> these might be cached somewhere else but it goes through a complicated process to get it
<headius[m]> so this skips the process
<enebo[m]> I did wonder
<enebo[m]> I mean I guess it is weird that the others work
<enebo[m]> but they are all rooted off of either Java or an existing package
<enebo[m]> so I assume that magic is in those two places
<enebo[m]> and these methods probably just do it a bit different and are lacking going through some code path
<headius[m]> yeah I did not investigate that but started to refactor it
<headius[m]> it could be cleaned up a lot on 9.3
<enebo[m]> I think it is at the end it does bindJavaPackageOrClassMethod
<headius[m]> yeah
<enebo[m]> singleton.addMethod(name.intern(), new JavaAccessor(singleton, packageOrClass, parentPackage, name));
<enebo[m]> almost feels like if we want to unconditionally bind, Ruby init could just bind this to Kernel and then we wouldn't need that code at all
<headius[m]> ahh I think these are like this so if you do not use java we do not initialize any of these packages at boot
<headius[m]> otherwise they could just get the package and bind it like that right away
<enebo[m]> yeah
<enebo[m]> I figured this is an attempt at not loading JI stuff but let's face it we do anyways
<headius[m]> well let's merge that one since there's a lot of accesses of those top level packages
<enebo[m]> yeah that one will be huge
<enebo[m]> for JI users it is sooo common
<enebo[m]> As for caching I cannot think of anything other than the above and neither of those are things we should worry about
<enebo[m]> My only comment is we maybe can share some different code later and not manually cache there
<headius[m]> yeah that is what I started but it was too big for 9.2
<enebo[m]> ah ok
<headius[m]> this is a simple fix for now
<headius[m]> merging!
<enebo[m]> cool. so we should be good for next week
<headius[m]> yeah nothing else new reported
<enebo[m]> I wish I had a cold beer waiting
<headius[m]> hah
<headius[m]> I do but they are all crowlers or imperial stouts
<enebo[m]> TGIF
<headius[m]> YOLO
<enebo[m]> TGIIS
<enebo[m]> "It's Imperial Stouts" is the end of that one
<headius[m]> I was wondering
<headius[m]> I should have chilled that Darkness 2014 I apparently have never checked in
<headius[m]> the crowlers are pilseners and a sour
<enebo[m]> 50 minutes in the freezer...good to go
<enebo[m]> I have a 2014. We can do a syrup chug
<headius[m]> don't forget to... nevermind
<enebo[m]> lol
<headius[m]> I was wrong, 2015
<headius[m]> the Bat
<enebo[m]> oh well I have that too
<enebo[m]> Wait I should check but I think I might have two of those
<headius[m]> I do which is why I'm cool just hitting it
<headius[m]> ok rebasing is going fine, and then I will just merge to master
<enebo[m]> barrel or no?
<headius[m]> regular
<headius[m]> I have barrel of a few other years but not that one
<enebo[m]> yeah I guess I don't have any of that year
<headius[m]> come on over, you can wear a mask and drink it through a straw
<headius[m]> collector is rebased and green, merging
<headius[m]> I will merge varargs tweak when green and then 9.2 and everything will be all set
drbobbeaty has quit [Read error: Connection reset by peer]
drbobbeaty has joined #jruby
<headius[m]> enebo: everything is merged everywhere
<headius[m]> time to watch Majin Buu wake up
drbobbeaty has quit [Read error: No route to host]
drbobbeaty has joined #jruby