Antiarc has quit [Quit: ZNC 1.7.4+deb7 - https://znc.in]
Antiarc has joined #jruby
nirvdrum has quit [Ping timeout: 265 seconds]
victori has quit [Ping timeout: 240 seconds]
victori has joined #jruby
_whitelogger has joined #jruby
rusk has joined #jruby
rusk has quit [Write error: Connection reset by peer]
rusk has joined #jruby
drbobbeaty has joined #jruby
KeyJoo has joined #jruby
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
KeyJoo has quit [Quit: KeyJoo]
drbobbeaty has joined #jruby
nirvdrum has joined #jruby
lucasb has joined #jruby
victori_ has joined #jruby
victori has quit [Ping timeout: 268 seconds]
<headius[m]> enebo: so about this async-profiler thing
xardion has quit [Remote host closed the connection]
<headius[m]> I did find another sibling project called honest-profiler that only profiles Java code, but it has a lot of dependencies like CMake, JavaFX and stuff...I've submitted some PRs to get it more usable but that's not an option right now
xardion has joined #jruby
<headius[m]> I am not sure how maintained it is, either, and the async-profiler folks just did a talk at OC1, so that seems the best path
<headius[m]> but
<headius[m]> I'm not sure how best to link it into JRuby
<enebo[m]> you mean link async-profilter?
<enebo[m]> heh filter
<headius[m]> yeah
<headius[m]> this is what I have at the moment: https://gist.github.com/headius/e453c221f5060218bd2580b3a9457ca6
<headius[m]> oh I should push the gem repo, one sec
<enebo[m]> It is really too bad agentlib cannot be hooked after the JVM has started
<headius[m]> well I think it can but you still have to pass some flags to allow it
<headius[m]> that may be less onerous than passing the actual agent library, but I have not tried to make it work
<headius[m]> so the gem builds fine...where it starts to feel a little icky to me is copying the .so
<headius[m]> it would be nice if we could just query for installed jruby-async-profiler and add that in at startup...that would be ideal
<enebo[m]> I am thinking we probably do not care what happens up to a certain point for this so it might be a nice option and also potentially be simpler than updating both bash and launcher
victori_ has quit [Quit: ZNC 1.7.5 - https://znc.in]
victori has joined #jruby
travis-ci has joined #jruby
travis-ci has left #jruby [#jruby]
<travis-ci> jruby/jruby (master:3c190fd by Charles Oliver Nutter): The build was fixed. https://travis-ci.org/jruby/jruby/builds/607713395 [216 min 18 sec]
<headius[m]> FYI rdubya enebo I deprecated --client and --server in the launcher
<rdubya[m]> cool
<headius[m]> you'd hope this wouldn't break anything but I'm sure someone will have this output now and complain 🙄
<enebo[m]> that would be my only reservation
<enebo[m]> but other than displaying something I am not sure how we could let anyone know
<headius[m]> yeah
<headius[m]> FWIW we have not recommended this for a long time
<headius[m]> I will audit wiki quick to remove it
<headius[m]> enebo: hmm I guess there's another option... make --client set Tier 1 on hotspot
<headius[m]> that's basically what it was before
<headius[m]> I suppose by the same logic --server could set tier 4, but I dunno, I'd rather just get rid of both
<enebo[m]> --server could noop and --client could Tier 1
<enebo[m]> I don't see a lot of maintenance issue with these existing but then again removing (after deprecating) or noop is fine by me
<enebo[m]> noop is least resistance
<headius[m]> --client could just be --dev too
<headius[m]> that's basically what it was for
<headius[m]> then it would pick up all the other goodies
<headius[m]> actually that seems like a great idea now that I think about it
<headius[m]> it would also pick up OpenJ9 --dev stuff
<headius[m]> and anyone using --client out there now would suddenly get startup improvements back
<enebo[m]> yeah seems fine to me
rusk has quit [Remote host closed the connection]
<headius[m]> I'm going to do that
<headius[m]> still unsure what to do with --server though
<headius[m]> should client warn that it's just an alias for --dev, so we can still deprecate it?
<headius[m]> jruby-async-profiler support: https://github.com/jruby/jruby/pull/5957
<headius[m]> I could probably push the gem any time if y'all want to play with it
<enebo[m]> --server as noop possibly to use it later if we realize there are reasonable setting beyond the default?
lucasb has quit [Quit: Connection closed for inactivity]
<headius[m]> I don't remember why we added it originally...I'm wondering if it was just for parity with --client
subbu is now known as subbu|lunch
<headius[m]> holy crap, 400µs per getCurrentContext?
<headius[m]> I just wrote a stupid benchmark and that seems to be the case
<headius[m]> wait no
<headius[m]> whew ok my math was off...4ns per get but unsure how much is optimizing away
<headius[m]> kares: this is a little noisy but new getCurrentContext is maybe 0.1ns slower than 9.2.8
<headius[m]> could be the branching logic versus your recursion
<kares[m]> ufff
<headius[m]> I'm going to fiddle with it a bit
<headius[m]> 4ns for runtime.threadService.threadLocal.softReference.get seems good but it's wasted cycles when we have to do it
<headius[m]> this is also one thread, I need to test with many threads that are evacuating CPU caches
<headius[m]> getCurrentContext is also probably too big to inline at the smallest thresholds
<headius[m]> I was just workign on this experimental to_a(ThreadContext) PR and wondered if we can reduce getCurrentContext overhead
<headius[m]> hard to believe I can measure things like making these methods static
victori has quit [Remote host closed the connection]
victori has joined #jruby
<headius[m]> oh wow, how gross is this idea: make ThreadService extend ThreadLocal
<headius[m]> one less field to traverse
<headius[m]> could even go as far as making Ruby extend ThreadLocal but I think that's a bit too far 😀
<headius[m]> working on the to_a PR I just realized we still have quite a few paths that traverse getCurrentContext
<headius[m]> yeah this cuts off a good 0.5ns
<headius[m]> I can do better
<lopex> dark matter!
<headius[m]> yeah some kind of dark 😀
<headius[m]> 0.6ns
<headius[m]> holy crap, I crossed some threshold...0.75ns
<headius[m]> gotta be getting pretty damn close to pthread_local overhead or whatever JVM uses
<headius[m]> hah, Java 11 is "way slower"... 4.2ns versus 3.2s
<headius[m]> GraalVM CE is faster, 3.0ns
<rdubya[m]> headius: I know you've been talking a lot about the performance of OpenJ9, is there a reason not to use that over OpenJDK?
<headius[m]> it's not as fast yet
<rdubya[m]> ah ok
<headius[m]> they did some recent work to improve invokedynamic performance that helps a TON, but they're still like 20-25% slower than Hotspot C2
<headius[m]> and they're MUCH slower on this, interestingly
<headius[m]> 7.5ns or so
<rdubya[m]> oh wow
<headius[m]> oh that was J9 Java 11...may be some OpenJDK ThreadLocal impl change that's responsible for part of that, since Hotspot Java 11 also degraded
<headius[m]> J9 Java 8 is around 6ns, so nearly 2x slower than C2 Java 8
<enebo[m]> I am ok with massive deprecations on this. I have always wanted TC to be the main path through all methods down. Ruby down to some methods is ok but it is not future proof.
<enebo[m]> I have occasionally moved private methods to using TC but public is not as appealing do to the deprecations
<enebo[m]> Improving getCurrentContext is never a bad thing but we should slowly move all code to just accepting it as a param
<headius[m]> enebo: so you're saying I ought to restore the other signatures throughout my PR
<enebo[m]> unless we are saying pushing it on the stack everywhere somehow will be more expensive over the long haul
<enebo[m]> headius: well if they can be used by native extensions I fear we may get reports
<headius[m]> it didn't occur to me until I pushed it that code compiled to call RubyArray.to_a() will hard fail at compile or runtime even though there's RubyBasicObject.to_a() still
<headius[m]> stupid statically bound virtual dispatch
<enebo[m]> This is one reason I have wanted to make some embedding API for all the typical operations via probably static API methods
<enebo[m]> I guess that will largely be entry points into the type though
<headius[m]> well if you look at what some other languages do, their "runtime" is our "context"
<enebo[m]> once you have an array you will just want to call to_a(xxx) on it
<headius[m]> so the equivalent would be context.newArray() instead of context.runtime.newArray
<headius[m]> the fact that we have two separate things makes the usability a bit worse
<enebo[m]> yeah arguably we started with Ruby when we should have started with TC but neither of us were on the project at that point
<headius[m]> yeah blame it on Chad
<enebo[m]> well I want static org.jruby.embed.Creators.array(context, ...)
<headius[m]> yeah
<enebo[m]> then how we actually do this is insulated. For all the things it maybe is too far in the sense we will have to inline a static method with acall in it but for native exts it feels like the right path
subbu|lunch is now known as subbu
<enebo[m]> We have talked about making the cext lexicon app too which fits into this where rb_new_ary2 will display out array methods
<headius[m]> here's this getCurrentContext patch: https://gist.github.com/c5c6ccf0b0b31c4eaf0318c159e312e5
<enebo[m]> In this case I am not too worried about direct ThreadService consumers here but we could preserve the original public signature
<enebo[m]> I did not contextualize how much this improved that call
<headius[m]> yeah this is binary compatible and nobody will break from this patch
<headius[m]> the big gain was from making ThreadService actually be the ThreadLocal it aggregated
<enebo[m]> ah yeah I missed the no-arg version
<headius[m]> still there and only called in like two places, for exit and setup
<headius[m]> everyone goes through runtime.getCurrentContext, which does the static version in this patch
<enebo[m]> so we will not be boiling the ocean today so a faster getCurrentContext is great. Seems reaonable to me too unless it somehow pins a runtime in some weird unexpected way
<headius[m]> there's one detail
<headius[m]> service.tearDown can't dereference the threadlocal anymore, because it is the threadlocal
<headius[m]> but I can make runtime.tearDown clear the reference to the ThreadService, which should be mostly the same thing
<headius[m]> probably better/cheaper than tearing down the service anyway
<headius[m]> I'll push this as a PR anyway since we have a bunch of GC/leak tests that should run
<headius[m]> these changes don't feel too hacky do they?
<headius[m]> I mean all runtimes should benefit from removing the extra hope, and the virtual => static change should at least not do any harm and probably help other runtimes too
<headius[m]> not like I'm trying to reorder a loop for C2 or something
<enebo[m]> it is not single responsibliity :) but hey it looks like it will work
<enebo[m]> and visibly other than the static call aspect it appears to be invisible to consumers
<enebo[m]> except for shutdown but that is our code
<enebo[m]> even the static part is technically invisible unless you are in an IDE
<headius[m]> nobody (should be) calling ThreadService.tearDown anyway
<headius[m]> yeah
<enebo[m]> FAMOUS LAST WORDS :P
<enebo[m]> I agree though
victori_ has joined #jruby
<headius[m]> I suppose I could gank the assembly and see if there's anything else, but that would be optimizing to the VM
victori has quit [Read error: Connection reset by peer]
<enebo[m]> OMG 397 references. I had this down in the 200s at one point but I guess a very long time ago
<headius[m]> hah
<headius[m]> I have seen a few places where it's called repeatedly in the same method
<enebo[m]> A lot of uses which came into existence are way deep in call stack in areas where we decided we needed to make a method call.
nirvdrum has quit [Ping timeout: 240 seconds]
<headius[m]> yeah, many cases where we weren't doing a dynamic call and that was wrong
<headius[m]> there's the final ASM I get out of Java 8 C2
<headius[m]> oops that's not it
<enebo[m]> hahaha I had 9.0.0.0 checked out...we shall see how many references since then I guess
<enebo[m]> AHHHH YEAH: 512 references
<headius[m]> fixed
<headius[m]> woot, 2^9
<headius[m]> sweet
<headius[m]> I see no calls along the hot path
<headius[m]> so that's probably a big part of improvement...all the way into the threadlocal map inlines
<headius[m]> that's the hot path return
<headius[m]> basically dug all the way into the threadlocal map entry, and if it's not null we unroll
<enebo[m]> it is amusing to see allour convertTo methods does this
<headius[m]> yeah :-(
<enebo[m]> DOES THIS
<headius[m]> hey, log calls and see how many we actually make
<headius[m]> there's gotta be a few hot path getCurrentContext that are massively called
<enebo[m]> Well another 1000 lines of confusing methods never hurt anyone
<enebo[m]> yeah I mean in obvious places like this we should just deprecate and change to context based one...it ends up being a reverse onion though since you know some amount of convertTo callers will not have context either
<enebo[m]> nonetheless we probably need to be a bit more serious about this if we actually want it to go away
<enebo[m]> All numeric methods end up getting a long or an int will end up callint through it since we only know we have an IRubyObject
<enebo[m]> well I guess I partially take it back with fastmathops
<headius[m]> yeah it propagates out
<headius[m]> but ultimately it should trend toward fewer callers
<enebo[m]> and we definitely have some short-circuits to avoid it
<headius[m]> in my to_a PR I did track down callers and make sure they were passing context
<enebo[m]> which is almost disappointing because it feels a bit inconsistent
<headius[m]> that was a much smaller diff though
<enebo[m]> yeah great
<enebo[m]> just keep original versions with deprecations
<enebo[m]> which I hate to say :)
<headius[m]> I will add back the signatures I removed
<enebo[m]> the odds of a to_a() in particular is probably very high...I did not look at that PR
<headius[m]> well when you think about it most consumers call RubyBasicObject.to_a
<headius[m]> but this method isn't on IRubyObject so there's far fewer cases
<enebo[m]> did you see a measurable speed up from doing it? (not that I care in this case)
<headius[m]> I didn't even check
<enebo[m]> I am just happy to consistently pass TC through :)
<headius[m]> it would probably be hard to see
<headius[m]> I'm not going to dig up sub-ns timings for RubyArray.to_a
<enebo[m]> add those nanos up :)
<enebo[m]> This morning I was going to get to the bottom of where and with what unpack('m') is used in Rails...the day is withering before me
<headius[m]> I'm sure between optimized getCurrentContext and unpack('m') we're poised to take over the Ruby world
<headius[m]> meanwhile we're chewing up 1GB/s allocating fixnums
<lopex> for SPEED
<headius[m]> woot
<headius[m]> I'm filing all the issues today
<enebo[m]> so looking at a single model from a single controller/single model app will STRICT decode 4 strings
<enebo[m]> three reasonably small strings and one about 200 chars
<headius[m]> so small and fast sure
<headius[m]> woot, good thread started on openj9 slack about AOT now
nirvdrum has joined #jruby
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]