jrafanie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
_whitelogger has joined #jruby
rusk has joined #jruby
_whitelogger has joined #jruby
shellac has joined #jruby
drbobbeaty has joined #jruby
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
drbobbeaty has joined #jruby
<kares[m]> considering whether the JIT max compilation limit should be on by default
<kares[m]> it's now 4096 but the limit wasn't used
<headius[m]> Maybe
<headius[m]> It would help if we had some idea of what's really getting hot in a large app
<kares[m]> from real-world numbers: 10,000 compilations cost ~570M allocated metaspace (345M used)
<headius[m]> You would assume the stuff we want to compile is almost all at the beginning
<headius[m]> Nice
<kares[m]> so maybe based on that 8096? or let's go higher since memory is cheap? :)
<kares[m]> (that should fit into a 512M metaspace)
<kares[m]> oh right I should share some more since those numbers are not using the defaults
<kares[m]> but reduced max-size from 2000 -> 1000 and increased threshold 50 -> 100
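For context, the settings being compared are JRuby's standard JIT properties; assuming the usual -X spelling, the tweaked run (max-size 1000, threshold 100) would be launched roughly like this:

```
# defaults mentioned above: jit.threshold=50, jit.maxsize=2000
jruby -Xjit.threshold=100 -Xjit.maxsize=1000 app.rb

# the same knobs can also be passed as JVM system properties
jruby -J-Djruby.jit.threshold=100 -J-Djruby.jit.maxsize=1000 app.rb
```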
<kares[m]> using JIT defaults metaspace is at 618M/370M with 11,500 compilations
<headius[m]> Wow that's a lot
<headius[m]> Do we have a count of bytecode size?
<kares[m]> total? well not sure if I am able to distinguish from normal classes ...
<headius[m]> I can't remember if I added that metric to JMX but you can check there
<headius[m]> JVMs have code cache limits, we might as well too
<headius[m]> Plus some soft LRU maybe
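A soft cap with LRU eviction could be as simple as an access-ordered map; a minimal sketch, with a hypothetical class name rather than JRuby's actual cache:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a soft code-cache cap: once maxEntries compiled
// methods are cached, the least-recently-used one is dropped and falls back
// to the interpreter. Illustrative only, not JRuby's implementation.
class JittedCodeCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    JittedCodeCache(int maxEntries) {
        super(16, 0.75f, true); // access order, so get() refreshes recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the LRU entry beyond the cap
    }
}
```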
<kares[m]> oh yeah, I recall seeing some numbers on JMX
<kares[m]> oh okay, so the above numbers only contained methods (did not include blocks)
<kares[m]> here's some new totals from the compiler mbean:
<headius[m]> So metaspace is like 5x bytecode size
<headius[m]> 164, wot
<headius[m]> That's probably impractical
<kares[m]> IRLargestSize 'only' 1263 Avg: 37
<kares[m]> so reducing max.size might make sense
<headius[m]> Double check my logic that those numbers are bytecode and not IR size
<kares[m]> it's 2000 now
<headius[m]> Oh right
<headius[m]> Ok
<kares[m]> yeah bytecode
<headius[m]> Yeah 1000 max seems better
<kares[m]> we have that on the other machine let's see
<headius[m]> We need to do another pass over the jit to make sure it's as efficient as possible
<headius[m]> There may be more things we can push to utility methods
<kares[m]> machine running with max.size = 1000 generates 10% less code (but it also has threshold = 100)
<kares[m]> yeah largest still seems big - so maybe another pass makes sense, or reinstating a bytecode max check
<kares[m]> yet, avg is much lower so not sure - we need a median 😺
<headius[m]> Haha
<headius[m]> More metrics!
<kares[m]> obviously threshold 100 does not seem to have a negative effect either ... but then this might be a supernova app compared to others ....
<headius[m]> Yeah, having a hard static threshold means it will always grow to the same size eventually
<headius[m]> It just grows slower
shellac has quit [Quit: Computer has gone to sleep.]
<kares[m]> exactly
shellac has joined #jruby
<kares[m]> somewhat pleased no 'enormous' Rails/gem methods hit the JIT
<kares[m]> .... the 2000 instruction limit pretty much filtered nothing
<headius[m]> Yeah those biggest ones are outliers
<headius[m]> Generated parser code etc
<kares[m]> Ruby libraries must be getting better - because there used to be some crazy stuff back in the day 😇
<headius[m]> Giant case/when (which we need to jit more efficiently anyway)
<kares[m]> that's what I had in mind - giant generated case .rb methods but I am not sure what gem that was
<headius[m]> Taking off for Bangkok, bbiab
lucasb has joined #jruby
shellac has quit [Quit: Computer has gone to sleep.]
shellac has joined #jruby
<rdubya[m]> my tests don't seem to want to run when I'm trying to test core, this is what I'm running and the output and then it just sits
<rdubya[m]> been going for over half an hour now
<enebo[m]> rdubya: that sounds like too long unless you are on a raspi
<rdubya[m]> lol
<rdubya[m]> i just tried running it with --verbose on the end and it isn't giving me any more info, is there any other way I can see what it might be hung up on?
<rdubya[m]> it takes a minute or two before the warnings show up too
<rdubya[m]> so i'm not sure if it ever even gets into rspec
<headius[m]> jstack
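That is, take a thread dump of the hung JVM with the standard JDK tools, roughly:

```
jps -l           # find the JRuby process id
jstack <pid>     # dump a stack trace for every JVM thread
```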
<rdubya[m]> let me throw the results of that into a gist, it looks like everything is hung up waiting on a lock
<rdubya[m]> are these typical? `Fiber thread for block at: uri:classloader:/jruby/kernel/enumerator.rb:62" #27 daemon prio=5 os_prio=31 tid=0x00007fde2018f000 nid=0x9c03 waiting on condition [0x00007000079f0000]`
<headius[m]> Those might be back in the pool?
<headius[m]> I'm on mobile
<rdubya[m]> looks like its hung up on spec/ruby/library/socket/udpsocket/send_spec.rb:7
<headius[m]> Stupid fibers
<rdubya[m]> yeah there are a bunch of threads that are blocked that are "Fiber thread for block at:" with a bunch of different specs attached to them
<rdubya[m]> i'll try switching to a previous commit instead of running against master
<headius[m]> I'm not sure if I'm clearing that thread name properly when they go back in the pool. You could try to fix that so we can see the ones actually in code
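The fix headius is suggesting would look roughly like the following; this is a hypothetical sketch, with names like fiberLocation and fiberBody standing in for JRuby's actual fiber-pool code:

```java
// Hypothetical sketch: restore a pooled worker thread's generic name once the
// fiber body finishes, so a jstack dump only shows "Fiber thread for block
// at: ..." for fibers actually running code, not threads idle in the pool.
final class PooledFiberTask implements Runnable {
    private final String fiberLocation;  // e.g. "enumerator.rb:62" (illustrative)
    private final Runnable fiberBody;

    PooledFiberTask(String fiberLocation, Runnable fiberBody) {
        this.fiberLocation = fiberLocation;
        this.fiberBody = fiberBody;
    }

    @Override
    public void run() {
        Thread worker = Thread.currentThread();
        String pooledName = worker.getName();
        worker.setName("Fiber thread for block at: " + fiberLocation);
        try {
            fiberBody.run();
        } finally {
            worker.setName(pooledName); // clear the fiber name on return to the pool
        }
    }
}
```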
<rdubya[m]> I manage to get it to finish by commenting out the UDPSocket#send specs
<headius[m]> Oh network oddity on your end maybe?
xardion has quit [Remote host closed the connection]
shellac has quit [Ping timeout: 250 seconds]
xardion has joined #jruby
<rdubya[m]> sorry got sidetracked, I reverted back to the 9.2.8.0 tag and the problem doesn't happen, but there are a bunch of other failures
<rdubya[m]> things like:
<rdubya[m]> and
<rdubya[m]> the good news is, making RubySymbol return a cached frozen string doesn't break any of the fast tests
<rdubya[m]> is 9.2.8 supposed to be compatible with mri 2.4 or 2.5?
<rdubya[m]> lol, on the other hand, our app's specs won't even start to run with frozen strings ☹️
<rdubya[m]> which is basically `:name.to_s.chomp!('=')`
<rdubya[m]> so I'll create the PR but making symbols return frozen strings won't work for rails apps without some work
<enebo[m]> rdubya: 2.5.3
<rdubya[m]> cool, for some reason I thought 2.5 support was being skipped
<enebo[m]> rdubya: 2.6 is being skipped
<rdubya[m]> ah ok
<enebo[m]> ok I found an interesting idea for our counters looking at a paper on self recompilation
<enebo[m]> Their implementation seems to be a global optimizer in that every n seconds they seem to evaluate which methods to compile (although I am guessing this based on the text)
<enebo[m]> but what they do is divide the call counters by a factor to give them a half life
<enebo[m]> So every period of time t it will divide the individual callsite counter by some value (like 1.2) and see if it passes the counter threshold
<enebo[m]> our method implementations do this callsite checking per call so the mechanism would have to be adapted.
<enebo[m]> self == the language
<enebo[m]> rdubya: so time as a parameter is a function of this design
<rdubya[m]> sounds like it would be worth exploring
<enebo[m]> yeah I am just trying to think about how we would adapt this idea
<enebo[m]> If we stored a nanotime when we create the method (or at first call), then looked again once we hit the count, how would we age that count?
<enebo[m]> subtract the two values and then use a dividing constant and kick it back out since we would only be at the threshold at that point (then resetting the time counter)
<enebo[m]> The second time it hits the threshold if it was hot then presumably that factor would be less
<enebo[m]> Just talking through this I feel like the counter for checking the JIT would be one value and the actual value for performing it would be a second value which was less than the first one
<enebo[m]> subtract the two values == time(start) & time(threshold1)
<enebo[m]> those two subtracted would give some aging divider (multiplier) which would give a new counter value
<enebo[m]> That new counter value would compare to threshold2 which is < threshold1
<enebo[m]> If it is greater than threshold2 it JITs, otherwise the new counter value is written and the timestamp is replaced
<enebo[m]> The additional cost of this new heuristic would be one nanotime call per first JIT threshold and an extra field for it + the simple math of reducing the counter when that first threshold is hit. Seems pretty low
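A compact sketch of that heuristic; constants and names are hypothetical, and in JRuby the counter would live on the method object with this check running on the call path:

```java
// Hypothetical sketch of the lazy-decay heuristic described above: counters
// are only decayed when the first threshold is hit, so the steady per-call
// cost stays a single increment-and-compare.
class CallCounter {
    static final int THRESHOLD1 = 100;             // when to evaluate the counter
    static final int THRESHOLD2 = 50;              // decayed count needed to JIT
    static final long DECAY_UNIT_NANOS = 100_000L; // tunable time delta (0.1 ms)

    private int count;
    private long windowStartNanos = System.nanoTime(); // set at creation/first call

    boolean shouldJit() {
        if (++count < THRESHOLD1) return false;    // cheap fast path

        long elapsed = System.nanoTime() - windowStartNanos;
        // the longer it took to reach THRESHOLD1, the larger the aging divider
        long divider = Math.max(1, elapsed / DECAY_UNIT_NANOS);
        int decayed = (int) (count / divider);

        if (decayed >= THRESHOLD2) return true;    // still hot after decay: compile

        count = decayed;                           // keep the decayed credit
        windowStartNanos = System.nanoTime();      // and start a new window
        return false;
    }
}
```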
<enebo[m]> rdubya: I think you are the only one reading atm based on the riot web UI and you may not be fully up to date on how we JIT
<enebo[m]> past just using a counter
<rdubya[m]> yeah i'm trying to follow along but I don't know much about those internals, my JIT experience is mostly high level reading on the java JIT
<enebo[m]> here I will show you a class and explain this since I think you may be interested
<rdubya[m]> would that be similar to what I mentioned the other day, where the JIT could keep a "last ran" timestamp and then clear out the counters at the end of each run, that might be way too much overhead lol
<rdubya[m]> i guess that is probably too short of a window
<enebo[m]> well it is similar to be sure
<enebo[m]> This heuristic is about half life of counter values
<enebo[m]> so if you encounter it reasonably quickly it may decide to just reduce the counter by 20% or 50% vs wiping it
<rdubya[m]> ah ok
<rdubya[m]> that makes sense
<enebo[m]> wiping them would be half life (bad word) of 100%
<enebo[m]> So a good question is why did they decay the counter vs nuking it
<enebo[m]> nuking is much simpler
<rdubya[m]> decaying would probably catch ones that are used sporadically more effectively
<enebo[m]> I also have had problems with timing because a good time on one piece of hardware feels like a poor one on another
<enebo[m]> decaying could also allow for better tuning
<enebo[m]> well for the sporadic aspect of something which is being called occasionally but is sometimes hot
<enebo[m]> Although if it is really hot then nuking should still be enough to put it over the edge
subbu is now known as subbu|lunch
<enebo[m]> ok I think I can see a reason for their decay
<enebo[m]> if they run every 4 seconds and can then evaluate all candidate sites to compile, the actual counter may potentially take longer than 4 seconds to hit the threshold (let's pretend 30s)
<rdubya[m]> decaying probably also lets you still compile the methods eventually, but not immediately after they've hit the limit, it kind of forces them to prove that they are used frequently enough lol
<enebo[m]> So the decaying would happen more often than the promotion to compiled versions
<enebo[m]> yeah for sure that must be the intent
<enebo[m]> You are being called enough to be interesting but not necessarily enough to be reasonably compiled
<enebo[m]> Using nuking would just get rid of cold compiles
<enebo[m]> cold compiles are our main issue though
<enebo[m]> Another main difference in their system and ours is we look for threshold at each method as it is called so we cannot grow a counter size greater than threshold
<enebo[m]> wonders if subbu has much experience in how they implemented this
<enebo[m]> or remembers :)
<rdubya[m]> maybe you could use a decaying decay to address the cold compiles? (if I'm understanding that right) so when the server starts it decays 90% then after a minute it drops to 80%, etc
<rdubya[m]> or maybe the reverse
<enebo[m]> well I am thinking more about your counter nuking idea atm since it is less detailed (a subset)
<enebo[m]> what I don't like in either is specifying an explicit time value
<rdubya[m]> yeah, would be good if that were tunable with a decent default
<enebo[m]> but I don't know how you would possibly generate one on the fly...a calibration of some sort maybe but a first start would be to just have a settable value
<enebo[m]> If we can self-tune good behavior from it and not reduce perf but reduce number of native methods it would be a good first step
<enebo[m]> I was talking to my wife about this at lunch and she suggested another thing we have not really discussed very much which would be aging out compiled methods back to interp
<rdubya[m]> that could help with the memory
<rdubya[m]> would that cause a cycle of recompiling if there wasn't enough memory?
<enebo[m]> I look at memory as the main issue and believe that most of our native compiles are not actually useful for performance
<enebo[m]> I believe the heuristics cannot be a simple counter even with aging out methods - as you say it will just compile again later unless we mark it as "you're dead to me"
<enebo[m]> I don't think doing marking like that is an end-goal solution since some methods do become important in different life-cycles of a program (or at least theoretically they could)
<enebo[m]> but aging out methods is good for killing off early hot methods that are no longer needed
<enebo[m]> so I think that is a good idea as a companion to the topic
<rdubya[m]> yeah, makes sense
<rdubya[m]> would it be possible to cap it by memory, i.e. before we jit something we check to see how much metaspace is left and then either evict something or skip doing the compile?
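Checking remaining Metaspace before compiling is doable from inside the JVM with the standard MemoryPoolMXBean API; a sketch of the gate rdubya describes (the evict-or-skip policy around it is hypothetical):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

class MetaspaceGate {
    // Sketch: skip JIT compilation once Metaspace usage crosses a fraction
    // of -XX:MaxMetaspaceSize (returns true while there is still room).
    static boolean metaspaceHasRoom(double maxFraction) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if ("Metaspace".equals(pool.getName())) {
                MemoryUsage usage = pool.getUsage();
                long max = usage.getMax();
                if (max < 0) return true; // no MaxMetaspaceSize set: unbounded
                return usage.getUsed() < max * maxFraction;
            }
        }
        return true; // pool not found; don't block compilation
    }
}
```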
<enebo[m]> metaspace usage for me is our largest contemporary problem. The fact that you happen to have a larger app is a good opportunity for us (and I guess for you too :) )
<enebo[m]> I don't know. I think we can look up metaspace although those stats are lies
<rdubya[m]> we have 2 different reasons for wanting the cap too 🙂
<enebo[m]> or I think they are heavily underreported because malloc is allocing much more than is really being used....maybe the JVM is reporting that right?
<rdubya[m]> for our clusters we want to not have it grow indefinitely
<enebo[m]> well it won't technically :P
<rdubya[m]> for our containers we are trying to make the containers handle "x" traffic and if we go over that we spin up another container
<enebo[m]> It will just grow for so long it will seem like it is growing forever
<enebo[m]> unless you are generating source code as time moves on forever
<rdubya[m]> lol
<enebo[m]> but from a practical standpoint you are right
<enebo[m]> It is an unimportant distinction
<enebo[m]> rdubya: I do believe we have same exact goal. I want process memory to be as favorable as possible against MRI and that cannot include some endless tail of infrequent compiles
<enebo[m]> or "appearing to be endless"
subbu|lunch is now known as subbu
<enebo[m]> kares: rdubya This has a tunable -Xjit.time.delta=some_nanosecond_delta which I am hoping you can play with. The default value is I think too small but I am curious to see if it kills the method growth altogether
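Assuming the new flag follows the usual -X property convention, trying the branch build would look something like this (100000 ns being the 0.1 ms default kares asks about below):

```
jruby -Xjit.time.delta=100000 -Xjit.logging=true app.rb
```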
<kares[m]> okay - thanks. looks reasonable but we haven't yet set up using snapshots
<kares[m]> in the meantime I pushed jit.max with some cleanups, as e.g. excludes weren't excluding blocks
<rdubya[m]> cool
<enebo[m]> kares: yeah if this strat works out I don't think a .max will be needed either
<kares[m]> enebo: but did you really mean just 0.1millis delta?
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<kares[m]> and if a method takes longer than a delta (say 100ms) then it will never JIT
<kares[m]> which should be good in theory - yeah need to give it a try ...
<kares[m]> this might end up more challenging than I hoped if I end up comparing jit logs 😄
travis-ci has joined #jruby
travis-ci has left #jruby [#jruby]
<travis-ci> jruby/jruby (master:b2f2018 by kares): The build was broken. https://travis-ci.org/jruby/jruby/builds/583341525 [208 min 12 sec]
travis-ci has joined #jruby
<travis-ci> jruby/jruby (counter_nuke:24632b3 by Thomas E. Enebo): The build failed. https://travis-ci.org/jruby/jruby/builds/583343121 [192 min 57 sec]
travis-ci has left #jruby [#jruby]
<enebo[m]> kares: that was the first value I tried and startup stuff JITted much less, but I did not see any real change
<enebo[m]> for perf stuff we may end up raising this quite a bit but we are mostly looking to cull the cold methods which are not called much
<enebo[m]> in doing some reading I think larger methods with loops should get more consideration but that will be a later experiment (plus we have no OSR so it may not end up being as useful)
jrafanie has joined #jruby
lucasb has quit [Quit: Connection closed for inactivity]
jrafanie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]