_whitelogger has joined #jruby
bga57 has quit [Remote host closed the connection]
lucasb has quit [Quit: Connection closed for inactivity]
bga57 has joined #jruby
claudiuinberlin has joined #jruby
shellac has joined #jruby
shellac has quit [Ping timeout: 246 seconds]
shellac has joined #jruby
<JulesIvanicGitte> Hey
<JulesIvanicGitte> can you spot when we put JRuby in prod and when we removed it:
<JulesIvanicGitte> :/
<JulesIvanicGitte> (ok it's a bit unfair because it's mostly the warmup. We removed it from prod because we found a bug in our app, not because of JRuby)
<headius[m]> Yeah, after warm up it doesn't look too far off
<headius[m]> Definitely not where I'd like to see it
<headius[m]> I'm going to be experimenting the next couple days with some jvm flags to shorten that warm-up curve, but the jvm gets in our way a little bit there
<headius[m]> If you've got time, I still think the first thing we should do is focus on single-threaded throughput and try to get back to a comfortable level
<JulesIvanicGitte> Do you think that tuning the JRuby JIT can improve the first requests' response time? Something like this: `-Xjit.threshold=0`?
<enebo[m]> Jules Ivanic (Gitter): That will force every method in the system to JIT. Even though JIT compiles happen off thread, once they finish all that new code will need to compile and warm up in the JVM. That should dramatically slow down warmup
<JulesIvanicGitte> can something like this `BUNDLE_DISABLE_EXEC_LOAD=true` affect JRuby performance?
<JulesIvanicGitte> I had to add that to my Dockerfile because of https://github.com/puma/puma/issues/1572#issuecomment-411288015
Aethenelle has quit [Quit: Aethenelle]
shellac has quit [Quit: Computer has gone to sleep.]
shellac has joined #jruby
<CharlesOliverNut> hey I'm back now
<CharlesOliverNut> I'm thinking low JRuby JIT threshold partially, but also turning off some startup-time features of JVM and reducing compile thresholds
<CharlesOliverNut> -XX:-TieredCompilation -XX:Tier4CompileThreshold=15000 (that's default, lower might kick it off sooner)
<CharlesOliverNut> unsure if this will help or not...may impact peak perf, definitely will impact startup time, but may warm up more quickly
<CharlesOliverNut> where does timers.after come from?
<CharlesOliverNut> wow, timers has no locks at all
<CharlesOliverNut> oops
<CharlesOliverNut> wrong channel
claudiuinberlin has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<JulesIvanicGitte> > -XX:-TieredCompilation -XX:Tier4CompileThreshold=15000 (that's default, lower might kick it off sooner)
<JulesIvanicGitte> Does it work with G1?
<JulesIvanicGitte> > where does timers.after come from?
<JulesIvanicGitte> What are you talking about? I don't understand 🤔
<headius[m]> G1 shouldn't have any effect on tiered compilation
<headius[m]> timers lines were wrong channel
xardion has quit [Remote host closed the connection]
xardion has joined #jruby
shellac has quit [Quit: Computer has gone to sleep.]
Aethenelle has joined #jruby
travis-ci has joined #jruby
<travis-ci> jruby/jruby (master:fd40550 by Charles Oliver Nutter): The build has errored. https://travis-ci.org/jruby/jruby/builds/524546052 [313 min 36 sec]
travis-ci has left #jruby [#jruby]
_whitelogger has joined #jruby
havenwood has quit [Quit: ZNC 1.7.2 - https://znc.in]
kiwi_55 has joined #jruby
<kiwi_55> hi
<kiwi_55> i need help
<kiwi_55> how to use .wav file in jruby
<kiwi_55> please helo
<kiwi_55> help
travis-ci has joined #jruby
travis-ci has left #jruby [#jruby]
<travis-ci> jruby/jruby (master:cba96aa by Marcin Mielzynski): The build has errored. https://travis-ci.org/jruby/jruby/builds/524596947 [224 min 16 sec]
kiwi_55 has quit [Remote host closed the connection]
<headius[m]> ok stupid build I'm comin for you
<headius[m]> maybe it's time to start moving some of these to other services
subbu is now known as subbu|lunch
<headius[m]> I have no idea why this keeps hanging in the encoding/case folding tests
<headius[m]> I think it may be some other test that's causing stdout to shut down
<lopex> is there a thread dump as it hangs ?
<lopex> those case folding table loads are not synchronized, so there might be a race that makes something get stuck ?
<headius[m]> well it's on Travis so I can't tell
<headius[m]> it just says that output stopped, so it could be a hang or it could just have stopped printing
<lopex> intermittent ?
<headius[m]> yeah but more failing than not
<headius[m]> at first it seemed usual flakiness but it's so frequently happening in these tests
<lopex> there's static final int[] Values = ArrayReader.readIntArray("CaseMappingSpecials");
<lopex> inside a singleton holder
<lopex> it gets loaded in the middle of caseMap(...) for unicode
<lopex> but all jcodings tables are loaded this way
havenwood has joined #jruby
havenwood has joined #jruby
havenwood has quit [Changing host]
<headius[m]> hmm
<headius[m]> yeah
<lopex> and the other tables for folding too
<lopex> all static final
<lopex> with no interdeps I hope
<lopex> headius[m]: since it's all static finals is there a chance two jruby runtimes could race ?
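On the race question above: the JVM runs a class's static initializer at most once per class loader, under a per-class initialization lock, so a table loaded through a holder class should not be loaded twice by racing threads, whether they belong to one runtime or to two runtimes sharing the same class loader. A minimal sketch of that idiom follows. `LazyTableDemo`, `TableHolder`, `loadTable`, and the load counter are hypothetical stand-ins for illustration, not the jcodings code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a lazily loaded, jcodings-style table; not the real code.
class LazyTableDemo {
    // Counts how many times the (pretend) table load actually runs.
    static final AtomicInteger LOADS = new AtomicInteger();

    // Initialization-on-demand holder: the JVM initializes this nested class
    // on first access, once, under a per-class initialization lock.
    static final class TableHolder {
        static final int[] VALUES = loadTable();
    }

    // Stand-in for reading a table from a resource.
    static int[] loadTable() {
        LOADS.incrementAndGet();
        return new int[] {1, 2, 3};
    }

    // Starts `threads` threads that all touch the holder at the same moment
    // and returns how many times the table has been loaded in total.
    static int raceOnHolder(int threads) {
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                int unused = TableHolder.VALUES.length; // triggers class initialization
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) t.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return LOADS.get();
    }

    public static void main(String[] args) {
        System.out.println("table loads: " + raceOnHolder(8));
    }
}
```

If this models the table loading accurately, a double load is an unlikely cause of the hang. It says nothing about races elsewhere, or about class-initialization cycles between interdependent holders.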
<headius[m]> look at that job output too
<lopex> [exec] TestCaseOptions#test ?
<headius[m]> the output stops in the middle of the name of a test method
<lopex> this one ?
<headius[m]> that's why I'm leaning toward it being something shutting down stdio
<headius[m]> yeah
<lopex> hmm, there's some jiggling with symbols for that too
<headius[m]> MRI does indeed just ignore bad encodings
<headius[m]> need to figure out the right place for us to do that
<lopex> that shouldn't be a problem, right ?
<headius[m]> at least ignores them for system encoding
<lopex> headius[m]: is it a random test or always this test ?
<headius[m]> that's what doesn't make sense...I've seen it happening consistently around here
<headius[m]> there should be more in build history
<lopex> the in/out steal would show on different places right ?
<headius[m]> failed in a shift-jis test
<headius[m]> hung
<headius[m]> or near it anyway
<lopex> maybe they should TD for java on timeouts ?
<headius[m]> TestCaseOptions#test: https://travis-ci.org/jruby/jruby/jobs/524596962
<headius[m]> died printing out time for TestCaseMappingPreliminary#test_ascii_option: https://travis-ci.org/jruby/jruby/jobs/521592229
<headius[m]> TestCaseOptions#test output again: https://travis-ci.org/jruby/jruby/jobs/520753015
<headius[m]> I mean it almost seems random but it's so frequently around the same
subbu|lunch is now known as subbu
<headius[m]> I'm going to try a few runs locally, but I'm pretty sure I've never seen it happen here
<headius[m]> and another TestCaseOptions#test: https://travis-ci.org/jruby/jruby/jobs/520752645
<headius[m]> 🤷
<lopex> well, if there's a hang somewhere based on real time, similar tests might show up ?
<headius[m]> sure
<lopex> but we cannot rule out any races
<lopex> infinite loops might occur then
<headius[m]> problem is we also cannot rule out Travis, but it seems too consistent
<headius[m]> hmmm
<headius[m]> where?
<lopex> hmm, like loading twice some arrays, and based on data in them...
<lopex> er, actually not, those would be replaced ?
<lopex> headius[m]: do we know how many runtimes are involved in this test ?
<headius[m]> well there certainly could be some kind of race but I'd expect it to happen at boot, not in some test 15 minutes after boot
<headius[m]> nor would I expect it to cause output to cut out mid-puts
<lopex> so cpu time could be a measure of where it hangs ?
<headius[m]> test file for that one that keeps coming up is test_case_options.text
<lopex> for that slice
<headius[m]> er .rb
<lopex> io buffers allocation etc ?
<lopex> headius[m]: which suite has the longest output ?
<headius[m]> maybe
<headius[m]> this is one of them
<headius[m]> test:mri:stdlib
<headius[m]> well this is telling
<headius[m]> the successful time for that suite is around 7min
<lopex> maybe try to redirect is somewhere else for a try ?
<lopex> *it
<headius[m]> the hung times are around 16min
<headius[m]> so it's actually hanging, not just killing stdout
<lopex> slice time overrun ?
<headius[m]> so like we're getting paged out or something?
<lopex> just guessing :P
<headius[m]> yeah that's all I've got too 😆
<lopex> or system wise, virtualization ?
<lopex> I'm sure ppl run longer tests though
<lopex> with more threads
<headius[m]> we do too
<headius[m]> I was wrong about this being one of the longest
<headius[m]> most recent green
<lopex> so io seems to be a suspect...
<headius[m]> test:mri:core is still the longest set
<lopex> I mean std(err/out)
<headius[m]> when it appears to hang it does take significantly longer than when it succeeds
<headius[m]> it's not just going quiet
<lopex> how do those ruby testing frameworks chunk the out ?
<lopex> ah
<headius[m]> that's a good question, I'm not sure
<headius[m]> this is also going through multiple levels because it's run via mvn
<headius[m]> mvn -> rake -> test/mri/runner.rb
<headius[m]> which is test/unit based
<headius[m]> so yeah some buffer in there not getting flushed could explain the cropped output
<lopex> or a race on it
<headius[m]> yeah
<headius[m]> also possible
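The unflushed-buffer theory above can be illustrated in isolation. In this minimal sketch, anything printed after the last flush never reaches the sink, so a process killed at that point shows output cropped mid-line. `CroppedOutputDemo` and `visibleOutput` are hypothetical names, and the in-memory sink only stands in for the mvn -> rake -> runner pipe chain.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of output lost in an unflushed buffer.
class CroppedOutputDemo {
    // Prints `flushed`, flushes, then prints `pending` without flushing.
    // Returns what an observer of the sink would see if the process died now.
    static String visibleOutput(String flushed, String pending) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // autoFlush=false: bytes sit in the 8 KiB buffer until flush() or overflow.
        PrintStream out = new PrintStream(new BufferedOutputStream(sink, 8192), false);
        out.print(flushed);
        out.flush();
        out.print(pending); // still buffered, never reaches the sink
        return new String(sink.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(visibleOutput("TestCaseOptions#te", "st = 0.01 s"));
    }
}
```

This only shows why the visible text can stop in the middle of a test name. It does not tell a hang apart from a silent run, which is why comparing wall-clock times is still needed.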
<headius[m]> huh
<headius[m]> that case options test is one of the last ones in the suite
<headius[m]> this is a green run
<headius[m]> and it's followed by Queue tests 🤔
<lopex> hah
<lopex> and test_thr_kill
<lopex> also sounds bad
<headius[m]> that seems like a good suspect
<headius[m]> yeah
<lopex> so yeah, the formed are just not flushed
<headius[m]> gonna have a look at these
<lopex> *former
<headius[m]> oh yeah these could have some races
<headius[m]> running it in a loop
subbu is now known as subbu|busy
travis-ci has joined #jruby
<travis-ci> jruby/jruby (master:1654de7 by Charles Oliver Nutter): The build has errored. https://travis-ci.org/jruby/jruby/builds/524623150 [217 min 45 sec]
travis-ci has left #jruby [#jruby]
<headius[m]> running that test in a loop for a while here, maybe I'll get lucky
<headius[m]> I think I'm going to remove test_queue from the build for now
<headius[m]> there's still specs and the MRI tests are highly suspect to me since they might never see a race
<headius[m]> still hung
<headius[m]> I pushed a PR that tries to run these rake targets directly so we can maybe see where it's actually hanging
<lopex> does jvm have some thread dump on some hang, like that oom thing ?
<lopex> or, would it make sense
<headius[m]> hmm, not that I know of
<lopex> or, heap dump actually
<headius[m]> I wish we could tell travis to send SIGQUIT before terminating it
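One in-process workaround, assuming a watchdog can be started inside the test JVM, is to dump thread stacks from a daemon thread once a deadline passes. The sketch below uses the standard `ThreadMXBean.dumpAllThreads` call. `HangWatchdog`, `dumpAfter`, and the timeout values are hypothetical, and nothing like this is confirmed to exist in the JRuby build.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical watchdog: prints a thread dump if the run outlives a deadline.
class HangWatchdog {
    // Renders the stacks of all live threads, similar in spirit to a SIGQUIT dump.
    static String dumpAllThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked monitor/synchronizer details for portability.
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Starts a daemon thread that prints a dump to stderr after timeoutMillis,
    // unless it is interrupted first (meaning the run finished in time).
    static Thread dumpAfter(long timeoutMillis) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(timeoutMillis);
            } catch (InterruptedException e) {
                return; // cancelled before the deadline
            }
            System.err.print(dumpAllThreads());
        }, "hang-watchdog");
        t.setDaemon(true);
        t.start();
        return t;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread watchdog = dumpAfter(100);
        Thread.sleep(300);    // simulate a run that overshoots the deadline
        watchdog.interrupt(); // no-op here; in a healthy run this cancels the dump
    }
}
```

The deadline would have to sit below the CI's own no-output timeout for the dump to land in the log before the job is killed.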
<lopex> btw, someone did a good job of summarizing https://www.jsparrow.info/home/an-overview-on-jdk-vendors
<lopex> can't explain it better than in a table in this case
<lopex> unless there's errors :P
travis-ci has joined #jruby
<travis-ci> jruby/jruby (master:54b875d by Charles Oliver Nutter): The build has errored. https://travis-ci.org/jruby/jruby/builds/524640907 [215 min 15 sec]
travis-ci has left #jruby [#jruby]
<lopex> headius[m]: wrt that encoding length, I'm astonished how small the surface area is for that issue (given mri prevalidates strings in its own semantics)
<lopex> it's like 3 issues for us and not having that approximate length
<lopex> something stinks in mri then
<headius[m]> that ArrayIndexOOB thing?
<lopex> and infinite loops
<lopex> basically in MRI it all goes like this
<lopex> io/string literals/regexp literals/ are all validated
<lopex> so there's little chance broken strings get into the guts
<lopex> but yeah, I wonter for how many cases you can craft https://github.com/jruby/joni/issues/38
<lopex> *wonder
<lopex> it's almost like we could fuzz all the string api with [0xA4].pack("C")
<headius[m]> So MRI has issues with some of these cases too
<lopex> and we know that's not the case, since it would have come up years ago
<lopex> no
<headius[m]> ?
<lopex> onigmo just uses that approx length for most of its length calls
<headius[m]> heh that's fun
<lopex> like I said, approx length gives 1 for broken char
<lopex> so it's always good for parsing and traversing
<lopex> and yet, we have like 3 issues over the years
<lopex> two in joni as a non jruby use
<lopex> headius[m]: what I'm about is, to be efficient here we need those safe/unsafe encpding versions
<lopex> *encodings
<lopex> since you can get away with unsafe one for almost all cases
<lopex> or, to put it in other words, how wasteful mri is
<lopex> we also discussed it with nirvdrum
<lopex> headius[m]: it's like having a broken string to be a deopt in a way
<headius[m]> hmmm ok
<lopex> ok, that didnt sound as being convinced
<headius[m]> So we need this safe/unsafe stuff after all?
<headius[m]> I'm not sure I understand entirely
<lopex> we need approx
<headius[m]> can you show me?
<lopex> a case ?
<headius[m]> sure, or a pointer to the code we are missing
<lopex> where a user injects a broken char into the guts of matching
<lopex> it's a broken char so our length will give -1
<lopex> mri makes this more complex than it should be because there's lots of historical versions of encoding length
<headius[m]> ok
<lopex> onigenc_mbclen_approximate and that returns 1
<headius[m]> So where is the logic in MRI that deals with this
<lopex> no, that's a place where onigmo uses it
<lopex> and yet, in 99.9999...% cases we dont need that, since MRI prevalidates the input that will be fed to onigmo
<lopex> so, basically we have two choices
<headius[m]> ok
<headius[m]> I'm with you so far
<headius[m]> I guess I'm not clear why this doesn't get kicked out when it has a bad leading byte
<lopex> mirror mri, and waste perf in that additional length logic that will almost never fail (even then that function is used mostly for parsing afaik)
<headius[m]> for example
<headius[m]> ah maybe that answers my question
<lopex> oh I forgot
<lopex> for [0xA4].pack("C")
<lopex> the cr is unknown I suppose
<lopex> and then String#grapheme_clusters traverses that
<lopex> for me it's an edge case where nonvalidated broken input is fed right into joni
<lopex> and, as a second option, we could pass have safe encoding version
<lopex> er s/have/here/
<headius[m]> ahh
<headius[m]> that does extra validation as it walks characters
<lopex> and returns 1 for that byte
<headius[m]> we could pass both and redo the match with safe encoding if it blows up
<headius[m]> providing a proper error then
<headius[m]> hamfisted approach maybe
<lopex> it's so centralized, so I guess we could hardcode enc.getSaveVersion() etc
<headius[m]> how much overhead are we talking?
<headius[m]> ah sure, that would be cleaner
<lopex> that onigenc_mbclen_approximate in the wiki
<headius[m]> oh ok
<lopex> I would add a length version to Encoding interface though
<lopex> so we have our old length and preciseLength
<lopex> but encoding instance would determine if it's that approx version
<lopex> headius[m]: I included descriptions for those usages in mri in wiki
<headius[m]> yeah I'm parsing it
<headius[m]> so the -1 case from precise_mbc_enc_len gets converted to 1 by onigenc_mbclen_approximate
<headius[m]> where we just use the -1 and then blow up
<lopex> basically, for this case
<lopex> but it's all a mess
<enebo[m]> lopex: are you saying above that there are a limited set of known paths where a string is not guaranteed valid and those must potentially use approximate length?
<lopex> enebo[m]: yeah
<lopex> good way to put it
<enebo[m]> lopex: how about the parser?
<lopex> enebo[m]: actually 2 not
<lopex> *now
<headius[m]> so if it can't figure out mbc length it needs to just assume length 1 to advance safely
<lopex> enebo[m]: hah, which parser :P
<headius[m]> and this is also how we end up with infinite loops because we use the -1 blindly to increment index and then walk back into the character again
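The failure mode just described can be sketched in a few lines. `preciseLength` below is a simplified, hypothetical UTF-8 check that returns -1 for a broken char. It is not the jcodings implementation, and it does not model the separate "missing bytes" result of the real API. `approximateLength` models only the behaviour discussed here: a broken char counts as length 1 so the walker always advances.

```java
// Hypothetical sketch of precise vs. approximate char length; not jcodings/joni code.
class MbcLengthDemo {
    // Simplified UTF-8 length: >0 for a well-formed char at p, -1 otherwise.
    static int preciseLength(byte[] b, int p, int end) {
        int lead = b[p] & 0xFF;
        int len;
        if (lead < 0x80) len = 1;
        else if (lead >= 0xC2 && lead <= 0xDF) len = 2;
        else if (lead >= 0xE0 && lead <= 0xEF) len = 3;
        else if (lead >= 0xF0 && lead <= 0xF4) len = 4;
        else return -1;               // stray continuation byte or invalid lead
        if (p + len > end) return -1; // truncated char
        for (int i = 1; i < len; i++) {
            if ((b[p + i] & 0xC0) != 0x80) return -1; // bad continuation byte
        }
        return len;
    }

    // Broken chars are treated as one byte long, so callers always move forward.
    static int approximateLength(byte[] b, int p, int end) {
        int n = preciseLength(b, p, end);
        return n > 0 ? n : 1;
    }

    // Walks the bytes char by char. Returns the number of steps taken,
    // or -1 if the walk is still going after maxSteps (i.e. it is looping).
    static int countChars(byte[] b, boolean approximate, int maxSteps) {
        int p = 0, steps = 0;
        while (p < b.length) {
            if (steps++ >= maxSteps) return -1;
            int len = approximate ? approximateLength(b, p, b.length)
                                  : preciseLength(b, p, b.length);
            p += len; // with a raw -1 this steps back into the previous char
        }
        return steps;
    }

    public static void main(String[] args) {
        byte[] broken = {'a', (byte) 0xA4};
        System.out.println("approximate: " + countChars(broken, true, 100));
        System.out.println("precise:     " + countChars(broken, false, 100));
    }
}
```

In this sketch the precise walk bounces between index 1 and index 0 until the step cap trips. If the broken byte came first, the same loop would step to index -1 and fail with an out-of-bounds access instead.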
<lopex> enebo[m]: mri uses parser_mbclen
<enebo[m]> lopex: ah sorry I mean ruby parser but I suppose regexp are mostly validated through ruby parser as well
havenwood has quit [Remote host closed the connection]
<lopex> enebo[m]: it used to use preciseLength and have its own guards
<lopex> in callers
<headius[m]> well it would be worth seeing if the extra checks add enough overhead to worry
<lopex> enebo[m]: it's hard to explain since there's so many versions used inconsistently
<lopex> yeah, we can always change Encoding.length
<headius[m]> so I'm getting that it's the difference between get length, advance character vs get length, validate length, advance char
havenwood has joined #jruby
havenwood has joined #jruby
havenwood has quit [Changing host]
<lopex> but you need to be careful, since it can be on a per-site basis when there's a guard against <0
<headius[m]> well my gut says we should do what MRI does and we'll measure it
<lopex> I also wanted to get rid of those intermediate char length tables for utf-8
<enebo[m]> lopex: just one more question: precise return -1 so is guard to see that and then do approximate length so it continues?
<lopex> enebo[m]: in the callers ?
<enebo[m]> wherever we run into the endless loops I guess
<lopex> enebo[m]: I mean the callers might have the guards
<enebo[m]> callers is a bit vague to me but I only partially read the conversation
<enebo[m]> or I read it all but pretty quickly
<enebo[m]> I should have not asked any questions but I wanted to understand your original statement (which you answered already)
<headius[m]> so I think you are saying not all places should normalize bad char length to approx
<headius[m]> which is why we need the separate path
<headius[m]> and that's why you were suggesting we pass safe encoding into those paths that should approx
<lopex> yeah, additionally for unsafe paths we could use a very simplified length version for validated strings
<lopex> much more efficient than we have now
<lopex> so there's that
<headius[m]> when we know that we're handling it appropriately from those callers
<headius[m]> ok I think we're on the same page
<lopex> we will need to change some call sites anyways though
<headius[m]> sweet my travis changes worked
<headius[m]> rubyspec is hanging in a concurrent autoload spec
<headius[m]> so that needs to be tagged and fixed
<headius[m]> the other hang was clearly in the case folding stuff, so I'm stumped for the moment
<headius[m]> I'll put the queue tests back though
<lopex> enebo[m]: I forgot, but precise gives also missing as (-n -1) right ?
<headius[m]> enebo: I switched all our rake-based targets to run directly rather than via that mvn PHASE stuff
<headius[m]> so they aren't triple buffering output or whatever
<lopex> headius[m]: so that was the hang ?
<headius[m]> it was the last hang
<headius[m]> I'm rerunning a couple times to see
<headius[m]> the output is not cropped now though
<lopex> so that kill thing >
<lopex> ?
<headius[m]> seems unrelated
<headius[m]> didn't fix it anyway
subbu|busy is now known as subbu
<lopex> so that autoload then ?
<headius[m]> rubyspec intermittent hang: https://api.travis-ci.org/v3/job/524660545/log.txt
<headius[m]> looks like it
<headius[m]> work in progress there
<headius[m]> post RailsConf 9.2.8
<headius[m]> the case folding, I dunno
<lopex> and how it differs from the other suites ?
<headius[m]> stdlib only runs in one suite
<headius[m]> ditto for that rubyspec hang
<headius[m]> but both of these have been fairly consistently hanging, like maybe 50% or more
<lopex> wrt races, is it that on our local machines threads are scheduled more coarsely, so we don't see these problems as often ?
<headius[m]> beats me
<headius[m]> I'd expect to see it hang more locally
<lopex> why ?
<headius[m]> I've got 4/8 cores and certainly faster than Travis
<headius[m]> I guess faster could make it less likely
<lopex> but would seem to me they'd be more coarsened
<headius[m]> less chance of pauses causing two threads to race
<lopex> since there's less in your system
<headius[m]> less what?
<lopex> threads
<lopex> well, depends on how that travis thing works
<headius[m]> yeah another mystery
<lopex> it's probably some hyper, and then os, and then docker ?
<headius[m]> well they moved to Google Compute at some point
<headius[m]> I'm not sure if everything runs in docker now
<lopex> or, well, if you see such a thing, how many interrupters would you expect
<lopex> or, well, stealers
<lopex> if you're sliced, then more probably ?
<headius[m]> 🤷
<headius[m]> maybe?
<headius[m]> I'm glad these might be real hangs though
<headius[m]> not looking forward to the inevitable migration off Travis
<lopex> to the farm of rpi's :P
<headius[m]> hung in the same place this time in case options test
<headius[m]> two out of three times
<lopex> what if you exclude the two last ones entirely ?
<headius[m]> all the rake jobs have proper names now too 👍
<headius[m]> well that last one completed enough for it to print out
<headius[m]> so it seems like it's hanging either in teardown after it or setup before the next one
<headius[m]> there's no teardown in that test
<lopex> ah, and thr kill teardown would seem problematic too if there's something after
<lopex> ah
<headius[m]> that wasn't included in these hung runs
<headius[m]> I just added that suite back in because it didn't appear to help to remove it
<headius[m]> I mean that test, the queue tests
<lopex> so I'm out of speculations not to mention ideas
<headius[m]> so lopex let's just do what MRI does and see how it looks
<headius[m]> re: bad char stuff
<headius[m]> at least from joni since those hangs have been a real pain
<lopex> headius[m]: for these cases it will work, but it will also need to revisit the callers
<lopex> so there's pain there too
<headius[m]> life is pain
<lopex> I know
<lopex> enebo[m]: oh, that utf8 "alias" jcodings issue turned out to be a regression from switch generation
<lopex> which I'm still not sure of the fix I've done
shellac has joined #jruby
shellac has quit [Quit: Computer has gone to sleep.]