<headius[m]>
with the repo removed locally, -Pbootstrap does fail
<headius[m]>
same as on travis
<headius[m]>
marshal data too short... I think it's trying to fetch a pom file from rubygems.org as if it were a Maven repo, but they may have changed how they handle 404s
<headius[m]>
nothing seems to have changed on our end, so I suspect rubygems.org changed some 404 handling or marshal data or something
<headius[m]>
build on the PR branch now doesn't install rspec in -Pbootstrap, so it has a Gemfile instead
nirvdrum has quit [Ping timeout: 256 seconds]
<enebo[m]>
headius: I added a single comment on your fix commit
<headius[m]>
ok... working on scope flags now
<headius[m]>
this is untangling some stuff but seems to be working
<enebo[m]>
subbu: is that guy's last name really werkmeister?
<subbu>
enebo[m], I suppose it is. I am just going by his GitHub profile.
<headius[m]>
mister work
<headius[m]>
scope flags may be done, running some sanity checks
<headius[m]>
I also tweaked InterpreterContext/FullIC to better handle the initialization that only non-full does (getEngine and prep flags), because it meant a bunch of methods had to be overridden for no good reason
<headius[m]>
looking forward to reworking the lot before 9.3
<headius[m]>
enebo: replied to your comment
<headius[m]>
the methods in question are overrides from CompilerPass, which passes IRScope since many passes run against non-full
<headius[m]>
well, I don't know about "many", but some
<headius[m]>
I would have made them receive fullIC otherwise
<headius[m]>
they could receive IC maybe but then I'd be changing passes unrelated to this fix work
<headius[m]>
unsure how good our tests are at detecting scope elimination etc., but I have done some one-off checks and it's still working as in 9.2.12
<headius[m]>
greeeeeeen
<headius[m]>
last flag will be frame related I guess, need to untangle what's actually dynamic and what is determined at compile time
<headius[m]>
kalenp, johnphillips31416: are you able to reproduce #6319 regularly?
nirvdrum has joined #jruby
<headius[m]>
hmm, it doesn't look like any of these are dynamic
<headius[m]>
enebo: check my work but this PR may be ready for some testing
<headius[m]>
needsFrame is looking at flags but I don't think those flags are ever modified
<johnphillips3141>
We can repro
<headius[m]>
if this branch looks good to enebo we'll want to see if you still can
<headius[m]>
I have no test case to trigger the error right now
<enebo[m]>
headius: I just mean ACP is only called directly
<enebo[m]>
It can have a separate entry point and pass it in directly
<daveg[m]>
I can repro, but it's killing production nodes, about one every 30 minutes, so I'll have to revert soonish
<headius[m]>
ahh you filed that bug yes?
<daveg[m]>
yes
<headius[m]>
so you weren't able to get a thread dump for that, but maybe you can for this?
<headius[m]>
that would be the biggest help and would confirm where it's getting stuck
<daveg[m]>
that's right. I have a dump, all 15M of it
<headius[m]>
if it's actually that synchronize call, it could be related to Enumerator... since enum.next runs in a fiber thread, it could be getting stuck on those locks
<headius[m]>
fix isn't trivial, have to make locks work across fiber threads... but it's doable
<headius[m]>
if that is indeed what's happening
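A minimal Java sketch of the hazard being described here, with made-up names (not JRuby code): JRuby runs each Fiber body on its own native thread, and java.util.concurrent locks are owned per-thread, so a lock acquired inside enum.next on the fiber's thread cannot be released from the caller's thread.

    import java.util.concurrent.locks.ReentrantLock;

    // Demonstrates per-thread lock ownership: the "fiber" thread acquires,
    // the caller's thread cannot release.
    public class CrossThreadLock {
        public static void main(String[] args) throws Exception {
            ReentrantLock lock = new ReentrantLock();

            Thread fiberThread = new Thread(lock::lock); // "fiber" side acquires
            fiberThread.start();
            fiberThread.join(); // thread exits still owning the lock

            try {
                lock.unlock(); // caller's thread is not the owner
            } catch (IllegalMonitorStateException e) {
                System.out.println("cannot release from another thread: " + e);
            }
        }
    }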
<headius[m]>
15M of thread dump? Not a heap dump?
<headius[m]>
if you can zip and post that somewhere it will tell us a lot
<daveg[m]>
yep, 15M of threads. We run with a lot of threads normally, and with this bug we get even more: ~360 threads when it died.
<headius[m]>
the jruby advantage
<headius[m]>
ok post that zipped dump or email team@jruby.org with a link if you prefer
<headius[m]>
I don't see any fiber threads, though I didn't let it finish loading
<headius[m]>
no mention of fiber whatsoever in the full dump
<daveg[m]>
yeah, I'm not sure it's related to the other issue. I changed these to #each instead of #to_enum for creating the Enumerators, which is what let me move forward. So it may be unrelated entirely
<headius[m]>
260 references to 0x00000006c883da18
<headius[m]>
aha well that's interesting then
<headius[m]>
next thing to look for is which thread has actually acquired that lock
<daveg[m]>
Ruby-0-Thread-6 might
<daveg[m]>
* ignore that
<headius[m]>
hmmm
<headius[m]>
I don't see any thread that has acquired the lock
<headius[m]>
how did you determine they're trying to release it?
<headius[m]>
I found that line a few times in the dump, but they all seem to be trying to acquire the lock
<headius[m]>
for example "rpc.app_inventory.get_applications" #2424
<headius[m]>
since I can't see any thread that appears to be holding the thread (all places that mention the lock are in threads "parked to wait for" it) I'm looking into whether it's possible a lock fails to get released somewhere
<daveg[m]>
look at that line in connection_pool -- it's in release
<headius[m]>
holding the lock, not holding the thread
<headius[m]>
oh releasing the connection
<headius[m]>
yeah they acquire the lock again to release connection
<headius[m]>
do you have any unexplained exceptions in the app log?
<daveg[m]>
I think I might have found it -- we have a JMX call that does ActiveRecord::Base.connection_pool.stat[:waiting] under the hood -- there are 17 threads in there. I don't see it holding the connection yet, but I bet it does. We use Datadog for monitoring, and it does JMX dumps periodically. 17 is WAY more than there should be
<daveg[m]>
no unexplained exceptions
<headius[m]>
ah that's interesting
<headius[m]>
I wonder if it's causing some kind of deadlock
<headius[m]>
I would have expected to see the JMX threads in this dump
<headius[m]>
dave.g: thank you, I updated the issue with some findings from the discussion
<headius[m]>
9.1.17.0 was quite a while ago so there's a lot of changes that could be related
<daveg[m]>
yep. we've tried upgrading to 9.2 a few times and have run into problems each time, but we were in a crunch previously and failed to report them :-(
<headius[m]>
I understand
<daveg[m]>
We have an upgrade of ActiveRecord to 5.2 ready to go -- we could revert JRuby, do that change first, then re-apply the JRuby upgrade. Obviously AR 4.2 is way old. I don't think this is an AR bug, BUT newer versions have definitely improved connection pool handling. It would hurt our ability to help debug this problem, though.
<headius[m]>
how easy is it for you to reproduce this in a test environment?
<headius[m]>
the more easily we can reproduce it the more combinations we could try
<headius[m]>
it is still puzzling that nobody owns the lock yet everyone is trying to acquire it
<daveg[m]>
we've been unable to reproduce this in staging :-( We have some load tests, but I think they aren't diverse enough in request type -- they are largely focused on a lot of load of a single type at a time
<headius[m]>
I see
<headius[m]>
that particular commit was done in 9.2.8, so if 9.2.7 runs it would tell us something
<headius[m]>
diagnosing this will be difficult without easy reproduction or access to the server in question
<daveg[m]>
yep. arranging direct access to the server in question will be tricky, but we could perhaps do a Zoom session or something like that to give you virtual access
<daveg[m]>
I've got to run for right now. kroth is also here and working on this with me; he can continue if possible. Thanks so much for all of your help.
<headius[m]>
sure, thank you for your patience...hopefully we can find the problem so we can fix it for everyone
<headius[m]>
kroth: hello
kphns has quit [Remote host closed the connection]
<headius[m]>
that would be worth a shot
<headius[m]>
I added another commit to the bug, one that happened with the stdlib upgrade for 9.2; it would be difficult or impossible to remove, but it's a candidate
<headius[m]>
if it were possible to get access to the instance we could quickly figure out who has that mutex locked, or I could walk you through the process
<headius[m]>
walkthrough is fine, I understand the access limitations
<headius[m]>
it's not a difficult process
<headius[m]>
mostly we just need to find that lock, which should be easily visible in the stacks of these threads, and then see what thread it says locked it
<headius[m]>
I will also look into other tooling for getting a lock report
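One such tool, sketched in Java against the standard ThreadMXBean API (the JDK classes and methods are real; the report format is made up): for each thread, print the lock it is parked on and the owner the JVM reports, plus any j.u.c. synchronizers it currently holds.

    import java.lang.management.LockInfo;
    import java.lang.management.ManagementFactory;
    import java.lang.management.ThreadInfo;
    import java.lang.management.ThreadMXBean;

    // In-process lock report; adapt to a remote JMX connection if needed.
    public class LockReport {
        public static void main(String[] args) {
            ThreadMXBean threads = ManagementFactory.getThreadMXBean();
            for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
                if (info.getLockName() != null) {
                    System.out.printf("%s waits on %s owned by %s%n",
                            info.getThreadName(), info.getLockName(), info.getLockOwnerName());
                }
                for (LockInfo held : info.getLockedSynchronizers()) {
                    System.out.printf("%s HOLDS %s%n", info.getThreadName(), held);
                }
            }
        }
    }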
<abelsromero[m]>
I found a curious behaviour with "--ri" and "--rdoc". Those were removed in Ruby 2.6, but JRuby reports 2.5.7 and does not accept them
<abelsromero[m]>
is it a bug or some decision?
<headius[m]>
that was a change in rubygems
<headius[m]>
there's now a single flag --[no-]document
<abelsromero[m]>
yes, but I wanted to use that to fix a problem in jruby-maven-plugin, and now I am out of ideas :/
<headius[m]>
ah yes that problem
<abelsromero[m]>
to support multiple Ruby versions and implementations I wanted to check against the reported Ruby version, but that doesn't match here
<headius[m]>
you'd have to check against the rubygems version reported
<headius[m]>
unfortunately we've had to keep updating rubygems because of other issues they fixed (perhaps some security fixes too)
<abelsromero[m]>
how do I do that?
<headius[m]>
that's a good question :-)
<abelsromero[m]>
I am processing the output of "ruby -v"
<headius[m]>
it might be possible to read rubygems/version.rb from the plugin?
<headius[m]>
hmm
<headius[m]>
well gem -v should work normally
<abelsromero[m]>
if it's a file in the classpath for sure
<abelsromero[m]>
let me check, thanks :D
<headius[m]>
yeah sure
<headius[m]>
it should be in there somewhere
<headius[m]>
enebo: I'm doing one more review of the IR flags but the PR is marked ready for review
<headius[m]>
without a repro it's hard for me to say if it fixes anything 🤷‍♂️
<abelsromero[m]>
I found the rubygems in jruby-complete but no sign of a version; I was expecting some descriptor file, but it's a 398-line gem. I assume I could invoke it somehow?
<headius[m]>
hmmm
<abelsromero[m]>
I am not versed in Ruby, sorry
<enebo[m]>
headius: we really need to make a nightly with it and have some brave soul just try it
<enebo[m]>
One person (I do not remember who offhand) hit the problem 100% of the time, but only in a real environment, not in dev/test
<headius[m]>
I see META-INF/jruby.home/lib/ruby/stdlib/rubygems/version.rb in jruby-complete 9.2.12
<headius[m]>
it's a Ruby file but could be parsed for the version string that looks like this
<headius[m]>
oh jeez nevermind
<headius[m]>
this is some rubygems internals
<headius[m]>
ok near the top of rubygems.rb there's this:
<headius[m]>
other than running `gem -v` that's the only suggestion I have
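A hypothetical Java helper for the plugin along the lines suggested here: scan rubygems.rb from the jruby-complete jar on the classpath for the VERSION constant that sits near the top of that file. The resource path follows the META-INF/jruby.home layout mentioned above; the class name and regex are illustrative, not an existing API.

    import java.io.BufferedReader;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class RubygemsVersion {
        private static final Pattern VERSION =
                Pattern.compile("VERSION\\s*=\\s*[\"']([^\"']+)[\"']");

        // Returns the RubyGems version string, or null if rubygems.rb
        // is not on the classpath (e.g. not a jruby-complete layout).
        public static String detect() throws Exception {
            String path = "META-INF/jruby.home/lib/ruby/stdlib/rubygems.rb";
            try (InputStream in = RubygemsVersion.class.getClassLoader().getResourceAsStream(path)) {
                if (in == null) return null;
                BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    Matcher m = VERSION.matcher(line);
                    if (m.find()) return m.group(1);
                }
            }
            return null;
        }
    }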
<headius[m]>
enebo: merging it to 9.2 will produce a snapshot
<abelsromero[m]>
I already checked the content looking for something version-like; another thing is that rubygems is not in jruby-core, so I'm not sure this is a good approach for all cases
<headius[m]>
yeah not in core, it would be in jruby-stdlib jar if using that layout
<headius[m]>
running without a normal on-filesystem JRuby install, if RubyGems is there, then rubygems.rb should be there
<headius[m]>
running against a JRuby install, it will be <jruby home>/lib/ruby/stdlib/rubygems.rb or run gem -v
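And the other route, shelling out to gem -v against an on-filesystem install. This is only a sketch: the bin/gem path (gem.bat on Windows) is an assumption and would need to come from the plugin's configured JRuby home.

    import java.io.BufferedReader;
    import java.io.InputStreamReader;

    public class GemVersion {
        // Runs <jrubyHome>/bin/gem -v and returns its first output line.
        public static String detect(String jrubyHome) throws Exception {
            Process p = new ProcessBuilder(jrubyHome + "/bin/gem", "-v").start();
            try (BufferedReader out = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String version = out.readLine();
                p.waitFor();
                return version == null ? null : version.trim();
            }
        }
    }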
<headius[m]>
thank you for trying to fix this issue btw, it has been a thorn in my sie
<headius[m]>
side
<abelsromero[m]>
my interest is for Asciidoctor; some people use the plugin to install Ruby extensions, and if we can't support the most recent versions, it's a matter of time before we hit a bigger problem
ur5us has joined #jruby
<abelsromero[m]>
another option is checking against actual jruby version, as a last resort...
<headius[m]>
ahh yeah well let's get this working right then
<headius[m]>
actual JRuby version is fine too, just need to figure out when we updated to rubygems that removed those flags
<headius[m]>
so hard to track changes to the flags enum because it's exposed as getFlags and "add" and "remove" are just from Set
<abelsromero[m]>
that's the first thing I got: 9.2.11.0
<headius[m]>
abelsromero: ok
<abelsromero[m]>
if I can't get the rubygems version, I'll resort to that 🤷‍♂️
<abelsromero[m]>
good thing this is not work, so I can take extra time
<headius[m]>
ok keep me posted and we can test things out
<headius[m]>
we are going to take over plugin maintenance as well, just a little bogged down with JRuby issues at the moment
<abelsromero[m]>
I already mentioned in the PR, I have no issue helping with cleaning up and putting the plugin up to date
<abelsromero[m]>
but I cannot take ownership, too many things already 😳
<headius[m]>
yeah any help you can provide is great ❤️
<headius[m]>
enebo: I can't be positive but it seems like all other scope and frame-related fields are set during the initial flag calculation
<headius[m]>
that happens lazily though so I'm unsure if they're a risk or not
<headius[m]>
in any case they do not appear to be updated as a result of full/jit preparation or passes
<enebo[m]>
I just know that these 3 are the ones exposed through IC itself, so it makes sense none of the others would matter too much
<enebo[m]>
if there are any others
<headius[m]>
there's not really any single frame flag, just the flags for frame fields... and they're static based on method call names, $~ existence, etc
<headius[m]>
ACP just looks at those flags and either adds or doesn't add frame instructions directly
<headius[m]>
which is already isolated to FullIC
<enebo[m]>
sure, but they are toggle-ons as well, so even if we had two threads running and hitting them, by the time one finished they would all still be true for the ones which should be
<enebo[m]>
unless we reset some of those at the beginning (which I am fairly sure we don't), they should be fine
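An illustration of this toggle-on argument in Java, with stand-in flag names rather than JRuby's actual flag code: adds to a set are idempotent, so two racing recomputations converge on the same set, as long as nothing resets it mid-run (which, per the discussion, nothing does).

    import java.util.Set;
    import java.util.concurrent.ConcurrentHashMap;

    public class MonotonicFlags {
        enum Flag { NEEDS_FRAME, USES_EVAL, ACCESSES_CALLER }

        public static void main(String[] args) throws Exception {
            Set<Flag> flags = ConcurrentHashMap.newKeySet();
            Runnable compute = () -> {
                flags.add(Flag.NEEDS_FRAME); // re-adding is a no-op
                flags.add(Flag.USES_EVAL);
            };
            Thread a = new Thread(compute), b = new Thread(compute);
            a.start(); b.start();
            a.join(); b.join();
            System.out.println(flags); // same union regardless of interleaving
        }
    }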
<headius[m]>
hmmm
<headius[m]>
yeah they don't appear to get memoized in IC either
<headius[m]>
everyone that accesses them goes back to the scope flags
<headius[m]>
there's one other thing I wasn't sure about
<headius[m]>
InterpreterContext has buildComplete()
<headius[m]>
it always returns true for IC, but for Full it checks linearizedBBList
<headius[m]>
I don't know where this fits into Full prep cycle but I'm worried about a partially-complete Full IC getting used
<headius[m]>
because it's created and set into the IRScope field before running passes
<headius[m]>
yeah I think this is a problem too
<headius[m]>
IRScope.fullInterpreterContext should not be visibly set until the full IC is actually ready
<enebo[m]>
well I think that is fine... I believe the main reason for this is for passes to run
<enebo[m]>
so probably the proper simple fix to this is to get a FIC as a local, run it to completion (which includes linearizing), then set the IRScope field once done
<enebo[m]>
This would mean the finish point is not construction but being set into IRScope
<enebo[m]>
which would remove the need to call buildComplete
<headius[m]>
yeah which would perhaps make passing it into the pass the right choice again
<enebo[m]>
but rationale is that FIC holds info and linearize cannot happen until after passes have run which will set more state in FIC
<headius[m]>
or the pass produces a full IC rather than making IRScope create a bogus one right away
<headius[m]>
so linearize is a final stage
<abelsromero[m]>
Last thing before going to bed -> 9.2.9.0 accepts "--ri" but rubygems has VERSION = "2.7.10", which does not match the official Ruby deprecation, which was in 2.6. So, not reliable. Tomorrow I'll start working on validating against the JRuby version. Thanks!
<enebo[m]>
well, we have n passes, so there would still be one I guess; it would just be constructed on the first one?
<headius[m]>
abelsromero: ok I leave this in your capable hands... I'm not sure about which versions are which when it comes to RG
<enebo[m]>
my other concern (not too much though) is that for opt/deopt we construct new FICs from existing ones (well, we start with what it has made, along with FIC info)
<enebo[m]>
so if we tie the FIC materializing to running a set of passes, we need to take that into consideration
<headius[m]>
I know we have code that just looks to see if full IC == null and if it's not proceed to use it
<headius[m]>
that is insufficient as a check
<enebo[m]>
Personally I do not see this as much of an issue. We should just guarantee, via a method like prepareFullBlah, that linearize is called before setting the field
<enebo[m]>
but yeah two things entering the same method to make it at the same time largely just means only one can win
<headius[m]>
prepareFullBuild does if (fullInterpreterContext != null && fullInterpreterContext.buildComplete()) return fullInterpreterContext;
<headius[m]>
otherwise it starts trying to do another full build
<enebo[m]>
yeah that logic should change
<enebo[m]>
it should make it a local var, then complete it, then set it
<headius[m]>
it is also synchronized
<headius[m]>
it can't though because passes use getFullIC to get it
<headius[m]>
it's a more complicated state problem than it seems
<headius[m]>
your thought of making the passes receive full IC directly might be an answer
<enebo[m]>
well I don't know
<headius[m]>
like you say they're called explicitly and the two called only ever run against full
<enebo[m]>
I mean I think one error is that fullInterpreterContext should not be set as a field until buildComplete()
<enebo[m]>
which means prepareFullBuildCommon() should not set a field
<headius[m]>
yeah
<headius[m]>
hmm that's doable
<enebo[m]>
I believe at one time that method did more than call the constructor
<enebo[m]>
then prepareFullBuild does not need anything more than a null check at front since it would already be totally done if set
<headius[m]>
so the last thing in prepareFullBuild should be setting the field, and it's a local until then, but the passes still need an alternate way to get it
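A sketch of the publication pattern being converged on here, using stand-in types (ScopeSketch, FullIC, Pass) rather than JRuby's real IRScope/FullInterpreterContext classes: build the full context into a local, hand that local to the passes, linearize last, and only then store the field, after which a simple null check means "fully built".

    import java.util.List;

    class ScopeSketch {
        interface Pass { void run(ScopeSketch scope, FullIC fic); }

        static class FullIC {
            boolean linearized;
            void linearize() { linearized = true; } // final stage of the build
        }

        private volatile FullIC fullIC;   // only ever holds a finished context
        private final List<Pass> passes;

        ScopeSketch(List<Pass> passes) { this.passes = passes; }

        synchronized FullIC prepareFullBuild() {
            if (fullIC != null) return fullIC; // non-null implies build complete

            FullIC fic = new FullIC();         // stays local while being built
            for (Pass pass : passes) {
                pass.run(this, fic);           // passes receive the FIC directly
            }
            fic.linearize();                   // linearize is the last step

            fullIC = fic;                      // publish the finished context
            return fic;
        }
    }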
<enebo[m]>
but as you can see beneath, it does complete it
<enebo[m]>
I need to glance at prepareForCompilation but it must be about the same thing
<enebo[m]>
yeah
<enebo[m]>
passes should receive it
<headius[m]>
largely the same but runs a list of passes
<enebo[m]>
yeah seems reasonable
<headius[m]>
DEFAULT_JIT_PASSES
<headius[m]>
which includes a lot of passes
<headius[m]>
I guess I will see what I can untangle here
<enebo[m]>
well I mean it will just mean one more param for each pass impl and the interface right?
<headius[m]>
yeah I suppose that would be fine
<headius[m]>
I don't know how much that change will cascade
<enebo[m]>
oh, and it extends abstract CompilerPass
<enebo[m]>
So in truth you only edit that and add FIC to run(), I think, and then the cheap way would be to save it to a field
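The CompilerPass-side change described here might look roughly like this, reusing the stand-in types from the sketch above (names hypothetical): the base class takes the FIC once and saves it to a field, so subclass bodies keep working unchanged.

    abstract class CompilerPassSketch {
        private ScopeSketch.FullIC fic; // the cheap way: save it to a field

        Object run(ScopeSketch scope, ScopeSketch.FullIC fic) {
            this.fic = fic;        // the one new parameter
            return execute(scope); // existing subclass logic runs unchanged
        }

        protected ScopeSketch.FullIC getFIC() { return fic; }

        abstract Object execute(ScopeSketch scope);
    }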
<headius[m]>
it starts to look nasty already
<headius[m]>
getCFG calls prepareFullBuildCommon expecting it to set the field
<headius[m]>
and getCFG is called in a million places
<headius[m]>
it does not call prepareFullBuild or passes
<enebo[m]>
so getCFG is always going to be off of FIC
<enebo[m]>
and so we should get from there and not IRScope
<headius[m]>
but it needs to be there to run passes
<enebo[m]>
getExecutedPasses() is also really on FIC and not IRScope
<enebo[m]>
but it does not run anything on a CFG until run is called
<enebo[m]>
and FIC passed to run will have a valid CFG
<enebo[m]>
It is interesting looking at this because I have already moved most of the stuff from IRScope to FIC but continued to access it via IRScope
<headius[m]>
what I'm saying is that everywhere that calls getCFG needs to be audited for whether it expects the fullInterpreterContext field to be set
<headius[m]>
everywhere that currently gets the side effect of setting that field has to be audited to work properly without setting that field
<headius[m]>
so everywhere that calls getCFG also has to be modified to accept full IC as a parameter rather than going to the field
<enebo[m]>
yeah, I believe those runCompilerPasses in IRScope just need to also get the FIC you are setting up, so all those getCFGs go away (or you get it from the FIC)
<enebo[m]>
any getCFG within the context of running a pass will have a FIC passed into it
<enebo[m]>
but yeah I agree all of them will be needed to be examined
<headius[m]>
might simply have to kill IRScope.getCFG completely and change every place to do the right thing against FIC
<headius[m]>
like I did for the limited changes in PR
<enebo[m]>
yeah, it is many references, but with FIC available as a field from the base CompilerPass, and knowing it can only happen in the context of run, it should be more manual than anything
<enebo[m]>
I admit it is not a tiny change but it does not feel like a hack to me
<enebo[m]>
It is just a continuation of promoting FIC over IRScope as the main player
<headius[m]>
yes
<enebo[m]>
I think you may remember I wanted to swap the direction on this a long time ago
<headius[m]>
yeah and that was how I approached fixing this for 9.2.12
<enebo[m]>
IRScope to get FIC is weirder than asking FIC for IRScope
<headius[m]>
it just got way too big too fast
<headius[m]>
I can change passes to accept IC without a big impact
<enebo[m]>
but we also ended up fixing a lot of issues for fosdem by killing IRScope references in runtime
<headius[m]>
and then go from there to handling getCFG and FIC lifecycle
<enebo[m]>
yeah it seems doable but quite a bit of editing
<enebo[m]>
I would not view this as being risky either
<headius[m]>
IntelliJ, don't fail me now
<enebo[m]>
but I have to help with dinner... godspeed
nirvdrum has quit [Ping timeout: 240 seconds]
thegeekinside has joined #jruby
thegeekinside_ has joined #jruby
thegeekinside has quit [Quit: Leaving]
thegeekinside_ has quit [Quit: Leaving]
thegeekinside has joined #jruby
thegeekinside has quit [Remote host closed the connection]