ur5us has quit [Ping timeout: 244 seconds]
ur5us has joined #jruby
ur5us has quit [Ping timeout: 260 seconds]
nirvdrum has joined #jruby
sagax has quit [Read error: Connection reset by peer]
<enebo[m]> kares: JRUBY_OPTS="--dev -Xjit.logging -Xjit.logging.verbose -Xjit.threshold=0 -Xdebug.parser" jruby -r optparse -e 1
<enebo[m]> If you try this on master you will see a Java proxy class get setup via compilation where it passes a null superClass to newClass.
<enebo[m]> Not sure if you would care to look at it or not but I figured you know that code
<enebo[m]> kares: if that gets fixed interestingly we will see a second error raising NPE when compiling optparse
<enebo[m]> kares: which goes back to 9.2...that is something else though :P
<kares[m]> heh, reproduced. is this in an issue on the tracker?
<kares[m]> would be interested to look into but my work-queue is too full ... maybe tomorrow or over the weekend
<enebo[m]> kares: ah yeah no rush at all...it is only on master atm
<enebo[m]> but no issue
<enebo[m]> I was trying to debug IRFlags changes and there are lots of errors in our logging code now and then I realized multiple errors have nothing to do with my changes
<headius[m]> I fixed a bug in jit logging on ir_concurrency
<headius[m]> NPE in InterpretedIRMethod
<headius[m]> er maybe it was InterpretedIRBlockBody... it was returning null for implementationClass
<enebo[m]> well I have another now since fic never gets set on failed compile we cannot see what it tried to do
<enebo[m]> My changes work for startup interp but I hit some bugs when I switch to Full/JIT
<enebo[m]> now that I am debugging I am hitting things
<enebo[m]> I did see that problem so II probably had not pulled since then
<kares[m]> yet ... another attempt to run CI on Win: https://github.com/jruby/jruby/pull/6340
<kares[m]> wonder if we should finish that with pends - most seems to be io/posix issues
<enebo[m]> kares: pend with windows platform and that is just fine
<enebo[m]> kares: I think seeing what new things we break is better than not running it at all
<kares[m]> okay I will look into that and add a test there for the regression
<kares[m]> will re-target fir 9.2 it is still merged to master, right?
<headius[m]> kares: not surprising
<headius[m]> yes we are merging periodicailly
<kares[m]> yy not at all I expected worse ;)
<kares[m]> simce tgat suite is 'crazy'
<kares[m]> * since that suite is 'crazy'
<headius[m]> I modified spec tags on master to ad RbConfig::CONFIG['host_os'] to excluded tags so if you copy that over to 9.2 branch it should be possible to tag as "mswin" or something
<headius[m]> I can't remember what we use for host_os on Windows but it's always the same
<kares[m]> this is only test:jruby
<headius[m]> might be mswin32 😟
<headius[m]> oh I see
<headius[m]> still no posix IO for most stuff on Windows
<kares[m]> tagged some in the code already
<headius[m]> ok
<kares[m]> will do the rest with TODOs
<headius[m]> we used to run specs on windows but that was years ago... would be a project for a day or two
<headius[m]> kares: yeah maybe we should file a bug with that tagging commit so we don't just forget about them
<headius[m]> wish we had a better mechanism for tagging test/unit
<kares[m]> ok
joast has joined #jruby
<headius[m]> both we and MRI do consider encoding in String hash calculation, but we don't use the same logic for symbol bytelist hashing
<headius[m]> so that's probably the only remaining fix to get identical symbols with different encodings working
<enebo[m]> I am not sure
<enebo[m]> we already will make any 7bit sequence into US-ASCII before it enters the table
<headius[m]> yeah I think that's wrong
<headius[m]> what we should do instead is make it US-ASCII only if the original encoding is 7bit-compatible and the content is all 7-bit
<headius[m]> I think
<headius[m]> otherwise we leave it as is and use the encoding index as part of the hash
<enebo[m]> I have never seen a 7bit clean symbol not be US-ASCII but I guess it is not us-ascii compatible string but happens to be us-ascii bytes maybe?
<headius[m]> yes
<headius[m]> in the example case it's an ascii string forced into utf-16
<enebo[m]> The parser does do this check though. Although it could be wrong
<headius[m]> it may be unlikely that we see bugs with this anymore in practice, but that particular report is still broken
<enebo[m]> I actually don't know of a scenario where that could happen either but I can believe it could
<headius[m]> also we scan for code range in #intern which rejects that forced UTF-16 string
<headius[m]> MRI does not reject it
<headius[m]> they appear to happily make symbols that are CR_BROKEN
<headius[m]> which is... weird
<enebo[m]> which intern?
<headius[m]> String#intern
<headius[m]> I added another comment after the one linked above
sagax has joined #jruby
<enebo[m]> I admit I have not looked in a while but this surprises me
<headius[m]> I think if we review their logic again we'll be able to sort out which piece go where
<enebo[m]> symbol in RubySymbol should be an id into the symbol table as a string which is always ISO-8859_1 bytes which will never fail to intern
<headius[m]> but they do not appear to check for CR_BROKEN on the way into symbol
<headius[m]> we do
<enebo[m]> I think we do have String entry points which maybe is not honoring that but I believe that is all coming from Java and should then all be valid UTF-16LE
<enebo[m]> Admittedly there are likely loose ends but in theory that string should not fail to intern
<headius[m]> this is just RubyString#intern for a bad UTF-16 string
<headius[m]> as in the linked issue's original repro
<enebo[m]> which I did not read :P
<headius[m]> if I remove that check, though, we still have the old behavior and are not distinguishing between the "ab" string forced to UTF-16 and the "ab" string that's just US-ASCII
<enebo[m]> at I see so RubySymbol itself could be fine here with Broken
<headius[m]> yeah seems to be
<enebo[m]> it would just be a really messed up symbol that only this RubyString would be a path to
<headius[m]> and even if string contents are 7-bit, it's still keeping the UTF-16 and US-ASCII versions of symbol separate
<headius[m]> right
<enebo[m]> if you wanted to get it by that again (or itself obviously)
<headius[m]> obviously any other UTF-16 string should not collide with a US-ASCII string ever
<headius[m]> unless there's embedded nulls, which would get rejected other ways usually
<enebo[m]> so now having properly read what you said with it being RubyString.intern() it looks like we could almost just remove the coderangescan
<headius[m]> it seems so
<headius[m]> RubySymbol uses RubyString hashing?
<headius[m]> I mean the symbol table
<enebo[m]> oh yeah ok that is the other part of that issue right?
<headius[m]> yeah
<headius[m]> from what I can see all String#intern symbols just go through getSymbol(byteList) which does not consider encoding for hash
<headius[m]> CRuby explicitly uses rb_str_hash for it
<headius[m]> (which incidentally also does the hash DOS stuff, we are not doing in symbol table)
<headius[m]> our symbol hash appears to just be the iso-8859-1 bytes, in javaStringHashCode
<enebo[m]> tbh I am not sure I have much more invested in this conversation today. It seems like you are close to having figured this out already.
<enebo[m]> it is as far as I remember
<enebo[m]> which is why a RubyString will probably just find the broken CR symbol from a string
<headius[m]> yeah I can look into it but wanted to get your thoughts
<headius[m]> it shouldn't be hard to make logic match MRI and this issue has been dangling so I thought I'd knock it down
<headius[m]> on master
<enebo[m]> I think removing just that check will address the case of broken but I still believe two weird ISO compatible strings containing the 8th bit will not work
<enebo[m]> which is where that last comment of mine came from
<headius[m]> I'll try to align logic for 9.3 and see if I can kill this issue today
<enebo[m]> but 8bit 8859-11 and 8859-13 (made these up) will still not work
<enebo[m]> I don't think this is a very significant incompatibility but since we do not tuple encoding with those bytes I think that will still be broken
<enebo[m]> broken strings probably are actually fairly common
<headius[m]> yeah unsure about that case
<headius[m]> I think e.g. 11 and 13 may be compatible under 7bit so they wouldn't show up as 7-bit only if there were differing characters?
<enebo[m]> the issue as I see it is they will be identical 8859_1 hashes
<enebo[m]> but the 8bit values will mean different things
<headius[m]> but if both had same chars and were all under 7bit they would collide in MRI too I thinkn
<enebo[m]> no but they are not only 7bit
<headius[m]> if they're not 7bit MRI will include encoding index in hash
<enebo[m]> they contain something 8bit which is the same byte but a different character
<enebo[m]> yeah MRI will but we won't
<headius[m]> right, which is what I will try to fix
<enebo[m]> ok
<headius[m]> basically just has to propagate some string hash logic into symbol table
<enebo[m]> that will be great to fix it
<headius[m]> might need to store CR
<headius[m]> I'll see
<enebo[m]> CR can be calculated but it forces a scan so it is probably nice to prop it
<enebo[m]> I still sort of wish it was in bytelist
<enebo[m]> going up to start some lunch
<headius[m]> yeah
<headius[m]> patch got bigger than expected but I have it propagating CR into Symbol now and using String hashing for the table
<headius[m]> it seems like most places that could pass a stored CR through already do... and parser already saves RubySymbol in SymbolNode
<headius[m]> there are a few places, mostly coming from Java that only have a String or ByteList in hand
<headius[m]> gotta get use to this new icon for riot/element
<headius[m]> bbiab
<enebo[m]> ok
<headius[m]> if we ever move CR into ByteList it can just become CodeRangeable and work with this change
<kares[m]> https://github.com/alibaba/dragonwell8_jdk/pull/18 coroutines used in production ;)
<headius[m]> cool
<headius[m]> need to spend a little time making a loom backend for fibers
<headius[m]> shouldn't be too difficult
ruurd has quit [Quit: ZZZzzz…]
<headius[m]> hmm ok symbol patch isn't quite right
<headius[m]> aha, unmarshaling a symbol is not properly encoding-aware
<headius[m]> MARSHAL
ruurd has joined #jruby
travis-ci has joined #jruby
travis-ci has left #jruby [#jruby]
<travis-ci> jruby/jruby (ir_concurrency:9a2d272 by Thomas E. Enebo): The build was broken. https://travis-ci.org/jruby/jruby/builds/711222898 [159 min 39 sec]
<enebo[m]> wot
<headius[m]> shame
<enebo[m]> lol if this is my last commit what am I watching in that other tab :)
<headius[m]> oh concurrent-ruby
<headius[m]> that suite has been marked allow-failures on master because there's a bad test
<headius[m]> yeah it's the same one
<headius[m]> bad thread accounting
<enebo[m]> well I am not even concerned about that although it is good to know it was from that
<enebo[m]> I had some other tab open with about 40% of the tests green and the rest yellow
<enebo[m]> If I click on the commit it appears to be all my last changes
<enebo[m]> What am I watching?
<enebo[m]> OH I SEE
<headius[m]> PRs done against origin test twice
<enebo[m]> It does one build for the commit and one for the PR
<headius[m]> not sure how to prevent that but still allow branches to CI
<enebo[m]> ok makes sense too but I thought I was a bad luck time traveller for a moment
<headius[m]> yeah it's annoying
<headius[m]> the push (branch) CI always runs first, fwiw
<enebo[m]> so I may be as done as I can be without rewriting it to be the right design
<headius[m]> hot crackers
<enebo[m]> but I do have one more thing to do (well two)
<enebo[m]> 1. Make sure nothing has stopped compiling and is falling back to interp
<enebo[m]> 2. Sort of related but make sure the same opts are happening before/after
<headius[m]> ugh this symbol unmarshal logic blows
<enebo[m]> #2 proves #1 already but it is not super simple...I
<headius[m]> stopped compiling as in couldn't JIT to bytecode?
<enebo[m]> headius: stopped compiling as it hit and error and bailed out
<enebo[m]> I think I can trivially just change test:mri:core:jit probably to do verbose logging on jit and see what shows up
<enebo[m]> but I am not super concerned about this as much as determining whether the passes actually did stuff like eliminate dyn scopes and stuff like that
<enebo[m]> even examining the flag does not tell me anything more than it might have happened
<headius[m]> oh like you aren't sure jit is still doing what it should after your changes
<headius[m]> compiler specs are pretty unforgiving so that's one assurance
<headius[m]> if something fails in there now it's broken in full or jit logic for other stuff too
<headius[m]> it doesn't have any fallback
<enebo[m]> headius: do spec:compiler actually depend on all passes running a particular way?
<headius[m]> they don't check anything related to passes but they run jit like jit would run
<headius[m]> they are just testing that jit works and new code behaves as expected
<enebo[m]> oh sure so it probably is a better indicator of #1 than #2
<headius[m]> yeah
<enebo[m]> ok
<enebo[m]> I was freaked this morning when I uncovered the getImplementationClass NPE
<headius[m]> deeper testing could be done against full to make sure the code structure is as expected
<enebo[m]> I was thinking oh crap we have some stuff not compiling that precedes all this
<enebo[m]> funny it was literally just the log statement that was broken (as far as we know)
<headius[m]> I guess neither full nor jit would be "easy" to verify because we're basically looking at the "assembly" after and making sure it's what's expected
<enebo[m]> well if you know JIT is eliminating a scope then that would be cool
<enebo[m]> I think largely that and scopes replaced with temps are the two main opts I want to make sure have not changed
<enebo[m]> I don't want any of them to stop but those are bigger ones from a changed perf perspective
<enebo[m]> I cannot do before/after on all flag state because I made all scopes get flags whereas we used to not do it for several scopes
<headius[m]> yeah it would be easier to do against full
<headius[m]> once it's in bytecode all I can really inspect is the bytecode
<enebo[m]> yeah I think my plan is to run same thing with same printlns before/after and those prints will say I removed dyn scope for X
<headius[m]> we have some pass listener framework I've never tried to use
<headius[m]> could possibly hook into that
<enebo[m]> I think if there is no interleaved output I can just sort and compare
<enebo[m]> yeah it exists largely to report what was run but not how it was run
<enebo[m]> So I can verify the same passes are run but I already know they are
<enebo[m]> I maybe will think a little about this though. It would be nice to report what happened in a pass. compilerpasslistener could be instrumental in making something we could regularly test
<enebo[m]> maybe some void actionPerformed(pass, fix, data...) or something
<headius[m]> I think marshal should be registering ByteList for symbols
<headius[m]> MRI appears to do something similar before they actually create a symbol object
<enebo[m]> Although it is not perhaps a great idea for next 24 hours in that I need to apply it to older code too ... same could be said of printlns but new code uses fic and old uses IRScope so println probably is simpler
<headius[m]> that's the only way I can untangle this lifecycle
<headius[m]> yeah actionPerformed kind of callback would be good
<headius[m]> maybe throw together an enum of actions they take and just feed those out if there's a listener
<headius[m]> we can improve it later
<enebo[m]> yeah I think it could also just be a log as well but as a listener testing could register that and pull it into Ruby
<headius[m]> right
<enebo[m]> oh I plan on it being string data largely
<enebo[m]> based on my 3 minutes of thinking about it
<enebo[m]> but enum has disadvantage of being a pain to extend and if it will end up as Ruby we do not gain a lot from it
<headius[m]> just a more structured way to log them really
<headius[m]> we have no consumers of it so we could reevaluate the enum later too
<enebo[m]> yeah I think at this point I also need to evaluate what an action really is for each thing
ur5us has joined #jruby
<headius[m]> well this is wacky but it's working
<headius[m]> enebo: I've finished additional fixes for the symbol encoding thing at https://github.com/jruby/jruby/pull/6341
<headius[m]> the marshaling rework puts us more in line with CRuby as far as the workflow of getting a symbol out of the marshal stream
<headius[m]> additionally I had to remove all "fast" paths from a java.lang.String to Symbol because they were using iso-8859-1 hash... the only way to get a symbol now is via a ByteList, so the only "fast" way is to keep the ByteList (and ideally a CR) on hand
<headius[m]> we will want to audit all newSymbol/getSymbol(java.lang.String) paths and try to find ways to propagate a ByteList through
<headius[m]> some like method_missing and const_missing currently use a Java String and will now be creating intermediate ByteList along the way
<headius[m]> it might be possible to restore the "fast" paths by doing all the hash calculation from the String bytes as though they have been decoded to UTF-8... I will take a quick look to see if that's possible
<headius[m]> I don't know if it would be more efficient than just creating the UTF-8 ByteList and using that, but it may be possible to eliminate the alloc
<headius[m]> I've updated the issue with a description of the two extra pieces and will look into fast pathing some of this now
<headius[m]> heh
<headius[m]> fun with optimization... I have a method that can take a Java String and calculate both Ruby hash and code range from it without any allocation, using our existing ByteBuffer cache
nirvdrum has quit [Ping timeout: 246 seconds]
<headius[m]> yay
<headius[m]> I have fast String to Symbol logic working with minimum allocation
<headius[m]> no reason not to go forward with this now