sagax has quit [Remote host closed the connection]
sagax has joined #jruby
nirvdrum has joined #jruby
nirvdrum has quit [Remote host closed the connection]
nirvdrum has joined #jruby
Antiarc_ has quit [Remote host closed the connection]
Antiarc has joined #jruby
<headius[m]>
enebo: yo
<headius[m]>
so where do we stand and what can I do for .13
<enebo[m]>
you can review the PR but I also pinged johnathon to test again
<enebo[m]>
is he on matrix?
<headius[m]>
yeah johnphillips31416
<headius[m]>
I will start having a look at the PR
<headius[m]>
frustratingly nothing I threw at it would break, regardless of how I loaded/evaluated methods, how many threads I use, structure of the methods and nested closures
<enebo[m]>
I verified dynscope removal on all of test:mri:core:jit is the same
<headius[m]>
there's some complexity to the failing cases that I have not been able to reproduce synthetically (and I don't know what the failing cases actually look like)
<enebo[m]>
I think another thing I will try this afternoon is endlessly making new methods and then forcing them to jit
<enebo[m]>
across threads
<headius[m]>
I tried normal jit mode, -X-C, different thresholds... all that
<enebo[m]>
if it is a timing issue eventually a race should happen
<headius[m]>
logged jit output to see that things were compiling
<headius[m]>
saw occasional double jitting but no failures
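The "double jitting" above is the classic check-then-act race on a tier-up counter. Below is a minimal Java sketch of one way to rule it out. It assumes a hypothetical CountingMethod with a simple call-count threshold and is not JRuby's actual JIT code. Only the thread that wins a compareAndSet does the compile, and a CountDownLatch start gate releases all workers together to create the kind of contention the stress tests were after:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of a method that tiers up from interpreter to JIT after a
// call-count threshold. Illustrative only; not JRuby's actual JIT code.
class TierUpSketch {
    static final class CountingMethod {
        private final int threshold;
        private final AtomicInteger calls = new AtomicInteger();
        private final AtomicBoolean jitClaimed = new AtomicBoolean();
        private volatile boolean jitted;
        final AtomicInteger compileCount = new AtomicInteger();

        CountingMethod(int threshold) {
            this.threshold = threshold;
        }

        void call() {
            if (!jitted
                    && calls.incrementAndGet() >= threshold
                    // Only the thread that flips false -> true may compile, so two
                    // callers crossing the threshold together cannot both jit.
                    && jitClaimed.compareAndSet(false, true)) {
                compileCount.incrementAndGet(); // stand-in for the real compile step
                jitted = true; // publish: later calls skip the counter entirely
            }
            // ... interpret, or run the compiled body once jitted is true ...
        }
    }

    // Runs callsPerThread calls on each of `threads` threads, released together by a
    // start gate to maximize contention; returns how many compiles happened.
    static int stress(int threads, int callsPerThread, int threshold) {
        CountingMethod method = new CountingMethod(threshold);
        CountDownLatch gate = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    gate.await(); // wait for the start signal
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (int i = 0; i < callsPerThread; i++) {
                    method.call();
                }
            });
        }
        gate.countDown(); // open the gate so waiting workers start at once
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return method.compileCount.get();
    }

    public static void main(String[] args) {
        System.out.println(stress(8, 10_000, 50)); // prints 1
    }
}
```

With the guard in place the compile count stays at 1 however many threads race. Replacing the compareAndSet with a plain boolean check is what makes occasional double compiles possible.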
<enebo[m]>
but it might take hundreds to thousands of methods JITing at the same time for it to fail
<headius[m]>
hmmm
<enebo[m]>
based on how long it seems to take to hit it I am thinking thousands of possible races before one hits it
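To put rough numbers on that intuition, assume an idealized model where each concurrent JIT attempt independently hits the bad interleaving with some fixed probability p. Real races are not independent, so this is only a back-of-the-envelope illustration and not measured data. Under that assumption, the chance of at least one failure in n attempts is:

```latex
\begin{aligned}
P(\text{at least one failure in } n \text{ attempts}) &= 1 - (1 - p)^{n} \approx 1 - e^{-np} \quad (\text{small } p) \\
p = 10^{-3},\ n = 1000:\quad 1 - (0.999)^{1000} &\approx 1 - e^{-1} \approx 0.63
\end{aligned}
```

So a race that needs "thousands" of attempts to show up corresponds to a per-attempt probability somewhere around 10^{-3} or lower in this model.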
<headius[m]>
perhaps multiple independent stacks of closures in the same method
<headius[m]>
so you have one stack triggering jit at method level while another stack is still happy to interpret
<enebo[m]>
that was what I was trying to do originally
<headius[m]>
the DynamicScope error is a closure problem for sure
<headius[m]>
the stack error likely is as well
<enebo[m]>
force one sibling to JIT then second sibling would not
<enebo[m]>
part of the problem is an old closure activation has to be alive before a new one JITs
<enebo[m]>
or happen to get used at same time the new one JITs and is used
<headius[m]>
yeah that makes sense
<enebo[m]>
I guess perhaps a Thread which lives with a closure which reaches out of it but then a parent closure jits outside the thread?
<headius[m]>
basically the same as my messages to you on Friday but with one possible enhancement we could make: force all 7-bit clean identifiers to always use US-ASCII, explicitly acknowledging that it's not possible to have differently-encoded 7-bit identifiers
<headius[m]>
I was not quite sure if we're doing that now
<headius[m]>
I don't see this limitation as particularly damning, since it's pretty edgy to be running a system with lots of mixed-encoding identifiers in the first place, and even edgier to be doing so with ASCII-compatible 7-bit identifiers and expecting them to all be different logical identifiers separated only by encoding
<enebo[m]>
Well 7bit == USASCII was the intent and it should work that way coming in through the parser
<headius[m]>
ok, then it may be this only needs a "fix" in String#intern to always force such symbols to US-ASCII
<headius[m]>
that is the root cause from #1348
<enebo[m]>
it has to be ascii compatible encoding as a source
<enebo[m]>
yeah this is probably just that code path itself
<headius[m]>
it will be an explicitly behavioral difference from CRuby, but it will be clearly spelled out: if you have 7-bit ASCII bytes it will be a US-ASCII symbol, always
<enebo[m]>
It is more inconsistent on their parts if so
<headius[m]>
that will avoid the cases where the first symbol intern that "wins" has some other encoding and others weirdly inherit it
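Below is a minimal sketch of the proposed String#intern rule. It is written in Java because JRuby's symbol table lives there, but the SymbolInternSketch and Sym classes and the use of plain encoding-name strings are hypothetical simplifications, not JRuby's API. Any 7-bit byte sequence is normalized to US-ASCII before lookup. Interning the same bytes under different encodings therefore returns one symbol, instead of one tagged with whichever encoding happened to be interned first:

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed intern rule; encodings are plain names here,
// and none of these classes are JRuby's real symbol table API.
class SymbolInternSketch {
    static final class Sym {
        final byte[] bytes;
        final String encoding;

        Sym(byte[] bytes, String encoding) {
            this.bytes = bytes;
            this.encoding = encoding;
        }
    }

    // Keyed by (effective encoding, bytes); a string key keeps the sketch short.
    private final Map<String, Sym> table = new HashMap<>();

    static boolean isSevenBit(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) return false; // high bit set: not 7-bit clean
        }
        return true;
    }

    Sym intern(byte[] bytes, String encoding) {
        // Proposed rule: 7-bit bytes always become a US-ASCII symbol, whatever the
        // source encoding, so the first intern no longer decides the encoding.
        // Caveat from the discussion: a 7-bit EBCDIC identifier would be mislabeled.
        String effective = isSevenBit(bytes) ? "US-ASCII" : encoding;
        String key = effective + ":" + Arrays.toString(bytes);
        return table.computeIfAbsent(key, k -> new Sym(bytes.clone(), effective));
    }

    public static void main(String[] args) {
        SymbolInternSketch syms = new SymbolInternSketch();
        byte[] foo = {'f', 'o', 'o'};
        Sym a = syms.intern(foo, "ISO-8859-13");
        Sym b = syms.intern(foo, "UTF-8");
        System.out.println(a == b);     // true: one symbol
        System.out.println(a.encoding); // US-ASCII
    }
}
```

Under this rule the #1348 shape (7-bit bytes tagged as UTF-16) also lands on the US-ASCII symbol, while non-7-bit identifiers keep their own encoding.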
<enebo[m]>
their parser will also make clean 7bit in ascii compat encodings US-ASCII
<enebo[m]>
if they allow it from String that is an odd man out behavior
<headius[m]>
yeah, and the case in 1348 was not even CR_VALID anymore, so it's stretching the boundaries pretty far
<enebo[m]>
ultimately though 7bit means ascii 7 bit so it being a different encoding does not really work in anyones favor
<headius[m]>
it was US-ASCII bytes pretending to be a valid UTF-16 string
<enebo[m]>
yeah so it really makes you wonder why they removed the error
<enebo[m]>
I would not bet money but I suspect that error was us following them
<headius[m]>
that is clearly contrived, but even the valid cases where you might have something with ASCII bytes encoded as 8859-13 still would work fine if we just force them back to US-ASCII in all cases
<enebo[m]>
yeah since the 7bit would be the same chars/bytes
<headius[m]>
wanting to differentiate valid 7-bit identifier collisions is a bridge too far
<enebo[m]>
It should not have any effect other than a reflective one
<headius[m]>
and has very little value
<headius[m]>
right
<enebo[m]>
:foo and :foo should hash different
<headius[m]>
plus there's a ton of logic throughout Ruby to allow US-ASCII strings to interact non-destructively with any ASCII-compat encoding
<enebo[m]>
I mean it is a very weird real-world issue... when would you want the same string with the same chars to be a different key?
<enebo[m]>
yeah tons of logic which says...oh all ascii make 7bit
<enebo[m]>
And this is not even about correctness
<headius[m]>
yeah basically that's it
<enebo[m]>
They do it so they can know how to walk it quickly and get length easily
<headius[m]>
7bit is treated as US-ASCII almost everywhere, until it needs to be something else... and if that something else is non-7bit, it graduates to the proper encoding
<headius[m]>
I will look into this as post .13 work after I review IR patches
<enebo[m]>
as far as m17n rules goes this is one of the easiest
<headius[m]>
you're right, it is actually more internally consistent
<enebo[m]>
anything which is 7bit will become whatever non-7bit value it is added to (so long as encoding is ascii compat.)
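The "graduation" rule can be sketched as a function that picks the result encoding when two strings are combined. This is a simplified, hypothetical model loosely based on how CRuby's rb_enc_compatible treats 7-bit strings. The real logic also handles empty strings and other cases omitted here:

```java
// Simplified, hypothetical model of Ruby's rule for combining strings: a 7-bit
// string adopts the other side's encoding when both encodings are ASCII-compatible.
// Loosely based on CRuby's rb_enc_compatible; empty strings and other cases are omitted.
class EncodingCompatSketch {
    static final class Str {
        final byte[] bytes;
        final String encoding;
        final boolean asciiCompatible; // is `encoding` ASCII-compatible?

        Str(byte[] bytes, String encoding, boolean asciiCompatible) {
            this.bytes = bytes;
            this.encoding = encoding;
            this.asciiCompatible = asciiCompatible;
        }

        boolean isSevenBit() {
            for (byte b : bytes) {
                if ((b & 0x80) != 0) return false;
            }
            return true;
        }
    }

    // Encoding of a + b, or null where (in this simplified model) Ruby would raise
    // Encoding::CompatibilityError.
    static String compatibleEncoding(Str a, Str b) {
        if (a.encoding.equals(b.encoding)) return a.encoding;
        if (!a.asciiCompatible || !b.asciiCompatible) return null;
        if (b.isSevenBit()) return a.encoding; // 7-bit b just takes a's encoding
        if (a.isSevenBit()) return b.encoding; // 7-bit a "graduates" to b's encoding
        return null; // both have high bytes, in different encodings
    }

    public static void main(String[] args) {
        Str ascii = new Str(new byte[] {'a', 'b', 'c'}, "US-ASCII", true);
        Str utf8 = new Str(new byte[] {(byte) 0xC3, (byte) 0xA9}, "UTF-8", true); // "é"
        System.out.println(compatibleEncoding(ascii, utf8)); // UTF-8
    }
}
```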
<headius[m]>
I suppose there's an additional case that would represent an actual flaw in JRuby: ASCII-incompatible encoding that happens to only use 7-bit characters
<enebo[m]>
The only weird bit of this is that Symbols will go US-ASCII but Strings will keep their encoding
<headius[m]>
I don't know such an encoding in general use offhand
<headius[m]>
EBCDIC would be one
<enebo[m]>
is ebcdic ascii?
<headius[m]>
it is not
<headius[m]>
so if you had an EBCDIC identifier we would incorrectly mark it as US-ASCII and it would never be retrievable as EBCDIC
<headius[m]>
I mean if that identifier only used 7-bit range, which would mean like half the alphabet and all numbers are off the table
<enebo[m]>
but I think the real problem here is not that the parser or the string functions would make that symbol wrong
<enebo[m]>
It is that if somehow the same chars end up coming in from a non-EBCDIC source, they will collide
<headius[m]>
yes, just that if you wanted to view it as EBCDIC it would be nonsense
<headius[m]>
yeah
<enebo[m]>
and I suppose if you use a gem from the internet, those files will not be EBCDIC
<headius[m]>
and I was wrong, it takes the entire alphabet off the table
<enebo[m]>
I do not want to be glib, but if people want EBCDIC symbol encodings, JRuby is not for them
<headius[m]>
so it would seem unlikely you'd have a 7-bit ebcdic identifier that would be very useful
<enebo[m]>
I did actually know it was not ascii-compat
<headius[m]>
the only characters in 7-bit range are symbols
<headius[m]>
so like... the ! method would encode improperly
<headius[m]>
and ==, ===, !=, `, etc
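To make the EBCDIC point concrete: in code page 037 (IBM037), letters and digits all have the high bit set, while much of the punctuation falls in 0x00..0x7F. The sketch below hard-codes a handful of CP037 entries. It is a partial table, so the example does not depend on the JDK shipping that charset. It shows what an EBCDIC operator name turns into if its bytes are relabeled as US-ASCII:

```java
import java.util.HashMap;
import java.util.Map;

// A few entries from EBCDIC code page 037 (IBM037), hard-coded so the sketch does
// not depend on the JDK shipping that charset. Partial table, for illustration only.
class EbcdicSevenBitSketch {
    static final Map<Character, Integer> CP037 = new HashMap<>();

    static {
        CP037.put('!', 0x5A);
        CP037.put('=', 0x7E);
        CP037.put('`', 0x79);
        CP037.put('a', 0x81); // lowercase letters start at 0x81
        CP037.put('A', 0xC1); // uppercase letters start at 0xC1
        CP037.put('0', 0xF0); // digits are 0xF0..0xF9
    }

    static boolean isSevenBit(int b) {
        return (b & 0x80) == 0;
    }

    // Encode with the CP037 entries above, then read the bytes back as US-ASCII, the
    // way a "7-bit means US-ASCII" intern rule would. Returns null if any byte is not 7-bit.
    static String readAsUsAscii(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            Integer b = CP037.get(c);
            if (b == null) throw new IllegalArgumentException("not in this partial table: " + c);
            if (!isSevenBit(b)) return null; // letters/digits: high bit set
            sb.append((char) b.intValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(readAsUsAscii("!=")); // Z~
        System.out.println(readAsUsAscii("==")); // ~~
        System.out.println(readAsUsAscii("a"));  // null
    }
}
```

So the EBCDIC bytes for != come back as Z~, which is the "nonsense" view described above, and any identifier containing a letter is not 7-bit in the first place.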
<enebo[m]>
for (c = 'A'; c <= 'Z'; ++c) putchar(c);
<enebo[m]>
lol
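The joke works because EBCDIC letters are not contiguous. In CP037 the uppercase letters sit in three runs (A-I at 0xC1..0xC9, J-R at 0xD1..0xD9, S-Z at 0xE2..0xE9), so a loop from 'A' to 'Z' also visits the gap bytes. Here is a small Java sketch that counts them, with the ranges hard-coded from CP037:

```java
// Uppercase letters in EBCDIC code page 037 occupy three runs: A-I, J-R, S-Z.
class EbcdicLetterGapSketch {
    static final int[][] UPPER_RUNS = {{0xC1, 0xC9}, {0xD1, 0xD9}, {0xE2, 0xE9}};

    static boolean isUpper(int b) {
        for (int[] run : UPPER_RUNS) {
            if (b >= run[0] && b <= run[1]) return true;
        }
        return false;
    }

    // Mirrors for (c = 'A'; c <= 'Z'; ++c) with 'A' = 0xC1 and 'Z' = 0xE9 in CP037.
    // Returns {byte values visited, how many of them are letters}.
    static int[] countAtoZ() {
        int visited = 0, letters = 0;
        for (int c = 0xC1; c <= 0xE9; c++) {
            visited++;
            if (isUpper(c)) letters++;
        }
        return new int[] {visited, letters};
    }

    public static void main(String[] args) {
        int[] r = countAtoZ();
        System.out.println(r[0] + " byte values, " + r[1] + " letters"); // 41 byte values, 26 letters
    }
}
```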
<enebo[m]>
we inherited a lot of weird shit in Java as a result of C
<headius[m]>
yeah I used ebcdic as an example because it's so broken in other ways
<enebo[m]>
yeah for sure
<headius[m]>
but it's the only one I could think of that's an 8-bit encoding completely incompatible with ASCII
<enebo[m]>
but ultimately all of these discussions evolve into weird combinations
<headius[m]>
there might be some CJK encodings that are problems
<headius[m]>
EUC-JP is weird
<headius[m]>
most encodings in wide use have accepted they have to be ASCII-compat though
<headius[m]>
ASCII won
<headius[m]>
G0 is almost always an ISO-646 compliant coded character set such as US-ASCII, ISO 646:KR (KS X 1003) or ISO 646:JP (the lower half of JIS X 0201) that is invoked on GL (i.e. with the most significant bit cleared). An exception from US-ASCII is that 0x5C (backslash in US-ASCII) is often used to represent a Yen sign in EUC-JP (see below) and a won sign in EUC-KR.
<headius[m]>
so this is why you see the ¥ symbol in older Windows paths on JP machines
<headius[m]>
but otherwise it's basically compat
<enebo[m]>
yeah I do remember knowing about why there was a yen symbol at one point
<enebo[m]>
but windows codepages
<enebo[m]>
gives me enough to realize anything probably is anything somewhere in the world
<headius[m]>
yeah it's an intractable problem for locales that have not accepted unicode
<headius[m]>
but even Ruby has gone to default UTF-8 internally
<headius[m]>
if only the byte had been 2^4 bits instead of 2^3 we might never have had this problem (and Chinese would still have to use 2^5 anyway)
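Here is the arithmetic behind that quip, reading 2^3 and 2^4 as bits per byte. An 8-bit byte holds 256 values and a 16-bit unit holds 65,536. Unicode's full code space, which places many CJK ideographs above U+FFFF, needs 21 bits, so only the 2^5 = 32-bit case fits it directly:

```latex
\begin{aligned}
2^{2^3} &= 2^{8} = 256 \\
2^{2^4} &= 2^{16} = 65\,536 \\
\texttt{0x10FFFF} + 1 &= 1\,114\,112 > 2^{20} = 1\,048\,576 \;\Rightarrow\; \lceil \log_2 1\,114\,112 \rceil = 21 \text{ bits} \le 2^{5}
\end{aligned}
```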
<enebo[m]>
I guess it all just came down to squeezing
<headius[m]>
I may have thought of an interesting idea
<headius[m]>
to blunt the cost of booting JRuby when there are no gems to load, perhaps we should disable RubyGems if we don't see a path or environment variable that would indicate where gems live
<headius[m]>
the use case is mostly for embedding JRuby, where you are already packaging all libraries you need at a root level of some jar file
<headius[m]>
if there's no gem home, rubygems serves no purpose other than to slow down boot time
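Below is a minimal sketch of that boot-time check. It assumes the decision is made from the GEM_HOME and GEM_PATH environment variables (the standard RubyGems ones) plus whether a default gem directory exists. The class, the method, and the default-directory argument are hypothetical and do not describe an existing JRuby option:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical boot-time check: load RubyGems only if something says where gems live.
// GEM_HOME/GEM_PATH are the standard RubyGems variables; everything else is assumed.
class RubyGemsBootSketch {
    static boolean shouldLoadRubyGems(Map<String, String> env, Path defaultGemDir,
                                      Predicate<Path> isDirectory) {
        for (String var : new String[] {"GEM_HOME", "GEM_PATH"}) {
            String value = env.get(var);
            if (value != null && !value.isEmpty()) return true; // gem location set explicitly
        }
        // Otherwise load only if the default gem directory actually exists; an embedded
        // app with all libraries packaged in its jar would typically have none.
        return defaultGemDir != null && isDirectory.test(defaultGemDir);
    }

    public static void main(String[] args) {
        boolean load = shouldLoadRubyGems(System.getenv(),
                Path.of(System.getProperty("user.home"), ".gem"), // assumed default location
                p -> Files.isDirectory(p));
        System.out.println("load RubyGems: " + load);
    }
}
```

Passing the directory check in as a Predicate keeps the decision testable without touching the real filesystem or environment.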
<lopex[m]>
numbers!
ur5us has quit [Ping timeout: 260 seconds]
ur5us has joined #jruby
<headius[m]>
could be
<headius[m]>
we start up significantly faster without RG loaded