#jruby on 2020-07-27 — irc logs at freenode.irclog.whitequark.org

2020-07-01 18:55 ChanServ changed the topic of #jruby to: Get 9.2.12.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:56 ur5us has quit [Ping timeout: 260 seconds]

01:29 ur5us has joined #jruby

04:32 ur5us has quit [Ping timeout: 260 seconds]

04:37 ur5us has joined #jruby

05:32 ur5us has quit [Ping timeout: 244 seconds]

05:58 _whitelogger has joined #jruby

06:37 _whitelogger has joined #jruby

09:35 ur5us has joined #jruby

10:50 ur5us has quit [Ping timeout: 244 seconds]

12:36 sagax has quit [Remote host closed the connection]

12:44 sagax has joined #jruby

12:59 nirvdrum has joined #jruby

15:30 nirvdrum has quit [Remote host closed the connection]

15:57 nirvdrum has joined #jruby

16:29 Antiarc_ has quit [Remote host closed the connection]

16:34 Antiarc has joined #jruby

18:35 <headius[m]> enebo: yo

18:35 <headius[m]> so where do we stand and what can I do for .13

18:37 <enebo[m]> you can review the PR but I also pinged johnathon to test again

18:37 <enebo[m]> is he on matrix?

18:38 <headius[m]> yeah johnphillips31416

18:38 <headius[m]> I will start having a look at the PR

18:39 <headius[m]> frustratingly nothing I threw at it would break, regardless of how I loaded/evaluated methods, how many threads I use, structure of the methods and nested closures

18:39 <enebo[m]> I verified dynscope removal on all of test:mri:core:jit is the same

18:39 <headius[m]> there's some complexity to the failing cases that I have not been able to reproduce synthetically (and I don't know what the failing cases actually look like)

18:39 <enebo[m]> I think another thing I will try this afternoon is endlessly making new methods and then forcing them to jit

18:40 <enebo[m]> across threads

18:40 <headius[m]> I tried normal jit mode, -X-C, different thresholds... all that

18:40 <enebo[m]> if it is a timing issue eventually a race should happen

18:40 <headius[m]> logged jit output to see that things were compiling

18:40 <headius[m]> saw occasional double jitting but no failures

18:40 <enebo[m]> but it might take hundreds to thousands of methods JITing at the same time for it to fail

18:40 <headius[m]> hmmm

18:41 <enebo[m]> based on how long it seems to take to hit it I am thinking thousands of possible races before one hits it
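A minimal sketch of the stress test enebo describes, written in plain Ruby and assuming nothing about JRuby internals: each round defines a fresh method with nested blocks via `module_eval`, then hammers it from several threads and checks every result. `StressTarget`, `define_stress_method` and `stress` are hypothetical names. Under JRuby it would presumably need to run with the JIT configured to compile eagerly (a low threshold) so that many methods cross the threshold at the same moment; on CRuby it runs unchanged and only checks correctness.

```ruby
# Hypothetical target module; each round adds a new singleton method to it.
module StressTarget; end

# Define a brand-new method containing nested closures over a method local.
def define_stress_method(name)
  StressTarget.module_eval <<~RUBY, __FILE__, __LINE__ + 1
    def self.#{name}(n)
      outer = n
      [1, 2, 3].map do |a|
        [a, outer].each_with_index.map { |b, i| b * i + a + outer }.sum
      end.sum
    end
  RUBY
end

# Each round: define a method, then call it concurrently from several threads.
# Any thread that sees a wrong result raises, and the error surfaces on join.
def stress(rounds: 50, threads: 4, calls: 100)
  rounds.times do |r|
    name = "m#{r}"
    define_stress_method(name)
    expected = StressTarget.public_send(name, 1)
    workers = Array.new(threads) do
      Thread.new do
        calls.times do
          result = StressTarget.public_send(name, 1)
          raise "round #{r}: got #{result}, expected #{expected}" unless result == expected
        end
      end
    end
    workers.each(&:join)
  end
  true
end
```

Because the failure seems to need thousands of chances to race, the interesting knobs are `rounds` and `threads`; the per-call work is kept tiny so the JIT, not the method body, dominates.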

18:41 <headius[m]> perhaps multiple independent stacks of closures in the same method

18:41 <headius[m]> so you have one stack triggering jit at method level while another stack is still happy to interpret

18:41 <enebo[m]> that was what I was trying to do originally

18:41 <headius[m]> the DynamicScope error is a closure problem for sure

18:41 <headius[m]> the stack error likely is as well

18:41 <enebo[m]> force one sibling to JIT then second sibling would not

18:42 <enebo[m]> part of the problem is an old closure activation has to be alive before a new one JITs

18:42 <enebo[m]> or happen to get used at same time the new one JITs and is used

18:43 <headius[m]> yeah that makes sense

18:44 <enebo[m]> I guess perhaps a Thread which lives with a closure which reaches out of it but then a parent closure jits outside the thread?
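One way to sketch the shape being guessed at here, purely as a hypothesis and not a reconstruction of the reported failures: a method with two sibling blocks that share its scope, where one block runs on a thread that can outlive the call. `spawn_sibling` and `drive_siblings` are hypothetical names; the driver keeps many old activations alive while the method keeps being called, which is the window in which a JIT switch could happen.

```ruby
# Two sibling closures share this call's scope but write different locals,
# so the expected result is deterministic even with true parallelism.
def spawn_sibling(iterations)
  from_thread = 0
  from_caller = 0
  worker = Thread.new do
    iterations.times { from_thread += 1 } # sibling block 1, on another thread
    from_thread
  end
  iterations.times { from_caller += 1 }   # sibling block 2, on the caller's thread
  [worker, from_caller]                   # the thread may still be running after we return
end

# Call the method many times before joining anything, so old activations
# stay alive while newer calls (possibly JIT-compiled) run.
def drive_siblings(calls: 200, iterations: 100)
  pending = Array.new(calls) { spawn_sibling(iterations) }
  pending.each_with_index do |(worker, from_caller), i|
    from_thread = worker.value # joins and returns the block's value
    unless from_thread == iterations && from_caller == iterations
      raise "call #{i}: thread=#{from_thread} caller=#{from_caller}"
    end
  end
  true
end
```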

18:46 <headius[m]> oh that's a thought too

19:03 <headius[m]> ok I still can't make it fail

19:03 <headius[m]> I'm going to review

19:19 <headius[m]> enebo: I wanted to close the loop on this symbol experiment: https://github.com/jruby/jruby/pull/6341#issuecomment-664588352

19:19 <headius[m]> basically the same as my messages to you on Friday but with one possible enhancement we could make: force all 7-bit clean identifiers to always use US-ASCII, explicitly acknowledging that it's not possible to have differently-encoded 7-bit identifiers

19:20 <headius[m]> I was not quite sure if we're doing that now

19:22 <headius[m]> I don't see this limitation as particularly damning, since it's pretty edgy to be running a system with lots of mixed-encoding identifiers in the first place, and even edgier to be doing so with ASCII-compatible 7-bit identifiers and expecting them to all be different logical identifiers separated only by encoding

19:22 <enebo[m]> Well 7bit == USASCII was the intent and it should work that way coming in through the parser

19:22 <headius[m]> ok, then it may be this only needs a "fix" in String#intern to always force such symbols to US-ASCII

19:22 <headius[m]> that is the root cause from #1348

19:23 <enebo[m]> it has to be ascii compatible encoding as a source

19:23 <enebo[m]> yeah this is probably just that code path itself

19:23 <headius[m]> it will be an explicitly behavioral difference from CRuby, but it will be clearly spelled out: if you have 7-bit ASCII bytes it will be a US-ASCII symbol, always

19:24 <enebo[m]> It is more inconsistent on their parts if so

19:24 <headius[m]> that will avoid the cases where the first symbol intern that "wins" has some other encoding and others weirdly inherit it
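The proposed `String#intern` rule can be sketched in plain Ruby. `intern_7bit` is a hypothetical helper, not JRuby's implementation: if every byte is below 0x80, the symbol is created from a US-ASCII copy, so whichever intern happens first can no longer decide the encoding that later interns inherit; otherwise the string's own encoding is kept. It follows the proposal literally and looks only at bytes, so it also relabels 7-bit data tagged with an ASCII-incompatible encoding such as UTF-16.

```ruby
# Hypothetical helper: 7-bit bytes always produce a US-ASCII symbol.
def intern_7bit(str)
  if str.each_byte.all? { |b| b < 0x80 }
    # .b makes an unfrozen binary copy; relabel it as US-ASCII before interning
    str.b.force_encoding(Encoding::US_ASCII).to_sym
  else
    str.to_sym # non-7-bit: keep the string's real encoding
  end
end

p intern_7bit("foo".encode("ISO-8859-13")).encoding
p intern_7bit("caf\u00e9").encoding
```

With this rule, `"foo"` interned from ISO-8859-13, from UTF-8, or from the bogus UTF-16 case all yield the same symbol as the literal `:foo`.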

19:24 <enebo[m]> their parser will also make clean 7bit in ascii compat encodings US-ASCII

19:24 <enebo[m]> if they allow it from String that is an odd man out behavior

19:25 <headius[m]> yeah, and the case in 1348 was not even CR_VALID anymore, so it's stretching the boundaries pretty far

19:25 <enebo[m]> ultimately though 7bit means ascii 7 bit so it being a different encoding does not really work in anyone's favor

19:25 <headius[m]> it was US-ASCII bytes pretending to be a valid UTF-16 string

19:25 <enebo[m]> yeah so it really makes you wonder why they removed the error

19:26 <enebo[m]> I would not bet money but I suspect that error was us following them

19:26 <headius[m]> that is clearly contrived, but even the valid cases where you might have something with ASCII bytes encoded as 8859-13 still would work fine if we just force them back to US-ASCII in all cases

19:26 <enebo[m]> yeah since the 7bit would be the same chars/bytes

19:26 <headius[m]> wanting to differentiate valid 7-bit identifier collisions is a bridge too far

19:26 <enebo[m]> It should not have any effect other than a reflective one

19:26 <headius[m]> and has very little value

19:26 <headius[m]> right

19:27 <enebo[m]> :foo and :foo should hash different

19:27 <headius[m]> plus there's a ton of logic throughout Ruby to allow US-ASCII strings to interact non-destructively with any ASCII-compat encoding

19:27 <enebo[m]> I mean it is a very weird real world issue...when would you want the same string with the same chars to be a different key

19:27 <enebo[m]> yeah tons of logic which says...oh all ascii make 7bit

19:27 <enebo[m]> And this is not even about correctness

19:27 <headius[m]> yeah basically that's it

19:28 <enebo[m]> They do it so they can know how to walk it quickly and get length easily

19:28 <headius[m]> 7bit is treated as US-ASCII almost everywhere, until it needs to be something else... and if that something else is non-7bit, it graduates to the proper encoding

19:29 <headius[m]> I will look into this as post .13 work after I review IR patches

19:29 <enebo[m]> as far as m17n rules goes this is one of the easiest

19:29 <headius[m]> you're right, it is actually more internally consistent

19:29 <enebo[m]> anything which is 7bit will become whatever non-7bit value it is added to (so long as encoding is ascii compat.)
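The m17n rule described here can be observed directly with core `String` and `Encoding` methods. This is a small illustration; the variable names are arbitrary.

```ruby
ascii  = "abc".encode(Encoding::US_ASCII)          # 7-bit only
latin1 = "caf\u00e9".encode(Encoding::ISO_8859_1)  # non-7-bit, ASCII-compatible
utf8   = "caf\u00e9"                               # non-7-bit, UTF-8

# A 7-bit string takes on the encoding of the non-7-bit string it joins.
p((ascii + latin1).encoding) # prints #<Encoding:ISO-8859-1>
p((ascii + utf8).encoding)   # prints #<Encoding:UTF-8>

# Two non-7-bit strings in different encodings are not compatible.
p(Encoding.compatible?(latin1, utf8)) # prints nil

# The payoff for 7-bit strings: one byte per character, so length is cheap.
p(ascii.ascii_only? && ascii.length == ascii.bytesize) # prints true
```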

19:30 <headius[m]> I suppose there's an additional case that would represent an actual flaw in JRuby: ASCII-incompatible encoding that happens to only use 7-bit characters

19:30 <enebo[m]> The only weird bit of this is that Symbols will go US-ASCII but Strings will keep their encoding

19:30 <headius[m]> I don't know such an encoding in general use offhand

19:30 <headius[m]> EBCDIC would be one

19:30 <enebo[m]> is ebcdic ascii?

19:30 <headius[m]> it is not

19:31 <headius[m]> so if you had an EBCDIC identifier we would incorrectly mark it as US-ASCII and it would never be retrievable as EBCDIC

19:31 <headius[m]> I mean if that identifier only used 7-bit range, which would mean like half the alphabet and all numbers are off the table

19:31 <enebo[m]> but I think the real problem here is not that the parser or the string functions would make that symbol wrong

19:32 <enebo[m]> It is if somehow the same chars end up coming in from a non-EBCDIC source they will collide

19:32 <headius[m]> yes, just that if you wanted to view it as EBCDIC it would be nonsense

19:32 <headius[m]> yeah

19:32 <enebo[m]> and I suppose if you use a gem on the internet those files will not be EBCDIC

19:32 <headius[m]> there's a table here: https://en.wikipedia.org/wiki/EBCDIC

19:33 <headius[m]> and I was wrong, it takes the entire alphabet off the table

19:33 <enebo[m]> I do not want to be glib but if people want EBCDIC symbol encodings JRuby is not for them

19:33 <headius[m]> so it would seem unlikely you'd have a 7-bit ebcdic identifier that would be very useful

19:33 <enebo[m]> I did actually know it was not ascii-compat

19:33 <headius[m]> the only characters in 7-bit range are symbols

19:34 <headius[m]> so like... the ! method would encode improperly

19:35 <headius[m]> and ==, ===, !=, `, etc
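To make the EBCDIC concern concrete, here is a sketch using a hand-written, deliberately partial subset of code page 037, one common EBCDIC variant (`CP037`, `to_cp037`, `seven_bit?` and `as_us_ascii` are illustrative names). Letters and digits sit above 0x7F, so only operator-like identifiers stay 7-bit, and a "7-bit means US-ASCII" rule would reinterpret their bytes as entirely different ASCII characters.

```ruby
# Hand-written subset of EBCDIC code page 037, for illustration only.
CP037 = {
  "!" => 0x5A, "=" => 0x7E, "<" => 0x4C, ">" => 0x6E,
  "A" => 0xC1, "a" => 0x81, "0" => 0xF0
}.freeze

# Encode a string into CP037 bytes using the partial table above.
def to_cp037(str)
  str.each_char.map { |c| CP037.fetch(c) }.pack("C*")
end

def seven_bit?(bytes)
  bytes.each_byte.all? { |b| b < 0x80 }
end

# What a "7-bit means US-ASCII" rule would make of an EBCDIC identifier.
def as_us_ascii(ebcdic_bytes)
  return nil unless seven_bit?(ebcdic_bytes)
  ebcdic_bytes.dup.force_encoding(Encoding::US_ASCII)
end

p as_us_ascii(to_cp037("=="))  # prints "~~"
p as_us_ascii(to_cp037("!="))  # prints "Z~"
p as_us_ascii(to_cp037("A"))   # prints nil (letters are above 0x7F in EBCDIC)
```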

19:35 <enebo[m]> for (c = 'A'; c <= 'Z'; ++c) putchar(c);

19:35 <enebo[m]> lol

19:35 <enebo[m]> we inherited a lot of weird shit in Java as a result of C

19:36 <headius[m]> yeah I used ebcdic as an example because it's so broken in other ways

19:36 <enebo[m]> yeah for sure

19:36 <headius[m]> but it's the only one I could think of that's an 8-bit encoding completely incompatible with ASCII

19:36 <enebo[m]> but ultimately all of these discussions evolve into weird combinations

19:36 <headius[m]> there might be some CJK encodings that are problems

19:36 <headius[m]> EUC-JP is weird

19:37 <headius[m]> most encodings in wide use have accepted they have to be ASCII-compat though

19:37 <headius[m]> ASCII won

19:38 <headius[m]> G0 is almost always an ISO-646 compliant coded character set such as US-ASCII, ISO 646:KR (KS X 1003) or ISO 646:JP (the lower half of JIS X 0201) that is invoked on GL (i.e. with the most significant bit cleared). An exception from US-ASCII is that 0x5C (backslash in US-ASCII) is often used to represent a Yen sign in EUC-JP (see below) and a won sign in EUC-KR.

19:38 <headius[m]> so this is why you see the ¥ symbol in older Windows paths on JP machines

19:38 <headius[m]> but otherwise it's basically compat

19:44 <enebo[m]> yeah I do remember knowing about why there was a yen symbol at one point

19:44 <enebo[m]> but windows codepages

19:44 <enebo[m]> gives me enough to realize anything probably is anything somewhere in the world

19:46 <headius[m]> yeah it's an intractable problem for locales that have not accepted unicode

19:46 <headius[m]> but even Ruby has gone to default UTF-8 internally

19:48 <headius[m]> if only the byte had been represented as 2^4 instead of 2^3 we might never have had this problem (and Chinese would still have to use 2^5 anyway)

19:50 <enebo[m]> I guess it all just came down to squeezing

20:31 <headius[m]> enebo: I have opened https://github.com/jruby/jruby/issues/6344 for the additional work we discussed

20:49 ur5us has joined #jruby

22:42 <headius[m]> hmmm

22:42 <headius[m]> I may have thought of an interesting idea

22:43 <headius[m]> to blunt the cost of booting JRuby when there are no gems to load, perhaps we should disable RubyGems if we don't see a path or environment variable that would indicate where gems live

22:43 <headius[m]> the use case is mostly for embedding JRuby, where you are already packaging all libraries you need at a root level of some jar file

22:43 <headius[m]> if there's no gem home, rubygems serves no purpose other than to slow down boot time
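A sketch of the heuristic floated here, assuming that `GEM_HOME`, `GEM_PATH` and a caller-supplied list of default gem directories are the signals worth checking. `rubygems_needed?` is a hypothetical helper, not how JRuby decides today; CRuby already offers an explicit `--disable-gems` switch, and the idea is to make that choice automatic when nothing points at installed gems.

```ruby
# Hypothetical boot-time check: only load RubyGems when something indicates
# where gems live. Takes env and directories as arguments so it is testable.
def rubygems_needed?(env = ENV, default_gem_dirs = [])
  %w[GEM_HOME GEM_PATH].each do |var|
    value = env[var]
    return true if value && !value.empty?
  end
  default_gem_dirs.any? { |dir| File.directory?(dir) }
end

puts(rubygems_needed? ? "would load RubyGems" : "would skip RubyGems")
```

For the embedding case, the caller would pass an empty directory list, so a jar with everything on the load path and no gem environment variables skips RubyGems entirely.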

22:49 <lopex[m]> numbers!

22:50 ur5us has quit [Ping timeout: 260 seconds]

22:54 ur5us has joined #jruby

22:55 <headius[m]> could be

22:55 <headius[m]> we start up significantly faster without RG loaded
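In reply to the request for numbers, a rough way to measure the difference, assuming the interpreter under test accepts `--disable-gems` (CRuby does; for JRuby, confirm with `jruby --help`). `boot_time` is a hypothetical helper that re-runs the current executable (`RbConfig.ruby`) and keeps the fastest of a few runs to reduce noise. It measures wall-clock time including process startup, which for JRuby includes JVM boot.

```ruby
require "rbconfig"

# Run the current Ruby executable `runs` times with the given flags and an
# empty script; return the fastest wall-clock time in seconds.
def boot_time(args, runs: 5)
  times = Array.new(runs) do
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    system(RbConfig.ruby, *args, "-e", "nil") or raise "ruby exited with #{$?.inspect}"
    Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  end
  times.min
end

if __FILE__ == $0
  with_gems    = boot_time([])
  without_gems = boot_time(["--disable-gems"])
  printf("with RubyGems:    %.3fs\n", with_gems)
  printf("without RubyGems: %.3fs\n", without_gems)
end
```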