<nirvdrum>
jruby 9.2.0.0 (2.5.0) 2018-05-24 81156a8 Java HotSpot(TM) 64-Bit Server VM 25.181-b13 on 1.8.0_181-b13 +jit [linux-x86_64]
<lopex>
nirvdrum: yeah, that's master
<nirvdrum>
I just recently tracked down an encoding bug we inherited from Rubinius that only presented itself when printing error messages from failing MRI tests.
<nirvdrum>
And since it failed while printing the message, I could never tell which test actually failed.
<nirvdrum>
That was annoying to track down.
<lopex>
yeah, those are the worst
<lopex>
nirvdrum: but that approx thing, we'll need it at some point soon
<lopex>
so for example
<lopex>
"\u{1F48C}".bytes -> [240, 159, 146, 140]
<lopex>
if you feed that to enclen(enc, p, 2)
<nirvdrum>
Sorry. You'll have to refresh my memory.
<lopex>
our default impl will return -3
<nirvdrum>
What's this change all about?
<lopex>
which means there are two bytes missing
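To make the -3 concrete, here is a minimal Ruby sketch of a length check over a byte range. It assumes the Onigmo-style convention where a truncated character is reported as -1 - n for n missing bytes, which is consistent with the -3 mentioned above. `mbc_length` is a hypothetical helper written for illustration, not the jcodings or Onigmo API, and it only does a simplified UTF-8 check.

```ruby
# Hypothetical helper (not jcodings/Onigmo API): reports the length of the
# UTF-8 character starting at bytes[p], looking only at bytes[p...e].
# Returns: n      if a complete n-byte character is present,
#          -1     if the bytes are invalid,
#          -1 - n if the character is truncated and n more bytes are needed.
def mbc_length(bytes, p, e)
  lead = bytes[p]
  expected = case lead
             when 0x00..0x7F then 1
             when 0xC2..0xDF then 2
             when 0xE0..0xEF then 3
             when 0xF0..0xF4 then 4
             else return -1 # not a valid lead byte
             end

  available = e - p
  present = [expected, available].min

  # Simplified validation: every trailing byte we can see must be 10xxxxxx.
  (1...present).each do |i|
    return -1 unless (bytes[p + i] & 0xC0) == 0x80
  end

  available >= expected ? expected : -1 - (expected - available)
end

bytes = "\u{1F48C}".bytes   # [240, 159, 146, 140]
p mbc_length(bytes, 0, 2)   # prints -3: two bytes missing
p mbc_length(bytes, 0, 4)   # prints 4: complete character
```

A caller that treats the negative result as a length and advances by it, or assumes a full character is present, can step outside the buffer, which seems to be the failure mode being discussed here.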
<nirvdrum>
MRI 2.4.4 yields `nil` for that expression.
<lopex>
yes, that's irrelevant
<nirvdrum>
That's fine. But I'm lost on what changed in MRI. Is there a new feature? Or was it just a bug before?
<lopex>
nirvdrum: the approx thing makes onigmo immune to broken chars
<lopex>
so let's start with C here, char p[] = {240, 159, 146, 140} // disregard signedness here for now
<lopex>
this is that "\u{1F48C}"
<nirvdrum>
Isn't it already "immune"? It just returns the first byte in a broken MBC.
<nirvdrum>
MRI really just needs to stop allowing broken strings. It's crazy.
<lopex>
nirvdrum: it's an Onigmo thing
<lopex>
and mri uses that as well, in places
<lopex>
so for our length if we pass enc.length(p, 0, 2)
<lopex>
but since they use approx they move forward
<lopex>
and we ooib
<nirvdrum>
Case folding is only done for downcasing, right?
<lopex>
yeah
<nirvdrum>
Okay. Getting my head back into this.
<lopex>
er, I know why end - p is 2
<lopex>
because of /\=\?/
<lopex>
which is two chars in the regexp
<nirvdrum>
I'm still trying to build a version of MRI where I can see that.
<lopex>
so, if there are constant parts in the regexp, Onigmo will create that fast skip thing and try to find them without entering the interpreter loop
<lopex>
nirvdrum: so in essence it will try to compare the first two bytes of "\u{1F48C}" with "=?"
<lopex>
and the length for that will definitely not be a correct one
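As a rough model of the fast skip described above, the following Ruby sketch scans raw bytes for a literal before any matching would start. It is an assumption-laden illustration: `literal_skip` is a hypothetical name, and the naive scan here is not the search algorithm Onigmo or joni actually uses. The point is only that the comparison looks at 2-byte windows, regardless of character boundaries.

```ruby
# Hypothetical model of a literal pre-scan (not Onigmo's or joni's real code).
# Returns the byte index of the first occurrence of needle, or nil.
def literal_skip(haystack, needle)
  last = haystack.length - needle.length
  (0..last).each do |i|
    # Each comparison covers needle.length raw bytes starting at i,
    # even if that window cuts a multibyte character in half.
    return i if haystack[i, needle.length] == needle
  end
  nil
end

needle = "=?".bytes                          # the constant part of /\=\?/
p literal_skip("\u{1F48C}".bytes, needle)    # prints nil
p literal_skip("a=?b".bytes, needle)         # prints 1
```

In the first call the very first window is `[240, 159]`, the leading half of a 4-byte character, so a character length computed over just that window cannot be a complete one.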
<lopex>
I'll run joni with optimization flag off
<lopex>
yeah, the opcode is immune
<lopex>
nirvdrum: it would be easier with onigmo alone