00:37
ur5us has quit [Ping timeout: 260 seconds]
00:50
ur5us has joined #jruby
01:03
ur5us has quit [Quit: Leaving]
05:55
nirvdrum has quit [Remote host closed the connection]
07:16
ur5us has joined #jruby
08:00
joast has quit [Ping timeout: 260 seconds]
08:35
ur5us has quit [Ping timeout: 260 seconds]
09:21
ur5us has joined #jruby
10:27
ur5us has quit [Ping timeout: 260 seconds]
12:53
nirvdrum has joined #jruby
13:21
joast has joined #jruby
14:02
lucasb has joined #jruby
14:14
<
enebo[m] >
lopex: my new code only shows two errors (here is one): wrong constant name "\x{8260}"
14:15
<
lopex >
what encoding ?
14:15
<
enebo[m] >
lopex: this is deep in some set of MRI test methods so I do not know which encoding but does that value appear to be anything to you
14:15
<
enebo[m] >
I will figure that out though
14:19
<
enebo[m] >
NAME ENCODING: EUC-JP
14:19
<
enebo[m] >
NAME: Ａ
14:19
<
enebo[m] >
\243\301
14:20
<
enebo[m] >
lopex: comes from *%W"\u{391} \u{ff21}".flat_map {|c| [c, c.encode("cp932"), c.encode("euc-jp")]},
14:21
<
lopex >
bleh Windows-31J again, which is just sjis
14:22
<
enebo[m] >
Windows-31J version passes
14:22
<
enebo[m] >
the EUC-JP version of it fails
14:22
<
lopex >
yeah, just saw that
14:23
<
enebo[m] >
Sorry I just pasted the line the EUC-JP string is made from and that cp932 just happened to be on the same line
14:23
<
lopex >
yeah, doable the same way
14:24
<
enebo[m] >
lopex: ah so you already fixed this for 31J and EUC-JP will just end up with the same fix?
14:25
<
enebo[m] >
This is my new classification of symbol port where symbols get an enum specifying whether it is a Const Class Identifier etc...
14:26
<
enebo[m] >
It uses a method headius already pushed (with a small change to process from an offset) IdUtil.isConstantInitial
14:27
<
enebo[m] >
Actually I doubt I need that offset at all now that I look at it
14:29
<
lopex >
so.. s = %W"\u{391} \u{ff21}".flat_map {|c| [c, c.encode("cp932"), c.encode("euc-jp")]}; Object.const_set(s[2], 1)
14:29
<
lopex >
weird, [-90, -95] folds to [-90, -63]
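[editor's note] The signed byte pairs quoted above can be reproduced from plain Ruby. This is only an illustrative sketch: `signed_bytes` is a hypothetical helper, and the lowercase form is obtained by downcasing in UTF-8 and then transcoding, which stands in for (and is not) the case-fold routine being discussed.

```ruby
# Hypothetical helper: show a string's bytes as signed values,
# the way a Java byte[] would display them in a debugger.
def signed_bytes(str)
  str.bytes.map { |b| b > 127 ? b - 256 : b }
end

upper = "\u{391}".encode("euc-jp")           # GREEK CAPITAL LETTER ALPHA
lower = "\u{391}".downcase.encode("euc-jp")  # downcased as UTF-8 first, then transcoded

p signed_bytes(upper)  # [-90, -95]
p signed_bytes(lower)  # [-90, -63]
```

Since the two byte sequences differ, the character has a distinct lowercase form, which is the property the check relies on.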
14:30
<
lopex >
which is different so it should pass
14:32
<
enebo[m] >
lopex: I will see what is not working...I assumed it was specifically not detecting it was a capital
14:32
<
lopex >
enebo[m]: if they're different it should pass
14:32
<
enebo[m] >
lopex: it is only one codepoint right?
14:32
<
lopex >
as in if (r > 0 && (r != len || ByteList.memcmp(fold, 0, bytes, begin, r) != 0)) in IdUtil
14:32
<
enebo[m] >
lopex: you are assuming I ported something correctly
14:33
<
enebo[m] >
oh it is in the capitalization method
14:33
<
enebo[m] >
isConstantInitial
14:39
<
lopex >
enebo[m]: it's rb_sym_constant_char_p in mri
14:39
<
lopex >
oh, it's in the comment there
14:41
<
enebo[m] >
lopex: it is but I also ported about 60% of that method before I realized he had already made it
14:43
<
enebo[m] >
lopex: ok I see in the debugger that fold and bytes have differences so I think that method is working
14:44
<
enebo[m] >
lopex: it is something else in my method and I see it
14:45
<
enebo[m] >
this advances through multiple bytes to figure out where capital letter is but then when I bump out of it my pointer is m++;
14:45
<
enebo[m] >
where it needs to be m+=codepoint length
14:46
<
enebo[m] >
lopex: but this is the line I ported: type = rb_sym_constant_char_p(m, e-m, enc) ? ID_CONST : ID_LOCAL;
14:46
<
enebo[m] >
I guess m is getting a new address as an outparam since it is an updated pointer?
14:48
<
lopex >
no name is not changed
14:48
<
enebo[m] >
oh I see a problem: the m++ I am doing is not there in the MRI method, so it leaves the pointer at the initial character
14:48
<
enebo[m] >
so it drops to id: label and walks that first codepoint all over again
14:50
<
lopex >
yeah, no loops there in the switch
14:50
<
lopex >
which can be tricky
14:50
<
enebo[m] >
no but that m++ had nothing to do with a loop
14:50
<
enebo[m] >
there is a loop in that method down below
14:51
<
enebo[m] >
my code point walking method is only reading in the first 2 bytes of that string
14:52
<
lopex >
yeah, just m += rb_enc_mbclen(m, e, enc);
14:52
<
enebo[m] >
m = ByteListHelper.eachCodePointWhile(data, m, (index, codepoint, enc) ->
14:52
<
enebo[m] >
enc.isAlnum(codepoint) || codepoint == '_' || !Encoding.isAscii(codepoint));
14:56
<
enebo[m] >
I think I see my error there now too
14:56
<
enebo[m] >
p < len should be p < end
14:57
<
enebo[m] >
boo either the debugger did not pick up the change or it is still broken
15:00
<
enebo[m] >
lopex: so it does n = 2 for the first weird A, and then for the second it gets n = 1 for what appears to be a space-like thing? and it fails, which returns 2, which is the index of that first thing
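[editor's note] The scan described above can be approximated in Ruby. This is a rough sketch under assumptions: `identifier_prefix_bytes` is a hypothetical name, and the predicate only mirrors the lambda pasted earlier (alphanumeric, underscore, or any non-ASCII character); it is not the JRuby implementation.

```ruby
# Hypothetical sketch: walk characters while they look like identifier
# characters and return the byte offset where the walk stopped.
def identifier_prefix_bytes(str)
  offset = 0
  str.each_char do |ch|
    # Non-ASCII characters are accepted outright; ASCII ones must be
    # alphanumeric or an underscore.
    break unless !ch.ascii_only? || ch.match?(/\A[[:alnum:]_]\z/)
    offset += ch.bytesize
  end
  offset
end

name = "\u{391} \u{ff21}".encode("euc-jp")
p identifier_prefix_bytes(name)  # 2
```

The first character occupies two bytes in EUC-JP and the space that follows stops the walk, so the offset is 2.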
15:00
<
enebo[m] >
The p < end was an error but in this case begin was 0 so it made no difference
15:00
<
enebo[m] >
err p < len
15:02
<
enebo[m] >
GHAHAHAHAHA
15:02
<
enebo[m] >
mri26 -e " s = %Q{\u{391} \u{ff21}}.encode(%q{euc-jp}); Object.const_set(s, 1)"
15:02
<
enebo[m] >
-e:1:in `const_set': wrong constant name "\x{A6A1} \x{A3C1}" (NameError)
15:03
<
enebo[m] >
Perhaps my m++ removal fixed the problem by returning a different expected error vs expecting this to actually work
15:03
<
enebo[m] >
lopex: FIXED!
15:04
<
enebo[m] >
bleh...so I think my problem was the m++; which would then try and walk codepoints from byte index 1 which would be an invalid codepoint
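[editor's note] A small Ruby sketch of why a one-byte step is harmful here. The helper names are hypothetical and this only illustrates the effect on a two-byte EUC-JP character; it is not the ported code.

```ruby
# Skip the first character by its full byte length (the correct step).
def rest_after_first_char(str)
  str.byteslice(str[0].bytesize, str.bytesize)
end

# Skip exactly one byte, like a stray m++ would.
def rest_after_one_byte(str)
  str.byteslice(1, str.bytesize)
end

name = "\u{ff21}bc".encode("euc-jp")  # fullwidth A is two bytes in EUC-JP

p rest_after_first_char(name).valid_encoding?  # true
p rest_after_one_byte(name).valid_encoding?    # false
```

After the one-byte step the remainder starts with the second byte of the fullwidth A, so the following walk begins in the middle of a character.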
15:04
<
lopex >
now I'm confused, both \u{391} \u{ff21} fold
15:05
<
enebo[m] >
lopex: ok well that is a fine thing to be but I am going to leave that to you since I want to verify I am at least green testwise
15:05
<
lopex >
enebo[m]: there's a space in there
15:05
<
enebo[m] >
so it was expecting an error
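[editor's note] The expected failure can be checked directly. A minimal sketch, assuming an MRI-compatible Ruby; `const_set_error` is a hypothetical wrapper that reports the exception class instead of aborting.

```ruby
# Hypothetical wrapper: try to define a constant and return the class of
# the NameError raised, or nil if the name was accepted.
def const_set_error(name)
  Object.const_set(name, 1)
  nil
rescue NameError => e
  e.class
end

# Same string as the mri26 one-liner: two characters separated by a space.
name = "\u{391} \u{ff21}".encode("euc-jp")
p const_set_error(name)  # NameError
```

The space makes the whole string an invalid constant name regardless of how the capital letters are classified.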
15:06
<
enebo[m] >
but I was getting another error because I cleaved the first byte off the first capital letter
15:06
<
enebo[m] >
which tells me there are not many tests/specs which test mbc capitals as this would fail all the time
15:07
<
enebo[m] >
lopex: I still need to make that method I displayed above have an ascii fast path
15:08
<
enebo[m] >
I do not want to scan to check though so that sort of sucks
15:08
<
enebo[m] >
I wish bytelist had CR
15:09
lanceball has quit [Changing host]
15:09
lanceball has joined #jruby
15:12
<
enebo[m] >
ok nevermind...I can still do this for symbols
15:12
<
enebo[m] >
I spaced out that symbols will try as hard as possible to change encoding to US-ASCII
15:13
<
enebo[m] >
lopex: thank you for being my mbc security blanket
15:18
<
enebo[m] >
olleolleolle: thanks for putting your efforts into this
15:20
<
olleolleolle[m] >
My pleasure.
16:03
xardion has quit [Remote host closed the connection]
16:03
xardion has joined #jruby
16:26
travis-ci has joined #jruby
16:26
travis-ci has left #jruby [#jruby]
16:36
MarcinMielyskiGi has quit [Ping timeout: 246 seconds]
16:36
rg_3[m] has quit [Ping timeout: 246 seconds]
16:36
MarcinMielyskiGi has joined #jruby
16:36
rg_3[m] has joined #jruby
16:36
ThomasEEneboGitt has quit [Ping timeout: 246 seconds]
16:36
ThomasEEneboGitt has joined #jruby
16:40
<
enebo[m] >
well that will teach me to not do a full rebuild
16:52
fzakaria[m] has quit [Ping timeout: 246 seconds]
16:52
anubhav8421[m] has quit [Ping timeout: 246 seconds]
16:52
fzakaria[m] has joined #jruby
16:52
anubhav8421[m] has joined #jruby
17:06
travis-ci has joined #jruby
17:06
travis-ci has left #jruby [#jruby]
17:59
JasonRogers[m] has quit [*.net *.split]
17:59
ChrisSeatonGitte has quit [*.net *.split]
17:59
TimGitter[m]1 has quit [*.net *.split]
17:59
HarlemSquirrel has quit [*.net *.split]
17:59
Iambchop has quit [*.net *.split]
18:03
TimGitter[m]1 has joined #jruby
18:03
JasonRogers[m] has joined #jruby
18:03
ChrisSeatonGitte has joined #jruby
18:03
HarlemSquirrel has joined #jruby
18:03
Iambchop has joined #jruby
18:33
subbu is now known as subbu|lunch
18:35
CharlesOliverNut has quit [Ping timeout: 246 seconds]
18:36
CharlesOliverNut has joined #jruby
18:58
subbu|lunch is now known as subbu
19:54
<
lopex >
enebo[m]: code2mbcBlanket
19:54
claudiuinberlin has joined #jruby
19:55
<
lopex >
enebo[m]: the bad part of mri encoding is that they have two paths involving folds
19:55
<
lopex >
enebo[m]: and they're inconsistent
19:56
<
lopex >
mbcCaseFold, caseMap, and almost applyAllCaseFold all go different paths wrt casing
19:59
lucasb has quit [Quit: Connection closed for inactivity]
20:02
<
lopex >
headius[m]: tried to come up with a more typesafe way of restricting those ranges to encodings
20:03
<
lopex >
but since the api is frozen it's hard to do
20:54
claudiuinberlin has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]