ur5us has quit [Ping timeout: 240 seconds]
ur5us has joined #jruby
<headius[m]> it's merged, welcome to Ruby 2.6 support on master
dhoc has quit [Quit: dhoc]
dhoc has joined #jruby
dhoc has quit [Quit: dhoc]
<byteit101[m]> Should I use `java_annotation 'annotation'; java_signature 'void myMethod()'` or `java_signature '@annotation void myMethod()'`? Both seem to be partially supported...
<headius[m]> hmm I think enebo wrote the parser for that so I'm not sure
<headius[m]> I didn't know both of those work :-)
<byteit101[m]> Yea, just doing a git blame on NoMethodErrors and I see enebo is to blame :-)
<byteit101[m]> The former works only for jrubyc, the latter works only for become_java! on my branch
<byteit101[m]> The former saves nothing for become_java, the latter throws up with ` @annotationpublic void myMethod() {` for jrubyc (note the lack of space. will file a bug)
<headius[m]> hmm I wonder what kind of tests we have for this stuff
<byteit101[m]> minimal. annotation parameters are an explosion of exceptions
<byteit101[m]> Though `@javax.annotation.Resource(description=@java.lang.Deprecated("Testing"))` parses, which I didn't know was possible
<byteit101[m]> javac stuff and non-parameterized annotations seem well enough tested though
<lopex> headius[m]: looking
<lopex> ONIG_ENCODING_UTF8->is_code_ctype(498, ONIGENC_CTYPE_UPPER, ONIG_ENCODING_UTF8)
<lopex> zero
<lopex> so it's not a jcodings issue
<lopex> enebo[m]: you here ?
<headius[m]> lopex: hmm
<lopex> headius[m]: we're missing quite a bit of logic for is_special_global_name
<lopex> and rb_sym_constant_char_p
<lopex> there's a lot of logic
<headius[m]> I don't doubt it
<headius[m]> but does something in there magically treat this as upper-case?
<lopex> rb_sym_constant_char_p - the name suggests that someone made it public for debug purposes at some point
<lopex> if (ISASCII(*name)) return ISUPPER(*name);
<lopex> but then
<lopex> yea!
<lopex> if (rb_enc_isctype(c, ctype_titlecase, enc)) return TRUE;
<lopex> from static const UChar cname[] = "titlecaseletter"; core range
<lopex> headius[m]: ^^
<headius[m]> titlecase
<lopex> headius[m]: but we're missing almost all unicode / encoding support for symbol name walking
<headius[m]> that's a new term to me!
<headius[m]> JDK provides isTitleCase
<lopex> but it's equally easy to do in jcodings
shellac has joined #jruby
<headius[m]> sure
<headius[m]> let's do that
<lopex> mri also caches ctype in static function variable there
<headius[m]> $ jruby -e 'Object.const_set("ǲ", 1); p Object.const_get("ǲ")'
<headius[m]> 1
<headius[m]> woot
<headius[m]> titlecase does work
<lopex> jdk one ?
<headius[m]> yeah
<lopex> what unicode version jdks provide ?
<headius[m]> for Java 8
<headius[m]> looking for a way to query that
<lopex> in any case
<lopex> byte[] titleCase = "titlecaseletter".getBytes();
<lopex> int ctype = enc.propertyNameToCType(titleCase, 0, titleCase.length);
<lopex> enc.isCodeCType(498, ctype);
<lopex> they check both rb_enc_isupper and rb_enc_islower
<lopex> so that explains three variants
<lopex> that ǲ is neither
<headius[m]> well it's elaborate but not too much code
<lopex> headius[m]: that's a fraction of the whole thing
<lopex> well like 1/3
<headius[m]> we have logic for these other things around
<lopex> I cant even see where it's encoding aware
<headius[m]> are you looking at IdUtil?
<lopex> no, at RubySymbol
<lopex> well, then there's some duplication
<headius[m]> RubySymbol.validConstantName has the checks for leading caps
<headius[m]> IdUtil is using Java strings but we don't keep unicode constant names as normal Java strings
<lopex> indeed
<headius[m]> so this is the bulk of the logic
<lopex> it falls back to case folding
<lopex> for other encodings too
<headius[m]> lopex: I filed this about IdUtil and cleaning up this stuff: https://github.com/jruby/jruby/issues/6144
<lopex> headius[m]: what's the equivalent of https://github.com/ruby/ruby/blob/master/template/id.h.tmpl#L23 ?
<headius[m]> I don't think we have anything
<headius[m]> we could add it but we don't track what type of symbol a symbol is anywhere right now
<headius[m]> I have seen this code before too... it's not used much but it does avoid having to re-scan the symbol
<headius[m]> so it would be worth it for that
subbu is now known as subbu|lunch
<headius[m]> lopex: I think this will pass if I just add a titlecase check for now, using JDK isTitleCase
<headius[m]> but it would be very nice to get the symbol type logic in here
<headius[m]> are all isUpper characters also isTitleCase I wonder
<headius[m]> doesn't get the test passing without the case folding part it looks like
<headius[m]> but it gets past those first few odd chars
<lopex> headius[m]: and with jcodings ?
<lopex> it should be ok using jcodings
<lopex> ǲ is not upper
<headius[m]> well I was just going to add the Character.isTitleCase to this check
<headius[m]> but I'm trying to port nobu's case folding version now
<headius[m]> I don't get this static titlecase stuff in the middle
<lopex> but do you also check for islower and isupper ?
<lopex> for unicode ?
<lopex> you need to
<headius[m]> can it be titlecase and lower
<headius[m]> ?
<headius[m]> I assumed isupper || istitle would cover it
<lopex> it can be title and not upper
<lopex> just mimic what mri does
shellac has quit [Quit: Computer has gone to sleep.]
<headius[m]> ok this logic in the middle is just because they don't have a titlecase type?
<lopex> title case is separate from lower and upper
<headius[m]> so they have to look it up from onigmo
<headius[m]> right ok they look up the ctype for "titlecaseletter" characters
<headius[m]> and save that in a static for future calls
<lopex> and islower and isupper as well
<headius[m]> so meh I don't need this
<lopex> you need all three
<headius[m]> oh this case folding stuff goes at the tables directly
<headius[m]> boo
<lopex> the one for non unicode ?
<headius[m]> well the last part of nobu's logic
<headius[m]> I have the first part
<lopex> first check is for unicode
<lopex> upper then lower then title right ?
<headius[m]> yes
<lopex> the non-unicode is folding
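The check order agreed on above (upper, then lower, then titlecase) can be sketched with JDK classes alone. This is an illustration under assumptions, not the JRuby or jcodings code: `ConstantNameCheck` and `startsLikeConstant` are hypothetical names, `java.lang.Character` stands in for the jcodings tables (which may track a different Unicode version), and the case-folding fallback for non-Unicode encodings is left out.

```java
final class ConstantNameCheck {
    /** Hypothetical helper: may a name starting with this code point be a constant? */
    static boolean startsLikeConstant(String name) {
        if (name.isEmpty()) return false;
        int c = name.codePointAt(0);
        // ASCII fast path: only A-Z qualifies.
        if (c < 0x80) return c >= 'A' && c <= 'Z';
        // Unicode path, in the order from the discussion: upper, lower, title.
        if (Character.isUpperCase(c)) return true;
        if (Character.isLowerCase(c)) return false;
        return Character.isTitleCase(c);
    }

    public static void main(String[] args) {
        // U+01F2 is code point 498, the titlecase digraph from the discussion.
        System.out.println(startsLikeConstant("\u01F2"));
        // U+01F3 is its lowercase form.
        System.out.println(startsLikeConstant("\u01F3"));
        System.out.println(startsLikeConstant("Foo"));
    }
}
```

Checking lower before title mirrors the order discussed; with the JDK predicates alone, `isUpperCase || isTitleCase` would likely give the same answer, but following the MRI order keeps the port easy to compare.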
<lopex> why not title case from jcodings ?
<headius[m]> we don't have a method for title case
<lopex> jdk might have obsolete tables
<lopex> I posted an example above
<headius[m]> oh
<lopex> :
<lopex> byte[] titleCase = "titlecaseletter".getBytes();
<lopex> int ctype = enc.propertyNameToCType(titleCase, 0, titleCase.length);
<lopex> enc.isCodeCType(498, ctype);
<lopex> the other one is pretty straight too
shellac has joined #jruby
<headius[m]> I see
<headius[m]> I think we should add isTitleCase :-)
<lopex> and gazzilion of others :P
<lopex> but yeah
<lopex> I think you can hardcode ctype, it shouldn't change
<headius[m]> we can add the gazillion later
<lopex> or we can add one to CharacterType in jcodings
<headius[m]> yeah it's just leaky abstraction
shellac has quit [Client Quit]
<lopex> yeah, the ones in CharacterType should match those looked up by name
<headius[m]> ah yeah
<headius[m]> so I guess 15 it is
lucasb has joined #jruby
<headius[m]> where are these CharacterType used?
<lopex> in isUpper for example
<headius[m]> oh yeah I see now
<headius[m]> I guess I'm done
<lopex> CASE_TITLECASE is for case folding, not for char type
<lopex> hmm int r = enc->mbc_case_fold(ONIGENC_CASE_FOLD
<lopex> it's wrong
<lopex> case_fold and case_map are different mechanisms
<headius[m]> ok
<lopex> it's not ok
<lopex> wait
<headius[m]> hmm
<headius[m]> assertTrue(UTF8Encoding.INSTANCE.isTitle("ǲ".codePointAt(0)));
<headius[m]> fails
<headius[m]> public final boolean isTitle(int code) {
<headius[m]> return isCodeCType(code, CharacterType.TITLE);
<headius[m]> }
<lopex> you need to shift that
<lopex> er
<headius[m]> I'm doing what isUpper does
<lopex> yeah
<headius[m]> I'll push WIP PR
shellac has joined #jruby
<lopex> headius[m]: ctype for title is 41
<lopex> it's not an arbitrary number
<headius[m]> so what's up with CharacterType then
<lopex> it's what you get from propertyNameToCType
<lopex> you get name -> ctype
<lopex> and lookup using ctype
<lopex> those in CharacterType are for convenience
<headius[m]> ah because they are the same
<lopex> so they can be used in isUpper etc
<headius[m]> but this one may differ across encodings?
shellac has quit [Client Quit]
<lopex> no
<headius[m]> ok, think I have it then
<headius[m]> passing now
<lopex> unicode uses UnicodeProperties.CodeRangeTable
<lopex> and those indexes match up for other encodings
<headius[m]> check out PR now
<lopex> for other there are char type maps (where in int array the bits describe the chars)
<headius[m]> basically what nobu's code was doing with the static ctype int
<lopex> why not hardcode the 15 ?
<headius[m]> what is the significance of the 15
<lopex> it's the same like other constants in CharacterType
<headius[m]> versus ctype of 41
<lopex> every code range has its own ctype
<lopex> and it avoids hash lookup
<headius[m]> but what do I do with the 15
<headius[m]> this is using the lookup logic and caching the ctype that results in the encoding object
<lopex> yeah, but would you add a field for any other ctype ?
<lopex> I agree it's not perfect but it's how it is in onigmo
<headius[m]> no, but this is special
<headius[m]> I haven't had a need for any other ctype
<lopex> and it pollutes Encoding.java with unicode bits
<headius[m]> ok so that clarifies something for me I guess... other encodings may or may not have the concept of a title case
<lopex> yeah, that too
<headius[m]> I could simply move this down to UnicodeEncoding
<headius[m]> but at that point is the ctype always just 41 then?
<headius[m]> I want to do this right
<lopex> I think I'd do that mri way here
<headius[m]> nobu's logic already confirms we're dealing with unicode so moving this to UnicodeEncoding seems to make sense to me
<lopex> the title case is used only once in the entire code base
<headius[m]> yeah but then I have to cache the ctype in JRuby
<headius[m]> which seems wrong
<lopex> you can cache it in private static
<lopex> it will never change across runtimes
<lopex> yeah, I get it
<headius[m]> so you don't think I should add isTitle to jcodings
<headius[m]> because it doesn't exist in onigmo
<lopex> what is special there ?
<lopex> you'll do as you wish but I think we can come up with something better
<headius[m]> 🤷‍♂️
<lopex> like cacheable hash for those
<lopex> for unicode ?
<headius[m]> well my justification is that nobody outside jcodings should have to know about ctype lookup just to check if a codepoint is titlecase
<lopex> so one can happily use isType("titlecase")
<lopex> yeah
<lopex> ^
<headius[m]> maybe it's not as special as isUpper but it's more special than currency symbol
<headius[m]> I understand what you're saying... why add this when it's just one of many ctypes
<headius[m]> my answer is that it's the first non-standard ctype I've needed
<headius[m]> so 🤷‍♂️
<lopex> I think CodeRangeTable could have yet another hash int -> code range
<lopex> er string => int
<lopex> then we could provide isType(someName)
<headius[m]> and do a hash lookup each time?
<lopex> well,
<lopex> or I could generate named constants
<lopex> isType(TITLE_CASE)
<lopex> ?
<headius[m]> hmm
<headius[m]> how about enums
<lopex> yeah, they would provide name lookup too
<headius[m]> could CodeRangeEntry be an enum then?
<headius[m]> I mean lots of things in jcodings could/should be enums but this is a good place to start
<headius[m]> and CodeRangeEntry is not public right now so we can do whatever we want
<lopex> enum indexes follow definition order ?
<headius[m]> yes
<headius[m]> they always have a sequential ordinal
<lopex> but we cant provide byte[] lookup ?
<headius[m]> I guess we can but I believe EnumSet is a perfect hash using ordinals
<headius[m]> EnumMap or whatever I mean
<headius[m]> I'm fine with not adding isTitle if we add something that can use constants or enums and avoid a string or byte[] hash search
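The enum idea above can be sketched as follows. Everything here is hypothetical: `CType`, its members, and `isType` are illustrative names rather than jcodings API, and the JDK `Character` predicates stand in for the real code range tables. The sketch only shows that ordinals follow declaration order, that `EnumMap` lookups avoid string or byte[] hashing, and that `valueOf` still gives lookup by name.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.IntPredicate;

final class CTypeEnumSketch {
    // Hypothetical enum; the real set of ctypes is far larger.
    enum CType { UPPER, LOWER, TITLE }

    // EnumMap is array-backed and indexed by ordinal.
    private static final Map<CType, IntPredicate> CHECKS = new EnumMap<>(CType.class);
    static {
        // JDK predicates stand in for the code range tables (an assumption).
        CHECKS.put(CType.UPPER, Character::isUpperCase);
        CHECKS.put(CType.LOWER, Character::isLowerCase);
        CHECKS.put(CType.TITLE, Character::isTitleCase);
    }

    static boolean isType(CType type, int codePoint) {
        return CHECKS.get(type).test(codePoint);
    }

    public static void main(String[] args) {
        // Ordinals are sequential and follow declaration order.
        for (CType t : CType.values()) {
            System.out.println(t.ordinal() + " " + t);
        }
        // Constant-based lookup, no name hashing on the hot path.
        System.out.println(isType(CType.TITLE, 498));
        // Name-based lookup remains available when only a string is at hand.
        System.out.println(isType(CType.valueOf("TITLE"), 498));
    }
}
```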
subbu|lunch is now known as subbu
<headius[m]> hmm
<headius[m]> RuntimeError: class not found for encoding "CESU-8"
<headius[m]> generate_encoding_list at generate.rb:86
<headius[m]> they've added some encodings
<lopex> yep
<lopex> that one
<headius[m]> yeah just that one I guess
<lopex> committed bits to ignore it for now
<headius[m]> yeah I made it skip
<lopex> it's a utf-8 copy
<headius[m]> yeah looks like it
<lopex> I meant committed changes to the script
<headius[m]> yup I'll pick them up
<lopex> weird define for "GB2312" vanished
<lopex> it's replicated now
<headius[m]> hmm objdump logic is failing now
<lopex> haven't tested it on osx
<lopex> but works for me here on linux
<lopex> gobj_dump might have different output ?
<headius[m]> that could be
<headius[m]> haha
<headius[m]> header line
<headius[m]> but no data?
<lopex> stripped ?
<headius[m]> dunno
<headius[m]> well basically I'm trying to do this
<lopex> no changes in tables though
<byteit101[m]> How do I convert unsigned to signed across the ruby-java boundary? (0xffff_0000_aaaa_5555.to_java Java::long) => RangeError (bignum too big to convert into `long')
<headius[m]> hmm
<headius[m]> that's a good question
<headius[m]> that does yield a bignum
<byteit101[m]> (Same thing for all max values of all types too, though those can be marshalled through longs at least)
<headius[m]> yeah
<headius[m]> I'm guessing this is a color?
<byteit101[m]> No, I'm implementing java_signature parsing&codegen for java proxies
<byteit101[m]> */java_annotation
<byteit101[m]> @Annotation(longvalue=0xffff_0000_aaaa_5555)
<headius[m]> ah ok
<byteit101[m]> I added hex to the lexer/parser, and realized this issue. Or should we not support unsigned hex :-)
<headius[m]> I suppose we should support any literal format Java itself supports
<headius[m]> there's nothing to indicate that this is intended to be an unsigned long though so it wouldn't parse in Java I think
<byteit101[m]> though annotations in java_signature already is different from java in that you must use MyClass.java_class. Not sure if that is a bug or not?
<headius[m]> yeah Java says "Integer number too large"
<headius[m]> shouldn't have to do that
<headius[m]> we should see it's a Java proxy class and do the right thing
<headius[m]> I'm trying to find a way to get raw long bits from a bignum
<byteit101[m]> Ok, I'll fix that java_class bug while I'm mucking around here
<headius[m]> aha
<headius[m]> there's Long.toUnsignedString
<headius[m]> tell me there's a reverse
<headius[m]> aha, parseUnsignedLong
<headius[m]> so at least we have that way... java.lang.Long.parseUnsignedLong
<byteit101[m]> How did I miss that? Works for me
<headius[m]> $ jruby -e "p java.lang.Long.parseUnsignedLong('ffff0000aaaa5555', 16)"
<headius[m]> -281472113420971
<headius[m]> it doesn't like 0x or underscores
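Since `Long.parseUnsignedLong` rejects the `0x` prefix and underscores, a literal like the one in the annotation needs cleaning first. A minimal sketch, assuming the literal is always hexadecimal; `UnsignedHexLiteral` and `parse` are hypothetical names, not part of JRuby:

```java
final class UnsignedHexLiteral {
    /** Parses a hex literal such as "0xffff_0000_aaaa_5555" into the long with the same 64 bits. */
    static long parse(String literal) {
        // Drop digit separators, which parseUnsignedLong does not accept.
        String digits = literal.replace("_", "");
        // Drop the radix prefix, which parseUnsignedLong does not accept either.
        if (digits.startsWith("0x") || digits.startsWith("0X")) {
            digits = digits.substring(2);
        }
        return Long.parseUnsignedLong(digits, 16);
    }

    public static void main(String[] args) {
        long value = parse("0xffff_0000_aaaa_5555");
        System.out.println(value);                            // signed view of the bits
        System.out.println(Long.toUnsignedString(value, 16)); // unsigned view, same bits
    }
}
```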
<byteit101[m]> pushed more changes to the concrete_java branch/pr, more to come for this
<headius[m]> cool
<byteit101[m]> aww... though I already strip out the latter for Integer() right now. Can change that
<byteit101[m]> Any way to avoid the extra to_java? (java.lang.Long.parseUnsignedLong("fe", 16).to_java :long).byteValue()
shellac has joined #jruby
<byteit101[m]> Yay, this now parses: @com.test.EverythingAnnotation(astr="foo", abyte=0xe9, ashort=65500, anint=-495000, along=0xdead_beef_00_ff00ff, afloat=12.633, adouble=-0.009995, abool=true, anbool=false, achar='q', Darray={@javax.annotation.Resource(description=":-("), @javax.annotation.Resource(description=":-)")}, anenum=java.lang.annotation.RetentionPolicy.RUNTIME, aClass=java.lang.String.java_class)
shellac has quit [Quit: Computer has gone to sleep.]
shellac has joined #jruby
<headius[m]> is the extra to_java needed?
<headius[m]> the return type from parseUnsignedLong should be a Fixnum, which would naturally convert to long
<byteit101[m]> Yea, haven't figured out a way to avoid all the back and forth. without to_java it's a fixnum, but byteValue is on Long: (undefined method `byteValue' for 254:Integer)
<headius[m]> oh
<headius[m]> to_java(:byte) should be the same, but that's going to truncate the long in either case
<headius[m]> ah but this isn't the long string
<headius[m]> yeah to_java(:byte)
<byteit101[m]> (too big for byte: 254)
<headius[m]> damn
<headius[m]> because it's signed
<byteit101[m]> yea :-)
<headius[m]> damn you java
<byteit101[m]> What I implemented there looks so gross, but I haven't figured out a better way
shellac has quit [Quit: Computer has gone to sleep.]
ur5us has joined #jruby
<headius[m]> well we can do it with bit math I guess
<byteit101[m]> I tried that, but was getting weird results
<byteit101[m]> (again due to signed vs unsigned)
<byteit101[m]> Going signed -> unsigned worked fine though, just not the reverse
ur5us has quit [Quit: Leaving]
<headius[m]> my brain isn't working today
<byteit101[m]> At least what I have works
<headius[m]> so we have unsigned bits and need to interpret them as signed without changing them
<headius[m]> yeah
<byteit101[m]> correct
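On the Java side, reinterpreting unsigned bits as signed without changing them is a narrowing conversion. The sketch below is not the workaround from the PR; `SignedReinterpret` and its methods are hypothetical names. It relies on two documented behaviors: `BigInteger.longValue()` returns only the low-order 64 bits, and a cast to `byte` keeps only the low-order 8 bits.

```java
import java.math.BigInteger;

final class SignedReinterpret {
    /** Keeps the low 64 bits of an unsigned value and views them as a signed long. */
    static long toSignedLong(BigInteger unsigned) {
        return unsigned.longValue();
    }

    /** Keeps the low 8 bits of an unsigned value (0..255) and views them as a signed byte. */
    static byte toSignedByte(int unsigned) {
        return (byte) unsigned;
    }

    public static void main(String[] args) {
        // Same bits as 0xffff_0000_aaaa_5555, viewed as signed.
        System.out.println(toSignedLong(new BigInteger("ffff0000aaaa5555", 16)));
        // 0xfe is too big for a signed byte as a value, but its bits fit.
        System.out.println(toSignedByte(0xfe));
    }
}
```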
<byteit101[m]> I was thinking pack/unpack and marshalling through a string might be another way, though it still seems icky
NightMonkey has quit [Ping timeout: 265 seconds]
<byteit101[m]> slightly different thing: is it possible to call a java method but to not wrap it into a ruby object?
<headius[m]> not really
<headius[m]> not for primitives anyway
<byteit101[m]> hmm... drat
NightMonkey has joined #jruby
<byteit101[m]> Workaround works! current PR now supports all annotation types on methods except char
lucasb has quit [Quit: Connection closed for inactivity]