#jruby on 2018-03-07 — irc logs at freenode.irclog.whitequark.org

2018-02-21 20:49 ChanServ changed the topic of #jruby to: Get 9.1.16.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:11 shellac_ has quit [Quit: Computer has gone to sleep.]

00:15 shellac_ has joined #jruby

00:16 shellac_ has quit [Client Quit]

01:00 bbrowning_away is now known as bbrowning

01:54 lroca has joined #jruby

02:08 Puffball has quit [Ping timeout: 260 seconds]

02:08 Puffball_ has joined #jruby

02:42 lroca has quit [Quit: lroca]

04:13 Puffball_ has quit [Remote host closed the connection]

04:24 enebo has quit [Ping timeout: 240 seconds]

04:24 yosafbridge` has quit [Ping timeout: 240 seconds]

04:25 enebo has joined #jruby

04:27 yosafbridge has joined #jruby

04:35 sidx64 has joined #jruby

04:38 sidx64 has quit [Client Quit]

05:59 sidx64 has joined #jruby

06:04 sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

06:25 sidx64 has joined #jruby

06:28 Guest68225 has quit [Ping timeout: 245 seconds]

06:29 me_ has joined #jruby

06:32 sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

06:40 sidx64 has joined #jruby

06:49 sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

07:32 sidx64 has joined #jruby

07:42 mkristian has joined #jruby

08:04 _whitelogger_ has joined #jruby

08:45 shellac_ has joined #jruby

08:53 claudiuinberlin has joined #jruby

09:00 shellac_ has quit [Quit: Computer has gone to sleep.]

09:46 shellac_ has joined #jruby

10:03 drbobbeaty has joined #jruby

10:04 rrutkowski has joined #jruby

10:09 rrutkowski has quit [Ping timeout: 255 seconds]

10:28 drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

10:57 olle has joined #jruby

11:39 mkristian has quit [Quit: This computer has gone to sleep]

11:50 bzb has joined #jruby

12:02 fidothe has quit [Ping timeout: 240 seconds]

12:03 fidothe has joined #jruby

12:07 bzb has quit [Quit: Leaving]

12:10 drbobbeaty has joined #jruby

12:18 olle_ has joined #jruby

12:18 olle has quit [Ping timeout: 240 seconds]

12:18 olle_ is now known as olle

12:24 mkristian has joined #jruby

12:34 drbobbeaty has quit [Ping timeout: 240 seconds]

12:36 shellac_ has quit [Quit: Computer has gone to sleep.]

12:51 mkristian has quit [Quit: This computer has gone to sleep]

12:58 mkristian has joined #jruby

13:00 drbobbeaty has joined #jruby

13:32 bbrowning is now known as bbrowning_away

13:35 sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

13:56 drbobbeaty has quit [Ping timeout: 245 seconds]

14:34 mkristian has quit [Quit: This computer has gone to sleep]

14:41 mkristian has joined #jruby

15:56 bbrowning_away is now known as bbrowning

16:17 <enebo> lopex: I am going to put out 1.0.28 of jcodings. I want the EUC-JP fixes

16:18 GitHub120 has joined #jruby

16:18 GitHub120 has left #jruby [#jruby]

16:18 <GitHub120> jcodings/master 3b6f3a4 Thomas E. Enebo: [maven-release-plugin] prepare release jcodings-1.0.28

16:18 <GitHub120> [jcodings] enebo pushed 1 new commit to master: https://git.io/vAbuv

16:18 GitHub41 has joined #jruby

16:18 GitHub41 has left #jruby [#jruby]

16:18 <GitHub41> [jcodings] enebo tagged jcodings-1.0.28 at 0d15ce3: https://git.io/vAbuU

16:18 GitHub83 has joined #jruby

16:18 GitHub83 has left #jruby [#jruby]

16:18 <GitHub83> jcodings/master 32707e8 Thomas E. Enebo: [maven-release-plugin] prepare for next development iteration

16:18 <GitHub83> [jcodings] enebo pushed 1 new commit to master: https://git.io/vAbuT

16:19 <lopex> enebo: ok

16:19 <lopex> nirvdrum: ^^

16:20 <lopex> enebo: do they work btw ?

16:20 <enebo> hahaha

16:20 <enebo> lopex: I hope so

16:20 <enebo> lopex: I will give it a quick check. Silly I did not bother. I trust you so much

16:20 <nirvdrum> lopex: Awesome. Thanks.

16:20 <lopex> enebo: this is the most annoying thing to test in jcodings

16:21 <enebo> lopex: well I have a test case for sure

16:21 <enebo> const_set and const_defined? now rely on these for identification of whether it is a valid constant name

16:21 <enebo> whereas in the past we used jlString

16:22 <enebo> lopex: speaking of fun!!!! if you wanted to be a rock star you could add flag/enum support for RubySymbols so we can mark what type of identifier they can represent

16:22 <enebo> lopex: MRI added this a while back whereas we O(n) check over and over

16:22 mkristian has quit [Quit: This computer has gone to sleep]

16:23 <enebo> sorry I meant rock star ninja

16:23 mkristian has joined #jruby

16:24 <lopex> enebo: what are those types ?

16:24 <enebo> oh let me get a link

16:26 <enebo> lopex: symbol.h things like is_const_id

16:27 <lopex> oh ruby_id_types

16:28 <enebo> doh...hmm maybe they don't cache that

16:28 <enebo> I thought they did

16:31 <enebo> lopex: oh hmm seems to still be a problem

16:32 <enebo> lopex: I will debug this to make sure

16:36 <lopex> nirvdrum: this shouldnt be needed in new joni btw https://github.com/oracle/truffleruby/blob/master/src/main/java/org/truffleruby/core/regexp/RegexpNodes.java#L91

16:37 <GitHub184> [jruby] greghuc opened issue #5082: Puma web server busted on Java 9.0.4 https://git.io/vAbgH

16:39 <enebo> lopex: isCodeCType(42699, 13) fails for 'λ' for EUC-JP

16:39 <lopex> enebo: might be char type offset issue, looking

16:39 <enebo> lopex: this code looks weird

16:40 <enebo> isWordGraphPrint does not contain ALNUM as valid

16:40 <enebo> but ALNUM(13) is less than MAX_STD_CTYPE(14)

16:41 <enebo> so either there is missing logic or isWordGraphPrint is not permissive enough

16:41 <lopex> well, this matches mri

16:42 <lopex> but yeah, those char types are broke on mri too

16:42 <enebo> oh

16:42 olle has quit [Quit: olle]

16:42 <lopex> on another sense :P

16:42 <enebo> you said they have inlined some of these checks outside of this

16:42 <enebo> somewhere not in this code right?

16:42 <enebo> in MRI

16:54 <lopex> enebo: ONIG_ENCODING_EUC_JP->is_code_ctype(42699, 13, ONIG_ENCODING_EUC_JP) is zero too

16:55 <enebo> so lambda is not an ALNUM

16:55 <enebo> from ONIG perspective

16:58 <enebo> lopex: it is frustrating because in unicode it goes down other path to isInCodeRange and returns true for ALNUM

16:59 <lopex> enebo: and those have different ranges too

17:00 <enebo> yeah

17:00 <enebo> http://www.fileformat.info/info/unicode/char/03bb/index.htm

17:00 <lopex> well, isWord also should go for ranges

17:00 <lopex> everything

17:00 <enebo> of course I go to EUC-JP page and it links to unicode entry but it is an ALNUM

17:00 <enebo> I think onigmo is just wrong here

17:00 <enebo> oh hmm

17:01 <enebo> should I use isWord and not isAlnum?

17:01 <enebo> lopex: actually what is the difference?

17:02 <enebo> isWord does fix it

17:04 <lopex> yeah, those are both true for unicode

17:05 <enebo> lopex: so maybe EUC-JP specifically does not think they are ALNUM while for unicode they do? but MRI will basically still think it is a valid identifier character for a constant.

17:05 <enebo> lopex: isWord is basically all characters which do not separate words? Is '$' isWord?

17:09 <lopex> for unicode ?

17:11 <enebo> lopex: I don't know for anything

17:11 <enebo> lopex: what does isWord mean

17:11 <lopex> for unicode it's 0-9a-zA-Z_

17:12 <lopex> from ascii range

17:12 <lopex> god knows what's there

17:12 <lopex> but the problem is in char types and not ranges

17:12 <enebo> https://github.com/jruby/jruby/blob/bytelist_love/core/src/main/java/org/jruby/RubySymbol.java#L204

17:13 drbobbeaty has joined #jruby

17:13 rrutkowski has joined #jruby

17:13 <enebo> lopex: more or less I am depending on this method to look at a properly encoded string and ask if it represents a valid constant identifier

17:13 <lopex> what does mri have for that ?

17:13 <enebo> lopex: perhaps I should look at the lexer since it obviously is parsing

17:13 <enebo> lopex: I don't know...that was where those id types come into play

17:15 <enebo> return c != EOF && (Character.isLetterOrDigit(c) || c == '_' || !isASCII(c));

17:15 <enebo> this is our lexer

17:15 <enebo> which does not use jcodings at all

17:16 <enebo> which is fascinating since I did not read my original character through the lexer but transcoded it to EUC-JP

17:16 <enebo> likely this code should be the same as whatever ends up working in that method in RubySymbol

17:17 <enebo> #define is_identchar(p,e,enc) (rb_enc_isalnum((unsigned char)(*(p)),(enc)) || (*(p)) == '_' || !ISASCII(*(p)))

17:17 <enebo> #define parser_is_identchar() (!parser->eofp && is_identchar((lex_p-1),lex_pend,current_enc))

17:17 <enebo> That is MRI

17:18 <enebo> !ISASCII WOT!

17:18 <lopex> blech

17:18 <lopex> bleh even

17:18 <enebo> We do it as well but wtf

17:19 <enebo> So <256 gets isalnum and _ check but then anything else is fine?

17:20 <enebo> so at lexer level I could put some multibyte space char and it makes it past this point

17:20 <enebo> so MRI must validate this later somehow

17:20 <enebo> lopex: or am I mistaken?

17:21 <enebo> ./include/ruby/ruby.h:static inline int rb_isascii(int c){ return '\0' <= c && c <= '\x7f'; }

17:21 <lopex> there's #define is_identchar(p,e,enc) (ISALNUM((unsigned char)*(p)) || (*(p)) == '_' || !ISASCII(*(p))) in symbol.c too

17:22 <enebo> hehe so absolutely any character outside of that range will be valid for an identifier in the lexing portion of MRI (JRuby is a bit different since Character.isLetterOrDigit() I think will say yes/no for mbcs?)
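
[Editor's note] The identifier check quoted at 17:15 and the MRI macro quoted at 17:17 can be sketched in Java as below. This is a minimal illustration, not the JRuby lexer itself: the class name `IdentCharSketch`, the helper `isAscii`, and the value `EOF = -1` are assumptions made for the example. It shows why any code point above 0x7f passes the check, including a space character such as U+202F.

```java
public final class IdentCharSketch {
    // Assumption for this sketch: end of input is signalled by -1.
    static final int EOF = -1;

    // Mirrors MRI's rb_isascii: true only for 0x00..0x7f.
    static boolean isAscii(int c) {
        return c >= 0 && c <= 0x7f;
    }

    // Same shape as the lexer line quoted in the log:
    // letters, digits and '_' pass, and so does anything non-ASCII.
    static boolean isIdentChar(int c) {
        return c != EOF
                && (Character.isLetterOrDigit(c) || c == '_' || !isAscii(c));
    }

    public static void main(String[] args) {
        System.out.println(isIdentChar('a'));     // true: ASCII letter
        System.out.println(isIdentChar('_'));     // true: underscore
        System.out.println(isIdentChar('$'));     // false: ASCII, not alnum
        System.out.println(isIdentChar(0x03BB));  // true: Greek small lambda
        System.out.println(isIdentChar(0x202F));  // true: narrow no-break space
        System.out.println(isIdentChar(EOF));     // false: end of input
    }
}
```

The last two non-EOF cases are the point of the discussion: the lambda is accepted because it is a letter, while U+202F is accepted only because it is not ASCII.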

17:23 <enebo> lopex: ok so my problem right now...lambda is ok in constant name in MRI but I have no idea how they approve it. !ISASCII seems mad. That cannot possibly be valid can it?

17:23 <lopex> enebo: https://www.youtube.com/watch?v=BldD-VSNdNE

17:23 <enebo> lopex: I am just confused...perhaps MRI just doesn't care about what the characters are once they leave ASCII space?

17:24 <enebo> If it is that easy then no problem I guess but I thought we have a huge encodings database which tells us stuff

17:25 <lopex> it might be some remnants

17:25 <lopex> I dont expect consistency from mri

17:25 <enebo> lopex: you mean they started with this weird heuristic and never used the data once it was available

17:26 <enebo> lopex: a jcodings helper method which may be nice is isASCII(c)

17:27 <lopex> enebo: does the parser switch encodings at any time ?

17:27 <enebo> you mean within a single sourcefile?

17:27 <enebo> like having #coding: half way down?

17:28 <enebo> lopex: can our codepoints ever be negative?

17:28 <enebo> lopex: since it is in a signed value

17:29 <lopex> they shouldnt

17:29 <lopex> not sure about gb18030

17:29 <lopex> unicode is too small for that

17:29 <enebo> lopex: !ISASCII for us can just be >x7f

17:30 <enebo> not that it is a massive savings :P

17:31 <lopex> yeah Encoding.isAscii is exactly that

17:31 <enebo> oh there is an isAscii

17:31 <enebo> :)

17:32 <enebo> going to lunch

17:33 <lopex> enebo: I need to redigest what you said above about that parse thing

17:37 claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]

17:39 shellac_ has joined #jruby

17:43 mkristian has quit [Quit: This computer has gone to sleep]

18:15 shellac_ has quit [Ping timeout: 240 seconds]

18:21 shellac_ has joined #jruby

18:23 akp has joined #jruby

18:49 shellac__ has joined #jruby

18:50 shellac_ has quit [Ping timeout: 264 seconds]

18:52 claudiuinberlin has joined #jruby

19:13 <nirvdrum> lopex: Thanks for the notification. I have no idea what that code does or what it's for :-)

19:14 <lopex> nirvdrum: https://github.com/jruby/joni/issues/13

19:14 subbu is now known as subbu|lunch

19:14 <nirvdrum> lopex: Thanks.

19:14 <lopex> managed to add that in latest joni overhaul

19:15 <nirvdrum> Forking into our own project made a lot of sense, but it's certainly made other things a bit harder.

19:16 <nirvdrum> enebo: Still working on lexer improvements?

19:16 <enebo> nirvdrum: well I am still working on bytelist internally

19:16 <nirvdrum> Ahh.

19:17 <enebo> nirvdrum: but not specifically improving lexing

19:17 <nirvdrum> I'm looking at ways to eliminate some of the CoW faults for sharing at the moment.

19:17 <enebo> yeah CoW has weird properties

19:17 <nirvdrum> ByteList can be more efficient than Ropes here. At least the way I've implemented SubstringRope.

19:17 <enebo> but CoW from lex source should not really be one

19:18 <enebo> I keep thinking all identifiers should just CoW the same byte array of the source itself

19:18 <nirvdrum> There's a weird one where it goes ByteList -> String to get an identifier just to use it to look up in a Map keyed by String.

19:18 <nirvdrum> As far as I can tell, that String isn't used for anything else.

19:18 <nirvdrum> The keyword check, IIRC.

19:19 <enebo> heh I noticed we rebuild same regexp 4 times

19:19 <enebo> that should be refactored

19:19 <enebo> or one time is for looking for lvars in the regexp

19:19 <enebo> but we should just make one in lexer and pass it all the way through

19:22 <nirvdrum> lopex: Do you plan to have a new version of joni soon, or is 2.1.15 sticking around for a while?

19:23 <lopex> enebo: I think it can be released at any time

19:24 <enebo> LOL: mri23 -e 'Object.const_set("D\u202FD", 1); p Object.constants'

19:24 <enebo> lopex: so proof enough for me that the semantics of constants are not what is documented

19:25 <enebo> lopex: ANY non-ascii multibyte character is allowed after the first one

19:25 <lopex> hooray

19:25 <enebo> which is what the code says I guess

19:25 <enebo> no Ruby book ever written says this though

19:27 <enebo> lopex: nirvdrum: ok well I am ok releasing jcodings since my problem had nothing to do with it. Someone may as well get their goodies

19:27 <lopex> enebo: and joni

19:28 <enebo> lopex: do I have to? :)

19:28 <lopex> enebo: for nirvdrum

19:28 <nirvdrum> Are you guys still open to some invasive changes to make jcodings SVM-friendly?

19:28 <lopex> enebo: since that array reading

19:28 <lopex> sure why not

19:28 <nirvdrum> Because they'd be invasive :-)

19:28 <enebo> I guess it depends on what "invasive changes" means

19:29 <nirvdrum> I think we discussed in New Orleans. But it might've been Hiroshima.

19:29 <nirvdrum> enebo: Basically, we can't do dynamic class loading.

19:29 <enebo> ah yeah

19:29 <lopex> ah I recall now

19:29 <enebo> I think the answer was just having a second class which could load all those eagerly

19:29 <enebo> or something like that?

19:30 <nirvdrum> I need to look again, but I think my idea was to load all the classes, but keep them shallow. The tables would be read lazily.

19:30 <enebo> oh

19:30 <lopex> nirvdrum: and newInstants ?

19:30 <lopex> *instance

19:30 <enebo> so data would still be lazy load but all types would be present

19:30 <nirvdrum> I'm doing something different in TruffleRuby at the moment. There, I just threw away a bunch of jcodings.

19:31 <nirvdrum> https://github.com/oracle/truffleruby/blob/master/src/main/java/org/jcodings/transcode/TranscodingManager.java

19:31 <nirvdrum> It's ugly, but it works.

19:31 <nirvdrum> Look for TruffleOptions.AOT.

19:32 <nirvdrum> SVM sees the static TruffleOptions.AOT value and discards the other branch which contains the code doing the dynamic lookup.

19:33 <nirvdrum> enebo: Yeah, that's the idea. I haven't looked at it in a while. I *think* the additional overhead would be minimal. But I'd have to work out the thread-safety of the tables.

19:33 <nirvdrum> Since those are read-only, two threads both loading the tables wouldn't be the end of the world.

19:33 <nirvdrum> lopex: I'd have to look at that again.

19:34 <enebo> so if I remember this is not just a load time issue but also a memory one

19:34 <nirvdrum> Basically I don't want to head down this path if it's apt to be rejected out of hand. But I'm happy to collaborate on it.

19:34 <enebo> nirvdrum: lopex: how many types are we talking about?

19:35 <enebo> telling me one per encoding is not what I am asking :)

19:35 <lopex> no

19:35 <lopex> dunno, like 30 impls max ?

19:35 <nirvdrum> Memory potentially. But the tables would end up compiled into the process and currently the whole process is loaded into memory anyway. So I'm not sure there's really any savings to be had there.

19:35 <lopex> er more like 50

19:35 <enebo> yeah no one cares about 50 classes

19:35 <enebo> not at this point :)

19:35 <nirvdrum> Ruby has 110 encodings, but a good number of those are aliases.

19:36 <enebo> I am just wondering how much of an issue the data is from memory perspective

19:36 <nirvdrum> Loading the maps lazily would be more of a memory savings for the JVM.

19:36 <enebo> I am guessing it is megs of data not like 1meg of data

19:36 <enebo> yeah

19:36 <nirvdrum> Some of the encoding tables are 1MB+

19:36 <enebo> I am just being devil's advocate about just making it all eager

19:36 <nirvdrum> Loading all of them would be noticeable.

19:37 <enebo> ok yeah that will stack up quick

19:37 <nirvdrum> Let me just go measure.

19:37 <enebo> nirvdrum: well I wondered about loading them as a single piece of data

19:37 <nirvdrum> Maybe not so bad. 3.2 MB of table data.

19:37 <enebo> but we would not want to increase heap by several megs

19:38 <nirvdrum> They're compact binary implementations though, so it'd be more in memory.

19:38 <enebo> so perhaps lazy data makes sense unless 2 of it is utf encodings we always load

19:38 <enebo> ah yeah

19:38 <enebo> ok yeah I doubt we want that hit

19:39 <enebo> so we have compact data and we expand it on loading?

19:40 <nirvdrum> I see 51 encoding files (no idea if multiple classes per file) and 29 transcoding files.

19:40 <enebo> ah fudge

19:40 <enebo> I did a mvn:prepare before updating jcodings

19:40 <nirvdrum> It's loaded into a byte[] and int[] depending on whether it's a byte-oriented or word-oriented file.

19:41 <nirvdrum> The additional overhead won't be massive.

19:41 <nirvdrum> 16 bytes for an array header?

19:42 <enebo> well that does not sound like it is uncompressed or anything

19:42 <nirvdrum> lopex would know better.

19:43 <nirvdrum> While I'm at it, I'd love nothing more than to address the static index value in Encoding.

19:43 <nirvdrum> enebo: https://github.com/jruby/jcodings/blob/master/src/org/jcodings/transcode/Transcoder.java#L30-L51

19:43 <nirvdrum> Basically I want to move all that readIntArray stuff out of the constructor.

19:47 <enebo> nirvdrum: It would be nice to hide it behind something simpler than making a zillion synch blocks

19:47 <nirvdrum> I can make eregon figure that part out :-)

19:48 <enebo> yeah

19:48 <enebo> I don't really know how this data is accessed either

19:48 <nirvdrum> From my naive standpoint, doing a simple null check should suffice. If two threads compete and both load the same table, whatever.

19:48 <enebo> yeah could be

19:49 <enebo> seems reasonable to me that you may have occasional race but result is same

19:49 xardion has joined #jruby

19:49 <enebo> if it is read-only it is really not that complicated to reason about
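
[Editor's note] The "simple null check" idea from 19:48 might look roughly like the sketch below. It is not jcodings code: `LazyTableSketch`, `loadTable`, and the `loads` counter are hypothetical names, and the table contents are made up. The field is declared `volatile` here, which is an addition of this sketch rather than something stated in the log, so that a reader thread that sees a non-null reference also sees fully written array contents. Two threads may still both load the table, but they produce identical read-only data, so the duplicate work is harmless.

```java
import java.util.concurrent.atomic.AtomicInteger;

public final class LazyTableSketch {
    // Null until first use; volatile for safe publication (sketch assumption).
    private volatile int[] table;

    // Counts how often the (hypothetical) expensive load actually ran.
    final AtomicInteger loads = new AtomicInteger();

    // Stand-in for reading a table from a resource file.
    private int[] loadTable() {
        loads.incrementAndGet();
        int[] t = new int[8];
        for (int i = 0; i < t.length; i++) {
            t[i] = i * i;
        }
        return t;
    }

    // Lazy accessor: no lock, a racing thread may load a second copy,
    // but both copies hold the same values and are never mutated.
    int[] table() {
        int[] t = table;
        if (t == null) {
            t = loadTable();
            table = t;
        }
        return t;
    }

    public static void main(String[] args) {
        LazyTableSketch s = new LazyTableSketch();
        System.out.println(s.loads.get());   // 0: nothing loaded yet
        System.out.println(s.table()[3]);    // 9
        System.out.println(s.loads.get());   // 1: loaded on first access
    }
}
```

Moving the load out of the constructor this way keeps the class itself cheap to instantiate, which is the property discussed above for ahead-of-time compiled images.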

19:50 <nirvdrum> Alright. I'll take a crack at it then. You can poke holes in it when there's a PR.

19:52 sidx64 has joined #jruby

19:54 shellac__ has quit [Quit: Computer has gone to sleep.]

20:01 <enebo> nirvdrum: done joni+jcodings

20:01 rrutkowski has quit [Ping timeout: 260 seconds]

20:09 sidx64_ has joined #jruby

20:11 sidx64 has quit [Ping timeout: 256 seconds]

20:12 <lopex> nirvdrum, enebo: most compact are transcoder tables

20:13 drbobbeaty has quit [Ping timeout: 265 seconds]

20:13 <lopex> code ranges, fold tables and case mapping specials are intertwined with sub array lengths and metadata bits

20:14 <lopex> and they also pack metadata within code point values

20:15 <lopex> https://github.com/ruby/ruby/blob/trunk/enc/unicode/10.0.0/casefold.h#L6912

20:15 <lopex> hard to get any uglier

20:16 <lopex> https://github.com/ruby/ruby/blob/trunk/enc/unicode/10.0.0/casefold.h#L798

20:16 <lopex> these are also pretty

20:16 <lopex> so on java heap there will be a lot of smaller sub arrays

20:17 <lopex> so making it mirror image of mri data could actually make the heap smaller

20:23 cshupp has joined #jruby

20:23 <cshupp> @headius

20:23 <cshupp> https://github.com/jruby/jruby/issues/5018

20:24 <cshupp> This bug ios still broken

20:24 <cshupp> This bug is still broken

20:24 <enebo> IOS

20:24 <cshupp> Hi Tom

20:25 <GitHub129> [jruby] enebo reopened issue #5018: open3.rb broken in JRuby https://git.io/vNSL4

20:25 <cshupp> 9.1.16 didn't fix it.

20:25 <cshupp> thanks

20:26 <enebo> cshupp: yeah np. I guess I don't know what happened there

20:26 <cshupp> Appreciate it.

20:26 <cshupp> @ me in git if you want me to try it in a custom built branch this time around

20:27 <cshupp> Bye

20:27 cshupp has quit [Client Quit]

20:31 subbu|lunch is now known as subbu

20:32 sidx64_ has quit [Ping timeout: 240 seconds]

20:35 drbobbeaty has joined #jruby

20:50 claudiuinberlin has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

20:51 claudiuinberlin has joined #jruby

21:02 akp has quit [Remote host closed the connection]

21:02 akp has joined #jruby

21:07 akp has quit [Ping timeout: 248 seconds]

21:59 akp has joined #jruby

22:13 claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]

22:23 bbrowning is now known as bbrowning_away

22:49 drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

22:55 shellac_ has joined #jruby

23:09 <nirvdrum> Where do you guys handle string interpolation?

23:11 <nirvdrum> It looks like IRBuilder#buildDStr.

23:11 <nirvdrum> I was blanking on the DStr part.

23:18 shellac_ has quit [Quit: Computer has gone to sleep.]

23:45 shellac_ has joined #jruby

23:57 shellac_ has quit [Quit: Computer has gone to sleep.]