shellac has quit [Quit: Computer has gone to sleep.]
shellac has joined #jruby
Joufflu has joined #jruby
rdubya has joined #jruby
rdubya has quit [Client Quit]
rdubya has joined #jruby
rdubya has quit [Client Quit]
rdubya has joined #jruby
rdubya has quit [Client Quit]
shellac has quit [Quit: Computer has gone to sleep.]
Joufflu has quit [Ping timeout: 255 seconds]
deobalds has joined #jruby
jhass has quit [Ping timeout: 246 seconds]
yopp has quit [Ping timeout: 258 seconds]
yopp has joined #jruby
jhass has joined #jruby
rdubya has joined #jruby
rdubya has quit [Client Quit]
Joufflu has joined #jruby
Joufflu has quit [Client Quit]
Puffball has quit [Quit: No Ping reply in 180 seconds.]
claudiuinberlin has joined #jruby
Puffball has joined #jruby
shellac has joined #jruby
temporal_ has quit [Ping timeout: 252 seconds]
shellac has quit [Quit: Computer has gone to sleep.]
deobalds has quit [Quit: Computer has gone to sleep.]
shellac has joined #jruby
shellac has quit [Quit: Computer has gone to sleep.]
temporalfox has joined #jruby
drbobbeaty has joined #jruby
temporalfox has quit [Ping timeout: 248 seconds]
temporalfox has joined #jruby
shellac has joined #jruby
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
temporalfox has quit [Ping timeout: 248 seconds]
deobalds has joined #jruby
shellac has quit [Quit: Computer has gone to sleep.]
shellac has joined #jruby
drbobbeaty has joined #jruby
rdubya has joined #jruby
rdubya has quit [Client Quit]
rdubya has joined #jruby
rdubya has quit [Client Quit]
rdubya has joined #jruby
deobalds has quit [Quit: Computer has gone to sleep.]
lance|afk is now known as lanceball
Scorchin_ is now known as Scorchin
<GitHub176> [jruby] eregon pushed 4 new commits to master: https://git.io/vdOmI
<GitHub176> jruby/master e43c4cc Benoit Daloze: Squashed 'spec/mspec/' changes from 5bd9409..8aa352c...
<GitHub176> jruby/master a9fdbdb Benoit Daloze: Merge ruby/mspec commit 'e43c4ccadb90684226a1fc365680e630df71c0ab'
<GitHub176> jruby/master c3c0595 Benoit Daloze: Squashed 'spec/ruby/' changes from a4bc1d8..691755d...
shellac has quit [Ping timeout: 240 seconds]
shellac has joined #jruby
rdubya has quit [Quit: Leaving.]
rdubya has joined #jruby
drbobbeaty has quit [Ping timeout: 258 seconds]
drbobbeaty has joined #jruby
shellac has quit [Read error: Connection reset by peer]
drbobbeaty has quit [Read error: Connection reset by peer]
enebo has joined #jruby
olle has joined #jruby
olle has quit [Quit: olle]
<GitHub7> [jruby] ivoanjo closed issue #4651: Unexpected memory usage when running with native.enabled=true https://git.io/vH6Vn
temporalfox has joined #jruby
<nirvdrum> enebo: FYI, JRuby is affected by this: https://bugs.ruby-lang.org/issues/13950
<enebo> nirvdrum: thanks
temporalfox has quit [Read error: Connection reset by peer]
<nirvdrum> We're affected, too, since we copied from you.
<nirvdrum> But I think I'm going to make it "correct", otherwise strings go down the wrong path.
temporalfox has joined #jruby
<enebo> nirvdrum: because you are roped will your patch help us too?
<enebo> nirvdrum: I did not look for their fix to this
<nirvdrum> I just opened the issue. It's been a bug since 1.9.2.
<nirvdrum> But it's not a matter of ropes or not.
<nirvdrum> The only real difference is we eagerly work out the code range. I added assertions today verifying we're actually creating ropes with the correct code range. And that's when I started tripping over these old bugs.
<enebo> nirvdrum: well no doubt their patch will be very similar to ours
<nirvdrum> I added a follow-up comment. I think the logic here is more complex.
<enebo> nirvdrum: these are bugs I am ok with fixing as soon as discovered since nothing is gained from leaving behavior like this
claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]
<enebo> nirvdrum: oh I just read the actual bug
<enebo> nirvdrum: hilarious
<enebo> nirvdrum: basically TR with any non 7bit friendly encoding is just broken
<enebo> nirvdrum: so long as it can tr ascii-like chars
<nirvdrum> Yes.
<nirvdrum> Unless you end up only using it in contexts where code range doesn't matter.
<enebo> nirvdrum: which is not many encodings other than utf16*
temporalfox has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<nirvdrum> But JRuby isn't affected by that one.
<nirvdrum> ISO-2022-JP is another one I tried.
<nirvdrum> It's a problem for any non-ASCII compatible encoding.
<enebo> nirvdrum: but that is not 7 bit ascii when ascii?
<nirvdrum> Eh?
<enebo> nirvdrum: err how should I say that sentence :)
<enebo> nirvdrum: if [a-z] is not actually ascii a-z
rdubya has quit [Quit: Leaving.]
<nirvdrum> I could have this very wrong, but CR_7BIT only applies to codepoints < 128 in an ASCII-compatible encoding.
<nirvdrum> It's used all over the place for single-byte operations.
<enebo> nirvdrum: no you have it right
<enebo> nirvdrum: but UTF-16LE does not store those there
<enebo> nirvdrum: well I mean they are all 2 bytes
<nirvdrum> UTF-16LE uses 2 bytes for each codepoint.
<enebo> nirvdrum: I assume ISO-2022-JP must as well
<nirvdrum> And I guess surrogate pairs for grapheme clusters or whatever.
<nirvdrum> All this shit is confusing.
<enebo> nirvdrum: but 'a' in UTF-16 is a two bytes
<enebo> so it cannot be 7bit
<nirvdrum> Every CR_7BIT string is by definition CR_VALID, but if you incorrectly mark a string that *could be* CR_7BIT as CR_VALID, you might go down a bad path and get incorrect results.
<enebo> although it happens to have a 7bit 'a' in it :)
<nirvdrum> Because the helper functions for CR_VALID assume the source string is MBC.
<nirvdrum> enebo: Right.
<enebo> yeah I hate CR because it is less semantically driven and more specific to what they could opt at the time of 1.9 as an impl
<nirvdrum> I hate ASCII-8BIT with a passion.
<enebo> but at this point I doubt we could just remove it
<enebo> that internal state now dictates external encoding in cases so it has become semantic
<nirvdrum> singleByteOptimizable usually means it's CR_7BIT or it's ASCII-8BIT (or I suppose some other binary-based encoding).
<enebo> yeah
<nirvdrum> You just always have this corner case of dealing with arbitrary binary strings.
<enebo> nirvdrum: yeah
<nirvdrum> And it has a wildly different usage pattern.
<nirvdrum> I posit most strings are read-only, but most byte buffers have many writes.
<enebo> nirvdrum: yeah I guess in many languages bytes and chars are two things or chars come from byte storage
<nirvdrum> Ropes take a beating when reading from IO, for instance.
<enebo> nirvdrum: but notion that chars are made of bytes is more explicit at a language level
<enebo> nirvdrum: fact that in 1.8 all strings were bytes I guess drove this lack of distinction
<nirvdrum> I'm sure it was based on C strings.
<enebo> yeah char*
<enebo> C also is similarly weird
<nirvdrum> I'd just like to see a byte[] or byte buffer type introduced and let that get adopted.
<nirvdrum> ASCII-8BIT is here forever for compatibility reasons. But we could encourage people off it.
<enebo> nirvdrum: I think Matz dislikes lots of types but it is a duck-typed language so Binary/String would have been reasonable to me
<enebo> bin[1] would be second offset of bytes and string[1] would be second offset of a character
<nirvdrum> That's where I was advocating for it.
<nirvdrum> Since you can actually screw up your bytes arrays pretty easily if you're not careful.
<nirvdrum> You basically really need to know how MRI implemented strings.
<nirvdrum> enebo: This is terribly inefficient, but I don't have the energy to work out this tr helper at the moment: https://github.com/graalvm/truffleruby/pull/574/commits/89385b36297a232510af74cb32915b12b248d798
<nirvdrum> That seems to fix the String#tr issue.
<enebo> nirvdrum: why do you pass in 'c'?
<enebo> oh nvm
<nirvdrum> But it's silly to check this for every byte after you've already determined it's CR_VALID.
<enebo> nirvdrum: yeah the asciicompat check is invariant
<enebo> nirvdrum: and once valid it won't go back right?
<nirvdrum> The resulting string couldn't, as far as I can tell.
<nirvdrum> This should be a function of the replacement string's encoding and whether the string to replace is ever matched.
<enebo> nirvdrum: this is all gross in any case (not specifcially your code but the notion of how side-effecty cr is)
<nirvdrum> I sorta took their macro that you guys seem to have inlined and blew it up to include the current encoding.
<enebo> you could just look for non-ascii char as a boolean and set cr once at end or something too?
<enebo> no doubt your ropes even know right
<enebo> like you know if you have mbc in a rope already just not if result of tr will end up with mbc
<nirvdrum> I think so, yeah. But this code has a half dozen different exit points.
<enebo> nirvdrum: yeah no doubt this is squirrelly method too
<nirvdrum> I have a boundary on it and just expect it to be slow. At some point I'm going to need to tackle it.
<enebo> heh ours is massive
bbrowning is now known as bbrowning_away
<enebo> nirvdrum: so you have all this c0, c, save state too right?
<nirvdrum> Sorry, not sure what you're asking.
<enebo> nirvdrum: massive trTansHelper
<enebo> nirvdrum: just looking at the loop where those checks happen
<nirvdrum> Oh. We literally copy & pasted from you.
<enebo> nirvdrum: I am guessing someone decided to overload c to contain whether translation worked or not and the new value
<nirvdrum> The only thing that's changed is ByteList was replaced with something that looks like ByteList called RopeBuilder.
<enebo> who would be nobu probably :)
<enebo> so to save a boolean he added a second int
<enebo> well I guess I should say original c, translated c, and whether translation worked
<enebo> since translated c might not work then it is set to -1
<enebo> this is probably fine but the names and -1 checks make this challenging to read
<enebo> nirvdrum: if (c < TRANS_SIZE) {
<enebo> nirvdrum: this confuses me does utf16-le translate 'a' to be a value < 256?
<enebo> nirvdrum: c is the codepoint of 'a'
<enebo> as an example
<enebo> nirvdrum: I naively assumed everything in utf-16le would be above 256
<enebo> nirvdrum: this is where I feel weak in my knowledge
<enebo> heh actually ignore that question it probably i > trans_size
<enebo> everything has more than one path here
<nirvdrum> I'm getting dinner ready. In and out at the moment.
<nirvdrum> I've spent approximately 0 amount of time trying to understand this code, though.
<enebo> nirvdrum: np. I will probably stop working fairly soon...fighting jet lag and feelign a bit weird
<nirvdrum> I'm not sure I even understand many of the Ruby usages I see of it.
<enebo> nirvdrum: my main issue with code like this is we kept parity with C version to make future fixes simpler but it is in this we must have all logic paths in one method C-like build around it
<nirvdrum> Yeah.
<nirvdrum> The approach is sound, if not ugly.
<enebo> nirvdrum: so grokking all paths sometimes means you only follow one branch everywhere through one call but maybe not :)
<enebo> nirvdrum: yeah I am sure it works it is just trying to handle all cases in one place
<enebo> nirvdrum: which I think speaks volumes to lack of polymorphism
enebo has quit [Quit: Leaving.]
enebo has joined #jruby
<nirvdrum> enebo: If you're not married to maintaining parity, I've started optimizing CR_7BIT cases a bit more: https://github.com/graalvm/truffleruby/pull/563/files
<enebo> nirvdrum: well actual reasonable variable names would be enough for me
<enebo> c0 is not great for readability
<enebo> nirvdrum: but possibly
<nirvdrum> On the case methods, using ^= 0x20 seemed clearer and faster to me than this CType table with a bitmask applied.
<nirvdrum> enebo: I don't suppose you've looked at the Unicode casing methods for 2.4 support yet, have you?
<enebo> nirvdrum: nope
<enebo> nirvdrum: I think it needs data import but Charlie may have did that
<enebo> done that
<enebo> whatever
<enebo> in joni
<enebo> err jcodings
rdubya has joined #jruby
rdubya has quit [Client Quit]
rdubya has joined #jruby
rdubya has quit [Client Quit]
rdubya has joined #jruby
rdubya has quit [Client Quit]
rdubya has joined #jruby
rdubya has quit [Client Quit]
<lopex> nirvdrum: for encodings casing is mostly this https://github.com/ruby/ruby/blob/trunk/include/ruby/onigmo.h#L177
<lopex> and then rb_str_casemap in string.c on ruby side
rdubya has joined #jruby
rdubya has quit [Client Quit]
<nirvdrum> lopex: Thanks. I have no idea what any of those data structures are though.
drbobbeaty has joined #jruby
<lopex> nirvdrum: OnigEncodingTypeST is our Encoding class
<lopex> nirvdrum: all those function pointers translate to our virtual methods one to one