<lopex>
but it had to do something with cookies and random
KarolBucekGitter has quit [*.net *.split]
JulesIvanicGitte has quit [*.net *.split]
MattPattersonGit has quit [*.net *.split]
MattPattersonGit has joined #jruby
KarolBucekGitter has joined #jruby
JulesIvanicGitte has joined #jruby
nirvdrum has joined #jruby
nirvdrum has quit [Ping timeout: 240 seconds]
_whitelogger has joined #jruby
den_d has quit [Excess Flood]
den_d has joined #jruby
_whitelogger has joined #jruby
drbobbeaty has quit [Ping timeout: 264 seconds]
drbobbeaty has joined #jruby
_whitelogger has joined #jruby
<fidothe>
Working on https://github.com/jruby/jruby/issues/5905. Figured out that RubyMatchData already copes with the put-a-string-pattern-in-it-and-lazily-create-a-regexp case. Hurrah. When I put a multibyte-char string in it, I get the bytes dumped out in the result from `#inspect`, instead of seeing the UTF-8 char I expect. This works fine with the regexp-version, using the same input string. Is there something about
<fidothe>
`RubyString.strDup(runtime)` that needs me to do something extra about encodings?
<fidothe>
If I call `#string` on the `MatchData` instance then I get the UTF-8 encoded string I expect
<fidothe>
so that seems like it's not the culprit
<fidothe>
Okay, so you need to pass the whole string in as the `str` param when creating a MatchData. Makes sense, should have twigged from the way the Regexp-using version invokes it...
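For reference, the behaviour being chased here can be observed from plain Ruby. This is a minimal sketch, assuming a string-pattern `sub` sets the last match the same way a Regexp pattern does; the variable names are illustrative only and nothing here reflects JRuby's internal `MatchData` construction.

```ruby
source = "h\u00E9llo"                 # "héllo", with a two-byte UTF-8 character
result = source.sub("\u00E9", "e")    # string pattern, no Regexp literal involved
match  = Regexp.last_match            # capture the MatchData right away

matched_text = match[0]               # the matched substring
whole_string = match.string           # the full string the match was made against
before       = match.pre_match        # everything before the match

puts result
puts matched_text.encoding
```

The MatchData carries the whole source string, not just the matched slice, which is why `#string` and `#pre_match` can be answered from it.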
rusk has quit [Ping timeout: 250 seconds]
rusk has joined #jruby
nirvdrum has joined #jruby
<headius[m]>
Yeah that sounds right
drbobbeaty has quit [Ping timeout: 245 seconds]
shellac has joined #jruby
drbobbeaty has joined #jruby
sagax has joined #jruby
<fidothe>
Now I'm down to failing tests
<fidothe>
What's the origin of the tests in `test/mri/ruby`? `test_string.rb` is failing with some stuff that looks (at first glance) unrelated, and some is definitely related.
<fidothe>
That looks like we need to use `RubyRegexp.regsub` or equivalent
<fidothe>
i.e. we should be expanding `\0`
<fidothe>
hrm
<fidothe>
if that's done there's basically no way out of creating the Regex
metafr[m] has joined #jruby
<fidothe>
Working on the assumption that the Rubydocs are comprehensive, looks like you could get away with special casing on the presence of \0 - \9
<fidothe>
and subbing in the match for `\0` and empty string for `\1` - `\9`
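A small illustration of the expansion being discussed, as seen from Ruby code. It only shows the observable behaviour of `String#sub` with a plain-string pattern; how JRuby should implement it internally is still the open question in this conversation.

```ruby
text = "hello world"

# \0 in the replacement stands for the whole match, even with a string pattern.
whole = text.sub("world", '[\0]')

# A string pattern has no capture groups, so \1 expands to an empty string.
group = text.sub("world", '[\1]')

# Characters in a string pattern are taken literally, not as regexp syntax.
literal = "a.b".sub(".", "-")

puts whole, group, literal
```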
<fidothe>
I guess I need to dive into the MRI source on this
<fidothe>
more coffee
<metafr[m]>
Hi Guys, sorry, newbie question: the JRuby 9.1.17.0 release notes say that this JRuby version is compatible with Ruby 2.x. But which version of Ruby does JRuby 9.1.17.0 use by default? Is there a way to configure it in a RoR application?
<fidothe>
@metafr[m] JRuby 9.1.17.0 is compatible with Ruby 2.3
<fidothe>
You can't pick versions like you could in JRuby 1.7 (where the choice was MRI 1.8 or 1.9)
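One way to confirm which Ruby language level a given runtime reports is to ask it directly. A minimal sketch using the standard `RUBY_VERSION` and `RUBY_ENGINE` constants; `JRUBY_VERSION` is only defined when running under JRuby, so it is guarded here.

```ruby
# Works on any Ruby implementation; the JRuby-specific constant is optional.
engine = defined?(RUBY_ENGINE) ? RUBY_ENGINE : "ruby"
jruby  = defined?(JRUBY_VERSION) ? JRUBY_VERSION : nil

summary = "#{engine} implementing Ruby #{RUBY_VERSION}"
summary += " (JRuby #{jruby})" if jruby

puts summary
```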
<enebo[m]>
fidothe: those tests are the internal test suite of C Ruby itself
<fidothe>
@enebo[m] thanks. I can resolve the references to bugs now :-)
<enebo[m]>
looks like largely 2 issues to work through glancing at that output
<fidothe>
@enebo[m] Yeah. The encoding one looks nasty
<enebo[m]>
you mean the \0 issue?
<fidothe>
the `\0` expansion is straightforward but annoying
<enebo[m]>
yeah I can see this...so last byte is not found due to probably walking wrong encoding
<enebo[m]>
or something with how it is walked
<fidothe>
@metafr[m] hope it was useful. If you need features from Ruby 2.4 or later, JRuby 9.2.8.0 supports 2.5
<fidothe>
@enebo[m] i assumed it was something about multi-byte length calculations
<fidothe>
This has been a fun introduction to Java programming :-)
<enebo[m]>
fidothe: oh yeah it likely is either wrong encoding assumed or perhaps wrong string helper method like preciseMBCLength vs something else
<enebo[m]>
fidothe: oh cool. Java itself I hope is not a big barrier. We are not really doing too many complicated things in Java
<enebo[m]>
fidothe: you using an IDE?
<enebo[m]>
fidothe: if you are working on this now I can pull your fork and see if I notice anything
<enebo[m]>
fidothe: assuming you want any help. Sometimes it is fun to work through it on your own :)
<enebo[m]>
fidothe: also anything involving encodings or joni/jcodings and lopex is around too
<fidothe>
@enebo[m] IntelliJ IDEA. I've been meaning to learn how to write Java, and not just read it, for ages, and this seemed like a good way in
<fidothe>
@enebo[m] I'll ping you or @lopex if I have questions. I'm still getting used to holding the different string models - RubyString, byte[], java string, and how they interact in my head
<enebo[m]>
fidothe: yeah I always say we are usually pretty easy since you get to isolate to making a few methods and you can usually look at similar ones in the same core type to figure which methods to use
<enebo[m]>
fidothe: and in fact your first stab is a little more complicated because m17n is its own domain
<enebo[m]>
yeah in this case you should be extracting bytelist to get a byte[] mostly
<fidothe>
Fortunately I understand that bit pretty well, at least as far as Unicode and how it works in encodings and how that stuff works in normal Ruby
<enebo[m]>
biggest problem tends to be forgetting begin index (which usually gets missed for about a month since most strings will have begin = 0)
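The begin-index pitfall can be sketched with a toy view over a shared byte array. `ByteView` below is a hypothetical stand-in written for illustration, not JRuby's actual ByteList API; it only assumes that a string's bytes may start partway into a larger backing array.

```ruby
# Hypothetical view: `start` is where this string's bytes begin in `bytes`.
ByteView = Struct.new(:bytes, :start, :len) do
  # Correct access always adds the begin offset.
  def byte_at(i)
    bytes[start + i]
  end
end

backing = "xxhello".bytes

zero_based = ByteView.new("hello".bytes, 0, 5)  # begin == 0, the common case
shifted    = ByteView.new(backing, 2, 5)        # begin == 2, e.g. a shared substring

# Forgetting the offset happens to work when begin is 0 ...
same_when_zero = zero_based.bytes[0] == zero_based.byte_at(0)

# ... and silently reads the wrong byte when it is not.
correct = shifted.byte_at(0).chr
buggy   = shifted.bytes[0].chr

puts same_when_zero, correct, buggy
```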
<enebo[m]>
fidothe: cool
<metafr[m]>
> @metafr[m] JRuby 9.1.17.0 is compatible with Ruby 2.3
<enebo[m]>
metafr: It might or it might not. It is no longer supported so there will be no fixes but in this case we do not use openssl but our own implementation trying to emulate openssl so it probably does not apply to us (although you would need to try an exploit to see)
<enebo[m]>
metafr: in most cases we tend not to have as many CVEs because Java does not have the memory safety issues that C does. The cases where we do can be issues like that one, where our impl possibly has the same incorrect logic
<enebo[m]>
(and I am not saying we are vulnerable to that ... I don't know if 9.1.x is)
<enebo[m]>
metafr: I am assuming there is a reason you cannot contemplate 9.2.9.0 but 2.3 -> 2.5 is not too massive for incompatibilities...Largely the integer unification (Bignum/Fixnum => Integer)
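A short sketch of what the integer unification means for application code. The `is_a?(Integer)` check is the portable form, since `Fixnum` and `Bignum` were subclasses of `Integer` before the change; the class comparison differs between language levels, so its result is printed rather than assumed.

```ruby
small = 1
big   = 2**64

# Portable on both old and new language levels.
both_integers = small.is_a?(Integer) && big.is_a?(Integer)

# On 2.4 and later both report Integer; on 2.3 they are Fixnum and Bignum.
same_class = small.class == big.class

puts both_integers
puts "#{small.class} / #{big.class} (same class: #{same_class})"
```

Code that checks `is_a?(Fixnum)` or `is_a?(Bignum)` is the part most likely to need attention when moving from 2.3 to 2.5.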
<metafr[m]>
@enebo thanks a lot for these explanations
<enebo[m]>
metafr: yeah sorry it was not clearer that 9.1.x is 2.3.x
nirvdrum_ has joined #jruby
rusk has quit [Remote host closed the connection]
rusk has joined #jruby
xardion has quit [Remote host closed the connection]
xardion has joined #jruby
sagax has quit [Quit: Konversation terminated!]
sagax has joined #jruby
shellac has quit [Ping timeout: 240 seconds]
<enebo[m]>
fidothe: I just did a quick run of that failing test. StringSupport.index offset parameter appears to be the character offset and not the byte offset.
<fidothe>
Aha
<enebo[m]>
As a secondary comment this method is somewhat inefficient in the sense that it needs to re-walk from the front of the string up to the right place on each call into index
<fidothe>
So previous Multibyte Tests I did worked by accident
<enebo[m]>
since in a mbc scenario it cannot just jump forward
<enebo[m]>
yeah they worked because all chars were byte size of 1
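The character-offset versus byte-offset difference is easy to see from Ruby itself. A minimal sketch with illustrative variable names; `String#b` returns a binary copy, so searching it yields byte positions.

```ruby
ascii = "hello"
multi = "h\u00E9llo"   # "héllo"; the é occupies two bytes in UTF-8

# Character offsets: what String#index reports.
ascii_char = ascii.index("l")
multi_char = multi.index("l")

# Byte offsets: search the binary view of the same bytes.
ascii_byte = ascii.b.index("l".b)
multi_byte = multi.b.index("l".b)

# Converting a character offset to a byte offset means walking the prefix.
walked = multi[0, multi_char].bytesize

puts [ascii_char, ascii_byte, multi_char, multi_byte, walked].inspect
```

With single-byte characters the two offsets coincide, which is how mixing them up can go unnoticed until a multibyte character appears before the match.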
<enebo[m]>
anyways I am going to eat some lunch but I thought I would share that
<fidothe>
No, I did some multibyte tests but the last replacement was never at the end of the string
<fidothe>
I thought I’d use `StringSupport.index` because `String#index` uses it. Good starting point for further improvements
nirvdrum_ has quit [Ping timeout: 240 seconds]
nirvdrum has quit [Ping timeout: 268 seconds]
rusk has quit [Remote host closed the connection]
lucasb has joined #jruby
nirvdrum has joined #jruby
nirvdrum_ has joined #jruby
<enebo[m]>
fidothe: maybe we can make an index which passes in a byte[] with begin index
<fidothe>
First step, make it work. Benchmark suggests that even the crude version will be faster in a bunch of situations. I suspect it won’t be for gsub on a long string
<enebo[m]>
fidothe: yeah an improved version of index will be pretty easy to plug in
<enebo[m]>
fidothe: in fact at that point keeping track of characters can be removed too
<fidothe>
The hard work there will be encoding stuff I guess.
<enebo[m]>
as you have this written the new version of index may even fit better
<enebo[m]>
since you are just passing in appropriate offset into byte[] for where next char starts
<enebo[m]>
really the new version of index will just be like the old one but will remove the offset() call towards the front
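The idea of an index that takes a byte offset directly, so repeated searches resume where the last one stopped, can be sketched in Ruby over byte arrays. `byte_index` and `each_match_offset` are hypothetical helpers written for this illustration, not JRuby methods, and the naive byte comparison assumes a self-synchronizing encoding such as UTF-8.

```ruby
# Find `needle` in `haystack` (both byte arrays), starting at byte offset `from`.
def byte_index(haystack, needle, from)
  last_start = haystack.length - needle.length
  (from..last_start).each do |i|
    return i if haystack[i, needle.length] == needle
  end
  nil
end

# Collect the byte offset of every non-overlapping match, never re-walking
# from the front of the string.
def each_match_offset(str, pattern)
  needle = pattern.bytes
  raise ArgumentError, "empty pattern" if needle.empty?

  haystack = str.bytes
  offsets = []
  pos = 0
  while (found = byte_index(haystack, needle, pos))
    offsets << found
    pos = found + needle.length   # resume right after the match, in bytes
  end
  offsets
end

puts each_match_offset("h\u00E9llo h\u00E9llo", "llo").inspect
puts each_match_offset("h\u00E9l", "l").inspect
```

The second call covers a match in the final byte of the string, the case that the earlier tests did not exercise.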
<enebo[m]>
I looked briefly and saw no one else use index in a repeated fashion so no opportunity to fix this in other things
<lopex>
fidothe: do you also follow the usage of mri's need_backref ?
<lopex>
doh it's more invasive than you thought
<lopex>
enebo[m]: now it all revolves around rb_pat_search(VALUE pat, VALUE str, long pos, int set_backref_str)
<lopex>
and they set set_backref_str as they wish from the callers
<lopex>
and it's a boolean
<lopex>
er, I meant, more invasive than I thought
<lopex>
so scan is affected as well
<lopex>
headius[m]: updated jcodings to unicode 12.1.0 shall we change the deps now ?