#jruby on 2018-07-25 — irc logs at freenode.irclog.whitequark.org

2018-05-24 16:34 ChanServ changed the topic of #jruby to: Get 9.2.0.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:07 slyphon has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

00:13 slyphon has joined #jruby

00:14 ahorek has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

00:15 slyphon has quit [Client Quit]

00:51 hoi has joined #jruby

00:55 hoi has quit [Client Quit]

00:56 hoi has joined #jruby

01:20 hoi has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

01:22 hoi has joined #jruby

01:35 hoi has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

01:45 hoi has joined #jruby

01:48 jrafanie has joined #jruby

02:04 slyphon has joined #jruby

02:25 hoi has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

02:32 Antiarc has quit [Ping timeout: 256 seconds]

02:35 hoi has joined #jruby

02:37 Antiarc has joined #jruby

02:43 jrafanie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

03:02 <hoi> anyone has connect redshift with JRuby?

03:25 Caerus has joined #jruby

04:01 projectodd-ci has joined #jruby

04:22 sgeorge has joined #jruby

04:23 sgeorge has quit [Remote host closed the connection]

04:26 hoi has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

04:27 sgeorge has joined #jruby

04:30 sgeorge has quit [Remote host closed the connection]

05:00 ahorek has joined #jruby

05:01 ahorek has quit [Client Quit]

05:08 hoi has joined #jruby

05:57 rdubya has quit [Ping timeout: 260 seconds]

06:09 Puffball has quit [Remote host closed the connection]

06:15 shellac has joined #jruby

06:32 shellac has quit [Read error: Connection reset by peer]

06:34 ahorek has joined #jruby

06:46 ahorek has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

06:48 slyphon has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

06:53 hoi has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

07:28 claudiuinberlin has joined #jruby

07:42 <GitHub148> [jruby] boris-petrov opened issue #5260: Cannot install RuboCop 0.58.{1,2} https://git.io/fNRGX

08:18 ahorek has joined #jruby

08:18 ahorek has quit [Client Quit]

08:57 drbobbeaty has joined #jruby

09:15 drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

10:54 drbobbeaty has joined #jruby

11:00 rdubya has joined #jruby

11:11 ahorek has joined #jruby

11:11 ahorek has quit [Client Quit]

11:18 <kares> enebo: got much better _parse by going native and doing string checks MRI does

11:19 <kares> which means we can avoid going to parse_eu and parse_us methods for strings like '2018-01-01'

11:19 <kares> this is what you already noticed locally by commenting them out - gives us 2x improvement

11:20 <kares> only thing JRuby (non-graal) is worse is Date.parse('2018-07-17', false) 17.8 vs 12.2 on MRI

11:21 <kares> ... Date._parse('2018-07-17 21:20:55') is beaten, yay!

11:21 <kares> now to polish up the PR and hopefully good to go ...

11:30 jrafanie has joined #jruby

11:45 jrafanie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

13:08 sgeorge has joined #jruby

13:50 sgeorge has quit [Remote host closed the connection]

13:55 hoi has joined #jruby

14:00 <kares> last piece - added some custom code to avoid time parsing for raw dates -> 3x faster, MRI tests seems to pass

14:01 sgeorge has joined #jruby

14:06 slyphon has joined #jruby

14:19 jrafanie has joined #jruby

14:56 slyphon has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

15:14 slyphon has joined #jruby

15:30 sgeorge has quit [Remote host closed the connection]

15:33 sgeorge has joined #jruby

15:39 sgeorge has quit [Remote host closed the connection]

15:50 claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]

16:02 xardion has quit [Remote host closed the connection]

16:07 xardion has joined #jruby

16:26 claudiuinberlin has joined #jruby

16:27 sgeorge has joined #jruby

16:37 claudiuinberlin has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

16:56 hoi has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

17:17 <enebo> kares: you can land what you have if it passes tests

17:18 <enebo> kares: I know you are done for now but I keep looking at us using a RubyHash and realize if it was all native it would be a simple value type and likely just dumb fields

17:18 <enebo> kares: so I guess we can get more later if we want to keep pushing that direction

17:20 hoi has joined #jruby

17:21 hoi has quit [Client Quit]

17:26 claudiuinberlin has joined #jruby

17:38 sgeorge has quit [Remote host closed the connection]

17:44 sgeorge has joined #jruby

17:48 sgeorge has quit [Ping timeout: 244 seconds]

17:49 sgeorge has joined #jruby

17:53 sgeorge has quit [Ping timeout: 244 seconds]

18:17 Puffball has joined #jruby

18:20 <havenwood> I'm updating ruby-versions metadata so ruby-install can use the new Maven location to fetch binaries. Most of the bins are the exact same checksums, but I noticed a few anomalies with the checksums compared to the same versions on AWS.

18:21 <havenwood> jruby-dist-1.7.19-bin is a different checksum for both zip and tar.gz, and jruby-dist-9.1.17.0-bin.zip is as well, just the zip.

18:21 <havenwood> The rest of the binaries are the same checksums compared to the old versions.

18:22 <havenwood> enebo: Should I just defer to the new checksums ^ for the few that changed?

18:23 <havenwood> I'm just updating ruby-versions for the dist-bin, version 1.7.5 and later.

18:23 <havenwood> If earlier .tar.gzs are added, or more src versions, I'd be happy to update ruby-versions with those as well.

18:24 <havenwood> Here's the md5 checksum comparison: https://gist.github.com/havenwood/efe34bfc058283324d1687f1dc8f0efe

18:25 sgeorge has joined #jruby

18:29 <kares> enebo: okay, CI looking good so going to land on master

18:29 sgeorge has quit [Ping timeout: 264 seconds]

18:29 <kares> right, that RubyHash could go away - but you might need to move _parse_xxx left-overs to native

18:30 <kares> which is straight forward but I got tired of it, for now :)

18:30 <kares> might take a look at whether DateTime.iso good get some boost tomorrow

18:34 <GitHub128> [jruby] kares closed pull request #5259: [refactor] improve Date parsing performance (master...date-speed) https://git.io/fN4l0

18:34 <GitHub39> [jruby] kares pushed 12 new commits to master: https://git.io/fN0Wv

18:34 <GitHub39> jruby/master 3ff8998 kares: [refactor] use match? instead of =~ where possible

18:34 <GitHub39> jruby/master ddaad41 kares: review regexp match?-ing - MRI doesn't do dynamic dispatch...

18:34 <GitHub39> jruby/master 0e00652 kares: use internal str.sub! for date parsing (without frame info)...

18:35 <GitHub36> [jruby] kares closed issue #5255: Date parsing (still) noticeably slower than MRI https://git.io/fNn7P

18:37 sgeorge has joined #jruby

18:38 sgeorge has quit [Remote host closed the connection]

18:39 sgeorge has joined #jruby

18:52 <GitHub178> [jruby] danielford opened issue #5261: File.utime failing with Errno::EINVAL on JRuby 9.1.17.0 https://git.io/fN08q

19:38 <lopex> enebo: they also use some other specialized code in "match?"

19:39 <lopex> like rb_str_subpos

19:39 <enebo> lopex: ok

19:39 <lopex> irrelevant for our benchmark now

19:41 <enebo> lopex: that method is just optimized way of walking into the string n chars?

19:43 <lopex> enebo: https://gist.github.com/lopex/ca85859f002ca8ffbd7cf0966a368d6b

19:43 <lopex> enebo: can you remind me what for is that useCnt there ?

19:44 <enebo> lopex: no I just deleted it in my version since it did not make sense to me

19:44 <lopex> enebo: and release joni

19:44 <lopex> enebo: and it's buggy for me

19:44 <enebo> lopex: oh and btw matcherSearch defaults to search and not match

19:44 <lopex> enebo: they use search in "match?"

19:44 <enebo> apparently

19:45 <enebo> I got a good jump out of switching that

19:45 <enebo> It is in my diff

19:45 <enebo> I did not change it cleanly though

19:45 <enebo> lopex: but you raise a good question perhaps capturing match should use match as well

19:46 <lopex> enebo: going to buy a beer be back in 10 mins

19:46 <lopex> apparently you cant buy beer in Poland after 10pm

19:46 <lopex> since yersterday

19:47 <enebo> LAME

19:47 <lopex> police country

20:04 <lopex> enebo: can you release joni ?

20:04 <enebo> lopex: sure is it ready then?

20:05 <lopex> yes

20:06 <enebo> ok will release then

20:06 <lopex> enebo: we can sacrifice that msaBegin/msaEnd for now

20:06 <enebo> did you remove it then?

20:06 <enebo> in else

20:06 <enebo> or you are saying we can remove it later

20:06 <lopex> now, I mean it's unnecessarily computed now

20:06 <lopex> but we need it still

20:07 <enebo> ok so you did not remove it but we don't need it in this case

20:07 <lopex> enebo: the bigger gain in where there's no captures at all and we dont allocate regions

20:07 <enebo> I don't know what you are telling me

20:07 <lopex> enebo: we have three cases for that branch

20:08 <lopex> enebo: captures > 0, captures == 0 -> region -> null, and forced null for region

20:08 <lopex> so we'd have to distinguish it somehow

20:09 <lopex> like null region object or something like that

20:09 <enebo> lopex: ah I see so you mean we need more state if we want to not have that else compute

20:09 <lopex> enebo: or pass null region object on matcher creation

20:09 <enebo> lopex: which is almost funny since I passed in a boolean for that and you decided we should pass null

20:09 <enebo> lopex: or is passing null the same thing possibly?

20:09 <lopex> enebo: since for "a" =~ /a/ we still need to create beg[0] / end[0]

20:10 <enebo> lopex: we set a boolean if region passed is null?

20:10 <lopex> enebo: no since we have to computer the boundaries

20:10 <lopex> er

20:10 <lopex> enebo: I put that decision a bit earlier

20:11 <enebo> lopex: I guess what I heard you say is that else cannot be removed since we want to support 3 cases but one case does not need it

20:11 <lopex> enebo: https://github.com/jruby/joni/commit/af9a0dd37e85ea75884a6f9aa5e48f12dc8d1e72#diff-d3b338612a4ef0c8a1dd5019b0c04b89R159

20:11 <enebo> yeah I just said you pass region in

20:11 <lopex> enebo: yeah, third state could be null region object

20:12 <lopex> enebo: but the computation is very cheap so lets leave it for now

20:12 <enebo> I guess my only confusion is msaRegion is sometimes null or eager would not work

20:12 <lopex> eager is not used now

20:12 <enebo> lopex: yeah but you said you wanted it

20:12 <enebo> lopex: I am just trying to understand how we can tell the difference between the 3 cases

20:12 <enebo> lopex: I am ok with not doing it now

20:12 <lopex> so if msaRegion is null then msaBegin / msaEnd play role of msaRegion[0].beg / msaRegion[0].end

20:12 <lopex> enebo: ^^

20:13 <enebo> meaning we always need them because we have no regions OR that we need them in case we want to eagerly make a region?

20:14 <lopex> lest forget eager thing completely

20:14 <lopex> not used

20:15 <enebo> ok so we need those fields regardless of whether we have regions or not

20:15 <lopex> enebo: except for "match?"

20:15 <lopex> that's the third case

20:16 <enebo> so if we search with null regions we need them

20:16 <enebo> but if we match? with them we don't?

20:16 <lopex> not for "match?"

20:16 <enebo> so only match? we don't need them

20:17 <enebo> At this point I only see two cases but I think that is because you told me to ignore the third case of eager

20:17 <lopex> enebo: we need them only for this case "a" =~ /a/

20:17 <lopex> no regions

20:17 <lopex> implicit zero group

20:18 <lopex> eager is not a third case

20:18 <enebo> ah so no capturing but we still need to know where match starts/stops

20:18 <lopex> yes!

20:18 <enebo> whew I knew we would get there!

20:18 <lopex> oh maybe I should have put it youre way

20:18 <enebo> I have a similar issue with my oj port

20:19 <enebo> the original C preallocs the stack and needs a final value

20:19 <lopex> enebo: we could number capturing groups from 1 though, but that would create more confusion

20:20 <enebo> but it does not use an extra stack element for final value

20:20 <lopex> like use always msaBegin / msaEnd

20:20 <lopex> but on jruby api side it would have more complex code

20:20 <enebo> however it still writes to stack element 0 even when stack is empty

20:20 <lopex> or hmm

20:20 <enebo> lopex: that would be confusing impl wise as well

20:20 <lopex> I recall that it had problems

20:21 <enebo> lopex: simplest impl would be to always have at least one region

20:21 <enebo> but then you are indirecting this simple case with a box

20:21 <lopex> but you would allocate

20:22 <enebo> yeah and allocate the box

20:22 <enebo> If Java had value objects we would not care

20:24 <lopex> enebo: https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/RubyMatchData.java#L318

20:24 <lopex> https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/RubyMatchData.java#L644

20:24 <lopex> I forgot why we store beg /end on rubyMatchData though

20:24 <enebo> heh

20:25 <lopex> it was 10 years ago though

20:25 <enebo> lopex: ok but I see we can save an alloc

20:25 <lopex> enebo: three alocs

20:25 <enebo> and if it is as simple as /a/ then every little opt probably helps

20:25 <lopex> region and two arrays

20:25 <enebo> oh true

20:26 <enebo> Almost makes me think we could add region to ThreadContext and pass it in

20:26 <lopex> lots of regexp dont have captures

20:26 <enebo> not that I am advocating that

20:26 <enebo> I have that thought a lot though :P

20:26 <lopex> we'd need to realloc when bumping up

20:26 <lopex> on new regexp which more captures

20:27 <lopex> like that ?

20:27 <enebo> lopex: release succeeded. not sure how long it will take to appear though

20:27 <enebo> lopex: yeah I guess we have to cope with growing region

20:28 <enebo> lopex: is region just two arrays of ints?

20:28 <lopex> MRI realllocs region

20:28 <lopex> yes

20:28 <enebo> we could model region as a single array

20:28 <lopex> it could be one

20:28 <enebo> ok same page

20:29 <enebo> anyways this is probably not reasonable

20:29 <enebo> pre-allocing a largish single primitive int array stored on context

20:29 <enebo> then all match/search would pass that in

20:29 <enebo> no alloc and no penalty for having it past assignment

20:32 <lopex> enebo: I think someone copied that usecnt from mri without a reason

20:33 <lopex> mri need it so that it can free it when preprocessed pattern is not original one

20:33 <lopex> needs

20:34 <lopex> enebo: I wonder how mri does on pattern that change on preprocessing

20:34 <enebo> lopex: ok well I could not understand what it was for and the behavior was all commented out (and MRI code)

20:34 <lopex> since it's constant malloc/free

20:35 <lopex> we just hit preprocessed cache

20:35 <enebo> yeah

20:40 <enebo> lopex: the pos code you originally linked from MRI we basically do already

20:40 <enebo> in rbStrOffset(post)

20:40 <lopex> enebo: it's different

20:40 <enebo> It is generic in calling nth but it still is sb optimizable

20:40 Puffball has quit [Ping timeout: 240 seconds]

20:41 <lopex> enebo: it's rb_str_offset

20:42 Puffball has joined #jruby

20:42 <lopex> I was talking about rb_str_subpos

20:43 <enebo> lopex: but is it really different in behavior?

20:43 <enebo> lopex: I guess I don't see how that would be faster than rbStrOffset

20:44 <lopex> enebo: well

20:44 <lopex> enebo: it's different :P

20:44 <enebo> lopex: yeah

20:45 <enebo> lopex: It must have improved something I guess

20:45 <enebo> lopex: for VALID utf-8 we can probably use nirvdrum logic and walk continuation bytes

20:45 <enebo> lopex: but I would do that generically and not just for this

20:46 <lopex> enebo: what walk ?

20:46 <enebo> lopex: walk to subpos

20:47 <enebo> lopex: if we want to start a 5 it is five codepoints right?

20:47 <enebo> for valid UTF8 we need not examine every byte

20:48 <lopex> and ?

20:48 <lopex> well not every byte

20:48 <lopex> only first one

20:49 <lopex> enebo: yeah, I promised a call graph for those length functions

20:49 <lopex> we need to add another length to encoding

20:50 claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]

20:51 <enebo> lopex: yeah

20:51 <enebo> lopex: I still have never followed through on my assertion caching length on string would pay for itself

20:51 <lopex> no profile data

20:52 <enebo> lopex: There would be a tiny amount of cost for sb case

20:52 <lopex> so just guessing

20:52 <enebo> lopex: but it is so expensive in mbc case

20:52 <enebo> lopex: but yeah no evidence and it would not be faster for sure in sb case

20:52 <lopex> enebo: c deals with it from very beginning :P

20:53 <lopex> and most code ranges are sb

20:53 <enebo> I was told by MRI dev(s) that the reason length was never considered is because of lack of space in their struct

20:53 <lopex> or are they ?

20:53 <lopex> heh

20:54 <lopex> so why jruby doesnt do that ?

20:54 <enebo> lopex: well likely they are but killing perf for mbc at the cost of sb being a tiny bit slower might not be a good tradeoff

20:54 <lopex> because mri didnt have space for that

20:54 <enebo> we just have never tried

20:54 <enebo> and we ported their logic to some degree

20:54 <enebo> remember how long m17n took

20:54 Caerus has quit [Ping timeout: 268 seconds]

20:55 <lopex> it took me a year for string alone

20:55 <enebo> I think in my view of this was we were not going to deviate from MRI until we were confident we were correct

20:55 <lopex> yes

20:55 <lopex> that was my attitude as well

20:55 <enebo> so adding length could be done as an appendage/add-on but I also thought about using length field for CR as well

20:55 <lopex> and yet you have to be bug to bug compatible

20:56 <lopex> length for cr ?

20:56 <enebo> so negative values could indicate unknown/valid

20:56 <lopex> ah, I recall now

20:56 <enebo> arr.length with 7bit env is just length

20:56 <lopex> but it would have to go through centralized api

20:56 <enebo> err arr.length and length is 7bit

20:56 <lopex> otherwise you'd be lost

20:56 <enebo> well is is7bit() sure

20:57 <enebo> the methods would be super small and inline

20:57 <enebo> well I would lay money on that anyways

21:08 <enebo> lopex: joni is present!

21:08 <enebo> on maven

21:10 drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

21:21 <nirvdrum> Whoa. That's a lot of backreading. Is it worth me catching up?

21:22 <enebo> nirvdrum: not really. we are just removing regions so you can implement match_p without setting them up

21:22 <enebo> nirvdrum: most of that was not understanding why we had regions and two ints

21:23 <enebo> nirvdrum: one this of interest is rb_str_subpos exists for pos repositioning for match_p (and a couple of other things). Why they did not just use their normal char walking code is a mystery

21:23 <enebo> could just be some microopt we don't really understand

21:25 <nirvdrum> So how are you tracking match positions if you remove regions?

21:25 <lopex> enebo: ok

21:25 <enebo> nirvdrum: no but match? doesn't

21:25 <lopex> nirvdrum: via two int fields in matcher

21:26 <enebo> well we do but we don't actually need to for match?

21:26 <lopex> "match?"

21:26 <enebo> nirvdrum: actually that was what started part of that discussion was that when regions are disabled we still calc beg/end for the match even though match_p doesn't need that

21:27 <enebo> It is a tiny amount of logic though so not likely very important

21:27 <nirvdrum> Ahh.

21:27 <enebo> lopex: you saw that joni is on maven repos

21:27 <nirvdrum> I think the only thing we do differently for `match?` right now is avoid setting `$~`.

21:27 <lopex> enebo: I believe you

21:27 <lopex> enebo: I was testing with local copy

21:27 <enebo> lopex: just making sure you know :)

21:28 <lopex> nirvdrum: now you can force the region to be null

21:28 <lopex> using reg.matcherNoRegion

21:28 <nirvdrum> Nifty.

21:28 <enebo> nirvdrum: yesterday we talked about idea we can make a more specialized interp for match_p to shave some of this regionish logic out but it likely would not be a big gain

21:29 <enebo> nirvdrum: but since we have the method we can change that impl any time if we decide to play with it

21:29 <nirvdrum> Cool.

21:29 <lopex> nirvdrum: also there's a possibility to use separate interpreter that omits some group logic in joni

21:29 <nirvdrum> I suppose my next big mountain to climb is really figuring out what joni is doing.

21:30 <nirvdrum> We ported code from JRuby and slapped a boundary around the whole thing. It mostly works, but isn't ideal.

21:30 <lopex> like the group is not referred by \1 for example

21:30 <nirvdrum> But I never really know where to start.

21:30 <lopex> but groups largely group so it's invitable for the most part

21:31 <lopex> enebo: I wonder how much the semantics changes when you just (?..)

21:31 <lopex> er (?:..)

21:32 <nirvdrum> Not even remotely related to what you guys are talking about, but I'd really, really, really like to get basic regexp patterns without capture to be as fast as a substring search.

21:32 <lopex> enebo: we could have external array which says which groups are cpaturing

21:32 <enebo> lopex: oh so you mean we region (?:...) along with capturing regions as same data?

21:32 jrafanie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

21:32 <enebo> lopex: if so then won't match? be broken

21:32 <lopex> nirvdrum: but even them there is a question what fast skip algo you use

21:33 <lopex> enebo: something like changing capturing to not capturing

21:33 <lopex> enebo: if not referred

21:34 <enebo> I am not quite following. how would we know if it was referred or not

21:34 <enebo> (?: ...) is never referred

21:34 <nirvdrum> We end up down paths with code like Regexp.new(Regexp.quote(pattern)) and pattern ends up being ',' or "\n".

21:34 <lopex> nirvdrum: like for example /foo/ you can already use build boyer moor map

21:34 <enebo> but () may be but unless you mean \1 then I don't get it

21:34 <lopex> nirvdrum: but regexp mostly are known at parse time

21:34 <lopex> so it's all tradeoffs

21:35 <nirvdrum> So the regexp is a single ASCII char, no modifiers, no bounds, no captures.

21:35 <nirvdrum> It ideally would work the same as indexOf.

21:35 <lopex> enebo: (?:...) is just not capturing

21:35 <enebo> AST generation does validate regexp so we could allow joni to know if it is smple string

21:35 <enebo> lopex: isn't it?

21:35 <enebo> I never remeber the syntax

21:35 <nirvdrum> enebo: In this case it'd be a runtime thing.

21:35 <enebo> I thought (?: was non-capturing group

21:35 <lopex> enebo: not capturing group

21:36 <lopex> enebo: I said otherwise ?

21:36 <enebo> lopex: I don't understand what you are asking or why now

21:36 <nirvdrum> Anyway. I didn't mean to derail your conversation.

21:37 <lopex> enebo: well we are in agreement, I'm confused

21:37 <enebo> nirvdrum: well AST can be marked as regexp but we could also mark it as no special chars

21:37 <enebo> lopex: you brought up non-capturing and then said something about tracking them separately from regions

21:37 <enebo> lopex: so I did not bring this up at all. I think I just did/do not understand what you meant before

21:38 <enebo> nirvdrum: at IR build time for us we could implement it as simple string search

21:38 <enebo> nirvdrum: or we could make joni have an optimized implementation for just that

21:39 <lopex> enebo: something like the capture can be disabled later on

21:39 <enebo> lopex: oh! like we can tell after we have run it that the code using it never requires backref so we remove the regions?

21:40 <lopex> enebo: yes, but just changing the interpreter loop

21:40 <enebo> It is unfortunate that $~ lives past current stack

21:40 <lopex> and some array of numbers

21:41 <nirvdrum> enebo: For match? you could just rewrite it to String#index(pattern) != 0. But I'd like to have it optimized for `match` as well. In that case you would need to know the match boundaries.

21:41 <lopex> nirvdrum: wrt tradeoffs, something like "looongstringbefore abcd" =~ /abcd/

21:41 <nirvdrum> I just think joni ends up going down a more complicated path.

21:41 <lopex> nirvdrum: joni will build a boyer moor map for abdc

21:42 <lopex> nirvdrum: and then fast skip to the interesting point before it even enters interpreter loop

21:42 <nirvdrum> lopex: This is where I shamefully admit I don't know what that is :-P

21:42 <lopex> nirvdrum: not all indexOfs have this

21:42 <lopex> nirvdrum: boyer moore ?

21:42 <nirvdrum> Yeah. I'll read up on it.

21:42 <lopex> nirvdrum: nowadays mri uses sunday search, it's a modification

21:43 <enebo> lopex: I think notion though at parse time knowing it can be something much simpler means not feeding it into the engine of joni

21:43 <lopex> nirvdrum: but in gist you just build a skip map given a string you search for

21:43 <nirvdrum> Really, my understanding of regexp engines is limited to foundational automata. The pumping lemma and such.

21:43 <lopex> it's just string searching algos

21:43 <enebo> even it joni is super fast all the code around getting to that fast execution is not free

21:44 <lopex> nirvdrum: and you advance faster given that map

21:44 <nirvdrum> Gotcha.

21:44 <lopex> but you have to build it first, so indexOF could do that on some length threshold

21:45 <enebo> I think I am on a different wavelength on optimizing that case now

21:45 <enebo> I don't think it should ahve anything to do with joni other than joni pointing out it is this simple case

21:45 <lopex> nirvdrum: https://github.com/k-takata/Onigmo/blob/master/regexec.c#L3410

21:45 <nirvdrum> lopex, enebo: Perhaps I'm advocating for making some of these operations encoding and code range aware.

21:46 <nirvdrum> But I say that naively not having looked at the internals. If it's just a byte machine that may not even matter.

21:46 <lopex> wtf is Horspool

21:46 <enebo> even if joni is faster at finding the match on a simple string all the shit we plow through before we hit that fast code is substantial

21:46 <nirvdrum> indeed.

21:47 <nirvdrum> And Graal isn't going to help us inline through it.

21:47 <enebo> for truffle no doubt would just make a very simple specialized path for it

21:47 <nirvdrum> TRegex may do that. I haven't played with it yet.

21:47 <enebo> for IR we could do it a couple of ways

21:48 <enebo> a ~= /a/ would still need to say it is executing match in stack trace so some sleight of hand

21:48 <nirvdrum> I could provide a specialization for these simple cases, but then I need to maintain my own equivalent to regions and such so the `MatchData` instances can be constructed properly. It's doable, but I certainly don't want my own ad hoc limited regexp engine.

21:48 <lopex> nirvdrum: https://www.youtube.com/watch?v=hj4VmvyqbKY benchmarking talk, truffleruby is there

21:49 <lopex> in case you havent seen that

21:49 <enebo> nirvdrum: but literally only for cases like /\n/ where match would be pretty simple

21:49 <enebo> start/end is trivial in simple substring match

21:49 <nirvdrum> lopex: I haven't. But I'll check that out. I believe Chris has worked with Edd in the past.

21:51 <lopex> though I always saw degradations before steady state was riched on hotspot

21:51 <lopex> reached even

21:51 <nirvdrum> enebo: I ended up down this chain of thought when encountering this snippet from csv.rb: parse.sub!(@parsers[:line_end], "")

21:52 <enebo> oh yeah and in fact this would be more complicated for us since sub! is a call

21:52 <nirvdrum> Which basically is the same thing as String#chomp, but looks up a regexp from a map and uses that as an argument to String#sub!

21:53 <lopex> beauty

21:53 <enebo> so we would need to pass in a type which had match/whatever but was not specially a RubyRegexp

21:53 <nirvdrum> I think it was written this way so you could use something other than "\n" to demarcate different rows. But, I doubt anyone ever really does that.

21:54 <enebo> yeah I doubt that as well but \r\n may be possible perhaps

21:56 <lopex> enebo, nirvdrum: this is what onigmo uses to match newlines in almost every opcode related https://github.com/k-takata/Onigmo/blob/master/regexec.c#L68

21:56 <lopex> can it get any worse ?

21:56 <lopex> it's very hot code

21:56 <lopex> we dont have this yet

21:59 <enebo> lopex: who knows

22:01 <nirvdrum> lopex: So if it's not CRLF it call ONIGENC_IS_MBC_NEWLINE? What's a MBC newline?

22:02 sgeorge has quit [Remote host closed the connection]

22:02 <lopex> enebo: our isNewLine on encoding

22:03 <lopex> nirvdrum: ^^

22:03 <nirvdrum> Okay.

22:03 <lopex> but you get that option check

22:03 <nirvdrum> I guess I'd really like to see something that checks the last byte for LF if the encoding is ASCII-compatible.

22:06 sgeorge has joined #jruby

22:11 sgeorge has quit [Ping timeout: 264 seconds]

22:22 ahorek has joined #jruby

22:28 ahorek has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

22:52 sgeorge has joined #jruby

23:19 sgeorge has quit [Remote host closed the connection]

23:20 sgeorge has joined #jruby

23:24 sgeorge has quit [Ping timeout: 244 seconds]