#jruby on 2018-07-24 — irc logs at freenode.irclog.whitequark.org

2018-05-24 16:34 ChanServ changed the topic of #jruby to: Get 9.2.0.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:12 jrafanie has joined #jruby

00:26 jrafanie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

01:36 jrafanie has joined #jruby

02:27 jrafanie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

04:16 sgeorge has joined #jruby

04:24 sgeorge has quit [Remote host closed the connection]

04:45 Puffball_ has joined #jruby

04:48 Puffball has quit [Ping timeout: 240 seconds]

05:10 damnski has quit [Ping timeout: 240 seconds]

05:15 damnski has joined #jruby

05:15 slyphon has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

05:28 <GitHub151> [jruby] kares pushed 2 new commits to master: https://git.io/fN4cS

05:28 <GitHub151> jruby/master 77a768c kares: [refactor] convert bytes to int without regexp

05:28 <GitHub151> jruby/master 0b7b181 kares: [refactor] minor warnings - replace char can be used

05:32 sgeorge has joined #jruby

05:36 sgeorge has quit [Ping timeout: 260 seconds]

06:30 <GitHub137> [jruby] kares opened pull request #5259: [refactor] date parse internals to avoid $frame vars (master...date-speed) https://git.io/fN4l0

06:41 Caerus has quit [Quit: Leaving]

07:15 rdubya has quit [Ping timeout: 276 seconds]

07:25 shellac has joined #jruby

07:40 shellac has quit [Quit: Computer has gone to sleep.]

08:04 shellac has joined #jruby

08:58 drbobbeaty has joined #jruby

09:15 drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

10:08 <kares> enebo: oh right, seems it has issues - I assumed its a flaky build, since the CI isn't 100% stable

10:09 <travis-ci> jruby/jruby-openssl (master:3f25f72 by kares): The build has errored. (https://travis-ci.org/jruby/jruby-openssl/builds/395468396)

10:09 <kares> will get it green by reverting or trying newer jruby for Maven - its just that its really hard to get the maven jruby plugin right for the whole matrix ;(

10:16 <GitHub47> [jruby-openssl] kares created ci-green (+1 new commit): https://git.io/fN4r0

10:16 <GitHub47> jruby-openssl/ci-green 33475eb kares: [build] try using later 1.7 JRuby for maven plugin

10:29 jrafanie has joined #jruby

10:35 <travis-ci> jruby/jruby-openssl (ci-green:33475eb by kares): The build has errored. (https://travis-ci.org/jruby/jruby-openssl/builds/407541613)

10:54 drbobbeaty has joined #jruby

10:59 <GitHub171> [jruby-openssl] kares pushed 1 new commit to ci-green: https://git.io/fN46F

10:59 <GitHub171> jruby-openssl/ci-green 91c5ad1 kares: [ci] do not use failing to start 1.7 JRubies...

11:00 rdubya has joined #jruby

11:02 jrafanie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

11:06 rdubya has quit [Ping timeout: 256 seconds]

11:17 <travis-ci> jruby/jruby-openssl (ci-green:91c5ad1 by kares): The build passed. (https://travis-ci.org/jruby/jruby-openssl/builds/407555800)

11:21 ahorek has joined #jruby

11:26 ahorek has quit [Client Quit]

11:29 rdubya has joined #jruby

11:34 Puffball_ has quit [Read error: Connection reset by peer]

11:41 Puffball has joined #jruby

11:57 <GitHub18> [jruby-openssl] kares merged ci-green into master: https://git.io/fN4ME

11:57 <GitHub127> [jruby-openssl] kares deleted ci-green at 91c5ad1: https://git.io/fN4Mg

11:57 <kares> jossl CI should be green now

12:07 <travis-ci> jruby/jruby-openssl (master:91c5ad1 by kares): The build passed. (https://travis-ci.org/jruby/jruby-openssl/builds/407574100)

12:19 sgeorge has joined #jruby

12:22 shellac has quit [Quit: Computer has gone to sleep.]

13:01 shellac has joined #jruby

13:19 sgeorge has quit [Remote host closed the connection]

13:23 <enebo> kares: coolio

13:26 <enebo> kares: I don't know if I commented earlier about another idea for date parsing or not

13:27 <enebo> In one of those parse benches I noticed it does 7 sub! but the first 4 do not actually sub until it reaches iso date

13:27 <enebo> As an experiment I removed those first 4 and we are barely slower than MRI

13:27 <enebo> Obviously that is not a solution but it does definitely show sub! is expensive

13:28 <enebo> as you noticed based on your latest PR

13:28 <enebo> I half wonder how fast we would be once we make match? work quickly, so we can do: if match? then sub!

13:29 <enebo> This may even be compatible with your latest PR too
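A minimal Ruby sketch of the "if match? then sub!" idea described above. The patterns and the `strip_matches!` helper are hypothetical stand-ins, not the regexps used in date parsing. The point is only that `Regexp#match?` (Ruby 2.4+) answers yes/no without setting `$~`, so the more expensive `sub!` runs only when it will actually substitute.

```ruby
# Hypothetical patterns standing in for the date-parsing regexps.
PATTERNS = [
  /\b(sun|mon|tue|wed|thu|fri|sat)[a-z]*\b/i,  # day name
  /\b\d{1,2}:\d{2}(:\d{2})?\b/,                 # time of day
  /\b\d{4}-\d{2}-\d{2}\b/                       # ISO-style date
].freeze

# Replace the first match of each pattern with a space, but only call
# sub! when match? has already confirmed there is something to replace.
def strip_matches!(str)
  hits = 0
  PATTERNS.each do |re|
    next unless re.match?(str)   # cheap check, no MatchData, no $~
    str.sub!(re, ' ')
    hits += 1
  end
  hits
end

s = +"2018-07-24"
hits = strip_matches!(s)
```

With this input only the last pattern matches, so the first two cost one `match?` each instead of a failed `sub!`.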

13:31 <enebo> lopex: you look at removing joni machinery for faster match?

13:39 Caerus has joined #jruby

13:47 shellac has quit [Quit: Computer has gone to sleep.]

13:49 sgeorge has joined #jruby

13:52 slyphon has joined #jruby

13:52 sgeorge has quit [Read error: Connection reset by peer]

13:52 sgeorge_ has joined #jruby

13:57 sgeorge_ has quit [Ping timeout: 268 seconds]

14:05 sgeorge has joined #jruby

14:06 <ChrisBr> enebo: headius: tests look quite good, however, some three suites time out. Did you experience this already? https://travis-ci.org/jruby/jruby/builds/407613595

14:07 sgeorge has quit [Remote host closed the connection]

14:07 sgeorge has joined #jruby

14:09 shellac has joined #jruby

14:09 jrafanie has joined #jruby

14:11 <enebo> ChrisBr: yeah those test:mri ones look odd

14:12 <enebo> ChrisBr: we do sometimes have timeouts but all three of those seem to be in the same spot

14:12 <enebo> ChrisBr: can you run test:mri and see if you see that

14:12 <enebo> I will try and eliminate some of this red this morning...looks like we added a param to CompiledIRMethod and missed it failing our spec:compiler run

14:18 <ChrisBr> enebo: hm ok

14:19 <enebo> ChrisBr: it is weird for all three to hang in the same place

14:19 <ChrisBr> oh is it the same place?

14:19 <enebo> we have spurious tests we should remove but I have never seen 3 runs of test:mri hang in same place in a single run

14:20 <enebo> well I think so. The last text all seem to be around some cgi escaping tests

14:20 <ChrisBr> locally I can not even start the test :/

14:20 <enebo> oh yeah? what happens?

14:20 <ChrisBr> nothing

14:21 <ChrisBr> it stalls after starting mri test suite

14:21 <ChrisBr> anyway, already have sth suspicious, maybe some infinite loop in the base iterator

14:21 <ChrisBr> need to leave now! Thanks! Cu tomorrow

14:25 <GitHub48> [jruby] enebo pushed 1 new commit to master: https://git.io/fNBf3

14:25 <GitHub48> jruby/master d7b657b Thomas E. Enebo: Fix spec for removed hasKwargs parameter

14:25 <enebo> ChrisBr: ok cya

14:36 slyphon has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

14:43 shellac has quit [Quit: Computer has gone to sleep.]

14:51 shellac has joined #jruby

14:56 shellac has quit [Quit: Computer has gone to sleep.]

14:58 slyphon has joined #jruby

15:21 shellac has joined #jruby

15:30 shellac has quit [Quit: Computer has gone to sleep.]

15:34 shellac has joined #jruby

16:02 xardion has quit [Remote host closed the connection]

16:07 xardion has joined #jruby

16:07 <kares> enebo: yeah - that was my line of thinking (already avoided proper sub! but the replacement isn't that much faster)

16:08 <kares> PR is an experiment but there's a few match?-es around now, so if that improves, parsing improves

16:08 <kares> > does 7 sub! but the first 4 do not actually sub until it reaches iso date

16:09 <kares> yeah but MRI does those too ... although slightly differently

16:09 ahorek has joined #jruby

16:10 <kares> we could re-arrange iso to go first :) ... most common those eu/us dates with month abbreviations shouldn't be first

16:10 <lopex> enebo: not removing, just guarding

16:11 <lopex> we could pass that via options I guess

16:15 <kares> oh I see they have some checks before going down the rabbit hole of sub! .. will try it out and report on the PR

16:23 <enebo> kares: I wondered about rearranging but I figured they are subsets of iso but not the same (e.g. month and day are swapped)

16:24 <enebo> kares: I think match? will be 2-3x faster than sub if joni is not building match data behind the scenes

16:25 <enebo> so match? of same regexp before subs would give something back

16:25 <enebo> although at this point I keep wondering if we should just go full native as the short term fix

16:25 <enebo> lopex: guarding means what? just not saving if something is set?

16:25 <lopex> enebo: not creating regions upfront

16:26 <enebo> lopex: so not making lots of data for matched regions?

16:26 <lopex> joni differs wrt that a bit

16:26 <lopex> enebo: but I guess not saving the backrefs will be biggest gain

16:27 <lopex> since it's thread local access right ?

16:27 <enebo> lopex: kares basically did this for a variant of sub! and it was not a lot faster

16:27 <lopex> enebo: regions themselves are tiny, and populating them is cheap too

16:27 <lopex> which of course we should get rid of too

16:27 <enebo> lopex: so it begs why oni is 3x faster than joni then

16:28 <enebo> for match?

16:28 <lopex> so it will be region population then

16:28 <enebo> yeah I am thinking it must since joni always has done well vs oni

16:28 <enebo> but what do I know :P

16:28 <lopex> enebo: as I mentioned in the comment they have different impl for calling oni

16:29 <enebo> lopex: ah yeah

16:29 <enebo> lopex: So we have no option for region so we just make them or something like that?

16:30 <lopex> enebo: https://github.com/ruby/ruby/blob/trunk/re.c#L3324

16:30 <lopex> they pass NULL to onig_search there

16:31 <lopex> yep

16:31 <enebo> lopex: yeah mechanism could just be a boolean for us and then not make region

16:32 <lopex> yes

16:32 <enebo> lopex: unless reusing region per thread somehow made sense :)

16:32 <lopex> and for backrefs population

16:32 <enebo> lopex: that is a different topic though

16:32 <lopex> enebo: we can reuse whole matcher per thread though

16:33 <enebo> lopex: so much the better if it makes sense

16:33 <enebo> lopex: I realized strftime in applications almost always use the same date strings so I added a runtime cache

16:33 <lopex> enebo: https://www.youtube.com/watch?v=hj4VmvyqbKY

16:34 <enebo> lopex: it dramatically sped up strftime since it was already compiled

16:34 <enebo> lopex: yeah even without viewing this I believe I know what will be said

16:35 <enebo> lopex: in Rails things like date/times end up using the same regexp/date formats per application and not many of them
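A toy Ruby sketch of the kind of runtime cache mentioned here: applications tend to reuse a handful of format strings, so the parsed form can be kept per format string. `FormatCache` and its token representation are made up for illustration and are not JRuby's implementation; only `%Y`, `%m` and `%d` are handled.

```ruby
# Toy format cache: parse each distinct format string once, reuse the tokens.
class FormatCache
  DIRECTIVES = {
    'Y' => ->(t) { t.year.to_s },
    'm' => ->(t) { format('%02d', t.month) },
    'd' => ->(t) { format('%02d', t.day) }
  }.freeze

  attr_reader :compiles

  def initialize
    @cache = {}
    @compiles = 0
  end

  def strftime(time, fmt)
    tokens = (@cache[fmt] ||= compile(fmt))
    tokens.map { |tok| tok.is_a?(Proc) ? tok.call(time) : tok }.join
  end

  private

  # Split the format into literal strings and directive lambdas.
  def compile(fmt)
    @compiles += 1
    fmt.scan(/%([Ymd])|([^%]+)/).map do |dir, lit|
      dir ? DIRECTIVES.fetch(dir) : lit
    end
  end
end

cache = FormatCache.new
t = Time.utc(2018, 7, 24)
first  = cache.strftime(t, '%Y-%m-%d')
second = cache.strftime(t, '%Y-%m-%d')
```

The second call skips `compile` entirely, which is the effect being described for repeated formats.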

16:35 <lopex> enebo: runtime cache for regexp ?

16:35 shellac has quit [Quit: Computer has gone to sleep.]

16:36 <enebo> lopex: pre-compile regexps we see more than n times?

16:36 <lopex> enebo: or did you reuse the existing cache ?

16:36 <enebo> err cache

16:36 <enebo> no

16:36 <lopex> ok

16:36 <enebo> lopex: we have regexp cache then?

16:36 <lopex> yes

16:36 <enebo> ok good

16:36 <lopex> weakref one

16:36 <kares> there's already a cache being reused

16:36 <lopex> soft rather

16:36 <lopex> getRegexpFromCache afaik

16:36 <enebo> lopex: If I knew this I forgot about it

16:36 <kares> except not for any regexps having modifiers

16:36 <enebo> oh

16:36 <lopex> enebo: there's three of them

16:36 <kares> thus a bunch of them in date parsing don't get reused

16:37 <lopex> for quoted and processed regexps too

16:37 <kares> all having //i

16:37 <enebo> haha well that sucks for some of those date ones

16:37 <enebo> i seems like a reasonable thing to cache though

16:37 <kares> haven't checked joni if there's additional checking - just the caching I have seen in RubyRegexp

16:38 <kares> but for this we could probably just write them in native and re-use

16:38 <enebo> so iso8601 is not cached because /ix?

16:38 <enebo> That is a big regexp

16:39 <enebo> kares: perhaps caching that will get our perf better than MRI

16:39 <enebo> kares: I also thought about native benefit of value object in Java vs Ruby Hash

16:39 <enebo> for your recent change

16:39 <enebo> m[:hour]

16:39 <enebo> which made me wonder if we are destined short-term just to go full native on this

16:40 <lopex> kares: it's options aware

16:40 <lopex> and encoding

16:40 <enebo> lopex: so //ix regexp caches?

16:40 <lopex> I hope I dont lie to you

16:40 <enebo> lopex: hey every day is a new day to me...I won't blame you if you are wrong

16:40 <lopex> enebo: it should

16:40 <enebo> lopex: ok well that would have been a big find if it isn't

16:41 <lopex> enebo: getRegexpFromCache

16:41 <enebo> yeah this should be fine unless our impl is broken and joni.options is not returning same stuff

16:42 <enebo> pretty trivial for me to test I guess

16:42 <kares> so its being cached then? cool!

16:42 <kares> oh you're still not sure :)

16:42 <lopex> well, if it goes through cache of course

16:42 <enebo> well if all regexps go through this method this should be doing it

16:43 <enebo> unless it alternates the same regexp with different encodings or options

16:43 <enebo> or we have a bug in the impl

16:44 <enebo> A better impl of this would be to encode the options in front or back of bytelist

16:44 <enebo> then you can have multiple options of same regexp pattern in the cache

16:44 <enebo> I guess you pay the price of adding those bytes to the bytelist during lookup

16:44 <enebo> or two param hash

16:45 <enebo> anyways also not relevant for this problem
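The "two param hash" variant can be sketched in Ruby as below. This illustrates the keying idea only; `RegexpCache` is hypothetical and unrelated to the actual cache in RubyRegexp. Keying on both the pattern source and the option bits lets `/abc/` and `/abc/i` coexist as separate entries.

```ruby
class RegexpCache
  def initialize
    @entries = {}
  end

  # Key on [source, options] so the same pattern text with different
  # flags gets its own compiled Regexp.
  def fetch(source, options = 0)
    @entries[[source, options]] ||= Regexp.new(source, options)
  end

  def size
    @entries.size
  end
end

cache = RegexpCache.new
plain = cache.fetch('abc')
icase = cache.fetch('abc', Regexp::IGNORECASE)
again = cache.fetch('abc', Regexp::IGNORECASE)
```

The alternative mentioned above, prepending or appending the option bytes to the pattern bytes, gives the same separation with a single flat key at the cost of building that key on every lookup.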

16:45 Caerus has quit [Read error: Connection reset by peer]

16:46 <kares> which getRegexpFromCache path were you guys looking at?

16:46 <kares> I followed a factory from native and it did not have the cache there for anything having options

16:46 <kares> although I should revisit closely my assumptions ...

16:46 <enebo> kares: ah where does it start calling into regexp?

16:47 <enebo> kares: I am so hoping you are right :)

16:47 <kares> well I only looked at moving some small ones into native

16:47 shellac has joined #jruby

16:47 <enebo> I only see RubyRegexp constructor and regexpInitialize call through it

16:47 <kares> yep that one

16:48 <enebo> they both ask the cache

16:48 <enebo> It is interesting to see how much happens before the cache lookup in that method

16:48 <kares> ah right I see it now

16:48 <kares> missed it - sorry for the confusion

16:48 <enebo> kares: np

16:49 <kares> yeah - that is probably why I missed it

16:49 <kares> since for the other case the cache lookup was in ctor

16:50 <enebo> we need regexpcallsite

16:50 <enebo> then it can bootstrap all this shit and just keep the pattern at the site

16:51 sgeorge_ has joined #jruby

16:51 sgeorge has quit [Read error: Connection reset by peer]

16:51 <enebo> lol RegexpObjectSite

16:51 <enebo> we seem to have it with indy on but I don't think it elides any of this logic

16:52 <enebo> but it saves the regexp so I guess it works

16:52 <kares> heh

16:52 <kares> RubyRegexp seems to be thread-safe so I might try to keep the same instance around in an internal var

16:53 <enebo> kares: you mean native or Ruby?

16:53 <enebo> kares: using Joni directly may shave some more overhead off too

16:54 <lopex> enebo: but you can ask the cache yourself

16:54 <kares> native - I do have some smaller ones rewritten

16:55 <kares> but for some reason it got way slower - I think there's something wrong cause it should be around ~ same

16:56 <enebo> I see the sun/mon/tue section of your PR

16:57 <enebo> kares: HAHAHA I think I have a weird way of doing day_num faster

16:57 <enebo> ((sun)|(mon)|(tue)...)

16:57 <enebo> if $1 is set then if $4 is set then that is tue

16:58 <enebo> not sure if joni will be better than extracting the text then doing caseinsensitivecmp

17:00 <kares> enebo: already have that in native :) based on C

17:00 <enebo> yeah I mean faster than what you have in native

17:00 <kares> yeah MRI is doing exactly that - caseinsensitivecmp

17:01 <enebo> or maybe faster

17:01 <kares> ah okay - let's have it :)

17:01 <enebo> since you will be average 3.5 nil checks

17:01 <enebo> /((sun)|(mon)|(tue)....)/i will match $1 for day match then you look from $2-$8 for which day is set

17:02 <enebo> only unknown is region setup slower or faster

17:02 <enebo> It very well may be way slower

17:02 <enebo> I just wondered since it would be not searching again and doing case insensitive compare

17:03 Caerus has joined #jruby

17:03 <enebo> a hashmap may also work out better since bytelist caches its hash calc
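Both ideas from this exchange, sketched in Ruby for illustration (the code under discussion is native Java, and the names here are hypothetical). `DAY_RE` wraps each day in its own group, so the index of the first non-nil inner group gives the day number; `DAY_INDEX` is the hash-lookup alternative. Which is faster on joni is exactly the open question above, so no timing claim is made.

```ruby
DAYS = %w[sun mon tue wed thu fri sat].freeze

# /\b((sun)|(mon)|...)/i : group 1 is the whole day, groups 2..8 are per-day.
DAY_RE = Regexp.new('\b(' + DAYS.map { |d| "(#{d})" }.join('|') + ')', Regexp::IGNORECASE)

# Day number from which inner capture group participated in the match.
def day_num_by_group(str)
  m = DAY_RE.match(str)
  return nil unless m
  (2..8).each { |i| return i - 2 if m[i] }
  nil
end

# Alternative: extract the matched letters and look them up in a hash.
DAY_INDEX = DAYS.each_with_index.to_h.freeze

def day_num_by_hash(str)
  m = DAY_RE.match(str)
  m && DAY_INDEX[m[1].downcase]
end
```

The group-based version avoids a second comparison pass over the text but needs capture regions populated; the hash version needs only group 1 plus a downcase.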

17:05 sgeorge_ has quit [Remote host closed the connection]

17:05 <enebo> kares: with regards to being slower if you are using compile.invokedynamic at all the bytecode is pretty much storing the completely processed regexp into a field so some processing would go away

17:06 <enebo> kares: although if you do as you said above and stick it into a field you should see if that overhead matters

17:18 Caerus has quit [Read error: Connection reset by peer]

17:28 <kares> ok will do some testing on that

17:28 <kares> wonder what are those bytes.getClass()/str.getClass() lines around RubyRegexp for

17:30 <kares> e.g. https://github.com/jruby/jruby/blob/master/core/src/main/java/org/jruby/RubyRegexp.java#L1013

17:30 <kares> let's look at git history ...

17:31 sgeorge has joined #jruby

17:32 <kares> they seem like NPE guards? https://github.com/jruby/jruby/commit/2df9b8c03f613f5e6429849747c684ba098cf023#diff-d1c53ff7045763cde59295b8488c99d0

17:38 shellac has quit [Quit: Computer has gone to sleep.]

17:47 shellac has joined #jruby

18:01 shellac has quit [Quit: Computer has gone to sleep.]

18:23 <enebo> lopex: regexp.numMem seems to be main value which defines whether to set up Regions

18:24 <enebo> lopex: If somehow we could config it to 0 would all regexps continue to work for match? (e.g. is not doing any captures for $~)

18:24 <lopex> but we need a separate flag

18:24 <enebo> lopex: why is that?

18:25 <lopex> we cant change nummem

18:25 <lopex> it's a property of a parsed regexp

18:25 <enebo> so all places which makes regions from that value will need to also check a second property?

18:25 <lopex> and regexp still needs to use it's groups so that (.)\1 works

18:25 <enebo> then we won't make the regions but all will still work

18:25 <lopex> yes, that's how mri does it

18:25 <enebo> ok
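For reference, the `(.)\1` case mentioned above looks like this from Ruby. The engine has to track group 1 internally for the backreference to work, even though `match?` never hands a capture back to the caller, which is why the group count itself cannot simply be forced to zero.

```ruby
DOUBLED = /(.)\1/   # any character immediately repeated

doubled_hello = DOUBLED.match?("hello")   # "ll" repeats
doubled_abc   = DOUBLED.match?("abc")     # nothing repeats
```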

18:26 <lopex> why I said we could use options for that

18:26 <lopex> *thats

18:26 <enebo> yeah I know you said that but I looked at the code now a little bit

18:27 <enebo> Option.DONT_CAPTURE_GROUP

18:27 <enebo> lopex: what is that?

18:28 <enebo> if ((option & (ONIG_OPTION_DONT_CAPTURE_GROUP|ONIG_OPTION_CAPTURE_GROUP))

18:28 <enebo> == (ONIG_OPTION_DONT_CAPTURE_GROUP|ONIG_OPTION_CAPTURE_GROUP)) {

18:28 <enebo> looks like oni uses slightly different names now

18:29 <enebo> oh actually it is same name

18:30 <lopex> it's only used during parse

18:33 <lopex> enebo: https://github.com/ruby/ruby/blob/trunk/regexec.c#L1745

18:33 <enebo> lopex: yeah so largely we need msaRegion in our code to not alloc a structure so it will always be null

18:34 <enebo> lopex: and an option to do that I guess

18:34 <enebo> if (region != null) {

18:34 <enebo> this.msaRegion = regex.numMem == 0 ? null : new Region(regex.numMem + 1);

18:35 <lopex> and this has to be done at matcher construction

18:35 <enebo> so I guess somehow something here which looks at option which we can get from regexp?

18:35 <enebo> hmm yeah I don't see Option in Regexp

18:35 <lopex> yeah, anyways that's my first idea, I tried to come up with something better

18:36 <enebo> oh options is there

18:36 <lopex> yes

18:36 <lopex> and not in regexp

18:36 <lopex> in a matcher

18:36 <lopex> there's two kinds of options

18:36 <lopex> regexp and match

18:36 <enebo> so I think we just make a new Option (give a name) and we just ask regexp.options & whatever in Matcher constructor

18:37 <lopex> but I dont like it for some reason

18:37 <enebo> Are all Option.* from regexp?

18:37 <lopex> they're mixed

18:37 <enebo> or are some for matcher

18:37 <enebo> ok

18:37 <lopex> they overlap even somewhere afaik

18:37 <enebo> I think for sake of experimentation it would be pretty simple but I guess I need to see how Matcher is made

18:38 <lopex> via matcher factory

18:38 <enebo> Regexp.matcher

18:38 <enebo> but it passed no options

18:38 <lopex> enebo: it's passed via search and match

18:38 <lopex> just like in oni

18:39 <lopex> just that the condition for creating region is earlier during match construction

18:40 <enebo> Matcher matcher = reg.matcher(strBL.unsafeBytes(), beg, beg + strBL.realSize());

18:40 <enebo> so I see how we are calling some of these

18:40 <enebo> lopex: so you don't like passing in a Match option to the Regexp since you may or may not want the matcher to use regions

18:40 <lopex> I think we could just add a flag to that

18:41 <enebo> lopex: so we should create an int options to matcher method

18:41 <lopex> enebo: all this complication is so that we can reuse matches

18:41 <lopex> unlike mri does

18:41 <enebo> oh MRI always remakes them

18:41 <lopex> er, matcher

18:41 <enebo> do we reuse matcher?

18:42 <lopex> enebo: like in scan etc

18:42 <lopex> we use same instance over again

18:43 <enebo> ok well so adding an overload with options field which we right now only check for some new option for eliminate region would be good enough?

18:43 <enebo> or basically the design you think would work best

18:43 <enebo> matcher(byte[], int beg, int len, int options)

18:43 <enebo> and what Option should we make for that?

18:46 <lopex> I'd go for a boolean at most, or even directly allocate/no allocate region in overloaded versions

18:46 <lopex> like matcherNoRegion

18:46 <enebo> why do we even have a MatcherFactory?

18:47 <lopex> because there's alternative one for asmified regexps

18:47 <enebo> I see it

18:49 <enebo> the easiest way to impl it would be to add boolean to matcher in Regexp use factory and then set some boolean on base Matcher after it is made

18:49 <enebo> which is probably not the most elegant way to do it

18:49 <lopex> actually I'm for this matcherNoRegion thingy

18:50 <enebo> so ByteCodeMachine is version with that logic

18:51 <enebo> and we extend and make another opEnd method?

18:51 <lopex> no

18:51 <lopex> ah, I dont mean that at all

18:51 <enebo> hahaha ok

18:51 <lopex> I mean another method in matcher factory

18:52 <enebo> ah so two methods one for regions and one for no regions

18:52 <lopex> one step ahead and we're just at where mri is

18:52 <lopex> where region is created externally

18:52 <enebo> oh so we make region and pass into Matcher

18:52 <lopex> except that we still can reuse matchers

18:52 <enebo> or don't

18:53 <enebo> and factory can do that

18:53 <lopex> yes, plus that zero group version

18:53 <lopex> condition

18:53 <enebo> I am almost following

18:53 <lopex> I think that's better

18:54 <enebo> the constructor of ByteCodeMachine sets up based on regexp.numMem

18:54 <lopex> I'll do that

18:54 <enebo> lopex: ok. I am super super excited to get this in

18:54 <enebo> lopex: at least I am really hoping it speeds up match?

18:54 <lopex> enebo: the whole hierarchy is reversed so that asm version fits in

18:55 Caerus has joined #jruby

18:55 <lopex> it started the other way round

18:55 <lopex> stack machine, bytecode, matcher

18:56 <enebo> do we use AsmCompilerSupport?

18:56 <lopex> no

18:56 <enebo> I could see how we could emit it for literal regexp in JIT

18:56 <lopex> but it's still a good start

18:56 <enebo> Not sure how much it would help or not

18:56 <enebo> It would get rid of one level of indirection of having a callsite

18:57 <enebo> It could be really useful for named captures

18:57 <enebo> If we provide some asm hook for setting lvar values for named captures

18:57 <enebo> I guess we still need to make backref

18:58 <lopex> afair I was overwhelmed how to compile subexp calls in asm

18:59 <lopex> but most of bytecode compilation wouldnt be that hard

18:59 <enebo> JIT joni bytecode still seems a good idea

19:00 <lopex> more of a aot

19:00 <lopex> well, for throw away regexps the interpreter still would be better I guess

19:01 <enebo> yeah it would be for same purpose of a JIT though...call regexp enough to be hot you generate bytecode which does it

19:02 <enebo> I was thinking it has some nice properties too if you JIT an engine for a specific regexp you can probably delete all unused instrs from it

19:02 <enebo> and if it has no captures or something like that you can remove a lot of logic which is not used

19:02 <enebo> I guess JVM JIT is pretty good at eliminating unused stuff so maybe it would be a wash

19:02 <lopex> oh, I though about completely separate logic for that compiler

19:02 sgeorge has quit [Remote host closed the connection]

19:03 <lopex> thought

19:03 <enebo> well it seems there are multiple ways this can be done

19:03 <lopex> it would go from ast

19:03 <enebo> sure

19:03 <lopex> directly, since all of the optz are on ast

19:04 <lopex> enebo: there's a template AsmCompiler.java

19:04 <enebo> I was looking at instrs since it is closer to IR and how our JIT operates

19:05 <lopex> I began to install all bitsets, and string templates as fields

19:05 sgeorge has joined #jruby

19:10 sgeorge has quit [Ping timeout: 268 seconds]

19:17 <enebo> lopex: did quick and dirty. I will see if I can see a benefit

19:17 <enebo> https://gist.github.com/enebo/9cf33e39a7d92cb81f2129d4985b70b0

19:21 <enebo> no = pattern.nameToBackrefNumber(sBytes, name, nameEnd, regs);

19:21 <enebo> lopex: did some methods go away in joni?

19:21 <lopex> enebo: it's in scan env now

19:22 <enebo> how do I get that?

19:22 sgeorge has joined #jruby

19:23 <lopex> whoops

19:24 <lopex> I thought I checked that

19:24 <enebo> 4 callers in JRuby for it

19:24 <lopex> yep

19:24 <lopex> https://github.com/jruby/joni/commit/558252b052c2513559db04167354317d0f9cc388

19:25 <enebo> lopex: this is fine so I guess Matcher could have this method on it

19:26 <enebo> lopex: or expose getScanEnvironment

19:26 <enebo> lopex: seems like not exposing that is more desirable

19:26 <lopex> scan env is on parser, so it's no longer there

19:26 <enebo> it is in lexer and that is what Matcher is right?

19:26 <lopex> no

19:27 sgeorge has quit [Ping timeout: 240 seconds]

19:27 <enebo> hah ok yeah

19:28 <enebo> oh yeah it goes away

19:38 Caerus has quit [Ping timeout: 240 seconds]

19:46 sgeorge has joined #jruby

19:47 claudiuinberlin has joined #jruby

19:59 <lopex> enebo: yeah, I was too fast with that nametable move

20:00 <lopex> enebo: when do you want to push this in ?

20:00 <enebo> lopex: IMMEDIATELY

20:01 <enebo> lopex: seriously though match? is being used for perf by MRI so we should be faster

20:01 <enebo> and I think we can use it to speed up date parsing

20:01 <lopex> internally ?

20:01 <lopex> or by the apps ?

20:01 <enebo> I think Ruby libraries are using match? now because it is several times faster

20:01 <enebo> but in JRuby it isn't

20:02 <enebo> I guess I mean to say it is becoming a popular method because it is fast
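A rough way to compare the two from Ruby, using only the standard library `Benchmark` module. The pattern, input, and iteration count are arbitrary choices for illustration; results vary by runtime and warmup, so none are shown here.

```ruby
require 'benchmark'

RE    = /\d{4}-\d{2}-\d{2}/
INPUT = 'released on 2018-07-24'
N     = 200_000

# match? only answers yes/no; =~ also builds MatchData and sets $~.
t_match_p  = Benchmark.realtime { N.times { RE.match?(INPUT) } }
t_match_op = Benchmark.realtime { N.times { RE =~ INPUT } }

puts format('match?: %.3fs   =~: %.3fs', t_match_p, t_match_op)
```

A short loop like this mostly measures cold code on a JIT-based runtime, so a longer run or a dedicated benchmarking harness gives a fairer picture.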

20:10 mistergibson has joined #jruby

20:29 <enebo> lopex: https://gist.github.com/enebo/7ce5c166fc89890b2e933954d1f5cb4e

20:29 <enebo> So we got a pretty good bump in performance not creating regions but it is not nearly what MRI got from it

20:40 <lopex> enebo: and the backref logic ?

20:40 <enebo> supposedly

20:40 <enebo> I am removing more code now though

20:40 <enebo> holder[] and some crap like that which is not needed

20:41 <enebo> I did get a little bit more

20:41 <enebo> 2-2.1Mop/s to about 3M

20:42 <enebo> it was like 2.7 and 2.9 before I removed holder

20:42 <enebo> so 3M i/s vs 5.2M i/s on MRI

20:43 <enebo> so we gained about 50% perf removing region and not messing with setting threadlocal backref

20:43 <lopex> enebo: it's rb_reg_search0 vs rb_reg_match_p

20:43 <enebo> some of that is just removing the need for stuff like holder

20:43 <enebo> so we need a different entry point then?

20:44 <enebo> I did just continue to use matcherSearch but with regionless Matcher

20:45 <enebo> although we also can interrupt our regexps

20:45 <enebo> oh I see we can do match in SearchMatch

20:47 <enebo> changing SearchMatch to use matchInterruptible improved it more

20:47 <lopex> though they still use onig_search in rb_reg_match_p

20:48 <enebo> up to 4.3M i/s and 4.8M i/s

20:48 <enebo> so getting much closer to MRI now

20:48 <lopex> but all tainting, freezing and backref is not used

20:48 <lopex> enebo: do you also return true/false ?

20:48 <enebo> yes?

20:49 <lopex> well, current internal wont

20:49 <enebo> with graal CE rc4 we are at 5.5M i/s

20:49 <lopex> jruby internals

20:49 <enebo> return context.runtime.newBoolean(matchPos(context, str, 0) >= 0);

20:49 <enebo> do you mean that?

20:49 <lopex> I'm for separate method like that rb_reg_match_p

20:49 <lopex> ah, yes

20:50 <enebo> so what is matchInterruptible in comparison?

20:50 <enebo> That nearly got us to MRI speeds

20:50 <enebo> and honestly the warmup is a bit too short we probably do beat MRI if we run a little longer

20:51 <lopex> enebo: and no setBackRefInternal in the path ?

20:52 <enebo> lopex: I should have removed them all

20:52 <enebo> lopex: I just duplicated all the methods down and removed stuff not needed

20:52 <enebo> I think anyways

20:52 <enebo> but no backref methods any more

20:52 <lopex> enebo: and the code on opEnd ?

20:52 <enebo> it is there

20:52 <enebo> but region == null

20:53 <enebo> I gave you a diff earlier

20:53 <lopex> yes, but it's a bit off

20:53 <lopex> look at the else

20:54 <enebo> lopex: so we should just remove that code then

20:54 <lopex> or, hmm

20:54 <lopex> enebo: well, that if (region != null) is older than that optimization in mri

20:55 <lopex> it was to not use region if there's no groups in the regexp

20:55 <enebo> lopex: does make me wonder if extending this with more pruned down code would make more sense

20:55 <lopex> extending ?

20:55 <enebo> ByteCodeMachine

20:56 <lopex> you want to morphise opEnd ?

20:56 <lopex> definitely not

20:56 <enebo> heh yeah I guess not

20:56 <enebo> bimorphic for this would be painful

20:56 <enebo> other option is to copy it

20:56 <enebo> but I can see why that is icky too

20:56 <lopex> too late for bytecode rewriting :P

20:57 <lopex> well, we could use something like that actually

20:57 <lopex> but I dont know if it's worth the hassle

20:57 <enebo> so much code

20:58 <enebo> In this case this is called when opEnd

20:59 <enebo> oh so normative exit instr?

20:59 <enebo> I guess it is variable

21:00 <enebo> your compile this could get rid of tons of simple invariants and maybe some of these methods would inline

21:00 <enebo> compile this == asm or compile from AST or bytecode either way

21:03 <enebo> lopex: that else is confusing to me too

21:03 <enebo> lopex: msaBegin/End seems like it is direct relation to msaRegion

21:03 <enebo> lopex: but this else only exists when there is no msaRegion

21:04 <enebo> lopex: so why are these values being calculated?

21:04 claudiuinberlin has quit [Quit: Textual IRC Client: www.textualapp.com]

21:06 shellac has joined #jruby

21:06 shellac has quit [Client Quit]

21:18 <enebo> lopex: if I stop using matchInterruptible for match then we go from 4.9M i/s to about 5.5M

21:18 <lopex> enebo: because there might be no region if there's no captures

21:18 <lopex> enebo: and you still want to be able to return match begin/end

21:19 <enebo> but how would that be returned?

21:19 <lopex> enebo: via on demand region instance

21:19 <lopex> enebo: getEagerRegion

21:20 <enebo> but getEagerRegion is 'return msaRegion != null ? msaRegion : new Region(msaBegin, msaEnd);'

21:20 <lopex> yes

21:20 <enebo> haha I reversed the logic

21:20 <enebo> ok so it is possible to make something on demand then I see

21:20 <enebo> do we do that?

21:21 <lopex> under jruby I guess not

21:21 <enebo> So I am realizing we have two engines for sb and nonsb

21:21 <lopex> yes

21:21 <enebo> we could probably have a third for fast matching

21:21 <enebo> then we may have a few more methods on the class but those could nuke all this setting if we don't actually need it

21:22 <lopex> well in that case we could do much more in separate opcode that relate to captures

21:22 <lopex> opcodes

21:22 <enebo> yeah I somewhat mean that

21:22 <enebo> although in some cases existing opcodes would be used and any region logic would just be removed from that version

21:22 <lopex> but you cant get rid of them

21:22 <lopex> oh, hmm

21:23 <lopex> referred

21:23 <lopex> enebo: !!

21:23 <enebo> lopex: Not sure if you are happy or sad

21:23 <lopex> we know upfront I guess if there's something like (.)\1

21:23 <enebo> yeah

21:23 <lopex> then $1 is referred in terms of oni

21:23 <enebo> we know lots of stuff before it starts

21:24 <lopex> but what worried me is all that special casing just for match?

21:24 <lopex> er "match?"

21:24 <enebo> well we don't know how much benefit

21:24 <lopex> enebo: so interruptible slows so much ?

21:25 <enebo> 68 consumers in Rails 5.1 core

21:25 <enebo> seems to unless matchInterruptible does more than match beyond that

21:25 <enebo> This is highly synthetic bench too

21:25 drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]

21:26 <enebo> no doubt in a more multi core use-case ala Rails we would not see the same penalty

21:26 <enebo> but who knows then since we are executing tons of other code at same time

21:26 <enebo> lopex: but going back to Rails 68 calls to match? in core code means people are using it because it is fast

21:27 <enebo> lopex: so it likely is important for us to do it quickly too

21:27 <enebo> lopex: since people will leverage it for the SPEEDZ

21:27 <enebo> lopex: but I don't think we should make some huge unmaintainable mess either

21:27 <lopex> yeah, that's why I was postponing it

21:28 <lopex> but since we have a much clearer picture now..

21:28 <lopex> I'll look again at mri too

21:28 <enebo> lopex: tbh the stuff I did today was not very invasive to joni

21:28 <enebo> probably more changes to JRuby itself to not call through some massive set of common methods

21:29 <enebo> but I also think we could start with the same general strategy I did today in adding a matcherNoRegions to the factory and then later consider more invasive changes to joni interp for a specialized or simpler matcher
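
The factory strategy enebo describes might look something like the sketch below. Every name here is hypothetical (`createNoRegions`, `RegionMatcher`, `NoRegionMatcher`), and a literal prefix test stands in for the real regex interpreter; the actual patch is the gist linked later in the log. The point is only that a boolean-style caller such as `match?` can ask for a matcher that never records or allocates region data.

```java
// Hypothetical names throughout; a toy sketch of the idea, not joni's API.
public class NoRegionFactorySketch {
    interface Matcher {
        /** Returns the match start, or -1 when there is no match. */
        int match(String input, int at);

        /** Returns {begin, end} of the last match, or null if untracked. */
        int[] region();
    }

    /** Records begin/end so a caller could build match data afterwards. */
    static final class RegionMatcher implements Matcher {
        private final String literal;
        private int[] region;

        RegionMatcher(String literal) { this.literal = literal; }

        public int match(String input, int at) {
            boolean ok = input.startsWith(literal, at);
            region = ok ? new int[] { at, at + literal.length() } : null;
            return ok ? at : -1;
        }

        public int[] region() { return region; }
    }

    /** Answers only "did it match?"; no region bookkeeping at all. */
    static final class NoRegionMatcher implements Matcher {
        private final String literal;

        NoRegionMatcher(String literal) { this.literal = literal; }

        public int match(String input, int at) {
            return input.startsWith(literal, at) ? at : -1;
        }

        public int[] region() { return null; }
    }

    static final class MatcherFactory {
        Matcher create(String literal) { return new RegionMatcher(literal); }
        Matcher createNoRegions(String literal) { return new NoRegionMatcher(literal); }
    }

    public static void main(String[] args) {
        MatcherFactory factory = new MatcherFactory();
        Matcher fast = factory.createNoRegions("ab");
        System.out.println(fast.match("xxab", 2) >= 0);
    }
}
```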

21:30 <lopex> enebo: even though getEagerRegion isn't used in joni, we still benefit from it since lots of core uses just getBegin / getEnd directly

21:30 <lopex> and not begin[0] and end[0]

21:31 <enebo> https://gist.github.com/enebo/ef854d4f4929c5484660aa6453e99b2c

21:31 <lopex> enebo: er, getEager isnt used in jruby

21:31 <enebo> This is the patch I ended up with for joni but I did revert your other change of that method to scanEnv

21:31 <lopex> enebo: I'd even pass that region from factory methods

21:31 <lopex> but that's nitpicking

21:32 <enebo> lopex: yeah I went for minimal amount of work while trying to achieve your factory-based API

21:33 <enebo> lopex: getEager is fine to keep but if we decide to try and improve match perf I think we can make a match engine and probably reduce the complexity of what we call for each op code

21:33 <enebo> that will just be another code path from execute

21:33 <enebo> but I have no idea if that will help perf or not either

21:33 <enebo> that msaBegin+ logic is pretty simple math

21:34 <enebo> I would think best case for specialize execute for match would be stripping logic down to the point some more stuff inlines

21:35 <lopex> enebo: since asm stuff isn't loaded the factory method should be considered ply

21:35 <lopex> polu

21:35 <lopex> er, poly

21:35 <lopex> er

21:35 <lopex> mono

21:35 <lopex> doh

21:35 <enebo> haha

21:35 <enebo> yeah

21:35 <enebo> your reflective call should eliminate that being a problem

21:36 <lopex> reflective ?

21:36 <enebo> newInstance

21:36 <lopex> ah, right

21:36 <enebo> match? 7.255M (± 9.8%) i/s - 35.918M in 5.007063s

21:36 <enebo> graal ce rc4 without interruptible

21:36 <lopex> but hwre ?

21:36 <lopex> where newInstance ?

21:37 <enebo> regex.factory = (MatcherFactory)cls.newInstance();

21:37 <enebo> in AsmCompilerSupport

21:37 <enebo> I guess it hardly matters since it is not even used

21:38 <enebo> our usage will be mono since we only use one type

21:38 <lopex> enebo: but it's a dead code now

21:38 <enebo> yeah exactly

21:39 <lopex> enebo: you could try omitting the factory api

21:39 <lopex> just to see

21:39 <enebo> yeah I guess so

21:40 <enebo> I would be super surprised if that all did not inline away

21:40 <enebo> single type at single point calling a method which is just invoking a constructor

21:40 <enebo> tiny method mono

21:41 <enebo> unless the budget ran out
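
A rough sketch of the reflective factory setup being discussed, with hypothetical class names. The factory is instantiated reflectively once and stored; afterwards the call site that uses it only ever sees one concrete type, which is the situation enebo expects the JIT to treat as monomorphic and inline (an expectation, not something this sketch measures). Note that `Class.newInstance()`, as quoted in the log, has been deprecated since Java 9; the sketch uses `getDeclaredConstructor().newInstance()` instead.

```java
// Hypothetical names; illustrates the pattern, not joni's actual classes.
public class ReflectiveFactorySketch {
    public interface MatcherFactory {
        Object create(); // stands in for "new Matcher(...)"
    }

    /** The only implementation ever loaded in this sketch. */
    public static final class DefaultFactory implements MatcherFactory {
        public DefaultFactory() {}

        public Object create() { return new Object(); }
    }

    /** Instantiates a factory by class name, once, at setup time. */
    static MatcherFactory load(String className) throws ReflectiveOperationException {
        Class<?> cls = Class.forName(className);
        return (MatcherFactory) cls.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        MatcherFactory factory = load(DefaultFactory.class.getName());
        // Hot path: a tiny method on a single receiver type that just
        // invokes a constructor.
        Object matcher = factory.create();
        System.out.println(factory.getClass().getSimpleName());
    }
}
```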

21:41 <enebo> lopex: so you plan on implementing this soon?

21:43 <lopex> enebo: the thing you just dirty checked ?

21:44 <enebo> lopex: well something which allows us to match in joni without regions being created

21:44 <enebo> lopex: I don't care if you do it like I did or how you like it

21:44 <lopex> enebo: probably almost identical

21:45 <lopex> but yeah

21:45 <enebo> lopex: ok

21:45 <enebo> lopex: I am pumped!

21:45 <lopex> enebo: can you post jruby diff for reference too ?

21:45 <enebo> lopex: sure it is a bit more playful than serious

21:45 <lopex> yeah, I get it

21:46 <enebo> https://gist.github.com/enebo/ad593950c5b5788616afd6e1ee499502

21:47 <lopex> I need to move that by number back on regexp

21:47 <enebo> yeah I did it locally

21:47 <enebo> I just did a revert and then that had a tiny conflict

21:52 <lopex> but sequel/sinatra/rack don't use match?

21:53 <lopex> enebo: active support does quite a bit

21:53 <enebo> lopex: yeah although I expect all libraries will eventually in hot spots

21:54 <enebo> optimizing for rails is not the end of the world for us either :P

21:54 <lopex> enebo: and what about that cache and strftime ?

21:55 jrafanie has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]

21:55 <enebo> lopex: oh that was just me saying I got good perf out of making a cache

21:55 <enebo> lopex: remember I forgot we cached regexp

21:56 <lopex> enebo: so maybe it didnt go through that cache ?

21:56 <enebo> lopex: what didn't?

21:56 <enebo> strftime does not use joni

21:56 <lopex> the bytelists

21:56 <lopex> ah

21:56 <enebo> It has its own parser

22:03 Caerus has joined #jruby

22:33 sgeorge has quit [Remote host closed the connection]

22:34 ahorek has quit [Quit: http://www.kiwiirc.com/ - A hand crafted IRC client]

22:41 sgeorge has joined #jruby

22:45 sgeorge has quit [Ping timeout: 240 seconds]

22:50 Caerus has quit [Ping timeout: 256 seconds]

23:58 ahorek has joined #jruby