_whitelogger has joined #jruby
xardion has quit [Ping timeout: 268 seconds]
xardion has joined #jruby
shellac has joined #jruby
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
drbobbeaty has joined #jruby
nirvdrum has joined #jruby
nirvdrum has quit [Remote host closed the connection]
nirvdrum has joined #jruby
shellac has quit [Quit: Computer has gone to sleep.]
lucasb has joined #jruby
shellac has joined #jruby
<headius[m]> Hood morning!
<headius[m]> oops, good morning
xardion has quit [Remote host closed the connection]
xardion has joined #jruby
shellac has quit [Ping timeout: 245 seconds]
sagax has joined #jruby
<fidothe> hey all. Making what I think is good progress on https://github.com/jruby/jruby/issues/5095. Unsurprisingly, I am learning a lot about internals... Naive implementation of `String#sub` (I started there, it seemed simpler) shows something like a 2x speedup in the benchmark I cribbed from @headius[m]'s `String#gsub` one (only for the pattern-is-a-string case). I have a couple of assumption-questions.
<fidothe> I was assuming that in the pattern-is-string case for `#sub` and `#gsub` there'd be no need to create a Regexp object
<fidothe> However, checking on magic vars in MRI while puzzling over `#gsub` I realised that `$~` returns a MatchData anyway, and it provides a Regexp version of the string pattern from `#regexp`. And so does `#sub`. In JRuby's implementation `return context.setBackRef(context.nil);` is called by both `subBangIter` and `subBangNoIter`, which I assumed meant it doesn't populate `$~`
<fidothe> But I can see that it does in IRB. So, given that I clearly don't understand how the `setBackRef`/`$~` stuff works, is there a good resource that gives an overview of the way the magic var stuff works in JRuby? (also, I assume that my initial naive implementation of `#sub` is too naive by far)
<headius[m]> hey there!
<headius[m]> So yeah I believe MRI has a way to create a MatchData that's only partially populated so it doesn't have to have a compiled regex
<headius[m]> lopex: maybe you have some thoughts here
<headius[m]> I think the way MRI does it is that it just sticks the source string into the MatchData and then lazily creates the regex if needed
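A minimal sketch of the "keep the source string, build the regex only if asked" idea described above. `LazyLiteralMatch` and all of its members are hypothetical names for illustration, and it uses `java.util.regex` rather than the regex engine MRI or JRuby actually use.

```java
import java.util.regex.Pattern;

// Hypothetical stand-in for a match object that remembers a literal string pattern.
final class LazyLiteralMatch {
    private final String source; // the literal pattern string given to sub/gsub
    private final int begin;     // index where the literal was found in the target
    private Pattern regexp;      // stays null until a caller asks for it

    LazyLiteralMatch(String source, int begin) {
        this.source = source;
        this.begin = begin;
    }

    int begin() { return begin; }

    int end() { return begin + source.length(); }

    // Lets a caller check whether the regex has been built yet.
    boolean regexpCreated() { return regexp != null; }

    // Compile an escaped regex equivalent to the literal, on first use only.
    Pattern regexp() {
        if (regexp == null) {
            regexp = Pattern.compile(Pattern.quote(source));
        }
        return regexp;
    }
}
```

Match offsets are available without ever compiling a regex; only a call to `regexp()` pays that cost.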
quadz has quit [Ping timeout: 240 seconds]
<headius[m]> The setBackRef(nil) is likely to clear it before doing the match, and then there should be a set of the actual match data deep in the guts of RubyRegexp.search
<headius[m]> We definitely want to avoid creating the regexp if possible since that's part of the speedup here, along with not using a heavy-weight regexp match to do a simple substring search
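A rough sketch of what a regex-free path for a string pattern could look like, using plain `java.lang.String` rather than JRuby's `RubyString`. `LiteralSub`, `subFirst`, and `subAll` are hypothetical names. The sketch ignores backslash sequences in the replacement and the empty-pattern case of `gsub`, both of which a real implementation would need to handle.

```java
// Hypothetical helper: literal substring replacement with no regex involved.
final class LiteralSub {
    // Replace the first occurrence of a literal pattern.
    static String subFirst(String target, String pattern, String replacement) {
        int at = target.indexOf(pattern);
        if (at < 0) return target; // no match: return the input unchanged
        return target.substring(0, at) + replacement
                + target.substring(at + pattern.length());
    }

    // Replace every non-overlapping occurrence, scanning left to right.
    static String subAll(String target, String pattern, String replacement) {
        // Simplification: this sketch skips the empty-pattern case entirely.
        if (pattern.isEmpty()) return target;
        StringBuilder out = new StringBuilder();
        int from = 0;
        int at;
        while ((at = target.indexOf(pattern, from)) >= 0) {
            out.append(target, from, at).append(replacement);
            from = at + pattern.length();
        }
        return out.append(target, from, target.length()).toString();
    }
}
```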
<headius[m]> FWIW setBackRef and such are thread-local and the annotations in JRubyMethod indicate if they will be read or written by a given method. We gather a list of names of methods that do that and assume all such names are tainted that way and will need a place to store backref. This is slated to be improved, either by lazily allocating such space only when needed or by using some low-overhead mechanism like a simple threadlocal
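For illustration only, the "simple threadlocal" option mentioned above might look something like this. `BackRefHolder` and its methods are hypothetical names and are not JRuby's actual `setBackRef` machinery; the point is just that each thread sees its own last match.

```java
// Hypothetical per-thread holder for the last match ($~).
final class BackRefHolder {
    // One slot per thread; null stands for "no last match".
    private static final ThreadLocal<Object> LAST_MATCH = new ThreadLocal<>();

    static void setBackRef(Object match) { LAST_MATCH.set(match); }

    static Object getBackRef() { return LAST_MATCH.get(); }

    // Demo helper: read the slot from a fresh thread, which has never set it.
    static Object readFromNewThread() {
        Object[] seen = new Object[1];
        Thread t = new Thread(() -> seen[0] = LAST_MATCH.get());
        t.start();
        try {
            t.join(); // join makes the write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }
}
```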
quadz has joined #jruby
<lopex> headius[m]: there's this need_backref now https://github.com/ruby/ruby/blob/master/string.c#L5214
<lopex> lol it's from 2f14bde88fc (nobu 2014-03-28)
<lopex> that's how far behind we are
<fidothe> Okay that’s useful info. Will do some more digging and keep at it. Will also dig more into MRI’s approach... Thanks!
<headius[m]> Cool, thanks! I know this stuff is involved so feel free to ping any time. I monitor Matrix and of course I'm on other services. Gitter notifications are busted so I only see those when I happen to check.
<lopex> fidothe: also, there's some inconsistency with regexp cache in our code
<lopex> last time I looked mri had one entry regexp cache for loops
<lopex> we cache all regexps that are being created from strings implicitly
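A sketch of what a one-entry cache for regexps built from strings could look like, as opposed to caching every one. `OneEntryRegexpCache` is a hypothetical name and is not taken from MRI or JRuby. It uses `java.util.regex.Pattern` as a stand-in and is deliberately not thread-safe.

```java
import java.util.regex.Pattern;

// Hypothetical single-slot cache: remembers only the most recent pattern string.
final class OneEntryRegexpCache {
    private String lastSource;
    private Pattern lastPattern;
    private int compiles; // counts real compilations, for demonstration

    Pattern get(String source) {
        if (!source.equals(lastSource)) {
            // Miss: compile and overwrite the single slot.
            lastPattern = Pattern.compile(Pattern.quote(source));
            lastSource = source;
            compiles++;
        }
        return lastPattern;
    }

    int compileCount() { return compiles; }
}
```

A loop that keeps passing the same string pattern compiles once, while memory use stays bounded no matter how many distinct strings are seen over time.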
<lopex> headius[m]: if only I had a chance to play https://deadlockempire.github.io/
quadz has quit [Ping timeout: 265 seconds]
quadz has joined #jruby
snickers has joined #jruby
drbobbeaty has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
<headius[m]> woohoo, zero remaining items targeted for 9.2.9
<headius[m]> enebo: SHIP IT
<headius[m]> I will spend some time before RubyConf trying to finally land load service redux and the direct-addressing hash, so we can say those are at least coming up next
<lopex> what's the issue with that hash ?
<headius[m]> in ioquatix PR?
<headius[m]> s/PR/issue
<lopex> mri has unicode 12.1.0
<lopex> we are at 12.0.0
<headius[m]> what hash are you speaking of
<lopex> you mentioned that addressing hash issue
<headius[m]> ohhh
<headius[m]> it works fine, but the way it's written makes it more susceptible to concurrency issues than the one we have now
<headius[m]> the current one is less efficient but accidentally correct under more thread-unsafe cases
<lopex> lolz, imagine specializing all that code on assumption you saw only one thread
<headius[m]> basically where the current Hash impl is a chained bucket that's mostly appending or making a new bucket array, the direct-addressing version has multiple shared state changes for any mutation
<headius[m]> my aborted experiment to try to fix it made more of those operations atomic
<lopex> and deopt on another thread spawning :P
<headius[m]> like packing all of the int state changes into a single packed long
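As a sketch of the packed-long idea, two int fields can share one `AtomicLong` so that a single CAS updates both. The field names `size` and `end` are assumptions chosen for illustration; the actual state in the direct-addressing hash is not shown in this discussion.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical holder: two ints packed into one long, updated with one CAS.
final class PackedState {
    // High 32 bits: "size"; low 32 bits: "end". Both names are illustrative.
    private final AtomicLong state = new AtomicLong(pack(0, 0));

    static long pack(int size, int end) {
        return ((long) size << 32) | (end & 0xFFFFFFFFL);
    }

    static int size(long packed) { return (int) (packed >>> 32); }

    static int end(long packed) { return (int) packed; }

    // Bump both fields together; no reader can observe one without the other.
    long appendOne() {
        while (true) {
            long cur = state.get();
            long next = pack(size(cur) + 1, end(cur) + 1);
            if (state.compareAndSet(cur, next)) return next;
            // CAS lost to another thread: reread and retry.
        }
    }

    long current() { return state.get(); }
}
```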
<lopex> madman's dream
<headius[m]> yeah that's no good :-)
quadz has quit [Ping timeout: 245 seconds]
<headius[m]> at this point I'm pretty much committed to making String, Array, and Hash be lock-free threadsafe regardless of what you do
<headius[m]> I think we can keep overhead to a minimum with CAS and friends
<lopex> the cas impls, are they simple intrinsics or do they optimize things too?
nirvdrum has quit [Ping timeout: 252 seconds]
<headius[m]> I guess I don't know the answer to that
<headius[m]> the intrinsics use whatever the CPU provides for CAS, and I doubt it can be optimized away because that would seem to defeat the purpose
<headius[m]> the difference from back in the day when we decided not to explicitly make them thread-safe is that we mostly were still using synchronization, which requires a hard state change and blocking rather than nearly-free CAS for uncontended updates plus spin-until-you-win
<headius[m]> the many state changes in the D-A hash impl will be a little trickier to do atomically, but I'm sure it's possible
<headius[m]> Array and String will be easier, especially since they already have COW semantics that could be repurposed to handle atomic updates across threads
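One way copy-on-write and CAS could combine, sketched with a plain `int[]` rather than JRuby's `RubyArray`: every mutation copies the backing array and publishes the copy with a compare-and-set, retrying if another thread won the race. `CowIntList` is a hypothetical name, and copying on every append is a simplification that a real implementation would want to avoid.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical lock-free list: published arrays are never mutated in place.
final class CowIntList {
    private final AtomicReference<int[]> data = new AtomicReference<>(new int[0]);

    void append(int v) {
        while (true) {
            int[] cur = data.get();
            int[] next = Arrays.copyOf(cur, cur.length + 1); // private copy
            next[cur.length] = v;
            if (data.compareAndSet(cur, next)) return; // publish atomically
            // Another thread published first: retry against the new array.
        }
    }

    // Returns a copy so callers cannot alter the published array.
    int[] snapshot() { return data.get().clone(); }
}
```

Uncontended appends succeed on the first CAS; under contention a thread spins until its update wins, without ever blocking.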
quadz has joined #jruby
snickers has quit [Read error: Connection reset by peer]
quadz has quit [Quit: ZNC 1.6.5+deb1+deb9u1 - http://znc.in]
quadz has joined #jruby
<lopex> maybe it's time to remove cow for arrays