<nirvdrum>
We're on par with MRI now, so not a huge deal. But since we implement stuff in Ruby, we were using FFI::Pointer.read_string to read the buffer filled by a read(2) call. So we were doing all this byte copying, code range scanning, allocating a rope and a string object, and then throwing it all away just to get at the raw bytes.
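For comparison, this is roughly how plain Ruby can pull raw bytes out of read(2) without allocating a throwaway String per read. It does not use the FFI::Pointer path described above. `IO#sysread` accepts a caller-supplied output buffer that it overwrites on each call. This is a minimal sketch, and `each_raw_chunk` is a hypothetical helper. Because the same buffer is reused, a caller that wants to keep a chunk must copy it.

```ruby
# Read an IO in fixed-size chunks with IO#sysread, reusing one binary
# buffer so each read(2) does not allocate a fresh String.
def each_raw_chunk(io, chunk_size = 4096)
  buf = String.new(capacity: chunk_size) # String.new defaults to ASCII-8BIT
  loop do
    begin
      io.sysread(chunk_size, buf) # overwrites buf with the bytes just read
    rescue EOFError
      break
    end
    yield buf # shared buffer: copy it if you need to keep it
  end
end

r, w = IO.pipe
w.write("hello\nworld\n")
w.close
total = 0
each_raw_chunk(r, 5) { |chunk| total += chunk.bytesize }
r.close
puts total # => 12
```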
<nirvdrum>
Having to maintain $. is still unduly expensive.
<nirvdrum>
Since it needs to be volatile.
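For context on why `$.` costs anything at all: every line read through `IO#gets` or `IO#each_line` writes the global `$.`, so an implementation has to publish a new value per line. The following small sketch shows this behavior in MRI. The file contents are arbitrary.

```ruby
require "tempfile"

# Record the value of the global $. after each line IO#each_line yields.
seen = []
Tempfile.create("lines") do |f|
  f.write("a\nb\nc\n")
  f.rewind                   # flushes the write and resets f.lineno to 0
  f.each_line { seen << $. } # each line read updates the global $.
end
p seen # => [1, 2, 3]
```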
<nirvdrum>
Beyond that, I'm not sure how to close the gap more. We structurally seem to be doing much the same as you. There might be something in NFI not quite as efficient as JNR.
enebo has quit [Ping timeout: 268 seconds]
<headius>
well, the lowest levels of our IO logic are the same as in MRI, where non-decoded bytes only get copied once and there's a separate buffer for decoded characters
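The two-buffer layout described here can be sketched at the Ruby level with `Encoding::Converter#primitive_convert`. Raw bytes are read once into a binary buffer, and converted characters accumulate in a separate destination buffer. This is an illustrative sketch only, not MRI's C implementation. `decode_stream` is a hypothetical helper, and ISO-8859-1 input is assumed just to keep the example short.

```ruby
require "stringio"

# Read raw bytes into one binary buffer and append converted characters to a
# separate buffer, refilling the raw buffer on every read.
def decode_stream(io, from: Encoding::ISO_8859_1, to: Encoding::UTF_8, chunk: 2)
  conv = Encoding::Converter.new(from, to)
  raw = String.new(capacity: chunk)  # undecoded bytes (ASCII-8BIT)
  decoded = String.new(encoding: to) # decoded characters
  while io.read(chunk, raw)
    # partial_input: true tells the converter more input may follow;
    # primitive_convert consumes raw, leaving it empty for the next read.
    result = conv.primitive_convert(raw, decoded, nil, nil, partial_input: true)
    raise "conversion stopped: #{result}" unless result == :source_buffer_empty
  end
  decoded << conv.finish # flush any state held by the converter
end

text = decode_stream(StringIO.new("caf\xE9!".b))
p text          # => "café!"
p text.encoding # => #<Encoding:UTF-8>
```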
<headius>
I don't know how yours works
<nirvdrum>
I think very similarly, although more of it shifted to Ruby.
<nirvdrum>
But we have to make a foreign call for read(2).
<nirvdrum>
Another difference is that we yield to populate the resulting array, but when I checked, that made a negligible difference. When I started, we were something like 3x slower than MRI, so I was content to get as fast as them.
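The two shapes being compared look roughly like this in Ruby: building the result array by yielding each line to a block, or appending in a direct loop (for something like `IO#readlines`). Both helpers are hypothetical. They only illustrate the two styles and are not the implementation discussed here.

```ruby
require "stringio"

# Direct loop: append each line as it is read.
def readlines_direct(io)
  lines = []
  while (line = io.gets)
    lines << line
  end
  lines
end

# Yield-based: let each_line drive a block that appends to the array.
def readlines_via_yield(io)
  lines = []
  io.each_line { |line| lines << line }
  lines
end

input = "one\ntwo\nthree\n"
direct = readlines_direct(StringIO.new(input))
yielded = readlines_via_yield(StringIO.new(input))
p direct == yielded # => true
```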
<headius>
we used to be slower...it's good to hear we're faster now
<headius>
if I had more than a day a week to spend on perf we might be able to solve other items
pilne has quit [Quit: Leaving]
<nirvdrum>
Yeah. This was less about being fast than about not being so slow. I didn't care too much about the benchmark. It just happened to highlight a couple of big issues we had with IO reading.
<GitHub150>
[jruby] jmiettinen opened pull request #5088: Use com.jcraft.jzlib.JZlib instead of hacking the internals of java.util.zip.CRC32 (master...jarkko-4834) https://git.io/vxq5b
<headius>
g'day
<nirvdrum>
Howdy.
drbobbeaty has joined #jruby
enebo has joined #jruby
<nirvdrum>
lopex: I'm trying to upgrade joni, but I'm seeing a lot of messages like "character class has 'y' without escape". Do you know off-hand what that's all about? If not, I'll just start in with a debugger.
<nirvdrum>
Okay. I saw that issue, but it was for a different error, as far as I could tell.
<nirvdrum>
Did you just disable warnings in JRuby for this character class problem?
<lopex>
no enebo disabled in joni
<lopex>
nirvdrum: apparently onigmo has a different warning for that: "Unknown escape \y is ignored: /\y/"
<lopex>
so it's a bug actually since no char class is involved here
<lopex>
will look into that
<nirvdrum>
lopex: But JRuby doesn't trigger the warning on that expression.
<lopex>
nirvdrum: -w
<nirvdrum>
Okay. So you're just passing a flag to suppress the warning then?
<lopex>
no, verbose warnings are off by default
<nirvdrum>
I think it's this WarnCallback option I didn't know about.
<enebo>
nirvdrum: this is something which will change in joni
<enebo>
nirvdrum: we are going to have a setter so we can globally change the default warn
<enebo>
nirvdrum: several constructor paths pick a warn impl which just does System.err.println, but we want to provide our own
<enebo>
nirvdrum: I assume you do as well
<lopex>
enebo: but those used in jruby shouldn't now
<lopex>
all are explicit
<enebo>
lopex: ok so we changed them all? we had some in 9.1.16
<lopex>
enebo: I think I did
<enebo>
lopex: it is the only reason I made this change in joni since I did not want to audit all callers
<enebo>
ok
sidx64 has quit [Quit: My MacBook has gone to sleep. ZZZzzz…]
<lopex>
there was a single dup warn, right?
<enebo>
lopex: well cool but I still think we should just be able to set a default
<lopex>
which we couldn't find, since joni should print the regexp
<enebo>
yeah duplicated character class was the only case I am aware of
<enebo>
but yeah we want to be able to display the regexp
<lopex>
but all others were fixed
<enebo>
so that is another feature both TR and we are interested in
<enebo>
I think I have that in the issue
<enebo>
I hope I do
<lopex>
enebo: and there was this \X which called dup no matter what
* enebo
why I make issues
<enebo>
heh
<enebo>
yeah main problem in fact
<enebo>
explicit cases should warn but the expansions were giving us that warning
<enebo>
oh sorry different thing
<lopex>
enebo: the real issue was that [aa] warns whereas \X is silent
<enebo>
ah yeah
<lopex>
and \X goes same dup path
<lopex>
ah
<lopex>
unicode :P
<enebo>
I just remembered something duplicated the same char internally and caused the warning
<lopex>
"/\X/u" still warns on mri
<enebo>
which is not the user's fault
<enebo>
HAHAHA
<lopex>
enebo: jeeeze we'll have to recall that thing again
<lopex>
enebo: this is the source of all troubles, right?
<lopex>
non fixed regexp encoding
<enebo>
lopex: seriously, the fact that they have spent so much time making that 7bit path and forcing as much as they can into US-ASCII means they probably never noticed unicode is weird
<enebo>
lopex: but does oniguruma have these same issues, or were these changes made in onigmo?
<lopex>
enebo: and all other string paths go through that prepare encoding step
<lopex>
"ą".start_with?(/(?<!css)/i)
<lopex>
enebo: blows
<lopex>
same as =~
<lopex>
whereas "".start_with?(/(?<!css)/i) is happy
<nirvdrum>
enebo: I don't think I've ever looked at it.
<enebo>
nirvdrum: some constructors fall back to a default WarnCallback, and that default is System.err
<lopex>
enebo: no, onigmo needs explicit encoding
<nirvdrum>
enebo: Got it. That looks to be the case here. Thanks.
<enebo>
nirvdrum: if you provide your own callback you will be ok, except... if we warn, we currently do not display the regexp in the warn message
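MRI's rough counterpart to supplying your own WarnCallback is overriding `Warning.warn` (Ruby 2.4+), and MRI's Onigmo warnings already embed the pattern in the message. The following minimal sketch assumes you want to capture warnings rather than print them. The `captured` array is illustrative, and the exact message text may vary across Ruby versions.

```ruby
# Temporarily route Ruby warnings to our own callback instead of $stderr.
captured = []
Warning.singleton_class.send(:define_method, :warn) do |msg, **_kwargs|
  captured << msg
end

old_verbose = $VERBOSE
begin
  $VERBOSE = true    # the duplicated-range warning is only emitted under -w
  Regexp.new("[aa]") # 'a' appears twice in the character class
ensure
  $VERBOSE = old_verbose
  # Drop the override so Warning.warn falls back to the default writer.
  Warning.singleton_class.send(:remove_method, :warn)
end

puts captured
```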
<nirvdrum>
I'm not seeing it now anyway.
<enebo>
yeah
<nirvdrum>
I had to track down the warning by reducing the tests I run.
<nirvdrum>
And lopex's suggestion helped a lot.
<nirvdrum>
I should wire you guys some money so you can buy him beer on my behalf.
<lopex>
nirvdrum: but there's one dup warning in rails tests we still have no clue where it comes from
<nirvdrum>
Heh.
<lopex>
so it's the only reason all dup warns are disabled now