#jruby on 2020-12-10 — irc logs at freenode.irclog.whitequark.org

2020-08-03 20:53 ChanServ changed the topic of #jruby to: Get 9.2.13.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

00:17 ur5us has quit [Ping timeout: 258 seconds]

00:22 ur5us has joined #jruby

04:19 ur5us has quit [Ping timeout: 258 seconds]

06:16 Antiarc has quit [Ping timeout: 240 seconds]

06:16 Antiarc has joined #jruby

08:16 ur5us has joined #jruby

10:12 ur5us has quit [Ping timeout: 258 seconds]

14:05 mistergibson has joined #jruby

15:46 <headius[m]> good morning

17:03 <headius[m]> 9.2 branch merged to master

17:14 <boc_tothefuture[> Afternoon all

17:14 <headius[m]> hi there!

17:16 <boc_tothefuture[> I am trying to understand a bit how java.math.BigDecimal is supposed to work within the ruby ecosystem.

17:16 <boc_tothefuture[> for example, i see it implements coerce but if I do a math operations with it, I get an error.

17:17 <boc_tothefuture[> Like if I do "8 + BigDecimalVariable" I get an exception.

17:17 <boc_tothefuture[> Is there a best practice here, a way to convert BigDecimal to the Ruby version and then go from there? Or really to convert back and forth reliably?

17:18 <headius[m]> you need to use Java's BigDecimal instead of Ruby's?

17:18 <boc_tothefuture[> well.. I am given Java BigDecimal

17:18 <boc_tothefuture[> From the framework

17:18 <boc_tothefuture[> I could convert it to Ruby's if that is the way to go..

17:18 <headius[m]> ah ok... and it isn't converting to Ruby automatically during the call

17:19 <boc_tothefuture[> no, it throws an error essentially saying it can't be cast to that.

17:19 <boc_tothefuture[> Java::JavaLang::ClassCastException (org.jruby.ext.bigdecimal.RubyBigDecimal cannot be cast to java.math.BigDecimal)

17:20 <boc_tothefuture[> but I didn't see a "to_ruby_big_decimal" type method.

17:20 <headius[m]> you can call to_d

17:20 <headius[m]> that is the Ruby coercion method for bigdecimals

17:20 <headius[m]> not common but it is there

17:21 <headius[m]> also need to have done require 'bigdecimal'

17:24 travis-ci has joined #jruby

17:24 <travis-ci> jruby/jruby (master:7abc7bb by Charles Oliver Nutter): The build was broken. https://travis-ci.com/jruby/jruby/builds/207813672 [201 min 36 sec]

17:24 travis-ci has left #jruby [#jruby]

17:26 <boc_tothefuture[> that works. thanks!

17:28 <headius[m]> excellent!

17:30 <headius[m]> hmm failures on sequel head... I wonder if those are our issues

17:35 <boc_tothefuture[> question I didn't ask... is there a way to convert it back?

17:35 <headius[m]> generically, all Ruby objects will have a to_java method that takes an optional Java type

17:36 <headius[m]> to_java should do the right thing for you

17:36 <headius[m]> the Ruby BigDecimal just wraps a Java one so the conversion should be fairly lightweight

17:36 <boc_tothefuture[> yep.. awesome, it does! :-)
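A minimal sketch of the round trip described above, assuming JRuby, where `to_d` on a `java.math.BigDecimal` and `to_java` on a Ruby `BigDecimal` behave as reported in the chat. The helper name `to_ruby_decimal` is hypothetical, and the non-JRuby branch exists only so the snippet also runs on other Ruby implementations.

```ruby
require 'bigdecimal'
require 'bigdecimal/util' # adds to_d to core types such as String and Integer

# Hypothetical helper: turn whatever the framework hands over into a Ruby BigDecimal.
def to_ruby_decimal(value)
  return value if value.is_a?(BigDecimal)
  return value.to_d if value.respond_to?(:to_d)

  BigDecimal(value.to_s)
end

if RUBY_ENGINE == 'jruby'
  require 'java'
  java_value = java.math.BigDecimal.new('1.5')   # stands in for the framework's value
  ruby_value = to_ruby_decimal(java_value)       # Java -> Ruby via to_d
  sum        = 8 + ruby_value                    # ordinary Ruby arithmetic
  back       = sum.to_java(java.math.BigDecimal) # Ruby -> Java via to_java
  puts back
else
  # Outside JRuby there is no Java value to convert; show the Ruby side only.
  puts 8 + to_ruby_decimal('1.5')
end
```

Converting once at the boundary and doing all arithmetic on Ruby values avoids mixing the two BigDecimal types in a single expression.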

17:55 travis-ci has joined #jruby

17:55 <travis-ci> jruby/jruby (master:f49e970 by Charles Oliver Nutter): The build is still failing. https://travis-ci.com/jruby/jruby/builds/207821582 [200 min 12 sec]

17:55 travis-ci has left #jruby [#jruby]

18:34 subbu is now known as subbu|lunch

18:43 travis-ci has joined #jruby

18:43 <travis-ci> jruby/jruby (load_service_redux:cfe71a2 by Charles Oliver Nutter): The build failed. https://travis-ci.com/jruby/jruby/builds/207829506 [206 min 13 sec]

18:43 travis-ci has left #jruby [#jruby]

18:52 <headius[m]> hmm no jeremyevans in here today

18:56 enebo has joined #jruby

18:57 ChanServ changed the topic of #jruby to: Get 9.2.14.0! http://jruby.org/ | http://wiki.jruby.org | http://logs.jruby.org/jruby/ | http://bugs.jruby.org | Paste at http://gist.github.com

19:02 <headius[m]> enebo: so the fzakaria eagain thing is triggered by these properties: -J-Djnr.ffi.asm.enabled=false -J-Djruby.compile.mode=OFF

19:02 <headius[m]> remove either one and it works

19:03 <headius[m]> compile.mode seems to be used by some Ruby FFI stuff but I do not see how it affects this waitpid call that happens via jnr-posix

19:03 <headius[m]> the ffi.asm thing obviously affects jnr-posix but that property alone is insufficient to trigger the bug... very strange

19:04 <headius[m]> perhaps something in jnr-ffi is also looking at the jruby.compile.mode flag? 🤔

19:05 <enebo[m]> hmm this is wsl and not windows proper?

19:06 subbu|lunch is now known as subbu

19:06 <enebo[m]> we don't detect that as windows do we?

19:07 <headius[m]> not windows, not wsl

19:07 <enebo[m]> oh this is nix stuff

19:07 <headius[m]> Linux but with the Nix package system/userspace in place

19:07 <headius[m]> yeah

19:07 <enebo[m]> WOT

19:07 <headius[m]> but I realized we have seen intermittent EAGAIN on waitpid on travis

19:08 <enebo[m]> the ffi thing I can see an angle in my head but why compile=off?

19:08 <headius[m]> yeah weird isn't it

19:09 <headius[m]> Ruby FFI does use that property to decide if it should generate bytecode stubs for FFI functions

19:10 <headius[m]> but this is just doing system('date') which does a jnr-posix waitpid internally

19:10 <headius[m]> it should not touch Ruby FFI stuff at all

19:11 <headius[m]> (and Ruby FFI probably should use a different property anyway)

19:11 <enebo[m]> could it possibly use OFF in FFI?

19:11 <headius[m]> I think I need to see if jnr-ffi is looking at the jruby property or something

19:11 <headius[m]> how else could jnr-posix be affected

19:12 <enebo[m]> hmm

19:13 <enebo[m]> If something is examining off outside the interpreter or whether jit is enabled it feels wrong

19:13 <enebo[m]> but if that is not the case how could using an interp cause a behavioral difference

19:13 <headius[m]> yeah it is not meant for things outside the jit

19:13 <travis-ci> jruby/jruby (master:f52f741 by Charles Oliver Nutter): The build is still failing. https://travis-ci.com/jruby/jruby/builds/207829532 [200 min 48 sec]

19:13 travis-ci has joined #jruby

19:13 travis-ci has left #jruby [#jruby]

19:15 <enebo[m]> We have seen interp and JIT do things differently over the years but they tend to just be basic ruby semantics usually in weird corners and it is hard to see how that would happen to make something in ffi do something differently

19:15 <enebo[m]> barring "fell out of the interpreter" and not running half a source file

19:15 <headius[m]> yeah this is literally just -e "system('date')"

19:15 <headius[m]> and it gets to the internal waitpid call

19:16 <enebo[m]> but for that to be true it would be super weird since JIT tends to only kick in after n attempts

19:16 <headius[m]> also reproducible with Process.waitpid spawn 'date'
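The two reproductions mentioned so far amount to roughly the following sketch. The wrapper name `spawn_and_wait` is hypothetical and `repro.rb` is a placeholder file name; the JRuby flags in the comment are the ones quoted earlier in the log.

```ruby
# To attempt a repro with the flags from the log:
#   jruby -J-Djnr.ffi.asm.enabled=false -J-Djruby.compile.mode=OFF repro.rb
# (repro.rb is a placeholder name for this file.)

# Hypothetical wrapper: spawn a command, wait for it, and return its status.
def spawn_and_wait(*cmd)
  pid = spawn(*cmd)
  Process.waitpid(pid) # the call under suspicion in the discussion
  $?                   # Process::Status of the child that was just reaped
end

status = spawn_and_wait('date')
puts "date exited with #{status.exitstatus}"
```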

19:17 <enebo[m]> even the asm part feels weird to me

19:18 <enebo[m]> I guess at some level posix will call down to jnr-ffi which will make it to jffi

19:19 <headius[m]> I could see asm having an effect but why only with compile.mode=OFF? If it is broken it should remain broken

19:20 <headius[m]> bleh

19:20 * headius[m] sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/cDJhsayNaoEgMqFtGqPHOEbj/message.txt >

19:20 <enebo[m]> what prints out with native.enabled.verbose?

19:21 <headius[m]> native.verbose causes it to print out successfully loaded native POSIX impl

19:21 <enebo[m]> so we are getting LinuxPOSIX

19:22 <headius[m]> hmmm

19:22 <headius[m]> yeah that seems ok

19:22 <headius[m]> I wonder... could it be getting interrupted

19:22 <headius[m]> the logic for Process.waitpid sets up an interrupter using pthread_kill

19:22 <enebo[m]> comment it out

19:22 <headius[m]> it should not be firing because we don't interrupt the thread but if it did, that could cause this

19:22 <headius[m]> I'll do one better...I added an option to disable that feature

19:22 <headius[m]> native.pthread_kill=false

19:22 <enebo[m]> also you could put a print on wakeup

19:23 <enebo[m]> oh but you see it as InterruptedException in pthreadKillable?

19:23 <headius[m]> blast it doesn't help

19:23 <headius[m]> that was a good theory

19:23 <enebo[m]> I am just curious if you see this thing looping or what happens

19:24 <enebo[m]> but what I find weird: is this two problems or just one

19:24 <enebo[m]> the fact that two settings causes it may not mean it is the same issue

19:24 <headius[m]> hmm I just notice it may not be using the pthread_kill logic

19:25 * headius[m] sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/wkEcCeFHpwknKklNdGaKnTsJ/message.txt >

19:26 <enebo[m]> so I guess errno is 0 without those two set

19:26 <enebo[m]> err without either set

19:27 <headius[m]> ah yeah nevermind it finishes the pthreadKillable call and then sees nonzero errno

19:27 <enebo[m]> yeah

19:27 <headius[m]> raiseErrnoIfSet should be inside the closure

19:27 <headius[m]> so it is immediately after the waitpid

19:28 <enebo[m]> pthreadKillable probably is not creating errno != 0 though

19:28 <enebo[m]> I guess unless as you say signalHandler interrupts

19:29 <headius[m]> date finishes executing too, it prints out in this output

19:29 <headius[m]> maybe errno is just not being cleared and due to the combination of flags it is seeing an EAGAIN from something else?

19:30 <headius[m]> oh no

19:30 <headius[m]> what if this is normal behavior because the subprocess runs so fast we don't have time to waitpid

19:31 <headius[m]> and we just slow down due to compile=off and asm=off

19:31 <enebo[m]> hmm

19:32 <enebo[m]> I assume compile=off will always be present at first

19:32 <headius[m]> well just sleeping between the spawn and waitpid does not fail

19:32 <headius[m]> so perhaps not

19:32 <enebo[m]> so the combo could maybe slow it down enough?

19:32 <headius[m]> does not fail without the properties set and explicit delay

19:33 <enebo[m]> but this is only on Nix too right?

19:33 <headius[m]> well this is the only place I have been able to repro, using fzakaria docker image

19:33 <enebo[m]> remove pthreadKillable from this and just call waitpid

19:34 <headius[m]> the option to turn off pthread_kill should be doing that

19:34 <headius[m]> does not appear to help

19:34 <headius[m]> I don't have this set up to rebuild in the container right now

19:35 <enebo[m]> ok. I guess if you are certain of that then that is not part of it

19:35 <enebo[m]> ah

19:35 sagax has joined #jruby

19:35 <enebo[m]> but I can confirm it should not happen from reading the code

19:35 <enebo[m]> unless applyAsInt is super weird :)

19:36 <headius[m]> right, should just go straightaway to the waitpid closure if that property is off

19:36 <headius[m]> heh yeah yay for erased generics

19:36 <enebo[m]> so both

19:37 <enebo[m]> I have to say I find the weirdest aspect of this is compile=off

19:38 <headius[m]> definitely

19:38 <headius[m]> https://gist.github.com/headius/cf1e1381978745bbc4dd16d71199086d

19:38 <enebo[m]> really only two theories have merit: 1) slower execution 2) a bug in interp

19:38 <headius[m]> there's the full output and command line

19:39 <enebo[m]> but what Ruby executes in that call?

19:39 <enebo[m]> something ruby starting up which toggles something

19:40 <headius[m]> oh no

19:40 <enebo[m]> ok so one thing to note here

19:40 <headius[m]> -Xdebug.parser fixes it

19:40 <enebo[m]> yeah I was going to suggest that :)

19:40 <headius[m]> I was getting there from your train of thought

19:41 <enebo[m]> there is no Ruby loaded in that script so for OFF to be effective it would have to do something to earlier Ruby code loaded

19:41 <enebo[m]> I will also say the other thing though...as a test case -e is normally forced to compile as the main script

19:41 <enebo[m]> so if you -r something_with_that system it probably would break without compile=off

19:42 <enebo[m]> if it had something to do with those lines

19:42 <headius[m]> yeah this is an interesting wrinkle

19:42 <enebo[m]> but this makes much more sense to me

19:43 <enebo[m]> so something in Ruby loading is working differently with compile=off

19:43 <enebo[m]> That in itself is remarkable

19:43 <enebo[m]> since nothing will compile normally past the main script which has not executed multiple times already

19:44 <enebo[m]> hmm

19:45 <enebo[m]> with default settings do we compile more than the default file/-e?

19:45 <headius[m]> not unless jit fires

19:45 <enebo[m]> ok so let's think through this

19:46 <headius[m]> nothing in prelude should be jitting in this short example

19:46 <enebo[m]> could we errantly execute something 20 times and not notice it is not working during bootstrapping ruby but it always ends up ok in the end because it JITs?

19:46 <headius[m]> there's a little bit of FFI use here but only on Windows and Solaris

19:47 <enebo[m]> and that is the other part of this it takes both to fail

19:47 <headius[m]> I don't see how we wouldn't notice it failing

19:47 <enebo[m]> well we do not see it anywhere but Nix so far

19:47 <headius[m]> what else does debug.parser turn off?

19:47 <enebo[m]> literally everything

19:47 <enebo[m]> all we do is init in Ruby

19:48 <enebo[m]> err initCore I think but we do not load gems or anything in kernel

19:48 <enebo[m]> as you know it exists to debug the parser/lexer so we will never execute any ruby

19:48 <headius[m]> right

19:48 <enebo[m]> how that does it I don't recall

19:49 <enebo[m]> It is super useful as it turns out

19:49 <headius[m]> aha

19:49 <headius[m]> --disable-gems also works

19:49 <enebo[m]> hmm

19:49 <enebo[m]> ok well that removed a thousand lines :)

19:50 <headius[m]> yeah but doesn't help narrow down much 😀

19:50 <enebo[m]> so something in gems or dependency of gems is doing something with ffi and in interp mode it fails?

19:50 <enebo[m]> but that comes back to what would not interp in the first place

19:50 <enebo[m]> and if it failed it would need to keep getting called so a JIT could fix it

19:51 <enebo[m]> headius: can you nuke all the gems?

19:52 <headius[m]> I probably can

19:52 <enebo[m]> one thing gems does is load a lot of crap in a loop

19:52 <headius[m]> I can confirm just disabling did_you_mean does not fix it

19:52 <enebo[m]> and maybe there is something really strange in there that the interp does not do well

19:52 <enebo[m]> but in full it JITs and continues enough where we do not notice not everything is loaded

19:53 <headius[m]> ok weird

19:53 <headius[m]> --disable-gems -rrubygems

19:53 <headius[m]> also is ok

19:53 <enebo[m]> HAHA

19:53 <headius[m]> that should only prevent gem related stuff at boot from loading

19:53 <enebo[m]> so it is not loading any gems but loading rubygems

19:54 <headius[m]> yeah

19:54 <headius[m]> and that is working ok

19:54 <enebo[m]> yeah I wonder if there is a problem loading some gems in that image and we normally JIT something and that "fixes" the bootstrap enough

19:54 <headius[m]> ack nevermind it is intermittent

19:54 <headius[m]> I may be wrong about all this now

19:55 <enebo[m]> with OFF?

19:55 <headius[m]> ok so requiring rubygems does fail

19:55 <headius[m]> just didn't at first

19:55 <enebo[m]> ok

19:55 <headius[m]> I have not gotten --disable-gems alone to fail

19:55 <headius[m]> so there does seem to be a race when it fails

19:56 <enebo[m]> my pet theory has an interesting problem with it. What gem not loaded properly with OFF would then cause something later to stop working because something else finally loaded

19:56 <headius[m]> yeah something that touches FFI and leaves a bad errno somewhere somehow?

19:57 <enebo[m]> hmm we do reset errno at times

19:57 <headius[m]> I bet errno is nonzero before this waitpid call and it isn't cleared

19:57 <headius[m]> I think I can check that

19:57 <enebo[m]> yeah that was what jumped out when you said that

19:57 <headius[m]> could be a libc behavior different on Nix?

19:57 <enebo[m]> but if that is true you should see this behavior potentially in calling many things?

19:58 <headius[m]> not clearing errno in the same places

19:58 <headius[m]> and some gem does an ffi call that leaves an errno set

19:58 <headius[m]> I don't know how to tie this together with compile=off

19:58 <enebo[m]> so it is too bad you cannot build easily since you could remove that errno check

19:58 <headius[m]> oh but compile=off does change how FFI works

19:58 <enebo[m]> I mean printing it before would also be a good check

19:59 <headius[m]> as mentioned earlier

19:59 <headius[m]> so if it started to cause some ffi call booted by rubygems to fail, leave an errno present, and then we don't clear it before this

19:59 <headius[m]> and weird libc just for extra spice

19:59 <enebo[m]> can you repeat how off with ffi is different?

20:01 <headius[m]> errno is 2 before the waitpid call

20:01 <headius[m]> enoent

20:01 <enebo[m]> how does FFI change with compile=off?

20:02 <headius[m]> it uses a generic invoker instead of a bytecode-generated invoker

20:02 <headius[m]> as with the asm property there may be bugs in the generic invoker code never seen because we typically don't run this way

20:03 <enebo[m]> ok that seems likely now

20:03 <headius[m]> I had to do some work on jffi/jnr-ffi to get the non-asm logic passing tests

20:03 <headius[m]> https://github.com/jruby/jruby/blob/390ca2c47ebec80b0b01221c19b0ee23f75253d8/core/src/main/java/org/jruby/ext/ffi/jffi/BufferNativeInvoker.java

20:03 <enebo[m]> So it seems very likely you can finish this up by resetting errno before the call but it begs a couple of questions

20:04 <headius[m]> I believe it falls through that code if compile=off

20:04 <headius[m]> yeah this could be a glitch in how jnr-ffi or jnr-posix or ffi handles errno

20:04 <headius[m]> like the generic invoker is supposed to clear errno but does not

20:05 <enebo[m]> 1. is there a bug here in jffi/jnr-ffi that is not working and the errno is just a sad side-effect?

20:05 <enebo[m]> 2. Should we be more defensive before calls to posix and reset errno?

20:05 <headius[m]> I had thought that in normal C we can rely on errno to be reset to zero on a successful call

20:05 <headius[m]> not having to clear it before that call

20:06 <enebo[m]> heh for all we know waitpid is not being invoked at all here

20:06 <headius[m]> but jnr-ffi or jnr-posix includes logic to cache errno so it doesn't get corrupted by an intervening call

20:06 <enebo[m]> we still don't know if resetting it would even make this work

20:06 <headius[m]> if that broke or was not being done properly in the generic invoker we could end up with errno remaining set across a successful call
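The stale-errno hypothesis can be illustrated with a small simulation. Nothing here touches jnr-posix or jnr-ffi: `FakeNative` and both `wait_*` methods are hypothetical stand-ins. It relies on one fact from the C and POSIX standards, namely that library functions never set errno to zero, so a successful call can leave an older value in place. This is a sketch of the hypothesis under discussion, not a confirmed diagnosis.

```ruby
# Hypothetical stand-in for a native layer that exposes a cached errno.
class FakeNative
  ENOENT = 2
  attr_reader :errno

  def initialize
    @errno = 0
  end

  # A call that fails and records an error code.
  def failing_call
    @errno = ENOENT
    -1
  end

  # A call that succeeds and leaves errno untouched.
  def waitpid
    1234
  end
end

# Fragile pattern: trusts errno alone, so a stale value looks like a new failure.
def wait_trusting_errno(native)
  pid = native.waitpid
  raise SystemCallError.new('waitpid', native.errno) unless native.errno.zero?

  pid
end

# Safer pattern: consult errno only when the return value signals failure.
def wait_checking_result(native)
  pid = native.waitpid
  raise SystemCallError.new('waitpid', native.errno) if pid == -1

  pid
end

native = FakeNative.new
native.failing_call # leaves ENOENT behind
begin
  wait_trusting_errno(native)
rescue SystemCallError => e
  puts "spurious failure: #{e.class}"
end
puts "checked result: #{wait_checking_result(native)}"
```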

20:06 <headius[m]> yeah well I can force it before waitpid

20:06 <headius[m]> trying now

20:07 <enebo[m]> yeah if it failed to invoke before it might just always be broken

20:08 <headius[m]> clearing before the call seems to work

20:08 <headius[m]> so it seems like this is a rogue errno value leftover in jnr-posix or something

20:09 <headius[m]> ugh sorry intermittent again

20:09 <headius[m]> and --disable-gems may be another red herring

20:10 <headius[m]> it is a timing issue of some kind

20:12 <headius[m]> ok disable-gems does still seem to be green, whew

20:12 <headius[m]> and clearing does not help

20:12 * headius[m] sent a long message: < https://matrix.org/_matrix/media/r0/download/matrix.org/owmYDXRyJNWpfHbXTglCFSdw/message.txt >

20:13 <enebo[m]> so perhaps it is just the non-generated invoker is broken

20:14 <enebo[m]> headius: so compile=OFF but asm=true will go back to generated invokers?

20:16 <headius[m]> well there are two levels of generation

20:16 <headius[m]> asm=true allows jnr-ffi to generate ASM stubs for the native side of a call

20:16 <headius[m]> compile=off is being used by our Ruby FFI impl to decide whether to generate a Java stub for each FFI function

20:17 <headius[m]> so asm=true turns on the native stub again but the java stub would still be off

20:17 <headius[m]> I will try to really confirm that asm property actually is affecting this

20:18 <headius[m]> half dozen runs with asm=true all ok

20:18 <headius[m]> back to false, fails immediately

20:20 <headius[m]> seems the same with compile.mode... works ok six out of six runs with compile=default and fails again immediately when compile=off
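Because the failure is intermittent, repeated runs per flag combination are easier to compare with a small harness. The sketch below is hypothetical: `failure_count`, `COMBINATIONS`, and the `RUN_JRUBY_MATRIX` switch are made-up names, it assumes a `jruby` executable on PATH, and it assumes a failing run exits with a non-zero status. The property names are the ones quoted in the log.

```ruby
# Hypothetical harness: run a command several times and count the failing runs.
def failure_count(cmd, runs: 6)
  runs.times.count { !system(*cmd, out: File::NULL, err: File::NULL) }
end

# Flag sets built from the properties named in the log.
COMBINATIONS = {
  'asm=false, compile=OFF' => %w[-J-Djnr.ffi.asm.enabled=false -J-Djruby.compile.mode=OFF],
  'asm=true, compile=OFF' => %w[-J-Djnr.ffi.asm.enabled=true -J-Djruby.compile.mode=OFF],
  'asm=false, compile default' => %w[-J-Djnr.ffi.asm.enabled=false]
}.freeze

# Opt-in, since this part needs JRuby installed.
if ENV['RUN_JRUBY_MATRIX']
  COMBINATIONS.each do |label, flags|
    failures = failure_count(['jruby', *flags, '-e', "system('date')"])
    puts "#{label}: #{failures}/6 runs failed"
  end
end
```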

20:20 ur5us has joined #jruby

20:21 <headius[m]> I can confirm the errno does clear to 0 before waitpid

20:22 ur5us has quit [Remote host closed the connection]

20:24 <headius[m]> hmm

20:24 <headius[m]> bypassing Process.waitpid and going straight to jnr-posix seems to pass ok

20:24 <enebo[m]> ship it

20:25 <enebo[m]> funny though it seems like lots of stuff should be broken in this env

20:25 <headius[m]> oh well of course, there's no raise

20:26 <headius[m]> but errno does appear to be zero after the waitpid call in this configuration

20:32 ur5us has joined #jruby

20:36 <headius[m]> this nix setup appears to be using glibc btw

21:26 mistergibson has quit [Quit: Leaving]

23:31 ur5us has quit [Ping timeout: 260 seconds]