xardion has quit [Remote host closed the connection]
xardion has joined #jruby
<headius[m]>
enebo: hey
<headius[m]>
I'm going to look into that master failure but I pushed this last night and it shockingly passes all tests: https://github.com/jruby/jruby/pull/6249
<headius[m]>
wanted to get your opinion... look at both commits separately
<headius[m]>
it makes multibyte expand_path work properly and miraculously does not break any tests
<headius[m]>
it's part of fixing a bug in webrick that uses temp files and multibyte paths forced to ASCII-8BIT
<headius[m]>
our expand_path mangles the MBC along the way because it tries to treat the individual bytes as characters in Java and then encodes them separately on the way out again
<headius[m]>
with this it leaves the bytes as is and they go back out without mangling
<headius[m]>
the other half I'm attempting is to do this in fnmatch as well
<headius[m]>
hmm though fnmatch mostly works with bytes already
<enebo[m]>
Does ISOCoder mean iso8859_1?.
<headius[m]>
yeah
<headius[m]>
ISOCoder? you mean decodeISO?
<enebo[m]>
decodeISO calls ISOCoder in thread local
<headius[m]>
oh, yeah ok
<enebo[m]>
ISO was just a guess it was for 8859_1
<headius[m]>
yeah to avoid constructing a new charset decoder every time, there's something similar for UTF-8
<enebo[m]>
yeah I was just hung up on the name...I guess Irealized it had to be 8859_1 based on the byte + sign chopper
<enebo[m]>
we have no convention on short hand but I see a second obvious reference to this being that as well
<enebo[m]>
So first commit will just always make any path built internally to be 8859_1 since Java String will just barf those out as-is as char * to native calls
<headius[m]>
yeah exactly
<headius[m]>
I expected it would break something in URLs or properly encoded strings or whatever but it just works
<headius[m]>
of course if those encodings weren't ASCII-compat they'd break, it's just not tested and super uncommon anyway
<headius[m]>
so the second commit limits it only to cases where the incoming paths are already "binary" and we have no encoding to use
<headius[m]>
both pass the case in question
<enebo[m]>
For urls must be transcodable to UTF-something so ASCII-only is fine but I think any random garbage will work for files and dirs
<enebo[m]>
ok so first commit is totally golden and second one is safe I think
<enebo[m]>
second one could probably be better for cases where it is an encoding which is valid for the filesystem BUT java has no Charset for it. but that is probably very rare
<headius[m]>
I had considered a couple other triggers for this binary logic
<headius[m]>
yeah I don't know how often we run into that now
<headius[m]>
I did an audit years ago to try to make sure all encodings would at least use the right name for the charsets they go with
<enebo[m]>
if Encoding has a check in jcodings then it could just be a second check before the toString else
<headius[m]>
and I think there's a hard error in getCharset now that we have not seen happen
<enebo[m]>
but I really think this case seems very unusual so it may not exist in reality :)
<enebo[m]>
or it exists in reality but that single user has not made a bug for it :)
<headius[m]>
yeah I guess I'm coming down on the side of making this narrow as posible
<headius[m]>
so if it's ASCII-8BIT clearly do this
<enebo[m]>
This came about from invalid encoded strings translating to a file path right?
<headius[m]>
but maybe also CR_UNKNOWN or CR_BROKEN when ascii-compat = true?
<headius[m]>
yeah internally webrick forces many path strings to binary
<enebo[m]>
yeah I definitely have no problems with how this was constrained. I can just see one more unlikely corner
<headius[m]>
I assume to allow %AB like characters to go through without getting mangled
<enebo[m]>
and fwiw that corner was already broken
<headius[m]>
the case in question encodes those weird question marks using URL escaping... we end up trying to decode it as valid UTF-8 bytes and it breaks
<enebo[m]>
heh
<enebo[m]>
yeah so long as it ends up as binary your fix definitely fixes it somewhat cleverly
<enebo[m]>
8859_1 definitely is a way of cheating the Java charset
<enebo[m]>
but honestly if it is 8bit it has to be treated no differently than a char * (e.g. garbage)
<headius[m]>
yeah
<headius[m]>
that's the general theme I'm seeing in these webrick failures.. including the ENV thing where we need to be able to just use raw char*
<enebo[m]>
Having been studying how Rust handles Strings and Paths it is amusing how much code just uses char * and hopes for the best
<headius[m]>
that one I can't fix without an overhaul of env vars that doesn't use System.getenv anymore
<headius[m]>
yeah it is like that in CRuby and I didn't really get why until now
<enebo[m]>
It definitely also explains why they are making tests of inconsistent bytes
<headius[m]>
you can put pretty much any char* into an env var and it needs to be treated as bytes and not characters
<enebo[m]>
this is not all on MRI either...they do use char * but so does everything on the filesystem too
<headius[m]>
or put another way it's up to the consumers of the env var to know/decide what encoding the bytes are in
<headius[m]>
but JDK getenv uses String so they have to decode
<enebo[m]>
yeah to the env stmt
<headius[m]>
so it's basically the same problem as paths
<enebo[m]>
it is same basic garbage data problem
<headius[m]>
yeah
<headius[m]>
MRI handles it better because they don't handle it at all
<enebo[m]>
I suppose anything that passes char * or gets char * will do this
<enebo[m]>
We just have not fixed all cases because MRI has not written tests for it yet
<headius[m]>
this may fix a few things, I haven't checked yet
<headius[m]>
I know we've had some multibyte expand_path issues in the past but this is a pretty specific weird case
<enebo[m]>
yeah it will be interesting to see if you only experience positive behavior after this change or not :)
<headius[m]>
99% of the time we're dealing with properly encoded strings there
<enebo[m]>
It seems reasonable to me
<headius[m]>
ok
<headius[m]>
I will sort out if there's any additional narrowing needed and squash these for merge
<enebo[m]>
well in most modern languages you do work with properly encoded strings
<enebo[m]>
Ruby decided to combine binary data with their string class
<enebo[m]>
but it has the benefit of working seemlessly with any C apis
<enebo[m]>
in that C apis have no constraints on string contents other than termination
<headius[m]>
this may advise fixes in other places where we have to use java.lang.String too
<enebo[m]>
My proof that this was a design mistake is the number of bug fixes/CVEs around finding APIs which accept strings with a \0 only part way through it
<headius[m]>
could be helpful in any cases that are ascii-compat and just need to juggle around substrings on known boundaries, like / characters
<headius[m]>
yeah that is a big problem for them in general too because they have to zero terminate any string content they pass to system functions
<headius[m]>
so they've got \0 leaking into everything all the time
<enebo[m]>
Maybe we should rename things referring to ISO as Binary
<headius[m]>
hmm
<headius[m]>
I'm not thrilled with that naming but it's better than ISO
<enebo[m]>
ISO describes how we make a binary string
<enebo[m]>
but I was just thinking about intent of making a pointer of bytes that we have no idea what they might be
<headius[m]>
decodeASCII would be ok but MRI muddied the waters with US-ASCII and ASCII-8BIT
<enebo[m]>
In fact (and I am not suggesting this) it has always been a problem I have had with ASCII vs USASCII as ASCII is really just BINARY
<headius[m]>
though I suppose technically ASCII is only 0..127
<enebo[m]>
decode8Bit
<headius[m]>
yeah
<headius[m]>
decodeOctets
<enebo[m]>
yeah I am ok with 8bit or binary
<headius[m]>
I think I prefer binary to 8bit
<enebo[m]>
yeah my top preference so far is binary
<enebo[m]>
it is more or less not known and ascii-8bit is commonly referred to as binary
<headius[m]>
at least in Ruby world
<enebo[m]>
well yeah but this is in our ruby impl so I think that fits
<enebo[m]>
I am mostly thinking about how I will read the code in 6 months or longer
<headius[m]>
it's fairly clear
<headius[m]>
blob of bytes
<enebo[m]>
decodeAsBinary or decodeAsRaw perhaps?
<enebo[m]>
Raw is not a bad name either
<headius[m]>
decodeRawBytes
<enebo[m]>
yeah seems nice
<enebo[m]>
I do not think I could misinterpret what that means in the future
<headius[m]>
me neither
<headius[m]>
enebo: I went with decodeRaw because the methods in question already take byte[] as a parameter.. decodeRawBytes seem seemed redundant
<enebo[m]>
ah ok
<i8her8oat[m]>
What would you do to clone a property? just <property.clone()> ?
<headius[m]>
enebo: do you know? this is in the context of jrubyfx
<enebo[m]>
a property in fxml?
<i8her8oat[m]>
hm I think it has to be homemade, since JavaFX doesnt provide any thing related to cloning SimpleDoubleProperties
<i8her8oat[m]>
prob something like <clone = double_property(self, "clone", old.value)>
<enebo[m]>
that will just make a property for clone with a value right?
<i8her8oat[m]>
yes
<enebo[m]>
I have not used properties in years
<i8her8oat[m]>
: /
<enebo[m]>
That should just make an instance of that and be a short-hand for writing it iout long hand
<enebo[m]>
If for some reason the DSL syntax lets you down you can always write it out long hand: javafx.beans.property.SimpleDoubleProperty.new(self, "clone", old.value)
<enebo[m]>
err probably Java:: at the front of that depending on whether we load javafx as a method name to Kernel/BasicObject
<i8her8oat[m]>
DSL is consistent when loaded from a Java program. It doesn't work on block building : timeline do ... animate ... end doesnt work. I suppose vbox() do ... label() ... end neither, so I have to go with the <getChildren().add()> way
<enebo[m]>
i8her8oat: ah yeah that something is an issue where our auto adding is not correct
<enebo[m]>
in some cases it is just missing but it is a massive API
<i8her8oat[m]>
yeah you can work around pretty easily
<enebo[m]>
if you decipher our code there is a place where for each type we specify what is auto added
<enebo[m]>
I have not looked in quite a while but it is a big list of classes and it will specify how to add child elements for each time
<enebo[m]>
err s/time/type/
<enebo[m]>
I'm sorry it has just been a long time since I looked and byteit101 spent a lot of time after me expanding things out
<i8her8oat[m]>
maybe the precompiled.rb file?
<i8her8oat[m]>
for <Java::JavafxSceneControl::TableView>, there is
<headius[m]>
enebo: the next case I ran into is when a binary string gets into File.directory?, where we ultimately use another java String and java file APIs
<enebo[m]>
precompiled is generated if I remember right
<headius[m]>
in this case it seems like assuming the chars are at least encoded like system paths is the best we can do
<headius[m]>
so if it's binary I'm using default system charset, rather than decoding it as ISO-8859-1 characters
<headius[m]>
it seems to work ok for this case
<i8her8oat[m]>
enebo: yes, it is
<enebo[m]>
i8her8oat: perhaps I am thinking of updating exts.yml and then regenerating
<enebo[m]>
originally I just hard-coded this so this has changed since I worked on it
<i8her8oat[m]>
if you ever feel like regeneratingn the file, could you link it? I dont generate it everytime I run my app
<enebo[m]>
This has to be documented somewhere...hmm
<enebo[m]>
It does explain each file and it might explain how regeneration works
<i8her8oat[m]>
ok great
<enebo[m]>
i8her8oat: since we have not released and fx changes like every 18 months in some weird way this may be useful for you to 'rake reflect' anyways