ur5us has quit [Ping timeout: 260 seconds]
ur5us has joined #jruby
michael_mbp has joined #jruby
nirvdrum has quit [Ping timeout: 256 seconds]
ur5us has quit [Ping timeout: 260 seconds]
ur5us has joined #jruby
_whitelogger has joined #jruby
ur5us has quit [Ping timeout: 260 seconds]
nirvdrum has joined #jruby
knu has quit [Ping timeout: 260 seconds]
knu has joined #jruby
_whitelogger has joined #jruby
haze has quit [Quit: Lost terminal]
subbu is now known as subbu|lunch
subbu|lunch is now known as subbu
<headius[m]> enebo: hey about https://github.com/jruby/jruby/pull/6228
<headius[m]> I was wondering if it would be more appropriate in the parser
<headius[m]> the parser produces this chain of tokens for the formatting etc but leaves the padding character blank
<headius[m]> so then the interpreter has to guess at it which requires me to save that extra state
<headius[m]> well interpreter/parser whatever you'd call it
<headius[m]> but I did not write that parser so I am not sure if this is too contextual
<enebo[m]> so are you saying it is something the parser is not sending on but it is there
<headius[m]> well I'm saying the FORMAT_OUTPUT token should be set to the text padding character (a space) because the subsequent token it applies to will want text padding
<headius[m]> I implemented that manually by deferring the padding character selection until I see the next token
<headius[m]> but that seems like maybe parser could determine it better
<headius[m]> my patch is essentially rewriting the token stream
<enebo[m]> headius: I am going to lok at the code again...I just looked at your changes themselves vs how it is implemented before that point
<headius[m]> what I have works fine but it feels wrong
<enebo[m]> if this was printf and a parser for it then it would just probably construct formats with their specifiers but not really know anything about the fields
<headius[m]> the other thing that seems odd to me is the fact that padding is a token on its own rather than being a modifier to the following token
<enebo[m]> for this perhaps we have more necesary knowledge and it would know
<headius[m]> like you could have a flag during parse that says "next token should pad by X" and then when you get the token you'd know how to pad
<headius[m]> right that printf comment is just what I mean
<enebo[m]> I need to look at code
<headius[m]> like if you wrote a parser for printf, you wouldn't have 20 and "f" be different tokens for %20f
<headius[m]> the term token is maybe confusing here but to me it's clear the whole item is "f" with padding "20"
<headius[m]> or precision or whatever
<enebo[m]> well I would have format('f', [specifiers]) where those specifiers would be '20' and other things but I would not do much pass that
<headius[m]> right
<enebo[m]> but yeah I think f would have a set of modifiers
<headius[m]> that would be more than enough here and no hacks needed to lazily decide the padding char
<enebo[m]> it would just be 20 but not say ' ' or 0 or how it is padded
<enebo[m]> but you should be able to just know it is 20 of something and then go but for Z it is spaces
<headius[m]> I just felt icky about fixing this broken lazy logic by making it more lazy
<headius[m]> started to feel like a parrot implementer
<enebo[m]> haha
<enebo[m]> ok give me a few to remember what this code is
<enebo[m]> did I write it?
<headius[m]> ok
<headius[m]> I thought so
<headius[m]> you're my patron saint of parsers
<enebo[m]> this particular file has you all over it but it was a move commit
<enebo[m]> I am sure I wrote the lexer
<enebo[m]> I do not think I wrote this...I dislike the notion of a List<Token> but I do remember optimizing that with a cache I think
<enebo[m]> I half think someone wrote this for us and it was too slow then I added caching and it was quick cached
<headius[m]> I don't believe I ever implemented the parser side of this that's consuming the tokens
<headius[m]> but I have been known to forget
<enebo[m]> Although the List<Token> doesn ot need to be reparsed each time so that was a benefit
<enebo[m]> yeah I think someone contrib'd this
<enebo[m]> All my parser that are not the main ones are simple static recursive descent parsers
<headius[m]> ok
<headius[m]> so maybe my fix is fine then
<enebo[m]> So each option will have a token for FORMAT_SPECIAL before whatever is after it
<enebo[m]> ok so compilePattern is not entirely a parser...it does parse the flex tokens but then just makes another list of tokens
<enebo[m]> I think whatever processes this list should just notice their is a FORMAT SPECIAL and then when it processes the next token which is what to print (e.g. year) it then knows special said 20
<enebo[m]> but I am looking at how we process these tokens now
<headius[m]> that's what I did
<headius[m]> oh more context here
<headius[m]> the rewrite turns that special format into series of extra tokens
<headius[m]> so it see s %F and replaces it with the equivalent of Y-m-d
<headius[m]> the problem is that the padding selection is deferred until after that so it sees Y and thinks it wants a numeric padding
<headius[m]> but it should have a text padding because it's an aggregate format
<headius[m]> sorry I should have described the problem better at first
<headius[m]> so yeah the problem is that the lazy pad logic comes after the rewriting
<headius[m]> by that time it doesn't know it was supposed to be text padded
<enebo[m]> All aggregates are more or less text?
<headius[m]> yeah
<enebo[m]> but then all of those aggregates combined nee to be 20 for %20F
<headius[m]> see my patch, there were two special formats that appear to use numeric padding because they're just weird formats
<headius[m]> Q I think was one
<headius[m]> I checked all of these against MRI
subbu is now known as subbu|away
<enebo[m]> ok I see why you are saying what you are saying
<enebo[m]> if it is special no formatter is provided BY THE LEXER otherwise it makes one
<headius[m]> right
<headius[m]> and in my head it seemed like the lexer would be better set up to provide that
<headius[m]> my code seems like me working around lack of information out of the lexer
<enebo[m]> so all directives call this method directive which will possibly output special
<enebo[m]> I would not have had this lex like this
<headius[m]> since it's a contrib I release you from your obligation to assist, but maybe you will have an idea how to do it better
<enebo[m]> I would have passed all the flags as tokens and you would read each flag until you came to the actual directive
<headius[m]> strftime is a pretty simple format so it seems like this could all be parsed in one go
<headius[m]> yeah that would be better too
<headius[m]> I think the main ick here is that I have to rewrite the token stream
<enebo[m]> well I wouldn't have but nonetheless
<headius[m]> so that the subsequent interpreter will actually have the right information
<enebo[m]> wow this is surprisingly tough to follow
<enebo[m]> special is what?
<headius[m]> it took me a while to figure out where to fix this
<headius[m]> and I am not entirely happy with how I did it
<enebo[m]> I can read the code but I don't get why it is not within the default map of format tokens
<headius[m]> special are these multiformat things I guess
<headius[m]> like %F gets translated into %Y-%m-%d
<headius[m]> so it's "special"
<enebo[m]> I have not run or print anything out so I am not really grokking as well as I could
<headius[m]> so this is a pre-parse that converts "special" aggregate formats into their components
<headius[m]> it's like a de-parser
<enebo[m]> ok so the F gets sent on
<enebo[m]> but with no formatter
<headius[m]> well I'd say the F gets converted into Y-m-d and that is sent on, but then the later determination of padding has the wrong specified
<headius[m]> specifier
<enebo[m]> I feel like 'F' could have just been added to this list and had the token returned be the expanded list
<headius[m]> so it rewrites F but does not rewrite the FORMAT_OPTION that came before it
<headius[m]> until my patch
<headius[m]> my patch basically says "ok I saw a FORMAT_OPTION... let's see what it applies to"
<headius[m]> and then produces a new FORMAT_OPTION token based on special format rather than rewritten format
<headius[m]> it's a weird impl
<headius[m]> seems overcomplicated now that we are discussing it more
<enebo[m]> I can sort of understand why they cleaved it into two passes
<headius[m]> well I assume it's because of these special formats
<enebo[m]> they decided to have the compound stuff expand in second phase
<headius[m]> it parses out the special token and then splats it into its elements
<headius[m]> yeah
<enebo[m]> but the formatting applies to the compond value right?
<headius[m]> right
<headius[m]> and that's the bug
<headius[m]> so now it has formatting determination in two places
<enebo[m]> %20F is all three things and not just that it was numeric
<enebo[m]> so it is really two problems
<headius[m]> right
<enebo[m]> how do you figure out that there are 5 more characters in that expansion?
<headius[m]> eh?
<headius[m]> oh for the padding
<enebo[m]> maybe I misread how add to pattern works
<headius[m]> 🤔
<enebo[m]> It sort of looked like it just put them all on as their own thing
<headius[m]> I guess the padding count is determined after formatting the resulting text
<headius[m]> because e.g. month long names would be different
<enebo[m]> so you have the same length string as MRI with your fix?
<headius[m]> I don't think I checked
<enebo[m]> I would naively assume this fix only makes the year get spacing out to 20
<headius[m]> the original report just said the character was wrong so I probably didn't think to check the length
<headius[m]> well it's supposed to pad F
<headius[m]> so it pads Y-m-d out to N chars
<enebo[m]> yeah and with this fix I think it just pads Z
<enebo[m]> or whatever the 4 digit year
<enebo[m]> or at least that is what I think
<headius[m]> it's a good theory
<enebo[m]> I should probably apply your patch and try
<headius[m]> if so then this is more broken than I realized and it needs to also determine the pad difference during special processing
<enebo[m]> this formatting needs to almost be parens around a group of text
<headius[m]> hmm
<headius[m]> but it can't
<headius[m]> because it doesn't know that F will be Y-m-d or how long those are
<enebo[m]> The problem with this parser is it just translates and add formatter at front but it just broke it into n sub formats
<headius[m]> well maybe it can hardcode this because that is a fixed-length format
<headius[m]> 4-2-2
<enebo[m]> yeah I was wondering how many are known
<enebo[m]> %c will not work
<enebo[m]> unless it does some special padding
<enebo[m]> 'a b e' is variable
<headius[m]> right I figured there'd be some variable length special formats
<headius[m]> maybe you should rewrite it this afternoon
<enebo[m]> I think this can be fixed generically
<enebo[m]> instead of List<Token> it could be List<Token | List<Token>>
<enebo[m]> if list is encountered it recursively calls format call on that sublist
<headius[m]> hmm
<enebo[m]> but the problem with that solution is formatting has to happen after
<enebo[m]> I think whoever wrote this did not consider this at all and as such this is pretty busted
<enebo[m]> wow this is really weird...didn't this use to cache the compiled form
<enebo[m]> I must be thinking of another date/time parser in our codebase
<headius[m]> hey unrelated, how is this bug still open and marked for 9.1.18?
<headius[m]> it turns out it's working in 9.2.11.1 but I went to change the milestone and was shocked to see it was set to a released version
<enebo[m]> haha
<headius[m]> yeah 9.1.18 has three open issues even though it's a closed milestone
<headius[m]> 1.7.15 has two issues open
<enebo[m]> Can you mark against a closed milestone?
<headius[m]> that's it
<headius[m]> yeah you can
<headius[m]> this was clearly not fixed in 9.2.11.1 but it works there
<enebo[m]> yeah that is fine
<headius[m]> I'm not about to bisect releases back to 9.1.18
<headius[m]> ok
<headius[m]> looks like it's only these 5 bugs that are unfixed and marked for releases
<headius[m]> I will review and clean up
<enebo[m]> so strftime up until your "fix" did not look ahead (or at least not outside jflex) so I am confused why this is made into a List then walked
<enebo[m]> I see no cache here at all
<headius[m]> enebo: so the week is winding down here and this is in my queue... how do you want to proceed?
subbu|away is now known as subbu
nirvdrum has quit [Ping timeout: 260 seconds]
<enebo[m]> landing it as-is is better than nothing
<enebo[m]> I think ultimately we should rewrite this parser and add a cache
<headius[m]> ok
<headius[m]> I'll land it then for 9.3
nirvdrum has joined #jruby
ur5us has joined #jruby
ur5us has quit [Quit: Leaving]