dkubb: That weird %r(/) vs /\// behavior is reflected in Regexp#source; it's Ruby itself.
dkubb: So I'll have to work around it.
dkubb: Gonna try to quote unquoted / in regexps. I think all regexp engines in ruby will contain workarounds for this stuff.
dkubb: Actually I'm not correct about Regexp#source
mbj_ has joined #rom-rb
mbj has quit [Read error: Connection reset by peer]
vovchanskiy has joined #rom-rb
vovchanskiy has quit [Remote host closed the connection]
pussen has joined #rom-rb
pussen has quit [Remote host closed the connection]
mbj_: good morning
mbj_: so Regexp#source works properly by normalizing things?
mbj_: I wonder, does rbx have two different nodes to represent %r(/) vs /\// ?
dbussink: ^^^
dkubb: they end up with different sources yes
dbussink: we were discussing if ruby impls represent %r(/) and /\// differently
like maybe they have different ast nodes
dkubb: if you have rbx installed, you can do rbx compile -A -e '/\//'
and compare that to the other
-A prints an ast
-B bytecode
dbussink: so when the regexp is parsed, does there need to be conditional logic or normalization somewhere so that they are treated the same.. since I assume they compile down to the same thing under the hood?
well, looks like they are actually different under the hood
but i don't really know the exact details
the original assumption mbj and I had was that when they are parsed by whitequark/parser, they would be represented the same in the ast .. we thought looking at how ruby impls do it that we might understand why/if they are different
mbj_: the ruby docs for Regexp#to_s say "This string can be fed back in to Regexp::new to a regular expression with the same semantics as the original."
mbj_: so maybe you can just normalize it yourself if you wanted
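A minimal sketch of the Regexp#to_s round trip mentioned above, using only core Ruby. The helper name round_trip is hypothetical. The exact strings returned by #source and #to_s for these two literals can differ between Ruby versions, so the sketch only compares matching behavior and prints the strings for inspection.

```ruby
# Rebuild a regexp from its #to_s form (hypothetical helper name).
def round_trip(regexp)
  Regexp.new(regexp.to_s)
end

# The two literals under discussion: same pattern, different delimiters.
[%r(/), /\//].each do |regexp|
  rebuilt = round_trip(regexp)
  # Print what this Ruby reports; the strings are version dependent.
  puts "source=#{regexp.source.inspect} to_s=#{regexp.to_s.inspect}"
  # The rebuilt regexp should accept and reject the same sample inputs.
  ['a/b', 'ab'].each do |input|
    raise 'semantics changed' unless rebuilt.match?(input) == regexp.match?(input)
  end
end
```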
mbj_: i was thinking about doing a bit of the parsing logic for sql.rb. I was wondering if you had any projects to point me towards or if you're interested in helping me get started? I think for me the biggest problem is that there's a lack of docs and example projects using ragel (aside from whitequark/parser) and I'm not yet sure if there's a better approach to start with or if I have to find it via trial and error
mbj_ is now known as mbj
postmodern has joined #rom-rb
dkubb: From what I know, ruby is a language you have to lex and parse at the same time.
dkubb: Because of the lvar / method call ambiguity at lexer level.
dkubb: And ragel is not able to generate LR parsers. For that reason whitequark used racc.
dkubb: I think you should find a .y file from a well known implementation (postgres) and implement the C parts in ruby.
dkubb: I'd pick parts of the files.
are you building your own Ruby implementation now?
postmodern: I'd not implement my own ruby.
postmodern: I'd implement a subset ;)
postmodern: Reason behind this discussion, unparsing %r(/) vs /\//
postmodern: for unparser (mutant)
ah ha
supposedly you can get access to the internal regexp tree
i was looking into it to write a regexp fuzzer
postmodern: I'll do one for mutant
postmodern: I'm on phone bbl
of course things like /a*/ are hard to fuzz :)
postmodern: The thing is, unparser tries to achieve the following invariant:
So it is totally okay to emit a regexp literal in original source like %r() as //
BUT the regexp contents should be the same.
ah ha
%r(/) gets parsed as (regexp, (str "/"))
yeah by regexp fuzzer, i was referring to taking a regexp and generating all possible inputs
/\// gets parsed as (regexp, (str "\\/"))
whereas mutant wants to mutate the regexp itself
postmodern: Ahh I thought the other way round.
postmodern: got it.
postmodern: Yeah I think mutant has the easier problem ;)
postmodern: So a generic unparser that does NOT know the original delimiter has a problem.
postmodern: Because literals like /\// already contain the quoted delimiter in str body, and literals like %r(/) do not.
postmodern: I think I need to make unparser source map aware for this node, which I dislike.
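One way the "quote unquoted /" workaround could look, as a sketch rather than unparser's actual code: the helper name escape_slashes is hypothetical. It walks the regexp body, leaves every existing backslash escape alone, and escapes any bare "/" so the body is safe to emit between // delimiters regardless of which delimiter the original literal used.

```ruby
# Escape bare forward slashes in a regexp body (hypothetical helper).
# The pattern matches either an existing escape pair (backslash plus any
# character) or a bare slash; escape pairs are passed through unchanged.
def escape_slashes(source)
  source.gsub(%r{\\.|/}m) { |match| match == '/' ? '\\/' : match }
end

# Body as it would come from %r(/): a bare slash gets escaped.
puts escape_slashes('/')
# Body as it would come from /\//: already escaped, left untouched.
puts escape_slashes('\\/')
```

Because already escaped slashes pass through unchanged, the helper is idempotent, so it does not matter whether the incoming body came from %r(/) or /\//.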
lgierth has joined #rom-rb
CraigBuchek has quit [Quit: Leaving.]
lfox has joined #rom-rb
CraigBuchek has joined #rom-rb
breakingthings has quit []
postmodern: I thought it might be possible to use https://github.com/ammar/regexp_parser for parsing regexps, and from there the same kind of structure as mutant could be used to mutate each kind of node
postmodern: I'd guess when mutant begins to mutate regexps most code will have dozens or uncovered mutations for each regexp.. people are pretty bad at testing regexps against possible inputs
*dozens of
^ $ vs. \A \z
oh yeah
what I would typically do is
but doesn't that mean developers will have to write tests for every malformed input?
mutate from the weaker to strong nodes
just to ensure the regexp rejects it?
postmodern: You need to have a counter example for each mutation.
I dunno, I think they might have to write a test for each class of valid input
postmodern: I think the set of counter examples is finite and probably 2 times the number of nodes in the regexp AST.
ah like [a-z] -> [^a-z]
Or a|b => b
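The mutations named above ([a-z] -> [^a-z], a|b => b, ^ $ -> \A \z) can be sketched on a toy regexp AST. Everything here is an assumption for illustration: the array-based node shapes, the node names and the helpers emit_regexp and regexp_mutations are made up, and this is neither mutant's nor regexp_parser's API.

```ruby
# Toy AST: [:seq, *nodes], [:alt, *branches], [:class, negated, body],
# [:lit, text], and the anchors [:bol], [:eol], [:bos], [:eos].
def emit_regexp(node)
  type, *args = node
  case type
  when :lit       then args.first
  when :class     then "[#{'^' if args[0]}#{args[1]}]"
  when :alt       then args.map { |child| emit_regexp(child) }.join('|')
  when :seq       then args.map { |child| emit_regexp(child) }.join
  when :bol       then '^'
  when :eol       then '$'
  when :bos       then '\\A'
  when :eos       then '\\z'
  end
end

# Returns every tree that differs from the input by exactly one mutation.
def regexp_mutations(node)
  type, *args = node
  own = case type
        when :class then [[:class, !args[0], args[1]]] # [a-z] -> [^a-z]
        when :alt   then args                          # a|b -> a, a|b -> b
        when :bol   then [[:bos]]                      # ^ -> \A
        when :eol   then [[:eos]]                      # $ -> \z
        else []
        end
  return own unless %i[seq alt].include?(type)

  # Mutate one child at a time, keeping its siblings unchanged.
  nested = args.each_with_index.flat_map do |child, index|
    regexp_mutations(child).map do |mutated|
      [type, *args.take(index), mutated, *args.drop(index + 1)]
    end
  end
  own + nested
end

tree = [:seq, [:bol], [:class, false, 'a-z'], [:eol]]
regexp_mutations(tree).each { |mutant| puts emit_regexp(mutant) }
```

Each emitted mutant needs one counter example to be killed. For the ^ -> \A mutant here, the input "1\na" would do, since the original matches it and the mutant does not.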
mbj, hmm what about really complex regexps, like email validation?
mbj, that might generate a ton of counter examples
postmodern: tbh dunno.
postmodern: Next problem: mutant currently only "sees" nodes inside def and defs nodes.
my regexps these days are within class / module bodies.
postmodern: I expect you'll not try to mutation cover email regexps via testing against a corpus of valid email addresses.
And also I think the libraries shipping these regexps would be mutation covered and a typical user would not reinvent such "complex" regexps.
And if he does, he might think: Uneasy to mutation cover, maybe I should refer to a lib before failing myself.
Which is a common side effect of mutation covering your code :D
mbj, good points
postmodern: With the next version of mutant I expect to have a very fine grained configuration, I don't expect most users will go for "full rom style coverage".
postmodern: But it's very helpful to explore the quality of your software via just using the mutations as a metric.
postmodern: I'll need to add tons of documentation and invent some wording, for example we have a problem with explicit and implicit mutation coverage, which is IMHO not in the current mutation coverage literature.
postmodern: The configuration for not removing explicit returns for implicit ones is already in a branch. You'll like it. And I hope ronin could be mutation covered :D
dkubb: In unparser I have the concept of "terminated" nodes.
dkubb: terminated nodes are guaranteed to get emitted as composable expressions you could use "everywhere".
dkubb: For example a fixnum literal is "terminated".
mbj: what's a counter-example?
dkubb: range
dkubb: 1..2
dkubb: if you have a range as receiver you need parentheses
dkubb: an ast like (send (irange (int 1) (int 2)) :foo) must be emitted as (1..2).foo
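A rough sketch of the "terminated" idea for the range example, assuming array-shaped nodes that mirror the s-expression above. The names TERMINATED_TYPES, terminated? and emit_node are hypothetical, and which node types count as terminated is an assumption here; this is not unparser's real emitter.

```ruby
# Node types assumed safe to use as a receiver without parentheses.
TERMINATED_TYPES = %i[int send].freeze

def terminated?(node)
  TERMINATED_TYPES.include?(node.first)
end

def emit_node(node)
  type, *children = node
  case type
  when :int
    children.first.to_s
  when :irange
    "#{emit_node(children[0])}..#{emit_node(children[1])}"
  when :send
    receiver, name = children
    source = emit_node(receiver)
    # Only wrap receivers that are not terminated, e.g. a bare range.
    source = "(#{source})" unless terminated?(receiver)
    "#{source}.#{name}"
  end
end

puts emit_node([:send, [:irange, [:int, 1], [:int, 2]], :foo])
puts emit_node([:send, [:int, 1], :foo])
```

Here the parent asks the child whether it is terminated and adds parentheses only when needed, which is one way to avoid wrapping every node defensively.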
mbj: in sql I have it so the emitter can parenthesize based on what the parent node is
dkubb: The emitters all support #terminated?
mbj: so effectively all my sql statements are terminated now
dkubb: Yeah, I used the same strategy for unparser for a while
dkubb: But it could manifest in "double parentheses"
how so?
dkubb: Because sometimes termination does not depend only on node type
I have it so the node is responsible for parenthesizing itself
or rather the node's emitter
I think this will work for SQL
yeah, it's much simpler than ruby
For ruby it led to unneeded parentheses in the output