<mbj>
dkubb: That weird %r(/) vs /\// behavior is reflected in Regexp#source; it's RUBY.
<mbj>
dkubb: So I'll have to work around it.
<mbj>
dkubb: Gonna try to quote unquoted / in regexps. I think all regexp engines in ruby will contain workarounds for this stuff.
<mbj>
dkubb: Actually I'm not correct about Regexp#source
mbj_ has joined #rom-rb
mbj has quit [Read error: Connection reset by peer]
vovchanskiy has joined #rom-rb
vovchanskiy has quit [Remote host closed the connection]
pussen has joined #rom-rb
pussen has quit [Remote host closed the connection]
<dkubb>
mbj_: good morning
<dkubb>
mbj_: so Regexp#source works properly by normalizing things?
<dkubb>
mbj_: I wonder, does rbx have two different nodes to represent %r(/) vs /\// ?
<dkubb>
dbussink: ^^^
<dbussink>
dkubb: they end up with different sources yes
<dkubb>
dbussink: we were discussing if ruby impls represent %r(/) and /\// differently
<dkubb>
like maybe they have different ast nodes
<dbussink>
dkubb: if you have rbx installed, you can do rbx compile -A -e '/\//'
<dbussink>
and compare that to the other
<dbussink>
-A prints an ast
<dbussink>
-B bytecode
<dkubb>
interesting
<dkubb>
dbussink: so when the regexp is parsed, does there need to be conditional logic or normalization somewhere so that they are treated the same.. since I assume they compile down to the same thing under the hood?
<dbussink>
well, looks like they are actually different under the hood
<dbussink>
but i don't really know the exact details
<dkubb>
the original assumption mbj and I had was that when they are parsed by whitequark/parser, they would be represented the same in the ast .. we thought looking at how ruby impls do it that we might understand why/if they are different
<dkubb>
mbj_: the ruby docs for Regexp#to_s say "This string can be fed back in to Regexp::new to a regular expression with the same semantics as the original."
<dkubb>
mbj_: so maybe you can just normalize it yourself if you wanted
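A minimal sketch of the round trip the Regexp#to_s docs describe, using only core Ruby. The helper name `roundtrip` is made up for illustration, and the exact strings printed may differ between Ruby versions; the point is only that the rebuilt regexp matches the same input as the original, whichever delimiter the literal was written with.

```ruby
# Rebuild a regexp from its Regexp#to_s form (hypothetical helper name).
def roundtrip(regexp)
  Regexp.new(regexp.to_s)
end

percent_form = %r(/) # delimiter is not a slash, so the body needs no escape
slash_form   = /\//  # slash delimiter forces the backslash in the literal

[percent_form, slash_form].each do |regexp|
  copy = roundtrip(regexp)
  # Print both forms and whether the copy still matches a string with a slash.
  puts "#{regexp.inspect} -> #{copy.inspect}, matches 'a/b': #{copy.match?('a/b')}"
end
```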
<dkubb>
mbj_: i was thinking about doing a bit of the parsing logic for sql.rb. I was wondering if you had any projects to point me towards or if you're interested in helping me get started? I think for me the biggest problem is that there's a lack of docs and example projects using ragel (aside from whitequark/parser) and I'm not yet sure if there's a better approach to start with or if I have to find it via trial and error
mbj_ is now known as mbj
postmodern has joined #rom-rb
<mbj>
dkubb: From what I know, ruby is a language you have to lex and parse at the same time.
<mbj>
dkubb: Because of the lvar / method call ambiguity at lexer level.
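A small illustration of that ambiguity, in plain Ruby. The method name `answer` is invented for the example. The same bare identifier is a method call before the assignment and a local variable read after it, so a lexer cannot classify the token without knowing what the parser has already seen.

```ruby
def answer
  :method_call
end

before = answer           # no local named `answer` yet: a method call
answer = :local_variable  # from here on, `answer` is a local variable
after  = answer           # same token, now a local variable read

puts before.inspect
puts after.inspect
```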
<mbj>
dkubb: And ragel is not able to handle LR(0) grammars. For that reason whitequark used racc.
<mbj>
dkubb: I think you should find a .y file from a well known implementation (postgres) and implement the C parts in ruby.
<mbj>
dkubb: I'd pick parts of the files.
<postmodern>
are you building your own Ruby implementation now?
<mbj>
postmodern: I'd not implement my own ruby.
<mbj>
postmodern: I'd implement a subset ;)
<mbj>
postmodern: Reason behind this discussion, unparsing %r(/) vs /\//
<mbj>
postmodern: for unparser (mutant)
<postmodern>
ah ha
<postmodern>
supposedly you can get access to the internal regexp tree
<postmodern>
i was looking into it to write a regexp fuzzer
<mbj>
postmodern: I'll do one for mutant
<mbj>
postmodern: I'm on phone bbl
<postmodern>
of course things like /a*/ are hard to fuzz :)
<mbj>
postmodern: The thing is, unparser tries to achieve the following invariant:
<mbj>
So it is totally okay to emit a regexp literal in original source like %r() as //
<mbj>
BUT the regexp contents should be the same.
<postmodern>
ah ha
<mbj>
%r(/) gets parsed as (regexp, (str "/"))
<postmodern>
yeah by regexp fuzzer, i was referring to taking a regexp and generating all possible inputs
<mbj>
/\// gets parsed as (regexp, (str "\\/"))
<postmodern>
whereas mutant wants to mutate the regexp itself
<mbj>
postmodern: Ahh I thought the other way round.
<mbj>
postmodern: got it.
<mbj>
postmodern: Yeah I think mutant has the easier problem ;)
<mbj>
postmodern: So a generic unparser that does NOT know the original delimiter has a problem.
<mbj>
postmodern: Because literals like /\// already contain the quoted delimiter in str body, and literals like %r(/) do not.
<mbj>
postmodern: I think I need to make unparser source map aware for this node, which I dislike.
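A rough sketch of the "quote unquoted /" workaround mentioned earlier, assuming the emitter always writes the literal with slash delimiters. `escape_slashes` is a hypothetical helper, not part of unparser; it walks the body once, leaves already-escaped characters alone, and adds a backslash only in front of a bare slash.

```ruby
# Escape every slash in a regexp body that is not already escaped.
# Hypothetical helper for illustration only.
def escape_slashes(body)
  out = +''
  escaped = false
  body.each_char do |char|
    if escaped
      out << char        # character after a backslash: copy untouched
      escaped = false
    elsif char == '\\'
      out << char        # start of an escape sequence
      escaped = true
    elsif char == '/'
      out << '\\/'       # bare delimiter: add the missing backslash
    else
      out << char
    end
  end
  out
end

puts escape_slashes('/')    # body as seen for %r(/)
puts escape_slashes('\\/')  # body as seen for /\//
```

With this normalization both bodies come out the same, so the emitter would not need the original delimiter for this particular case.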
lgierth has joined #rom-rb
CraigBuchek has quit [Quit: Leaving.]
lfox has joined #rom-rb
CraigBuchek has joined #rom-rb
breakingthings has quit []
<dkubb>
postmodern: I thought it might be possible to use https://github.com/ammar/regexp_parser for parsing regexps, and from there the same kind of structure as mutant could be used to mutate each kind of node
<dkubb>
postmodern: I'd guess when mutant begins to mutate regexps most code will have dozens or uncovered mutations for each regexp.. people are pretty bad at testing regexps against possible inputs
<dkubb>
*dozens of
<postmodern>
^ $ vs. \A \z
<dkubb>
oh yeah
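A quick illustration of the ^ $ vs. \A \z difference in core Ruby. The input string is made up; it shows why mutating one anchor style into the other is a meaningful mutation, since only multi-line input tells them apart.

```ruby
input = "bar\nfoo"

line_anchored   = /^foo$/    # anchors at line boundaries
string_anchored = /\Afoo\z/  # anchors at string boundaries

puts line_anchored.match?(input)    # true: "foo" is a whole line
puts string_anchored.match?(input)  # false: the string starts with "bar"
```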
<dkubb>
what I would typically do is
<postmodern>
but doesn't that mean developers will have to write tests for every malformed input?
<dkubb>
mutate from the weaker to the stronger nodes
<postmodern>
just to ensure the regexp rejects it?
<mbj>
postmodern: You need to have a counter example for each mutation.
<dkubb>
I dunno, I think they might have to write a test for each class of valid input
<mbj>
postmodern: I think the set of counter examples is finite and probably 2 times the number of nodes in the regexp AST.
<postmodern>
ah like [a-z] -> [^a-z]
<mbj>
exactly
<mbj>
Or a|b => b
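A minimal sketch of what a counter example means here, assuming hand-written mutants rather than anything mutant generates today. `counter_examples` is a hypothetical helper: it returns the inputs on which the original and the mutated regexp disagree, which are exactly the inputs a test would need to cover to kill that mutation.

```ruby
# Inputs where the original and the mutant give different answers.
def counter_examples(original, mutant, inputs)
  inputs.select { |input| original.match?(input) != mutant.match?(input) }
end

inputs = %w[a b 1]

# a|b => b
p counter_examples(/\A(?:a|b)\z/, /\Ab\z/, inputs)     # ["a"]

# [a-z] -> [^a-z]
p counter_examples(/\A[a-z]\z/, /\A[^a-z]\z/, inputs)  # ["a", "b", "1"]
```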
<postmodern>
mbj, hmm what about really complex regexps, like email validation?
<postmodern>
mbj, that might generate a ton of counter examples
<mbj>
postmodern: tbh dunno.
<mbj>
postmodern: Next problem: mutant currently only "sees" nodes inside def and defs nodes.
<mbj>
my regexps these days are within class / module bodies.
<mbj>
postmodern: I expect you'll not try to mutation cover email regexps via testing against a corpus of valid email addresses.
<mbj>
And also I think the libraries shipping these regexps would be mutation covered and a typical user would not reinvent such "complex" regexps.
<mbj>
And if he does, he might think: Hard to mutation cover, maybe I should refer to a lib before failing myself.
<mbj>
Which is a common side effect of mutation covering your code :D
<postmodern>
mbj, good points
<mbj>
postmodern: With the next version of mutant I expect to have a very fine grained configuration, I don't expect most users will go for "full rom style coverage".
<mbj>
postmodern: But it's very helpful to explore the quality of your software via just using the mutations as a metric.
<mbj>
postmodern: I'll need to add tons of documentation and invent some wording, for example we have a problem with explicit and implicit mutation coverage, which is IMHO not in the current mutation coverage literature.
<mbj>
postmodern: The configuration for not removing explicit returns for implicit ones is already in a branch. You'll like it. And I hope ronin could be mutation covered :D
<mbj>
dkubb: In unparser I have the concept of "terminated" nodes.
<mbj>
dkubb: terminated nodes are guaranteed to get emitted as composable expressions you could use "everywhere".
<mbj>
dkubb: For example a fixnum literal is "terminated".
<dkubb>
mbj: what's a counter-example?
<mbj>
dkubb: range
<mbj>
dkubb: 1..2
<mbj>
dkubb: if you have a range as receiver you need parentheses
<mbj>
dkubb: an ast like (send (irange (int 1) (int 2)) :foo) must be emitted as (1..2).foo
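A toy sketch of the "terminated" idea, not unparser's actual emitter. Nodes are plain nested arrays, and the set of terminated node types (`int` and `send`) is an assumption made for this example. The emitter wraps a receiver in parentheses only when that receiver is not terminated.

```ruby
# Node types assumed safe to use as a receiver without parentheses.
def terminated?(node)
  %i[int send].include?(node.first)
end

# Emit source for a tiny AST made of [type, *children] arrays.
def emit(node)
  type, *children = node
  case type
  when :int
    children.first.to_s
  when :irange
    children.map { |child| emit(child) }.join('..')
  when :send
    receiver, name = children
    source = emit(receiver)
    source = "(#{source})" unless terminated?(receiver)
    "#{source}.#{name}"
  else
    raise ArgumentError, "unsupported node type: #{type.inspect}"
  end
end

puts emit([:send, [:irange, [:int, 1], [:int, 2]], :foo])  # (1..2).foo
puts emit([:send, [:int, 1], :foo])                         # 1.foo
```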
<dkubb>
mbj: in sql I have it so the emitter can parenthesize based on what the parent node is
<mbj>
dkubb: The emitters all support #terminated?
<dkubb>
mbj: so effectively all my sql statements are terminated now
<mbj>
dkubb: Yeah, I used the same strategy for unparser for a while
<mbj>
dkubb: But it could manifest in "double parentheses"
<dkubb>
how do you mean?
<mbj>
dkubb: Because sometimes termination does not depend only on node type
<dkubb>
I have it so the node is responsible for parenthesizing itself
<dkubb>
or rather the node's emitter
<mbj>
I think this will work for SQL
<dkubb>
yeah, it's much simpler than ruby
<mbj>
For ruby it led to unneeded terminals in the output