carlosgaldino has quit [Quit: My MacBook Pro has gone to sleep. ZZZzzz…]
dimday has joined #rubinius
diegovio1 has joined #rubinius
diegoviola has quit [Ping timeout: 250 seconds]
<|jemc|>
hm, looks like my best shot at more performance from this parser generator is to implement the Test* set of instructions from the LPEG machine paper
diegovio1 is now known as diegoviola
houhoulis has joined #rubinius
|jemc| has quit [Ping timeout: 260 seconds]
diegovio1 has joined #rubinius
diegovio1 is now known as diegovio1a
johnmuhl has quit [Quit: Connection closed for inactivity]
amclain has joined #rubinius
arrubin has joined #rubinius
kagaro has quit [Ping timeout: 252 seconds]
josh-k_ has quit [Remote host closed the connection]
|jemc| has joined #rubinius
tenderlove has quit [Quit: Leaving...]
diegoviola has quit [Ping timeout: 255 seconds]
zacts has joined #rubinius
diegovio1a is now known as diegoviola
diegovio1 has joined #rubinius
arrubin has quit []
Bwild has joined #rubinius
diegoviola has quit [Quit: WeeChat 1.0.1]
havenwood has joined #rubinius
carlosgaldino has joined #rubinius
diegovio1 is now known as diegoviola
amclain_ has joined #rubinius
mrb_bk has quit [Ping timeout: 258 seconds]
mrb_bk has joined #rubinius
amclain has quit [Ping timeout: 258 seconds]
amclain_ is now known as amclain
meh` has quit [Ping timeout: 264 seconds]
diegoviola has quit [Quit: WeeChat 1.0.1]
meh` has joined #rubinius
noop has joined #rubinius
dzhulk has joined #rubinius
noop has quit [Ping timeout: 256 seconds]
diegoviola has joined #rubinius
havenwood has quit [Remote host closed the connection]
<yxhuvud>
oh well, I guess it is my own fault for choosing to implement something that has its parsing power screwed up to 11
amsi has quit [Ping timeout: 258 seconds]
amsi has joined #rubinius
<yorickpeterse>
darn parsing theory using epsilon for nil/null values
<yorickpeterse>
that character is impossible to type
<yxhuvud>
indeed.
diegovio1 has joined #rubinius
diegoviola is now known as Guest95927
diegovio1 is now known as diegoviola
Guest95927 has quit [Ping timeout: 252 seconds]
<|jemc|>
wow, this 20s parse run is taking for_ever_ with -Xint
Locke23rus has joined #rubinius
* |jemc|
goes to get some lunch
Locke23rus has left #rubinius [#rubinius]
<|jemc|>
oh it finished just in time (12 minutes, 45 seconds)
<|jemc|>
speaking of just in time
<|jemc|>
brixen: looks like it may be a JIT issue
<|jemc|>
wasn't able to show the problem in a minimal dynamic method, but if I do -Xint my parser "works"
<|jemc|>
I'll try a minimal dynamic_method in an infinite loop
<|jemc|>
with and without jit
<brixen>
|jemc|: sounds good
<brixen>
I'm guessing it's a jit issue unrelated to goto_if_nil
goyox86 has quit [Read error: Connection reset by peer]
<brixen>
|jemc|: also would be curious to see a slow interpreter case that is massively sped up by the JIT
<yorickpeterse>
I have one app where I can see the JIT doing its thing in my graphs (at least I suspect that is the case)
<|jemc|>
could be, although I injected a p(the_nil_value) bytecode right before the goto_if_nil check and it shows nil
<yorickpeterse>
although I'd have to overlay JIT metrics on top of the timings graph to be sure
goyox86 has joined #rubinius
<|jemc|>
re the app: pegleromyces is public code, you just need to install myco (from source) first
<|jemc|>
when I'm done investigating this, I'll push up the latest few changes
<brixen>
|jemc|: ok, a simple script to repro would be o/~ awesome o/~
<brixen>
I'm good at cloning repos, etc, but that little bit of help goes a long way :)
<yorickpeterse>
|jemc|: that name is impossible to pronounce
<|jemc|>
to repro the massive jit speedup, you mean?
<brixen>
|jemc|: you're welcome to take over pegarus if you want, and it has a twitter account, too
<brixen>
|jemc|: yeah, to repro whatever
<|jemc|>
brixen: I may take you up on that after pegleromyces is fast :)
<|jemc|>
for those who want to reap the benefits of this work without having to install my toy language
josh-k_ has joined #rubinius
josh-k has quit [Ping timeout: 250 seconds]
<|jemc|>
yorickpeterse: regarding the name - it's just a working title for the project right now - it'll probably end up getting integrated into myco at some point anyway
<|jemc|>
(the BytecodeHelpers module at the top is just for inspecting values and debug printing)
<|jemc|>
I'm running the 2.4.1 release
<|jemc|>
I can file as an issue later tonight if you like
<brixen>
this looks like nil every time I run it
<brixen>
hm
<|jemc|>
so your loop runs infinitely without raising?
<brixen>
no, I mean your repro looks good
<|jemc|>
ah
<yorickpeterse>
yxhuvud: ok one more, how do parsers best handle epsilons? That is, say you have the rule "values -> pair more_values; more_values -> T_COMMA values | _;" with "_" being epsilon
<yorickpeterse>
yxhuvud: right now I basically push epsilon onto the stack, and for all undefined tokens my goto table jumps to an epsilon production
<yorickpeterse>
which is basically a noop
<yorickpeterse>
(if that makes any sense)
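A minimal sketch of the catch-all epsilon idea for the grammar quoted above, written as a table-driven LL(1) recognizer. Everything except T_COMMA and the rule names is an assumption: `pair` is reduced to a single hypothetical T_PAIR token, and unlike the approach described in the chat, this version pushes nothing for epsilon instead of pushing an epsilon marker.

```ruby
# Hypothetical token names; only T_COMMA comes from the discussion.
EPSILON = [].freeze # an epsilon production has an empty right-hand side

# Any lookahead other than T_COMMA falls back to the epsilon production.
more_values = Hash.new(EPSILON)
more_values[:T_COMMA] = [:T_COMMA, :values]

# TABLE[nonterminal][lookahead] => right-hand side to push
TABLE = {
  values:      { T_PAIR: [:pair, :more_values] },
  pair:        { T_PAIR: [:T_PAIR] },
  more_values: more_values
}.freeze

def parse(tokens)
  input = tokens + [:EOF]
  stack = [:EOF, :values]
  pos   = 0

  until stack.empty?
    top       = stack.pop
    lookahead = input[pos]

    if TABLE.key?(top)
      rhs = TABLE[top][lookahead]
      return false if rhs.nil?     # no production for this lookahead
      stack.concat(rhs.reverse)    # epsilon pushes nothing, i.e. a noop
    elsif top == lookahead
      pos += 1                     # terminal matched, consume it
    else
      return false
    end
  end

  pos == input.length
end
```

With the catch-all default, a bad token after a pair is only rejected one step later, when the next stack symbol fails to match. A stricter table would list the epsilon entry only for tokens in the FOLLOW set of more_values.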
<brixen>
|jemc|: I had a very simple test for this, rechecking that now
<yorickpeterse>
yxhuvud: I think this is a hack, but I can't really think of another way
<|jemc|>
yorickpeterse: pushing an epsilon token onto the end of the stack to mark EOF makes sense to me
<yorickpeterse>
although the parser _seems_ to work correctly
<yorickpeterse>
|jemc|: it's not EOF though
<yorickpeterse>
more like end-of-current-production
<|jemc|>
misunderstood then, nvm
<yorickpeterse>
getting pretty darn close to figuring this out :D
<brixen>
|jemc|: my test works and your repro shows an issue, fun :)
<yorickpeterse>
then it's off to write a parser toolkit, a grammar, and a parser for that (and then bootstrap the whole pig)
<yorickpeterse>
then it's writing a bunch of C/Java to make this webscale
<yorickpeterse>
:<
<|jemc|>
brixen: heh, fun indeed
<|jemc|>
brixen: wish I knew enough about the JIT to be able to help out - it's on my list of things to eventually understand
<|jemc|>
brixen: should we add some of these JIT regression tests to the rubinius repo somewhere?
<brixen>
|jemc|: it's on my list to make it easier for you to understand :)
<|jemc|>
great
<brixen>
I'm working on a test framework for the JIT
<|jemc|>
cool
<|jemc|>
I imagine those two might come hand-in-hand
<yorickpeterse>
By this point you'd be worried if brixen _wasn't_ working on something
<brixen>
right, cus I'd be dead :)
<brixen>
|jemc|: hm, your repro is interesting
<brixen>
my test is testing a specific value
<brixen>
either a straight up g.push :nil, or the argument passed in
<brixen>
|jemc|: I'll gist you my test
<|jemc|>
I was originally trying to isolate it like that, but ended up just going with a near-cut-and-paste of one of my parsing instructions
<|jemc|>
it may be related to the primitive dispatches?
<|jemc|>
(the primitives behind find_string and chr_at)
<|jemc|>
I chose those methods because they were as close to the Rubinius.primitives as possible
<bennyklotz>
yorickpeterse: congrats on your hand-written LL parser, did you find any resources (other than Wikipedia's cryptic formulas) that helped?
<yorickpeterse>
bennyklotz: about 2 dozen papers, Wikipedia, random websites, IRC and lots of note taking :P
<yorickpeterse>
But I can say this: all academic resources I found fucking suck _unless_ you already are familiar with the algorithms (funny how that goes)
<|jemc|>
brixen: in case you hadn't tried it yet: if you s/goto_if_nil/goto_if_false, the problem does not reproduce
<yorickpeterse>
I still have to figure out how to run actions whenever a production is complete
<bennyklotz>
yorickpeterse: okay thx, I think I'll just google for some papers and read them :)
<yorickpeterse>
Once I've confirmed I didn't implement some weird thing I'll see if I can blag about this
<bennyklotz>
okay cool :)
<yorickpeterse>
bennyklotz: the Wikipedia page is probably easiest to read, the papers are a pain
<bennyklotz>
kk thx
<brixen>
|jemc|: yeah, I already tested that :)
<yxhuvud>
yorickpeterse: I cheat by transforming the grammar to avoid epsilons. Note that I haven't gotten the actual parse tree building to work yet, so actual handling in the generating stage is not known to me. You'd better ask someone who has finished a parser for that :)
<yorickpeterse>
yxhuvud: hm, thanks
<brixen>
|jemc|: heh, gets even more strange
<yxhuvud>
or well, I *have* to rewrite productions that can lead to epsilons, but I have some epsilons in the state diagram. Those are handled by adding the state the transition leads to.
<yorickpeterse>
yxhuvud: in this case I'm using epsilon as sort of a catch-all to break out of a production
<yxhuvud>
I see.
<yorickpeterse>
which I think can be done in a better way
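A rough sketch of the "transform the grammar to avoid epsilons" route mentioned above, using the textbook nullable-symbol expansion. This is not yxhuvud's actual code; the grammar layout (a Hash of nonterminal to alternatives) and the T_PAIR token are assumptions for illustration. It does not handle a nullable start symbol or nonterminals that derive only epsilon.

```ruby
# Grammar layout (assumed): { nonterminal => [[symbols...], ...] }
# An empty array is an epsilon production.

# Collect every nonterminal that can derive the empty string.
def nullable_set(grammar)
  nullable = []
  loop do
    added = grammar.select do |name, alternatives|
      !nullable.include?(name) &&
        alternatives.any? { |rhs| rhs.all? { |sym| nullable.include?(sym) } }
    end.keys
    break if added.empty?
    nullable.concat(added)
  end
  nullable
end

# Rewrite the grammar so no production has an empty right-hand side.
def eliminate_epsilons(grammar)
  nullable = nullable_set(grammar)

  grammar.each_with_object({}) do |(name, alternatives), result|
    expanded = alternatives.flat_map do |rhs|
      # For each nullable symbol, keep one variant with it and one without.
      rhs.reduce([[]]) do |variants, sym|
        kept = variants.map { |v| v + [sym] }
        nullable.include?(sym) ? kept + variants : kept
      end
    end
    result[name] = expanded.reject(&:empty?).uniq
  end
end

GRAMMAR = {
  values:      [[:pair, :more_values]],
  more_values: [[:T_COMMA, :values], []],
  pair:        [[:T_PAIR]]
}.freeze
```

For the rule from earlier in the log, `eliminate_epsilons(GRAMMAR)` gives values two alternatives, `pair more_values` and `pair`. The epsilon is gone, but the two alternatives now share a prefix, so the result is no longer LL(1) without further left factoring.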
<brixen>
|jemc|: fixed
<brixen>
|jemc|: silly me, I had planned to redo this with the fix and got distracted
GitHub129 has joined #rubinius
<GitHub129>
[rubinius] brixen pushed 1 new commit to master: http://git.io/3RmT5Q
GitHub129 has left #rubinius [#rubinius]
<GitHub129>
rubinius/master 033ec2d Brian Shirai: Fixed JIT for goto_if_nil, goto_if_not_nil.
<|jemc|>
brixen: thanks
* |jemc|
looks at the diff
<brixen>
it's pretty simple
<brixen>
the FALSE_MASK thing was the issue
<brixen>
well, basically, masking to false and then comparing to nil
<brixen>
that was silly
<brixen>
if we want to use our millions of nils, we'd mask to primordial nil and compare
<brixen>
but no reason to do that at the moment
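A rough illustration of the "mask to false, then compare to nil" mistake described above, modelled with plain integers. The tag values and helper names are assumptions chosen so that nil and false share their low bits; they are not copied from the Rubinius source or from the commit.

```ruby
# Illustrative tag values (assumptions, not taken from the VM source).
FALSE_TAG  = 0x0a
NIL_TAG    = 0x1a
FALSE_MASK = 0x0f # keeps only the bits that nil and false have in common

# Truthiness check: masking makes nil look like false, which is intended.
def falsy?(word)
  (word & FALSE_MASK) == FALSE_TAG
end

# The buggy shape: after masking, the value can never equal NIL_TAG,
# so a goto_if_nil built on this test would never branch.
def buggy_nil?(word)
  (word & FALSE_MASK) == NIL_TAG
end

# The simple fix: compare the unmasked word against nil directly.
def fixed_nil?(word)
  word == NIL_TAG
end
```

Under these assumptions the truthiness test is unaffected, which would be consistent with goto_if_false not reproducing the problem.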
<brixen>
merging master to 1.8.7 branch and working through the vm/llvm files means I'm typing git and jit a bunch
<brixen>
and since whatever I type my brain pronounces, even when I tell it to stfu, it gets pretty funny
<brixen>
is git pronounced git or jit
<brixen>
and then typing jit
<brixen>
could be a college humor short about how to pronounce gif
<|jemc|>
heh
goyox86 has quit [Max SendQ exceeded]
<|jemc|>
that's why I use the long name for those instructions
<|jemc|>
so that no one thinks I'm using a version control instruction
<|jemc|>
or a peanut butter instruction that choosy moms would choose
goyox86 has joined #rubinius
<yorickpeterse>
hmpf, I'd love for Ruby to basically have define_method but without the performance overhead
<|jemc|>
yorickpeterse: where does the overhead come from (in rbx)? just the resetting of the method cache?
<yorickpeterse>
actually perf is roughly the same in rbx when comparing def/define_method/eval
<yorickpeterse>
hm, I think my benchmark is fucked
<yorickpeterse>
MRI reports eval being faster than a regular def
<yorickpeterse>
lol
<yorickpeterse>
Ah ok better, define_method is ~1.25x slower compared to a def
<yorickpeterse>
but in JRuby define_method is 2.45x slower compared to a def
<yorickpeterse>
^ why I stopped using define_method in Oga and switched to eval() in a few places
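A small sketch of the kind of def / define_method / eval comparison being discussed, using only the stdlib Benchmark module. The class, method names and iteration count are made up for illustration; this is not the benchmark that was actually run, and a single short timing like this is noisy, especially before a JIT has warmed up.

```ruby
require "benchmark"

# Hypothetical class with the same method defined three ways.
class Greeter
  def via_def
    :hello
  end

  define_method(:via_define_method) { :hello }

  class_eval <<-RUBY, __FILE__, __LINE__ + 1
    def via_eval
      :hello
    end
  RUBY
end

# Time each variant with direct calls so dispatch overhead is comparable.
def measure(iterations)
  greeter = Greeter.new
  {
    "def"           => Benchmark.realtime { iterations.times { greeter.via_def } },
    "define_method" => Benchmark.realtime { iterations.times { greeter.via_define_method } },
    "eval"          => Benchmark.realtime { iterations.times { greeter.via_eval } }
  }
end

measure(100_000).each do |kind, seconds|
  puts format("%-14s %.4fs", kind, seconds)
end
```

Running the same file under MRI, rbx and JRuby gives a rough feel for the ratios, though a proper comparison would want warm-up runs and repeated samples.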
<|jemc|>
yorickpeterse: 1.25x slower in rbx?
<yorickpeterse>
No, MRI
<|jemc|>
ah
<|jemc|>
yeah, if it was slower in rbx (after having some time to "settle in") I would be curious to know why
<yorickpeterse>
probably not very efficient but it's a nice start
<yorickpeterse>
that would return `{"name"=>"Yorick", "age"=>22, "location"=>"Netherlands", "anger_level"=>9000}`
<|jemc|>
well, I'm glad it's not *over 9000*
<yorickpeterse>
semantics
<yorickpeterse>
Right, so now that my parser actually does something it's 1.6 times slower than Racc
<chrisseaton>
yorickpeterse: I like that you've labelled the rows and columns of your table - the one thing that most textbooks miss out and that makes it hard for students to understand in my experience
<yorickpeterse>
chrisseaton: Yeah, it's quite confusing to remember what is what otherwise
<|jemc|>
yorickpeterse: would it help your performance to turn your _rule_X methods into a big case statement?
<yorickpeterse>
|jemc|: could be, although on at least rbx/jruby I'd expect them to be inlined (at least the short methods)
<yorickpeterse>
lemme see what happens if I use a case
<yorickpeterse>
wait first lemme see what rbx does
<yorickpeterse>
Hm interesting
<yorickpeterse>
so on MRI my LL parser is ~1.5x slower
<yorickpeterse>
on Rbx it performs more or less the same as the Racc code, not sure if that's good or bad
<yorickpeterse>
(slower than MRI though)
<yorickpeterse>
and on JRuby my LL parser is 2.5x slower compared to Racc, but JRuby in general is faster than MRI
<chrisseaton>
|jemc|: big switch statements are the enemies of many JITs - V8 and HotSpot both don't like them - not sure about LLVM
<yorickpeterse>
either way, tomorrow I'll be looking into how painful it is to do part of this in Java/C++ (so I can bypass Ruby data structures) and then hook that up to Ruby
<|jemc|>
chrisseaton: can you elaborate a bit for my edification? is it gotos in general or something special about switch/case?
goyox86 has quit [Read error: Connection reset by peer]
<chrisseaton>
|jemc|: it's really the 'big' part - JITs often don't like big methods - and I think in particular some JITs actively stop compiling big switch statements as they consider them to be likely to not be well optimised (not saying it's a good heuristic - just that's the way it often is)
<chrisseaton>
|jemc|: for example, if your JS function is too many characters (literally the number of characters) V8 will not inline it - just based on that alone - regardless of how small it may compile to
<chrisseaton>
|jemc|: actually I can give you a very concrete example - switch statements desugar into much more than a series of 'if's - often involving lookup arrays and things. In Truffle, we were not able to determine that one of these lookup arrays was constant, and so a switch was not compiling away. So there was a real bug from a switch causing a JIT to not work as well as normal dispatch would.
<yorickpeterse>
|jemc|: no noticeable perf difference for a case vs method calls
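For reference, a sketch of the two dispatch shapes being compared: one small method per rule versus a single case statement. The class names, rule bodies and the `reduce` entry point are hypothetical; only the `_rule_X` naming comes from the discussion, and how the real parser dispatches is not shown in the log.

```ruby
# One method per rule, looked up by rule number.
class MethodDispatch
  RULES = [:_rule_0, :_rule_1, :_rule_2].freeze

  def reduce(rule, values)
    send(RULES.fetch(rule), values)
  end

  private

  def _rule_0(values)
    values.first
  end

  def _rule_1(values)
    values.last
  end

  def _rule_2(values)
    values.length
  end
end

# The same rules folded into one case statement.
class CaseDispatch
  def reduce(rule, values)
    case rule
    when 0 then values.first
    when 1 then values.last
    when 2 then values.length
    else raise ArgumentError, "unknown rule #{rule}"
    end
  end
end
```

Both produce the same results, so the choice comes down to how a given runtime handles many small methods versus one large method body.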