nirvdrum has quit [Ping timeout: 248 seconds]
ur5us has quit [Ping timeout: 248 seconds]
ur5us has joined #jruby
nirvdrum has joined #jruby
lucasb has quit [Quit: Connection closed for inactivity]
nirvdrum has quit [Ping timeout: 268 seconds]
ur5us has quit [Ping timeout: 265 seconds]
nirvdrum has joined #jruby
ur5us has joined #jruby
nirvdrum has quit [Ping timeout: 248 seconds]
rusk has joined #jruby
ur5us has quit [Ping timeout: 248 seconds]
nirvdrum has joined #jruby
dopplerg- has joined #jruby
havenwood_ has joined #jruby
adam12_ has joined #jruby
_Caerus has joined #jruby
Caerus has quit [*.net *.split]
dopplergange has quit [*.net *.split]
havenwood has quit [*.net *.split]
adam12 has quit [*.net *.split]
havenwood_ is now known as havenwood
havenwood has joined #jruby
havenwood has quit [Changing host]
KeyJoo has joined #jruby
ur5us has joined #jruby
ur5us has quit [Ping timeout: 248 seconds]
shellac has joined #jruby
shellac has quit [Client Quit]
lucasb has joined #jruby
adam12_ is now known as adam12
adam12 is now known as Guest95748
Guest95748 is now known as adam12
xardion has quit [Remote host closed the connection]
xardion has joined #jruby
rusk has quit [Remote host closed the connection]
KeyJoo has quit [Quit: KeyJoo]
<headius[m]> subbu: hey I am working on some optimizations for blocks and have a question about flags and compiuler passes
<headius[m]> there's an instruction for blocks to load the frame's closure, so it can be yielded to... LoadFrameClosureInstr
<headius[m]> this is added conservatively to all closures by the builder, but if we never yield it becomes dead code and goes away
<headius[m]> however its flag changes stick around... it forces a frame when no frame load is actually needed
<headius[m]> so all blocks are pushing and popping a frame and twiddling its visibility even though most don't need it
<headius[m]> the lifecycle of flags is confusing...once we set them they stay set, but various paths cause them to be calculated very early
<headius[m]> seems there needs to be a reset at some point, or perhaps every point, during the optimization passes
<headius[m]> naively resetting them in after DCE or before ACP seems to damage non-local break though
travis-ci has joined #jruby
<travis-ci> jruby/jruby (reduce_block_overhead:2b31906 by Charles Oliver Nutter): The build passed. https://travis-ci.org/jruby/jruby/builds/626326525 [177 min 23 sec]
travis-ci has left #jruby [#jruby]
<subbu> oh man ... that seems like a lifetime ago ... i am in india right now .. let me try to page in some of it tomorrow and see what i can find unless enebo[m] knows the answer readily.
<enebo[m]> The naive reset not working answer is that the instrs determine flags but also the passes themselves will set flags. So it is a combination of things. I think perhaps ACP itself should determine if we can eliminate it at that point.
<enebo[m]> I did no work on ACP itself but seemingly we just need to know which properties satisfy when a frame is not needed and then not add it during ACP OR a pass later which examines the state of things and removes conservative assumptions.
<headius[m]> So you are saying that perhaps make ACP walk instructions again?
<enebo[m]> headius: possibly...if ACP removes something which will change it so we know we don't need to push a frame
<enebo[m]> otherwise we can not add it in the first place if we know all we need to
<headius[m]> Currently ACP just asks for flags but those flags reflect instructions that have been removed
<headius[m]> The load of the frame closure is removed properly, but it has already tainted the flags
<headius[m]> Clearly there is a disconnect between the life cycle of the flags and the life cycle of the instructions
<headius[m]> There may be many other things not optimizing properly because these early flags leak into the optimized IR later on
<enebo[m]> perhaps whatever has removed whatever was holding it back should be removing the flag then...but flags were added more conservatively initially
<headius[m]> I thought about trying that, but if two instructions trigger a flag, removing one of those instructions should not remove the flag
<headius[m]> So it needs to be a full recalculate after instructions get modified, at least for flags that are driven by the instructions present
<enebo[m]> yeah so the property would be not having both and would require a full scan
<enebo[m]> but with that said if you remove one of them you can then fire off a full scan afterwards or none if nothing was removed
<headius[m]> The good news is that if we fix this, that unnecessary frame push and pop will just go away
<enebo[m]> the utility of what flags represent is valuable but maintaining them is tough. Being able to calculate them on demand would be a tremendous amount of time
<headius[m]> With that gone, only a few tweaks are required to make leaf closures have almost zero dispatch cost
<enebo[m]> So let me ask a question...is it ACP which potentially removes some instrs or is it conceivably others as well?
<headius[m]> ACP only adds instructions
<enebo[m]> ok but isn't ACP the pass that adds the frame push?
<headius[m]> It examines flags to see if a frame or scope are needed, adds those instructions, and then marks the scope as having call protocol
<enebo[m]> so possibly a good "today" solution is a prescan in ACP to see if the instrs which would force a frame or not and update flags. Later determine whether that can be pushed further back into other pass of building itself
<headius[m]> In this case, the load of the frame closure has set that the frame is required, so ACP ads it
<headius[m]> Even though that load was removed
<enebo[m]> I believe this is mostly fallout from just being conservative with the intention to shave this away but I guess we need to know what allows this
<enebo[m]> what removes that load?
<headius[m]> I could make ACP rescan instructions, but if we have logic that is attempting to force a frame or scope based on information other than the instructions it won't see that
<headius[m]> I believe dce
<enebo[m]> ah DCE should probably have a post scan
<headius[m]> that's what I tried...reset flags and rescan, but it loses something needed for break
<enebo[m]> it would be DCE responsibility to know it removed something which would change the scopes flags
<enebo[m]> oh don't rescan
<enebo[m]> just scan and remove flag(s) which were results of removed instrs
<headius[m]> it works otherwise and this unnecessary frame goes away
<headius[m]> yeah how?
<enebo[m]> which instrs exist which require this flag?
<headius[m]> this case is LoadFrameClosure
<enebo[m]> If that is only a single instr which reflects this then that would be enough to remove the flag
<enebo[m]> and it can even be done at the time the instr is marked dead
<enebo[m]> if it is a set of instrs then I guess you need to run over all passes if you see any in the set
<headius[m]> it's NEEDS_BLOCK
<headius[m]> also set by CallBase for calls with literal block
<headius[m]> seems only those two set it
<headius[m]> hmm
<headius[m]> actually I don't think CallBase should set it
<headius[m]> ahh I see
<enebo[m]> in LiveVariableNode.markDeadInstrs something in the while can remove it but now I am not sure if I am totally following along
<headius[m]> it sets it for Proc.new and for calls to method names that might access frame
<headius[m]> not for calls that take a block
<enebo[m]> so the flag specifically is NEEDS_BLOCK?
<headius[m]> yeah
<headius[m]> my issue with removing flags is that I don't know if there are other instructions remaining that still need it
<headius[m]> like if the LoadFrameClosure gets removed, but later on there's a Proc.new, the body still needs frame closure
<enebo[m]> yeah that is where I started this conversation...we need to know what needs the flag. For this case instr wise Proc.new call and whatever does load_frame_closure
<headius[m]> ok let me turn this around a bit
<enebo[m]> we also have REQUIER_ALL_FRAME_{ two} subsets
<headius[m]> why doesn't a full reset and rescan work?
ur5us has joined #jruby
<headius[m]> I feel like if there are flags that can't be restored by a full rescan then there's a flaw in flags
<enebo[m]> I believe it is because the passes can observe something not inherent in a full scan BUT that maybe is not true
<enebo[m]> yeah I cannot fully answer that...it could just be a bug
<headius[m]> something clearly removes a flag needed for break in a block
<enebo[m]> we would need to audit all flag sets and make sure it is all just sourced from current set of instrs
<headius[m]> or rather, a rescan does not properly restore that flag
<headius[m]> yeah that's where I'm leaning now
<enebo[m]> I mean by default calculateflags just uses the method on the instr
<headius[m]> if there are flags that are not based on instrs, they should be a separate set
<enebo[m]> and some of the logic for what flags are needed in understanding more globally whether a set of instrs means something
<headius[m]> like for forcing a frame or scope regardless of what instrs need
<headius[m]> but there shouldn't be many of those cases...ideally flags should always be rescannable
<headius[m]> the combination of scope state + instrs should always be enough to restore flags from empty
<enebo[m]> and passes
<headius[m]> I say not passes
<headius[m]> at least for flags that are based on what instructions are present
<headius[m]> passes are a transformation of scope + instrs
<enebo[m]> so we have 2 sets of data and one cannot be recalcd?
<enebo[m]> IRFlags specifically what is true about the scope
<enebo[m]> here is a question so I better get your head space on this...
<enebo[m]> where is REQUIRES_BLOCK?
<headius[m]> it only comes from instructions, so it should accurately be restored in a rescan
<headius[m]> something needed for break does not get restored from a rescan
<headius[m]> I'm saying that is a bug
<enebo[m]> well in the case of REQUIRES_BLOCK it sounds like caclulateFlags on instr can just set REQUIRES_BLOCK if there is the presence of Proc.new OR load_frame_closure so I agree here
<enebo[m]> I would like to know when this does not work
<headius[m]> so you can try this and see the breakage I see
<headius[m]> in DCE after it runs, remove FLAGS_CALCULATED from the scope's flags and rescan
<enebo[m]> I have reset on occasion and seen breakage but I think some of this may be stuff like reuse_parent_dynscope being removed
<headius[m]> then try to run rake tests
<enebo[m]> some of the flags are calculated in a more complicated way than just is there an instr present
<headius[m]> yeah so that's a good example of a flag that should not be resettable this way
<headius[m]> I wouldn't say more complicated, I'd say it's based on things other than instrs
<headius[m]> so I think the issue is that we're overloading flags to be both scope state and information about the instructions
<enebo[m]> ok so I think we both agree there is some set of completely rederivable flags and in fact they are a virtue in that we can simply rescan once
<headius[m]> yes
<enebo[m]> instr.computeScopeFlags() on each one
<enebo[m]> There is a secondary set of state which depends on whole scope analysis which is an outcome of passes
<headius[m]> yeah this makes more sense to me now
<enebo[m]> whole scope analysis which may even include some child scopes
<enebo[m]> we really have two solutions to consider here
<enebo[m]> we could separate these or we could delineate the type and make a rescan subset
<headius[m]> I feel like separating would be better
<headius[m]> the entire purpose of caching the instr flags is so we don't have to rescan them, but we should be able to rebuild that subset entirely from instrs without wiping out flags that are derived elsewhere
<headius[m]> so computeInstrFlags should be a separate thing that can always flush and rescan
<enebo[m]> I do really feel something to make this technical decision more orthogonal would be to make all accesses via isBlockRequired()
<enebo[m]> from a usage standpoint and probably a set otherwise
<headius[m]> I don't follow
<enebo[m]> this would isolate the implementation enough where the only consideration would be memory consumption and pain of keeping the actual enumset as a more messy impl
<enebo[m]> we should not be doing scope.getIRFlags() from passes
<enebo[m]> we are leaking impl
<headius[m]> yeah
<headius[m]> well it's not really doing that but in this case it's needsFrame()
<enebo[m]> we should properly scope.isBlockRequired { get it from whereever }
<headius[m]> needsFrame checks for a number of flags representing frame fields, and REQUIRES_BLOCK is still there even though the instr that added it is gone
<headius[m]> yeah
<enebo[m]> if we properly use predicates and setters then I don't care if we split them or not
<enebo[m]> my only followup on splitting is adding a whole field for the separation
<enebo[m]> leaving combined would make the impl more complicated but we would more or less restrict any uses of how we store it to IRScope and IRFlags itself so it would stay an impl detail
<enebo[m]> headius: so I think we did unwrap some mystery of flags today
<headius[m]> I will look into splitting for now, and see if I can separate flags derived from instructions from flags derived through other state
<enebo[m]> My other ponder is whether this should be happening more than a few places. rederiving every pass is possible but I think it is only important on flags determining whether we add something or remove something
<headius[m]> my thinking is that calculating if a frame is needed would look at both, so we can force frame and scope behavior in one set, and the deer I have it in another
<headius[m]> Derive it in another
<enebo[m]> in passes which are about live variable state for example we probably don't look at any flags so resetting for them won't help
<enebo[m]> ACP is sounds will definitely help
<headius[m]> For the no argument times block, once this is fixed there are two other items
<headius[m]> One is that eval type field, which I think needs to be made final and eval with a block should always make a new one
<headius[m]> So blocks that are not used in an eval don't constantly have to check and set this field
<headius[m]> The other one is argument processing which we talked about in Richmond
<headius[m]> In this case, as a normal no arg block, we should just remove the instruction altogether. It's only being inserted to check arity if the block is a lambda
<headius[m]> I pushed a branch that makes block type final, and if proc needs to change it, it clones the block
<headius[m]> Since the vast majority of blocks will always be normal, very few will ever be cloned. That commit appears to be green
<headius[m]> This gets back to old discussions about splitting up the blocks of different types, so rather than having a field indicating what block type or eval type it is, that is all handled in instructions
<headius[m]> Possibly by lazily recompiling as a lambda or eval block
<enebo[m]> yeah
<enebo[m]> That has been something I have spiked once or twice and ran into issues
<enebo[m]> but hey I got inlining seemingly green perhaps it is not as bad as I think
<enebo[m]> I do think one issue is the idea that one parent scope has to share two child blocks which represent one piece of code
<enebo[m]> although I don't think that was the problem I had
<enebo[m]> Personally I really feel making a new IRClosure for a second form should just work but the parent has to be conservative so both work
<enebo[m]> this is where I don't remember how many times we getChildClosures and set flags/state on the parent
<enebo[m]> IRScope.calculateClosureScopeFlags() has one issue with nonlocal returns since if it is a lambda it doesn't but if it is proc it does
<enebo[m]> so parent sort of has to set it to expecting them and I believe that largely just adds a BB which recv_exception and then does the nonlocalreturn helper
<enebo[m]> anyways that is just a wrinkle and should not accept the lion share of code which is ordinary procs and not lambdas
<enebo[m]> also that BB will never get reached in actual code so it may just end up as code bloat
ur5us_ has joined #jruby
ur5us has quit [Read error: Connection reset by peer]
nirvdrum has quit [Ping timeout: 248 seconds]