Calls within a block, whatever a block is, can use a local calling convention in which the compiler knows where all of the values are to be stored, and thus can elide the check for the number of return values, the stack-pointer restoration, etc. Alternatively, they can use the full unknown-values return convention while trying to short-circuit the call convention. There is probably some low-hanging fruit here in terms of CPU branch prediction.
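As a rough illustration of the two cases, the following Common Lisp sketch contrasts a local function whose value count is fixed at every visible call site with one whose value count varies at run time. The function names (SUM-DIGITS, QUOT-REM, COLLECT-VALUES, SOME-VALUES) are hypothetical and chosen only for this example. Whether a given compiler actually selects the known-values or the local unknown-values convention for these forms, rather than inlining the local functions, is an assumption and depends on the compiler's analysis.

```lisp
;; Hypothetical example; none of these names come from the compiler sources.

;; Candidate for a known-values local call: QUOT-REM is only called
;; from inside SUM-DIGITS, and it always returns exactly two values,
;; which the call site receives with MULTIPLE-VALUE-BIND.
(defun sum-digits (n)
  "Sum the decimal digits of the non-negative integer N."
  (labels ((quot-rem (x)
             (values (floor x 10) (mod x 10))))
    (let ((sum 0))
      (loop until (zerop n)
            do (multiple-value-bind (q r) (quot-rem n)
                 (incf sum r)
                 (setf n q)))
      sum)))

;; Candidate for an unknown-values local call: the number of values
;; returned by SOME-VALUES depends on FLAG, so the receiver cannot
;; assume a fixed count and has to handle whatever arrives.
(defun collect-values (flag)
  "Return all values produced by the local function as a list."
  (flet ((some-values ()
           (if flag
               (values 1 2 3)
               (values 1))))
    (multiple-value-list (some-values))))
```

In the first function the receiver binds exactly two variables, so no run-time check of the value count is logically required. In the second, the receiver must cope with one or three values, which is the situation the unknown-values return convention exists to handle.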
The local (known-values) calling convention is implemented by
Local unknown-values calls are handled at the call site by the
multiple-call-local VOPs. The main difference
between the full call and local call protocols here is that
local calls use a different frame setup protocol, and tend
not to use the normal frame layout for the old frame-pointer and