Optimize for the common case where the input write size is less than the
buffer size. This slightly increases the cost for pathological write
patterns that commonly fill the buffer exactly, but if a client is doing
that frequently, they're already paying the cost of frequent flushing,
etc., so the cost is of this optimization to them is relatively small.