Calculate sequential cumulative sum of values in numeric vectors.
Arguments
- x
The input vector.
Must be atomic numbers. NULL values are returned NULL.
- lag
Which lag to use.
Defaults to 1. Must be a natural number. (See the Greater lags section, below.)
- skip
A function to indicate which values to skip.
Defaults to is.na. This must be a function which can be applied to x to return a logical vector of the same length. TRUE values are skipped over in the calculations. By default, the skip function is is.na, so NA values in the input (x argument) are skipped. The skipped values are returned as is in the output vector.
- init
Initial value to fill the beginning for calculation.
Defaults to 0. Should be the same class as x; length must be no longer than lag. NA values at the beginning (or at the end, if right == TRUE) are filled with these values before summing.
- groupby
How to group the data.
Defaults to list(). Should be a vector or list of vectors; must be length length(x). Cumulative sums are not calculated across groups indicated by the groupby vector(s).
- orderby
The order for calculating the cumulative sum.
Defaults to list(). Should be a vector or list of vectors; must be length length(x). Cumulative sums in x are calculated based on the order of the orderby vector(s), as determined by base::order().
- right
Should the init padding be at the "right" (end of the vector)?
Defaults to FALSE. Must be a singleton logical value: an on/off switch. By default, right == FALSE so the init padding is at the beginning of the output.
Details
sigma is very similar to base-R cumsum().
However, sigma should be favored in humdrumR use because:
- It has a groupby argument, which is automatically used by humdrumR with(in) commands to constrain the cumulative sums within pieces/spines/paths of humdrum data. Using the groupby argument to a function (details below) is generally faster than using a groupby argument to withinHumdrum().
- It can automatically skip NA (or other) values.
- It also has an init argument which can be used to ensure full invertability with delta(). See the "Invertability" section below.
If applied to a matrix, sigma is applied separately to each column, unless margin is set to 1 (rows)
or, if you have a higher-dimensional array, a higher value.
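The skipping and per-column behavior described above can be approximated in base R. The sketch below is not humdrumR's implementation: skip_cumsum is a hypothetical helper written only for illustration, and it assumes that skipped values are simply left in place while the running sum continues past them.

```r
# Hypothetical helper (not part of humdrumR): cumulative sum that
# leaves skipped values untouched, mimicking the skip argument.
skip_cumsum <- function(x, skip = is.na) {
  skipped <- skip(x)                    # logical vector, same length as x
  x[!skipped] <- cumsum(x[!skipped])    # sum only the values that are kept
  x
}

v <- skip_cumsum(c(1, NA, 2, 3))        # 1 NA 3 6

# Column-wise (the default for matrices) versus row-wise application.
m <- matrix(c(1, NA, 2, 3, 4, 5), nrow = 3)
by_col <- apply(m, 2, skip_cumsum)      # columns: (1, NA, 3) and (3, 7, 12)
by_row <- t(apply(m, 1, skip_cumsum))   # rows: (1, 4), (NA, 4), (2, 7)
```

The transpose in the row-wise call is needed because apply() returns each row's result as a column.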
Invertability
The sigma and delta functions are inverses of each other, meaning that with the right arguments set,
sigma(delta(x)) == x and delta(sigma(x)) == x.
In other words, the two functions "reverse" each other.
The key is that the init argument needs to be set to 0, and all other
arguments (lag, skip, groupby, etc.) need to match.
So actually, sigma(delta(x, init = 0, ...)) == x and delta(sigma(x), init = 0) == x.
When we take the differences between values (delta(x)), the resulting differences can't tell us
fully how to reconstruct the original unless we know where to "start" (a constant offset).
For example,
delta(c(5, 7, 5, 6)) == c(NA, 2, -2, 1)
We know our input goes up 2, back down 2, then up 1, but the starting value (the first 5)
is lost.
If we call sigma on this, we'll get:
sigma(c(NA, 2, -2, 1)) == c(0, 2, 0, 1)
We get the right contour, but we're offset by that constant 5.
If we call delta(x, init = 0), the necessary constant (the first value) is kept at the beginning of the vector:
delta(c(5, 7, 5, 6), init = 0) == c(5, 2, -2, 1)
so sigma gets what we want, full invertability:
sigma(delta(c(5, 7, 5, 6), init = 0)) == c(5, 7, 5, 6)
Alternatively, we could specify the necessary constant as the init argument of sigma:
sigma(delta(c(5, 7, 5, 6)), init = 5) == c(5, 7, 5, 6)
so the init arguments of the two functions are complementary.
Currently, the right argument of delta has no complement in sigma, so invertability
only holds true if right = FALSE (the default).
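The role of the starting constant can be checked with base-R cumsum() and diff() alone. This is only a sketch of the arithmetic: it does not call sigma or delta, and the variable names are made up for illustration.

```r
x <- c(5, 7, 5, 6)

# Plain differences drop the starting value, so only the contour survives.
contour <- cumsum(c(0, diff(x)))        # 0 2 0 1

# Keeping the first value in front of the differences (analogous to
# delta(x, init = 0)) makes the round trip exact.
round_trip <- cumsum(c(x[1], diff(x)))  # 5 7 5 6

# Equivalently, add the missing constant back afterwards (analogous to
# supplying init = 5 to the summing step).
shifted <- contour + x[1]               # 5 7 5 6
```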
Greater lags
The behavior of sigma when abs(lag) > 1 is easiest to understand as the inverse of the
behavior of delta(abs(lag) > 1), which is more intuitive. (sigma is the inverse of delta(), see the
Invertability section above).
Generally, if abs(lag) > 1, x is grouped by its indices modulo lag, and the cumulative sum is calculated separately
for each set of modulo indices.
For example, consider lag == 2 for the following input:
| x | index | index modulo 2 |
| 1 | 1 | 1 |
| 3 | 2 | 0 |
| 2 | 3 | 1 |
| 2 | 4 | 0 |
| 5 | 5 | 1 |
The cumulative sums of the 1 and 0 modulo-index groups are:
- Index 1: cumsum(c(1, 2, 5)) == c(1, 3, 8).
- Index 0: cumsum(c(3, 2)) == c(3, 5).
Interleaved back into order, the result is c(1,3,3,5,8).
This may not be very clear, but sure enough delta(c(1, 3, 3, 5, 8), lag = 2, init = 0) returns the original
c(1, 3, 2, 2, 5) vector!
Again, understanding delta(..., lag = n) is easier than sigma(..., lag = n) (see the Invertability section
above).
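The modulo-index description can be reproduced in base R. In this sketch, lagged_cumsum and lagged_diff are hypothetical helpers (not humdrumR functions) that handle positive lags only and ignore skipping, grouping, and ordering.

```r
# Hypothetical helpers (not humdrumR functions), for positive lags only.
lagged_cumsum <- function(x, lag = 1) {
  groups <- seq_along(x) %% lag         # index modulo lag
  ave(x, groups, FUN = cumsum)          # cumulative sum within each group
}

lagged_diff <- function(y, lag = 1) {
  # Keep the first `lag` values, then take differences `lag` steps apart.
  c(y[seq_len(lag)], diff(y, lag = lag))
}

x <- c(1, 3, 2, 2, 5)
y <- lagged_cumsum(x, lag = 2)          # 1 3 3 5 8
back <- lagged_diff(y, lag = 2)         # 1 3 2 2 5
```

Because ave() writes each group's result back to its original positions, no explicit interleaving step is needed.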
Negative lag
If lag is negative, the output is the same as the equivalent positive lag, except
the sign is reversed (output * -1).
This behavior is easiest to understand as the inverse of the
behavior of delta(lag < 0), which is more intuitive. (sigma is the inverse of delta(), see the
Invertability section above).
Grouping
In many cases we want to perform lagged calculations in a vector, but not across certain boundaries.
For example, if your vector includes data from multiple pieces, we wouldn't want to calculate melodic intervals
between pieces, only within pieces.
The groupby argument indicates one, or more, grouping vectors, which break the x (input) argument
into groups.
If more than one groupby vector is given, a change in any vector indicates a boundary.
Value pairs which cross between groups are treated as if they were at the beginning.
Basically, using the groupby argument to a function should be
similar or identical to using tapply(x, groupby, laggedFunction, ...) or using a groupby
expression in a call to with(in).humdrumR.
However, using a groupby argument directly is usually much faster, as it has been
specially optimized for these functions.
The most common use case in humdrum data is looking at "melodies" within spines.
For this, we want groupby = list(Piece, Spine, Path).
In fact, humdrumR with(in) calls will automatically feed these
three fields as groupby arguments to certain functions: mint, delta, sigma, lag, ditto, ioi, sumTies, hop, wort, or wort.character.
So any use of delta in a call to with(in), will automatically calculate the delta
in a "melodic" way, within each spine path of each piece.
However, if you wanted, for instance, to calculate differences across spines (like harmonic intervals)
you could manually set groupby = list(Piece, Record).
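The effect of grouping vectors can be illustrated with base-R ave(), which restarts a calculation whenever any grouping vector changes. This is a sketch only: the piece and spine vectors are invented toy data, and humdrumR's own groupby handling may differ in detail.

```r
# Toy data: two pieces; the first piece has two spines.
x     <- c(1, 2, 3, 10, 20, 30)
piece <- c(1, 1, 1, 2, 2, 2)
spine <- c(1, 1, 2, 1, 1, 1)

ungrouped      <- cumsum(x)                            # 1 3 6 16 36 66
by_piece       <- ave(x, piece, FUN = cumsum)          # 1 3 6 10 30 60
by_piece_spine <- ave(x, piece, spine, FUN = cumsum)   # 1 3 3 10 30 60

# The tapply() route gives the same values here, but only because
# each group is contiguous and the groups are already in sorted order.
via_tapply <- unlist(tapply(x, piece, cumsum), use.names = FALSE)
```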
Order
When performing lagged calculations, we typically assume that the order of the values in the input vector
(x) is the order we want to "lag" across.
E.g., the first element is "before" the second element, which is "before" the third element, etc.
[Humdrum tables][humTable] are always ordered Piece > Spine > Path > Record > Stop.
Thus, any lagged calculations across fields of the humtable will be, by default, "melodic":
the next element is the next element in the spine path.
For example, consider this data:
**kern **kern
a d
b e
c f
*- *-
The default order of these tokens (in the Token field) would be a b c d e f.
If we wanted to instead lag across our tokens harmonically (across records), we'd need to specify a different order.
For example, we could say orderby = list(Piece, Record, Spine); the lagged function
would then interpret the Token field above as a d b e c f.
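The two orderings can be reproduced with base::order(). The sketch assumes a single piece with two spines of three records each, so Piece is left out; the spine, record, and x vectors are invented for illustration.

```r
tokens <- c("a", "b", "c", "d", "e", "f")
spine  <- c(1, 1, 1, 2, 2, 2)
record <- c(1, 2, 3, 1, 2, 3)

melodic  <- tokens[order(spine, record)]   # a b c d e f
harmonic <- tokens[order(record, spine)]   # a d b e c f

# Apply a cumulative sum in harmonic order, then restore the original layout.
x   <- c(1, 2, 3, 10, 20, 30)
ord <- order(record, spine)
out <- numeric(length(x))
out[ord] <- cumsum(x[ord])                 # 1 13 36 11 33 66
```

Assigning into out[ord] puts each running total back at the position of the value it belongs to, so the output lines up with the input vector.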
For another example, note Stop comes last in the order.
Let's consider what happens then if there are stops in our data:
**kern **kern
a d
b D e g
c A f a
*- *-
The default ordering here (Piece > Spine > Record > Stop) "sees" this in the order a b D c A d e g f a.
That may or may not be what you want!
If we wanted, we could reorder such that Stop takes precedence over Record: orderby = list(Piece, Spine, Stop, Record).
The resulting order would be a b c D A d e f g a.
[humTable]: R:humTable