Many users may wish to work with humdrum data without having to rely on humdrumR's with(in).humdrumR functionality, and would rather just get "normal" R objects out of their humdrum data.
humdrumR defines a number of functions/methods for "coercing" humdrum data into basic R data types.
Usage
# S4 method for humdrumR
as.vector(x, mode = "any")
as.lines(
humdrumR,
dataTypes = "GLIMDd",
padPaths = "dont",
padder = "",
sep = "\t"
)
# S3 method for humdrumR
as.matrix(x, dataTypes = "GLIMDd", padPaths = "corpus", padder = NA)
# S4 method for humdrumR
as.data.frame(x, dataTypes = "Dd", padPaths = "corpus", padder = NA)
as.matrices(humdrumR, dataTypes = "LIMDd", padPaths = "piece", padder = NA)
as.data.frames(humdrumR, dataTypes = "LIMDd", padPaths = "piece", padder = NA)
Arguments
- mode
The desired output class.
Defaults to "any".
Must be a single character string naming an atomic vector type to coerce the output to (e.g., logical or numeric).
If set to "any", the output type is simply whatever the type of the selected field is.
- humdrumR
HumdrumR data.
Must be a humdrumR data object.
- dataTypes
Which types of humdrum record(s) to include.
Defaults to "GLIMDd" for as.lines() and as.matrix(); "Dd" for as.data.frame(); "LIMDd" for as.matrices() and as.data.frames().
Must be a single character string. Legal values are 'G', 'L', 'I', 'M', 'D', 'd' or any combination of these (e.g., "LIM"). (See the humdrum table documentation for explanation.)
- padPaths
Determines how spine-paths are aligned in the output.
Defaults to "dont" for as.lines(); "corpus" for as.matrix() and as.data.frame(); "piece" for as.matrices() and as.data.frames().
Must be a single character string: "corpus", "piece", or "dont".
See the details for an explanation.
- padder
Used to fill in differences in the number of columns between files and/or spine paths.
Defaults to NA ("" for as.lines()).
Must be a single atomic value.
- sep
Separator to place between columns in collapsed lines.
Defaults to "\t" (tab).
Must be a single character string.
Details
Generally, coercion works by evaluating a humdrumR object's selected fields and forcing the result to be an atomic vector.
When multiple fields are selected, they are pasted together, separated by ", ".
If a field is not atomic (like a list or lm object), a concise representation of the list or object class is printed.
The as.vector(humdrumR) method has the additional option of coercing the resulting vector to a particular type using the mode argument.
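The pasting and mode-coercion behavior described above can be approximated in base R. This is only a sketch on plain character vectors: the `pitch` and `duration` vectors are hypothetical stand-ins for evaluated fields, and no humdrumR functions are called.

```r
# Two hypothetical fields evaluated over the same three data tokens
# (assumed values, for illustration only).
pitch    <- c("c", "e", "g")
duration <- c("4", "8", "8")

# Multiple selected fields are pasted together, separated by ", ".
pasted <- paste(pitch, duration, sep = ", ")

# With a single field, base as.vector() can coerce to a requested atomic mode.
as_numeric <- as.vector(duration, mode = "numeric")

# mode = "any" leaves the vector in whatever type it already has (character here).
as_any <- as.vector(duration, mode = "any")
```

Here `pasted` stays a character vector, while `as_numeric` becomes a numeric vector because a specific mode was requested.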
The as.matrix(humdrumR) method takes things a step further by putting the evaluated fields into a two-dimensional matrix, with rows representing records and columns indicating spine paths (see the Padding section below).
as.data.frame(humdrumR) first calls as.matrix, then converts the matrix to a data.frame.
Note that as.matrix(humdrumR) places the entire corpus object into one matrix, even if there are multiple pieces.
In contrast, the plural as.matrices and as.data.frames call their respective singular versions separately on each individual file in a humdrumR corpus and return them all in a list.
The row names of the matrix/data.frame(s) consist of two integer values, separated by a ".", representing Piece.Record.
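Because the row names encode Piece.Record, they can be split back into their two integer parts with base R. The sketch below uses a small hand-built matrix (an assumed stand-in for a corpus-wide matrix, not real humdrumR output) and also shows one way to break such a matrix into a per-piece list.

```r
# Hypothetical one-column corpus matrix covering two pieces of three records each.
m <- matrix(c("**kern", "c", "*-", "**kern", "e", "*-"), ncol = 1,
            dimnames = list(c("1.1", "1.2", "1.3", "2.1", "2.2", "2.3"), NULL))

# Split each "Piece.Record" row name at the literal dot.
parts  <- strsplit(rownames(m), ".", fixed = TRUE)
piece  <- as.integer(vapply(parts, `[`, character(1), 1))
record <- as.integer(vapply(parts, `[`, character(1), 2))

# Group the rows by piece, keeping each group as a matrix.
by_piece <- lapply(split(seq_len(nrow(m)), piece),
                   function(i) m[i, , drop = FALSE])
```

The resulting `by_piece` list has one matrix per piece, which is roughly the shape that the plural functions are described as returning.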
The as.lines function converts a humdrumR object into a character vector of text lines, with columns separated by the sep argument (defaults to "\t"), just as you'd see in a humdrum-syntax file.
Each line is a single row of the matrix produced by as.matrix(humdrumR), with padded values at the right side removed.
The matrix's Piece.Record row names are preserved as the lines' names.
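The row-to-line conversion can be imitated in base R as follows. This is a minimal sketch under stated assumptions: `collapse_rows()` is a hypothetical helper written for this illustration, not a humdrumR function, and it only strips padding at the right edge of each row.

```r
# Hypothetical helper: collapse each matrix row into one text line,
# dropping trailing padding values first.
collapse_rows <- function(mat, padder = NA, sep = "\t") {
  apply(mat, 1, function(row) {
    is_pad <- if (is.na(padder)) is.na(row) else !is.na(row) & row == padder
    keep <- if (any(!is_pad)) seq_len(max(which(!is_pad))) else integer(0)
    paste(row[keep], collapse = sep)
  })
}

# Assumed example matrix: the global comment occupies column 1 and is padded with NA.
mat <- rbind(c("**kern", "**kern"),
             c("!! comment", NA),
             c("c", "e"))
rownames(mat) <- c("1.1", "1.2", "1.3")

lines <- collapse_rows(mat)
```

Since apply() carries row names through, the names of `lines` are the same Piece.Record labels as the matrix rows.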
Note that multiple-stop tokens (where Stop > 1L) cannot be incorporated into the two-dimensional matrix/data.frame. Thus, as.matrix(humdrumR) calls collapseStops(collapseAtomic = TRUE, sep = " ") on the humdrumR object before creating a matrix.
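The effect of collapsing stops before building a matrix can be imitated with base R's tapply(). The token table below is hypothetical and the code is not how collapseStops() is implemented; it only shows why collapsing leaves exactly one value per record/spine cell.

```r
# Hypothetical token table: record 1, spine 1 contains two stops ("c" and "e").
tokens <- data.frame(
  Record = c(1, 1, 1, 2),
  Spine  = c(1, 1, 2, 1),
  Stop   = c(1, 2, 1, 1),
  Token  = c("c", "e", "g", "d")
)

# Paste all stops sharing a Record/Spine cell together, separated by a space.
cells <- tapply(tokens$Token, list(tokens$Record, tokens$Spine),
                paste, collapse = " ")
```

Each cell of `cells` now holds a single string, and cells with no token are NA, so the result fits a two-dimensional layout.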
Padding
Different pieces in a single humdrumR object often differ in the number of spines and/or spine paths they contain.
To squish them into a two-dimensional object (matrix or data.frame), they must necessarily be padded to the same number of columns.
(Global comments, which actually have NA spines, are also padded, placing the record in column 1.)
The padder argument is a single atomic value which is used to pad the matrix.
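Column padding across pieces can be sketched in base R like this. `pad_to_width()` is a hypothetical helper, and the two piece matrices are made-up examples rather than output from humdrumR.

```r
# Hypothetical helper: append columns of `padder` until the matrix has `width` columns.
pad_to_width <- function(mat, width, padder = NA) {
  extra <- width - ncol(mat)
  if (extra > 0) {
    mat <- cbind(mat, matrix(padder, nrow = nrow(mat), ncol = extra))
  }
  mat
}

# Assumed pieces: the first has one spine, the second has two.
piece1 <- matrix(c("**kern", "c", "*-"), ncol = 1)
piece2 <- matrix(c("**kern", "e", "*-", "**kern", "g", "*-"), ncol = 2)

# Pad both to the widest piece, then stack them with Piece.Record row names.
width  <- max(ncol(piece1), ncol(piece2))
corpus <- rbind(pad_to_width(piece1, width), pad_to_width(piece2, width))
rownames(corpus) <- c(paste0("1.", 1:3), paste0("2.", 1:3))
```

The one-spine piece gains a second column filled with the padding value, so both pieces fit in a single rectangular matrix.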
Another consideration is the behavior of spine paths. In the humdrum syntax, a spine path in a leftward spine "bumps" data in higher spines into new columns, as in this example:
At the beginning and end of the file, the second column holds data for the second spine.
However, in the middle of the file, the second column holds data from the second spine path of the first spine.
To make the spine structure clearer, as.matrix(humdrumR) has the option to pad spine paths.
For example, using "_" as our padder argument:
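As a rough illustration of both layouts, the sketch below uses a hypothetical two-spine file (not the page's original example) in which the first spine splits with *^ and later rejoins with *v. The padding rule is a shortcut that works only for this toy file; it is not humdrumR's alignment logic.

```r
# Hypothetical humdrum records: spine 1 splits (*^) and later rejoins (*v).
records <- c("**kern\t**kern", "c\te", "*^\t*", "c\tg\te",
             "*v\t*v\t*", "c\te", "*-\t*-")
tokens <- strsplit(records, "\t", fixed = TRUE)
width  <- max(lengths(tokens))

# Unpadded layout: tokens are left-aligned, so spine 2 moves between columns 2 and 3.
unpadded <- t(vapply(tokens,
                     function(tok) c(tok, rep(NA, width - length(tok))),
                     character(width)))

# Padded layout: in two-token records, reserve column 2 for the absent
# second path of spine 1 by filling it with "_".
padded <- t(vapply(tokens,
                   function(tok) if (length(tok) == 2) c(tok[1], "_", tok[2]) else tok,
                   character(width)))
```

In `padded`, every token of the second spine sits in column 3, whereas in `unpadded` it alternates between columns 2 and 3.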
This aspect of the matrix padding behavior can be controlled with the padPaths argument, with three possible values/behaviors:
- "corpus": Paths are padded such that spine-paths across all pieces in the corpus all align in the same columns. If even one file has a spine path, all the other files are padded so their spines stay aligned. This is the default behavior for as.matrix(humdrumR).
- "piece": Paths are padded, but only within each piece. The spines/paths between different pieces may not align.
- "dont": Paths are not padded at all.