In the humdrumR package, the fundamental data structure is called a humdrum table. A humdrum table encodes all the information in a collection of one or more humdrum-syntax files as a single data.table (a data.table is an "enhanced" version of R's standard data.frame). Humdrum tables are stored "inside" every humdrumR object that you will work with, and various humdrumR functions allow you to study or manipulate them. If you want to directly access the humdrum table within a humdrumR object, use the getHumtab() function.

The getHumtab() function extracts the humdrum table from a humdrumR object.

Use the fields() function to list the current fields in a humdrumR object.


getHumtab(humdrumR, dataTypes = "GLIMDd")

fields(
  humdrumR,
  fieldTypes = c("Data", "Structure", "Interpretation", "Formal", "Reference",
    "Grouping", "selected")
)

# S3 method for humdrumR



humdrumR

HumdrumR data.

Must be a humdrumR data object.


dataTypes

Which types of humdrum record(s) to include in the output.

Defaults to "GLIMDd".

Must be a character string, which specifies which types of data tokens/records to extract. Legal values are: "G" (global comments), "L" (local comments), "I" (interpretations), "M" (barlines), "D" (non-null data), or "d" (null data). Multiple types can be specified in a single string: e.g., "GLIMD". Note that "I" also grabs "E" (exclusive) and "S" (spine-control) tokens.


fieldTypes

Which types of fields to list.

Shows all fields by default.

Must be a character vector. Legal options are "Data", "Structure", "Interpretation", "Formal", "Reference", and "Grouping". You can also pass "selected" to extract only the selected fields. Types can be partially matched---for example, "S" for "Structure".
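The partial matching described for fieldTypes can be illustrated with base R's match.arg(). This is only a sketch of the general technique: whether fields() uses match.arg() internally is an assumption, and match_field_types() is a hypothetical helper, not part of humdrumR.

```r
# Legal field types, as listed above.
field_types <- c("Data", "Structure", "Interpretation", "Formal",
                 "Reference", "Grouping", "selected")

# Hypothetical helper: expand (possibly abbreviated) type names.
# match.arg() performs case-sensitive partial matching against `choices`.
match_field_types <- function(fieldTypes) {
  match.arg(fieldTypes, choices = field_types, several.ok = TRUE)
}

match_field_types("S")            # matches "Structure"
match_field_types(c("D", "Ref"))  # matches "Data" and "Reference"
```

Because the matching is case-sensitive, a capital "S" can only match "Structure", while a lowercase "s" can only match "selected".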


In a humdrum table, by default, humdrum data is organized in a maximally "long" (or "tall") format, with every "token" in the original data represented by a single row in the table. Even multi-stops (tokens separated by spaces) are broken onto their own rows. Meanwhile, each column in the humdrum table represents a single piece of information associated with each token, which we call a field. Throughout this documentation, you should keep in mind that a "token" refers to a row in the humdrum table while a "field" refers to a column:

  • Token = row

  • Field = column
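The "long" layout can be pictured with a small base-R sketch. It is not how humdrumR parses files; the `record` vector is made-up data, and only the Token, Spine, and Stop column names are taken from this documentation.

```r
# Three tokens from one data record (one per spine); the first is a multi-stop.
record <- c("4c 4e 4g", "2d", ".")

# Split each token on spaces, then stack everything into one row per stop.
stops <- strsplit(record, " ", fixed = TRUE)
long <- data.frame(
  Token = unlist(stops),
  Spine = rep(seq_along(record), lengths(stops)),
  Stop  = sequence(lengths(stops)),
  stringsAsFactors = FALSE
)
long
```

The three-note multi-stop in the first spine becomes three rows that share a Spine value and differ only in Stop, so the table has five rows in total.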


There are six types of fields in a humdrum table:

  1. Data fields

  2. Structure fields

  3. Interpretation fields

  4. Formal fields

  5. Reference fields

  6. Grouping fields

When first created by a call to readHumdrum(), every humdrum table has at least nineteen fields: one data field (Token), two interpretation fields (Tandem and Exclusive), three formal fields, and thirteen structure fields. Additional formal, interpretation, or reference fields may be present depending on the content of the humdrum file(s), and you can create additional data fields by using within.humdrumR(), mutate.humdrumR(), or other functions.

Data fields:

Data fields are used to describe individual data points in humdrum data (as opposed to groups of points). Every humdrum table starts with a data field called Token, which contains character strings representing the original strings read from the humdrum files. Users can create as many additional data fields as they like. Every call to within.humdrumR() generates new data fields.

Structure fields:

Every humdrum table has thirteen Structure fields, which describe where each data token was "located" in the original humdrum data: which file, which spine, which record, etc. See the vignette on humdrum syntax to fully understand the terms here.

  • File info:

    • Filename :: character

      • The unique name of the humdrum file. This may include an appended path if more than one file with the same name was read from different directories (see the readHumdrum() docs).

    • Filepath :: character

      • The full file name (always includes its full path).

    • Label :: character

      • A label specified during the call to readHumdrum(), associated with a particular readHumdrum "REpath-pattern." If no label was specified, patterns are just labeled "_n", where "n" is the number of the pattern.

    • File :: integer

      • A unique number associated with each file (ordered alphabetically, starting from 1).

    • Piece :: integer

      • A number specifying the number of the piece in the corpus. This is identical to the File field except when more than one piece was read from the same file.

  • Location info:

    • Spine :: integer

      • The spine, numbered (from left-to-right) starting from 1.

      • This field is NA wherever Global == TRUE.

    • Path :: integer

      • The "spine path." Any time a *^ spine path split occurs in the humdrum data, the right side of the split becomes a new "path." The original path is numbered 0 with additional paths numbered with integers to the right. (If there are no spine path splits, the Path field is all 0s.)

      • This field is always NA when Global == TRUE.

    • ParentPath :: integer

      • For spine paths (i.e., where Path > 0), which path was the parent from which this path split? Where Path == 0, parent path is also 0.

    • Record :: integer

      • The record (i.e., line) number in the original file.

    • DataRecord :: integer

      • The data record enumeration in the file, starting from 1.

    • Stop :: integer

      • Which token in a multi-stop token, numbered starting from 1.

      • In files with no multi-stops, the Stop field is all 1s.

      • This field is always NA when Global == TRUE.

    • Global :: logical

      • Did the token come from a global record (as opposed to a local record)?

      • When Global == TRUE, the Spine, Path, and Stop fields are always NA.

  • Token info:

    • Type :: character

      • What type of record is it?

        • "G" = global comment.

        • "L" = local comment

        • "I" = interpretation

        • "M" = measure/barline

        • "D" = non-null data

        • "d" = null data

        • "E" = exclusive interpretation

        • "S" = spine-control tokens (*^, *v, *-)
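The Type codes above are the same characters accepted by the dataTypes argument. A minimal base-R sketch of that kind of filtering follows, using a toy table; select_types() is a hypothetical helper written for illustration and is not humdrumR's implementation.

```r
# Toy humdrum table with one token of each type.
humtab <- data.frame(
  Token = c("!! a global comment", "**kern", "*clefG4", "=1", "4c", ".", "*-"),
  Type  = c("G", "E", "I", "M", "D", "d", "S"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose Type appears in the dataTypes string.
select_types <- function(humtab, dataTypes = "GLIMDd") {
  types <- strsplit(dataTypes, "")[[1]]   # "GLIMDd" -> "G" "L" "I" "M" "D" "d"
  # As documented for dataTypes, "I" also grabs exclusive ("E") and
  # spine-control ("S") tokens.
  if ("I" %in% types) types <- union(types, c("E", "S"))
  humtab[humtab$Type %in% types, , drop = FALSE]
}

select_types(humtab, "D")$Token   # non-null data only
select_types(humtab, "I")$Token   # interpretations, including "E" and "S" rows
```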

Interpretation fields:

Interpretation fields describe interpretation metadata in the humdrum file(s). Humdrum interpretations are tokens that "carry forward" to data points after them, unless cancelled out by a subsequent interpretation. (See the humdrum syntax vignette for a detailed explanation.) All humdrum data must have an exclusive interpretation so humdrum tables always have an Exclusive (:: character) field indicating the exclusive interpretation associated with each token/row of the Token field.
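The way an exclusive interpretation "carries forward" can be sketched in base R as a forward fill over a single spine. The spine below is made-up data, and stripping the leading "**" from the stored value is a choice made for this sketch rather than a statement about what humdrumR stores.

```r
# One spine, top to bottom.
spine <- c("**kern", "4c", "4d", "*-")

# Mark exclusive interpretations and count how many have been seen so far.
is_excl <- startsWith(spine, "**")
idx <- cumsum(is_excl)
idx[idx == 0L] <- NA   # tokens before any exclusive interpretation get NA

# Each token takes the most recent exclusive interpretation above it.
Exclusive <- sub("^\\*\\*", "", spine[is_excl])[idx]
data.frame(Token = spine, Exclusive = Exclusive, stringsAsFactors = FALSE)
```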

Humdrum data may, or may not, include additional tandem interpretations. A universal rule for parsing tandem interpretations is impossible, because A) tandem interpretations can "overwrite" each other and B) users can create their own tandem interpretations. The best we can do in all cases is identify all tandem interpretations that have appeared previously in the spine (counting most recent first). All these previous interpretations are encoded in a single character string in the Tandem field (see the tandem() docs for details). If working with non-standard interpretations, users can parse the Tandem field using the tandem() function. If no tandem interpretations occur in a file, the Tandem field is full of empty strings ("").
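The idea of accumulating every earlier tandem interpretation, most recent first, can be sketched in base R. The comma separator and the removal of the leading "*" are assumptions made for this sketch; see the tandem() docs for the format humdrumR actually uses.

```r
# One spine containing three tandem interpretations.
spine <- c("**kern", "*clefG4", "*k[b-]", "4c", "*clefF4", "4d")

# Tandem interpretations start with a single "*" and are neither null
# interpretations nor spine-control tokens.
is_tandem <- startsWith(spine, "*") &
  !startsWith(spine, "**") &
  !(spine %in% c("*", "*^", "*v", "*-"))

seen <- character(0)
Tandem <- character(length(spine))
for (i in seq_along(spine)) {
  # Prepend, so the most recent interpretation comes first.
  if (is_tandem[i]) seen <- c(sub("^\\*", "", spine[i]), seen)
  Tandem[i] <- paste(seen, collapse = ",")
}
Tandem
```

Rows that precede any tandem interpretation get an empty string, matching the behavior described above for files without tandem interpretations.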

Fortunately, many tandem interpretations are widely used and standardized, and these interpretations are known by humdrumR. Recognized interpretations (such as *clefG4 and *k[b-]) are automatically parsed into their own fields by a call to readHumdrum(). See the readHumdrum() documentation for more details.

Formal fields:

Formal fields indicate musical sections, or time windows within a piece, including formal designations ("verse", "chorus", etc.) and measures/bars. Humdrum data may or may not include formal metadata fields, indicated by the token "*>". Classified formal marks are put into fields matching their name. Unclassified formal marks are placed in a field called Formal as a default. Nested formal categories are appended with an underscore and a number for each level of descent: Formal_1, Formal_2, ..., Formal_N. If part of a section is not given a name in a lower hierarchical level, the field is simply empty ("") at that point.

Humdrum data may, or may not, also include barlines (tokens beginning "="). However, humdrum tables always include three formal fields related to barlines:

  • Bar :: integer

    • How many barline records (single or double) have passed before this token?

    • If no "=" tokens occur in a file, Bar is all zeros.

    • Note that this field is independent of whether the barlines are labeled with numbers in the humdrum file!

  • DoubleBar :: integer

    • How many double-barline records have passed before this token?

    • If no "==" tokens occur in a file, DoubleBar is all zeros.

  • BarLabel :: character

    • Any characters that occur in a barline-token after an initial "=" or "==". These include the "-" in the common "implied barline" token "=-", repeat tokens (like "=:||"), and also any explicit bar numbers.

    • Note that the Bar field always enumerates every bar record, while measure-number labels in humdrum data (which appear in the BarLabel field) may do weird things like skipping numbers, repeating numbers, or having suffixes (e.g., "19a").

If no barline tokens are present in a file, Bar and DoubleBar will be nothing but 0s, and BarLabel will be all NA.
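The three barline fields can be approximated in base R with running counts over a spine. This is a sketch under assumptions, not humdrumR's parser: the spine is made-up data, and in this sketch a barline token is counted on its own row.

```r
spine <- c("**kern", "=-", "4c", "=1", "4d", "=2a", "4e", "==")

is_bar <- grepl("^=", spine)

# Running counts of (single or double) barlines and of double barlines.
Bar       <- cumsum(is_bar)
DoubleBar <- cumsum(grepl("^==", spine))

# Everything after the initial "=" or "==" is the label.
labels <- sub("^==?", "", spine[is_bar])
idx <- Bar
idx[idx == 0L] <- NA   # rows before the first barline have no label
BarLabel <- labels[idx]

data.frame(Token = spine, Bar, DoubleBar, BarLabel, stringsAsFactors = FALSE)
```

Bar keeps counting 1, 2, 3, 4 even though the labels read "-", "1", "2a", and "", which is the independence between Bar and BarLabel noted above.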

Reference fields:

Reference fields describe any Reference Records in the humdrum data. Every reference record (records beginning "!!!") in any humdrum file in a corpus read by readHumdrum is parsed into a field named by the reference code: "XXX" in "!!!XXX". Reference tokens are all identical throughout any humdrum piece. If a reference code appears in one file but not another, the field is NA in the file which does not have the code. If no reference records appear in any files read by readHumdrum(), no reference fields are created.

Examples of common reference records are "!!!COM:" (composer) and "!!!OTL:" (original title). Any humdrum data with these records will end up having COM and OTL fields in its humdrum table.
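As a rough illustration of how a reference record maps onto a field, the base-R sketch below splits "!!!XXX: value" lines into a code and a value, then repeats each value on every row of a toy table. The record contents are placeholders, and the regular expression is an assumption made for this sketch, not humdrumR's parser.

```r
# Lines from a hypothetical file; the reference values are placeholders.
records <- c("!!!COM: Example Composer", "!!!OTL@EN: Example Title", "**kern", "4c")

# Split each reference record into its code and its value.
is_ref <- grepl("^!!!", records)
m <- regmatches(records[is_ref],
                regexec("^!!!([^:]+):\\s*(.*)$", records[is_ref]))
refs <- setNames(vapply(m, function(x) x[3], character(1)),
                 vapply(m, function(x) x[2], character(1)))

# Each code becomes a field whose value is identical on every row of the piece.
humtab <- data.frame(Token = c("**kern", "4c"), stringsAsFactors = FALSE)
for (code in names(refs)) humtab[[code]] <- refs[[code]]
humtab
```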

Grouping fields:

Grouping fields are special fields which may be created by calls to group_by(). These fields are deleted by calls to ungroup(). These fields are generally hidden/inaccessible to users.

Null data

In humdrum syntax, there is no requirement that every spine-path contains data in every record. Rather, spines are often padded with null tokens. In some cases, entire records may be padded with null tokens. Each type of humdrum record uses a different null token:

  • Interpretation: *

  • Comment: !

  • Barline: =

  • Data: .

Many humdrumR functions automatically ignore null data, unless you specifically tell them not to (usually via the dataTypes argument). Whenever different fields are created or selected, humdrumR reevaluates what data locations it considers null. Note that humdrumR considers data locations to be "null" when

  • the selected fields are all character data and the token is one of c(".", "!", "!!", "=", "*", "**"); or

  • the selected fields are all NA (including NA_character_).

When humdrumR reevaluates null data, the Type field is updated, setting data records to Type == "d" for null data and Type == "D" for non-null data. This is the main mechanism humdrumR functions use to ignore null data: most functions only look at data where Type == "D".
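The null rule above can be sketched in base R for the simple case of a single selected field. is_null_token() is a hypothetical helper, and handling several selected fields at once (the "all" in the rule) is left out of this sketch.

```r
null_tokens <- c(".", "!", "!!", "=", "*", "**")

# Hypothetical helper: TRUE where a value counts as null under the rule above.
is_null_token <- function(x) {
  is.na(x) | (is.character(x) & x %in% null_tokens)
}

# A character field: "." and NA are null.
Token <- c("4c", ".", NA, "4d")
ifelse(is_null_token(Token), "d", "D")

# A numeric field: only NA is null.
Semits <- c(0, NA, 7)
ifelse(is_null_token(Semits), "d", "D")
```

Recomputing the "d"/"D" labels in this way whenever the selected field changes mirrors the reevaluation of the Type field described above.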

Whenever you print or export a humdrumR object, null data in the selected fields prints as ".". Thus, if you are working with numeric data with NA values, these NA values will print as ".".


Breaking the complex syntax of humdrum data into the "flat" structure of a humdrum table, with every single token on one line of a data.table, makes humdrum data easier to analyze. Of course, thanks to the structure fields, we can easily regroup and reform the original humdrum data or use the structure of the data (like spines) in our analyses. However, in some cases, you might want to work with humdrum data in a different structure or "shape." humdrumR has several options for "collapsing" tokens within humdrum tables, "cleaving" different parts of the data into new fields, or otherwise reshaping humdrum data into basic R data structures you might prefer.
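To illustrate how the structure fields allow the original layout to be rebuilt, the base-R sketch below reassembles records from a toy long table. It assumes space-separated stops and tab-separated spines, uses made-up data, and ignores spine paths and global records.

```r
# Toy humdrum table: three records, two spines, one multi-stop.
humtab <- data.frame(
  Record = c(1L, 1L, 2L, 2L, 2L, 3L, 3L),
  Spine  = c(1L, 2L, 1L, 1L, 2L, 1L, 2L),
  Stop   = c(1L, 1L, 1L, 2L, 1L, 1L, 1L),
  Token  = c("**kern", "**kern", "4c", "4e", "4g", "*-", "*-"),
  stringsAsFactors = FALSE
)

# Put rows in reading order, then join stops within each record/spine cell.
humtab <- humtab[order(humtab$Record, humtab$Spine, humtab$Stop), ]
cells <- aggregate(Token ~ Record + Spine, data = humtab,
                   FUN = paste, collapse = " ")

# Join the cells of each record with tabs, left to right.
cells <- cells[order(cells$Record, cells$Spine), ]
records <- tapply(cells$Token, cells$Record, paste, collapse = "\t")
writeLines(records)
```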

Querying Fields

The fields() function takes a humdrumR object and returns a data.table, with each row describing an available field in the humdrum table. The output table has five columns:

  • Name

    • The field name.

  • Class

    • The class() of the data in the field.

  • Type

    • The type of field (described above). Can be "Data", "Structure", "Interpretation", "Formal", "Reference", or "Grouping".

  • Selected

    • A logical indicating which fields are selected.

  • GroupedBy

    • A logical indicating which, if any, fields are currently grouping the data.

Using the names() function on a humdrumR object will get just the field names, the same as fields(humData)$Name.
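A short usage sketch tying these functions together follows. It uses only calls and arguments named in this documentation, is guarded so that it does nothing when humdrumR is not installed, and the variable names are arbitrary.

```r
# Runs only if humdrumR is installed.
if (requireNamespace("humdrumR", quietly = TRUE)) {
  library(humdrumR)

  humData <- readHumdrum(humdrumRroot, "HumdrumData/BachChorales/chor00[1-4].krn")

  # List only the structure fields.
  structure_fields <- fields(humData, fieldTypes = "Structure")
  print(structure_fields$Name)

  # Just the field names.
  print(names(humData))

  # Extract the humdrum table itself, keeping only non-null data tokens.
  humtab_data <- getHumtab(humData, dataTypes = "D")
  print(dim(humtab_data))
}
```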

See also

To actually extract fields from humdrumR data, see the pull() family of functions.


humData <- readHumdrum(humdrumRroot, "HumdrumData/BachChorales/chor00[1-4].krn")
#> Finding and reading files...
#> 	REpath-pattern '/home/nat/.tmp/Rtmpz26RCR/temp_libpatha780af15d1/humdrumR/HumdrumData/BachChorales/chor00[1-4].krn' matches 4 text files in 1 directory.
#> Four files read from disk.
#> Validating four files...
#> all valid.
#> Parsing four files...
#> Assembling corpus...
#> Done!

fields(humData)

#>                Name     Class           Type Selected GroupedBy Complement
#>  1:           Token character           Data        1     FALSE      FALSE
#>  2:             Bar   integer         Formal        0     FALSE      FALSE
#>  3:        BarLabel character         Formal        0     FALSE      FALSE
#>  4:       DoubleBar   integer         Formal        0     FALSE      FALSE
#>  5:          Formal character         Formal        0     FALSE      FALSE
#>  6:             BPM character Interpretation        0     FALSE      FALSE
#>  7:            Clef character Interpretation        0     FALSE      FALSE
#>  8:       Exclusive character Interpretation        0     FALSE      FALSE
#>  9:      Instrument character Interpretation        0     FALSE      FALSE
#> 10: InstrumentClass character Interpretation        0     FALSE      FALSE
#> 11:             Key character Interpretation        0     FALSE      FALSE
#> 12:    KeySignature character Interpretation        0     FALSE      FALSE
#> 13:     Mensuration character Interpretation        0     FALSE      FALSE
#> 14:          Tandem character Interpretation        0     FALSE      FALSE
#> 15:   TimeSignature character Interpretation        0     FALSE      FALSE
#> 16:             AGN character      Reference        0     FALSE      FALSE
#> 17:             CDT character      Reference        0     FALSE      FALSE
#> 18:             COM character      Reference        0     FALSE      FALSE
#> 19:             EED character      Reference        0     FALSE      FALSE
#> 20:             EEV character      Reference        0     FALSE      FALSE
#> 21:             OPR character      Reference        0     FALSE      FALSE
#> 22:         OTL@@DE character      Reference        0     FALSE      FALSE
#> 23:          OTL@EN character      Reference        0     FALSE      FALSE
#> 24:             PC# character      Reference        0     FALSE      FALSE
#> 25:             SCT character      Reference        0     FALSE      FALSE
#> 26:             SMS character      Reference        0     FALSE      FALSE
#> 27:             YOR character      Reference        0     FALSE      FALSE
#> 28:         hum2abc character      Reference        0     FALSE      FALSE
#> 29:           title character      Reference        0     FALSE      FALSE
#> 30:      DataRecord   integer      Structure        0     FALSE      FALSE
#> 31:            File   integer      Structure        0     FALSE      FALSE
#> 32:        Filename character      Structure        0     FALSE      FALSE
#> 33:        Filepath character      Structure        0     FALSE      FALSE
#> 34:          Global   logical      Structure        0     FALSE      FALSE
#> 35:           Label character      Structure        0     FALSE      FALSE
#> 36:      ParentPath   integer      Structure        0     FALSE      FALSE
#> 37:            Path   integer      Structure        0     FALSE      FALSE
#> 38:           Piece   integer      Structure        0     FALSE      FALSE
#> 39:          Record   integer      Structure        0     FALSE      FALSE
#> 40:           Spine   integer      Structure        0     FALSE      FALSE
#> 41:            Stop   integer      Structure        0     FALSE      FALSE
#> 42:            Type character      Structure        0     FALSE      FALSE
#>                Name     Class           Type Selected GroupedBy Complement

