GCIMSDataset is an R6 class to store a dataset.
When the dataset is created, the on_ram option controls whether the actual
data is kept in memory or read from and saved to files as needed, so that
the dataset object scales to a large number of samples.
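To make the on_ram trade-off concrete, here is a minimal base-R sketch. The function make_sample_store() and its set()/get() methods are hypothetical and are not the GCIMS implementation. The sketch only assumes that an on-disk dataset keeps one file per sample and reads it back when the sample is requested.

```r
# Hypothetical toy store, NOT the GCIMS implementation: it mimics the idea
# behind on_ram = TRUE (keep samples in a list in memory) versus
# on_ram = FALSE (save each sample to an .rds file, read it back on demand).
make_sample_store <- function(on_ram, scratch_dir = tempfile("scratch_")) {
  in_memory <- list()
  if (!on_ram) dir.create(scratch_dir, showWarnings = FALSE, recursive = TRUE)
  sample_path <- function(name) file.path(scratch_dir, paste0(name, ".rds"))
  list(
    set = function(name, sample) {
      if (on_ram) {
        in_memory[[name]] <<- sample          # sample stays in RAM
      } else {
        saveRDS(sample, sample_path(name))    # only the file is kept
      }
      invisible(NULL)
    },
    get = function(name) {
      if (on_ram) in_memory[[name]] else readRDS(sample_path(name))
    },
    n_in_memory = function() length(in_memory)
  )
}

ram_store <- make_sample_store(on_ram = TRUE)
disk_store <- make_sample_store(on_ram = FALSE)
m <- matrix(runif(6), nrow = 2)
ram_store$set("S1", m)
disk_store$set("S1", m)
disk_store$n_in_memory()   # 0: the on-disk store holds nothing in memory
```

Both stores return the same sample, but only the in-memory one grows with the number of samples held in RAM.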
GCIMSDataset$new_from_list(): Create a new GCIMSDataset from a list of samples.
GCIMSDataset$new_from_saved_dir(): Create a new on-disk GCIMSDataset from a directory.
new_from_list(): Create a new GCIMSDataset object from a list of samples. Note that with this
constructor, on_ram is TRUE by default.
Public fields:
pData: A data frame with at least the SampleID and filename columns.
align: Stores the alignment results.
peaks: Stores the peak list.
TIS: A matrix of n_samples vs drift time, with the Total Ion Spectrum of each sample.
RIC: A matrix of n_samples vs retention time, with the Reverse Ion Chromatogram of each sample.
dt_ref: A numeric reference drift time.
rt_ref: A numeric reference retention time.
userData: A list to store arbitrary data in the dataset.
sampleNames: The sample names of the GCIMSDataset samples.
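As an illustration of the TIS shape described above, the following sketch builds a toy n_samples x drift time matrix. It assumes, for this sketch only, that each sample is an intensity matrix with drift time in rows and retention time in columns, and that the Total Ion Spectrum is the sum of intensities over retention time. The data are random numbers, not GC-IMS measurements.

```r
# Toy data, NOT real GC-IMS samples: 3 samples, each an intensity matrix
# with drift time along rows and retention time along columns (an
# assumption made for this sketch).
set.seed(1)
n_dt <- 5   # number of drift time points
n_rt <- 4   # number of retention time points
samples <- lapply(1:3, function(i) matrix(runif(n_dt * n_rt), nrow = n_dt))
names(samples) <- c("S1", "S2", "S3")

# Assumed TIS definition: sum intensities over retention time, giving one
# value per drift time point for each sample.
tis_per_sample <- lapply(samples, rowSums)

# Stack into an n_samples x n_drift_time matrix, one row per sample.
TIS <- do.call(rbind, tis_per_sample)
dim(TIS)   # 3 5
```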
new(): Create a new GCIMSDataset object.
GCIMSDataset$new(
pData = NULL,
base_dir = NULL,
...,
samples = NULL,
parser = "default",
scratch_dir = NULL,
keep_intermediate = FALSE,
on_ram = FALSE
)
Arguments:
pData: A data frame holding phenotype data for the samples (or NULL). The data frame
should have at least a SampleID column, and a filename column if samples are stored in files.
base_dir: The base directory. Sample i is found at file.path(base_dir, pData$filename[i]).
...: Unused.
samples: A named list of GCIMSSample objects to be included in the dataset (or NULL). Names
should correspond to the SampleID column in the pData data frame.
parser: A function that takes a file path and returns a GCIMSSample object. Use "default" for the
default parser in the GCIMS package, which supports .mea files (from GAS). See
vignette("importing-custom-data-formats", package = "GCIMS") for more information.
scratch_dir: A directory where intermediate and processed samples will be stored.
keep_intermediate: If TRUE, intermediate results will not be deleted (ignored if on_ram is TRUE).
on_ram: If TRUE, samples are not stored on disk but kept in RAM. Set it to TRUE only with
small datasets.
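The pData and base_dir arguments work together: sample i is read from file.path(base_dir, pData$filename[i]). The sketch below only computes those paths for a hypothetical two-sample pData; the directory and file names are made up and the files do not need to exist. Because file.path() is vectorised, all paths come out of a single call.

```r
# Hypothetical pData and base_dir, for illustration only.
base_dir <- file.path(tempdir(), "my_gcims_run")
pData <- data.frame(
  SampleID = c("S1", "S2"),
  filename = c("sample1.mea", "sample2.mea"),
  stringsAsFactors = FALSE
)

# Sample i is expected at file.path(base_dir, pData$filename[i]);
# file.path() is vectorised, so this yields one path per sample.
sample_paths <- file.path(base_dir, pData$filename)
names(sample_paths) <- pData$SampleID
sample_paths
```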
Example:
dummy_dataset <- GCIMSDataset$new(
pData = data.frame(SampleID = character(0), filename = character(0)),
base_dir = tempdir()
)
subset(): Subset the samples of the dataset, either in place or by returning a new dataset.
Arguments:
samples: A numeric vector (sample indices), a character vector (sample names),
or a logical vector with one element per sample in the dataset (samples at
TRUE positions are kept).
inplace: If TRUE, the subset happens in place; otherwise a subsetted copy is returned.
new_scratch_dir: A new scratch directory, only used if inplace = FALSE and the dataset is on disk.
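The samples argument accepts three kinds of vectors. The helper resolve_sample_indices() below is hypothetical and not part of GCIMS; it sketches one way numeric, character and logical selections can all be reduced to the same integer indices, following the rules stated above.

```r
# Hypothetical helper, NOT part of the GCIMS API: maps the three accepted
# forms of `samples` to integer sample indices.
resolve_sample_indices <- function(samples, sample_names) {
  n <- length(sample_names)
  if (is.logical(samples)) {
    # Logical selection must have one element per sample.
    if (length(samples) != n) {
      stop("A logical `samples` vector must have one element per sample")
    }
    return(which(samples))
  }
  if (is.character(samples)) {
    # Character selection is matched against the sample names.
    idx <- match(samples, sample_names)
    if (anyNA(idx)) {
      stop("Unknown sample names: ", paste(samples[is.na(idx)], collapse = ", "))
    }
    return(idx)
  }
  if (is.numeric(samples)) {
    # Numeric selection is taken as 1-based sample indices.
    if (any(samples < 1 | samples > n)) stop("Sample indices out of range")
    return(as.integer(samples))
  }
  stop("`samples` must be numeric, character or logical")
}

sample_names <- c("S1", "S2", "S3", "S4")
resolve_sample_indices(c(2, 4), sample_names)                     # 2 4
resolve_sample_indices(c("S2", "S4"), sample_names)               # 2 4
resolve_sample_indices(c(FALSE, TRUE, FALSE, TRUE), sample_names) # 2 4
```

With a real dataset, any of these three vectors would be passed directly as the samples argument of subset().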
.impl__subset__(): Do not call this method. It performs an in-place subset. Use
obj$subset(samples, inplace = TRUE) instead.
appendDelayedOp(): Append a delayed operation to the dataset, so that it runs later instead of immediately.
operation: A DelayedOperation object.
realize(): Execute all pending operations on the dataset.
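The appendDelayedOp()/realize() pair follows a deferred-execution pattern: operations are queued first and applied together later. The toy make_lazy_dataset() below is hypothetical and much simpler than the GCIMS DelayedOperation machinery; in particular, its getSample() never triggers realize(), which is a simplification of this sketch only.

```r
# Hypothetical toy, NOT the GCIMS DelayedOperation machinery: operations are
# queued as plain functions and only applied to the samples by realize().
make_lazy_dataset <- function(samples) {
  pending <- list()
  list(
    appendDelayedOp = function(operation) {
      pending[[length(pending) + 1L]] <<- operation   # queue, do not run
      invisible(NULL)
    },
    realize = function() {
      # Apply every pending operation, in order, to every sample.
      for (op in pending) samples <<- lapply(samples, op)
      pending <<- list()
      invisible(NULL)
    },
    n_pending = function() length(pending),
    getSample = function(name) samples[[name]]
  )
}

ds <- make_lazy_dataset(list(S1 = c(1, 2, 3), S2 = c(4, 5, 6)))
ds$appendDelayedOp(function(x) x - min(x))   # queued, not applied yet
ds$appendDelayedOp(function(x) x * 10)       # queued, not applied yet
ds$getSample("S1")                           # still 1 2 3
ds$realize()                                 # apply both operations in order
ds$getSample("S1")                           # 0 10 20
```

Queuing lets several steps be combined into a single pass over the samples, which matters most when each sample has to be read from disk.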
getSample(): Get a sample from a GCIMSDataset.
getRIC(): Get the Reverse Ion Chromatogram.
copy(): Create a copy of the dataset. If the dataset is stored on disk, a
new scratch_dir must be used.
updateScratchDir(): For on-disk datasets, copy all samples to a new scratch directory.
This is useful when creating copies of the dataset with the dataset$copy() method.
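Because R6 objects are built on environments, they have reference semantics: assigning a dataset to a new variable does not copy it, which is presumably why an explicit copy() method is provided. The base-R sketch below uses a plain environment, not GCIMS or R6, to show the difference. For on-disk datasets, duplicating the R object alone would also leave both copies pointing at the same sample files, which is consistent with the requirement for a new scratch_dir.

```r
# Base-R sketch of reference semantics using an environment
# (R6 objects are environments too); this does not use GCIMS or R6.
ds_a <- new.env()
ds_a$sampleNames <- c("S1", "S2", "S3")

ds_b <- ds_a                 # same object, NOT a copy
ds_b$sampleNames <- "S1"
ds_a$sampleNames             # "S1": ds_a changed as well

# An explicit copy gets independent state (here: copy the env contents).
ds_c <- list2env(as.list(ds_a), envir = new.env())
ds_c$sampleNames <- "S9"
ds_a$sampleNames             # still "S1"
```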
getCurrentDir(): For on-disk datasets, get the directory where processed samples are being saved.