GCIMSDataset.Rd
GCIMSDataset is an R6 class to store a dataset.
When the dataset is created, the on_ram option controls whether the actual
data is stored in memory or is read from and saved to files as
needed, so that the dataset object scales to a large number of samples.
GCIMSDataset$new_from_list()
: Create a new GCIMSDataset from a list of samples
GCIMSDataset$new_from_saved_dir()
: Create a new on-disk GCIMSDataset from a directory
new_from_list()
Create a new GCIMSDataset object from a list of samples. Note that with this
constructor, on_ram is TRUE by default.
pData
A data frame with at least the SampleID and filename columns.
align
To store alignment results
peaks
To store the peak list
TIS
A matrix of n_samples vs drift time, with the Total Ion Spectrum of each sample
RIC
A matrix of n_samples vs retention time, with the Reverse Ion Chromatogram of each sample
dt_ref
A numeric reference drift time
rt_ref
A numeric reference retention time
userData
A list to store arbitrary data in the dataset
sampleNames
The sample names of the GCIMSDataset samples
new()
Create a new GCIMSDataset object
GCIMSDataset$new(
pData = NULL,
base_dir = NULL,
...,
samples = NULL,
parser = "default",
scratch_dir = NULL,
keep_intermediate = FALSE,
on_ram = FALSE
)
pData
A data frame holding phenotype data for the samples (or NULL). The data frame
should have at least a SampleID column, and a filename column if samples are
stored in files.
base_dir
The base directory. Sample i is found at file.path(base_dir, pData$filename[i]).
...
Unused
samples
A named list of GCIMSSample objects to be included in the dataset (or NULL).
Names should correspond to the SampleID column in the pData data frame.
parser
Function that takes a file path and returns a GCIMSSample object. Use "default"
to use the default parser in the GCIMS package, which supports .mea files (from GAS).
See vignette("importing-custom-data-formats", package = "GCIMS") for more information.
scratch_dir
A directory where intermediate and processed samples will be stored
keep_intermediate
If TRUE, intermediate results will not be deleted (ignored if on_ram is TRUE).
on_ram
If TRUE, samples are not stored on disk but kept in RAM. Set it to TRUE only
with small datasets.
dummy_dataset <- GCIMSDataset$new(
pData = data.frame(SampleID = character(), filename = character(0)),
base_dir = tempdir()
)
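The pData and base_dir arguments described above determine where each sample file is looked up. The following is a minimal base R sketch of that lookup; it does not call GCIMS, and the sample IDs, file names, and directory are hypothetical values chosen for illustration.

```r
# Hypothetical phenotype table. SampleID and filename are the columns the
# documentation asks for; the values are made up.
pdata <- data.frame(
  SampleID = c("sample_a", "sample_b"),
  filename = c("a.mea", "b.mea"),
  stringsAsFactors = FALSE
)
base_dir <- file.path(tempdir(), "gcims_raw")  # hypothetical base directory

# Check the documented column requirement before building a dataset.
required_cols <- c("SampleID", "filename")
missing_cols <- setdiff(required_cols, colnames(pdata))
stopifnot(length(missing_cols) == 0)

# Sample i is expected at file.path(base_dir, pData$filename[i]).
sample_paths <- file.path(base_dir, pdata$filename)
names(sample_paths) <- pdata$SampleID
print(sample_paths)
```

A table like pdata, together with base_dir, is what would be passed as the pData and base_dir arguments of GCIMSDataset$new().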
subset()
Create a new dataset containing a subset of the samples
samples
A numeric vector (sample indices), a character vector (sample names),
or a logical vector with length equal to the number of samples in the
dataset (samples whose element is TRUE are kept)
inplace
If TRUE, the subset happens in place; otherwise, the subset is returned as a copy.
new_scratch_dir
A new scratch directory, only used if inplace = FALSE and the dataset is on-disk.
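The three accepted forms of the samples argument can be illustrated on a plain character vector. This is a base R sketch of the selection semantics only, assuming the method follows ordinary R indexing conventions; the sample names are hypothetical.

```r
sample_names <- c("s1", "s2", "s3", "s4")  # hypothetical sample names

# Numeric vector: positions of the samples to keep.
by_index <- sample_names[c(1, 3)]

# Character vector: names of the samples to keep.
by_name <- sample_names[match(c("s1", "s3"), sample_names)]

# Logical vector: one element per sample, TRUE marks the samples to keep.
by_mask <- sample_names[c(TRUE, FALSE, TRUE, FALSE)]

# All three forms describe the same subset.
stopifnot(identical(by_index, by_name), identical(by_name, by_mask))
print(by_index)
```

With a dataset object obj, any of these vectors would be passed as the samples argument, for example obj$subset(c("s1", "s3"), inplace = TRUE).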
.impl__subset__()
Do not call this method. It performs an in-place subset. Use
obj$subset(samples, inplace = TRUE) instead.
appendDelayedOp()
Appends a delayed operation to the dataset so that it runs later, when the pending operations are executed
operation
A DelayedOperation object
realize()
Execute all pending operations on the dataset
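The relationship between appending a delayed operation and realizing the dataset can be sketched with a toy queue. This is not the GCIMS implementation: the closure-based object and all of its function names are hypothetical, and plain functions stand in for DelayedOperation objects.

```r
# Toy stand-in for a dataset that queues operations and runs them on demand.
make_toy_dataset <- function(values) {
  pending <- list()
  list(
    # Queue an operation without running it.
    append_delayed_op = function(fun) {
      pending[[length(pending) + 1L]] <<- fun
      invisible(NULL)
    },
    # Run every queued operation on every sample, in order, then clear the queue.
    realize = function() {
      for (fun in pending) {
        values <<- lapply(values, fun)
      }
      pending <<- list()
      invisible(NULL)
    },
    n_pending = function() length(pending),
    get_values = function() values
  )
}

toy <- make_toy_dataset(list(a = 1:3, b = 4:6))
toy$append_delayed_op(function(x) x * 2)
toy$append_delayed_op(function(x) x + 1)
pending_before <- toy$n_pending()  # operations are queued, nothing has run yet
toy$realize()
pending_after <- toy$n_pending()   # the queue is empty after realizing
result <- toy$get_values()
```

Deferring work this way lets several processing steps be applied in one pass over the samples, which matters when each sample has to be read from disk.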
getSample()
Get a sample from a GCIMSDataset
getRIC()
Get the Reverse Ion Chromatogram
copy()
Creates a copy of the dataset. If the dataset is stored on disk, then
a new scratch_dir must be used.
updateScratchDir()
For on-disk datasets, copy all samples to a new scratch directory.
This is useful when creating copies of the dataset using the dataset$copy() method.
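The reason a copy of an on-disk dataset needs its own scratch directory can be shown with plain files. The sketch below uses base R only; the directory names and the .rds file format are assumptions made for illustration, not a description of how GCIMS stores samples.

```r
# Hypothetical scratch directories for the original dataset and its copy.
old_scratch <- file.path(tempdir(), "scratch_old")
new_scratch <- file.path(tempdir(), "scratch_new")
dir.create(old_scratch, showWarnings = FALSE)
dir.create(new_scratch, showWarnings = FALSE)

# Fake processed samples stored in the original scratch directory.
saveRDS(1:3, file.path(old_scratch, "sample_a.rds"))
saveRDS(4:6, file.path(old_scratch, "sample_b.rds"))

# Copy every sample file to the new scratch directory.
sample_files <- list.files(old_scratch, full.names = TRUE)
copied <- file.copy(sample_files, new_scratch, overwrite = TRUE)
stopifnot(all(copied))

# Changing a sample in the copy leaves the original untouched.
saveRDS(0L, file.path(new_scratch, "sample_a.rds"))
original_a <- readRDS(file.path(old_scratch, "sample_a.rds"))
copied_b <- readRDS(file.path(new_scratch, "sample_b.rds"))
```

If both objects shared one directory, processing either of them would overwrite the files the other relies on.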
getCurrentDir()
Get the directory where processed samples are being saved, for on-disk datasets.