GCIMSDataset

GCIMSDataset is an R6 class to store a dataset.

When the dataset is created, the on_ram option controls whether the actual data is kept in memory or read from and saved to files as needed. With on-disk storage, the dataset object scales to a large number of samples.

Constructors:

GCIMSDataset$new(): Create a new GCIMSDataset object
GCIMSDataset$new_from_list(): Create a new GCIMSDataset from a list of samples
GCIMSDataset$new_from_saved_dir(): Create a new on disk GCIMSDataset from a directory

Constructor `new_from_list()`

Create a new GCIMSDataset object from a list of samples. Note that with this constructor on_ram is TRUE by default.

Usage

GCIMSDataset$new_from_list(
  samples,
  pData = NULL,
  scratch_dir = NULL,
  keep_intermediate = FALSE,
  on_ram = TRUE
)

Arguments

See GCIMSDataset$new()

Constructor `new_from_saved_dir()`

Creates a new GCIMSDataset object from a directory where a GCIMSDataset with on_ram=FALSE was saved.

Usage

GCIMSDataset$new_from_saved_dir(
  input_dir,
  scratch_dir = dirname(input_dir)
)

Arguments

input_dir: The path to the directory where the dataset.rds is saved and all the corresponding sample_*.rds files are. Typically a subdirectory of scratch_dir.
scratch_dir: The new scratch directory where further processing samples will be saved. By default it is the parent of input_dir.
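
The default for scratch_dir follows from base R's dirname(). The snippet below is a minimal sketch of that relationship, assuming a hypothetical saved-dataset directory called saved_dataset inside a folder called gcims_scratch. Both names are made up for illustration, and the constructor call is left commented out because it needs a real saved dataset.

```r
# Hypothetical layout (names are assumptions, not created here):
#   <tempdir>/gcims_scratch/saved_dataset/dataset.rds
#   <tempdir>/gcims_scratch/saved_dataset/sample_*.rds
input_dir <- file.path(tempdir(), "gcims_scratch", "saved_dataset")

# The documented default for scratch_dir: the parent directory of input_dir
default_scratch_dir <- dirname(input_dir)
print(basename(default_scratch_dir))

# With a real saved dataset one would then call (not run here):
# dataset <- GCIMSDataset$new_from_saved_dir(input_dir)
```

Passing scratch_dir explicitly is only needed when further processing should be written somewhere other than the parent of input_dir.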

Public fields

pData: A data frame with at least the SampleID and filename columns.
align: To store alignment results
peaks: To store the peak list
TIS: A matrix of n_samples vs drift time, with the Total Ion Spectrum of each sample
RIC: A matrix of n_samples vs retention time, with the Reverse Ion Chromatogram of each sample
dt_ref: A numeric drift time of reference
rt_ref: A numeric retention time of reference
userData: A list to store arbitrary data in the dataset

Active bindings

sampleNames: The sample names of the GCIMSDataset samples

Methods

Public methods

GCIMSDataset$new()
GCIMSDataset$print()
GCIMSDataset$subset()
GCIMSDataset$.impl__subset__()
GCIMSDataset$appendDelayedOp()
GCIMSDataset$hasDelayedOps()
GCIMSDataset$realize()
GCIMSDataset$getSample()
GCIMSDataset$extract_dtime_rtime()
GCIMSDataset$getRIC()
GCIMSDataset$extract_RIC_and_TIS()
GCIMSDataset$is_on_disk()
GCIMSDataset$copy()
GCIMSDataset$updateScratchDir()
GCIMSDataset$getCurrentDir()
GCIMSDataset$clone()

Method `new()`

Create a new GCIMSDataset object

Usage

GCIMSDataset$new(
  pData = NULL,
  base_dir = NULL,
  ...,
  samples = NULL,
  parser = "default",
  scratch_dir = NULL,
  keep_intermediate = FALSE,
  on_ram = FALSE
)

Arguments

pData: A data frame holding phenotype data for the samples (or NULL). The data frame should at least have a SampleID column, and a filename column if samples are stored in files.
base_dir: The base directory. Sample i is found on file.path(base_dir, pData$filename[i]).
...: Unused
samples: A named list of GCIMSSample objects to be included in the dataset (or NULL). Names should correspond to the SampleID column in the pData data frame.
parser: Function that takes a file path and returns a GCIMSSample object. Use "default" to use the default parser in the GCIMS package, which supports .mea files (from GAS). See vignette("importing-custom-data-formats", package = "GCIMS") for more information.
scratch_dir: A directory where intermediate and processed samples will be stored
keep_intermediate: If TRUE, intermediate results will not be deleted (ignored if on_ram is TRUE).
on_ram: If TRUE, samples are not stored on disk, but rather kept on RAM. Set it to TRUE only with small datasets.
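
To illustrate how pData and base_dir work together, here is a small sketch in base R. The sample names, file names, the extra Group column and the raw_data folder are all hypothetical. Only the SampleID and filename columns and the file.path() rule come from the argument descriptions above.

```r
# Hypothetical phenotype table; SampleID and filename are the documented columns
pData <- data.frame(
  SampleID = c("sample_A", "sample_B"),
  filename = c("sample_A.mea", "sample_B.mea"),
  Group = c("control", "treated")  # extra column, an assumption for illustration
)
base_dir <- file.path(tempdir(), "raw_data")  # hypothetical location

# Sample i is expected at file.path(base_dir, pData$filename[i])
sample_paths <- file.path(base_dir, pData$filename)
names(sample_paths) <- pData$SampleID
print(sample_paths)

# With real files in place, the dataset would be created as (not run here):
# dataset <- GCIMSDataset$new(pData = pData, base_dir = base_dir, on_ram = FALSE)
```

Checking file.exists(sample_paths) before building the dataset is a cheap way to catch a wrong base_dir early.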

Examples

dummy_dataset <- GCIMSDataset$new(
  pData = data.frame(SampleID = character(), filename = character(0)),
  base_dir = tempdir()
)

Method `print()`

Prints the dataset to the screen

Usage

GCIMSDataset$print()

Method `subset()`

Create a new dataset containing a subset of the samples

Usage

GCIMSDataset$subset(samples, inplace = FALSE, new_scratch_dir = NA)

Arguments

samples: A numeric vector (sample indices), a character vector (sample names) or a logical vector of length equal to the number of samples in the dataset (samples marked TRUE are kept)
inplace: If TRUE, the subset happens in place; otherwise a copy with the subset is returned.
new_scratch_dir: A new scratch directory, only used if inplace=FALSE and the dataset is on-disk.

Returns

A GCIMSDataset (new or the current one depending on inplace), with the requested sample subset
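
The three accepted forms of the samples argument can be compared with plain vector indexing. The sketch below uses a hypothetical vector of sample names and is only an analogy for how the selectors relate to each other, not the method's internal code.

```r
sample_names <- c("s1", "s2", "s3", "s4")  # hypothetical sample names

by_index <- c(1, 3)                        # numeric: sample indices
by_name <- c("s1", "s3")                   # character: sample names
by_mask <- c(TRUE, FALSE, TRUE, FALSE)     # logical: one element per sample

# Resolve every selector to sample names using plain vector indexing
picked_index <- sample_names[by_index]
picked_name <- sample_names[match(by_name, sample_names)]
picked_mask <- sample_names[by_mask]

# All three selectors pick the same samples
stopifnot(identical(picked_index, picked_name), identical(picked_name, picked_mask))

# On a real dataset the corresponding calls would be (not run here):
# subset_ds <- dataset$subset(by_name)        # returns a copy
# dataset$subset(by_mask, inplace = TRUE)     # modifies dataset itself
```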

Method `.impl__subset__()`

Do not call this method. It does an inplace subset. Use obj$subset(samples, inplace = TRUE) instead

Usage

GCIMSDataset$.impl__subset__(samples)

Arguments

samples: A numeric vector (sample indices), a character vector (sample names) or a logical vector of length equal to the number of samples in the dataset (samples marked TRUE are kept)

Returns

The given GCIMSDataset object, with a subset of the samples

Method `appendDelayedOp()`

Appends a delayed operation to the dataset so it will run afterwards

Usage

GCIMSDataset$appendDelayedOp(operation)

Arguments

operation: A DelayedOperation object

Returns

The modified GCIMSDataset object

Method `hasDelayedOps()`

Find out if the dataset has pending operations

Usage

GCIMSDataset$hasDelayedOps()

Returns

Returns TRUE if the dataset has pending operations, FALSE otherwise

Method `realize()`

Execute all pending operations on the dataset

Usage

GCIMSDataset$realize(keep_intermediate = NA)

Arguments

keep_intermediate: logical or NA. Only when the analysis is on disk, keep intermediate result files. If NA, the keep_intermediate option given at the dataset initialization takes precedence.

Returns

The dataset object, invisibly
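
The interplay of appendDelayedOp(), hasDelayedOps() and realize() is a queue-then-run pattern. The toy sketch below shows that pattern in base R with plain lists and functions. Every name in it (make_lazy, append_op, has_ops, realize_ops) is hypothetical, and it makes no claim about how GCIMS implements delayed operations. In particular, it uses ordinary value semantics rather than an R6 object.

```r
# Toy stand-in for a dataset: some data plus a queue of pending operations
make_lazy <- function(data) list(data = data, pending = list())

# Queue an operation without running it (analogous to appendDelayedOp())
append_op <- function(obj, fun) {
  obj$pending <- c(obj$pending, list(fun))
  obj
}

# Report whether anything is queued (analogous to hasDelayedOps())
has_ops <- function(obj) length(obj$pending) > 0

# Run the queued operations in order, then clear the queue (analogous to realize())
realize_ops <- function(obj) {
  for (fun in obj$pending) obj$data <- fun(obj$data)
  obj$pending <- list()
  obj
}

obj <- make_lazy(c(1, 2, 3))
obj <- append_op(obj, function(x) x * 2)
obj <- append_op(obj, function(x) x + 1)
queued <- has_ops(obj)       # operations are pending, data is still untouched
obj <- realize_ops(obj)      # both operations run now, in the order they were queued
```

Deferring work this way lets several processing steps be applied in a single pass, which matters most when samples live on disk.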

Method `getSample()`

Get a sample from a GCIMSDataset

Usage

GCIMSDataset$getSample(sample)

Arguments

sample: Either an integer (sample index) or a string (sample name)

Returns

The GCIMSSample object

Method `extract_dtime_rtime()`

Sets an action to extract the reference retention and drift times

Usage

GCIMSDataset$extract_dtime_rtime()

Method `getRIC()`

Get the Reverse Ion Chromatogram

Usage

GCIMSDataset$getRIC()

Returns

A matrix with the reverse ion chromatograms for all samples

Method `extract_RIC_and_TIS()`

Extracts the RIC and the TIS

Usage

GCIMSDataset$extract_RIC_and_TIS()

Returns

The GCIMSDataset

Method `is_on_disk()`

Whether the dataset is saved on disk or stored in RAM

Usage

GCIMSDataset$is_on_disk()

Returns

TRUE if on disk, FALSE otherwise

Method `copy()`

Creates a copy of the dataset. If the dataset is stored on disk, then a new scratch_dir must be used.

Usage

GCIMSDataset$copy(scratch_dir = NA)

Arguments

scratch_dir: The scratch directory where samples being processed will be stored, if the copy is on disk.

Returns

A new GCIMSDataset object

Method `updateScratchDir()`

For on-disk datasets, copy all samples to a new scratch dir. This is useful when creating copies of the dataset, using the dataset$copy() method.

Usage

GCIMSDataset$updateScratchDir(scratch_dir, override_current_dir = NULL)

Arguments

scratch_dir: The new scratch_dir, must be different from the current one
override_current_dir: Typically used only internally, overrides the location of the samples. Useful when we are loading a dataset from a directory and the directory was moved since it was saved.

Method `getCurrentDir()`

Get the directory where processed samples are being saved, for on-disk datasets.

Usage

GCIMSDataset$getCurrentDir()

Returns

Either a path or NULL. NULL is returned if samples have not been saved (either because they have not been loaded or because the dataset is stored on RAM).

Method `clone()`

The objects of this class are cloneable with this method.

Usage

GCIMSDataset$clone(deep = FALSE)

Arguments

deep: Whether to make a deep clone.

Examples


## ------------------------------------------------
## Method `GCIMSDataset$new`
## ------------------------------------------------

dummy_dataset <- GCIMSDataset$new(
  pData = data.frame(SampleID = character(), filename = character(0)),
  base_dir = tempdir()
)
