GCIMSDataset is an R6 class to store a dataset.

When the dataset is created, the on_ram option controls whether the sample data is kept in memory or read from and saved to files as needed. The file-backed mode lets the dataset object scale to a large number of samples.

Constructors:


Constructor new_from_list()

Create a new GCIMSDataset object from a list of samples. Note that with this constructor on_ram is TRUE by default, whereas GCIMSDataset$new() defaults to on_ram = FALSE.

Usage

GCIMSDataset$new_from_list(
  samples,
  pData = NULL,
  scratch_dir = NULL,
  keep_intermediate = FALSE,
  on_ram = TRUE
)

Arguments

See GCIMSDataset$new()

Constructor new_from_saved_dir()

Creates a new GCIMSDataset object from a directory where a GCIMSDataset with on_ram=FALSE was saved.

Usage

GCIMSDataset$new_from_saved_dir(
  input_dir,
  scratch_dir = dirname(input_dir)
)

Arguments

  • input_dir: The path to the directory that contains the saved dataset.rds file and all the corresponding sample_*.rds files. Typically a subdirectory of scratch_dir.

  • scratch_dir: The new scratch directory where samples produced by further processing will be saved. By default it is the parent of input_dir.
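
As a minimal sketch of the documented default, the snippet below derives scratch_dir from input_dir and looks for the files named above before loading. The directory names are hypothetical, and the load is guarded so that it only runs when the GCIMS package is installed and a saved dataset is actually present.

```r
# Hypothetical layout: <scratch_dir>/saved_dataset/{dataset.rds, sample_*.rds}
input_dir <- file.path(tempdir(), "gcims_scratch", "saved_dataset")

# Documented default for scratch_dir: the parent directory of input_dir
default_scratch_dir <- dirname(input_dir)

# Files that the argument description says should live in input_dir
has_dataset_rds <- file.exists(file.path(input_dir, "dataset.rds"))
sample_files <- list.files(input_dir, pattern = "^sample_.*\\.rds$")

# Only attempt to load when GCIMS is installed and a saved dataset exists
if (requireNamespace("GCIMS", quietly = TRUE) && has_dataset_rds) {
  dataset <- GCIMS::GCIMSDataset$new_from_saved_dir(input_dir)
}
```

Passing scratch_dir explicitly is only needed when further processing should write somewhere other than the parent of input_dir.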

Public fields

pData

A data frame with at least the SampleID and filename columns.

align

To store alignment results

peaks

To store the peak list

TIS

A matrix of n_samples vs drift time, with the Total Ion Spectrum of each sample

RIC

A matrix of n_samples vs retention time, with the Reverse Ion Chromatogram of each sample

dt_ref

A numeric drift time of reference

rt_ref

A numeric retention time of reference

userData

A list to store arbitrary data in the dataset

Active bindings

sampleNames

The sample names of the GCIMSDataset samples

Methods


Method new()

Create a new GCIMSDataset object

Usage

GCIMSDataset$new(
  pData = NULL,
  base_dir = NULL,
  ...,
  samples = NULL,
  parser = "default",
  scratch_dir = NULL,
  keep_intermediate = FALSE,
  on_ram = FALSE
)

Arguments

pData

A data frame holding phenotype data for the samples (or NULL). The data frame should at least have a SampleID column, and a filename column if samples are stored in files.

base_dir

The base directory. Sample i is found on file.path(base_dir, pData$filename[i]).

...

Unused

samples

A named list of GCIMSSample objects to be included in the dataset (or NULL). Names should correspond to the SampleID column in the pData data frame.

parser

Function that takes a file path and returns a GCIMSSample object. Use "default" to use the default parser in the GCIMS package, that supports .mea files (from GAS). Check out vignette("importing-custom-data-formats", package = "GCIMS") for more information

scratch_dir

A directory where intermediate and processed samples will be stored

keep_intermediate

If TRUE, intermediate results will not be deleted (ignored if on_ram is TRUE).

on_ram

If TRUE, samples are not stored on disk, but rather kept on RAM. Set it to TRUE only with small datasets.

Examples

dummy_dataset <- GCIMSDataset$new(
  pData = data.frame(SampleID = character(0), filename = character(0)),
  base_dir = tempdir()
)
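
The rule for locating sample files can be checked before building a dataset. This sketch uses only base R; the sample IDs and file names are placeholders, not files shipped with any package.

```r
# Hypothetical phenotype table; the file names are placeholders
pData <- data.frame(
  SampleID = c("sample_a", "sample_b"),
  filename = c("sample_a.mea", "sample_b.mea"),
  stringsAsFactors = FALSE
)
base_dir <- tempdir()

# Documented rule: sample i is found at file.path(base_dir, pData$filename[i])
sample_paths <- file.path(base_dir, pData$filename)

# Report which samples have no file at the resolved location
missing_ids <- pData$SampleID[!file.exists(sample_paths)]
```

Checking missing_ids up front is a design choice for this sketch; the documentation does not state when the dataset itself reads the files.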


Method print()

Prints the dataset to the screen.

Usage

GCIMSDataset$print()


Method subset()

Create a new dataset containing a subset of the samples

Usage

GCIMSDataset$subset(samples, inplace = FALSE, new_scratch_dir = NA)

Arguments

samples

A numeric vector (sample indices), a character vector (sample names), or a logical vector whose length equals the number of samples in the dataset (samples at TRUE positions are kept).

inplace

If TRUE, the subset happens in place; otherwise a copy is returned.

new_scratch_dir

A new scratch directory, only used if inplace=FALSE and the dataset is on-disk.

Returns

A GCIMSDataset (new or the current one depending on inplace), with the requested sample subset
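
The three accepted forms of the samples argument can be illustrated with base R indexing on a toy vector of sample names. This is a sketch under the assumption that subset() follows the same selection semantics as base R. The sample names and the subset_copy() helper are hypothetical.

```r
# Toy sample names standing in for dataset$sampleNames (hypothetical values)
sample_names <- c("s1", "s2", "s3", "s4")
names(sample_names) <- sample_names

by_index <- c(1, 3)                       # numeric: sample indices
by_name  <- c("s1", "s3")                 # character: sample names
by_mask  <- c(TRUE, FALSE, TRUE, FALSE)   # logical: one element per sample

# In base R indexing terms, all three pick the same two samples
selected <- list(
  index = unname(sample_names[by_index]),
  name  = unname(sample_names[by_name]),
  mask  = unname(sample_names[by_mask])
)

# Hypothetical helper: return a subset copy and leave `dataset` untouched.
# new_scratch_dir is only used for on-disk datasets, per the argument docs.
subset_copy <- function(dataset, samples, new_scratch_dir = NA) {
  dataset$subset(samples, inplace = FALSE, new_scratch_dir = new_scratch_dir)
}
```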


Method .impl__subset__()

Do not call this method directly. It performs an in-place subset. Use obj$subset(samples, inplace = TRUE) instead.

Usage

GCIMSDataset$.impl__subset__(samples)

Arguments

samples

A numeric vector (sample indices), a character vector (sample names), or a logical vector whose length equals the number of samples in the dataset (samples at TRUE positions are kept).

Returns

The given GCIMSDataset object, with a subset of the samples


Method appendDelayedOp()

Appends a delayed operation to the dataset so it will run afterwards

Usage

GCIMSDataset$appendDelayedOp(operation)

Arguments

operation

A DelayedOperation object

Returns

The modified GCIMSDataset object


Method hasDelayedOps()

Find out if the dataset has pending operations

Usage

GCIMSDataset$hasDelayedOps()

Returns

Returns TRUE if the dataset has pending operations, FALSE otherwise


Method realize()

Execute all pending operations on the dataset

Usage

GCIMSDataset$realize(keep_intermediate = NA)

Arguments

keep_intermediate

Logical or NA. Whether to keep intermediate result files; only relevant when the dataset is on disk. If NA, the keep_intermediate option given when the dataset was initialized is used.

Returns

The dataset object, invisibly
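
A common pattern with delayed operations is to run them only when something is pending. The helper below is a hypothetical sketch that uses only the hasDelayedOps() and realize() methods documented on this page; dataset is assumed to be a GCIMSDataset.

```r
# Hypothetical helper: execute pending operations only when there are any.
# keep_intermediate = NA defers to the option given at dataset creation.
realize_if_pending <- function(dataset, keep_intermediate = NA) {
  if (dataset$hasDelayedOps()) {
    dataset$realize(keep_intermediate = keep_intermediate)
  }
  # Return the dataset invisibly, mirroring what realize() is documented to do
  invisible(dataset)
}
```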


Method getSample()

Get a sample from a GCIMSDataset

Usage

GCIMSDataset$getSample(sample)

Arguments

sample

Either an integer (sample index) or a string (sample name)

Returns

The GCIMSSample object


Method extract_dtime_rtime()

Sets an action to extract the reference retention and drift times

Usage

GCIMSDataset$extract_dtime_rtime()


Method getRIC()

Get the Reverse Ion Chromatogram

Usage

GCIMSDataset$getRIC()

Returns

A matrix with the reverse ion chromatograms for all samples
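
As a small usage sketch, the matrix returned by getRIC() can be summarised per sample. The helper below is hypothetical and assumes rows correspond to samples and columns to retention time points, as described for the RIC field.

```r
# Hypothetical helper: for each sample, the column index (retention time
# point) where its RIC takes its largest value.
ric_peak_index <- function(dataset) {
  ric <- dataset$getRIC()
  # Row-wise position of the maximum; rows are assumed to be samples
  apply(ric, 1, which.max)
}
```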


Method extract_RIC_and_TIS()

Extracts the RIC and the TIS

Usage

GCIMSDataset$extract_RIC_and_TIS()

Returns

The GCIMSDataset


Method is_on_disk()

Whether the dataset is saved on disk or stored in RAM

Usage

GCIMSDataset$is_on_disk()

Returns

TRUE if on disk, FALSE otherwise


Method copy()

Creates a copy of the dataset. If the dataset is stored on disk, then a new scratch_dir must be used.

Usage

GCIMSDataset$copy(scratch_dir = NA)

Arguments

scratch_dir

The scratch directory where samples being processed will be stored, if the copy is on disk.

Returns

A new GCIMSDataset object
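
Because an on-disk dataset needs a new scratch directory for its copy, a wrapper can choose the call based on is_on_disk(). This is a hypothetical helper built only from the methods documented here; whether the scratch directory must already exist is not stated on this page, so the sketch does not create it.

```r
# Hypothetical helper: copy a dataset, supplying a fresh scratch directory
# only when the dataset is stored on disk.
copy_dataset <- function(dataset, new_scratch_dir = tempfile("gcims_copy_")) {
  if (dataset$is_on_disk()) {
    dataset$copy(scratch_dir = new_scratch_dir)
  } else {
    # In-RAM datasets: rely on the documented default (scratch_dir = NA)
    dataset$copy()
  }
}
```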


Method updateScratchDir()

For on-disk datasets, copy all samples to a new scratch dir. This is useful when creating copies of the dataset, using the dataset$copy() method.

Usage

GCIMSDataset$updateScratchDir(scratch_dir, override_current_dir = NULL)

Arguments

scratch_dir

The new scratch_dir, must be different from the current one

override_current_dir

Typically used only internally, overrides the location of the samples. Useful when we are loading a dataset from a directory and the directory was moved since it was saved.


Method getCurrentDir()

Get the directory where processed samples are being saved, for on-disk datasets.

Usage

GCIMSDataset$getCurrentDir()

Returns

Either a path or NULL. NULL is returned if samples have not been saved (either because they have not been loaded yet or because the dataset is stored in RAM).
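
Since the return value may be NULL, callers should test for it before using the path. A minimal sketch with a hypothetical helper:

```r
# Hypothetical helper: describe where a dataset's samples currently live,
# handling the documented NULL return value explicitly.
describe_storage <- function(dataset) {
  current_dir <- dataset$getCurrentDir()
  if (is.null(current_dir)) {
    "no samples saved on disk"
  } else {
    paste("samples saved in", current_dir)
  }
}
```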


Method clone()

The objects of this class are cloneable with this method.

Usage

GCIMSDataset$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.

Examples


## ------------------------------------------------
## Method `GCIMSDataset$new`
## ------------------------------------------------

dummy_dataset <- GCIMSDataset$new(
  pData = data.frame(SampleID = character(0), filename = character(0)),
  base_dir = tempdir()
)