DelayedOperation.Rd
Delayed operations enable us to process our samples faster on big datasets. See the Details section for how they work.
A name for the delayed operation, only used for printing.
A function that takes a sample and returns a modified sample
A named list with additional arguments to be passed to function
A named list with additional arguments to be passed to function. Compared to params, each argument must itself be a named list with one element per sample, so that each sample receives its corresponding parameter according to its name.
A function that takes a modified sample and returns an extracted object.
A function that takes a dataset and a list of extracted objects and returns a modified dataset.
A DelayedOperation object
Let's say we have a pipeline with two actions (e.g. smooth() and detectPeaks()), and we want to apply it to a dataset with two samples (e.g. s1, s2).
The following simple pseudocode executes all actions on all samples. It is only meant to give an idea of how the processing works:
dataset <- list(s1, s2)
actions <- list(smooth, detectPeaks)
for (action in actions) {
  for (i in seq_along(dataset)) {
    dataset[[i]] <- action(dataset[[i]])
  }
}
When the dataset is big, samples are stored on disk, and each one is loaded before and saved after every action:
dataset <- list(s1, s2)
actions <- list(smooth, detectPeaks)
for (action in actions) {
  for (i in seq_along(dataset)) {
    sample <- read_from_disk(i)
    sample <- action(sample)
    save_to_disk(sample, i)
  }
}
We can avoid most of this saving and loading by swapping the loop order, so that each sample is read and written only once:
dataset <- list(s1, s2)
actions <- list(smooth, detectPeaks)
for (i in seq_along(dataset)) {
  sample <- read_from_disk(i)
  for (action in actions) {
    sample <- action(sample)
  }
  save_to_disk(sample, i)
}
This requires that when we apply an operation to the dataset, the operation is delayed, so we can stack many delayed operations and run them all at once.
The DelayedOperation class allows us to store all pending actions and run them afterwards when the data is needed.
Besides, samples can be processed in parallel if enough cores and RAM are available.
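The idea of stacking pending actions and running them in a single pass can be sketched in a few lines of R. This is only an illustration under stated assumptions, not the package's implementation: the "disk" is simulated with an environment, and store, read_from_disk(), save_to_disk(), delay(), realize(), toy_scale() and toy_center() are all hypothetical names introduced for this example.

```r
# Hypothetical stand-in for disk storage: an environment holding the samples
# and counters for how many reads and writes were performed.
store <- new.env()
store$samples <- list(s1 = c(2, 4, 6), s2 = c(1, 3, 5))
store$reads <- 0L
store$writes <- 0L

read_from_disk <- function(name) {
  store$reads <- store$reads + 1L
  store$samples[[name]]
}

save_to_disk <- function(sample, name) {
  store$writes <- store$writes + 1L
  store$samples[[name]] <- sample
  invisible(NULL)
}

# Toy actions (illustrative only, unrelated to smooth() or detectPeaks())
toy_scale <- function(x) x / max(x)
toy_center <- function(x) x - mean(x)

# "Delaying" an operation just appends it to the list of pending actions.
delay <- function(pending, fun) c(pending, list(fun))

# "Realizing" runs every pending action on each sample,
# with a single read and a single write per sample.
realize <- function(sample_names, pending) {
  for (name in sample_names) {
    sample <- read_from_disk(name)
    for (fun in pending) {
      sample <- fun(sample)
    }
    save_to_disk(sample, name)
  }
  invisible(NULL)
}

pending <- list()
pending <- delay(pending, toy_scale)   # nothing is computed yet
pending <- delay(pending, toy_center)  # still nothing computed
realize(names(store$samples), pending)
```

With two samples and two actions, this sketch performs two reads and two writes, whereas running each action over the whole dataset separately would need four of each. Parallel processing is left out to keep the example short.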
The DelayedOperation class also covers the case where we want to extract
some information from each sample (e.g. the Reverse Ion Chromatogram, RIC)
and build a matrix with the RICs of all samples. This changes the loops above:
after each action has modified a sample, we can extract something from that
sample and keep it. Once all actions have been executed, we can aggregate the
extracted results and save them into the dataset. This is used, for instance,
in the getRIC() implementation, to extract the RIC from each sample and
afterwards aggregate the RICs into a matrix. It is implemented here with the
fun_extract and fun_aggregate functions.
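The extract and aggregate steps can be illustrated with a minimal R sketch. The dataset layout (a list with samples and summary elements) and the names double_sample(), extract_total(), aggregate_totals() and run_with_extraction() are assumptions made for this example; only the three roles (a function that modifies a sample, one that extracts an object from the modified sample, and one that aggregates the extracted objects into the dataset) come from the argument descriptions.

```r
# Hypothetical dataset: samples kept in memory, plus a slot for aggregated results.
dataset <- list(
  samples = list(s1 = c(2, 4, 6), s2 = c(1, 3, 5)),
  summary = NULL
)

# Role of the main function: takes a sample and returns a modified sample.
double_sample <- function(sample) sample * 2

# Role of fun_extract: takes the modified sample and returns an extracted object.
extract_total <- function(sample) sum(sample)

# Role of fun_aggregate: takes the dataset and the list of extracted objects,
# and returns a modified dataset.
aggregate_totals <- function(dataset, extracted) {
  dataset$summary <- unlist(extracted)
  dataset
}

# Hypothetical driver: modify each sample, extract from it, then aggregate once.
run_with_extraction <- function(dataset, fun, fun_extract, fun_aggregate) {
  sample_names <- names(dataset$samples)
  extracted <- vector("list", length(sample_names))
  names(extracted) <- sample_names
  for (name in sample_names) {
    sample <- fun(dataset$samples[[name]])
    extracted[[name]] <- fun_extract(sample)  # extract right after modifying
    dataset$samples[[name]] <- sample
  }
  fun_aggregate(dataset, extracted)           # aggregate after all samples ran
}

dataset <- run_with_extraction(
  dataset, double_sample, extract_total, aggregate_totals
)
```

After this runs, dataset$summary holds one total per sample, named after the samples. In the getRIC() case described above, the extracted object would be the RIC of each sample and the aggregation would bind them into a matrix.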