Pipelines

Uses nmr_pca_outliers_robust to perform the detection of outliers

Normalize the full spectra to the internal calibrant region, then exclude that region and finally perform PQN normalization.

Usage

pipe_load_samples(samples_dir, glob = "*0", output_dir = NULL)

pipe_add_metadata(nmr_dataset_rds, excel_file, output_dir)

pipe_interpolate_1D(nmr_dataset_rds, axis, output_dir)

pipe_exclude_regions(nmr_dataset_rds, exclude, output_dir)

pipe_outlier_detection(nmr_dataset_rds, output_dir)

pipe_filter_samples(nmr_dataset_rds, conditions, output_dir)

pipe_peakdet_align(
  nmr_dataset_rds,
  nDivRange_ppm = 0.1,
  scales = seq(1, 16, 2),
  baselineThresh = 0.01,
  SNR.Th = -1,
  maxShift_ppm = 0.0015,
  acceptLostPeak = FALSE,
  output_dir = NULL
)

pipe_peak_integration(
  nmr_dataset_rds,
  peak_det_align_dir,
  peak_width_ppm,
  output_dir
)

pipe_normalization(
  nmr_dataset_rds,
  internal_calibrant = NULL,
  output_dir = NULL
)

Arguments

samples_dir

The directory where the samples are

glob

A wildcard aka globbing pattern (e.g. *.csv) passed on to grep() to filter paths.
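As a rough illustration of how a globbing pattern filters paths, base R's glob2rx() converts a wildcard into a regular expression that grepl() can apply. The path names below are hypothetical, and this is only a sketch of the idea, not necessarily what pipe_load_samples() does internally.

```r
# Hypothetical entries found in a samples directory
paths <- c("10", "20", "21", "samples.zip")

# Translate each wildcard pattern into a regular expression
rx_default <- utils::glob2rx("*0")     # the default glob shown in Usage
rx_zip <- utils::glob2rx("*.zip")

# Keep only the paths matching each pattern
kept_default <- paths[grepl(rx_default, paths)]
kept_zip <- paths[grepl(rx_zip, paths)]
```

With these inputs, the default pattern keeps the entries ending in 0, while "*.zip" keeps only the zip file.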

output_dir

The output directory for this pipe element

nmr_dataset_rds

The nmr_dataset.rds file name coming from previous nodes

excel_file

An excel file name. See details for the requirements

The excel file can have one or more sheets. The excel sheets need to be as simple as possible: a single header row at the top and values below.

Each of the sheets contains metadata that has to be integrated. The merge (technically a left join) is done using the first column of each sheet as key.

In practical terms this means that the first sheet of the excel file MUST start with an "NMRExperiment" column, followed by any additional columns to add (e.g. FluidXBarcode, SampleCollectionDate, TimePoint and SubjectID).

The second sheet can have as the first column any of the already added columns, for instance the "SubjectID", and any additional columns (e.g. Gender, Age).

The first column on each sheet, named the key column, MUST have unique values. For instance, a sheet starting with "SubjectID" MUST specify each subject ID only once (without repetitions).
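The sheet requirements above can be sketched with base R data frames standing in for the excel sheets. The column values are invented for illustration, and merge() is used here as a generic left join; the package may implement the merge differently.

```r
# Hypothetical first sheet: its first column is NMRExperiment
sheet1 <- data.frame(
  NMRExperiment = c("10", "20", "30"),
  SubjectID = c("Ana", "Ana", "Elia"),
  TimePoint = c("T0", "T31", "T0"),
  stringsAsFactors = FALSE
)

# Hypothetical second sheet: keyed by a column that sheet1 already added
sheet2 <- data.frame(
  SubjectID = c("Ana", "Elia"),
  Gender = c("Female", "Female"),
  Age = c(29, 47),
  stringsAsFactors = FALSE
)

# The key column of each sheet must not contain repeated values
stopifnot(anyDuplicated(sheet1$NMRExperiment) == 0)
stopifnot(anyDuplicated(sheet2$SubjectID) == 0)

# Left join: every row of sheet1 is kept, sheet2 columns are added by key
metadata <- merge(sheet1, sheet2, by = "SubjectID", all.x = TRUE, sort = FALSE)
```

Note that SubjectID may repeat in the first sheet (one subject, several experiments); only the key column of each sheet has to be unique.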

axis

The ppm axis range and optionally the ppm step. Set it to NULL for autodetection

exclude

A list with regions to be removed. Typically: exclude = list(water = c(4.7, 5.0))
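To make the effect of the exclude list concrete, here is a minimal base R sketch that drops the listed ppm ranges from a spectra matrix. The ppm axis and the matrix are hypothetical stand-ins for the data stored in the dataset, and the loop is an assumption about the general approach, not the package's code.

```r
# Hypothetical ppm axis (4.05, 4.15, ..., 5.95) and two flat spectra (rows)
ppm <- seq(405, 595, by = 10) / 100
spectra <- matrix(1, nrow = 2, ncol = length(ppm))

exclude <- list(water = c(4.7, 5.0))

# Flag every ppm point that falls inside any of the excluded regions;
# min()/max() make the sketch tolerant to ranges given in either order
in_excluded <- rep(FALSE, length(ppm))
for (region in exclude) {
  in_excluded <- in_excluded | (ppm >= min(region) & ppm <= max(region))
}

# Drop the flagged points from both the axis and the spectra
ppm_kept <- ppm[!in_excluded]
spectra_kept <- spectra[, !in_excluded, drop = FALSE]
```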

conditions

A character vector with conditions to filter metadata. The conditions parameter should be a character vector of valid R logical conditions. Some examples:

conditions <- 'Gender == "Female"'
conditions <- 'Cohort == "Chuv"'
conditions <- 'TimePoint %in% c("T0", "T31")'
conditions <- c('Cohort == "Chuv"', 'TimePoint %in% c("T0", "T31")')

Only samples fulfilling all the given conditions are kept in further analysis.
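One way to picture how such condition strings select samples is to evaluate each of them against the metadata table and combine the results with a logical AND. The metadata below is hypothetical, and the parse()/eval() approach is only a sketch under that assumption, not necessarily how pipe_filter_samples() is implemented.

```r
# Hypothetical metadata table, one row per sample
metadata <- data.frame(
  NMRExperiment = c("10", "20", "30", "40"),
  Cohort = c("Chuv", "Chuv", "Other", "Chuv"),
  TimePoint = c("T0", "T31", "T0", "T60"),
  stringsAsFactors = FALSE
)

conditions <- c('Cohort == "Chuv"', 'TimePoint %in% c("T0", "T31")')

# Evaluate each condition with the metadata columns in scope and
# keep only the samples for which every condition is TRUE
keep <- rep(TRUE, nrow(metadata))
for (cond in conditions) {
  keep <- keep & eval(parse(text = cond), envir = metadata)
}

kept_experiments <- metadata$NMRExperiment[keep]
```

Here only experiments "10" and "20" satisfy both conditions.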

nDivRange_ppm

Segment size, in ppms, to divide the spectra and search for peaks.

scales

The parameter of peakDetectionCWT function of MassSpecWavelet package, look it up in the original function.

baselineThresh

All peaks with intensities below the thresholds are excluded. Either:

A numeric vector of length the number of samples. Each number is a threshold for that sample
A single number. All samples use this number as baseline threshold.
NULL. If that's the case, a default function is used (nmr_baseline_threshold()), which assumes that there is no signal in the region 9.5-10 ppm.

SNR.Th

The parameter of peakDetectionCWT function of MassSpecWavelet package, look it up in the original function. If you set -1, the function will itself re-compute this value.

maxShift_ppm

The maximum shift allowed, in ppm

acceptLostPeak

Whether lost peaks are accepted. If you believe that all the peaks in the peak list are true positives, set it to FALSE, which is the default of pipe_peakdet_align() as shown in Usage.

peak_det_align_dir

Output directory from pipe_peakdet_align

peak_width_ppm

A peak width in ppm

internal_calibrant

A ppm range where the internal calibrant is, or NULL.

Value

This function saves the result to the output directory

Pipeline: Filter samples according to metadata conditions

Pipeline: Peak detection and Alignment

Pipeline: Peak integration

Pipe: Full spectra normalization

Details

If there is no internal calibrant, only the PQN normalization is done.
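The normalization sequence described above can be sketched in base R as follows. This is a simplified illustration of the general technique (scaling by the calibrant region area, dropping that region, then probabilistic quotient normalization against the median spectrum); the helper names pqn() and normalize_sketch() and the toy data are hypothetical and do not reproduce the package's implementation.

```r
# Probabilistic quotient normalization on a samples-by-points matrix
pqn <- function(x) {
  reference <- apply(x, 2, median)            # median spectrum as reference
  quotients <- sweep(x, 2, reference, "/")    # point-wise quotients
  dilution <- apply(quotients, 1, median)     # one dilution factor per sample
  x / dilution                                # divides row i by dilution[i]
}

normalize_sketch <- function(x, ppm, internal_calibrant = NULL) {
  if (!is.null(internal_calibrant)) {
    in_cal <- ppm >= min(internal_calibrant) & ppm <= max(internal_calibrant)
    # Scale each spectrum by the area of its calibrant region ...
    x <- x / rowSums(x[, in_cal, drop = FALSE])
    # ... then exclude that region before PQN
    x <- x[, !in_cal, drop = FALSE]
  }
  pqn(x)
}

# Hypothetical data: three spectra that differ only by a dilution factor
ppm <- c(0, 1, 2, 3, 4)
base_spectrum <- c(5, 1, 2, 3, 4)
spectra <- outer(c(1, 2, 3), base_spectrum)

no_cal <- normalize_sketch(spectra, ppm)
with_cal <- normalize_sketch(spectra, ppm, internal_calibrant = c(-0.1, 0.1))
```

In both calls the three toy spectra become identical after normalization, since they differed only by dilution. The sketch assumes the reference spectrum has no zero-valued points.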

Examples

## Example of pipeline usage
## There are different ways of loading the dataset
dir_to_demo_dataset <- system.file("dataset-demo", package = "AlpsNMR")
# excel_file <- system.file("dataset-demo",
#                          "dummy_metadata.xlsx",
#                          package = "AlpsNMR")
# output_dir <- tempdir()

## Load samples with pipes
# pipe_load_samples(dir_to_demo_dataset,
#                  glob = "*.zip",
#                  output_dir = "../pipe_output")

## Another way to load it
# nmr_dataset <- nmr_read_samples_dir(dir_to_demo_dataset)

## Saving the dataset in a .rds file
# nmr_dataset_rds <- tempfile(fileext = ".rds")
# nmr_dataset_save(nmr_dataset, nmr_dataset_rds)

## Interpolation
# pipe_interpolate_1D(nmr_dataset_rds,
#                    axis = c(min = -0.5, max = 10, by = 2.3E-4),
#                    output_dir)

## Get the new path, based on output_dir
# nmr_dataset_rds <- file.path(output_dir, "nmr_dataset.rds")

## Adding metadata to samples
# pipe_add_metadata(nmr_dataset_rds = nmr_dataset_rds, output_dir = output_dir,
#                  excel_file = excel_file)

## Filtering samples
# conditions <- 'SubjectID == "Ana"'
# pipe_filter_samples(nmr_dataset_rds, conditions, output_dir)

## Outlier detection
# pipe_outlier_detection(nmr_dataset_rds, output_dir)

## Exclude regions
# exclude_regions <- list(water = c(5.1, 4.5))
# pipe_exclude_regions(nmr_dataset_rds, exclude_regions, output_dir)

## Peak alignment
# pipe_peakdet_align(nmr_dataset_rds, output_dir = output_dir)

## Peak integration
# pipe_peak_integration(nmr_dataset_rds,
#                      peak_det_align_dir = output_dir,
#                      peak_width_ppm = 0.006, output_dir)

## Normalization
# pipe_normalization(nmr_dataset_rds, output_dir = output_dir)
