Peak grouping function, exposing several options useful for benchmarking.

clusterPeaks(
  peaks,
  ...,
  distance_method = "euclidean",
  dt_cluster_spread_ms = 0.1,
  rt_cluster_spread_s = 20,
  distance_between_peaks_from_same_sample = 100,
  clustering = list(method = "hclust"),
  verbose = FALSE
)

Arguments

peaks

A data frame with at least the following columns:

  • "UniqueID" A unique ID for each peak

  • "SampleID" The sample ID the peak belongs to

  • "dt_apex_ms", "rt_apex_s" The peak positions

  • "dt_max_ms", "dt_min_ms", "rt_max_s", "rt_min_s" (for filtering outlier peaks based on their size)

...

Ignored. All parameters other than peaks must be passed by name.

distance_method

A string. Either one of the distance methods accepted by stats::dist, "sd_scaled_euclidean", or "mahalanobis".

dt_cluster_spread_ms, rt_cluster_spread_s

The typical spread of the clusters along the drift time and retention time axes. Used for scaling the dimensions when computing distances. When clustering$method is "hclust", these spreads are also used to cut cluster sizes.
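As a rough illustration of what scaling by the spreads can mean, the sketch below divides each axis by its typical spread before calling stats::dist. The toy data frame and the helper scale_by_spread are hypothetical; the scaling actually performed inside clusterPeaks may differ.

```r
# Toy peak table (hypothetical values) with the apex columns described above.
peaks <- data.frame(
  UniqueID = c("p1", "p2", "p3"),
  dt_apex_ms = c(8.0, 8.1, 9.0),
  rt_apex_s = c(100, 120, 100)
)

# Hypothetical helper: express each axis in units of its typical cluster spread.
scale_by_spread <- function(peaks,
                            dt_cluster_spread_ms = 0.1,
                            rt_cluster_spread_s = 20) {
  scaled <- cbind(
    dt = peaks$dt_apex_ms / dt_cluster_spread_ms,
    rt = peaks$rt_apex_s / rt_cluster_spread_s
  )
  rownames(scaled) <- peaks$UniqueID
  scaled
}

scaled <- scale_by_spread(peaks)

# Euclidean distances on the scaled axes: one spread in drift time now counts
# as much as one spread in retention time.
peak2peak_dist <- stats::dist(scaled, method = "euclidean")
```

With these defaults, a 0.1 ms difference in drift time contributes the same amount to the distance as a 20 s difference in retention time.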

distance_between_peaks_from_same_sample

A numeric factor. The distance between two peaks from the same sample is set to distance_between_peaks_from_same_sample * max(distance_matrix), which discourages grouping them into the same cluster.
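The penalty can be sketched in base R as follows. The sample labels and distances are made-up toy values, and the variable names are illustrative only; this is not the package's internal code.

```r
# Toy distance matrix between four peaks (hypothetical values).
sample_id <- c("s1", "s1", "s2", "s2")
dist_mat <- as.matrix(stats::dist(c(1, 2, 4, 8)))

# Penalty: the factor times the largest distance observed in the matrix.
distance_between_peaks_from_same_sample <- 100
penalty <- distance_between_peaks_from_same_sample * max(dist_mat)

# TRUE for pairs of distinct peaks that come from the same sample.
same_sample <- outer(sample_id, sample_id, "==") & !diag(length(sample_id))

# Overwrite the same-sample distances; the diagonal stays at zero.
dist_mat[same_sample] <- penalty
```

Here the largest distance is 7, so the two same-sample pairs end up 700 apart while cross-sample distances are unchanged.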

clustering

A named list with a "method" element naming the clustering method ("hclust" or "kmedoids"), plus further method-specific options. For method = "kmedoids", you must provide Nclusters, set either to the number of clusters to use in the kmedoids algorithm (cluster::pam) or to the string "max_peaks_sample" to use the maximum number of detected peaks per sample.

For method = "hclust", you can provide hclust_method, with the method passed to mdendro::linkage().
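The two configurations described above could be written as shown below. The value "complete" for hclust_method is an assumption about what mdendro::linkage() accepts, and the last lines only illustrate how "max_peaks_sample" could be computed from a toy vector of sample IDs; they do not reproduce the package's own code.

```r
# k-medoids with the number of clusters taken from the busiest sample.
clustering_kmedoids <- list(method = "kmedoids", Nclusters = "max_peaks_sample")

# Hierarchical clustering with an explicit linkage method (assumed value).
clustering_hclust <- list(method = "hclust", hclust_method = "complete")

# Illustration of "max_peaks_sample": the maximum number of peaks found
# in any single sample (toy sample IDs, one entry per peak).
sample_id <- c("s1", "s1", "s1", "s2", "s2")
n_clusters <- max(table(sample_id))
```

Either list would then be passed as clusterPeaks(peaks, clustering = clustering_kmedoids) or clusterPeaks(peaks, clustering = clustering_hclust).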

verbose

Logical. Controls whether the function prints messages while it runs.

Value

A list with:

  • peak_list_clustered: The peak list with a "cluster" column

  • cluster_stats: Cluster statistics (such as cluster size)

  • dist: The peak-to-peak distance object

  • extra_clustering_info: Additional information whose content depends on the clustering method

Examples

peak_list_fn <- system.file("extdata", "peak_list.rds", package = "GCIMS")
peak_list <- readRDS(peak_list_fn)

peak_clustering <- clusterPeaks(peak_list)
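A possible way to inspect the result is sketched here. It assumes the GCIMS package and its bundled peak_list.rds are installed, and it is skipped otherwise; the element names come from the Value section, while the variable value_elements is introduced only for illustration.

```r
# Elements documented in the Value section.
value_elements <- c(
  "peak_list_clustered", "cluster_stats", "dist", "extra_clustering_info"
)

# Runs only when the GCIMS package is available.
if (requireNamespace("GCIMS", quietly = TRUE)) {
  peak_list_fn <- system.file("extdata", "peak_list.rds", package = "GCIMS")
  peak_list <- readRDS(peak_list_fn)
  peak_clustering <- GCIMS::clusterPeaks(peak_list)

  # Check which documented elements are present in the returned list.
  print(value_elements %in% names(peak_clustering))

  # Count how many peaks were assigned to each cluster.
  print(table(peak_clustering$peak_list_clustered$cluster))
}
```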