clusterPeaks.Rd
Peak grouping function, exposing several options useful for benchmarking.
clusterPeaks(
peaks,
...,
distance_method = "euclidean",
dt_cluster_spread_ms = 0.1,
rt_cluster_spread_s = 20,
distance_between_peaks_from_same_sample = 100,
clustering = list(method = "hclust"),
verbose = FALSE
)
A data frame with at least the following columns:
"UniqueID": A unique ID for each peak
"SampleID": The sample ID the peak belongs to
"dt_apex_ms", "rt_apex_s": The peak positions
"dt_max_ms", "dt_min_ms", "rt_max_s", "rt_min_s": The peak bounds (used for filtering outlier peaks based on their size)
Ignored. All other parameters beyond peaks should be named.
A string. One of the distance methods from stats::dist, "sd_scaled_euclidean", or "mahalanobis".
The typical spread of the clusters. Used for scaling the dimensions when computing distances. When clustering$method is "hclust", these spreads are used to cut cluster sizes.
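As a rough illustration of what scaling by the spreads can mean, the sketch below divides each axis by its typical spread before computing Euclidean distances with stats::dist. This is one plausible reading of the description, written with base R only; it is an assumption, not the package's internal code, and the peak positions are invented.

```r
# Invented peak apex positions: two nearby peaks and one far away.
peaks_xy <- data.frame(
  dt_apex_ms = c(7.50, 7.52, 9.10),
  rt_apex_s  = c(100, 104, 250)
)

# Default spreads from the usage section.
dt_cluster_spread_ms <- 0.1
rt_cluster_spread_s  <- 20

# Assumption: each dimension is expressed in units of its typical spread,
# so that drift time and retention time become comparable.
scaled <- cbind(
  dt = peaks_xy$dt_apex_ms / dt_cluster_spread_ms,
  rt = peaks_xy$rt_apex_s / rt_cluster_spread_s
)

# Euclidean distance between peaks in the scaled space.
scaled_dist <- stats::dist(scaled, method = "euclidean")
print(scaled_dist)
```

With this scaling, two peaks that differ by one spread along a single axis end up at distance 1, whichever axis it is.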
The distance between two peaks from the same sample will be set to distance_between_peaks_from_same_sample * max(distance_matrix).
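The effect of this penalty can be sketched on a small distance matrix. The code below is a base R illustration under the assumption that the maximum is taken before any penalty is applied; the coordinates and sample labels are invented, and the package may implement this step differently.

```r
# Three invented peaks: the first two come from the same sample.
sample_id <- c("S1", "S1", "S2")
coords <- cbind(c(0, 3, 0), c(0, 4, 1))

# Plain peak-to-peak distance matrix.
dist_mat <- as.matrix(stats::dist(coords))

# Default factor from the usage section.
distance_between_peaks_from_same_sample <- 100

# Assumption: the penalty is computed from the unpenalized maximum distance.
penalty <- distance_between_peaks_from_same_sample * max(dist_mat)

# Mark pairs of distinct peaks that share a sample, then overwrite them.
same_sample <- outer(sample_id, sample_id, "==")
diag(same_sample) <- FALSE
dist_mat[same_sample] <- penalty

print(dist_mat)
```

Making same-sample distances much larger than any real distance discourages the clustering from placing two peaks of one sample in the same cluster.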
A named list with "method" and the supported method, as well as further options.
For method = "kmedoids", you must provide Nclusters, with either the number of clusters to use in the kmedoids algorithm (cluster::pam) or the string "max_peaks_sample" to use the maximum number of detected peaks per sample.
For method = "hclust", you can provide hclust_method, with the method passed to mdendro::linkage().
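The clustering options described above could be written as plain named lists, as in the sketch below. The cluster count of 25 is an arbitrary illustrative value, and "complete" is assumed to be among the linkage methods that mdendro::linkage() accepts; check that function's documentation for the valid choices.

```r
# k-medoids with a fixed number of clusters (25 is an arbitrary example value).
clustering_kmedoids_fixed <- list(method = "kmedoids", Nclusters = 25)

# k-medoids using the maximum number of detected peaks per sample.
clustering_kmedoids_auto <- list(
  method = "kmedoids",
  Nclusters = "max_peaks_sample"
)

# Hierarchical clustering; hclust_method is optional and "complete" is
# an assumed valid linkage name.
clustering_hclust <- list(method = "hclust", hclust_method = "complete")

# Any of these would then be passed by name, for example:
# clusterPeaks(peak_list, clustering = clustering_kmedoids_auto)
str(clustering_hclust)
```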
Logical. Controls whether the function prints messages while it runs.
A list with:
peak_list_clustered: The peak list with a "cluster" column
cluster_stats: Cluster statistics (cluster size...)
dist: Peak-to-peak distance object
extra_clustering_info: Arbitrary extra clustering information, which depends on the clustering method
peak_list_fn <- system.file("extdata", "peak_list.rds", package = "GCIMS")
peak_list <- readRDS(peak_list_fn)
peak_clustering <- clusterPeaks(peak_list)
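For experimenting without the bundled file, a peak table with the documented columns can also be built by hand. The sketch below uses base R only; all values and sample names are invented for illustration, and it only checks the column requirements rather than calling clusterPeaks().

```r
# Toy peak table with the columns clusterPeaks() expects (values are invented).
toy_peaks <- data.frame(
  UniqueID   = c("S1/P1", "S1/P2", "S2/P1", "S2/P2"),
  SampleID   = c("S1", "S1", "S2", "S2"),
  dt_apex_ms = c(7.50, 9.10, 7.52, 9.08),
  rt_apex_s  = c(100, 250, 104, 246),
  dt_min_ms  = c(7.40, 9.00, 7.42, 8.98),
  dt_max_ms  = c(7.60, 9.20, 7.62, 9.18),
  rt_min_s   = c(90, 240, 94, 236),
  rt_max_s   = c(110, 260, 114, 256),
  stringsAsFactors = FALSE
)

# Check the documented column requirements before clustering.
required_cols <- c(
  "UniqueID", "SampleID", "dt_apex_ms", "rt_apex_s",
  "dt_max_ms", "dt_min_ms", "rt_max_s", "rt_min_s"
)
missing_cols <- setdiff(required_cols, colnames(toy_peaks))
if (length(missing_cols) > 0) {
  stop("Missing columns: ", paste(missing_cols, collapse = ", "))
}
```

A table like this could then be passed as the peaks argument, for example clusterPeaks(toy_peaks), assuming the package is installed.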