Chinese Journal of Chromatography ›› 2025, Vol. 43 ›› Issue (6): 585-593. DOI: 10.3724/SP.J.1123.2025.01019

• Reviews •

A review and research prospects on the application of the XCMS mass-spectrometry data-processing software in the environmental science field

YANG Cheng1, ZHANG Ao1, GAO Zhanqi2, SU Guanyong1,*

  1. Jiangsu Province Key Laboratory of Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  2. Key Laboratory of Environment Monitoring and Analysis for Organic Pollutants in Surface Water, Ministry of Ecology and Environment, Jiangsu Province Environmental Monitoring Center, Nanjing 210019, China
  • Received: 2025-01-14 Online: 2025-06-08 Published: 2025-05-21
  • Supported by:
    Natural Science Foundation General Project of Jiangsu Province (BK20242011); National Natural Science Foundation of China (General Project) (42477387)

Abstract:

Biological and environmental samples are complex and contain a highly diverse range of compounds. Analyzing these samples by chromatography-high-resolution mass spectrometry generates a large volume of data comprising mass-to-charge-ratio (m/z), retention-time (RT), and peak-intensity information, and these data require considerable time and effort to process. Consequently, software is essential for processing mass-spectrometry data for identification and analysis purposes. Among the many available options, XCMS (various forms (X) of chromatography mass spectrometry) is an efficient, precise, and freely accessible data-processing package that is widely used in the environmental science field. This review explores the use of XCMS in environmental science applications by comprehensively examining its workflow, underlying principles, and parameter-optimization measures. The workflow mainly includes importing, processing, and exporting data. Importing data requires a format-conversion tool, such as MSConvert, which converts data generated by various instruments into a format accepted by XCMS, while data processing includes peak detection, alignment, and filling. The various XCMS functions are mainly realized through its built-in algorithms, of which Matched Filter, CentWave, Obiwarp, and Peak Density are the most commonly used. The first two implement peak detection, while the latter two implement peak alignment. During peak detection, XCMS identifies compound peaks in the mass-spectrometry data: it first filters out noise and corrects the baseline, after which an algorithm detects peaks based on their shapes and intensities. XCMS can also de-emphasize and de-distort to filter out interfering information in each peak signal.
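The noise-filtering, baseline-correction, and shape-based detection steps described above can be illustrated with a minimal Python sketch. It is not XCMS code (XCMS is distributed as an R package). It assumes a single, evenly sampled ion chromatogram and filters the trace with a zero-mean second-derivative Gaussian kernel, in the spirit of matched filtration. The function names and the `sigma` and `threshold` parameters are hypothetical choices made for illustration only.

```python
import math


def second_derivative_gaussian(sigma, half_width):
    """Negated second-derivative Gaussian kernel, shifted to zero mean."""
    kernel = []
    for i in range(-half_width, half_width + 1):
        x = i / sigma
        kernel.append((1.0 - x * x) * math.exp(-0.5 * x * x))
    mean = sum(kernel) / len(kernel)
    # A zero-mean kernel maps a flat baseline to (numerically) zero.
    return [k - mean for k in kernel]


def filter_trace(signal, kernel):
    """Correlate the trace with the kernel, clamping indices at the edges."""
    half = len(kernel) // 2
    n = len(signal)
    filtered = []
    for i in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(i + j - half, 0), n - 1)
            acc += k * signal[idx]
        filtered.append(acc)
    return filtered


def detect_peaks(signal, sigma=3.0, threshold=1.0):
    """Return indices of local maxima of the filtered trace above threshold."""
    kernel = second_derivative_gaussian(sigma, int(4 * sigma))
    filtered = filter_trace(signal, kernel)
    peaks = []
    for i in range(1, len(filtered) - 1):
        is_local_max = filtered[i] > filtered[i - 1] and filtered[i] >= filtered[i + 1]
        if is_local_max and filtered[i] > threshold:
            peaks.append(i)
    return peaks


def make_demo_trace(length=200):
    """Synthetic trace: constant baseline plus two Gaussian peaks (assumed data)."""
    trace = []
    for i in range(length):
        value = 10.0  # baseline offset
        value += 100.0 * math.exp(-0.5 * ((i - 50) / 3.0) ** 2)
        value += 60.0 * math.exp(-0.5 * ((i - 120) / 3.0) ** 2)
        trace.append(value)
    return trace


if __name__ == "__main__":
    print(detect_peaks(make_demo_trace()))
```

In this sketch the kernel width plays the role of the expected chromatographic peak width, so a kernel that is much narrower or wider than the real peaks would respond weakly. That is one reason parameter optimization matters in practice.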
The CentWave algorithm is particularly effective for processing high-resolution mass-spectrometry data because it improves detection accuracy and recall. Peak detection is followed by alignment. Here, XCMS uses kernel density estimation to match peaks between samples by estimating the retention-time distribution of matched peaks, which corrects for any nonlinear deviations in retention time. This step is critical for accurately comparing samples. The peak-filling step resolves missing peaks in the data: XCMS uses information from other samples to fill these gaps, which enhances the integrity of the dataset and improves analysis accuracy. In terms of applications, XCMS has enabled significant progress in the non-targeted screening of environmental pollutants, the identification of metabolic transformations of exogenous pollutants, and the exploration of the endogenous metabolism of biomolecules. For example, XCMS efficiently extracts mass-spectrometric information from complex samples during the non-targeted screening of environmental pollutants, thereby providing a reliable data foundation for subsequent identification. Although the use of XCMS in the environmental science field has produced results, some limitations remain. These include high memory usage, software crashes when handling large-scale data, and the misclassification of noise as valid signals during feature detection, which leads to large numbers of false positives, errors, and missed detections when processing data for compounds with complex chemical compositions and structural types. In addition, the degree of user interaction and automation requires further improvement. XCMS offers significant developmental potential in the environmental science field.
With continued algorithmic optimization and database expansion, together with improvements in algorithmic robustness, data compatibility, and user experience, XCMS is expected to be adopted more broadly and to provide more powerful support for the environmental science field in the future.
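The density-based matching and gap-identification ideas summarized in the abstract can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the XCMS implementation. It pools the retention times of peaks from all samples within one m/z slice, estimates their density with Gaussian kernels, treats density maxima as feature centers, and lists the samples that lack a peak for each feature as candidates for peak filling. The function names, the `bandwidth` and `step` parameters, and the example data are all hypothetical.

```python
import math


def rt_density(rts, grid, bandwidth):
    """Unnormalized Gaussian kernel density of retention times on a grid."""
    return [
        sum(math.exp(-0.5 * ((g - rt) / bandwidth) ** 2) for rt in rts)
        for g in grid
    ]


def group_peaks(peaks, bandwidth=5.0, step=1.0):
    """Group (sample, rt) peaks around maxima of the pooled RT density."""
    rts = [rt for _, rt in peaks]
    lo, hi = min(rts), max(rts)
    grid = [lo + i * step for i in range(int((hi - lo) / step) + 1)]
    dens = rt_density(rts, grid, bandwidth)
    last = len(grid) - 1
    centers = [
        grid[i]
        for i in range(len(grid))
        if (i == 0 or dens[i] > dens[i - 1]) and (i == last or dens[i] >= dens[i + 1])
    ]
    groups = {c: [] for c in centers}
    for sample, rt in peaks:
        # Assign each peak to the nearest density maximum.
        nearest = min(centers, key=lambda c: abs(c - rt))
        groups[nearest].append((sample, rt))
    return groups


def find_gaps(groups, all_samples):
    """For each feature, list the samples with no matched peak."""
    gaps = {}
    for center, members in groups.items():
        present = {sample for sample, _ in members}
        missing = sorted(s for s in all_samples if s not in present)
        if missing:
            gaps[center] = missing
    return gaps


if __name__ == "__main__":
    # Assumed example: two features; sample "C" lacks the second one.
    peaks = [
        ("A", 100.0), ("B", 102.0), ("C", 98.0),
        ("A", 160.0), ("B", 164.0),
    ]
    groups = group_peaks(peaks)
    print(groups)
    print(find_gaps(groups, ["A", "B", "C"]))
```

A real peak-filling step would then re-integrate the raw signal of each missing sample within the retention-time window spanned by the matched peaks. The sketch stops at identifying where such filling would be needed.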

Key words: XCMS, environmental science, non-targeted screening, unknown contaminants
