Chinese Journal of Chromatography (色谱), 2025, Vol. 43, Issue 6: 585-593. DOI: 10.3724/SP.J.1123.2025.01019

• Monographs and Reviews •

A review and research prospects on the application of the XCMS mass-spectrometry data-processing software in the environmental science field

YANG Cheng1, ZHANG Ao1, GAO Zhanqi2, SU Guanyong1,*

  1. Jiangsu Province Key Laboratory of Chemical Pollution Control and Resources Reuse, School of Environmental and Biological Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
  2. Key Laboratory of Environment Monitoring and Analysis for Organic Pollutants in Surface Water, Ministry of Ecology and Environment, Jiangsu Province Environmental Monitoring Center, Nanjing 210019, China
  • Received: 2025-01-14 Published: 2025-06-08 Released online: 2025-05-21
  • Corresponding author: * E-mail: sugy@njust.edu.cn
  • Supported by:
    Natural Science Foundation of Jiangsu Province, General Project (BK20242011); National Natural Science Foundation of China, General Project (42477387)

Abstract:

Biological and environmental samples are compositionally complex and contain a highly diverse range of compounds. Analyzing them by chromatography coupled with high-resolution mass spectrometry generates large volumes of data consisting of mass-to-charge ratios (m/z), retention times (RT), and peak intensities, and processing these data demands considerable time and effort. Mass-spectrometry data-processing software is therefore needed for identification and analysis. Among the many available options, XCMS (various forms (X) of chromatography mass spectrometry) is an efficient, accurate, and freely available package that is widely used in environmental science. This review focuses on the application of XCMS in environmental science and summarizes its workflow, underlying principles, and parameter-optimization measures. The workflow mainly comprises data import, data processing, and data export. Data import relies on format-conversion tools such as MSConvert, which convert the data generated by different instruments into a format that XCMS accepts, while data processing broadly comprises peak detection, peak alignment, and peak filling. These functions are realized mainly through built-in algorithms, of which Matched Filter, CentWave, Obiwarp, and Peak Density are the most commonly used; the first two perform peak detection and the latter two perform peak alignment. During peak detection, XCMS identifies compound peaks in the mass-spectrometry data: it first filters noise and corrects the baseline, and an algorithm then detects peaks on the basis of their shapes and intensities. XCMS can also apply de-duplication and distortion correction to filter out interfering information from each peak signal. The CentWave algorithm is particularly effective for high-resolution mass-spectrometry data, where it improves detection accuracy and recall. Peak detection is followed by peak alignment, in which XCMS uses kernel density estimation to match peaks between samples by estimating the retention-time distribution of matched peaks, which corrects for nonlinear deviations in retention time. This step is critical for comparing samples accurately. The peak-filling step handles peaks that are missing from the data: XCMS uses information from other samples to fill these gaps, which improves the completeness of the dataset and the accuracy of the analysis. In terms of applications, XCMS has enabled significant progress in the non-targeted screening of environmental pollutants, the identification of metabolic transformations of exogenous pollutants, and the study of the endogenous metabolism of biomolecules. For example, in non-targeted screening of environmental pollutants, XCMS efficiently extracts mass-spectral features from complex samples and thereby provides a reliable data basis for subsequent identification. Although the application of XCMS in environmental science has yielded certain results, some limitations remain: XCMS consumes large amounts of memory and can crash when handling large-scale data, and noise may be misclassified as valid signal during feature detection, which leads to numerous false positives, errors, and missed detections when processing data for compounds with complex chemical compositions and structural types. In addition, user interaction and the degree of automation need further improvement. XCMS nevertheless has considerable development potential in environmental science. With continued algorithm optimization and database expansion, and with improvements in algorithmic robustness, data compatibility, and user experience, XCMS is expected to provide more powerful support for environmental science research.

Key words: XCMS, environmental science, non-targeted screening, unknown contaminants
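
The peak-alignment idea summarized in the abstract, matching peaks across samples from the density of their retention times, can be illustrated with a minimal sketch. This is a simplified Python illustration and not XCMS's own code (XCMS is an R package): the function names, the 5 s bandwidth, the 0.5 s grid step, and the toy peak list are all assumptions made for the example, and the input is assumed to be peaks that already fall in one narrow m/z slice.

```python
import math
from collections import defaultdict


def gaussian_kde(values, grid, bandwidth):
    """Unnormalised Gaussian kernel density of `values` evaluated on `grid`."""
    return [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values)
        for g in grid
    ]


def group_peaks_by_rt(peaks, bandwidth=5.0, step=0.5):
    """Group (sample, rt) peaks from one m/z slice by retention-time density.

    Hypothetical helper: local maxima of the density become group centres,
    and each peak joins the nearest centre that lies within one bandwidth.
    """
    if not peaks:
        return {}
    rts = [rt for _, rt in peaks]
    lo, hi = min(rts) - 3 * bandwidth, max(rts) + 3 * bandwidth
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + i * step for i in range(n)]
    dens = gaussian_kde(rts, grid, bandwidth)

    # Local maxima of the density curve are candidate feature positions.
    centres = [
        grid[i]
        for i in range(1, n - 1)
        if dens[i] > dens[i - 1] and dens[i] >= dens[i + 1]
    ]

    groups = defaultdict(list)
    for sample, rt in peaks:
        nearest = min(centres, key=lambda c: abs(c - rt))
        if abs(nearest - rt) <= bandwidth:
            groups[nearest].append((sample, rt))
    return dict(groups)


# Toy input (assumed values): the same two compounds seen in several samples
# with slightly shifted retention times, in seconds.
example_peaks = [
    ("A", 100.0), ("B", 102.0), ("C", 101.0),
    ("A", 160.0), ("B", 161.0),
]
for centre, members in sorted(group_peaks_by_rt(example_peaks).items()):
    print(f"feature near RT {centre:.1f} s: {members}")
```

Grouping on density maxima rather than fixed retention-time windows means a feature is defined wherever peaks from different samples cluster; the bandwidth controls how much retention-time drift is tolerated before two peaks are treated as different features.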

CLC number: