Quantitative Analysis of Polyphenols in Lonicera caerulea Based on Mid-Infrared Spectroscopy and Hybrid Variable Selection

AI Summary of Peer-Reviewed Research

This page presents an AI-generated summary of a published research paper. The original authors did not write or review this article. See the Disclosure section below.

Molecules · 2026-02-23 · Peer-reviewed
Publication Signals: MODERATE. Publication Signals show what we were able to verify about where this research was published. Core publication signals for this source were verified. Publication Signals reflect the source's verifiable credentials, not the quality of the research.
  • ✔ Peer-reviewed source
  • ✔ Published in indexed journal
  • ✔ No retraction or integrity flags

Overview

This study developed a quantitative prediction model for polyphenol content in Lonicera caerulea using mid-infrared spectroscopy coupled with a hybrid variable selection strategy optimized for high-dimensional, small-sample datasets. The research addresses the analytical need for rapid, non-destructive quality control methods in functional food assessment.

Methods and approach

One hundred ninety-one blue honeysuckle samples from Northeast China were analyzed. Spectral data (7468 dimensions) were acquired via Fourier transform infrared spectrometry, with polyphenol reference values determined using the Folin-Ciocalteu method. An evaluation of 10 preprocessing methods identified multiplicative scatter correction combined with a Savitzky-Golay first derivative as optimal. The hybrid variable selection approach, which intersects the variables passing a variable importance in projection (VIP) threshold of 1.0 with the top 30% of variables ranked by random forest regression, reduced dimensionality to 984 wavelengths. Four machine learning models (partial least squares, random forest regression, support vector regression, and XGBoost) underwent three-stage hyperparameter tuning, with the samples partitioned into calibration (n=152) and prediction (n=39) sets using the SPXY algorithm.
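
The intersection step of the hybrid variable selection can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `hybrid_select`, the toy score lists, the strict `>` comparison against the 1.0 VIP threshold, and the rounding used to size the top 30% are all assumptions. The VIP scores and random forest importances are treated as precomputed inputs.

```python
def hybrid_select(vip_scores, rf_importances,
                  vip_threshold=1.0, rf_top_fraction=0.30):
    """Return sorted indices of variables kept by BOTH filters.

    Hypothetical helper: the comparison operator and the rounding rule
    are assumptions, not details taken from the paper.
    """
    if len(vip_scores) != len(rf_importances):
        raise ValueError("score lists must have the same length")
    n = len(vip_scores)

    # Filter 1: variables whose VIP score exceeds the threshold.
    vip_keep = {i for i, v in enumerate(vip_scores) if v > vip_threshold}

    # Filter 2: the top fraction of variables ranked by RF importance.
    n_top = max(1, round(n * rf_top_fraction))
    ranked = sorted(range(n), key=lambda i: rf_importances[i], reverse=True)
    rf_keep = set(ranked[:n_top])

    # Hybrid rule: keep only variables that pass both filters.
    return sorted(vip_keep & rf_keep)


# Toy scores for 10 variables (illustrative values, not data from the study).
vip = [1.4, 0.6, 1.1, 0.9, 1.8, 0.95, 0.3, 1.2, 0.7, 1.5]
rf = [0.20, 0.05, 0.02, 0.22, 0.25, 0.01, 0.03, 0.18, 0.04, 0.07]

selected = hybrid_select(vip, rf)
print(selected)  # [0, 4]
```

In the toy data, variable 3 ranks in the random forest top 30% but fails the VIP filter, so it is dropped. Requiring agreement between two differently biased rankings in this way yields a smaller subset than either filter alone.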

Key Findings

The optimized XGBoost model performed best on the independent test set, with an R-squared of 0.92, a root mean square error of 0.098, and a residual prediction deviation of 3.47. The hybrid variable selection method achieved an 86.8% dimensionality reduction while improving predictive accuracy over the classical competitive adaptive reweighted sampling approach, which yielded an R-squared of 0.78 and a residual prediction deviation of 2.14; the reported improvements are 16.3% and 55.2%, respectively.
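
For readers unfamiliar with these metrics, the sketch below shows one common way to compute them, along with the dimensionality reduction figure. The function names and toy values are hypothetical, and the residual prediction deviation is computed here as the sample standard deviation of the reference values divided by the root mean square error. That convention is an assumption, since definitions vary and the summary does not state which one the paper uses.

```python
import math
import statistics


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean square error of the predictions."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def rpd(y_true, y_pred):
    """Residual prediction deviation (assumed: sample SD / RMSE)."""
    return statistics.stdev(y_true) / rmse(y_true, y_pred)


def dimensionality_reduction(n_original, n_selected):
    """Fraction of the original variables that were removed."""
    return 1 - n_selected / n_original


# Toy reference and predicted values (illustrative, not data from the study).
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y_true, y_pred), 2))  # 0.98

# Variable counts quoted in the summary: 7468 before and 984 after selection.
reduction = dimensionality_reduction(7468, 984)
print(f"{reduction:.1%}")  # 86.8%
```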

Implications

The hybrid variable selection strategy effectively mitigates analytical challenges inherent to high-dimensional spectral datasets with limited sample sizes, addressing a methodological constraint common in spectroscopy-based quality control applications. The framework demonstrates transferable utility for rapid, non-destructive quantification of bioactive compounds in plant materials, with potential extension to other functional food matrices requiring polyphenol characterization.

Disclosure

  • Research title: Quantitative Analysis of Polyphenols in Lonicera caerulea Based on Mid-Infrared Spectroscopy and Hybrid Variable Selection
  • Authors: Haiwei Wu, Xuexin Li, Jianwei Liu, Zhihao Wang, Yuchun Liu
  • Publication date: 2026-02-23
  • DOI: https://doi.org/10.3390/molecules31040750
  • Disclosure: This post was generated by Claude (Anthropic). The original authors did not write or review this post.
