Wine quality assessment through lightweight deep learning: integrating 1D-CNN and LSTM for analyzing electronic nose VOCs signals

Purpose – In the wine industry, maintaining high quality standards is crucial to meet the expectations of both producers and consumers. Traditional approaches to assessing wine quality are labor-intensive and rely on connoisseurs skilled at identifying taste profiles and key quality factors. In this research, we introduce an efficient approach based on the analysis of volatile organic compound (VOC) signals from an electronic nose, enabling nonexperts to assess wine quality accurately. Design/methodology/approach – To devise a suitable algorithm, we conducted four computational experiments, which led to a specialized deep learning network that combines 1D-convolutional and long short-term memory (LSTM) layers. The design was validated with leave-one-out cross-validation. Findings – The proposed architecture consistently achieved high recognition accuracies, ranging from 87.8% to 99.41%, with a recognition time of only 4 seconds. These findings suggest that VOC signal analysis with lightweight deep learning can improve the assessment and monitoring of wine quality, to the benefit of the wine industry and its stakeholders. Originality/value – This research has not been published anywhere else.


Introduction
Wine, typically comprising 10-15% ethanol by volume, enjoys global popularity and has long been a central feature of social events. Its composition is predominantly water, ethanol and various acids. However, the distinctive characteristics of different wines, such as quality, geographic origin and vintage, are attributed to a myriad of minor constituents [1,2]. Among these characteristics, aroma stands out as a crucial determinant of wine quality [3].
The compounds responsible for aroma are volatile [4] and play a significant role in the wine's sensory quality, a factor of paramount importance to consumers [5]. Consequently, the evaluation of wine quality still depends substantially on organoleptic assessments by trained experts, which makes the process inherently subjective [6][7][8].
Because these subtle distinctions depend on numerous trace components [1], wine quality assessment remains an active field. Conventional methods rely on chemical analysis and expert sensory panels. Chemical analysis measures parameters such as alcohol content, sugar, dry extract and volatile acidity using established techniques such as distillation, titration or high-performance liquid chromatography [9]. Although these methods give precise information on specific constituents, they cannot provide a holistic assessment of the interplay among components and pose practical challenges [10]. Sensory evaluation, while valuable, is subjective and affected by individual differences and sensory fatigue, and maintaining a sensory evaluation laboratory with trained experts is costly and resource-intensive [11]. Consequently, electronic noses (E-noses) have emerged as a cost-effective and objective alternative for evaluating the overall quality of wine through odor analysis.
The E-nose, modeled after the human olfactory system, offers a noninvasive way to assess wine aromas and detect volatile components [12,13]. It consists primarily of a sensor array and a pattern prediction unit, and can quantify and classify foods containing complex mixtures of volatile organic compounds (VOCs). Like mammalian olfactory receptors, the E-nose sensors respond to odors with weak selectivity and cross-sensitivity [14][15][16]. The pattern prediction unit performs identification and discrimination, much as the mammalian brain processes olfactory signals, ultimately providing a comprehensive odor-based quality fingerprint of the food [9]. Recent studies have demonstrated the efficacy of E-nose systems in various applications. For instance, Hazarika et al. employed E-nose technology to detect biotic stress in Khasi Mandarin orange trees [17]. Ozmen and Dogan introduced a portable E-nose with quartz crystal microbalance sensors and online analytical capabilities, suitable for both rapid assessments and long-term monitoring [18]. In the context of wine, E-nose technology has been used to distinguish between different wines from the same grape variety and viticultural zone [19], to quantify wine aroma volatiles [20], to monitor wine aging [21] and to differentiate among wines of the same grape variety from the same cellar [22,23].
In recent years, the demand for Internet of Things (IoT) applications and advances in deep learning have enabled efficient, automated analysis and feature extraction in many domains, including wine analysis. Numerous studies have applied deep learning and machine learning to extract wine characteristics more accurately. Examples include the combination of artificial neural networks (ANN) with partial least squares regression (PLSR) by Gomes et al. [24], and the use of linear discriminant analysis (LDA), ANN, canonical variate analysis (CVA) and Kohonen's self-organizing NN (KhNN) by Dixit et al. and Guo et al. [25,26]. Lu et al. reported promising results in determining the sugar content of port wine grape berries: their multifeature fusion convolutional NN (MCNN) achieved 93.2% classification accuracy and an area under the curve (AUC) of 0.987 [27]. Kuntsche et al. developed the alcoholic beverage identification deep learning algorithm (ABIDLA), targeting image-based bottle classification, detection of glycerol adulteration in red wines and tracing of the geographical origins of wines [28].

ACI
Despite these advantages, the limitations of deep learning approaches in previous research must be considered. First, networks such as ResNet and DenseNet have a large number of trainable parameters relative to the small datasets typically available. This disparity can lead to overfitting, which prevents the trained model from generalizing to real-world data. Second, the original paper describing the database used in this study [29] reports a high accuracy of 97.68% but does not document the number of training epochs. Notably, the reported training time was only 99 seconds.
Considering these limitations, this study pursues two objectives: (1) Design lightweight deep learning architectures that combine 1D-convolutional layers (1D-CNN), long short-term memory (LSTM) layers and NN layers to classify wines into high-quality (HQ), average-quality (AQ) and low-quality (LQ) categories.
(2) Apply a strict validation strategy based on leave-one-out cross-validation to verify that the trained models generalize.

Material and methodology
Data description
The database used in this study was compiled by Rodriguez Gamboa et al. using an electronic olfactory sensing device known as O-NOSE [29]. The setup employed six metal-oxide gas sensor channels. A total of 22 commercially available wine bottles were selected for the study. Before the experiment, 13 bottles were chosen at random and stored in an unregulated environment for roughly six months; these bottles were designated LQ.
Four of the 22 bottles were uncorked two weeks before the experiment and designated average-quality (AQ), and the remaining five were classified as high-quality (HQ). The dataset contains 235 measurements: 51 HQ, 43 AQ and 141 LQ.

Sampling procedure
Figure 1a shows a schematic of the apparatus and the procedure used to capture VOC signals from the wine samples. The set comprises 22 wine bottles: 13 LQ, 4 AQ and 5 HQ. A 1 ml aliquot of each wine sample is exposed to air to expedite evaporation before saturation. The VOCs are then held in a gas chamber for 30 seconds to accumulate before being transferred for signal acquisition. In the first acquisition phase, lasting 90 seconds, the VOCs are driven into the sensor chamber, changing the sensors' resistance. The next phase is sensor desorption, during which the sampling frequency is set to 18.5 Hz. Finally, in the purification phase, residual VOCs are purged from the chamber for more than 600 seconds.

Computation algorithm
As illustrated in Figure 1b, the VOC signal-processing algorithm comprises signal preprocessing, model training, testing, classification and evaluation of results.
Data preprocessing. Preprocessing prepares the raw VOC signals for analysis and aims to minimize noise introduced during sampling. Specifically, the first and last 10 seconds of each signal are discarded, and the signal is then down-sampled to further reduce the influence of noise. Because the dataset is small (235 samples in three categories: HQ, AQ and LQ), a time-slicing window method is used to mitigate the limitation of dataset size (Figure 2). The original signals are divided into 4-second segments with 50% overlap between consecutive segments. This keeps the segments consistent in length and yields a substantially larger dataset, which improves the stability and reliability of the deep learning model [30].
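The preprocessing steps described above (trimming, down-sampling, and slicing into overlapping 4-second windows) can be sketched as follows. This is a minimal sketch under stated assumptions: a recording is a list of rows with one value per sensor channel; the 18.5 Hz rate given for the desorption phase is assumed to apply to the whole recording; the down-sampling factor is not reported, so `factor` is a hypothetical parameter; and down-sampling is implemented here as block averaging, an assumption chosen because averaging suppresses noise. The function names are illustrative only.

```python
"""Sketch of VOC signal preprocessing: trim, down-sample, time-slice."""
from typing import List

Row = List[float]  # one reading per sensor channel (six channels in the O-NOSE data)


def block_average(rows: List[Row], factor: int) -> List[Row]:
    """Down-sample by averaging non-overlapping blocks of `factor` rows (assumed method)."""
    out = []
    for start in range(0, len(rows) - factor + 1, factor):
        block = rows[start:start + factor]
        # zip(*block) groups the readings of each channel within the block
        out.append([sum(channel) / factor for channel in zip(*block)])
    return out


def preprocess(rows: List[Row], fs: float = 18.5, trim_s: float = 10.0,
               factor: int = 2, window_s: float = 4.0,
               overlap: float = 0.5) -> List[List[Row]]:
    """Trim both ends, down-sample, then cut into overlapping fixed-length windows."""
    trim = int(round(trim_s * fs))              # 10 s at 18.5 Hz -> 185 rows
    trimmed = rows[trim:len(rows) - trim]       # drop the first and last 10 s
    down = block_average(trimmed, factor)       # hypothetical factor
    fs_down = fs / factor                       # sampling rate after down-sampling
    win = int(round(window_s * fs_down))        # rows per 4 s window
    hop = max(1, int(round(win * (1.0 - overlap))))  # 50% overlap -> hop of half a window
    return [down[i:i + win] for i in range(0, len(down) - win + 1, hop)]
```

For example, with `factor=1` a 555-row recording keeps 185 rows after trimming and yields four 74-row windows starting 37 rows apart. Because every window has the same length, the segments can be stacked directly into a fixed-shape training tensor.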
Network architecture. The proposed deep learning model has three main components: the channel extraction (CE) block, the LSTM block and the NN block. The CE block is built from a base block consisting of a 1D-convolutional layer, a batch normalization layer and a ReLU activation layer, whose role is to extract distinct features from the gas sensor channels. The NN block consists of three fully connected (NN) layers, a ReLU activation layer and two sigmoid activation layers, and produces more complex feature representations. The LSTM block, consisting of two LSTM layers and one dropout layer, captures long-term dependencies in the data.
The CE block (Figure 3) enables a comprehensive analysis of the gas sensor data. Each of the six sensor channels is fed separately into the base block; the extracted features are then aggregated and passed through an average pooling layer. This design lets the model account for the differing compositions and response characteristics of the gases being analyzed.
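Because the appeal of the network rests on being lightweight, it helps to see where the trainable parameters of each block come from. The sketch below counts them with the standard formulas for a 1D convolution, batch normalization (trainable scale and shift only), an LSTM layer (PyTorch convention with two bias vectors per gate) and a fully connected layer. All layer widths and the kernel size are hypothetical because they are not stated in the text; one base block per channel with separate weights is assumed; and when both the NN and LSTM blocks are used they are assumed to run in parallel on the pooled CE features with concatenated outputs. The totals are illustrative and are not meant to reproduce the counts in Table 1.

```python
"""Trainable-parameter counts for the CE, NN and LSTM blocks (hypothetical sizes)."""

N_CHANNELS = 6               # six gas-sensor channels (from the data description)
N_CLASSES = 3                # HQ, AQ, LQ
FILTERS, KERNEL = 16, 5      # hypothetical conv width and kernel size per channel
LSTM_UNITS = (32, 32)        # hypothetical: two stacked LSTM layers
DENSE_UNITS = (64, 32, 16)   # hypothetical: three fully connected layers


def conv1d_params(in_ch: int, out_ch: int, k: int) -> int:
    return out_ch * in_ch * k + out_ch          # weights + bias


def batchnorm_params(ch: int) -> int:
    return 2 * ch                               # trainable gamma and beta only


def lstm_params(inp: int, hid: int) -> int:
    # four gates, each with input weights, recurrent weights and two biases (PyTorch)
    return 4 * (hid * inp + hid * hid + 2 * hid)


def linear_params(inp: int, out: int) -> int:
    return inp * out + out


def ce_block() -> tuple:
    """One Conv1d+BN+ReLU base block per channel (assumed); features concatenated."""
    per_channel = conv1d_params(1, FILTERS, KERNEL) + batchnorm_params(FILTERS)
    return N_CHANNELS * per_channel, N_CHANNELS * FILTERS   # (params, output width)


def stacked(in_dim: int, widths: tuple, layer) -> tuple:
    """Chain layers of the given widths; return (params, output width)."""
    total, d = 0, in_dim
    for w in widths:
        total += layer(d, w)
        d = w
    return total, d


def count(use_nn: bool, use_lstm: bool) -> int:
    ce, d = ce_block()              # ReLU, average pooling and dropout add no parameters
    total, head_in = ce, 0
    if use_nn:
        p, out = stacked(d, DENSE_UNITS, linear_params)
        total, head_in = total + p, head_in + out
    if use_lstm:
        p, out = stacked(d, LSTM_UNITS, lstm_params)
        total, head_in = total + p, head_in + out
    if not (use_nn or use_lstm):
        head_in = d                 # pooled CE features feed the softmax layer directly
    return total + linear_params(head_in, N_CLASSES)   # softmax classification layer
```

With these hypothetical sizes, CE only, CE+NN, CE+LSTM and CE+NN+LSTM give 1,059, 9,635, 25,955 and 34,819 parameters respectively. The point is only that, for comparable widths, the recurrent weights make the LSTM block the largest contributor.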

Implementation
Computational experiments
In the experimental phase, four network architectures (Figure 3) were designed and used for training, validation and testing on the wine database, with the aim of identifying the structure that gives the best model performance. The first experiment evaluated the CE block alone, with its average pooling layer connected directly to a dropout layer and a softmax layer. The second and third experiments connected the output of the dropout layer to the NN block and the LSTM block, respectively. The final experiment connected the dropout layer's output to both the NN and LSTM blocks. These experiments were designed to assess whether adding the NN block, the LSTM block or both to the first architecture improves performance. Training and validation ran for 600 epochs with a custom learning-rate schedule: the learning rate was reduced by a factor of 5 after the 200th epoch and by a factor of 10 after the 450th epoch, and from the 550th epoch it decayed exponentially with a factor of 0.1. This schedule was chosen to help the network converge.
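The learning-rate schedule described above can be written as a function of the epoch index. The description leaves two points open, so this sketch fixes them by assumption: the reductions after epochs 200 and 450 are cumulative (giving lr0/5 and then lr0/50), and the exponential phase multiplies the rate by exp(-0.1) per epoch from epoch 550. Epochs are counted from 0, and the initial rate `lr0 = 1e-3` is hypothetical.

```python
import math


def learning_rate(epoch: int, lr0: float = 1e-3) -> float:
    """Learning rate for a 0-based epoch index in a 600-epoch run (assumed reading)."""
    if epoch < 200:
        return lr0
    if epoch < 450:
        return lr0 / 5                  # first step: divide by 5
    if epoch < 550:
        return lr0 / 5 / 10             # second step, assumed cumulative: lr0 / 50
    # from epoch 550: exponential decay with rate 0.1 per epoch (assumed form)
    return lr0 / 50 * math.exp(-0.1 * (epoch - 550))
```

In PyTorch, the same schedule could be attached with `torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lambda e: learning_rate(e) / 1e-3)`, since `LambdaLR` expects a multiplicative factor relative to the initial rate.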

Validation approach
Before preprocessing, the wine database of 235 samples (22 bottles) was divided into training, validation and test sets using a leave-one-out cross-validation (LOOCV) procedure at the bottle level: one bottle was held out as the test set and the remaining 21 bottles formed the training/validation set, and this was repeated until every bottle had been used once as the test set. The training/validation and test sets were then preprocessed following the procedure described in Figure 2. Within each fold, 80% of the training/validation data were randomly assigned to the training set and the remaining 20% to the validation set; this allocation was kept unchanged throughout training and validation (Figure 4a).
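The bottle-level LOOCV scheme above can be sketched as a generator of folds plus a fixed train/validation split. The sketch assumes each measurement is a record tagged with the bottle it came from (the `bottle` key is a hypothetical representation). The 80/20 split uses a seeded random generator so that the allocation stays the same for the whole training run of a fold. It is applied to whatever items are passed in; in the described pipeline these would be the preprocessed segments of the 21 training bottles.

```python
import random
from typing import Dict, Iterator, List, Tuple

Sample = Dict[str, object]   # hypothetical record: {"bottle": int, "label": str, ...}


def leave_one_bottle_out(samples: List[Sample]) -> Iterator[Tuple[object, List[Sample], List[Sample]]]:
    """Yield (test bottle, train/validation pool, test set), one fold per bottle."""
    bottles = sorted({s["bottle"] for s in samples})
    for b in bottles:
        test = [s for s in samples if s["bottle"] == b]
        pool = [s for s in samples if s["bottle"] != b]   # all remaining bottles
        yield b, pool, test


def train_val_split(items: list, train_frac: float = 0.8, seed: int = 0) -> Tuple[list, list]:
    """Fixed random 80/20 split; the same seed reproduces the same allocation."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_frac * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

Holding out whole bottles, rather than individual measurements or segments, keeps measurements of the same bottle from appearing in both the training and test data of a fold.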

Results and discussion
This section evaluates the performance metrics obtained in the training, validation and testing stages of the experiment: training, validation and testing accuracy, training and validation time, and the total number of network parameters. Together, these indicators show how effectively and efficiently the proposed models classify the wine samples.

Results
Figure 5 shows the confusion matrices from the testing phase of the 3-class and 2-class classification tasks in experiment 4. Overall, the models perform well on both kinds of task, with accuracies above 82% for every 2-class task and above 92% for the 3-class task, which supports the suitability of the proposed approach. However, the AQ class has the highest misclassification rates: approximately 23.1% in the HQ-AQ task, 12.6% in the AQ-LQ task and 9.9% in the HQ-AQ-LQ task. These findings are examined further in the Discussion. Table 1 summarizes the final testing results. The average accuracy exceeds 82% for the HQ-AQ task and 92% for the remaining tasks. In the HQ-AQ task, experiments 2 (84.90%), 3 (85.67%) and 4 (82.18%) underperformed experiment 1 (87.8%); in the AQ-LQ and HQ-LQ tasks, however, their performance slightly surpassed that of experiment 1. Of the four experiments, experiment 2 achieved the highest average performance, followed by experiments 4, 3 and 1.
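Accuracy and per-class misclassification rates can be read off a confusion matrix as below. The matrix in the example is invented for illustration and is not one of the matrices in Figure 5. Defining a class's misclassification rate as the share of its true samples assigned to another class (one minus recall) is an assumption about how the reported rates were computed.

```python
from typing import List

Matrix = List[List[int]]   # rows = true class, columns = predicted class


def accuracy(cm: Matrix) -> float:
    """Fraction of all samples on the diagonal (correctly classified)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


def misclassification_rates(cm: Matrix) -> List[float]:
    """Per true class: fraction of its samples predicted as any other class."""
    return [1 - row[i] / sum(row) for i, row in enumerate(cm)]


# Illustrative 3-class matrix (HQ, AQ, LQ); the numbers are invented.
cm = [[45, 5, 0],
      [6, 30, 4],
      [0, 3, 97]]
print(round(accuracy(cm), 3))                               # 0.905
print([round(r, 3) for r in misclassification_rates(cm)])  # [0.1, 0.25, 0.03]
```

Because accuracy pools all classes, a large class such as LQ can keep it high even when a small class such as AQ is often confused; the per-class rates make that visible.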
Although the overall performance is satisfactory, challenges remain for the next phase. In particular, the misclassification rates for the HQ-AQ, AQ-LQ and HQ-AQ-LQ tasks indicate that the model has difficulty distinguishing AQ samples. Further research is needed to identify the underlying causes and possible remedies; the following section analyzes these challenges and proposes ways to improve classification performance.

Discussion
The confusion matrices in Figure 5 show a clear distinction between the HQ and LQ classes, whereas the AQ class has higher misclassification rates, particularly when distinguishing HQ from AQ (23.1%) and AQ from LQ (12.6%). The t-SNE visualization in Figure 4b largely separates the HQ, AQ and LQ categories, but many AQ samples are mixed with, and spread between, the HQ and LQ clusters. The AQ bottles were exposed to an uncontrolled environment for only two weeks before the experiment, whereas the LQ bottles had been stored in such conditions for six months. This difference in storage conditions could explain the higher misclassification rate of the AQ class: a short exposure may shift the VOC profile only partway from that of HQ wine toward that of LQ wine. However, the primary focus of this study was the effectiveness of algorithms for detecting wine quality, and the absence of empirical analysis to test this hypothesis is a limitation of this study.
Another consideration is model cost. Table 1 lists the number of network parameters and the training time for each experiment. The largest experiment reaches 82,724 parameters and a training time of 3,630.51 seconds, both modest for deep learning networks. Experiment 1 has the fewest parameters and the shortest training time, yet accuracy differs by only 1-2% across the four experiments. Interestingly, experiments 1 and 2, which lack the LSTM block, are more accurate than experiments 3 and 4 in some tasks. This suggests that the LSTM block contributes relatively little to wine quality detection, while networks containing it are likely to be slower at inference in real-world use.
Table 2 compares the proposed models with previous approaches. The training and validation curves over 600 epochs (Figure 6) indicate that the models do not underfit. In addition, the time-slicing window method enables recognition from only 4 seconds of signal. Our accuracy of 92.75% falls short of the 97.68% reported in the original paper on the wine data, which used the same validation method (LOOCV) [29]. However, that paper used a deep multilayer perceptron (MLP) trained for only 99 seconds and did not report the number of epochs or other relevant training parameters, which makes it difficult to rule out overfitting. In contrast, our validation procedure holds out a separate validation set within each fold to monitor training, which reduces the risk of overfitting. These results support the reliability and effectiveness of the proposed models. A final limitation of this study is that it does not address the class imbalance in the wine database. As noted in the data description, the samples are unevenly distributed across classes, with 13 bottles in the LQ class, 4 in the AQ class and the remaining 5 in the HQ class. The time-slicing window method carries this imbalance into the segmented data and enlarges the absolute differences between classes. This issue was acknowledged but not investigated here; future work will examine ways to resolve the class imbalance and its effect on model performance, with the goal of improving the overall efficacy and reliability of the proposed approach.

Conclusions
In this study, we proposed deep learning networks that integrate 1D-CNN and LSTM architectures for wine quality assessment. The network achieved an accuracy of 92.75% with a recognition time of four seconds, suggesting that it is suitable for small to medium-sized software and hardware platforms. However, the study has limitations, notably the imbalanced class distribution and the limited sample size, which could affect the model's reliability. To address these challenges, future work will focus on expanding the dataset to obtain a more representative sample. We also plan to explore two strategies to mitigate data imbalance: first, collecting more samples to achieve a balanced distribution across labels, and second, applying over-sampling techniques such as the synthetic minority over-sampling technique (SMOTE) to augment underrepresented classes. This dual approach aims to improve the robustness and accuracy of the model in future applications.
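As a pointer for the over-sampling option mentioned above, the core idea of SMOTE is to create a synthetic minority sample at a random point on the line segment between a minority sample and one of its k nearest minority neighbors. The sketch below is a minimal pure-Python illustration of that idea on feature vectors, not the authors' implementation. In practice a maintained implementation such as `SMOTE` from the imbalanced-learn package (`imblearn.over_sampling`) would be used. Whether to apply it to raw segments or to learned features is an open design choice, and it should be applied only to the training data of each fold.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(points: Sequence[Vector], idx: int, k: int) -> List[int]:
    """Indices of the k nearest other points to points[idx] (Euclidean distance)."""
    dists = [(math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx]
    return [j for _, j in sorted(dists)[:k]]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Create n_new synthetic samples by interpolating between minority neighbors."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    neighbors = [k_nearest(minority, i, k) for i in range(len(minority))]
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))      # random minority sample
        j = rng.choice(neighbors[i])          # one of its k nearest minority neighbors
        gap = rng.random()                    # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic
```

Because every synthetic point lies between two existing minority points, SMOTE enlarges the minority class without leaving the region its real samples already occupy.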

Figure 1. (a) Experimental setup illustrating sample collection from the wine dataset; (b) flowchart of the algorithm for VOC signal processing

Figure 2. Data preprocessing stage, including down-sampling and time-slicing window methods

Figure 3. Illustration of the architectural configurations employed in the four experimental setups conducted for the study

Figure 4. (a) Diagram illustrating the validation strategy in the study, (b) The scatter plot illustrating the separation of classes based on the t-distributed stochastic neighbor embedding (t-SNE) method

Figure 5. The confusion matrices corresponding to each classification task

Figure 6. The trajectory of the training and validation procedure (normal and log scale)