Qualitative classification of sugar processing stream products by near infrared spectroscopy
Abstract
The Sugar Milling Research Institute NPC (SMRI) is an integral and essential part of the sugar
industry as it provides a quality control service among other consultation services to sugar mills
in South Africa and other parts of Africa. SMRI uses various prediction equations with near
infrared spectroscopy (NIRS), in transmission mode, to predict analyte concentrations present
in the various sugar stream products. In this study, chemometrics was used to develop a
classification model using discriminant analysis, which could be applied to the process analysis
to choose the correct prediction equation for a specific sugar stream product. Samples were
selected based on various geographical and environmental factors to ensure variability between
the samples. Two different types of data sets were explored to determine the best classification
model. The first method used the spectral data (absorbance versus wavelength) of each sample.
Pre-processing was carried out to eliminate any scattering effects. Principal component analysis
(PCA) was then applied to reduce the data so that only the necessary information remained.
Various classification models, namely, K-nearest neighbour (KNN), Classification tree,
Support vector machine (SVM), and Logistic regression, were tested and validated by
comparing the predicted sample types against actual sample types. Results showed that the
KNN (3) model with the Savitzky-Golay filter and three principal components (PCs) provided
the best separation between the various sugar stream products. The second method used the
analyte concentrations for pol (apparent sucrose content), Brix (total dissolved solids), sucrose,
fructose, glucose, and ash for the various sugar stream products. These results were standardised
before PCA was applied. The same classification models were applied, tested, and validated
using actual samples. These results showed that the Logistic regression model with two PCs
performed best. The optimum models from the two investigations were then compared
by evaluating their performance measures. Based on the analyte concentration
data, the Logistic regression (lasso) model with two PCs provided the best separation between
sugar stream products. This was determined from the F1 scores and classification accuracies
for the calibration and independent validation sample data sets, which were 99.4 % and 100 %,
respectively.
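
The two workflows described above can be sketched roughly as follows, using scikit-learn and SciPy. This is only an illustration under stated assumptions and not the thesis code. The data are synthetic, "KNN (3)" is read as k = 3, and the Savitzky-Golay settings (window length, polynomial order, first derivative) are illustrative choices that the abstract does not specify. All variable and function names are hypothetical.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 40)  # three hypothetical product classes

# Synthetic stand-in for spectra: one band per class, plus a random
# per-sample baseline offset (imitating scatter) and a little noise.
wavelengths = np.linspace(0.0, 1.0, 200)
band_centres = np.array([0.3, 0.5, 0.7])[labels]
spectra = np.exp(-((wavelengths[None, :] - band_centres[:, None]) ** 2) / 0.005)
spectra += rng.normal(0.0, 0.5, size=(labels.size, 1))
spectra += rng.normal(0.0, 0.01, size=spectra.shape)

# Synthetic stand-in for six analyte columns (arbitrary numbers, not
# measured pol, Brix, sucrose, fructose, glucose, or ash values).
class_means = np.array([
    [10.0, 12.0, 9.0, 1.0, 1.0, 0.5],
    [40.0, 45.0, 38.0, 3.0, 2.0, 1.5],
    [70.0, 75.0, 68.0, 2.0, 5.0, 4.0],
])
analytes = class_means[labels] + rng.normal(0.0, 0.3, size=(labels.size, 6))


def savgol_first_derivative(X):
    """Savitzky-Golay first derivative along the wavelength axis.

    The derivative removes constant baseline offsets; the settings are
    assumptions made for this sketch.
    """
    return savgol_filter(X, window_length=11, polyorder=2, deriv=1, axis=1)


# Method 1: pre-processing -> PCA (3 PCs) -> KNN with k = 3.
spectral_model = make_pipeline(
    FunctionTransformer(savgol_first_derivative),
    PCA(n_components=3),
    KNeighborsClassifier(n_neighbors=3),
)

# Method 2: standardisation -> PCA (2 PCs) -> lasso (L1) logistic regression.
analyte_model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
)


def evaluate(model, X, y):
    """Fit on a training split; return accuracy and macro F1 on the held-out split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0
    )
    predicted = model.fit(X_train, y_train).predict(X_test)
    return accuracy_score(y_test, predicted), f1_score(y_test, predicted, average="macro")


spectral_acc, spectral_f1 = evaluate(spectral_model, spectra, labels)
analyte_acc, analyte_f1 = evaluate(analyte_model, analytes, labels)
print(f"spectral model: accuracy={spectral_acc:.3f}, macro F1={spectral_f1:.3f}")
print(f"analyte model:  accuracy={analyte_acc:.3f}, macro F1={analyte_f1:.3f}")
```

Wrapping each workflow in a single pipeline keeps the pre-processing and PCA fitted on the training split only, so the held-out scores are not influenced by the validation samples. The scores printed here describe the synthetic data and say nothing about the results reported in the thesis.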
Description
Submitted in fulfilment of the requirements for the Degree of Master of Applied Science in Chemistry, Durban University of Technology, 2022.
DOI
https://doi.org/10.51415/10321/4120