A data science analysis of the South African COVID-19 infodemic on Twitter
Abstract
The rapid dissemination of information on Twitter (X), particularly during COVID-19, has
exacerbated infodemics, marked by the proliferation of both accurate and false
information. Users were inundated with Fake News, encompassing misinformation,
disinformation and malinformation. Misinformation entails the unintentional dissemination
of false information, whereas disinformation involves deliberate deception.
Malinformation, although grounded in truth, is manipulated to inflict harm. Traditional
human-led verification systems have proven inadequate, as seen in prior infodemics such
as the 2016 U.S. elections and South Africa’s #FeesMustFall. Legislative measures in
South Africa aimed at curbing Fake News during the pandemic were insufficient because
of the vast volume of tweets, necessitating computational approaches.
Despite global focus on COVID-19 Fake News, South African research on Twitter
infodemic analysis remains limited, particularly in the areas of Big Data, longitudinal
analysis, fake news detection, sentiment analysis and social bot detection. This study
addresses these gaps through advanced data science methods, including Natural
Language Processing (NLP), Machine Learning (ML) and Change Point Analysis (CPA),
to analyse the South African COVID-19 Twitter infodemic.
A longitudinal South African COVID-19 dataset (SAcovid19dataset), comprising 976 086
tweets from 8 November 2019 to 19 July 2021, was curated. Additionally, a labelled
dataset (C19MLdataset) of 30 193 tweets was created for Fake News detection,
containing 17 069 ‘Fake News’ and 13 124 ‘Not Fake News’ tweets. The study focused
on textual analysis, as audio, video and image analytics were beyond its scope.
This research uniquely employs an exhaustive approach to compare a wide range of
models for COVID-19 Fake News detection, prioritising balanced accuracy and execution
time as performance metrics. Using the C19MLdataset, twenty-seven (27) shallow, five (5)
deep learning (DL) and seven (7) transformer models were systematically evaluated.
ExtraTreesClassifier, RandomForestClassifier and LightGBM emerged as the top-performing
shallow models. RoBERTa was the top-performing transformer model and BiLSTM
outperformed the other DL models. LightGBM was identified as the most
efficient model because of its speed and low
computational demands. After optimization, it achieved a balanced accuracy of 88.76%
and detected 262 508 (26.89%) Fake News tweets from the full SAcovid19dataset.
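Balanced accuracy is the mean of per-class recall, which makes it less sensitive than plain accuracy to the imbalance between 'Fake News' and 'Not Fake News' labels. The following is a minimal sketch of that metric, not the study's evaluation code: the function name and the toy labels are assumptions made for illustration, and only the two tweet counts come from the abstract.

```python
def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        total = sum(1 for t in y_true if t == c)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Toy labels invented for illustration: 1 = 'Fake News', 0 = 'Not Fake News'.
y_true = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
score = balanced_accuracy(y_true, y_pred)

# Share of the full dataset flagged as Fake News, using the counts in the abstract.
flagged_share = 262_508 / 976_086
print(f"balanced accuracy: {score:.4f}")
print(f"flagged share: {flagged_share:.2%}")
```

In the toy example, recall is 5/6 for the 'Fake News' class and 3/4 for the 'Not Fake News' class, so the balanced accuracy is their mean rather than the raw proportion of correct predictions.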
Sentiment analysis, performed using VADER and CPA, revealed 16 significant shifts in
sentiment due to real-world events. Approximately 56% were related to lockdown
announcements and restrictions. For instance, the South African national state of
emergency on 15 March 2020 led to a shift from neutral to positive sentiment.
In contrast, the 26 February 2021 South African state of the nation address saw
sentiment shift from positive to negative.
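The abstract does not state which CPA algorithm was used, so the sketch below is only an illustration of the general idea: it finds a single change point in a series of daily mean sentiment scores by choosing the split that minimises the within-segment squared error. The function names and the daily values are hypothetical. The cut-offs of 0.05 and -0.05 follow the convention commonly used for VADER compound scores, and applying them to daily means is an assumption.

```python
def label_sentiment(compound):
    """Map a compound score to a label using the conventional VADER cut-offs."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


def best_single_change_point(series):
    """Return the index k that splits series into series[:k] and series[k:]
    with the smallest total squared deviation from each segment's mean."""
    if len(series) < 2:
        raise ValueError("need at least two observations")

    def sse(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs)

    return min(range(1, len(series)),
               key=lambda k: sse(series[:k]) + sse(series[k:]))


# Hypothetical daily mean compound scores (not taken from the study).
daily = [0.01, 0.02, 0.00, 0.01, 0.20, 0.22, 0.19, 0.21]
k = best_single_change_point(daily)
before = sum(daily[:k]) / k
after = sum(daily[k:]) / (len(daily) - k)
print(k, label_sentiment(before), "->", label_sentiment(after))
```

Detecting several shifts, such as the 16 reported here, would require a multiple change point method; this single-split version only shows how a shift in the mean can be located and then described as a move between sentiment labels.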
Social bot activity was examined using three novel algorithms that analysed tweet
timestamps, content duplication and sources. Results showed that three of the Top 10
Fake News accounts exhibited bot-like behaviour, confirming the presence of automated
accounts in the spread of false information.
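The three algorithms are not specified in the abstract. As a rough illustration of the kinds of signals it names (timestamps, content duplication and sources), the sketch below computes one simple score for each. All function names, the example account and the source name are hypothetical, and these heuristics should not be read as the study's algorithms.

```python
from collections import Counter
from statistics import mean, pstdev


def interval_regularity(timestamps):
    """Coefficient of variation of the gaps between consecutive posts (seconds).
    Values near 0 indicate clock-like, evenly spaced posting."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if not gaps or mean(gaps) == 0:
        return 0.0
    return pstdev(gaps) / mean(gaps)


def duplicate_share(texts):
    """Fraction of tweets whose case- and whitespace-normalised text
    appears more than once. Expects a non-empty list."""
    normalised = [" ".join(t.lower().split()) for t in texts]
    counts = Counter(normalised)
    return sum(c for c in counts.values() if c > 1) / len(normalised)


def dominant_source(sources):
    """Most frequent posting client and its share of tweets. Expects a non-empty list."""
    name, count = Counter(sources).most_common(1)[0]
    return name, count / len(sources)


# Hypothetical account, invented for illustration.
timestamps = [0, 600, 1200, 1800, 2400]  # one post every 10 minutes
texts = ["Read this now", "read this  now", "Read this now",
         "Something else", "READ THIS NOW"]
sources = ["AutoPoster"] * 5             # hypothetical client name

regularity = interval_regularity(timestamps)
dup = duplicate_share(texts)
source_name, source_share = dominant_source(sources)
print(regularity, dup, source_name, source_share)
```

An account that scores near zero on interval regularity, high on duplicate share and is dominated by a single automated client would be a candidate for closer inspection; any thresholds would need to be calibrated against labelled accounts.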
This study significantly contributes to the understanding of South Africa’s COVID-19
infodemic by providing a robust Fake News detection model and linking real-world events
to shifts in public sentiment. The development of social bot detection algorithms further
illuminates the role of automated accounts in the dissemination of Fake News. These
findings have practical implications for policymakers and researchers aiming to combat
infodemics using computational tools.
Description
Submitted in fulfilment of the requirements for the Degree of Doctor of Philosophy in Information Technology, Durban University of Technology, Durban, South Africa, 2025.
DOI
https://doi.org/10.51415/10321/6053
