A data science analysis of the South African COVID-19 infodemic on Twitter

dc.contributor.advisor: Thakur, Surendra C.
dc.contributor.author: Khan, Yaseen
dc.date.accessioned: 2025-06-26T06:26:38Z
dc.date.available: 2025-06-26T06:26:38Z
dc.date.issued: 2024
dc.description: Submitted in fulfilment of the requirements for the Degree of Doctor of Philosophy in Information Technology, Durban University of Technology, Durban, South Africa, 2025.
dc.description.abstract: The rapid dissemination of information on Twitter (X), particularly during COVID-19, has exacerbated infodemics, marked by the proliferation of both accurate and false information. Users were inundated with Fake News, encompassing misinformation, disinformation and malinformation. Misinformation entails the unintentional dissemination of false information, whereas disinformation involves deliberate deception. Malinformation, although grounded in truth, is manipulated to inflict harm. Traditional human-led verification systems have proven inadequate, as seen in prior infodemics such as the 2016 U.S. elections and South Africa’s #FeesMustFall. Legislative measures in South Africa aimed at curbing Fake News during the pandemic were insufficient because of the vast volume of tweets, necessitating computational approaches. Despite global focus on COVID-19 Fake News, South African research on Twitter infodemic analysis remains limited, particularly in the areas of Big Data, longitudinal analysis, fake news detection, sentiment analysis and social bot detection. This study addresses these gaps through advanced data science methods, including Natural Language Processing (NLP), Machine Learning (ML) and Change Point Analysis (CPA), to analyse the South African COVID-19 Twitter infodemic. A longitudinal South African COVID-19 dataset (SAcovid19dataset), comprising 976 086 tweets from 8 November 2019 to 19 July 2021, was curated. Additionally, a labelled dataset (C19MLdataset) of 30 193 tweets was created for Fake News detection, containing 17 069 ‘Fake News’ and 13 124 ‘Not Fake News’ tweets. The study focused on textual analysis, as audio, video and image analytics were beyond its scope. This research uniquely employs an exhaustive approach to compare a wide range of models for COVID-19 Fake News detection, prioritising the balanced accuracy and execution time performance metrics.
Using the C19MLdataset, twenty-seven (27) shallow, five (5) deep learning (DL) and seven (7) transformer models were systematically evaluated. ExtraTreesClassifier, RandomForestClassifier and LightGBM emerged as the top-performing shallow models. RoBERTa was the top-performing transformer model and BiLSTM outperformed other DL models. LightGBM was identified as the most efficient model because of its speed and low computational demands. After optimisation, it achieved a balanced accuracy of 88.76% and detected 262 508 (26.89%) Fake News tweets from the full SAcovid19dataset. Sentiment analysis, performed using VADER and CPA, revealed 16 significant shifts in sentiment due to real-world events. Approximately 56% were related to lockdown announcements and restrictions. For instance, the South African national state of emergency on 15 March 2020 led to a shift from neutral to positive sentiment. In contrast, the 26 February 2021 South African State of the Nation Address saw sentiment shift from positive to negative. Social bot activity was examined using three novel algorithms that analysed tweet timestamps, content duplication and sources. Results showed that three of the Top 10 Fake News accounts exhibited bot-like behaviour, confirming the presence of automated accounts in the spread of false information. This study significantly contributes to the understanding of South Africa’s COVID-19 infodemic by providing a robust Fake News detection model and linking real-world events to shifts in public sentiment. The development of social bot detection algorithms further illuminates the role of automated accounts in the dissemination of Fake News. These findings have practical implications for policymakers and researchers aiming to combat infodemics using computational tools.
dc.description.level: D
dc.format.extent: 229 p
dc.identifier.doi: https://doi.org/10.51415/10321/6053
dc.identifier.uri: https://hdl.handle.net/10321/6053
dc.language.iso: en
dc.subject: COVID-19
dc.subject: Opinion Mining
dc.subject: Fake News
dc.subject: Natural Language Processing
dc.subject: Infodemic
dc.subject: Data Science
dc.subject: Machine Learning
dc.subject: Sentiment Analysis
dc.subject: Social Bot Detection
dc.subject.lcsh: Information overload
dc.subject.lcsh: Social media--Influence
dc.subject.lcsh: COVID-19 Pandemic, 2020-2023
dc.subject.lcsh: Fake news
dc.title: A data science analysis of the South African COVID-19 infodemic on Twitter
dc.type: Thesis
local.sdg: SDG03
local.sdg: SDG04
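
Note on the balanced accuracy metric named in the abstract: the sketch below is illustrative only and is not taken from the thesis. It uses the standard definition of balanced accuracy (the mean of the recall on each class) together with the class counts and totals quoted in the abstract. The function name and the "always predict Fake News" baseline are assumptions introduced here to show why the metric suits the uneven class split in C19MLdataset.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Standard balanced accuracy: mean of recall on the positive and negative class."""
    sensitivity = tp / (tp + fn)  # recall on 'Fake News'
    specificity = tn / (tn + fp)  # recall on 'Not Fake News'
    return (sensitivity + specificity) / 2


# Class counts quoted in the abstract for C19MLdataset.
FAKE, NOT_FAKE = 17_069, 13_124
TOTAL = FAKE + NOT_FAKE

# Hypothetical baseline (an assumption for illustration): label every tweet 'Fake News'.
# Plain accuracy rewards the majority class; balanced accuracy does not.
always_fake_accuracy = FAKE / TOTAL
always_fake_balanced = balanced_accuracy(tp=FAKE, fn=0, tn=0, fp=NOT_FAKE)

# Share of the full SAcovid19dataset flagged as Fake News, from the quoted totals.
flagged_share = 262_508 / 976_086

print(f"labelled tweets: {TOTAL}")
print(f"baseline accuracy: {always_fake_accuracy:.4f}")
print(f"baseline balanced accuracy: {always_fake_balanced:.4f}")
print(f"flagged share: {flagged_share:.2%}")
```

Under this baseline, plain accuracy sits above one half simply because 'Fake News' is the larger class, while balanced accuracy stays at exactly one half. The reported 88.76% should be read against that floor.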

Files

Original bundle

Name:
Khan_Y_2025.pdf
Size:
6.97 MB
Format:
Adobe Portable Document Format

License bundle

Name:
license.txt
Size:
2.22 KB
Format:
Item-specific license agreed upon to submission