Please use this identifier to cite or link to this item:
https://hdl.handle.net/10321/5830
Title: | Developing a data lakehouse for a South African government-sector training authority : implementing quality control for incremental extract-load-transform pipelines in the ingestion layer |
Authors: | Govender, Priyanka; Naicker, Nalindren; Patel, Sulaiman Saleem; Joseph, Seena; Moonsamy, Devraj; Akinola, Ayotuyi Tosin; Madamshetty, Lavanya; Govender, Thamotharan Prinavin |
Editors: | Ogunleye, Olalekan Samuel |
Issue Date: | Dec-2024 |
Publisher: | IGI Global |
Source: | Govender, P. et al. 2024. Developing a data lakehouse for a South African government-sector training authority: implementing quality control for incremental extract-load-transform pipelines in the ingestion layer. In: Ogunleye, Olalekan Samuel. Machine learning and data science techniques for effective government service delivery. Hershey, Pa.: IGI Global, 157-184. doi:10.4018/978-1-6684-9716-6.ch006 |
Abstract: | The Durban University of Technology is undertaking a project to develop a data lakehouse system for a South African government-sector training authority. This system is considered critical to enhancing the training authority's monitoring and evaluation capabilities and to ensuring service delivery. Ensuring the quality of data ingested into the lakehouse is critical, as poor data quality degrades the efficiency of the lakehouse solution. This chapter studies quality control for ingestion-layer pipelines in order to propose a data quality framework. The data quality metrics considered were completeness, accuracy, integrity, correctness, and timeliness. The framework was evaluated by applying it in practice to a sample semi-structured dataset to gauge its effectiveness. Recommendations for future work include expanded integration, such as incorporating data from more varied sources and implementing incremental data ingestion triggers. |
URI: | https://hdl.handle.net/10321/5830 |
ISBN: | 9781668497166; 9781668497180 |
DOI: | 10.4018/978-1-6684-9716-6.ch006 |
Appears in Collections: | Research Publications (Accounting and Informatics) |
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.