DATA SCIENCE FOR SMALL AND MEDIUM-SIZED ENTERPRISES: A STRUCTURED LITERATURE REVIEW

Small and medium-sized enterprises (SMEs) are faced with the challenge of implementing Industry 4.0 (I4.0) technologies to keep up with larger industry players, and to pave the way for sustainable business development and digital transformation. Data science (DS) specifically has the potential to provide SMEs with the ability to elevate their decision-making processes by making informed data-driven decisions. A vast amount of literature relates to the field of DS and its implementation; however, the literature related to the implementation of DS in the context of SMEs is limited. Consequently, this article presents a structured literature review of the implementation of DS to support sustainable business operations in SMEs, with a particular focus on SMEs in developing countries.


INTRODUCTION
As small and medium-sized enterprises (SMEs) embark on the journey of digital transformation and the implementation of Industry 4.0 (I4.0) technologies, data science (DS) may be seen as a key driver of such advancements [1]. However, DS often falls victim to the 'buzzword' phenomenon; thus there is much confusion about the extent of its scope, which may be explained by its intricate relationship with other important, similarly misunderstood concepts, such as big data (BD), data analytics (DA), big data analytics (BDA), data mining (DM), data-driven decision-making (DDDM), and the Internet of Things (IoT) [2]. For SMEs to implement DS effectively, an understanding of its relationship with such concepts and underlying principles is necessary [2], and that understanding is addressed in this paper.
DS may be briefly explained as the supporting set of fundamental principles that aid in the process of extracting information or knowledge from data [2]. DM is perhaps the most closely related concept; it comprises the process of generating knowledge from data, thus revealing new trends and patterns [2], [3]. BD has become somewhat ubiquitous in various areas of literature, which often proves to be a hindrance to the structural development of a term [4]. Consequently, the term is frequently used with reference to a variety of concepts, including the collection, storage, aggregation, and processing of data [4], all of which fall under different categories of the DS umbrella. However, BD may be described by five distinct characteristics: volume, velocity, variety, veracity, and value [5].
For the purpose of this paper, BD simply refers to data sets that have these characteristics and that require new technologies to be processed, owing to the overwhelming nature of their size [2]. DS may be seen as the core of these related concepts, with all working together in unison to provide insights that lead to better, more informed organisational decision-making [2]. Consequently, given the overlapping nature and interchangeable use of the above-mentioned terms in the literature, it is appropriate to consider them as analogous to DS for the purpose of this paper.
DS has the potential to offer insightful trends and information to support sustainable business development, but its implementation comes with a unique set of challenges, especially for SMEs. Based on the findings of the Organisation for Economic Co-operation and Development (OECD), formal SMEs contribute roughly 70% of total employment and between 50% and 60% of the gross domestic product (GDP) of developed countries, and 45% of employment and 33% of the GDP of developing countries [6]. These statistics provide clear evidence that SMEs are vital to economies around the globe, and that there is high potential for growth in SMEs in developing countries. They may also be an indication of the contrasting level of obstacles that SMEs face in pursuit of growth in developed vs developing countries.
The aim of this structured literature review is to consider systematically the current literature on the implementation of DS in SMEs from developing and developed countries respectively. Emphasis is then placed on the challenges, readiness, and opportunities associated with implementing DS in SMEs. The objectives of this paper are: (i) to identify the literature on the implementation of DS in SMEs; (ii) to synthesise the literature and identify common factors that are barriers to implementation, organisational readiness, and opportunities; (iii) to analyse and compare these factors in the context of developing and developed countries; and (iv) to identify gaps in the literature that require further research.

RESEARCH METHODOLOGY
This paper presents a structured literature review, which may be seen as an appropriate method of considering the literature that is critical and central [7]. It is carried out in accordance with the 'preferred reporting items for systematic reviews and meta-analyses' (PRISMA) methodology [8]. This was deemed to be an appropriate option owing to its transparency and completeness for the reporting of systematic reviews [8]. Figure 1 presents the four-phase flow diagram associated with the PRISMA methodology, which outlines the steps taken in this systematic literature review. For the purpose of this paper, the literature on the implementation of DS, BD, IoT, DDDM, and DA in the context of SMEs was obtained through the use of SCOPUS, the largest database of peer-reviewed literature, which consists of publications from scientific journals, books, and conference proceedings. Google Scholar was used to identify specific literature related to key concepts that were not found in the SCOPUS database.

RESULTS AND DISCUSSIONS
Following the approach described in Section 2, detailed discussions of the respective phases of the PRISMA methodology are provided in this section, along with the results corresponding to each of those phases.

Phase 1: Identification
Phase 1 represents the identification stage of the PRISMA methodology, where the first step is to select appropriate keywords for the SCOPUS database search function. The selected search terms and their variations are summarised in Table 1. The SCOPUS results were not subject to any limitations in order to accumulate the greatest number of articles for the screening phase. The results were limited, however, by the various publishers' restrictions on access to the literature. Based on the respective search queries entered in the SCOPUS database, 525 results were displayed prior to screening. A further five documents were identified through the use of Google Scholar, which increased the total number of publications to 530. Next, duplicate documents were removed, which led to a total of 512 publications.

Phase 2: Screening
Phase 2 comprises the screening of the literature identified in phase 1, which starts by screening the titles and abstracts with the aim of excluding blatantly irrelevant material, and by looking at the occurrence of the search terms shown in Table 1 and the main categories of the study, namely DS, SMEs, and implementation. The initial 512 documents were narrowed down to 137 by the first screening. Owing to a lack of access to publications, a further 48 documents were disregarded, which left a total of 89 documents. Next, the documents were screened based on document type: all unpublished documents were disregarded, along with web pages, presentations, and posters. A further six documents were thus excluded, leading to a total of 83 documents for the Eligibility phase.

Phase 3: Eligibility
In phase 3, the eligibility of the screened literature was determined through the use of the eligibility criteria, as shown in Figure 2. The main determinants of the selection process were the context of SMEs and whether the literature covered DS in this context. A total of 60 documents did not meet the eligibility criteria, leaving 23 documents for the qualitative review and synthesis stage.
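The record counts reported for Phases 1 to 3 can be checked with a short tally. The sketch below is only an illustration: the counts are taken from the text, while the step descriptions and the helper name `exclusions` are paraphrases introduced here.

```python
# Counts as reported in Phases 1 to 3; step descriptions are paraphrased.
identified_scopus = 525
identified_google_scholar = 5

# Each entry is (description, records remaining after that step).
steps = [
    ("records identified", identified_scopus + identified_google_scholar),
    ("after duplicate removal", 512),
    ("after title and abstract screening", 137),
    ("after removing inaccessible publications", 89),
    ("after document-type screening", 83),
    ("after applying the eligibility criteria", 23),
]


def exclusions(flow):
    """Return (description, records excluded) for every step after the first."""
    return [
        (name, previous - remaining)
        for (_, previous), (name, remaining) in zip(flow, flow[1:])
    ]


excluded = exclusions(steps)
for name, count in excluded:
    print(f"{name}: {count} excluded")
```

The differences for the last three steps (48, 6, and 60) agree with the exclusions stated in the text, and the tally also makes explicit that 18 duplicates and 375 records were removed in the first two steps.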

Phase 4: Quantitative synthesis - bibliometric analysis
Expanding on the content of Section 3.1, the different search terms considered for the SCOPUS database searches and the extent of their variations are illustrated in Table 1. As discussed in Section 1, DS is regarded as an umbrella term for the purpose of this article; it consists of many related concepts that were included in the DS search term. The different combinations of the search terms illustrated in Table 1 are shown in Table 2, along with their corresponding search labels for ease of reference. The bibliometric analysis corresponds to the first part of the fourth and final phase of the PRISMA methodology, shown in Figure 1.
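The way term variations and their combinations yield one query per search label can be sketched as follows. This is a hypothetical illustration: the term lists stand in for Table 1, the two combinations stand in for Table 2 (using the labels 'DSC' and 'DSI' that appear in the timeline discussion), and the use of the SCOPUS `TITLE-ABS-KEY` field code is an assumption rather than a documented detail of this review.

```python
# Hypothetical term groups; the actual terms and variations are those of Table 1.
TERM_GROUPS = {
    "DS": ['"data science"', '"big data"', '"data analytics"', '"data mining"'],
    "SME": ['"SME"', '"small and medium-sized enterprise"'],
    "challenge": ['"challenge"', '"barrier"'],
    "implementation": ['"implementation"', '"adoption"'],
}

# Hypothetical combinations keyed by search label (cf. Table 2).
COMBINATIONS = {
    "DSC": ["DS", "SME", "challenge"],
    "DSI": ["DS", "SME", "implementation"],
}


def build_query(group_names):
    """OR the variations within each group, then AND the groups together."""
    clauses = ["(" + " OR ".join(TERM_GROUPS[name]) + ")" for name in group_names]
    return "TITLE-ABS-KEY(" + " AND ".join(clauses) + ")"


queries = {label: build_query(groups) for label, groups in COMBINATIONS.items()}
for label, query in queries.items():
    print(label, "->", query)
```

Keeping the variations in one place and deriving each query from a label makes it easier to rerun the same searches when the review is updated.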

Timeline of publications
A timeline of the previously selected publications is illustrated in Figure 3, starting from the year 2013; all publications published prior to that year are included under 2013. The year 2022 is included for the sake of completeness; however, because that year had not concluded at the time of writing this article, it was not considered for the analysis that follows.
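The binning rule described above (earlier publications grouped under 2013, and 2022 shown but left out of the analysis) can be sketched as follows. The list of publication years is made up for illustration, and the function names are introduced here.

```python
from collections import Counter

FIRST_YEAR = 2013    # publications from earlier years are grouped under this year
PARTIAL_YEAR = 2022  # shown in the timeline but excluded from the trend analysis


def bin_years(years):
    """Count publications per year, folding earlier years into FIRST_YEAR."""
    return Counter(max(year, FIRST_YEAR) for year in years)


def analysis_counts(counts):
    """Drop the incomplete year before looking at trends."""
    return {year: n for year, n in sorted(counts.items()) if year != PARTIAL_YEAR}


# Hypothetical publication years, not the data behind Figure 3.
years = [2009, 2012, 2013, 2017, 2017, 2020, 2022]
counts = bin_years(years)
trend = analysis_counts(counts)
print(trend)
```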
Referring to Figure 3, the relative lack of publications before 2017 might be indicative of the novelty of the DS-related literature. However, a sharp increase in total publications is observed from 2017 onward, especially related to DS challenges (represented by the search label 'DSC'). This trend could be explained by considering the occurrence of publications related to the implementation of DS ('DSI'); there is a slight but steady increase in publications from 2016 to 2020, after which a decline is observed. This could indicate that DS has reached its 'trough of disillusionment', as predicted by the Gartner hype cycle for emerging technologies in 2015 [9]. This may be explained as a general sense of disappointment in a technology, and a subsequent drop in its application, after a cycle of overuse and inflated expectations. This would explain the constant increase in publications related to DS challenges as more enterprises discover the difficulties associated with its implementation. DS in developing countries ('DSDG') shows an upward trend from 2017 to 2019, after which a period of fluctuations is observed until the year 2021. This presents a unique opportunity for further research to be done, as SMEs in developing countries have more room for general growth than SMEs in developed countries (as discussed in Section 1) [6].
Figure 4 shows the number of publications per continent. Based on the lack of publications corresponding to DS in developed countries ('DSDP'), the European dominance observed in Figure 4 raises a new factor to consider. A possible reason for the disparity between the 'DSDG' and 'DSDP' results is that research conducted in developed countries is not advertised as such; it might be regarded simply as the international standard by the respective authors.
This reasoning could also be used to explain the more specific use of the 'developing' search term and its variations in publications: authors might accept the notion that developed countries are considered the standard, but might find it necessary to specify explicitly when they refer to developing countries. Given the potential of DS for SMEs mentioned in Section 1, and the lack of literature from developing countries, there is a unique opportunity for research to be conducted in this context. Furthermore, Africa has the worst ratio of publications per population share, which also emphasises the opportunity for research in this area.

Subject areas of publications
The number of publications for the most common subject areas and their corresponding search areas are shown in Figure 5. Based on the data presented in Figure 5, it could be deduced that computer science and engineering contribute roughly half of the total publications. This suggests that the majority of the research is technical in nature, which might point to an opportunity for research with a larger emphasis on economic and social perspectives.

Keyword analysis and emerging links
Figure 6 provides a visual representation of the co-occurrence of the keywords that are present in the identified literature, with the aim of displaying the importance of these keywords and identifying potential links. Author and index keywords were considered, with a minimum occurrence of five for each keyword and a link strength of one. 'Minimum occurrence' refers to the minimum number of times a keyword should appear before being considered for the network diagram, and 'Link strength' refers to the minimum number of times keywords need to be linked to each other in the literature before they are connected by a line in the network diagram. A total of 36 keywords met the criteria, and are shown in Figure 6. VOSviewer was used to construct the diagram, which automatically identified four clusters, indicated by the respective colours. Based on the sizes of the most prominent keywords of each cluster, cluster categories were identified as follows:
1. Big data: At the centre of it all, BD is linked to each individual keyword, with the strongest links to data analytics, advanced analytics, and small and medium-sized enterprise. This provides further evidence of BD being used as an umbrella term in the literature, not exclusively for data consisting of the 5Vs (discussed in Section 1).
2. Data analytics: Linked to most of the same keywords as BD, and with a similar link strength, it could be deduced that data analytics and BD go hand-in-hand. Although these concepts have many overlapping qualities, it is important to note that they should not be used interchangeably, as is often the case in the literature.
3. Decision-making: Once again, this was linked to virtually all keywords, but with weaker link strengths than those of BD and data analytics. These links provide evidence of the end result of using data, as improved decision-making could be regarded as one of the main goals of its implementation [2]. The strongest links are formed with data analytics and with small and medium-sized enterprise.
4. Industry 4.0: This was linked to most of the technology- and management-related keywords, with the strongest links to competitive advantage, big data, Internet of Things, and small and medium-sized enterprise.
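The two thresholds used for the keyword network (minimum occurrence and link strength) can be illustrated with a minimal sketch. It is not VOSviewer's implementation and it does not reproduce the clustering; the function name, the toy keyword lists, and the threshold values in the example are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(keyword_lists, min_occurrence, min_link_strength):
    """Return the keywords kept as nodes and the links kept as edges.

    keyword_lists holds one list of keywords per publication.
    """
    # Count each keyword at most once per publication.
    occurrences = Counter(kw for kws in keyword_lists for kw in set(kws))
    nodes = {kw for kw, n in occurrences.items() if n >= min_occurrence}

    # A link is counted once for each publication in which both keywords appear.
    links = Counter()
    for kws in keyword_lists:
        present = sorted(set(kws) & nodes)
        links.update(combinations(present, 2))

    edges = {pair: n for pair, n in links.items() if n >= min_link_strength}
    return nodes, edges


# Toy keyword lists, not the keywords of the reviewed publications.
publications = [
    ["big data", "data analytics", "sme"],
    ["big data", "decision-making", "sme"],
    ["big data", "industry 4.0"],
    ["data analytics", "cloud computing"],
]
nodes, edges = cooccurrence_network(publications, min_occurrence=2, min_link_strength=1)
print(sorted(nodes))
print(edges)
```

In this toy example only the keywords that appear in at least two publications remain as nodes, and the link between 'big data' and 'sme' has a strength of two because the pair appears together in two publications.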

Phase 4: Qualitative synthesis - content analysis
Phase 4 of the PRISMA methodology ended with a content analysis that covered the 23 publications identified in Phase 3. The first step was to identify the most notable themes present in the final set of publications; the four respective SCOPUS search categories did not suffice for this step, as the search terms were broad and the literature provided various areas of interest.

Barriers to DS implementation in SMEs from developing countries
The literature identified for Phase 4 of the PRISMA methodology, as mentioned in Section 3.5, consisted of eight publications directly related to the implementation of DS in developing countries. These publications all mention barriers to implementation, which could be further broken down into seven categories, as illustrated in Table 3.
Data quality concerns is the least-mentioned barrier to implementation in the literature, which might point to the relative unimportance of the issue, as data quality only becomes a factor once the necessary infrastructure is already in place, which is often not the case in SMEs from developing countries. The main reason for poor data quality could be attributed to a lack of employee skills [10].
Insufficient infrastructure is mentioned in three of the eight publications, with a clear emphasis on the difficulty posed by BD specifically, owing to the sheer size of the data and the subsequent requirement for sufficient server equipment [1], [11]. The aim is to handle and interpret data swiftly and efficiently; however, many SMEs are challenged in this regard, as they do not have adequate transmission and storage infrastructure or computing power because of financial constraints [12].
Financial constraints is viewed as one of the main barriers to DS implementation, owing to the high cost of adequate infrastructure and skilled workers [11], [13]. The latest technological advancements in DS could help to reduce costs by leading to better organisational decision-making; but the cost of implementation is still seen as a major barrier to SMEs in developing countries [14].
Data privacy and security concerns: Data is often stored in the cloud, which poses a unique threat: the data might be sensitive in nature, and could be exploited if it is not securely stored [10]. This adds another dimension to the problem, which might require legal expertise, a rare commodity in SMEs in developing countries [11], [12].
Social challenges were mentioned in four of the eight publications, emphasising the importance of organisational culture for the successful adoption of DS. Resistance to change and to adapting to I4.0 in general is proving to be a large stumbling block for many SMEs, thus highlighting the importance of having the full support of management [11], [12], [15], [16]. Moreover, there is a shortage of data experts at executive level in many developing countries; and this could lead to a lack of strategic leadership, which is seen as one of the main reasons why SMEs in developing countries are not implementing DS [16].
Access to software is a major stumbling block for SMEs, given the high cost and level of expertise required to implement these DS software solutions [11], [12], [14], [16]. There are less complicated solutions on the market, but their usefulness is questioned [12].
Lack of skills was mentioned in six of the publications, making it the most common barrier to implementation mentioned in the literature. Based on the number of DS publications published per year, as displayed in Figure 3 of Section 3.4.1, the field of DS is still growing rapidly, which explains why developing countries are struggling to find individuals with the necessary skill sets in the job market [11], [16]. Following the common principle of supply and demand, those individuals are highly coveted, and can command high salaries because of the high demand for and low supply of their skills [17]; so employing them is often not feasible for SMEs, given the previously mentioned financial constraints they face.

Barriers to DS implementation in SMEs from developed countries
The barriers to implementation are mentioned in six studies that place a specific emphasis on SMEs from developed countries. These barriers can be split into five categories, as illustrated in Table 4.
Regulatory challenges are discussed in two sources, which mention the lack of knowledge in SMEs about legal regulations relating to data and about service level agreements [18], [19]. There are also concerns about the privacy and security of data when using cloud computing, a solution that is growing in popularity because of its flexibility and limited financial demands [19].
Economic challenges are seen as the first obstacle to overcome when implementing DS in SMEs, as they have a direct impact on the quality of the resources that companies are able to obtain [20], [21]. Furthermore, there is an economic aspect to every barrier mentioned in this paper, which also points to their importance. It could be argued that the small number of publications covering economic challenges is a result of the obvious and indirect nature of their role.
Technical challenges are mentioned in four publications, making them one of the three most often mentioned barriers to implementation. The sheer size of BD poses many technical challenges, and directly impacts the data acquisition, storage, and analysis phases [20]. The accuracy and the lack of traceability of AI decision-making processes are a cause of concern for some SMEs, along with the need for supervision; at the very least, human intervention is still required for the development of models [18]. In addition, SMEs tend to have limited cybersecurity knowledge and practices, which makes them vulnerable to cyber-attacks [19], [21].
The current literature places great emphasis on the organisational challenges experienced in SMEs, suggesting that they are among the barriers with the greatest impact. It is believed that management often pursues DS as a time-based project rather than as a permanent transformation towards data-driven decision-making [20], [22]. Consequently, this approach typically leads to unmet expectations and ultimately to discontinuation. Furthermore, additional barriers to implementation include resistance from owners and relevant workers' lack of trust in AI solutions [18], [19].
Lack of skills is mentioned in five publications, making it the most common barrier to implementation in the current literature. It is seen as a major barrier in developing and developed countries alike, pointing to a global shortage of data literacy. However, in developed countries, the skills mentioned in the literature as lacking are more specific: there is a need for more AI experts [18], along with skills related to BD and cybersecurity, and for DS experts with adequate business understanding [17], [19], [21], [22]. It could be argued that the higher degree of specificity used to describe DS challenges in developed countries, as opposed to developing countries, provides further evidence that DS implementation is more advanced in developed countries.
Table 4 (excerpt)
Organisational challenges: [18], [19], [20], [22] (4)
Lack of skills: [17], [18], [19], [21], [22] (5)

3.5.3 SMEs' readiness for DS implementation
As with any new technology being implemented, it is important to assess whether an organisation is ready for its implementation from a maturity perspective [23]. It is also necessary to ensure that a proper alignment between strategies and business-specific goals is achieved [11], [24]. It is argued that organisational technological readiness is a multi-level construct that requires collective readiness at both individual and organisational levels [11], [25]. This notion is supported by the social challenges and organisational challenges barriers mentioned in Section 3.5.1 and Section 3.5.2 respectively, as one of the major challenges to DS implementation is deemed to be the lack of an organisation-wide trust in it and an inclination to implement it. Many factors influence the readiness of SMEs; however, it could be argued that these factors can easily be derived from the challenges mentioned in Section 3.5.1 and Section 3.5.2, and so they will not be elaborated upon further in this paper.
The assessment of an organisation's readiness for DS implementation could be achieved through the use of a maturity model, which would clear the path for a development and implementation plan to be designed [23]. Coleman et al. [23] reviewed a large variety of maturity models, which could be summarised by the following dimensions:
1. Business strategy
2. Data governance
3. Level of skills within an organisation
4. Level of enterprise adoption
5. Management and organisational culture
Coleman et al. [23] argue that these models were developed for large enterprises, which brings their use by SMEs into question because of the vastly different contexts these respective organisations operate in. Consequently, it is necessary for a maturity model to be developed specifically for the context of SMEs [23]. However, the literature on maturity models for DS implementation in SMEs is currently in short supply.

Opportunities for DS implementation in SMEs
The implementation of DS brings many benefits and opportunities to SMEs. However, it is argued that the outcomes are not typically measured in organisations, but are simply assumed to be beneficial based on DS's innovativeness and digital nature [26]. Nonetheless, there is sufficient literature related to the benefits and opportunities for SMEs to conclude that there would be great potential in implementing it. The references cited in this section are shown in Table 5.
DS technologies and innovations are believed to provide SMEs with a competitive advantage, given their knowledge generation capabilities [1], [27]. Furthermore, the proper analysis of data can lead to greater innovation, which is seen as a strong selling point for SMEs: it serves as a great motivator for talented individuals and potential investors [18], [19]. Ultimately, it is believed that DS has the potential to produce a reduced workload as a result of its digital and autonomous nature, which could also lead to a reduction in human error [18].
Greater client and market understanding could be achieved through the use of BD, which would pave the way for SMEs to analyse and predict customer and market behaviour [1], [14], [20], [28] and lead to more effective decision-making.
Increased performance and productivity are mentioned in four studies, with an increase in efficiency being attributed to AI [18]. The combination of cloud computing and BD is believed to lead to higher productivity, reduced hardware costs, and increased profits [1], [19], [29].
Improved business processes and decision-making is the most often mentioned benefit in the current literature, which is perfectly aligned with the views of Provost et al. [2] that improved organisational decision-making is the central aim of DS implementation. As a result, BD and BDA are considered critical technologies for the purpose of optimising business operations and developing effective strategies, thus leading to improved and well-informed organisational decision-making [1], [14], [27], [28], [30].

CONCLUSION
First, the aim in this literature review was to examine the literature on the implementation of DS in SMEs in general. Second, frequently discussed topics and categories were identified, with the aim of gaining insight into the state of the literature. In this structured literature review, four categories of interest were identified, covering the implementation of DS in SMEs from both developing and developed countries, the DS implementation readiness of SMEs, and finally, the opportunities for SMEs that come with DS implementation. This review identified the common trends in the literature and promising areas of future research. The key findings were the lack of research into DS in SMEs in the context of developing countries, and a lack of maturity models for the implementation of DS in SMEs in general. Another important finding was the lack of available DS skills in both developing and developed countries. This could prove to be an important consideration for the development of future DS maturity assessment models and implementation frameworks. The insights of this review lay the groundwork for future in-depth research into the implementation of DS in SMEs in various economic contexts. The goal of future research would be to aid in the process of sustainable development by making a significant contribution to the vastly expanding body of literature on implementing I4.0 technologies.