Online harvesting of municipality websites into trusted digital repository

Purpose – Municipalities, as the front lines of service delivery, use websites as one of the tools to communicate information to the public. While a website is considered a record, many organisations, including municipalities, do not manage websites as such. This study aims to explore the archiving of websites as records in the municipalities of KwaZulu-Natal (KZN) Province in South Africa by using the web archiving life cycle model.
Design/methodology/approach – This study used mixed-methods research with an explanatory design, with quantitative data collected first through content analysis of websites and qualitative data collected through interviews. Researchers used multilevel sampling, first quantitatively analysing all available websites of the municipalities (52) in KZN, and then qualitatively selecting only records managers, information managers, web administrators, communication managers and website managers or designers from municipalities because of their understanding of and involvement with websites in some way.
Findings – This study established that some records on municipal websites are often in paper format in record-keeping systems, whereas others are born digital and are not captured in the systems. Municipalities lack a dedicated web harvesting tool as well as an archiving policy or strategy to guide website archiving. Furthermore, municipalities placed a high reliance on service providers to keep their websites operational.
Research limitations/implications – It became clear during the interviews that most of the participants were unfamiliar with web archiving. As a result, only 12 of the 56 selected participants from the municipalities provided the required information in relation to the current study, as the others could not provide answers. Data for other participants were not analysed.
Originality/value – Due to a lack of infrastructure for ingesting digital records into archival custody, a framework for harvesting web content of value is proposed both internally in municipalities and externally to an archive repository.


Introduction
Many organisations, including municipalities, use websites to communicate with stakeholders because the web is fast, current and effective. During the process of communicating through websites, content that is regarded as a record is created (Bragg and Kristine, 2013). This content on websites is important and should be managed as records for future reference. While some documents published on websites are frequently available in paper format in record-keeping systems, others are created in a digital environment and are not preserved in such systems. Many national archival agencies around the world, such as the National Archives and Records Service of South Africa (NARSSA) (2007), regard websites as records other than correspondence systems. For example, the National Archives and Records Service Act (Act No. 43 of 1996) defines a record as "recorded information, regardless of form or medium, created or received by a governmental body in pursuance of its activities". In this context, a governmental body is defined as any "legislative, judicial or administrative organ of the state", which includes municipalities. The inclusion of a website as a record implies that it must be managed and treated as a record as per the NARSSA.
All governmental organisations, including municipalities, are required by the NARSSA to have a strategy in place for effective digital records management. Municipalities in South Africa use websites to interact with their constituents. However, many organisations, including municipalities, are experiencing problems managing websites as records over the course of their entire life cycle. Tsabedze (2018), for example, noted that in the government of Eswatini (previously Swaziland), there was a lack of proper records management, rendering the e-government strategy ineffective because the government websites were down. Furthermore, the Eswatini Government notes that there is no government-wide coordinated e-records strategy that cuts across and integrates records across all ministries (Tsabedze, 2018). The situation is similar in Botswana, as Mosweu (2018) found that the government of Botswana lacks policies and strategies to ensure the long-term preservation of liquid communication generated by social media. Web content is also regarded as a form of liquid communication.
The current issue and full text archive of this journal is available on Emerald Insight at: https://www.emerald.com/insight/2514-9326.htm
According to Mosweu and Ngoepe (2019), liquid communication is content that flows and is not rigid. This liquid communication is ongoing, and the content moves from one circle of people to another, crossing public-private lines (Mosweu, 2019). This definition clearly includes websites, emails and social media content. As Mosweu and Ngoepe (2020) would attest, such content requires a faucet to be controlled.
Websites come in a variety of shapes and sizes, each of which presents a unique set of challenges when it comes to managing them as records (Brügger, 2005). A website, for example, can be static, with a collection of hyperlinked documents stored in folders on a database. The only activity on the site in this regard is the movement between the hyperlinked documents. These sites are relatively simple to preserve; for example, snapshots can be taken, or the entire site can be written to a compact disc, and version control can be used when the site changes (Brügger, 2005). Documents published on such websites can also be managed through the databases in which they are stored. Websites, according to the NARSSA guidelines, facilitate business transactions and should be treated as records that can be accessed and used as needed. Governmental bodies would not have to worry about harvesting websites if they were properly managed, as records could be retrieved from anywhere.
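The snapshot and version control approach for static sites can be illustrated with a minimal sketch. It assumes a hypothetical in-memory store (the name SnapshotStore, the page path and the page content are invented for illustration and do not come from the study) and records a new version of a page only when its content has changed since the last capture.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class SnapshotStore:
    """In-memory stand-in for a snapshot archive (hypothetical, for illustration only)."""
    # Maps a page path to the list of (version number, SHA-256 digest) captured so far.
    versions: dict = field(default_factory=dict)

    def capture(self, path: str, content: bytes) -> bool:
        """Record a new version only if the content changed since the last capture."""
        digest = hashlib.sha256(content).hexdigest()
        history = self.versions.setdefault(path, [])
        if history and history[-1][1] == digest:
            return False  # unchanged page: no new version is needed
        history.append((len(history) + 1, digest))
        return True


store = SnapshotStore()
store.capture("/notices/tender-01.html", b"<p>Tender closes 1 March</p>")
store.capture("/notices/tender-01.html", b"<p>Tender closes 1 March</p>")   # no change
store.capture("/notices/tender-01.html", b"<p>Tender closes 15 March</p>")  # new version
print(len(store.versions["/notices/tender-01.html"]))  # prints 2
```

A real programme would write the captured bytes to durable storage alongside the digest; the sketch keeps only the digest to show the versioning decision.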
Secondly, a website can be dynamic, which presents preservation and management challenges. On a dynamic website, users can use an e-form to request data held in a database on the server, and the page is assembled on the fly based on what is requested (Niu, 2012). Users can search for all available resources to answer a specific query on dynamic websites that are linked to the internet (Niu, 2012). These types of websites are difficult to manage as records, and they necessitate a proper strategy for managing and archiving them. Finally, a website can be interactive, allowing organisations to communicate with their stakeholders. This type of website falls under the umbrella of "liquid communication", as defined by Mosweu (2019) and Mosweu and Ngoepe (2019). This type of communication can easily be passed back and forth between the participants (Mosweu and Ngoepe, 2020). The content emitted by such interactions is difficult to manage and archive in the context of this study. However, if such records are of long-term value, they should be archived for posterity. This archiving process is referred to as "web archiving" in this study. This study used web archiving life cycle theory to explore the archiving of websites as records in the municipalities of KwaZulu-Natal (KZN) Province in South Africa.
The web archive life cycle model provides a measurable model to which organisations can turn while building or enhancing their online archiving programmes. The Archive-It team developed this model based on their practical experience in web archiving. According to Bragg and Kristine (2013), the model is made up of various cycles, which include the following:
Policy: the blue circle just inside the policy band represents the high-level decisions an institution faces as it sets up and manages its web archiving programme.
Vision and objectives: institutions clarify the goals of their web archiving programme.
Resources and workflow: institutions review their available resources, including finances, expertise, staff, potential collaborators and others, to determine how to proceed with developing or changing their web archiving programme.
Access/use/reuse: institutions make decisions about whether and how to provide access to their collections and monitor how patrons use the content.
Preservation: institutions make decisions about how they want to preserve the data they collect through their web archiving activities. This includes both data files and metadata.
Risk management: institutions consider their approach to risk when creating a web archiving programme, looking at copyright and permissions as well as access.
The day-to-day tasks captured in the web archive life cycle include:
Appraisal and selection: institutions decide specifically which websites they want to collect.
Scoping: institutions may opt to archive portions of a website, whole sites or even the entire web domain.
Data capture: institutions fine-tune how they want to capture their data through decisions about crawl (capture) frequency and types of files to archive or not archive.
The scoping and data capture phases of the life cycle often overlap as they involve similar activities and decisions.
Storage and organisation: this step includes a temporary or long-term storage plan for the archived data. For some institutions, the storage and organisation phase of the life cycle might also constitute their preservation activities.
The constructs chosen for this study are website appraisal and selection, website harvesting, storage and access.
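The scoping and data capture decisions in the life cycle model (whole site or a portion of it, which file types to exclude, how often to crawl) can be written down as explicit settings. The sketch below is illustrative only: the class CrawlScope, its default values and the host municipality.example are assumptions made for the example and are not taken from the model or from any municipality in the study.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class CrawlScope:
    """Hypothetical scoping and data capture settings for one seed website."""
    seed_host: str                                 # host chosen during appraisal and selection
    path_prefix: str = "/"                         # "/" covers the whole site; narrower covers a portion
    excluded_extensions: tuple = (".mp4", ".zip")  # file types not to archive (assumed values)
    crawl_interval_days: int = 30                  # capture frequency (assumed value)


def in_scope(url: str, scope: CrawlScope) -> bool:
    """Return True if the URL is inside the scope and is not an excluded file type."""
    parts = urlparse(url)
    if parts.hostname != scope.seed_host:
        return False
    if not parts.path.startswith(scope.path_prefix):
        return False
    return not parts.path.lower().endswith(scope.excluded_extensions)


# Archive only the notices section of a placeholder municipal site.
notices = CrawlScope(seed_host="municipality.example", path_prefix="/notices/")
print(in_scope("https://municipality.example/notices/2023/budget.pdf", notices))  # True
print(in_scope("https://municipality.example/about/", notices))                   # False
```

Keeping these decisions in one place makes it easier to review them alongside the policy-level choices described in the model.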

Problem statement
Even though the NARSSA Act considers websites to be records, many South African public organisations, including municipalities, do not manage their websites as records (NARSSA, 2006). As a result, these records simply disappear, resulting in institutional amnesia or gaps in information posted online, such as the liquid communication identified by Mosweu and Ngoepe (2019). All governmental bodies, including municipalities, are required by NARSSA to have a strategy for the effective management of digital records, which includes web content. This means that accounting officers in government bodies must make sure that all electronic records systems, such as email and websites, are managed in accordance with NARSSA record-keeping guidelines and standards. A framework for long-term digital preservation elements is required to assist municipalities in web archiving. Such a framework might enable municipalities to implement practical plans for long-term digital preservation to improve accessibility and information flow. The purpose of this study was to explore the archiving of websites as a record in the municipalities of KZN in South Africa. The specific objectives were to:
Examine the nature of websites in KZN municipalities.
Determine the selection process for municipalities in KZN when archiving websites.
Determine the storage of published content when archiving websites in KZN municipalities.
Assess the process of harvesting municipal websites into the trusted e-records digital repositories in KZN.
Suggest a framework for web archiving in KZN municipalities.

Methodology
This study used mixed-methods research with an explanatory design, with quantitative data collected first through website content analysis and qualitative data collected through interviews. Multilevel sampling was used, with researchers first quantitatively analysing all available websites of the 52 municipalities in KZN, as reflected in Table 1. For the qualitative data, only records managers, information managers, web administrators, communication managers and website managers or designers from municipalities were chosen because of their understanding of and involvement with websites in some way. Although 55 participants were identified for interviews, saturation was reached with only 12 participants. Although other participants beyond the 12 were interviewed, they had no idea how their municipalities' websites functioned. Even when they referred the researchers to other colleagues, the researchers were unable to obtain relevant information. As a result, only the information provided by the 12 participants was found to be useful during the data analysis. The data is presented thematically in accordance with the study's objectives, with verbatim quotes from the 12 participants (Participant 1 to 12), as well as figures and tables.

Findings and discussions
The findings are organised into themes that stem from the objectives of the study.

Nature of websites in municipalities
The researchers wanted to know if municipalities had websites and what kind of websites they had. Figure 1 shows that 21 municipalities had static websites, 31 had dynamic websites and three had no website. Municipalities without websites indicated that they are located in rural areas and thus experience network challenges. Furthermore, their constituents did not interact with them through websites.
Municipal interactive websites included a chat function, also known as the message box in some municipalities. This function enables community members to interact directly with the municipality, or to leave contact information so that the municipality can contact them. Municipalities face preservation challenges as a result of such websites. It was also discovered that some municipal websites are linked to various social media platforms (see example in Figure 2). Some municipalities, including Umzumbe Local Municipality, Umshwathi Local Municipality, Umuziwabantu Local Municipality and Richmond Local Municipality, have not updated their websites in a long time, some for three to four years (see Figures 3 and 4 for examples of outdated websites). When asked about the outdated municipality websites, Participant 11 stated, "The outdated information on websites is caused by the fact that we only wait on the service provider to update information for us". Furthermore, Participant 11 identified a lack of information and communication technology (ICT) skills as a barrier to using the municipalities' websites. This is due to the fact that in the majority of these municipalities, content is updated by service providers who require payment each time the website is updated. As a result, because there is no skill transfer, the maintenance is not sustainable. The researchers were shown a signed Memorandum of Agreement between the service provider and the municipality in this case, but it was not pursued. It was revealed that most municipalities (48) outsourced their websites to service providers. In this regard, service providers are responsible for developing websites from scratch, publishing information (received via email) and archiving data (web archiving). These municipalities relied entirely on service providers to post information on their behalf.
Indeed, some of the websites' content is not up to date (e.g. 2023 content for Umuziwabantu Local Municipality, Ubuhlebezwe Local Municipality, Maphumulo Local Municipality and Richmond Local Municipality is the same as in 2019). The researchers evaluated the websites in 2019, 2020, 2021, 2022 and 2023. In 2019, it was revealed that information on most municipalities' websites was posted when the websites were created. In 2020, some change became apparent because of the pandemic, when most of the municipalities were forced to adapt to the digital world. It was observed that after six months, there was no new update relating to awareness surrounding the COVID-19 pandemic or any other recent news. Participant 10 indicated that "most service providers do not hand over source codes to the municipalities after creating websites so that they can continue to have business and hold the municipality to ransom through maintaining the website". The participant even cited an incident where the governing party in South Africa, the African National Congress (ANC), lost information on the website because the service provider refused to hand over the information until the ANC paid a certain amount owed. This was almost certainly also happening in municipalities, in incidents that were not reported to the public. Such scenarios raise issues of ownership of the website; hence, one participant identified the risk of not having control over the municipality's website, which could jeopardise long-term digital preservation. From all the records management policies scrutinised by the researchers, none made reference to a website as a record. The only reference made in one of the policies is with regard to requests for access to the information manual, which is available on the website. When asked about this, one participant indicated that this is because websites in the municipality are used only as a communication tool with members of the public and not as a record.
It should be noted that there were municipalities that were responsible for publishing and updating their own content, such as KwaDukuza Local Municipality, Umshwathi Local Municipality and eThekwini Metropolitan Municipality. In this regard, the responsibilities lay with the web content manager. With these municipalities, their websites were up to date. A participant from one of these municipalities stated that "We do not have a policy that is specific to archiving of websites but only ICT and email policy". Other municipality websites have links that lead to nowhere (see Figure 5).

Selection process of web content for archiving
In the public sector, appraisal leads to the final phase of records management, disposal, which includes either indefinite retention in the office of origin, record destruction or transfer to an archive repository. The selection phase of web archiving entails selecting specific websites to archive. This stage of the life cycle involves more granular, specific decision points than the broader "vision and objectives" policy phase (Bragg and Kristine, 2013, p. 22). Participants were asked how they handle the decision-making process for which materials are published on the municipal website.
Participants (KwaDukuza Local Municipality, uMlalazi Local Municipality, Ubuhlebezwe Local Municipality, Richmond Local Municipality, uMhlathuze Local Municipality and eThekwini Municipality) indicated that they were not familiar with the records appraisal process. Municipalities and other governmental bodies, as Ngoepe and Nkwe (2018) pointed out, must apply to the NARSSA or respective provincial archives for disposal authority. Once the municipality has been granted disposal authority, it will use it to develop a retention schedule that will guide either destruction or transfer to an archive repository. The general consensus was that no disposal authority was requested because municipalities did not consider websites to be records. Participant 3 stated as follows: "We use the website as a communication tool. Therefore, we are not managing it as a record. This is so because the content that is published is managed elsewhere. Here on the website, we are just communicating with the external stakeholder."
The municipalities mainly relied on the NARSSA's general disposal authority, which does not cover websites.

Storage of published web content
For decades, storage has been a problem in records management, particularly for paper-based records. Digital records have proven to be a lifeline for many organisations in terms of storage. The use of cloud computing services by businesses has skyrocketed in recent years (Katuu, 2016). However, data shows that cloud storage adoption in South Africa is very low (Shibambu and Ngoepe, 2020). Perhaps this is due to the fact that the cloud storage policy was only developed in 2021 and has not yet been approved.
When asked how the website's and files' contents are created, stored and managed, Participant 1 stated: "Information is described as per the various departments of the municipality" (technical services, community services, etc.). Participant 9 also highlighted: "We always create hardcopy files for each of the pieces of information or content being emailed to the service provider, but we do not determine how long we have to keep the files that are being emailed to the service provider."
The following storage mediums were also identified by the participants: cloud, internal server, external server, universal serial bus, hard copy files and automated systems. It was also discovered that email is the most commonly used system for sharing information to be posted on websites (by service providers or colleagues) in municipalities. Participant 10, for example, stated: "As the municipality, we email all the content to the service provider, and it never crosses our mind to think about the criteria to be used in the management of the files that we remain with after emailing everything to the service provider."
Participants (Kokstad Local Municipality, eThekwini Metropolitan Municipality and KwaDukuza Local Municipality) mentioned SharePoint and WordPress as content management software used by the municipalities. Furthermore, because some websites were archived using the internal server, only municipal employees had access to them. In such cases, attempting to access old links leads to a "404" error message (see Figure 2).
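Old links that return a "404" error can be detected with a routine check. The following is a minimal sketch under stated assumptions, not a tool used by any municipality in the study: the function names and the municipality.example URLs are hypothetical, and the demonstration uses a stub in place of real HTTP requests so that it runs offline.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for a URL, or 0 if the server cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # e.g. 404 for a page that no longer exists
    except (urllib.error.URLError, OSError):
        return 0           # DNS failure, timeout or refused connection


def find_broken_links(urls: Iterable[str],
                      status_of: Callable[[str], int] = fetch_status) -> Dict[str, int]:
    """Map each URL that does not return a 2xx or 3xx status to the status observed."""
    broken = {}
    for url in urls:
        status = status_of(url)
        if not 200 <= status < 400:
            broken[url] = status
    return broken


# Offline demonstration: a dictionary of assumed statuses stands in for the network.
observed = {"https://municipality.example/": 200,
            "https://municipality.example/old-notice.html": 404}
print(find_broken_links(observed, status_of=observed.get))
```

Passing the status lookup as a parameter keeps the reporting logic separate from the network call, so the same function can be run against live sites by omitting the status_of argument.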

Harvesting of websites into a digital repository
According to Ngoepe and Keakopa (2011), South Africa lacks the infrastructure to ingest digital records into archival custody. Furthermore, Ngoepe (2017) indicated that the only digitised audio-visual records in South Africa are the court proceedings from the famous Rivonia trial, which were originally in Dictabelt format. As a result, research into how municipal websites are harvested was required. It was discovered that public archives repositories in South Africa, including the KZN Provincial Archives, unconsciously use a post-custodial approach to digital record preservation. Participants indicated that they had no idea how the municipality harvests websites for archiving. For example, Participant 11 stated, "We request the service provider to deal with all issues relating to websites". Another issue highlighted by this participant was a lack of technical resources and capacity for designing and maintaining websites.
From the KZN Provincial Archives, it was confirmed that there is no infrastructure for web ingestion into archival custody. Instead, archivists there indicated that they are waiting for the national archives to fully implement Access to Memory (AToM) and Archivematica, which will later be extended to the provincial archives services. However, the archivists indicated that they were not sure if these systems would cater to web archiving. Participant 3 remarked: "We use the website as a communication tool. Therefore, we are not managing it as a record. This is so because the content that is published is managed elsewhere. Here on the website, we are just communicating with the external stakeholder."
It was, however, revealed that only eThekwini Metropolitan Municipality and KwaDukuza Local Municipality harvest their content into an internal web archiving system. For example, Participant 5 stated that the municipality relies on the software to control the content on websites through WordPress. Software such as WordPress, Drupal, SharePoint and TTech can store data for up to five years, depending on how it is programmed.

A framework for web archiving
The ultimate objective of this study was to propose a framework for archiving websites. Figure 6 depicts a proposed framework to help municipalities ensure web archiving. This framework can also be applied by other governmental bodies, as the national and provincial archives lack the necessary infrastructure for web archiving.
The proposed framework describes the overall structure of archiving websites, which includes legislation and regulatory requirements, digital content harvesting, e-government initiatives and social media. These components are linked to one another using numbering to illustrate the relationship between them and create a coherent framework. In the study, it was established that while most municipalities have a records office, records officials are not involved in documenting, digitising, harvesting or posting information on websites. The suggested framework entails capturing individual municipality websites and storing them in the same trusted digital repository as electronic records, through a preservation system such as Archivematica, where they can be made accessible to the public via AToM, as these two systems are currently being piloted by the NARSSA. A trusted digital repository is one whose mission is to provide reliable, long-term access to managed digital resources (Pearce-Moses, 2019).

Legislation and regulatory requirements
To support the long-term digital preservation process, the appropriate legal and regulatory requirements that are available have to be adhered to. The availability of a legislative and policy framework may go a long way in helping to regulate web archiving. The pieces of legislation that can be followed in South Africa include, but are not limited to, the National Archives Act as well as the Provincial Archives Act. The primary standard for creating and managing electronic records in office environments, which is endorsed by the NARSSA, is SANS (ISO) 16175-2:2014 Ed. 1.00: Information and documentation - Principles and functional requirements for records in electronic office environments - Part 2: Guidelines and functional requirements for digital records management systems. Therefore, in the absence of directives from the provincial archives, the web content has to be managed and archived as per the prescripts of the NARSSA. In this regard, municipalities should apply for disposal authority for websites. Once issued, the municipalities need to determine retention schedules.

Retention and disposition
Municipalities are required to have a records retention schedule in place to describe the procedures for transferring or disposing of records, including websites. If the information on websites, especially the dynamic ones, has long-term value, it should be preserved. Preservation can either be internal within municipality servers, as the provincial archive does not have infrastructure, or external, as the national archive is busy building infrastructure. In this regard, as Ngoepe (2017) would attest, a policy of distributed custody would be applied. Once the infrastructure is built, such websites can be harvested on a daily basis, especially those with archival value, and stored on the servers of the provincial or national archives.
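How a retention schedule could drive disposal decisions for website captures is sketched below. The record series, retention periods and action labels are invented for illustration; they are not taken from any NARSSA disposal authority or from the municipalities studied, and an actual schedule would follow the disposal authority issued to the municipality.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RetentionRule:
    """One line of a hypothetical retention schedule."""
    series: str                     # record series, e.g. a section of the website
    retention_years: Optional[int]  # None means archival value: no destruction date
    action: str                     # action due once the retention period has passed


def disposal_action(rule: RetentionRule, captured: date, today: date) -> str:
    """Return the action due for a capture made on `captured`, as of `today`."""
    if rule.retention_years is None:
        return "transfer to archive repository"
    target_year = captured.year + rule.retention_years
    try:
        due = captured.replace(year=target_year)
    except ValueError:
        # Captured on 29 February and the target year is not a leap year.
        due = captured.replace(year=target_year, day=28)
    return rule.action if today >= due else "retain"


# Assumed example values: notices kept for five years, council minutes kept permanently.
notices = RetentionRule("public notices", 5, "destroy")
minutes = RetentionRule("council minutes", None, "transfer to archive repository")
print(disposal_action(notices, date(2019, 3, 1), date(2023, 6, 1)))  # retain
print(disposal_action(notices, date(2019, 3, 1), date(2024, 3, 1)))  # destroy
print(disposal_action(minutes, date(2019, 3, 1), date(2019, 3, 2)))  # transfer to archive repository
```

Under distributed custody, the "transfer" outcome could initially mean preservation on the municipality's own servers until the provincial or national archive infrastructure is available.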

Harvesting digital content
The proposed model includes two levels of web archiving in this regard: harvesting digital content, which links information to the internal web archive, and harvesting to provincial archives. The NARSSA requires governmental bodies to implement and maintain integrated document and records management systems that provide, at a minimum, the following functionality: managing a functional subject file plan; and managing emails and websites as records. It should be noted that only a few of the municipalities (three) had the capacity to implement fully automated integrated document and records management systems.

Conclusion and recommendations
This study found that, while municipalities have websites, they use them as a communication tool rather than as a record. Some records on municipal websites are often in paper format in record-keeping systems, whereas others are born digital and are not captured in the systems. Municipalities lack a dedicated web harvesting tool, as well as an archiving policy to guide website archiving. Furthermore, municipalities relied heavily on service providers to keep their websites operational, with little municipal involvement. This is a high risk because service providers charge municipalities exorbitant fees, and there have even been reports in the media about organisations losing their domains and website information due to contractual obligations. Municipalities need to address the issue of ownership of websites and their associated content in contracts with service providers. In this regard, the records management policy should state that "all records, regardless of format, created by any person or entity in the service of the municipality are owned by the municipality and subject to its overall control". Such provisions should be applicable to both municipal employees and independent contractors. When custody of municipal records is transferred to a contracted service provider via outsourcing or any other arrangement, the records should remain the municipality's property and subject to approved disposal instructions and storage requirements. In addition, rather than relying on an implied directive, the definition of a record must be revisited to include websites. Finally, guidelines are required to guide municipalities on how to manage and archive websites.

Figure 2 Example of a dynamic website (Newcastle Municipality, 2021)