Asset information requirements for blockchain-based digital twins: a data-driven predictive analytics perspective

Purpose – The purpose of this study is to identify the key data categories and characteristics defined by asset information requirements (AIR) and how this affects the development and maintenance of an asset information model (AIM) for a blockchain-based digital twin (DT).
Design/methodology/approach – A mixed-method approach involving qualitative and quantitative analysis was used to gather empirical data through semistructured interviews and a digital questionnaire survey with an emphasis on AIR for blockchain-based DTs from a data-driven predictive analytics perspective.
Findings – Based on the analysis of results, three key data categories were identified: core data, static operation and maintenance (OM) data, and dynamic OM data, along with the data characteristics required to perform data-driven predictive analytics through artificial intelligence (AI) in a blockchain-based DT platform. The findings also include how the creation and maintenance of an AIM is affected in this context.
Practical implications – The key data categories and characteristics specified through AIR to support predictive data-driven analytics through AI in a blockchain-based DT will contribute to the development and maintenance of an AIM.
Originality/value – The research explores the process of defining, delivering and maintaining the AIM and the potential use of blockchain technology (BCT) as a facilitator for data trust, integrity and security.


Introduction
In asset-centric organizations, such as real estate or asset management (AM) companies, understanding how to manage and operate assets is vital for fully harnessing their potential value (Muller et al., 2019; Munir et al., 2020; Tchana et al., 2019). Identifying the information required to manage assets throughout their entire life cycle is one of the fundamental challenges for AM organizations (Heaton, 2020). In terms of information management during the building life cycle, International Organization for Standardization (ISO) 19650-1 (ISO, 2018) defines the concept of the asset information model (AIM), which compiles the data and information necessary to support the AM process.
The contents of an AIM are specified by asset information requirements (AIR), which are derived from organizational information requirements (OIR). While contemporary AM tools enable asset information collection, their integration and data analytics capabilities are poor, and they are unable to manage dynamic asset data throughout an asset's life cycle (Heaton, 2020; Lu et al., 2020b).
In terms of AM, digital twins (DTs) have the potential to provide asset managers with trustworthy, real-time records of real estate data (Dietz and Pernul, 2020; Götz et al., 2022; Jiang et al., 2021; Lee et al., 2021; Lu et al., 2020a; Macchi et al., 2018; Opoku et al., 2021; Shahzad et al., 2022; Yitmen et al., 2021; Zhao et al., 2022a). Subsequently, the concept of DTs within the architecture, engineering, construction, operation and facility management (AECO-FM) industry has received intense coverage in research (Davila Delgado and Oyedele, 2021; Ozturk, 2021). By utilizing artificial intelligence (AI) applications like machine learning (ML), DTs are able to analyze and predict the state of assets (Macchi et al., 2018; Zhao et al., 2022b). This is vital to ensure the healthy operation of a building and can reduce costs and save time during operation and maintenance (OM) (Ozturk, 2021).
Although DTs allow for the transparent and efficient implementation of industrial services and applications, these benefits are predicated on the assumption of data trust, integrity and security. In this respect, utilizing blockchain technology (BCT) would be a possible solution to help AM organizations handle data on a distributed ledger while assuring data coordination amongst trustworthy DTs (Lee et al., 2021; Rasheed et al., 2020; Suhail et al., 2022; Teisserenc and Sepasgozar, 2021a).
Hence, the objective of this study is to identify the key data categories and characteristics defined by AIR and investigate how this affects the development and maintenance of an AIM in the context of a blockchain-based DT to support predictive data-driven analytics through AI. To achieve the objective of the study three research questions are posed.

RQ1. What are the key data categories and characteristics specified through AIR to support predictive data-driven analytics through AI in a blockchain-based DT?
RQ2. How do the key data categories and their characteristics affect the development and maintenance of an AIM?
RQ3. Which of these data and information could benefit from being processed on a blockchain?
In the next section, the theoretical background is outlined, followed by the methodology involving study design, materials and procedures. The fourth section displays an analysis of the collected data, followed by a discussion in the fifth section. Finally, the conclusion is presented together with suggestions for future research.

Theoretical background

2.1 Information management in the asset management process
The concept of AM has become increasingly prevalent within research and industrial practice as a means to increase the value that assets produce throughout their life cycle (Macchi et al., 2018). AM is defined as: "... the coordinated activities that an organization performs in order to realize value from their physical assets" (ISO, 2014). The processes or methods applied by AM organizations to supervise their asset portfolio on a day-to-day basis are termed AM systems, defined by the ISO 55000 series (ISO, 2014) as: "... a set of interrelated or interacting elements to establish asset management policy, asset management objectives and processes to achieve those objectives" (ISO, 2014). In terms of information management during the building life cycle, ISO 19650-1 (ISO, 2018) defines a set of information requirements and models. Any information or data that supports the management of an asset is stored in an AIM, establishing a set of information to assist in the decision-making process throughout the life cycle of an asset. The AIM comprises both structured data (e.g. 3D models, schedules and databases) and unstructured data (e.g. documentation, video and sound recordings). The contents of an AIM are specified by AIR, which in turn are based on the OIR. The OIR specify the information required by an organization to achieve its objectives for AM and organizational functions. As opposed to asset-level or project-level requirements, the OIR are organizational-level requirements (ISO, 2018). While there are a handful of standards aimed at utilizing building information modeling (BIM) data within OM and AM stating that organizations shall develop an AIM (Heaton, 2020), there are no comprehensive overarching frameworks supporting the alignment of strategic, process and technical standards (Alnaggar and Pitt, 2019; Heaton, 2020; Lu et al., 2019).
The AECO-FM industry is plagued by complexity, fragmentation (Camposano et al., 2021) and interoperability issues caused by the lack of common data standards, formats, protocols and the general heterogeneity of data and differences in semantics and syntax of data (Shahzad et al., 2022). Identifying the information required to manage assets throughout their entire life cycle is one of the fundamental challenges for AM organizations. Inadequate knowledge of what information should be collected often results in misalignment with organizational goals. As a result, the AIM derived from these OIR and AIR is not suitable from an AM perspective (Heaton, 2020;Heaton et al., 2019;Munir et al., 2020).

Digital asset management tools
In the AM process, decision-making is key and it requires a steady flow of real-time data regarding asset performance and condition, reliable communication channels and immutable records of previous real estate data (Lu et al., 2019, 2020b; Macchi et al., 2018). Subsequently, the interoperability, validity and integrity of data and information are vital (Lu et al., 2020b). As AM organizations are challenged to increase asset performance with fewer financial resources, increasing social responsibilities and data management regulations while minimizing the environmental impact, they are looking at how these challenges can be addressed by utilizing digital technologies (Heaton, 2020; Heaton and Parlikad, 2020).
The adoption of BIM throughout the AECO-FM industry has entailed an increase of BIM-based AM both in practice and research. Although BIM has been successfully implemented in the design and construction phase, its contribution during OM has been limited (Alnaggar and Pitt, 2019; Heaton, 2020; Heaton and Parlikad, 2020). While BIM is a capable AM tool during OM, it lacks the analytical capabilities and level of information richness for the complex situations and comprehensive data management required, especially during the OM phase (Boje et al., 2020; Lu et al., 2019, 2021; Shahzad et al., 2022). Thus, there is a need for an integrated platform capable of managing information in dispersed databases supporting the various activities during OM (Lu et al., 2021).

Digital twins and artificial intelligence
DTs are capable of providing functionality in line with the decision support needs of AM throughout an asset's life cycle by providing asset managers with trustworthy, real-time records of real estate data facilitating the decision-making process (Dietz and Pernul, 2020; Jiang et al., 2021; Lee et al., 2021; Lu et al., 2020a; Macchi et al., 2018; Opoku et al., 2021; Shahzad et al., 2022; Yitmen et al., 2021; Zhao et al., 2022a). Fundamentally, a DT consists of a high-fidelity virtual replica of its physical counterpart with a two-way connection enabling a bidirectional flow of data and information. However, there are several definitions of DTs in the literature (Opoku et al., 2021; Ozturk, 2021; Shahzad et al., 2022). Unlike BIM, DTs consider both the replication of the physical asset and a two-way connection that allows for updates and control of the asset (Davila Delgado and Oyedele, 2021; Shahzad et al., 2022). DTs integrate various categories of data regarding built assets, e.g. dynamic data such as the as-is condition of an asset, or static data like building information models, into a platform capable of generating insights and decision support (Heaton and Parlikad, 2020). It is important to note that static data also includes nongeometric data like asset ID or name, location, type and relational data (Becerik-Gerber et al., 2012). In the case of DTs in the built environment, the spatial data provides the core framework, the engineering and equipment data portray the systems, and Internet of things (IoT) sensors facilitate real-time data collection that feeds into AI and ML models (Lukesh et al., 2021).
In terms of the progressive development of DTs Boje et al. (2020) present an evolutionary model. The first generation incorporates sensing, analyzing and monitoring capabilities, the second adds AI, and the third and final generation is constituted by a fully semantic DT capable of utilizing the available knowledge through AI applications, such as ML, deep learning and data mining, similar to the cognitive DT described by Yitmen et al. (2021). Subsequently, recent research has also been directed at IoT sensor data combined with ML and AI to facilitate advanced analytics and data-driven decision-making through DTs (Heaton, 2020;Heaton and Parlikad, 2020).
Predicting the state of assets is key to ensuring the healthy operation of a building (Ozturk, 2021). Fundamentally, predictive analytics aims to identify relationships and expose patterns in data to enable the prediction of future outcomes based on current and historical data (Gandomi and Haider, 2015). By utilizing ML algorithms and real-time data the state of assets can be analyzed and predicted to support the decision-making process (Macchi et al., 2018;Zhao et al., 2022b). Subsequently, time losses can be avoided, and costs can be reduced during the OM-phase (Ozturk, 2021).
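As a minimal illustration of predicting a future state from current and historical data, the sketch below fits a least-squares trend line to a short series of sensor readings and extrapolates when a maintenance threshold would be reached. The readings, the threshold and the function names are hypothetical and are not taken from the study; a DT would typically rely on richer ML models than a linear trend.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def days_until_threshold(days, readings, threshold):
    """Extrapolate the fitted trend to the day the threshold is reached.

    Assumes that a rising value signals degradation. Returns None when the
    trend is flat or falling, i.e. the threshold is never reached.
    """
    slope, intercept = fit_line(days, readings)
    if slope <= 0:
        return None
    crossing_day = (threshold - intercept) / slope
    return max(0.0, crossing_day - days[-1])


# Hypothetical historical data: one vibration reading per day for a fan
days = [0, 1, 2, 3, 4]
vibration = [2.0, 2.5, 3.0, 3.5, 4.0]
remaining = days_until_threshold(days, vibration, threshold=6.0)
print(f"Estimated days until threshold: {remaining}")
```

The same pattern (historical series in, estimated time to an undesired state out) is what more advanced models refine with additional variables and training data.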
One of the most common technical challenges when it comes to implementing ML is the lack of extensive and structured data to train and validate the ML models. Also, the potential benefits of implementing ML struggle to justify the cost of implementation in the AECO-FM industry (Bouabdallaoui et al., 2021; Hong et al., 2020). Even though construction projects generate increasingly larger volumes of heterogeneous data (mainly as a result of BIM implementation and wireless sensor networks), the adoption of AI is not on par with other industries (Pan and Zhang, 2021). Access to large data volumes on its own is of little value, and its potential benefits can only be accessed by utilizing it to facilitate evidence-based decision-making (Gandomi and Haider, 2015). An advanced DT enables data aggregation facilitating advanced AI implementations (Yitmen et al., 2021).
While extensive guidelines for applied ML in the AECO-FM industry exist (Bilal and Oyedele, 2020), extracting insight from data can be broken down into two fundamental processes: data management (referring to processes and techniques for collecting, storing and preparing data for analysis) and data analytics (referring to the techniques applied in analyzing and extracting intelligence from the data) (Gandomi and Haider, 2015).

Blockchain-based digital twins
While a few studies have implemented predictive maintenance and anomaly detection through various methods, the concept of BCT is not included (Bouabdallaoui et al., 2021; Lu et al., 2020b; Zhao et al., 2022b). Although DTs allow for the transparent and efficient implementation of industrial services and applications, these benefits are predicated on the assumption of data trust, integrity and security. In real-life circumstances, data breaches may occur for a variety of reasons, both malicious and nonmalicious. As a result, in order to effectively utilize the capabilities of DTs, the information and data must be reliable and secure (Shahzad et al., 2022). In this respect, utilizing BCT would be a possible solution to help organizations handle data on a distributed ledger while assuring data coordination amongst trustworthy DTs (Lee et al., 2021; Rasheed et al., 2020; Suhail et al., 2022; Teisserenc and Sepasgozar, 2021a), providing stakeholders with raw data of when, where and how processes were executed (Götz et al., 2022).

BCT is based on a decentralized immutable ledger which consists of chained blocks of information where transactions are verified by a peer-to-peer network of nodes (Shojaei et al., 2020). The decentralized technology differs from how traditional databases are built and brings several advantages compared to traditional technologies (Turk and Klinc, 2017). The distinct advantages are decentralization, immutability, reliability, authenticity, transparency and automation through smart contracts (Li et al., 2019; Suhail et al., 2022; Teisserenc and Sepasgozar, 2021a), providing an opportunity to endow trust into a network of generally segregated AM actors (Götz et al., 2022).
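The idea of chained blocks can be sketched in a few lines: each block stores the hash of its predecessor, so changing an earlier record invalidates every later link. The example below is a single-process illustration only; it does not model the peer-to-peer verification or consensus that an actual blockchain relies on, and the asset identifier and events are hypothetical.

```python
import hashlib
import json

GENESIS_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(index, data, prev_hash):
    """SHA-256 digest over the block contents and the previous block's hash."""
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a new block that is linked to the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_HASH
    index = len(chain)
    chain.append({"index": index, "data": data, "prev": prev_hash,
                  "hash": block_hash(index, data, prev_hash)})


def is_valid(chain):
    """Recompute every hash and check every link back to the genesis value."""
    prev_hash = GENESIS_HASH
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical maintenance events for one asset
ledger = []
append_block(ledger, {"asset": "AHU-01", "event": "filter replaced"})
append_block(ledger, {"asset": "AHU-01", "event": "inspection passed"})
valid_before = is_valid(ledger)

ledger[0]["data"]["event"] = "filter cleaned"  # tamper with an earlier record
valid_after = is_valid(ledger)
print(valid_before, valid_after)
```

In a distributed setting, each node would hold a copy of the chain and perform this kind of check independently, which is what makes unilateral changes detectable.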
If decisions made by AI applications cannot be understood or trusted by the end-users, the relationship becomes dysfunctional. By leveraging BCT for AI, data security can be enhanced entailing higher trust and credibility of decisions. Storing tamperproof information in a transparent manner also facilitates trust and understanding of the decisions made by AI applications (Salah et al., 2019;Suhail et al., 2022).
One commonly discussed challenge of BCT is the energy consumption of certain BCT protocols. By switching to more energy-efficient consensus mechanisms the overall energy consumption of BCT can be reduced (Salah et al., 2019;Suhail et al., 2022;Teisserenc and Sepasgozar, 2021b;Vranken, 2017). However, due to the transaction speed limitations of BCT, large data volumes are challenging to store or process on the blockchain (Nawari and Ravindran, 2019;Pedersen et al., 2019;Suhail et al., 2022). Typically, transactions on a public blockchain are approved in 10 min or more while it usually takes less than one second on private blockchains (Salah et al., 2019).
Off-chain storage has the potential to alleviate the scalability difficulties while also lowering on-chain storage expenses in a standard blockchain (Hasan et al., 2020; Nawari and Ravindran, 2019; Pedersen et al., 2019; Putz et al., 2021; Suhail et al., 2022). However, there are several challenges with integrating off-chain storage solutions with blockchain-based DTs. For instance, off-chain storage induces centralization, the validity of off-chain transactions cannot be guaranteed, the state of on-chain and off-chain storage is difficult to keep consistent in real time, and security issues arise (Suhail et al., 2022). Hence, it is recommended that future studies identify and define which data should be stored on or off the blockchain (Suhail et al., 2022; Teisserenc and Sepasgozar, 2021b).
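One common way to combine the two storage tiers is to keep the bulky payload off-chain and write only a compact digest on-chain. The sketch below assumes this pattern; the two in-memory containers merely stand in for an off-chain database and a blockchain, and all names and values are hypothetical.

```python
import hashlib
import json

off_chain_store = {}  # stands in for a conventional database or file store
on_chain_refs = []    # stands in for small records written to a blockchain


def digest(record):
    """Deterministic SHA-256 digest of a JSON-serializable record."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def store(record_id, record):
    """Keep the full record off-chain and only its digest on-chain."""
    off_chain_store[record_id] = record
    on_chain_refs.append({"record_id": record_id, "sha256": digest(record)})


def verify(record_id):
    """Check the off-chain record against its most recent on-chain digest."""
    ref = next((r for r in reversed(on_chain_refs)
                if r["record_id"] == record_id), None)
    if ref is None or record_id not in off_chain_store:
        return False
    return digest(off_chain_store[record_id]) == ref["sha256"]


# Hypothetical batch of sensor readings, too large to place on-chain
store("ahu-01/2022-01-01", {"supply_air_temp_c": [18.4, 18.9, 19.1]})
ok_before = verify("ahu-01/2022-01-01")

off_chain_store["ahu-01/2022-01-01"]["supply_air_temp_c"][0] = 25.0  # altered
ok_after = verify("ahu-01/2022-01-01")
print(ok_before, ok_after)
```

This detects inconsistencies between the two tiers but does not remove the centralization or availability concerns noted above, since the off-chain store remains a conventional system.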
The literature utilized in the theoretical background is summarized in chronological order in Table 1.

Materials and methods
A literature review was conducted in order to establish the theoretical background of the study, followed by a mixed-method approach to gather in-depth insights (Tang, 2020) from industry professionals specialized in AM, DTs and BCT through semistructured interviews and a digital questionnaire. The mixed methods approach was adopted to overcome the limitations of a single design by facilitating elaboration, clarification and development of the findings from one method with the results of the other. Mixed methods are also beneficial when seeking to widen the scope of the research by utilizing different approaches for different inquiry components (Molina-Azorin, 2016).
Due to the novelty of technologies like DT and BCT, a purposive sampling method was applied for selecting interviewees and respondents to the questionnaire, as it is most effective when a certain area of expertise is required (Tongco, 2007). Unlike randomized studies, which purposely try to obtain a sample representative of the population, the central premise of purposive sampling is to focus on individuals equipped with certain qualities that make them more suitable to contribute to the study (Etikan, 2016). Both quantitative and qualitative studies may employ purposive sampling, and despite the method being inherently biased, it remains reliable even in comparison with random probability sampling (Tongco, 2007).

Table 1. Literature utilized in the theoretical background
No. | Reference | Topic
[...] | [...] (2020) | AIM to support the adoption of a DT
11 | Boje et al. (2020) | Construction DT: Directions for future research
12 | Lu et al. (2021) | Moving from BIM to DTs for OM
2.3 Digital twins and artificial intelligence
13 | Dietz and Pernul (2020) | A system of system approach to DTs
14 | Jiang et al. (2021) | DT implementations in the civil engineering sector
15 | Lee et al. (2021) | Integrated DT and blockchain framework
16 | Lu et al. (2020a) | Development of a DT at building and city level
17 | Opoku et al. (2021) | DT application in the construction industry
18 | Yitmen et al. (2021) | DTs for building lifecycle management
19 | Zhao et al. (2022a) | Application of DT technologies to revamp building OM processes
20 | Ozturk (2021) | DT research in the AECO-FM industry
21 | Davila Delgado and Oyedele (2021) | DT for the built environment
[...] | [...] | DTs in construction and real estate
24 | Gandomi and Haider (2015) | Big data concepts, methods and analytics
25 | Zhao et al. (2022b) | OM system based on DTs and ML
26 | Bouabdallaoui et al. (2021) | Predictive maintenance in building facilities using ML
27 | Hong et al. (2020) | State-of-the-art research and applications of ML in the building life cycle
28 | Pan and Zhang (2021) | The role of AI in construction engineering and management
2.4 Blockchain-based digital twins
29 | Bilal and Oyedele (2020) | Applied ML in construction industry
30 | Rasheed et al. (2020) | DT Values, challenges and enablers from a modeling perspective
31 | Suhail et al. (2022) | Blockchain-based DTs: research trends, issues and future challenges
32 | Teisserenc and Sepasgozar [...] | [...]
[...] | [...] | When to use BCT
41 | Hasan et al. (2020) | Blockchain-based approach for creating a DT
42 | Putz et al. (2021) | Blockchain-based secure DT information management

Search strings:
(1) ("Construction Industry" OR "Digital Twin" OR DT OR "Building Information Modelling" OR "Building Information Management" OR BIM) AND ("Asset Management" OR "Asset Information Model" OR "Asset Information Requirements" OR Blockchain*)
(2) ("Construction Industry" OR BIM OR "Building Information Modelling" OR "Architecture Engineering and Construction" OR AEC OR AECO OR "Real Estate") AND ("Digital Twin" OR DT)

Since DT and BCT are novel technologies in their infancy, evolving at a rapid pace, papers covering DT and BCT published prior to 2017 were excluded from the literature review through filtering. The following parameters were included in the filter: Year of publication: 2017-2022, Language: English, Document type: Journal articles and Conference Papers. The information flow of the literature search process is presented in Figure 1, based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) flow diagram (Moher et al., 2009). In addition, complementary searches mainly regarding AI, standards and research methods added another 15 references.

Interviews
The semistructured interviews were conducted through video calls with industry professionals, and all interviews were recorded and transcribed with permission from the interviewees. The interviews were based on an interview guide including predesigned questions covering a list of themes allowing for a more adaptive discussion. Data collected in the interviews was analyzed thematically using the qualitative data analysis software NVivo. Respondent demographics are displayed in Table 2.

Questionnaire survey
As a complement to the qualitative data collected from the interviews, a digital questionnaire based on the literature review was created. To construct the survey, three overarching data categories were conceptualized based on the categories mentioned by Heaton and Parlikad (2020), Becerik-Gerber et al. (2012) and Lukesh et al. (2021); see section 2.3 for more details. The categories are core data, static OM data and dynamic OM data. The data categories will be explained further in the discussion. The questionnaire was based on a five-point Likert scale (Likert, 1932) covering general perceptions and the data categories derived from the literature. This was followed by multiple-choice questions where beneficial attributes of BCT were matched with the key data categories. The respondents were also given the opportunity to provide comments on each question, contributing to the qualitative data. To further develop the questionnaire, the first six interviews were used to validate and improve its content. The questionnaire was distributed through email and LinkedIn to 425 individuals, of whom 59 responded, resulting in a response rate of 14%. Respondent demographics are displayed in Table 3.
The unambiguous nature of numerical representation cannot always be exhibited in words. Numbers provide a definitive and standardized way to describe and interpret data. Statistical analyses can be viewed as a tool to bridge the gap between questions and answers. In the field of statistics, there are three main categories of analysis: descriptive, inferential and associational. Since purposive sampling was applied in selecting the population for this study inferential statistics are not applicable. Instead, descriptive statistics like the mean, median and standard deviation are calculated to provide a better understanding of the data. Associational statistics refers to relationships between data, such as correlation (DePoy and Gitlin, 2016).
In this study, the results of the questionnaire were analyzed using the statistical analysis software Statistical Package for the Social Sciences (SPSS) which was used to calculate the mean, median, standard deviation and correlations. Since the Likert-type data collected from the questionnaire is ordinal and normality cannot be assumed Spearman's rank correlation coefficient was used to calculate the correlation (Hauke and Kossowski, 2011).
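The statistics named above can be sketched in plain Python. The study itself used SPSS; the snippet below is only an illustration of the same measures, with Spearman's coefficient computed as the Pearson correlation of tie-averaged ranks. The response values are hypothetical and are not the study's data.

```python
from statistics import mean, median, stdev


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical responses from six respondents
expertise = [3, 5, 8, 9, 6, 7]   # self-rated expertise, 1-10
challenge = [5, 4, 2, 1, 4, 3]   # five-point Likert item

print("mean:", round(mean(challenge), 2))
print("median:", median(challenge))
print("sample standard deviation:", round(stdev(challenge), 3))
rho = spearman_rho(expertise, challenge)
print("Spearman's rho:", round(rho, 3))
```

Working on ranks rather than raw scores is what makes the coefficient suitable for ordinal Likert-type data, since only the ordering of the responses is used.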

Thematic analysis
A thorough thematic analysis of the captured ideas of the interviewees resulted in the emergence of seven major themes, addressing issues regarding prerequisites and challenges for predictive analytics, creation of AIM, ensuring the as-is/as-built status of AIM, AECO-FM industry characteristics, technical aspects of BCT, use cases of BCT, and BCT and AI synergies. Table 4 summarizes these themes along with a brief description of the core ideas, challenges and opportunities highlighted during the interviews.

Statistical analysis
As a result of the statistical analysis of the results of the digital questionnaire, displayed in Table 5, it was found that the level of expertise regarding the concept of DT and AM is considerably higher than that of BCT. Moreover, the calculated standard deviation indicates that there is homogeneity among the respondents regarding their level of expertise on the concept of BCT, AM and DT respectively. The analysis results listed in this table also reveal that most of the experts believe that data collected in the construction industry usually is unstructured and that structuring data is challenging. Furthermore, they are convinced that as-built BIM models usually only contain as-built geometry but lack asset-specific engineering data (e.g. detailed performance specifications and manufacturer data).
In a similar manner, Table 6 shows the results of analyses of how challenging the respondents find it to define (the process of developing and formulating information requirements for each data category), deliver (delivering the required information and data to the operational phase) and maintain (maintaining the required information and data throughout the asset life cycle) each of the key data categories (core data, static OM data and dynamic OM data).
From the calculated average values, it is evident that maintaining the data is the most challenging issue, followed by delivering and then defining the data. Moreover, the correlations between how challenging the respondents find it to define, deliver and maintain the three key data categories and their rated level of expertise are presented in Table 7.
In the questionnaire, the respondents were asked to select from five beneficial attributes of BCT for each key data category. The percentage of respondents who matched a key data category with an attribute is presented in Table 8. The five BCT attributes were: immutability (data/information not being capable of or susceptible to change), security (data/information being trustworthy, dependable and free from risk of loss), traceability (the ability to access and view the data/information history), accountability (the ability to always know where the data/information originates from) and accessibility (being able to control who has access to the data/information). As can be seen in Table 8, traceability, specifically when it comes to dynamic OM data, is recognized as the most beneficial attribute of BCT. It is also evident that dynamic OM data benefits the most from all five attributes on average.

Table 4. Themes identified in the thematic analysis, with descriptions
1. Prerequisites and challenges for predictive analytics
- First, ML requires a metadata model, i.e. a data structure for core asset data that is suitable for AI, e.g. ontologies like Brick, Haystack or RealEstateCore. Second, detailed data/information about the assets facilitating the O&M process. Third, data measuring the performance of the asset in question, real-time and historical
- Formulating the problem mathematically for advanced multivariable predictive analytics is challenging
- ML requires a lot of structured data to train and validate the models
2. Creation of an AIM
- BIM models are generally not delivered in as-built condition. Cost focus leads to suboptimizations such as modeling assets in one space and referring to them in others. Assets in the BIM model are suggestions; the actual asset that is installed is not specified. Sometimes this information is delivered separately without any connection to the model or in formats that are challenging to integrate into the AIM
- Creating a structured AIM is resource intensive and the benefits are not instantly accessible. Due to the general lack of structured data, the AECO-FM industry has an especially poor starting point
- Ideally the 3D drawing space needs to be auto-translated into building knowledge graphs, as they easily become too large to be effectively maintained by humans
3. Ensuring as-is/as-built status of AIM
- Interoperability is vital to enable editing, ensuring the as-is/as-built state of the AIM
- The process of updating the AIM needs to be integrated into the daily work processes
4. AECO-FM industry characteristics
- Divergence in digital maturity between actors
- Generally low digital maturity in the industry
- Data and information segregation
5. Technical aspects of BCT
- The transaction speed of blockchain makes it inappropriate for high data rates; off-chain storage with references on chain could circumvent this issue
- Some BCT protocols are energy intensive
- It should not cost too much in terms of computing power and performance to decrypt anything that is not of particularly high importance
- Blockchains are difficult to program; the cost of implementation could be higher than the potential value
6. Use cases of BCT
- Access control credentials: which credentials have access to what spaces during what time
- Who has access to what information in the DT
- Keeping track of service intervals, protocols, inspection logs, pictures, state of assets, active/retired assets
- Transactions in general require an immutable state, e.g. data/information handovers could utilize BCT to establish consensus regarding what was delivered by whom and when
- If there is a need for security of a certain type of information, blockchain could be useful
- The level of encryption should stand in proportion to the level of sensitivity/importance of the information/data. Sensitivity/importance of data can vary between different types of facilities
7. BCT and AI synergies
- Recording data and decisions made by AI provides transparency and traceability, promoting higher trust and ultimately better results

Discussion
DTs within the AECO-FM industry are a promising emerging concept for AM organizations. As a platform, a DT facilitates the integration of IoT devices and AI modules like ML and predictive analytics, enabling AM organizations to make better decisions through data-driven decision-making (Macchi et al., 2018; Zhao et al., 2022b). As the benefits of DTs are predicated on data trust, integrity and security, BCT is introduced as a means to overcome this issue (Lee et al., 2021; Rasheed et al., 2020; Suhail et al., 2022; Teisserenc and Sepasgozar, 2021a). This study outlines three overarching key data categories and characteristics that support data-driven predictive analytics through AI in a blockchain-based DT, how this affects the development and maintenance of the DT, and lastly the role of BCT in this context.

Key data categories and characteristics
Table 5. Statistical analysis of the questionnaire results
Statement | Mean | Median | Standard deviation
Level of expertise regarding the concept of digital twin (Novice 1-10 Expert) | 8.17 | 8 | 1.544
Level of expertise regarding the concept of blockchain (Novice 1-10 Expert) | 5.14 | 5 | 2.161
Level of expertise regarding the concept of asset management (Novice 1-10 Expert) | 7.97 | 8 | 1.956
5-point Likert scale (1 = Strongly Disagree, 5 = Strongly Agree)
When working with asset information requirements what the client wants is normally clear and specified | 2.39 | 2 | 1.051
When I am working with asset information requirements the goals they are supposed to fulfill are clear | 3.07 | 3 | 1.096
The more granular/detail/specific the information requirement is the more problematic it becomes to define, deliver and maintain throughout the life cycle of an asset | 2.88 | 3 | 1.366
The necessary information and data for operation and maintenance/facility management is usually transferred from the construction and design phase to the operational phase | 2.46 | 2 | 1.222
As-built BIM models usually only contains as-built geometry but lacks asset specific engineering data (e.g. detailed performance specifications and manufacturer data) | 3.76 | 4 | 1.088
.ifc (Industry Foundation Classes) is an appropriate file format to use as the virtual representation of a physical asset in the context of a digital twin | 3.12 | 3 | 1.068
Data collected in the construction industry is usually unstructured | 3.86 | 4 | 0.918
It is challenging to structure the data collected in the construction industry | 3.[...]

Table 7. Spearman's rank correlation (Spearman, 1904) between graded level of expertise and defining, delivering and maintaining the three key data categories
Note(s): ** Correlation is significant at the 0.01 level (2-tailed)

In the case of DTs in the built environment, Lukesh et al. (2021) state that spatial data forms the underlying data structure, engineering and equipment data depict the systems, and IoT sensors collect real-time data that feed into ML modules. Building further on this idea by incorporating the nongeometric data categories presented by Becerik-Gerber et al. (2012), and dynamic and static data from Heaton and Parlikad (2020), together with empirical data (see Table 4), three key data categories were defined by the authors. The first is core data, which constitutes the underlying framework for all other data. The second is static OM data, which provides basic information about the asset that forms the foundation for the dynamic data necessary for AM activities during the OM phase. It acts as a benchmark for the dynamic data that is routinely monitored and updated during an asset's operational phase. This is generally static record data that normally does not require frequent changes or modification. The third is dynamic OM data, which provides contextual real-time information about the performance and status of individual assets.
Development and implementation of a predictive model require structured data regarding an asset's historical and as-is condition (Gandomi and Haider, 2015), and one of the most common technical challenges in applied ML is the lack of extensive structured data to train and validate the models (Bouabdallaoui et al., 2021; Hong et al., 2020). While large amounts of data are available in the AECO-FM industry (Pan and Zhang, 2021), the collected empirical data indicates that the industry lacks a common standardized structure and is characterized by segregation. During the interviews, data schemas or ontologies like RealEstateCore, Brick or Haystack were mentioned as a possible solution, as they are well suited for AI applications and could assist in solving data structure-related issues.
By encapsulating the data categories with an ontology-based data structure the core data can provide a structured foundation, static OM data can provide detailed asset information and the dynamic OM data can provide contextual real-time and historical data regarding asset performance and status. Figure 2 provides an illustration of the key data categories with examples and characteristics.
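As a minimal sketch of how the three categories could relate in practice, the following Python fragment models them as separate record types joined by an asset identifier. All class names, field names and values are hypothetical illustrations; they are not taken from the study or from any of the ontologies mentioned in the interviews.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CoreData:
    """Underlying framework: identity, type and location (hypothetical fields)."""
    asset_id: str
    asset_type: str
    location: str


@dataclass(frozen=True)
class StaticOMData:
    """Record data that rarely changes and serves as a benchmark (hypothetical fields)."""
    asset_id: str
    manufacturer: str
    rated_supply_temp_c: float


@dataclass(frozen=True)
class DynamicOMReading:
    """One timestamped observation of asset performance or status."""
    asset_id: str
    timestamp: datetime
    supply_temp_c: float


@dataclass
class AssetRecord:
    """Groups the three data categories for a single asset."""
    core: CoreData
    static: StaticOMData
    readings: list[DynamicOMReading] = field(default_factory=list)

    def add_reading(self, reading: DynamicOMReading) -> None:
        # Dynamic data is only meaningful when tied to the right asset.
        if reading.asset_id != self.core.asset_id:
            raise ValueError("reading belongs to a different asset")
        self.readings.append(reading)

    def deviation_from_benchmark(self) -> float | None:
        """Latest reading minus the static benchmark, or None without readings."""
        if not self.readings:
            return None
        latest = max(self.readings, key=lambda r: r.timestamp)
        return latest.supply_temp_c - self.static.rated_supply_temp_c


if __name__ == "__main__":
    record = AssetRecord(
        core=CoreData("AHU-01", "air handling unit", "Level 2"),
        static=StaticOMData("AHU-01", "ExampleCo", 18.0),
    )
    record.add_reading(
        DynamicOMReading("AHU-01", datetime(2024, 1, 1, tzinfo=timezone.utc), 19.5)
    )
    print(record.deviation_from_benchmark())
```

Keeping the static benchmark separate from the stream of readings mirrors the distinction drawn above: the static record changes rarely, while the list of readings grows continuously and supplies the historical data a predictive model would be trained on.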

Effect on developing and maintaining an AIM
The process of defining AIR from OIR requires clear and well-formulated OIR in order to avoid misalignment between the organizational goals and the AIM (Heaton, 2020; Heaton et al., 2019; Munir et al., 2020). As indicated by the empirical data (see Table 4 and Table 5), formulating clear goals derived from organizational objectives to support the development of AIR is a problem in the AECO-FM industry. The consequences can be seen in as-built BIM models that lack asset-specific engineering data and in inadequate data structures, which ultimately result in information not being transferred from the design and construction phases to the OM phase. Although the survey results are close to neutral, they indicate that conditions are not optimal.
While BIM models are one of the primary vessels for transferring data and information from the planning and design phase to the AIM and OM phase, they are generally adapted for construction. Prioritizing cost induces suboptimal practices such as modeling assets in one location and referring to them in others. Also, modeled assets are generally suggestions, not the specific assets installed during construction. Information about the installed assets is commonly delivered separately, either without any connection to the model or in formats that are challenging to integrate into the AIM (see Table 4).
Table 8. Key data categories - blockchain attributes
Figure 2. Key data categories
It was pointed out during the interviews that structuring and standardizing the data and information that constitutes the AIM is vital to its creation and maintenance throughout the building life cycle. In this manner, an ontology-based data structure could be used not only to structure the data but also to facilitate the integration with DTs and AI by enabling the creation of an ontology-based knowledge graph of the building. However, due to its sheer size and complexity, such a graph is challenging to create and maintain manually. The underlying building information should therefore be auto-translated into ontology-based knowledge graphs, automating its transfer to the AIM (see Table 4).
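To illustrate what a building knowledge graph might look like at its simplest, the sketch below stores relationships as subject-predicate-object triples and answers one relationship query. The asset names and the predicates (isA, locatedIn, feeds, isPointOf) are assumptions made for illustration and are not drawn from a specific ontology; a real implementation would use the vocabulary of the chosen schema and a proper graph store.

```python
from collections import defaultdict

# Hypothetical triples describing a small part of a building.
TRIPLES = [
    ("AHU-01", "isA", "AirHandlingUnit"),
    ("AHU-01", "locatedIn", "Room-2.01"),
    ("AHU-01", "feeds", "VAV-07"),
    ("VAV-07", "isA", "VAVBox"),
    ("VAV-07", "feeds", "Room-2.14"),
    ("TempSensor-3", "isPointOf", "VAV-07"),
]


def build_index(triples):
    """Index triples as subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for subject, predicate, obj in triples:
        index[subject][predicate].add(obj)
    return index


def downstream(index, start, predicate="feeds"):
    """Return every node reachable from `start` by following one predicate."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in index.get(node, {}).get(predicate, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


if __name__ == "__main__":
    graph = build_index(TRIPLES)
    # Which assets and spaces are affected if AHU-01 fails?
    print(sorted(downstream(graph, "AHU-01")))
```

A query of this kind is what gives a DT awareness of how assets relate to one another. Even this toy graph needs one triple per relationship, which hints at why manual creation and maintenance become impractical for a whole building.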
The divergence in digital maturity across the AECO-FM industry was frequently brought up during the interviews. Digitization of the real estate portfolio, keeping drawings or models up to date through remodeling and general segregation of data and information were identified as some of the contemporary challenges for real estate companies (see Table 4). Since real estate companies hold the main responsibility for maintaining asset data, it is perhaps not surprising that maintenance was rated as the most challenging activity in the questionnaire (see Table 6). As expected, there is also a negative correlation between the stated level of expertise regarding DTs and AM and the process of defining, delivering and maintaining asset data (see Table 7). That is, as the level of expertise regarding DTs and AM increases, the respondents find it less challenging to define, deliver and maintain asset data.
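For readers unfamiliar with the statistic, the following sketch shows one common way to compute Spearman's rank correlation: rank both variables, averaging the ranks of tied values, and take the Pearson correlation of the ranks. It is an illustration only and not the authors' analysis; the function names are hypothetical and the data in the demo are invented to show a negative relationship.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Invented example: higher expertise paired with lower perceived difficulty.
    expertise = [3, 5, 6, 8, 8, 9]
    difficulty = [5, 4, 4, 3, 2, 1]
    print(round(spearman_rho(expertise, difficulty), 3))
```

Averaging tied ranks matters for Likert-type responses, where many respondents give the same score; the shortcut formula based on squared rank differences is exact only when there are no ties.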

Integration of digital twins and blockchain technology
Since the promised benefits of DTs are predicated on the assumption of data trust, integrity and security (Rasheed et al., 2020; Suhail et al., 2022), BCT is arguably a fitting solution (Lee et al., 2021; Rasheed et al., 2020; Suhail et al., 2022; Teisserenc and Sepasgozar, 2021a). Based on the questionnaire (see Table 8), Immutability stands out at an average of 30.6%, indicating that it is not as desirable as the other attributes. In terms of the key data categories, core data and dynamic OM data received higher averages across all data categories, 42.6% and 46.2%, respectively. Although no prior knowledge regarding BCT is required to match the beneficial attributes with a data category, the mean value of the level of expertise regarding BCT is rather low at 5.14 (see Table 5), which could reduce the reliability of the results.
Nonetheless, several factors go into deciding what information to process on the blockchain. The protocol on which the blockchain is built dictates transaction speed, energy efficiency and cost. For instance, transaction speed can vary from 10 min to less than one second (Salah et al., 2019), and the consensus mechanism has a great effect on energy use. In terms of DTs in the AECO-FM industry, there could be thousands of data points streamed every second from various IoT devices. Because transaction speed in BCT is limited, a bottleneck is formed. To solve this problem, off-chain storage can be implemented to alleviate the scalability difficulties of BCT (Hasan et al., 2020; Nawari and Ravindran, 2019; Pedersen et al., 2019; Putz et al., 2021; Suhail et al., 2022). Concerning data-driven predictive analytics, combining BCT with AI provides synergetic effects. By recording data and decisions made by a predictive model, the processes become transparent and traceable, ultimately leading to better performance (Salah et al., 2019). With regard to protocols, off-chain storage and synergies between AI and BCT, there is consensus between the literature and the empirical interview data (see Table 4).
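The off-chain pattern described above can be sketched as follows: sensor readings are kept in ordinary storage, and only a cryptographic digest of each batch is recorded on the ledger, so that later tampering with the stored data can be detected. The classes OffChainStore and Ledger are hypothetical in-memory stand-ins, not a real database or blockchain client, and the batch identifiers and readings are invented.

```python
import hashlib
import json


def batch_digest(readings):
    """SHA-256 over a canonical JSON encoding of a batch of readings."""
    payload = json.dumps(readings, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class OffChainStore:
    """In-memory stand-in for a database or object store (hypothetical)."""

    def __init__(self):
        self._batches = {}

    def put(self, batch_id, readings):
        self._batches[batch_id] = list(readings)

    def get(self, batch_id):
        return self._batches[batch_id]


class Ledger:
    """In-memory stand-in for a blockchain: an append-only list of anchors."""

    def __init__(self):
        self._entries = []

    def anchor(self, batch_id, digest):
        self._entries.append((batch_id, digest))

    def digest_for(self, batch_id):
        for recorded_id, digest in self._entries:
            if recorded_id == batch_id:
                return digest
        raise KeyError(batch_id)


def store_batch(store, ledger, batch_id, readings):
    """Keep the full batch off-chain and anchor only its digest on the ledger."""
    store.put(batch_id, readings)
    ledger.anchor(batch_id, batch_digest(readings))


def verify_batch(store, ledger, batch_id):
    """True if the off-chain batch still matches the digest on the ledger."""
    return batch_digest(store.get(batch_id)) == ledger.digest_for(batch_id)


if __name__ == "__main__":
    store, ledger = OffChainStore(), Ledger()
    readings = [
        {"sensor": "TempSensor-3", "t": "2024-01-01T00:00:00Z", "value": 19.5},
        {"sensor": "TempSensor-3", "t": "2024-01-01T00:00:01Z", "value": 19.6},
    ]
    store_batch(store, ledger, "batch-0001", readings)
    print(verify_batch(store, ledger, "batch-0001"))
```

Because one ledger entry covers a whole batch, the number of on-chain transactions depends on the batching interval rather than on the sensor sampling rate, which is how this pattern relieves the throughput bottleneck.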
Technical limitations aside, there are reasons why storing data or information on the blockchain could be beneficial. During the interviews, the significance, sensitivity and importance of data were brought up as key drivers for blockchain implementation. Some examples are access control credentials, service intervals, inspection logs, state of assets and active/retired assets. It should, however, be noted that which data is deemed significant, sensitive or important varies greatly depending on the type of facility or business (see Table 4).

Limitations
In this study, a literature review was conducted to frame the theoretical background, and the empirical data was collected through interviews and a digital questionnaire. A total of 12 interviews were performed with industry professionals regarding DTs, AM, BCT and AI. While the distribution in the area of expertise is deemed satisfactory, 10 out of 12 interviewees are based in Scandinavia. Similarly, 62.6% of the questionnaire respondents are based in Scandinavia or Europe, so the results may not be representative of other parts of the world.
As the research topic involves novel technologies not widely known or implemented in the AECO-FM industry, respondents were selected through purposive sampling. While this might have reduced the sample size, the quality of the data should be higher. On the other hand, being too selective reduces the congruence between the sample and the population, in this case the AECO-FM industry. Also, because of the relatively small sample size, the statistical significance can be disputed.
Due to the novelty and extent of the research topic, identifying individuals with expertise covering all topics was challenging. As an alternative study design, focus groups or workshops involving the subject matter experts could have been used, allowing answers and discussions to build upon each other. This would ultimately lead to answers grounded in several areas of expertise rather than pieced together by the authors from individual interviews.

Conclusion
This paper contributes to the body of knowledge by providing insights into the future development of blockchain-based DTs in the AECO-FM industry. Furthermore, it presents three key data categories to support predictive data-driven analytics, together with their key characteristics, and discusses how they affect the development and maintenance of the AIM and the potential role of BCT in this context. An AIM suitable for representing the virtual replica in the context of a blockchain-based DT capable of performing predictive data-driven analytics through AI requires an ontology-based data structure and a high level of fidelity. This facilitates the translation of the data into a knowledge graph of the building, providing the DT with awareness of the relationships between assets. The sheer size and complexity of a building knowledge graph make it very challenging to create and maintain manually. Ideally, this process should be automated. The structure provided is also beneficial to predictive analytics, which requires structured historical and real-time data.
There are several challenges in terms of defining and delivering the data and information required to create and maintain the AIM with an ontology-based data structure. With BIM being one of the primary vessels for transferring data and information from the planning, design and construction phases to the OM phase, suboptimal modeling practices that cause discrepancies between the models and the as-built condition must cease. This would also allow for auto-translation into building knowledge graphs, ensuring that the necessary data and information can be transferred to the OM phase. To develop and maintain an ontology-based AIM, building information needs to be digitized, segregated information must be aggregated and models must be kept up to date through remodeling or other changes.
BCT has the potential to provide DTs with additional reliability, authenticity and transparency. Combined with AI, it adds a layer of traceability that can help improve the AI's performance and makes the decisions made by the AI easier to understand.
Deciding how to utilize BCT in this context depends on multiple variables. These include the technical limitations dictated by the protocol the blockchain is built upon, as well as the level of significance, sensitivity and importance of the data. In cases where the technical limitations of BCT prohibit information or data from being stored on the blockchain, it can be stored off-chain, with a reference or link to it stored on the blockchain. Figure 3 provides an overview of the process discussed in the conclusion.

6.1 Indications for future work
This strictly theoretical study covers novel concepts with few actual implementations in the AECO-FM industry. Therefore, future studies should focus on practical implementations of blockchain-based DTs in order to test and evaluate these theoretical concepts in a real-world context.
Considering the extensive scope of DTs, committing to a full-fledged implementation might be overwhelming. Instead, it is recommended that the separate building blocks of a blockchain-based DT are researched separately, e.g. how an ontology-based AIM can be developed and maintained throughout the entire life cycle of a building or evaluating the suitability of various BCT protocols for use within DTs in the AECO-FM industry.