Building a rough sets-based prediction model for classifying large-scale construction projects based on sustainable success index

Purpose – To address the requirements and specifications of construction projects, academics need to build a project classification model. In recent years, the concept of project success, particularly for large-scale construction projects, has been a controversial issue, especially in developing countries. Hence, in this paper, after introducing a sustainable success index (SSI), a novel method called the “rough set approach” is adopted to induce decision rules and to classify construction projects. The paper aims to discuss these issues. Design/methodology/approach – First, 20 effective success factors and 15 success criteria were categorized according to the three pillars of sustainability: economy, society and environment. The research data used for analysis were collected from 26 large-scale construction projects in Iran and five other countries. After data collection, the observations were analyzed, 51 decision rules were generated, and the projects were classified. Finally, in order to evaluate the performance of the generated rules, a confusion matrix was applied and the model was validated. Findings – The results of the present study show that rough set theory (RST) can be an effective and valuable tool for building expert systems. Practical applications of these results, along with limitations and future research, are described. Originality/value – Perhaps for the first time, a number of large-scale construction projects are classified based on the SSI. Applying RST to build a rule-based system and to classify projects in the construction domain is a novel attempt undertaken in this paper. The rules induced in this study can be applied to develop a sustainable success prediction model in future studies.


Introduction
The successful implementation of infrastructure projects has been a controversial research topic in recent years, especially in developing countries. However, since a key characteristic of the construction industry is its unpredictability compared with static production industries (Safa et al., 2015), there is still no consensus among project management researchers on how project success should be measured, classified and predicted by project managers. The definition of project success has undergone a number of transformations over the years. Traditionally, a construction project is deemed successful when it meets criteria related to time, cost and quality (Atkinson, 1999).
As pointed out by Belassi and Tukel (1996), research on project success needs to distinguish success factors from success criteria. Cooke-Davies (2002) has also highlighted this difference. Project success factors are independent variables of a project which contribute to achieving success (Müller and Turner, 2007; Rockart, 1982). On the other hand, project success criteria are dependent variables by which the success or failure of a project is judged and measured by its stakeholders (Belassi and Tukel, 1996; Pinto and Slevin, 1988; Rockart, 1982). Factors constituting the success criteria are commonly referred to as key performance indicators (KPIs). The difference between KPIs and critical success factors (CSFs) needs to be taken into account. Cox et al. (2003) have argued that success factors are the efforts made, or strategies adopted, to achieve the success of a project, whereas KPIs are compilations of data measures used to assess the performance of the construction project. In fact, KPIs are essential for comparing the effectiveness, efficiency and quality of actual and estimated performance in both workmanship and product.
The literature on project success reveals that various authors have identified a number of success determinants, either from experience or from research. Pinto and Slevin (1988), pioneer researchers of project success, argue that "project success" is much more complicated than simply meeting cost, time and quality targets. They added customer satisfaction to the list of important criteria for assessing project success. Westerveld (2003) reveals that, along with the conventional measures of cost, time, quality and scope, five KPIs are used most frequently: client's appreciation; project personnel appreciation; users' appreciation; contracting partners' appreciation; and stakeholders' appreciation. Bryde and Brown (2004) also suggest that, in addition to the measures of the iron triangle, the overall satisfaction of stakeholders should be considered in performance evaluation criteria. Belout and Gauvreau (2004) have emphasized the project team's ability to manage project risks and to resolve the problems encountered on the project when evaluating project success. In a study by Cserhati and Szabo (2014), an analysis of correlations revealed that relationship-oriented success factors, such as communication, co-operation and project leadership, play a vital role in the successful implementation of projects.
Large construction projects generally require large budgets and prolonged schedules, and they involve many complicated procedures. Several attempts have been made to address project success in large-scale construction projects. Brundtland (1987) identified success factors for large projects using the factor analysis method. These factors were grouped into four major categories: incompetent designers and contractors; poor estimation and change management; social and technological issues; and improper techniques and tools. In another study, Ogunlana (2010) argued that the traditional criteria for success were not sufficient to determine whether a project was successful; quantitative and qualitative criteria such as environmental regulations, building performance and client satisfaction should also be considered. Al-Tmeemy et al. (2011), for the first time, measured the success of building projects for sustainable social housing in Nigeria. They identified several critical success factors influencing sustainable housing. In the most recent study, Krajangsri and Pongpeng (2017) addressed the effect of sustainable infrastructure assessments on construction project success using structural equation modeling. In that research, construction project success was measured using six criteria: time, cost, quality, client satisfaction, safety and environment.

Sustainable development
Another concept which needs to be taken into consideration is sustainability, which has been broadly used in many sectors, including construction management. This concept emerged from the definition given in a report by the Brundtland Commission (Cserhati and Szabo, 2014). This most quoted definition states that it is "a process that aims to meet the needs of the present generation and to consider the ability of future generations to meet their needs at the same time." Choguill (2007) has acknowledged that, to achieve sustainability, construction initiatives must be economically cost-efficient, socially acceptable, technically viable and environmentally attuned. Bakar et al. (2009) have established a theoretical framework of project success factors to achieve sustainability in housing. They identified a list of critical success factors for the project management practices required for sustainable housing.
Social well-being concerns the benefits of the employees and the future users. Fundamentally, this aspect is derived from human feelings such as security, satisfaction, safety and comfort (Abidin, 2009) and from human contributions such as skills, health, knowledge and motivation (Parkin, 2000). Sustainability is also concerned with the extraction of natural resources. Although constructors have little influence on the extraction of natural resources, they are able to discourage this activity by demanding fewer non-renewable natural resources, more recycled materials, and efficient use of energy and mineral resources (Abidin, 2010). Finally, economic sustainability focuses on micro- and macro-economic profits. The micro-economic view draws attention to the factors or activities which could lead to monetary gains from the construction, while the macro-economic view is concerned with the benefits gained from the project's success by the public and the government.

Rough sets-based prediction model

Project classification
In order to address construction projects, scholars and researchers need to establish a classification approach according to their requirements (Safa et al., 2015). To achieve this, many authors have used different typological frameworks and methodologies in the field of construction management, each of which has a particular domain of application.
During the 1990s, several researchers focused on this issue. For instance, Tan and Lu (1995) grouped construction projects with respect to the type of construction work being completed. Meanwhile, Dvir et al. (1998) presented an applicable project classification model using linear discriminant analysis, applied separately to each group of managerial variables, in order to classify 110 observations (projects). Their findings suggested that some variables are more effective than others in anticipating project success.
In recent years, a few studies have addressed this issue. Shokri et al. (2012) extended the study of Baccarini (1996) by categorizing projects according to complexity, relative size, and organizational risk and maturity level. Safa (2013) asserts that fully categorizing construction projects is impossible due to the various characteristics that could be used to define classes and the existence of unknown factors. This has led to a lack of consensus on the classification of construction projects because of their uniqueness and disparity in terms of size, time, investment, complexity and technological content.
More recently, Bērziša (2015) developed a set of main project classification features to describe project resemblance using case-based reasoning, and Safa et al. (2015) classified construction projects according to their size, complexity and risk tolerance. After a comprehensive review of practical project classification methods, they analyzed construction projects by grouping them according to analogous attributes.
Yet none of the existing characterization methods provides a basis for a prediction model. To bridge this gap, this research, in addition to the purposes mentioned above, aims to establish a rule-based classification system to capture the patterns hidden in the data set.

Methodology background
In recent years, a wide variety of artificial intelligence techniques have been applied for rule induction in many different disciplines (Sikder and Munakata, 2009). These techniques include data mining tools such as neural networks (Fu, 1999), decision trees (Quinlan, 1986), rough sets (Pawlak, 1982) and fuzzy sets (Zadeh, 1965).
RST, proposed by Pawlak (1982), has proved to be an effective mathematical approach for data mining through rule induction. This approach is based on the assumption that every set can be roughly defined using a lower and an upper approximation. In fact, RST seeks to find classification rules from vague information by considering the classification of indiscernible objects (Pawlak et al., 1988), and the term "rough set" denotes a set that cannot be classified into a group of data with certainty (Pawlak et al., 1988; Walczak and Massart, 1999).
The rough set approach has many advantages. The most important is its capability to estimate the significance of specific attributes (Liu and Yu, 2009). RST calculates the significance of the attributes by discovering the dependency between attributes (Pawlak et al., 1988; Walczak and Massart, 1999; Liu and Yu, 2009). However, despite these advantages, very little research has adopted RST in the field of construction (Huang et al., 2010; Pheng and Hongbin, 2006; Tam et al., 2006; Choi et al., 2014).
To assess the performance of the classification techniques, some researchers have emphasized the superiority and applicability of each of them in different cases. Mak and Munakata (2002) compared the capability of the ID3 decision tree, rough sets and neural networks with respect to their classification and predictive accuracy. They showed that each of these methods might be more suitable than the others depending on the type of data analyzed and the objective of the analysis. Their findings are as follows:
(1) In neural networks, it is difficult to trace and explain the way in which the data pattern is derived, and rule distillation involves intricate analysis, as opposed to rough sets and ID3.
(2) Training a neural network requires a long computational time before the network stabilizes or converges. If the data are inconsistent or incomplete, neural networks may fail to converge, while the training time for RST is considerably shorter.
(3) ID3 may be more efficient for dealing with an excessively large number of rules, but may potentially overlook useful rules.
(4) Compared with ID3, the rough set method showed better predictive capability when refining inadequate data.
Accordingly, it can be argued that RST works well in situations in which relationships must be discovered in incomplete, inadequate or imprecise data. This capability is especially important for the current research, since experts' opinions are subjective and imprecise and only a limited number of large-scale construction projects is available.

Research methodology
Identifying the determinants of sustainable success is an important initial step in this research.
A number of the most important related studies, with different perspectives, are summarized in Tables I and II. Since, up until now, no empirical study has incorporated project success indicators in the three dimensions of sustainable development, the applicability of previous methods for determining sustainable success indicators, namely sustainable success factors (SSFs) and sustainable success criteria (SSC), is restricted. This drove us to undertake pioneering research in order to establish a list of success indicators categorized into the three dimensions of sustainability, as a basis for the subsequent objectives.
To identify the potential success determinants, separately for SSFs and SSC, a content analysis method (Holsti, 1969) was adopted. From the content analysis conducted on former research work, a total set of 49 raw SSFs and SSC was initially obtained from a comprehensive review of previous studies on KPIs and success determinants. These 49 candidate indicators are divided into three groups: economic, social and environmental indicators.
It is noteworthy that, as Hill and Bowen (1997) stated, some of the sustainability principles could be categorized as either "social" or "economic," or both. For instance, "leadership" and "motivation" clearly have an influence on economic performance. However, on the basis of the aforementioned definition of social sustainability and a comprehensive literature review, it is deduced that the social aspect of these indicators should be emphasized (Szekely and Knirsch, 2005; Lam et al., 2010; Parkin, 2000; Abidin, 2009; Hill and Bowen, 1997).
To ensure the clarity and relevance of the questionnaire, a pilot study was carried out. Five experts, including scholars specialized in sustainable development in the construction industry, participated in the pilot study, and their comments were used in the final questionnaire. They were also given a chance to add and remove determinants at the end of each group. With respect to the feedback received, minor amendments were made to the questionnaire. Consequently, the questionnaire survey was designed using 20 SSFs and 15 SSC classified into the three sustainable development groups, as shown in Tables III and IV, respectively.

Data collection
Based on the literature review and experts' experience, a questionnaire was designed using 20 critical success factors and 15 success criteria, which were classified into three sustainable development groups. The questionnaire was distributed among the experts, and it was designed on a five-point Likert scale (where 1, 2, 3, 4 and 5 represented "Not Significant," "Slightly Significant," "Moderately Significant," "Very Significant" and "Extremely Significant," respectively) in order to capture the importance of the critical success factors and criteria. At the end of the second step, after gathering the questionnaires and analyzing the data, the average values were calculated as the relative importance of these determinants, as shown in Tables VI and VII. In order to increase the response rate and sample representation, the questionnaires were distributed via both e-mail and personal delivery.
3.1.1 Case study details and respondent profiles. During the process of conducting this research, in order to determine large-scale projects, 26 infrastructure construction projects of various types, completed in Iran and five other countries (namely Australia, Kazakhstan, Sri Lanka, Norway and Venezuela), were studied. In the current study, a numerical threshold of around $30 million was considered for determining large-scale construction projects. By applying this threshold, 16 out of 44 projects did not meet the required conditions. Therefore, 26 large-scale construction projects were specified. The target population for the contractor and subcontractor respondent groups consisted of large building construction firms (classified and registered as "Grade 1" companies) working in Iran and the five other aforementioned countries. Names and addresses of the appropriate people were mainly obtained from the departments of project and construction management in these companies. The average working experience (in years) of the clients, consultants and contractors was 15.2, 11.2 and 13.8, respectively. Moreover, all experts had knowledge and experience of implementing sustainable development.
3.1.2 Using relative importance index method to prioritize the sustainability objectives. In this step, the relative importance index, first introduced by Kometa et al. (1994), was used, and the 20 critical success factors and 15 success criteria in each category were analyzed and ranked from the perspective of project participants based on the following equation:

$$\mathrm{RII}_j = \frac{\sum_{i} w_i\, y_{ij}}{z \sum_{i} w_i}, \tag{1}$$

where $w_i$ is the relative importance weight factor of expert $i$, $y_{ij}$ is the rating score ascribed to success determinant $j$ by expert $i$ on the Likert scale from 1 to 5, and $z$ is the highest possible rating value of the Likert scale, which is 5 in this case.
The relative importance index (RII) takes values in the range 0-1 (0 not inclusive); the higher its value, the more important the success determinant.
The results of the questionnaire are shown in Tables VI and VII. From Table VI, and based on the standard deviation (SD) and RII, the top success factors were selected from each sustainability category using the thresholds of RII greater than 0.7 and SD less than 1. Similarly, from Table VII and based on these thresholds, the top three success criteria were selected from each sustainability category. As a result, 11 factors and 9 criteria were selected from the three sustainability groups. These selected determinants are shown in the success breakdown structure (Figure 2).
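The screening step described above can be illustrated with a minimal Python sketch. It assumes that the RII is the weighted mean rating divided by the top of the Likert scale, and that the standard deviation is the sample standard deviation; neither detail is confirmed by the text. The function names and the ratings are hypothetical.

```python
from statistics import stdev


def relative_importance_index(ratings, weights=None, top=5):
    """Assumed form: sum(w_i * y_i) / (top * sum(w_i)); equal weights by default."""
    if weights is None:
        weights = [1.0] * len(ratings)
    if len(weights) != len(ratings):
        raise ValueError("ratings and weights must have the same length")
    weighted_sum = sum(w * y for w, y in zip(weights, ratings))
    return weighted_sum / (top * sum(weights))


def is_selected(ratings, weights=None, rii_min=0.7, sd_max=1.0):
    """Apply the two thresholds from the text: RII > 0.7 and SD < 1."""
    rii = relative_importance_index(ratings, weights)
    # Sample standard deviation is an assumption; the paper does not say which one.
    return rii > rii_min and stdev(ratings) < sd_max


# Hypothetical Likert ratings (1-5) given by five experts to one determinant.
ratings = [5, 4, 4, 5, 3]
rii = relative_importance_index(ratings)
selected = is_selected(ratings)
print(round(rii, 2), selected)
```

With equal expert weights the index reduces to the mean rating divided by 5, so a determinant rated 5 by every expert would score exactly 1.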
3.1.3 Reliability analysis. In this study, Cronbach's α coefficient (Cronbach, 1951) was used to test the reliability of the data. From the information provided by 42 valid respondents, Cronbach's α coefficient was calculated for the three sustainability groups, namely economic, social and environmental. The results for success factors and criteria are shown in Tables VIII and IX, respectively. Cronbach's α coefficients for both success factors and criteria are greater than 0.7. Therefore, the information derived from the questionnaire is considered reliable.
The analytic hierarchy process (AHP) (Sikder and Munakata, 2009) has been widely used for multi-criteria decision making. In this study, in order to measure the success of projects based on sustainability and to compare the importance of the different aspects of sustainability in the successful implementation of projects, a questionnaire based on the AHP method was designed for weighting the SSC after omitting low-effect criteria. This questionnaire was distributed among 20 experts who had participated in large-scale projects. The questionnaires returned at this stage were analyzed with the Expert Choice software. We used the widely accepted nine-point scale (Sikder and Munakata, 2009). The meaning of each value is shown in Table X.
In this approach, the numerical values representing the judgments of the pairwise comparisons were arranged in the upper triangle of the square matrix. For example, $a_{ij}$ represents how much criterion $i$ is preferred over criterion $j$. This means that:

$$a_{ji} = \frac{1}{a_{ij}}, \qquad a_{ii} = 1.$$

In this step, the AHP was utilized to compute the relative importance weight factor ($w_i$) of the success criteria. The corresponding weight factors for the project success criteria are shown in Table XI.
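As an illustration of how weights can be derived from such a reciprocal matrix, the following Python sketch completes the lower triangle from the upper one, approximates the priority vector by the row geometric mean method, and computes a consistency ratio with Saaty's commonly tabulated random index values. This is a sketch under stated assumptions, not the procedure implemented in Expert Choice; the judgments are hypothetical.

```python
import math

# Commonly tabulated random index values for matrices of order 3 to 9.
RANDOM_INDEX = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def complete_reciprocal(upper):
    """Build the full matrix from upper-triangle judgments: a_ji = 1 / a_ij."""
    n = len(upper)
    a = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            a[i][j] = float(upper[i][j])
            a[j][i] = 1.0 / a[i][j]
    return a


def ahp_weights(a):
    """Approximate the priority vector by normalized row geometric means."""
    n = len(a)
    geo_means = [math.prod(row) ** (1.0 / n) for row in a]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(a, w):
    """CR = CI / RI, with lambda_max estimated from the ratios (A w)_i / w_i."""
    n = len(a)
    aw = [sum(a[i][j] * w[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / w[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical judgments for three criteria; only the upper triangle is read.
upper = [[1, 2, 4],
         [0, 1, 2],
         [0, 0, 1]]
matrix = complete_reciprocal(upper)
weights = ahp_weights(matrix)
cr = consistency_ratio(matrix, weights)
print([round(x, 3) for x in weights], round(cr, 3))
```

The example judgments are perfectly consistent (4 = 2 × 2), so the consistency ratio is zero up to rounding; real expert judgments usually give a positive value, and values below 0.1 are conventionally regarded as acceptable.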
Table XI shows the weights of the project success criteria according to the AHP and rank method. Since the opinions of all experts were considered to be of equal importance, the geometric mean was applied as the aggregation method to calculate the weights of the success criteria. With respect to the weight factors, the success indices of the three categories of sustainability are calculated using the following equations:

$$\mathrm{ECSI} = 0.31\,\mathrm{C3} + 0.38\,\mathrm{C1} + 0.31\,\mathrm{C2}, \tag{2}$$

$$\mathrm{SOSI} = 0.43\,\mathrm{C11} + 0.34\,\mathrm{C15} + 0.23\,\mathrm{C13}, \tag{3}$$

where ECSI is the economic success index, SOSI is the social success index and ENSI is the environmental success index. Furthermore, the results from the pairwise comparison matrices of the sustainability categories are detailed in Table XII, and Table XIII shows the normalized priorities of these values. As shown in Table XIII, the economic category is the most important one (62 percent). Its weight is roughly two and four times greater than those of the social category (24 percent) and the environmental category (14 percent), respectively. Therefore, most attention should be paid to the economic point of view, whilst less attention is needed for the social and environmental aspects. It should be mentioned that the consistency ratio of this model is 0.04. According to Table XIII, the SSI is calculated as follows:

$$\mathrm{SSI} = 0.62\,\mathrm{ECSI} + 0.24\,\mathrm{SOSI} + 0.14\,\mathrm{ENSI}, \tag{5}$$

where SSI is the sustainable success index.
3.2.2 Evaluation of success determinants of case studies. At this stage, respondents were also asked to score the status of the selected success factors and criteria in the aforementioned projects for which they were responsible. The scores ranged between 1 and 5, where 1 represents "very bad," 2 "slightly bad," 3 "moderate," 4 "good" and 5 "excellent." Finally, based on Equation (5), the SSI of these projects was calculated and a decision table for use in RST was created.
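The index calculation can be sketched in a few lines of Python. The criterion weights and category weights below are the ones quoted in the text; the environmental weights are not reproduced in this section, so the sketch takes ENSI as a ready-made input. The project scores and function names are hypothetical.

```python
# Weights quoted in the text for the economic and social indices and for the SSI.
ECSI_WEIGHTS = {"C3": 0.31, "C1": 0.38, "C2": 0.31}
SOSI_WEIGHTS = {"C11": 0.43, "C15": 0.34, "C13": 0.23}
SSI_WEIGHTS = {"ECSI": 0.62, "SOSI": 0.24, "ENSI": 0.14}


def weighted_index(scores, weights):
    """Weighted sum of the 1-5 criterion scores named in `weights`."""
    return sum(weight * scores[name] for name, weight in weights.items())


def sustainable_success_index(scores, ensi):
    """Return ECSI, SOSI and SSI; ENSI is supplied by the caller (assumption)."""
    ecsi = weighted_index(scores, ECSI_WEIGHTS)
    sosi = weighted_index(scores, SOSI_WEIGHTS)
    ssi = (SSI_WEIGHTS["ECSI"] * ecsi
           + SSI_WEIGHTS["SOSI"] * sosi
           + SSI_WEIGHTS["ENSI"] * ensi)
    return {"ECSI": ecsi, "SOSI": sosi, "SSI": ssi}


# Hypothetical expert scores for one project on the 1-5 scale.
project_scores = {"C1": 4, "C2": 3, "C3": 5, "C11": 4, "C15": 3, "C13": 2}
result = sustainable_success_index(project_scores, ensi=3.0)
print({name: round(value, 3) for name, value in result.items()})
```

Because each set of weights sums to 1, every index stays on the same 1-5 scale as the underlying scores, which is what allows the SSI to be used directly as a class label.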

Rough sets: foundations
The fundamental concepts of the rough set algorithm for the proposed application are described as follows.

Definition 1. Information systems. An information system is a set of objects described by their attributes and attribute values. It is defined as:

$$S = (U, A),$$

where $U$ is the universe, a finite non-empty set of objects, $U = \{x_1, x_2, \ldots, x_m\}$, and $A$ is the set of attributes. Each attribute $a \in A$ (attribute $a$, belonging to the considered set of attributes $A$) defines an information function:

$$a : U \to V_a,$$

where $V_a$ is the set of values of $a$, called the domain of attribute $a$. The attributes consist of condition attributes $C$ and decision attributes $D$:

$$A = C \cup D.$$

Definition 2. Indiscernibility relation. Let $a \in A$ and $B \subseteq A$, where $B$ is a subset of attributes. Then, the indiscernibility relation is defined as:

$$\mathrm{IND}(B) = \{(x, y) \in U \times U : a(x) = a(y) \ \text{for all } a \in B\}.$$

The rough sets approach is based on two basic concepts, namely the lower and the upper approximations of a set. Let $B \subseteq C$ and $X \subseteq U$, and let $[x]_B$ denote the equivalence class of $x$ under $\mathrm{IND}(B)$:

$$\underline{B}X = \{x \in U : [x]_B \subseteq X\}, \qquad \overline{B}X = \{x \in U : [x]_B \cap X \neq \emptyset\},$$

where $\underline{B}X$ and $\overline{B}X$ are the B-lower approximation of $X$ and the B-upper approximation of $X$, respectively. The B-lower approximation of $X$ is the set of objects of $U$ that can definitively be classified as belonging to $X$ based on the information in $B$, while the B-upper approximation of $X$ is the set of objects of $U$ that can possibly be classified as belonging to $X$.
The difference between the two is called the boundary of $X$ in $U$:

$$BN_B(X) = \overline{B}X - \underline{B}X.$$

If $BN_B(X) \neq \emptyset$, then $X$ is referred to as rough with respect to $B$; otherwise, $X$ is crisp. The degree to which the decision attributes $D$ depend on the condition attributes $C$ is measured by:

$$\gamma(C, D) = \frac{|POS_C(D)|}{|U|}, \qquad POS_C(D) = \bigcup_{X \in U/D} \underline{C}X;$$

if $\gamma$ is 1, $D$ totally depends on $C$; otherwise, it partially depends on $C$.

Definition 5. Core and reduct of attributes. The concepts of core and reduct are two fundamental concepts of rough set theory.
A reduct is a minimal subset of attributes that enables the same classification of the elements of the universe as the entire set of attributes. In other words, attributes that do not belong to a reduct are superfluous with regard to the classification of elements of the universe. The core is the necessary element for rules, and is the common portion of all reducts. Let $B$ be a subset of $A$. The core of $B$ is the set of all indispensable attributes of $B$. The following equation is an important property linking the concepts of the core and reducts:

$$\mathrm{Core}(B) = \bigcap \mathrm{Red}(B),$$

where $\mathrm{Red}(B)$ is the set of all reducts of $B$.
The significance of an attribute can be measured by comparing the degree of dependency ($\gamma$) of a set which includes the attribute with that of the set without the attribute. This idea can be formally described as follows:

$$\sigma(a) = \frac{\gamma(C, D) - \gamma(C - \{a\}, D)}{\gamma(C, D)},$$

where $a \in C$ and $\sigma(a)$ is the significance of attribute $a$ ($0 \leq \sigma(a) \leq 1$). The significance of a set of attributes can be calculated in the same way:

$$\sigma(B) = \frac{\gamma(C, D) - \gamma(C - B, D)}{\gamma(C, D)},$$

where $B$ is a subset of $C$. The significance of a set $B$, i.e., $\sigma(B)$, represents the effect of eliminating the set: when $B$ is taken out of the set of condition attributes $C$, the classification of the decision attributes $D$ deteriorates to the degree $\sigma(B)$. Thus, we can determine an approximate reduct, the best subset for explaining a decision, by determining the significance of all possible sets.
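To make these definitions concrete, the Python sketch below computes indiscernibility classes, lower and upper approximations, the degree of dependency and the significance of single attributes on a small decision table. It uses the standard textbook forms of these quantities, which are assumed here to match the paper's definitions; the table, the attribute names F1 and F2 and the class labels are hypothetical.

```python
from collections import defaultdict


def partition(table, attrs):
    """Indiscernibility classes: objects with equal values on all of `attrs`."""
    blocks = defaultdict(set)
    for obj, row in table.items():
        blocks[tuple(row[a] for a in attrs)].add(obj)
    return list(blocks.values())


def lower_upper(table, attrs, target):
    """Lower and upper approximations of the object set `target`."""
    lower, upper = set(), set()
    for block in partition(table, attrs):
        if block <= target:      # block lies entirely inside the target
            lower |= block
        if block & target:       # block overlaps the target
            upper |= block
    return lower, upper


def dependency(table, cond, dec):
    """Degree of dependency: size of the positive region divided by |U|."""
    positive = set()
    for decision_class in partition(table, dec):
        positive |= lower_upper(table, cond, decision_class)[0]
    return len(positive) / len(table)


def significance(table, cond, dec, attr):
    """Relative drop in dependency when `attr` is removed from `cond`."""
    full = dependency(table, cond, dec)
    reduced = dependency(table, [a for a in cond if a != attr], dec)
    return (full - reduced) / full if full else 0.0


# Hypothetical decision table: two condition attributes and one decision.
table = {
    "p1": {"F1": 1, "F2": 1, "cls": "low"},
    "p2": {"F1": 1, "F2": 1, "cls": "high"},
    "p3": {"F1": 2, "F2": 1, "cls": "high"},
    "p4": {"F1": 2, "F2": 2, "cls": "high"},
    "p5": {"F1": 1, "F2": 2, "cls": "low"},
}
high = {obj for obj, row in table.items() if row["cls"] == "high"}
lower, upper = lower_upper(table, ["F1", "F2"], high)
gamma = dependency(table, ["F1", "F2"], ["cls"])
sigma_f1 = significance(table, ["F1", "F2"], ["cls"], "F1")
sigma_f2 = significance(table, ["F1", "F2"], ["cls"], "F2")
print(sorted(lower), sorted(upper), gamma, sigma_f1, round(sigma_f2, 3))
```

In this toy table the projects p1 and p2 are indiscernible but belong to different classes, so they form the boundary region and the class "high" is rough with respect to {F1, F2}.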

Decision language
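It is often useful to describe decision tables in logical terms. Every dependency $C \Rightarrow_\gamma D$ can be described by a set of decision rules of the form "IF […] THEN […]," written $\Phi \to \Psi$, where $\Phi$ and $\Psi$ are logical formulas such that $\Phi \in For(C)$, $\Psi \in For(D)$, and $C$, $D$ are the condition and decision attributes, respectively; $\Phi$ and $\Psi$ are referred to as the condition and decision parts of the rule, respectively. With every decision rule $\Phi \to \Psi$, we associate the conditional probability that $\Psi$ is true in $S$, given that $\Phi$ is true in $S$ with probability $\pi_S(\Phi)$. It is called the certainty factor and is defined as follows:

$$\pi_S(\Psi \mid \Phi) = \frac{\mathrm{card}(|\Phi \wedge \Psi|_S)}{\mathrm{card}(|\Phi|_S)},$$

where $|\Phi|_S$ denotes the set of all objects satisfying $\Phi$ in $S$, and the number $\mathrm{card}(|\Phi \wedge \Psi|_S)$ is called the support of the rule $\Phi \to \Psi$ in $S$. In addition, we also use the coverage factor of the decision rule, defined as:

$$\pi_S(\Phi \mid \Psi) = \frac{\mathrm{card}(|\Phi \wedge \Psi|_S)}{\mathrm{card}(|\Psi|_S)},$$

which is the conditional probability that $\Phi$ is true in $S$, given that $\Psi$ is true in $S$ with probability $\pi_S(\Psi)$.
Let $\{\Phi_i \to \Psi\}_n$ be a set of decision rules such that all conditions $\Phi_i$ are pairwise mutually exclusive, i.e., $|\Phi_i \wedge \Phi_j|_S = \emptyset$ for any $1 \leq i, j \leq n$, $i \neq j$, and $\sum_{i=1}^{n} \pi_S(\Phi_i \mid \Psi) = 1$. For any decision rule $\Phi \to \Psi$ the following relationship between the certainty factor and the coverage factor is true:

$$\pi_S(\Phi \mid \Psi) = \frac{\pi_S(\Psi \mid \Phi)\,\pi_S(\Phi)}{\sum_{i=1}^{n} \pi_S(\Psi \mid \Phi_i)\,\pi_S(\Phi_i)}.$$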
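The support, certainty factor and coverage factor of a single rule can be computed directly from a decision table, as the following Python sketch shows. The function name, the attribute value labels and the data are hypothetical, and the rule is restricted to simple attribute-value equalities.

```python
def rule_factors(rows, condition, decision):
    """Return (support, certainty, coverage) for IF condition THEN decision.

    `condition` and `decision` map attribute names to required values.
    """
    def satisfies(row, formula):
        return all(row[name] == value for name, value in formula.items())

    n_condition = sum(1 for row in rows if satisfies(row, condition))
    n_decision = sum(1 for row in rows if satisfies(row, decision))
    support = sum(
        1 for row in rows if satisfies(row, condition) and satisfies(row, decision)
    )
    certainty = support / n_condition if n_condition else 0.0
    coverage = support / n_decision if n_decision else 0.0
    return support, certainty, coverage


# Hypothetical decision table with one condition attribute and a class label.
rows = [
    {"F15": "high", "cls": 4},
    {"F15": "high", "cls": 4},
    {"F15": "high", "cls": 3},
    {"F15": "low", "cls": 4},
    {"F15": "low", "cls": 2},
]
support, certainty, coverage = rule_factors(rows, {"F15": "high"}, {"cls": 4})
print(support, round(certainty, 3), round(coverage, 3))
```

Here three projects satisfy the condition and three belong to class 4, with two in common, so both factors equal 2/3; in general the two factors differ whenever the condition and the decision are satisfied by different numbers of objects.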

Attribute reduction and rules generation
As described earlier, RST is used to identify the most significant features by computing reducts and cores. In order to generate reducts, a genetic algorithm is applied, as it provides a more exhaustive exploration of the search space (Wroblewski, 1995). Reducts can be generated in two ways: full object reduction and object-related reduction. Object-related reduction produces a set of decision rules through minimal attribute subsets that discern objects on a per-object basis, while full object reduction creates a set of minimal attribute subsets that preserve functional dependencies (Sulaiman et al., 2008). In this study, the full object reduction approach is adopted. The reducts used for generating rules in the economic, social and environmental categories are [F15, F16, F18], [F12, F10, F13, F14] and [F15, F16, F18], respectively. A unique feature of the RS method is its generation of rules, which is of great importance in predicting the outputs. For this purpose, the Rosetta system was applied to induce rough set-based models. The Rosetta tool lists the rules and provides statistics for them, namely support, accuracy, coverage, stability and length. The rule statistics are defined as follows (Sulaiman et al., 2008):
• The rule LHS support is defined as the number of objects in the training data that fully exhibit the attributes described by the IF condition.
• The rule RHS support is defined as the number of objects in the training data that fully exhibit the attribute described by the THEN condition.
• The rule RHS accuracy is defined as the RHS support divided by the LHS support.
• The rule LHS coverage is the fraction of the records that satisfy the IF conditions of the rule. It is calculated by dividing the support of the rule by the total number of records in the training sample.
[…] (2)-(4). Also, since there was no project with an SSI below 2, the lowest class label is 2.

Model validation
Based on the filtered rules and the classification process described, the recognition ability of the model was validated. A confusion matrix is a table which summarizes the performance of an algorithm applied to the objects in an information system. Each column of the matrix represents the cases in a predicted class, while each row represents the cases in an actual class. The results for the three aforementioned sustainability aspects are shown in Table XVI, in which the italic values are the samples classified correctly by the model developed in this study.
From the economic perspective, in class 1, 48 sets are classified correctly and two sets are wrongly classified to class 1 and class 2, with an accuracy of 86 percent. The other classes are classified correctly, with an accuracy of 1. In the social category, although classes 3 and 4 are classified correctly, the accuracies for classes 1 and 2 are 75 and 83 percent, respectively. Similarly, from the environmental perspective only two classes were classified correctly. In class 2, four sets are classified correctly and one set is wrongly classified to class 3; therefore, the accuracy is 80 percent. The accuracy of class 4 is 60 percent, which is the lowest among the classes. The two other classes are classified correctly, with an accuracy of 1. Over all sets, the generated rules classify 83 percent of the sets correctly. In addition, as can be seen in Table XIII, the overall accuracies (accuracy plus sensitivity) of the proposed system for the economic, social and environmental perspectives are 84, 95 and 89 percent, respectively. The sensitivity and accuracy of each class in the sustainability categories are presented as well.
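The per-class figures discussed above can be derived from a confusion matrix as in the following Python sketch, which uses the orientation given in the text (rows are actual classes, columns are predicted classes). The counts are hypothetical; only the class 2 row is chosen to mirror the case described above (four sets correct, one assigned to class 3). Whether the paper's "accuracy" and "sensitivity" correspond to the row-wise or the column-wise rate is not stated, so both are computed.

```python
def row_rates(matrix):
    """Fraction of each actual class (row) that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def column_rates(matrix):
    """Fraction of each predicted class (column) that was actually correct."""
    size = len(matrix)
    rates = []
    for j in range(size):
        column_total = sum(matrix[i][j] for i in range(size))
        rates.append(matrix[j][j] / column_total if column_total else 0.0)
    return rates


def overall_accuracy(matrix):
    """Correctly classified objects (the diagonal) divided by all objects."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


# Hypothetical counts for four classes; rows = actual, columns = predicted.
confusion = [
    [2, 0, 0, 0],
    [0, 4, 1, 0],
    [0, 0, 5, 0],
    [0, 0, 2, 3],
]
per_actual = row_rates(confusion)
per_predicted = column_rates(confusion)
overall = overall_accuracy(confusion)
print(per_actual, per_predicted, round(overall, 3))
```

In this example class 3 is always recognized when it occurs, yet only five of the eight objects predicted as class 3 truly belong to it, which is why the row-wise and column-wise rates should be reported separately.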

Conclusion
Infrastructure projects, especially large-size ones, play a pivotal role in economic, social, and environmental activities, particularly in developing countries like Iran.Hence, in this paper, after introducing a SSI, a novel method called "rough set approach" was adopted to induce decision rules and to classify construction projects accordingly.
Literature review shows that the success measurement of construction projects is slowly going beyond the traditional measures (such as cost, time, and quality).A critical review of publications related to project success revealed that there has been lack of comprehensive study on the CSFs and CSC from the sustainable development perspective.Moreover, economic success factors such as time and cost management which were perceived to be the most important contributing factors have been rarely discussed and have not been widely researched.
At first, the paper classified 20 effective success factors and 15 success criteria based on the three pillars of sustainability: economic, social, and environmental. The study used expert judgment to rank them (Tables VI and VII). This research showed that there is sometimes a major difference between what is recorded in the literature and professionals' opinions about the importance of project success criteria and factors. Although the present study supports earlier work implying that time is the most important project success criterion, the other important criteria ranked by respondents were not in accordance with the literature review. The respondents believed that, in addition to time, cost and quality, other criteria such as employer satisfaction, environmental degradation, and overall health and safety measures should also be taken into account. This is supported by Collins and Baccarini (2004), who believe that time, cost and quality are not the only project success criteria. Contrary to other studies, the most important project success factors were time and cost management, which were rarely addressed in previous studies.
After identifying the success-related determinants, an intelligent data analysis approach based on RST was applied to generate classification rules from a set of 26 large-size construction projects. In the analysis process, the ROSETTA toolkit was run to generate the rules. After reducing, generating and filtering rules, the rule-based system was built. Finally, the method was validated using test samples. Moreover, based on the set of generated rules, the projects were classified into four categories.
The classification process was performed using SSI. In the ROSETTA classification algorithm, the induced rules were used for the classification process. Using this process, 26 large-size construction projects were classified into four categories, as shown in Table XV. The numbers of primary decision rules generated from the reducts produced in the economic, social and environmental categories were 17, 18, and 16, respectively. The process of rule generation is illustrated in Figure 3. Furthermore, the results obtained from the pairwise comparison matrices of the sustainable categories showed that the economic category is the most important one (62 percent). Its weight was roughly 2.6 and 4.4 times greater than those of the social category (24 percent) and the environmental category (14 percent), respectively. Therefore, most attention should be paid to the economic point of view, whilst less attention is needed for social and environmental aspects (Table XVI).
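The text does not spell out here how the category weights were obtained from the pairwise comparison matrices. As a sketch only, assuming an AHP-style derivation, the row geometric-mean approximation below recovers priority weights from such a matrix. The matrix is hypothetical: it is built to be perfectly consistent with the reported 62, 24 and 14 percent and is not the study's expert data.

```python
import math


def priority_weights(pairwise):
    """Row geometric-mean approximation of priority weights from a pairwise matrix."""
    geometric_means = [math.prod(row) ** (1.0 / len(row)) for row in pairwise]
    total = sum(geometric_means)
    return [g / total for g in geometric_means]


# Reported weights: economic, social, environmental.
REPORTED = [0.62, 0.24, 0.14]

# Hypothetical, perfectly consistent matrix: entry (i, j) is the ratio w_i / w_j.
PAIRWISE = [[wi / wj for wj in REPORTED] for wi in REPORTED]

WEIGHTS = priority_weights(PAIRWISE)

# Ratios of the economic weight to the social and environmental weights.
ECONOMIC_TO_SOCIAL = WEIGHTS[0] / WEIGHTS[1]
ECONOMIC_TO_ENVIRONMENTAL = WEIGHTS[0] / WEIGHTS[2]
```

For a consistent matrix this procedure returns the weights it was built from, so the economic weight is about 2.6 times the social weight and about 4.4 times the environmental weight. Real expert matrices are rarely perfectly consistent, in which case a consistency check would normally accompany the weights.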
A rough confusion matrix was used to evaluate the performance of the predicted classes. The test results showed that the overall accuracy (accuracy plus sensitivity) of the proposed system for the economic, social and environmental perspectives was 84, 95 and 89 percent, respectively. Therefore, in contrast to other conventional approaches, RST is an effective mathematical tool for dealing with imprecise, uncertain and incomplete data. The present study showed that RST seems to be an effective tool and a valuable aid for building expert systems. The rules induced in this study can be used to develop a future decision support system through which a manager would be able to predict SSI. Indeed, by observing the project status at each stage, prediction would be possible based on historical data from previous projects. Thus, further studies can focus on establishing a prediction model that combines rough sets with other intelligent systems, such as neural networks and fuzzy approaches, to build a decision support system.
This study has some limitations. The number of large-scale construction projects, and consequently the number of extracted rules, was limited. Therefore, in future studies, it would be interesting to ascertain whether the identified SSFs and SSC can be generalized across different countries using similar studies. Although the adequacy of the questionnaire determinants and the number of projects was tested with a pilot study, it cannot be concluded that the number of projects and the selected success determinants are definitive. Since a greater number of samples is more likely to generate better results, further studies can be conducted with a greater number of projects. It is also noteworthy that another approach to evaluating the environmental impacts of projects, e.g. Life Cycle Assessment, would yield more objective and practical results than a questionnaire, which rests merely on project managers' perceptions; this can be taken into account in future studies.
Figure 1. Steps applied in the present study

Definition 4. Partial dependency
If a decision attribute D depends totally on C, denoted as C ⇒ D, all values of D are uniquely determined by the values of C (Pawlak et al., 1988). To generalize this concept, Pawlak et al. (1988) introduced the concept of partial dependency of attributes, which means that "some values of D are determined by the values of C." When D depends on C to a certain degree g (0 ⩽ g ⩽ 1), we can denote the relation as C ⇒g D.
The rule RHS coverage is the fraction of the training records that satisfy the THEN condition. It is obtained by dividing the support of the rule by the number of objects in the training set that satisfy the THEN condition.
The numbers of primary decision rules generated from the reducts produced in the economic, social and environmental categories are 17, 18 and 16, respectively. The process of rule generation is illustrated in Figure 3. To raise effectiveness, this study filters the decision rules according to the principle LHS support ⩾ 2. Using ROSETTA, the rules with the highest LHS support in all sustainable groups are extracted. These rules are sorted by LHS support in Table XIV. It is noteworthy that the LHS support indicates the number of projects satisfying the condition of the rule, while the RHS support indicates the number of projects satisfying the decision of the rule. In the current study, based on the set of rules generated, the projects were classified into four categories.
4.2 Classification process
The classification process is performed using SSI. In the ROSETTA classification algorithm, the induced rules are used for the classification process. In this process, 26 large-scale construction projects are classified into four categories, as shown in Table XV. In this table, the class label is the amount of SSI, which was calculated for each of the sustainable groups based on Equations (…).
Expert judgment was used to prioritize these success determinants. The success of 26 large-size construction projects in Iran and five other countries, as case studies, was assessed to develop the proposed model. Success factors were adopted as conditional variables, and success criteria were considered as decision variables and used as output in the development of the rough-based model. Typical critical success factors included time management, cost management, leadership, feasibility study, quality management, competent project team, attempt to preserve the environment, motivation, and effective communication. The most important success criteria included project quality, project completion within time, project completion within budget, employer satisfaction, environmental degradation, and overall health and safety measures. The detailed lists of success factors and criteria are provided in Tables VI and VII.
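To illustrate the degree of dependency from Definition 4 and the rule measures described above (LHS support, RHS coverage and the LHS support ⩾ 2 filter), the following is a minimal sketch on a toy decision table. It assumes the standard rough-set form of the dependency degree, the size of the positive region divided by the number of objects. The attribute names, values and helper functions are hypothetical. They are not the study's data, and the code does not reproduce ROSETTA's implementation.

```python
from collections import defaultdict

# Hypothetical decision table: (condition attribute values, decision class).
TABLE = [
    ({"time_mgmt": "high", "cost_mgmt": "high"}, 1),
    ({"time_mgmt": "high", "cost_mgmt": "high"}, 1),
    ({"time_mgmt": "high", "cost_mgmt": "low"}, 2),
    ({"time_mgmt": "low", "cost_mgmt": "low"}, 3),
    ({"time_mgmt": "low", "cost_mgmt": "low"}, 4),
    ({"time_mgmt": "low", "cost_mgmt": "high"}, 2),
]


def dependency_degree(table, attrs):
    """Degree g to which the decision depends on attrs: |positive region| / |U|."""
    blocks = defaultdict(list)
    for cond, decision in table:
        blocks[tuple(cond[a] for a in attrs)].append(decision)
    # An indiscernibility block belongs to the positive region only if all of
    # its objects share the same decision class.
    positive = sum(len(d) for d in blocks.values() if len(set(d)) == 1)
    return positive / len(table)


def rule_stats(table, lhs, rhs):
    """Return (LHS support, rule support, RHS coverage) for the rule IF lhs THEN rhs."""
    matches_lhs = [all(cond[a] == v for a, v in lhs.items()) for cond, _ in table]
    lhs_support = sum(matches_lhs)
    rule_support = sum(m and dec == rhs for m, (_, dec) in zip(matches_lhs, table))
    rhs_count = sum(dec == rhs for _, dec in table)
    coverage = rule_support / rhs_count if rhs_count else 0.0
    return lhs_support, rule_support, coverage


# Two hypothetical candidate rules, written as (IF conditions, THEN class).
RULES = [
    ({"time_mgmt": "high", "cost_mgmt": "high"}, 1),
    ({"time_mgmt": "high", "cost_mgmt": "low"}, 2),
]

G = dependency_degree(TABLE, ["time_mgmt", "cost_mgmt"])
STATS = [rule_stats(TABLE, lhs, rhs) for lhs, rhs in RULES]

# Keep only the rules whose LHS support is at least 2, as in the filter above.
KEPT = [rule for rule, stats in zip(RULES, STATS) if stats[0] >= 2]
```

In this toy table, the two projects that share low time and cost management but fall into different classes lie outside the positive region, so the decision depends on the two attributes only partially. Only the first rule survives the support filter.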