Data Mining and Decision Support: Integration and Collaboration

David Bawden (City University, London, UK)

Journal of Documentation

ISSN: 0022-0418

Article publication date: 1 June 2005

Citation

Bawden, D. (2005), "Data Mining and Decision Support: Integration and Collaboration", Journal of Documentation, Vol. 61 No. 3, pp. 443-445. https://doi.org/10.1108/00220410510598580

Publisher: Emerald Group Publishing Limited

Copyright © 2005, Emerald Group Publishing Limited


This multi‐authored book reports the findings of a European Commission‐sponsored project involving academic and industrial partners from several countries. A main influence, to judge from the editors and authors of this volume, comes from the Jožef Stefan Institute, Slovenia. The aim of the project was to bring together the two disciplines, or practices, of data mining and decision support, in order to assess the value gained by using them in an integrated way. Given that both topics are relevant to the information sciences, even though their practice is not often part of the work of LIS practitioners, this book is valuable in shedding light on the potential of the methods.

It opens with introductory chapters which present and exemplify, in an unusually clear and accessible way, the nature of the two main topics. [Indeed, given that English is not a first language for most of the contributors, the writing is commendably clear and precise throughout.] There follow a number of chapters dealing in greater detail with applications of these methods by the project.

“Data mining” is described as “concerned with the discovery of interesting and useful patterns in data” with the aim of “solving problems by analysing data that already exists in databases”. It is generally regarded as having two modes: descriptive, using exploratory data analysis to discover potentially interesting patterns; and predictive, producing models which may be used for prediction and classification/categorisation. Its relation to conventional usage of information collections is that, while conventional systems are designed to provide an answer to a given query, data mining can address the issue of “what are the right questions to ask?”. Increasingly, much of data mining is, in fact, text mining or web mining, rather than the analysis of numerical data sets.

A variety of techniques may be used, including parametric and non‐parametric statistics, classifying and clustering, association rules, decision trees, and various forms of machine learning. For text applications, automatic classification and categorisation, and various forms of natural language processing may also be employed. Information extraction from the web (“web mining”) is of increasing importance (see for example, Chen and Chau, 2003).

“Decision support” revolves around the construction of systems to provide practical assistance to human decision makers, and relies upon two foundational areas: theoretical approaches to “good” decision making – decision theory, game theory, utility theory, etc.; and cognitive understanding of how decisions are “really” made. Again a variety of techniques may be applied for decision support; this project relied on a qualitative multi‐attribute model.
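As a rough illustration of what a qualitative multi‐attribute model can look like in general, the following minimal Python sketch evaluates options by looking up combinations of qualitative attribute values in a rule table, instead of computing a numerical score. It is not the model used by the project, which this review does not describe in detail: the attribute names, value scales, rules and function names are all hypothetical.

```python
# Hypothetical example: every attribute, value scale and rule below is
# invented for illustration and is not taken from the book or the project.

# Rule table for the aggregate attribute "suitability", defined over two
# basic attributes (quality, affordability). Each entry is one decision rule.
SUITABILITY_RULES = {
    ("low", "low"): "unacceptable",
    ("low", "medium"): "unacceptable",
    ("low", "high"): "unacceptable",
    ("medium", "low"): "unacceptable",
    ("medium", "medium"): "acceptable",
    ("medium", "high"): "good",
    ("high", "low"): "acceptable",
    ("high", "medium"): "good",
    ("high", "high"): "good",
}

# Ordering of the aggregate values, from worst to best.
SUITABILITY_ORDER = ["unacceptable", "acceptable", "good"]


def evaluate(option):
    """Look up the qualitative suitability of one option in the rule table."""
    return SUITABILITY_RULES[(option["quality"], option["affordability"])]


def rank(options):
    """Return option names ordered from most to least suitable."""
    return sorted(
        options,
        key=lambda name: SUITABILITY_ORDER.index(evaluate(options[name])),
        reverse=True,
    )


if __name__ == "__main__":
    candidates = {
        "A": {"quality": "low", "affordability": "high"},
        "B": {"quality": "high", "affordability": "high"},
        "C": {"quality": "medium", "affordability": "medium"},
    }
    for name in rank(candidates):
        print(name, evaluate(candidates[name]))
```

Because every rule is written out explicitly, the reason for any evaluation can be traced back to a single table entry, which is one way such models can help to explain a decision.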

These two basic concepts may be integrated in various ways. Data mining may, for example, be used to assist decision support, by revealing clusters, groupings, relationships etc., which may directly affect decisions. Decision support may, conversely, be used to guide data mining, by suggesting aspects where good evidence is needed. The project investigated these, and other forms of integration.

The work of this project emphasised the value of standards in bringing some systematisation to the wide variety of approaches and techniques available. In particular, the CRISP‐DM and PMML standards were used. CRISP‐DM (Cross‐Industry Standard Process for Data Mining) is a methodology for data mining, independent of the particular tools and techniques being used. It has six stages:

  1. Understanding the context of the data and its application, and the goals of the mining.

  2. Understanding the data.

  3. Preparation of data, through selection, cleaning and pre‐processing.

  4. Data mining and modelling.

  5. Evaluation and interpretation of results.

  6. Using the knowledge discovered.
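The six stages listed above can be represented as an ordered sequence, as in the following minimal Python sketch. The stage names follow the usual CRISP‐DM terminology instead of the wording of the list. The function `next_stage` and its rule that an unsatisfactory evaluation returns the project to the first stage are assumptions made for illustration, not something prescribed by the book.

```python
from enum import IntEnum


class CrispDmStage(IntEnum):
    """The six CRISP-DM stages, numbered as in the list above."""

    BUSINESS_UNDERSTANDING = 1
    DATA_UNDERSTANDING = 2
    DATA_PREPARATION = 3
    MODELLING = 4
    EVALUATION = 5
    DEPLOYMENT = 6


def next_stage(stage, results_acceptable=True):
    """Return the stage that follows `stage`.

    Assumption for illustration: an unsatisfactory evaluation sends the
    project back to the first stage; every other stage simply advances,
    and the final stage has no successor.
    """
    if stage is CrispDmStage.EVALUATION and not results_acceptable:
        return CrispDmStage.BUSINESS_UNDERSTANDING
    if stage is CrispDmStage.DEPLOYMENT:
        return stage  # final stage: nothing follows
    return CrispDmStage(stage + 1)


if __name__ == "__main__":
    # Walk through the process once, assuming the evaluation succeeds.
    stage = CrispDmStage.BUSINESS_UNDERSTANDING
    while True:
        print(stage.value, stage.name)
        if stage is CrispDmStage.DEPLOYMENT:
            break
        stage = next_stage(stage)
```

Treating the stages as an ordered type keeps the process independent of whichever tools are used within each stage, which is the point of the methodology.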

PMML (Predictive Modeling Markup Language) is an XML‐based standard for storing and sharing data mining results.

The “application” chapters describe investigations of the integration of data mining and decision support in a variety of environments, including: an analysis of “media space” to identify typical readers of publications; an analysis of road traffic accident reports; an analysis of text descriptions of research projects to find ideas for better research planning; a web log analysis of a national statistical centre; several decision support applications for Slovenian public bodies; architectural planning; and the prediction of educational attainment.

Concluding chapters summarise, realistically and honestly, the lessons learned, the advantages found and the problems encountered. The value of the systems for assisting in the explanation of decisions based on complex information seems to be one important finding. For those with an interest in documentation, the importance and value of “automatic categorisation” seems a relevant point.

Finally, some comments are made on the working of the virtual teams who carried out the project work. The case for some physical meetings to enhance electronic communication is strongly made, and echoes what has been found elsewhere. The purely “virtual team” is surely an oxymoron.

This book, unusually for writing stemming from a European project, can be highly recommended to researchers or practitioners in the information sciences who want an insight into just what data mining and decision support are really about, and where their potential lies.

References

Chen, H. and Chau, M. (2003), “Web mining: machine learning for web applications”, Annual Review of Information Science and Technology, Vol. 38, Information Today, Medford, NJ, pp. 289-329.
