

Statistics and Data Mining

There is little doubt that "data mining" is one of the latest buzzwords (or buzz phrases) to sweep the business world. Institutions from banks to retailers are using it for everything from credit decisions to fraud detection to inventory management. Despite the obvious connections between data mining and statistics, however, until recently it has not been a field of particular interest to many statisticians. In fact, to some extent it is considered "a dirty word" in statistics (Pregibon, 1997, p. 8). Still, as its rapid growth in use demonstrates, its importance is undeniable, and as some statisticians are realizing, statistics can potentially play a major role in the future development of data mining technologies. The following are several online papers discussing the role of statistics in data mining:

Data Mining and Statistics: What's the Connection? by Jerome Friedman, Department of Statistics, Stanford - Explores why the methodologies used in data mining originated in fields other than statistics and why statisticians should take an interest in data mining.

Statistics and Data Mining: Intersecting Disciplines by David Hand, Department of Mathematics, Imperial College, London, UK - Discusses the nature of data mining and statistics with an emphasis on their similarities and differences.

Theoretical Frameworks for Data Mining by Heikki Mannila, Nokia Research Center - Presents some possible theoretical approaches to data mining.

Data Mining from a Statistical Perspective by John Maindonald, Statistical Consulting Unit, Australian National University