There is little doubt that "data mining" is one of the latest buzzwords (or buzz phrases) to sweep the business world.
Institutions from banks to retailers are using it for everything from credit decisions to fraud detection to
inventory management. Despite
the obvious connections between data mining and statistics, however, until recently it has not been a field of
particular interest for many statisticians. In fact, to some extent it is considered "a dirty word" in statistics
(Pregibon, 1997, p. 8). Still, as demonstrated by its rapid growth in use, its importance is undeniable, and as some
statisticians are realizing, statistics can potentially play a major role in the future development of data mining
technologies. The following are several online papers discussing the role of statistics in data mining:
Data Mining and Statistics: What's the Connection?
by Jerome Friedman, Department of Statistics, Stanford - Explores the reasons why methodologies used in Data Mining have originated in fields other than statistics and why statisticians should have an interest in Data Mining.
Statistics and Data Mining: Intersecting Disciplines
by David Hand, Department of Mathematics, Imperial College, London, UK - Discusses the nature of data mining and statistics with an emphasis on their similarities and differences.
Theoretical Frameworks for Data Mining
by Heikki Mannila, Nokia Research Center - Presents some possible theoretical approaches to data mining.
Data Mining from a Statistical Perspective
by John Maindonald, Statistical Consulting Unit, Australian National University