Evaluating Text Mining Applications
Today, more and more businesses are beginning to implement text mining technologies. From improving customer service performance to interpreting legal issues, the potential applications for text mining are vast. Still, implementing new technology is always difficult, and text mining is no exception. The following is a list of some key features to consider when evaluating text mining applications.

  • Scalability – With regard to scalability, there are two areas of consideration. (1) Over time, data sets grow in size. How much data can the software handle and how fast can it process it? (2) Most text mining solutions are offered as a suite of software, which in turn must be integrated with existing applications and operating environments. Are all the elements equally scalable? For example, how difficult would it be to expand from a single-server environment to a distributed multiserver environment? Understanding the limitations with respect to the various functions such as searching, indexing, and retrieval across all aspects of the system is essential.

  • Multiple source information processing – What is the range of textual data sources that the software can handle? At a minimum, the application should be able to process text from mainstream applications such as Microsoft Office, Lotus Notes, spreadsheets, PDF files, presentation files, emails, intranet file servers, SQL/ODBC databases, live chat/IRC and newsfeeds.

  • Compatibility with existing systems – Many search applications were initially designed to run on a specific platform. In some cases, later releases that run on other platforms do not work as well as the original version. Be wary, and take some time to learn the history of the product.

  • Auto update and categorization – The ability to track and receive up-to-date information from both Internet and intranet sources, to automatically assign documents to user-defined categories, and to distribute them to the appropriate personnel can be a very useful, time-saving tool.

  • Text summarization – The ability to identify key thoughts and provide meaningful summaries is an excellent time-saving feature, given that even the most relevant documents can be too numerous and too lengthy to review in full.

  • Visual representation – Some applications can organize and display collections of documents as points on a topographical map in which related documents are clustered together. By providing a bird's-eye view of the textual landscape, this type of representation can be extremely useful and time-saving.

  • Textual analysis technique – From simple meta tag and keyword searches, to probabilistic latent semantic indexing models, to neural network and pattern recognition methods, the sophistication of textual analysis techniques and the required computing power vary from application to application. It is always a good idea to understand the basic theories underlying the analysis techniques each application is built upon.

  • Cross language features – Today, more than ever, customer bases, as well as competition, are often global. The ability to conceptually analyze foreign-language documents can provide a competitive edge.
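To make the simplest end of that spectrum concrete, here is a minimal sketch of keyword-based categorization, the kind of technique that sits underneath basic auto-categorization features. It is an illustration only, not how any particular product works. The category names, the keyword sets, and the functions tokenize and categorize are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical user-defined categories and their trigger keywords.
# These names and word lists are assumptions made up for illustration.
CATEGORIES = {
    "legal": {"contract", "liability", "compliance"},
    "customer_service": {"refund", "complaint", "delivery"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text, categories=CATEGORIES):
    """Return the category whose keywords occur most often in the text.

    Returns None when no keyword from any category appears.
    Ties go to the category listed first.
    """
    counts = Counter(tokenize(text))
    scores = {
        name: sum(counts[word] for word in words)
        for name, words in categories.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(categorize("The customer filed a complaint and asked for a refund."))
```

A matcher like this is cheap to run but knows nothing about synonyms, context, or other languages, which is the gap the more sophisticated techniques listed above try to close, at a higher computing cost.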

As it stands, most text mining applications are not turnkey solutions. They require a good deal of planning and customization for maximum utility. The above list is not intended to be comprehensive, but simply a starting point for identifying and evaluating the most effective application for your text mining needs.
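One way to turn a checklist like this into a comparison is a simple weighted score. The sketch below is only a starting point under stated assumptions: the feature names, weights, product names, and ratings are all made up for illustration, and weighted_score is a hypothetical helper, not part of any product.

```python
# Hypothetical weights: how much each feature matters to your organization.
WEIGHTS = {
    "scalability": 3,
    "source_coverage": 2,
    "summarization": 1,
}


def weighted_score(ratings, weights=WEIGHTS):
    """Combine 0-5 feature ratings into one weighted average.

    Features missing from the ratings count as 0.
    """
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[f] * ratings.get(f, 0) for f in weights)
    return weighted_sum / total_weight


# Made-up ratings for two imaginary candidate products.
candidates = {
    "Product A": {"scalability": 4, "source_coverage": 3, "summarization": 5},
    "Product B": {"scalability": 5, "source_coverage": 2, "summarization": 1},
}

if __name__ == "__main__":
    # Print the candidates from highest to lowest weighted score.
    ranked = sorted(candidates, key=lambda n: weighted_score(candidates[n]), reverse=True)
    for name in ranked:
        print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

The weights matter more than the arithmetic: choosing them forces the planning discussion about which features your organization actually needs.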