Today, more and more businesses are implementing text mining technologies. From improving customer service
performance to interpreting legal issues, the potential applications for text mining are vast. Still, adopting a
new technology is rarely easy, and text mining is no exception. The following list covers some key features to
consider when evaluating text mining applications.
Scalability – Scalability has two dimensions. (1) Data sets grow over time: how much data can the software
handle, and how quickly can it process it? (2) Most text mining solutions are offered as a suite of software
components, which must be integrated with existing applications and operating environments. Are all the
components equally scalable? For example, how difficult would it be to move from a single-server environment to a
distributed, multiserver one? It is essential to understand where each function, such as searching, indexing, and
retrieval, reaches its limits across every part of the system.
Multiple source information processing – What range of textual data sources can the software handle? At a
minimum, the application should be able to process text from mainstream sources such as Microsoft Office
documents, Lotus Notes, spreadsheets, PDF files, presentation files, emails, intranet file servers, SQL/ODBC
databases, live chat/IRC, and newsfeeds.
Compatibility with existing systems – Many search applications were originally designed to run on a specific
platform, and in some cases later releases ported to other platforms do not work as well as the original version.
Be wary, and take some time to learn the history of the product, including which platform it was first built for.
Automatic updates and categorization – The ability to track and receive up-to-date information from both Internet
and intranet sources, and to automatically assign documents to user-defined categories and distribute them to the
appropriate personnel, can be a very useful, time-saving tool.
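As a rough illustration of automatic categorization and routing, the Python sketch below assigns documents to
user-defined categories by keyword overlap and groups the document IDs by recipient. The category names, keywords,
and e-mail addresses are hypothetical, and keyword matching is only the simplest possible approach; commercial
products often use trained classifiers or richer rules instead.

```python
import re
from collections import defaultdict

# Hypothetical user-defined categories: each has trigger keywords and a recipient.
CATEGORIES = {
    "billing": {"keywords": {"invoice", "refund", "charge", "payment"},
                "route_to": "finance@example.com"},
    "legal": {"keywords": {"contract", "liability", "lawsuit", "compliance"},
              "route_to": "legal@example.com"},
    "support": {"keywords": {"error", "crash", "bug", "outage"},
                "route_to": "support@example.com"},
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def categorize(text, categories=CATEGORIES, min_hits=1):
    """Return (sorted) every category sharing at least min_hits distinct keywords with the text."""
    words = tokenize(text)
    return sorted(
        name for name, spec in categories.items()
        if len(words & spec["keywords"]) >= min_hits
    )


def route(docs, categories=CATEGORIES):
    """Map each recipient to the IDs of the documents routed to them."""
    inbox = defaultdict(list)
    for doc_id, text in docs.items():
        matched = categorize(text, categories)
        if not matched:
            # Nothing matched: park the document for manual review.
            inbox["unassigned"].append(doc_id)
        for name in matched:
            inbox[categories[name]["route_to"]].append(doc_id)
    return dict(inbox)


docs = {
    "d1": "Customer requests a refund for a duplicate charge.",
    "d2": "The app shows an error after the contract update.",
    "d3": "Quarterly newsletter draft.",
}
print(route(docs))
```

Because matching is on exact words, "refunds" would not trigger the billing category; handling such variation
would require stemming or a statistical classifier.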
Text summarization – The ability to identify key ideas and produce meaningful summaries is an excellent
time-saving feature, since even the most relevant documents are often too numerous or too lengthy to review in
full.
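One simple way to pick out the key ideas of a document is extractive summarization: score each sentence by how
frequent its content words are across the whole text, then keep the top-scoring sentences in their original order.
The Python sketch below is a frequency-based heuristic, not a description of how any particular product
summarizes text; the small stopword list and the naive sentence splitter are simplifying assumptions.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real systems use larger ones).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "is", "are", "for",
    "on", "with", "that", "this", "it", "as", "be", "by", "was", "were",
}


def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def content_words(sentence):
    """Lowercased words of a sentence, excluding stopwords."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average document-wide frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


text = ("Shipping delays hurt customer satisfaction. "
        "Customers report shipping delays every week. "
        "The office has a new coffee machine.")
print(summarize(text, max_sentences=2))
```

Averaging word frequencies, rather than summing them, keeps long sentences from winning simply because they are
long; the trade-off is that very short sentences with one frequent word can score well.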
Visual representation – Some applications can organize and display collections of documents as points on a
topographical map in which related documents are clustered together. By providing a bird’s-eye view of the
textual landscape, this type of representation can make it much faster to see how a collection is organized.
Textual analysis techniques – From simple meta tag and keyword searches to probabilistic latent semantic
indexing models to neural network and pattern recognition methods, the sophistication of textual analysis
techniques, and the computing power they require, vary from application to application. It is always a good idea
to understand the basic theory behind the analysis techniques each application is built on.
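At the simple end of that spectrum sits keyword search. The Python sketch below ranks documents by a TF-IDF score
(term frequency weighted by inverse document frequency), a modest step beyond plain keyword matching because rare
query terms count for more than common ones. It is an illustrative baseline under simplifying assumptions (no
stemming, no document-length normalization, a made-up three-document corpus), not an implementation of latent
semantic indexing or of any neural method.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Return per-document term counts and IDF weights (log(N/df) + 1)."""
    tf = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for counts in tf.values() for term in counts)
    idf = {term: math.log(n_docs / count) + 1.0 for term, count in df.items()}
    return tf, idf


def search(query, tf, idf, top_k=3):
    """Rank documents by the summed TF-IDF weight of the query terms."""
    terms = tokenize(query)
    scores = {}
    for doc_id, counts in tf.items():
        score = sum(counts[t] * idf.get(t, 0.0) for t in terms)
        if score > 0:
            scores[doc_id] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


docs = {
    "a": "Refund policy for late orders",
    "b": "Late delivery and late refund complaints",
    "c": "New office coffee machine",
}
tf, idf = build_index(docs)
print(search("late refund", tf, idf))
```

Document "b" outranks "a" here because it mentions "late" twice; a production engine would usually normalize for
document length so that long documents are not favored just for repeating terms.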
Cross-language features – Today, more than ever, customer bases and competitors are often global. The ability
to conceptually analyze foreign-language documents can provide a competitive edge.
As it stands, most text mining applications are not turnkey solutions; they require a good deal of planning and
customization to deliver their full value. The list above is not intended to be comprehensive, but simply a
starting point for identifying and evaluating the most effective application for your text mining needs.