1 June 2007
The growing gap between the capacity of storage and data retrieval systems and users' ability to analyze the information usefully, and to act on the conclusions of that analysis, often prevents organizations from exploiting the flood of information they accumulate. Nowadays information is the most critical resource in many fields, and nearly every organization accumulates vast amounts of it. As a result, there is a growing need for smart tools that let organizations benefit as fully as possible from the raw information they have gathered. The process of extracting the relevant information from vast amounts of existing data, and searching it for trends, connections and patterns of interest, is known professionally as "Data Mining". The name evokes an enormous mountain of data from which pebbles of knowledge are extracted. Data mining is defined as an activity that includes discovering information, making predictions and searching for hidden connections in large databases. Data Mining is sometimes referred to as KDD (Knowledge Discovery in Databases).
Knowledge Discovery, or knowledge/information mining, refers to applying algorithms (either manual or computerized) in order to discover the knowledge buried in databases and unstructured content and to draw conclusions from it. Information mining is likened to mining the earth to locate natural resources. The purpose of mining is to research and analyze data and information from various environments, using means that are as automated as possible, in order to discover patterns. The information produced must be valid, novel, useful and meaningful. This allows the organization's decision makers to improve existing processes, identify weaknesses and strengths, and make intelligent decisions about the organization's future course of action and strategy.
Information mining makes it possible to discover interconnections that were previously unknown in the organization. It is worth noting that information mining is only one stage in the process of creating knowledge and understanding it. The final product of the entire process is knowledge of first-class importance, whether in business intelligence or in many other organizational fields. It enables intra-organizational transparency and gives senior management a high level of visibility.
Knowledge mining can find interesting patterns in data that cannot be discovered using "regular" reports. "Regular" reports are those produced by a report generator, such as the ones found in software like Cognos and Business Objects. These reports are produced by slicing the data, sorting it and performing arithmetic operations on it. Such reports answer many of their users' needs. However, reports produced this way cannot answer questions such as: What characterizes our clients? What characterizes malfunctions in the production process? Answering these questions requires analyzing documents using data mining.
Within information mining there is a field devoted to mining data from texts, appropriately named Text Mining. This field interests us as knowledge managers because it falls within our area of responsibility. Mining information from text is defined as a process of analyzing text and characterizing its language patterns in order to extract the information hidden in it. The information produced in this process can be the name of the author, the title of the article or its publication date. It can also be the content of the article, patterns identified in it, relationships between entities, etc. Furthermore, the information produced can serve as the basis for building a taxonomy and for sorting similar documents into shared categories in a hierarchical subject tree. The effectiveness of extracting information from text is measured by two criteria: coverage and precision.
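The two quality criteria mentioned above can be computed as in the following minimal sketch. It assumes that "coverage" is used in its usual information-retrieval sense of recall, meaning the share of the relevant items that the system actually found. The extracted and relevant items in the example are hypothetical and serve only as an illustration.

```python
def precision_and_coverage(extracted, relevant):
    """Return (precision, coverage) for a set of extracted items.

    precision = share of the extracted items that are actually relevant
    coverage  = share of the relevant items that were extracted (recall)
    """
    extracted, relevant = set(extracted), set(relevant)
    hits = len(extracted & relevant)  # items both extracted and relevant
    precision = hits / len(extracted) if extracted else 0.0
    coverage = hits / len(relevant) if relevant else 0.0
    return precision, coverage


# Hypothetical example: fields a system pulled out of one article,
# compared with the fields a human reviewer marked as correct.
extracted = {"author", "title", "date", "publisher"}
relevant = {"author", "title", "date", "abstract", "keywords"}
p, c = precision_and_coverage(extracted, relevant)
print(f"precision={p:.2f} coverage={c:.2f}")  # precision=0.75 coverage=0.60
```

The two measures pull in opposite directions. A system that extracts everything reaches full coverage but poor precision, while a very cautious system does the reverse. For this reason the two are usually reported together.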
This is one of the fastest-advancing fields in knowledge mining. The simplest and most common problem in the field is the categorization of text documents. The most basic method is to find the words and terms that characterize the reviewed document, that is, terms that appear in it more often than "expected". Text mining technologies then use this information to assign the document to its category.
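The "more than expected" idea can be sketched as follows. This is a hypothetical illustration, not a method taken from any particular product. The sketch scores each term by comparing its share of the document with its smoothed share of a background corpus. It then assigns the document to the category whose keyword list best matches the top-scoring terms. The stop-word list, the add-one smoothing and the hand-written category keywords are all assumptions made for the example. Real systems more commonly use measures such as tf-idf or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical minimal stop-word list; real systems use much larger lists.
STOP_WORDS = {"the", "a", "and", "in", "to", "of", "after"}


def tokenize(text):
    """Lower-case the text, split it into alphabetic words and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def characteristic_terms(doc, corpus, top_n=3, min_count=2):
    """Return the terms that appear in `doc` more often than the corpus predicts."""
    doc_counts = Counter(tokenize(doc))
    corpus_counts = Counter(w for text in corpus for w in tokenize(text))
    doc_total = sum(doc_counts.values())
    corpus_total = sum(corpus_counts.values())
    vocab = len(set(doc_counts) | set(corpus_counts))
    scores = {}
    for term, count in doc_counts.items():
        if count < min_count:
            continue  # ignore terms that occur too rarely in the document
        observed = count / doc_total
        # Expected share of the term, estimated from the background corpus
        # with add-one smoothing (an assumption) so unseen terms are not zero.
        expected = (corpus_counts[term] + 1) / (corpus_total + vocab)
        if observed > expected:
            scores[term] = observed / expected
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


def categorize(doc, corpus, categories):
    """Pick the category whose keywords overlap most with the document's characteristic terms."""
    terms = set(characteristic_terms(doc, corpus))
    return max(categories, key=lambda name: len(terms & categories[name]))


# Hypothetical background corpus, categories and document.
corpus = [
    "the market report shows sales growth in the quarter",
    "the production line had a short delay in the morning",
    "the team met to review the annual plan",
]
categories = {
    "sales": {"market", "sales", "customer", "growth"},
    "production malfunctions": {"machine", "failure", "repair", "line"},
}
doc = "the machine failure stopped the production line and the machine needed repair after the failure"

print(characteristic_terms(doc, corpus))  # ['machine', 'failure']
print(categorize(doc, corpus, categories))  # production malfunctions
```

Scoring terms against a background corpus, rather than by raw frequency, keeps words that are common everywhere from being treated as characteristic of a single document.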
Although categorization is in itself a rather limited task, the field is flourishing for two reasons. The first is the wide variety of applications that require this kind of processing (search engines, intelligence systems, etc.). The second is the enormous amount of information stored as text compared with the amount stored in tables.
As information accumulates, the organization needs to give it meaning and to refine its ability to collect, analyze and apply it. The point is to turn almost static information into knowledge that provides a competitive edge and control over the organization. Such knowledge gives a correct and up-to-date view of the market and provides the ability to respond to changes in it in real time.
Text mining technology is still in its infancy. Nevertheless, the capabilities it offers us as knowledge managers already help us collect the knowledge that exists in organizations more precisely.
Experts estimate that in the near future, as other technologies evolve, new horizons will open in fields that turn the knowledge held in documents into an asset for the whole organization.