A really interesting topic came up recently on the text analytics forum: the difference between a text mining solution and a Google-style search solution.
When Stanley Kubrick released his movie 2001: A Space Odyssey in 1968,
Artificial Intelligence (AI) research was creating really high
expectations. Why didn't AI reach those expectations? Maybe researchers
overestimated computer capabilities, or didn't realize the tremendous
amount of information and heuristics that human beings take into
account in daily communication. Human comprehension is a complex
process: it is the result of relating information from the
environment, previous knowledge and the new input of the conversation.
For decades now, computer scientists have been trying to organize data
in a way that can be heavily analyzed by computers, so that we can
extract information from it (data mining). Organizing data in databases
(structured in fields with restricted values) in order to apply
statistical analyses, find correlations and make decisions is now a
well-established process. In fact, the risk analyses of insurance
companies and banks are based on this kind of technology.
When it comes to free-text data, these analyses become more difficult.
The data is not stored and organized a priori for a computer to
analyze; it is organized for a human to understand. Text mining
techniques try to organize the text so that it can be analyzed with
computer algorithms. Text mining technologies study words in their
context, their meaning and their role in the sentence. "Flying planes
are dangerous" is not the same as "Flying planes is dangerous": in the
first the planes are dangerous, in the second the act of flying them is.
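To make the "Flying planes" example concrete, here is a minimal Python
sketch. It assumes a tiny stop-word list and a toy rule that only looks
at the verb ("is" vs "are"); both are invented for this illustration and
are nowhere near a real parser. It only shows that the two sentences
look identical as a bag of content words, while the role of "flying"
changes once we look at the words in context.

```python
import re

# Hypothetical stop-word list, just enough for this example.
STOP_WORDS = {"is", "are"}


def tokenize(sentence):
    """Lowercase the sentence and return its word tokens in order."""
    return re.findall(r"[a-z]+", sentence.lower())


def content_words(sentence):
    """Bag-of-words view: the set of tokens minus stop words."""
    return set(tokenize(sentence)) - STOP_WORDS


def reading(sentence):
    """Toy rule for this one sentence pattern only (an assumption,
    not a general disambiguation method): use verb agreement to
    decide what the subject is."""
    words = tokenize(sentence)
    if "are" in words:
        # Plural verb agrees with "planes"; "flying" is a modifier.
        return "planes that fly"
    if "is" in words:
        # Singular verb agrees with the gerund "flying".
        return "the act of flying planes"
    return "unknown"


a = "Flying planes are dangerous"
b = "Flying planes is dangerous"

print(content_words(a) == content_words(b))  # same bag of content words?
print(reading(a))
print(reading(b))
```

A keyword index would treat both sentences as the same thing; it is the
small word we usually throw away that carries the difference.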
The main difference between text mining and search technologies is
that when we search we are trying to find something somewhere, whereas
when we apply text mining technologies we are trying to get a better
understanding of whatever we are searching for. Therefore, if we know
what we are looking for (where can I get a certain type of shoes?),
search technologies such as Google's are really efficient. If we want
to understand the impact of an antibiotic on our body and the
environment, search technologies will make us read too much and leave
us with a narrower picture; text mining technologies would be more
efficient.
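The contrast can be sketched in a few lines of Python. Everything here
is an assumption made for illustration: the four documents are made up,
the stop-word list is hand-picked, and counting co-occurring words is
only one very simple stand-in for what text mining tools actually do.
The point is just that search hands back documents to read, while the
mining step summarizes what those documents talk about.

```python
import re
from collections import Counter

# Made-up mini corpus, for illustration only.
DOCS = [
    "Antibiotic residues were detected in river water near the farm.",
    "The antibiotic reduced gut bacteria diversity in treated patients.",
    "Resistant bacteria spread when antibiotic use increases.",
    "New running shoes are on sale at the local store.",
]

# Hand-picked stop words (hypothetical, just enough for this corpus).
STOP_WORDS = {"the", "in", "were", "near", "when", "are", "on", "at", "new"}


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def search(query, docs):
    """Search: return the documents that contain every query term."""
    terms = set(tokenize(query))
    return [d for d in docs if terms <= set(tokenize(d))]


def co_occurring_terms(term, docs):
    """Mining-style summary: in how many documents does each word
    appear together with `term`?"""
    counts = Counter()
    for d in docs:
        words = set(tokenize(d))
        if term in words:
            counts.update(words - STOP_WORDS - {term})
    return counts


hits = search("antibiotic", DOCS)          # documents we would have to read
summary = co_occurring_terms("antibiotic", DOCS)  # what they are about

print(len(hits))
print(summary.most_common(3))
```

For the shoes question, `search("running shoes", DOCS)` is all we need.
For the antibiotic question, the counts in `summary` give a first
overview without reading every hit.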