Approaches and Stages in the Development of Enterprise Search Engines

Dov Berger
Oct 1, 2003
With the start of the information revolution in the early 1990s, the amount of information flowing into organizations, and the diversity of its sources, began to grow exponentially.

These enormous amounts of information created a need for tools that let the organization and its members navigate the vast knowledge it holds, much of which was previously hidden from view. The most common such tool is the enterprise search engine.

A search engine is a general term for an application designed to locate vital information, and its source, for an organization and its members when needed. Behind that term, however, lie many approaches and diverse technologies. Despite all the effort and extensive investment in the field, existing engines still often fail to reach the most relevant information, and they usually return a lot of "junk information" that adds noise to the results and makes the search engine inefficient to use within the organization.

So, how does one design a search engine? And what are the elements required to make it efficient and useful?

To answer these questions, it is necessary to understand and trace the cognitive needs of the individual within the organization when using the search engine.

Sometimes the individual wants to locate a document based on a word they remember appearing in it, even if that word is not particularly important to the document's content (Reflection).

In other cases, the goal is to find material about a specific topic that can be defined by one or several words (Concept, Category).

Often, the individual wants to locate information about a topic they cannot define unambiguously or objectively, but only as the topic is perceived in their own mind, in its cognitive context.

The oldest and most common technology among search engines is the full-text index.

In this technology, every word appearing in every document in the organization is cataloged (indexing). At search time, the index makes it possible to quickly locate all documents containing the "Query Term."

This technology enables fast performance across large amounts of information.
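The idea can be illustrated with a minimal sketch of an inverted index. The document ids, sample texts, and function names below are hypothetical, and real engines add far more (positions, compression, incremental updates), so this is an illustration under those assumptions and not a production design.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Indexing stage: map every word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index

def search(index, query_term):
    """Search stage: one dictionary lookup returns all matching document ids."""
    return sorted(index.get(query_term.lower(), set()))

# Hypothetical sample repository.
documents = {
    "memo-1": "Quarterly budget review for the sales team",
    "memo-2": "Travel policy and budget limits",
    "memo-3": "Holiday schedule for the support team",
}
index = build_index(documents)
print(search(index, "budget"))
```

Because the lookup cost does not depend on scanning each document, the search stays fast as the repository grows, which is the property described above.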

In searches based on reflective memory, where the aim is to locate a specific document by recalling a word that appears in it, this technology is very efficient and even necessary. However, when searching for information on a topic, it often floods the results page with "noise" (many irrelevant documents). The document we are looking for gets buried, the engine becomes inefficient, and the search experience becomes frustrating and off-putting.

Over the years, this technology has undergone various improvements through the addition of linguistic elements, such as stemming engines and synonym dictionaries, to enhance the engine's capabilities for finding documents related to the search expression.

The results did contain more relevant documents, but with a growing proportion of "noise."
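A small sketch can show both effects of these linguistic additions: more relevant hits and more noise. The suffix-stripping rule and the synonym dictionary here are deliberately crude, hypothetical stand-ins for a real stemmer and thesaurus, and the sample documents are invented for illustration.

```python
# Hypothetical synonym dictionary; a real one would be far larger.
SYNONYMS = {"car": {"automobile", "vehicle"}}

def stem(word):
    """Crude suffix stripping; a real stemmer handles many more cases."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def expand_query(term):
    """Return the stems of the term and of all its listed synonyms."""
    term = term.lower()
    variants = {term} | SYNONYMS.get(term, set())
    return {stem(v) for v in variants}

def matches(text, term):
    """True if any stemmed word in the text equals an expanded query stem."""
    doc_stems = {stem(w) for w in text.lower().split()}
    return bool(doc_stems & expand_query(term))

documents = {
    "d1": "Company cars must be parked in lot B",
    "d2": "Vehicles need annual inspection",
    "d3": "The space vehicle exhibit opens Monday",
    "d4": "Cafeteria menu",
}
exact = [d for d, t in documents.items() if "car" in t.lower().split()]
expanded = [d for d, t in documents.items() if matches(t, "car")]
print(exact, expanded)
```

In this sample, an exact match on "car" finds nothing, while the expanded query finds d1 and d2 (relevant) but also d3, which mentions a vehicle in an unrelated sense.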

Attempts were made to neutralize the "noise" by adding Boolean operators (Boolean search) and filtering systems (Filter by), and to improve the ordering of results through statistical algorithms, for example by weighting the number of occurrences of the "search expression" in the document or user feedback on the document. These attempts eventually exhausted themselves and prepared the ground for the next technological leap (the second generation): search engines based on topics (Concept/Category), which find "documents that talk about" a subject and not necessarily "documents that contain" a term.
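As a rough illustration of two of these first-generation refinements, the sketch below combines a Boolean filter (required and excluded terms) with ranking by occurrence count. The function names, parameters, and sample texts are assumptions made for this example only.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def boolean_search(documents, must=(), must_not=()):
    """Ids of documents containing every `must` term and no `must_not` term."""
    hits = []
    for doc_id, text in documents.items():
        words = set(tokenize(text))
        if all(t in words for t in must) and not any(t in words for t in must_not):
            hits.append(doc_id)
    return hits

def rank_by_frequency(documents, doc_ids, term):
    """Order documents by how often the term occurs, most frequent first."""
    counts = {d: Counter(tokenize(documents[d]))[term] for d in doc_ids}
    return sorted(doc_ids, key=lambda d: (-counts[d], d))

# Hypothetical sample repository.
documents = {
    "a": "budget report budget forecast budget",
    "b": "budget travel policy",
    "c": "travel report",
}
hits = boolean_search(documents, must=["budget"])
print(rank_by_frequency(documents, hits, "budget"))
print(boolean_search(documents, must=["budget"], must_not=["travel"]))
```

Both steps still operate on which words a document contains, not on what it is about, which is the limitation the second generation tries to address.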

These technologies are complex primarily because they claim to "know" what the document is talking about and to match it to the object of the search. These engines are usually based on statistical-mathematical methods, such as the vector-space model, or on artificial neural networks (neural nets). In a learning stage, these methods map the characteristic relationships between words across all documents in the organization's information sources. They then use this mapping to build a semantic map for each new document added to the organizational repository. The semantic map expresses, quantitatively and qualitatively, the linguistic relationships between all words in the document, and so indicates how much each word is worth and how it connects to the topics of the document's content.
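To make the learning stage and the semantic map more concrete, here is a minimal sketch that uses simple co-occurrence statistics as a stand-in for vector-space or neural methods. The Jaccard-style strength formula, the weighting scheme, and the sample corpus are all assumptions chosen for brevity; commercial engines use considerably richer models.

```python
from collections import Counter, defaultdict
from itertools import combinations

def learn_associations(corpus):
    """Learning stage: score word pairs by how often they share a document."""
    doc_freq = Counter()
    pair_freq = Counter()
    for text in corpus:
        words = sorted(set(text.lower().split()))
        doc_freq.update(words)
        pair_freq.update(combinations(words, 2))
    assoc = defaultdict(dict)
    for (a, b), shared in pair_freq.items():
        # Jaccard-style strength: shared documents / documents with either word.
        strength = shared / (doc_freq[a] + doc_freq[b] - shared)
        assoc[a][b] = strength
        assoc[b][a] = strength
    return assoc

def semantic_map(text, assoc):
    """Weight each word in the document, plus related words it implies."""
    counts = Counter(text.lower().split())
    weights = defaultdict(float)
    for word, count in counts.items():
        weights[word] += count
        for related, strength in assoc.get(word, {}).items():
            weights[related] += count * strength
    return dict(weights)

# Hypothetical existing repository used for learning.
corpus = [
    "invoice payment supplier",
    "invoice payment overdue",
    "holiday schedule office",
]
assoc = learn_associations(corpus)
new_doc_map = semantic_map("supplier invoice", assoc)
print(sorted(new_doc_map.items()))
```

Note that the map for the new document gives weight to "payment" even though that word never appears in it, because the rest of the repository links it strongly to "invoice" and "supplier".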

In the search stage, the engine matches the "search expression" to all documents in whose semantic map the expression has a high value, or is strongly connected to other words appearing in the map. One special advantage of these engines, beyond reducing noise and increasing the relevance of results, is the ability to locate documents of high value for the topic of the "search expression" even though the expression itself does not appear in them, based on a significant relationship learned from other documents. In other words, the entire organizational information repository influences whether a document belongs to the search expression, something that index-based (full-text) engines cannot do.
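The search stage can be sketched as follows. The association strengths are hard-coded, hypothetical values standing in for a mapping learned from the whole repository, and the scoring rule and threshold are assumptions made to keep the example short.

```python
# Hypothetical association strengths, standing in for a learned mapping.
ASSOCIATIONS = {
    "laptop": {"notebook": 0.8, "battery": 0.4},
}

def score(document_words, query, associations):
    """Direct occurrences count 1.0; related words count by their strength."""
    related = associations.get(query, {})
    total = 0.0
    for word in document_words:
        if word == query:
            total += 1.0
        else:
            total += related.get(word, 0.0)
    return total

def concept_search(documents, query, associations, threshold=0.5):
    """Return ids of documents scoring at least `threshold`, best first."""
    scored = []
    for doc_id, text in documents.items():
        s = score(text.lower().split(), query.lower(), associations)
        if s >= threshold:
            scored.append((doc_id, s))
    return [d for d, _ in sorted(scored, key=lambda x: (-x[1], x[0]))]

documents = {
    "d1": "laptop replacement request",
    "d2": "notebook recall notice",
    "d3": "cafeteria menu",
}
results = concept_search(documents, "laptop", ASSOCIATIONS)
print(results)
```

Document d2 is returned for the query "laptop" although it never contains that word, which is exactly what a pure full-text index cannot do.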

Despite a significant reduction in "noise" and a much broader ability to find documents relevant to the search topic, it turns out that the same topic has many contextual angles and interpretations, which do not always align with what the searcher wants.

To achieve a close match with the searcher's intent, contextual algorithms must be added that trace the searcher's activity around the "search expression," as well as the activity of the target documents and their users.

Through statistical-semantic analyses of this activity, combined with an understanding of the socio-organizational structure, these Contextual Engines can build on conceptual technology and sharpen the results (True Relevancy) toward the context the searcher wants.
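One very simple way to picture such a contextual layer is a re-ranking step that boosts documents opened by colleagues in the searcher's own department. Everything in this sketch is hypothetical: the access log format, the department names, the boost formula, and the base scores, which are assumed to come from a concept-based engine.

```python
def contextual_rerank(base_scores, access_log, searcher_dept, boost=0.5):
    """Boost documents that people in the searcher's department have opened."""
    dept_opens = {}
    for doc_id, dept in access_log:
        if dept == searcher_dept:
            dept_opens[doc_id] = dept_opens.get(doc_id, 0) + 1
    adjusted = {
        doc_id: score * (1 + boost * dept_opens.get(doc_id, 0))
        for doc_id, score in base_scores.items()
    }
    return sorted(adjusted, key=lambda d: (-adjusted[d], d))

# Hypothetical conceptual relevance scores for the query "java".
base_scores = {"java-island-travel": 0.9, "java-coding-guide": 0.8}

# Hypothetical activity log: (document id, department of the user who opened it).
access_log = [
    ("java-coding-guide", "engineering"),
    ("java-coding-guide", "engineering"),
    ("java-island-travel", "hr"),
]
print(contextual_rerank(base_scores, access_log, "engineering"))
print(contextual_rerank(base_scores, access_log, "hr"))
```

The same query yields a different order for an engineer than for someone in HR, which illustrates how activity and organizational structure can steer results toward the searcher's context.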
