Comparative study of searches on the Internet

Introduction to the project area

In recent years, the Web has grown exponentially in size and capacity. Paradoxically, this continuous growth has become one of its main problems. While the Web offers more functionality every day, the sheer volume of emerging information makes managing and discovering that information far from optimal. In other words, as more information becomes available on the Internet, precise answers become harder to find: searches are hampered by information overload and redundancy. This serves end users poorly, since they have to filter a huge number of results themselves to find the one they want. Therefore, this project aims to offer better searches, using users' keywords to describe the solution to their problems.

It is widely known that more companies every day are using the Internet and offering their services online. This is possible thanks to software, which enables evolution and transforms the way companies do business. Nowadays, software in organizations is fully integrated and intelligently controls a large number of complex processes, while remaining flexible enough to adapt to changing business needs.

Tourism can be considered one of the fields that has benefited most from E-Commerce and semantic Web technologies, owing to the significant heterogeneity of the market and its information sources and to the high volume of online transactions (Werthner & Klein, 1999). E-commerce has also transformed the business models and processes of the tourism industry more quickly and more substantially than those of other types of Business to Consumer (B2C) business (Werthner & Ricci, 2004).

However, current Web technology suffers from the following limitations:

  1. Searches are limited to keywords and cannot be based on concepts. Homonyms and synonyms are therefore not taken into account, so results lack precision and not all matching resources are found.
  2. Facts from multiple sources cannot be automatically combined in order to answer a query.

"Today's information management concepts and solutions for the complex tasks of tourism market - are still low-level from a semantic point of view. Hence, information creation, maintenance and delivery being the primary business process of tourism intermediaries face heterogeneities in various dimensions, require manual coordination tasks, and suffer from missing consensus on agreed concepts and technologies." (

In Berners-Lee's words, the semantic web is "an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" (Berners-Lee et al., 2001). By using semantic web technologies, it is possible to annotate information sources on tourism services and products with the concepts included in a defined ontology, improving searches and allowing interoperability between computer systems by reducing the need for user mediation. The resulting metadata, or enriched information, can be used to match tailored tourism-related information or package holidays to client preferences.

The nature and the methodology of the project

The area of research is semantic technologies in online search. The study will be applied to the tourism sector; the final output will therefore be a dual system: one tourism search engine based on keywords and another based on ontologies, allowing the end user to compare the quality of their results.

Current keyword-based search engines cannot help users retrieve the detailed information they need, with the desired semantic content, unless the users submit the exact words used to describe the required service. To facilitate the retrieval of accurate tourism solutions, the tourism information must be semantically annotated in advance. This project therefore intends to demonstrate how semantic retrieval of the information can be supported.

Typically, the design of keyword-based searches and semantic searches comprises the following mechanisms:

Keyword searches

The traditional way for a search engine to find results is to compare the query terms with the text of the HTML pages it has already stored and indexed in its database.

To decide whether a page is related to the search topic, the engine estimates how many times and where on the page the keyword appears. Handling the search this way, with plain reference text only and no categorization, produces inaccurate results in many cases (Cruz et al., 2002).
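
The scoring idea described above (how often and where a keyword appears) can be sketched as follows. This is a minimal illustration and not the implementation of any real search engine: the class name `KeywordScorer`, the split of a page into a title and a body, and the weights 3 and 1 are all assumptions made for the example.

```java
import java.util.Locale;

/** Hypothetical sketch: scores a page by how often and where a keyword appears. */
final class KeywordScorer {

    // Assumed weights: a hit in the title counts more than a hit in the body.
    static final int TITLE_WEIGHT = 3;
    static final int BODY_WEIGHT = 1;

    /** Counts whole-word, case-insensitive occurrences of keyword in text. */
    static int countOccurrences(String text, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        int count = 0;
        // Split on anything that is not a letter or a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.equals(needle)) {
                count++;
            }
        }
        return count;
    }

    /** Weighted sum of title hits and body hits. */
    static int score(String title, String body, String keyword) {
        return TITLE_WEIGHT * countOccurrences(title, keyword)
             + BODY_WEIGHT * countOccurrences(body, keyword);
    }

    public static void main(String[] args) {
        String title = "Hotel deals in Rome";
        String body = "Find a hotel near the centre. Every hotel includes breakfast.";
        System.out.println(score(title, body, "hotel"));          // prints 5
        System.out.println(score(title, body, "accommodation"));  // prints 0
    }
}
```

The second call shows the weakness discussed in the text: the page is clearly about accommodation, but because that exact word never appears, a purely textual comparison gives it a score of zero.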

Ontologies & Semantic searches

An ontology is a formal representation of a particular domain or area of knowledge. The key concepts of the domain are identified, along with the relationships among them. The result is a structure of related concepts that provides a common vocabulary for the domain modeled by the ontology. Ontologies are important because they are used by people, databases and applications that need to share information about a specific domain.

An ontology consists of:

  • Classes: the concepts of the domain.
  • Properties: these fall into two types:
      • Relations: links between two classes (concepts) of the ontology.
      • Attributes: the characteristics of a class.
  • Individuals: concrete instances of a class.
  • Axioms: restrictions imposed on the elements of the ontology.
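
The components listed above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the class `TinyOntology`, the tourism classes (`Accommodation`, `Hotel`, `Hostel`, `Destination`), the individuals, the `stars` attribute and the `locatedIn` relation are all invented for the example. A real project would more likely build the ontology in an editor such as Protégé and query it with SPARQL.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory ontology; every name here is an illustrative assumption. */
final class TinyOntology {
    private final Map<String, String> parentOf = new HashMap<>();       // classes: class -> superclass (null = root)
    private final Map<String, String> typeOf = new LinkedHashMap<>();   // individuals: individual -> class
    private final Map<String, Map<String, String>> attributes = new HashMap<>(); // attributes per individual
    private final Map<String, String> locatedIn = new HashMap<>();      // relation: accommodation -> destination

    void addClass(String name, String parent) { parentOf.put(name, parent); }

    void addIndividual(String name, String className) {
        if (!parentOf.containsKey(className)) {
            throw new IllegalArgumentException("Unknown class: " + className);
        }
        typeOf.put(name, className);
    }

    void setAttribute(String individual, String attr, String value) {
        attributes.computeIfAbsent(individual, k -> new HashMap<>()).put(attr, value);
    }

    String attribute(String individual, String attr) {
        return attributes.getOrDefault(individual, Map.of()).get(attr);
    }

    /** Assumed axiom: locatedIn may only link an Accommodation to a Destination. */
    void setLocatedIn(String subject, String object) {
        if (!isA(typeOf.get(subject), "Accommodation") || !isA(typeOf.get(object), "Destination")) {
            throw new IllegalArgumentException("locatedIn must link an Accommodation to a Destination");
        }
        locatedIn.put(subject, object);
    }

    /** True if className equals ancestor or is a direct or indirect subclass of it. */
    boolean isA(String className, String ancestor) {
        for (String c = className; c != null; c = parentOf.get(c)) {
            if (c.equals(ancestor)) {
                return true;
            }
        }
        return false;
    }

    /** Concept-based lookup: individuals of the given class or of any of its subclasses. */
    List<String> individualsOf(String className) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> e : typeOf.entrySet()) {
            if (isA(e.getValue(), className)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** Builds a tiny sample tourism ontology. */
    static TinyOntology sample() {
        TinyOntology o = new TinyOntology();
        o.addClass("Accommodation", null);
        o.addClass("Hotel", "Accommodation");
        o.addClass("Hostel", "Accommodation");
        o.addClass("Destination", null);
        o.addIndividual("Rome", "Destination");
        o.addIndividual("HotelColosseo", "Hotel");
        o.addIndividual("RomaHostel", "Hostel");
        o.setAttribute("HotelColosseo", "stars", "4");
        o.setLocatedIn("HotelColosseo", "Rome");
        o.setLocatedIn("RomaHostel", "Rome");
        return o;
    }

    public static void main(String[] args) {
        // prints [HotelColosseo, RomaHostel]
        System.out.println(sample().individualsOf("Accommodation"));
    }
}
```

Searching for the concept `Accommodation` returns both the hotel and the hostel even though neither is labelled with that word. This is the kind of result a plain keyword comparison would miss.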

The main objectives of the project


To make a sound comparison between a keyword-based search engine, where is the best example; and a semantic search engine, in which the use of metadata allows the results to be improved dynamically, with a focus on the marketing of tourist packages.

Core Objectives

To demonstrate how semantic annotation of tourist information can significantly enhance the performance of users' retrievals compared with traditional keyword search methods.
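
The proposal does not name a metric for this comparison. One common option, assumed here purely for illustration, is to compute precision and recall for each method against a hand-labelled set of relevant results. The class name `RetrievalMetrics` and the result identifiers in `main` are invented sample data, not measurements from this project.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch: precision and recall for comparing two search methods. */
final class RetrievalMetrics {

    /** Number of retrieved items that are also relevant. */
    static int hits(Set<String> retrieved, Set<String> relevant) {
        Set<String> common = new HashSet<>(retrieved);
        common.retainAll(relevant);
        return common.size();
    }

    /** Fraction of retrieved items that are relevant (0.0 if nothing was retrieved). */
    static double precision(Set<String> retrieved, Set<String> relevant) {
        return retrieved.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / retrieved.size();
    }

    /** Fraction of relevant items that were retrieved (0.0 if nothing is relevant). */
    static double recall(Set<String> retrieved, Set<String> relevant) {
        return relevant.isEmpty() ? 0.0 : (double) hits(retrieved, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        // Invented sample data: four packages judged relevant by hand.
        Set<String> relevant = Set.of("A", "B", "C", "D");
        Set<String> keywordResults = Set.of("A", "X");
        Set<String> semanticResults = Set.of("A", "B", "C");

        System.out.println(precision(keywordResults, relevant));   // prints 0.5
        System.out.println(recall(keywordResults, relevant));      // prints 0.25
        System.out.println(precision(semanticResults, relevant));  // prints 1.0
        System.out.println(recall(semanticResults, relevant));     // prints 0.75
    }
}
```

Precision captures how many of the returned results are actually wanted. Recall captures how many of the wanted results were found. These correspond to the two weaknesses of keyword search noted earlier in the proposal.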


First phase (2 weeks):

Studying the state of the art through previous work on the topic.

Second phase (1 week):

Gathering the test set of information on tourism services and semantically annotating it.

Development of a tourism ontology.

Third phase (4 weeks):

Development of a traditional keyword search method.

Development of a semantic search method based on ontologies.

Comparison of both methods.

Fourth phase (5 weeks):

Researching and writing the final report.

Preparing the demonstration.

Software and hardware requirements

  • Operating System: Linux & Microsoft Windows.
  • Programming Language: Java.
  • Integrated Development Environment (IDE): NetBeans.
  • Database server: MySQL.
  • Query language: SPARQL.
  • Ontology editor: Protégé.


