Information Retrieval: A Comparative Study of Textual Indexing Using an Object-Oriented Database (DB4O) and the Inverted File
Abstract
The growth in the volume of textual data, such as the books and articles accumulated in libraries over centuries, has made it necessary to establish effective mechanisms for locating them. Early techniques such as abstracting, indexing and the use of classification categories marked the birth of a new field of research called "Information Retrieval". Information Retrieval (IR) can be defined as the task of designing models and systems that facilitate access to a collection of documents in electronic form (a corpus), so that users can find the documents relevant to them, that is, those whose contents match their information needs. Most information retrieval models index a corpus with a specific data structure called the "inverted file" or "inverted index". For every term in the corpus, the inverted file records the identifiers of the documents that contain the term, the frequency of the term in each of those documents, the positions of its occurrences...
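The inverted file described above can be sketched as a map from each term to its postings, where each posting pairs a document identifier with the positions of the term in that document. This is a minimal Python illustration, not the paper's implementation: the names `tokenize`, `build_inverted_index` and `postings` are hypothetical, and tokenization is assumed to be simple lower-casing and splitting on word characters, with no stemming or stop-word removal.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Assumed tokenizer: lower-case, keep runs of word characters (no stemming, no stop words)
    return re.findall(r"\w+", text.lower())


def build_inverted_index(corpus):
    """Map each term to {doc_id: [positions]}; the term frequency is len(positions)."""
    index = defaultdict(dict)
    for doc_id, text in corpus.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def postings(index, term):
    """Return (doc_id, frequency, positions) for every document containing `term`."""
    return [(doc_id, len(pos), pos) for doc_id, pos in sorted(index.get(term, {}).items())]


corpus = {
    1: "Information retrieval finds relevant documents",
    2: "An inverted file indexes documents by term; each term points to documents",
}
index = build_inverted_index(corpus)
print(postings(index, "documents"))  # [(1, 1, [4]), (2, 2, [4, 11])]
```

Because positions are stored, the per-document frequency is redundant (it is the length of the position list), which is why the sketch derives it instead of storing it separately.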
In this paper, we use an object-oriented database (db4o) instead of the inverted file; that is, instead of searching for a term in the inverted file, we search for it in the db4o database.
The purpose of this work is to carry out a comparative study to determine whether object-oriented databases can compete with the inverted index in terms of access speed and resource consumption on a large volume of data.
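The substitution described above (looking a term up in an object store rather than in the inverted file) and the speed comparison can be sketched with a small timing harness. This is a minimal illustration under stated assumptions: `TermEntry`, `InMemoryObjectStore`, `lookup_inverted`, `lookup_store` and `time_lookups` are hypothetical names, the store is an in-memory stand-in that only mimics the store-objects-and-query-by-predicate style of an object database and does not call db4o, and the data is synthetic. Because the stand-in scans every stored object, it models a query on an unindexed field; a real db4o database may behave quite differently.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TermEntry:
    # Hypothetical persistent class: one object per term with its postings {doc_id: [positions]}
    term: str
    postings: dict = field(default_factory=dict)


class InMemoryObjectStore:
    """Hypothetical in-memory stand-in for an object database (NOT the db4o API).

    Objects are stored as-is and retrieved with a predicate, in the spirit of a
    native query; every query scans all stored objects, i.e. it models a lookup
    on an unindexed field."""

    def __init__(self):
        self._objects = []

    def store(self, obj):
        self._objects.append(obj)

    def query(self, predicate):
        return [obj for obj in self._objects if predicate(obj)]


def time_lookups(lookup, terms, repeat=5):
    """Best-of-`repeat` wall-clock time in seconds to look up every term once."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        for term in terms:
            lookup(term)
        best = min(best, time.perf_counter() - start)
    return best


# Synthetic data (illustrative only): "term<i>" occurs once, at position 0 of document i
entries = [TermEntry(f"term{i}", {i: [0]}) for i in range(2000)]
inverted_file = {e.term: e.postings for e in entries}  # inverted file held as a hash map
store = InMemoryObjectStore()
for e in entries:
    store.store(e)


def lookup_inverted(term):
    return inverted_file.get(term, {})


def lookup_store(term):
    hits = store.query(lambda entry: entry.term == term)
    return hits[0].postings if hits else {}


queries = [f"term{i}" for i in range(0, 2000, 20)]
# Both back ends must return the same postings before their timings are compared
assert all(lookup_inverted(q) == lookup_store(q) for q in queries)
print("inverted file:", time_lookups(lookup_inverted, queries))
print("object store: ", time_lookups(lookup_store, queries))
```

The sketch measures lookup time only; resource consumption could be measured in the same harness, for example with Python's standard `tracemalloc` module. The linear-scan store is deliberately pessimistic, so any gap it shows should not be read as a result about db4o itself.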
Copyright (c) 2015 Journal of Information Sciences and Computing Technologies
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
ISSN: 2394-9066