Consortium for Mathematics and its Applications

Product ID: Articles
Supplementary Print
Undergraduate

Searching for Text in Vector Space

Authors: Nicholas A. Dovidio, Timothy P. Chartier


Introduction

Submitting a query to a search engine is a common method of information retrieval. Companies compete to be listed high in the rankings that a search engine returns. In fact, some companies make a business of raising the ranking of a paying customer's Web page, which they do by exploiting their knowledge of the algorithms that search engines use. Some information about these algorithms is publicly known, and some is proprietary. In Chartier [2006], the PageRank method was discussed; this algorithm, patented by Google, examines the link structure of the World Wide Web in order to determine which pages are most important. Note that such an algorithm does not look at the content of Web pages. Its results must therefore be combined with a variety of other techniques to decide which Web pages appear first in the rankings returned by a search engine for a particular query.

This article discusses how one can rank Web pages based on content. The article is introductory in nature, and an interested reader is encouraged to explore the literature on search engine analysis, which is an ever-growing field. In particular, we will consider a vector space model for performing a search. This method does not take into account the hyperlink structure of the World Wide Web. As such, the rankings from the vector space model could be aggregated with the results of PageRank, for instance, to produce a final ranking based on both the hyperlink structure of the Web and the content of Web pages.

The vector space model we consider in this article analyzes the content of individual Web pages. We will see why this method is typically not used for searching the Web but rather for searching smaller databases. Nonetheless, the ideas of this article can give a reader insight into a mathematical technique for ranking Web pages based on content and relative to a query. Further, the reader will be introduced to the complexity of this problem and to the type of innovative mathematics that is used every day when we submit queries to search engines and use the results.
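
To make the idea of ranking pages "relative to a query" concrete, the following is a minimal sketch of one common formulation of a vector space model. It is an illustration under stated assumptions, not necessarily the formulation developed in the article: each page and the query are represented as vectors of raw term counts, and pages are ranked by the cosine of the angle between the page vector and the query vector. The page names, the sample texts, and the helper functions `term_vector`, `cosine`, and `rank` are all hypothetical.

```python
import math
from collections import Counter


def term_vector(text, vocabulary):
    """Count how often each vocabulary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is the zero vector."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(documents, query):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    # Assumption: the vocabulary is every distinct word that appears in the pages.
    vocabulary = sorted({w for text in documents.values() for w in text.lower().split()})
    q = term_vector(query, vocabulary)
    scores = {
        name: cosine(term_vector(text, vocabulary), q)
        for name, text in documents.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy "pages"; real pages would need tokenizing and cleaning first.
documents = {
    "page_a": "linear algebra vector space",
    "page_b": "search engine ranking web pages",
    "page_c": "vector space model search",
}
ranking = rank(documents, "vector space search")
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy example, the page that shares the most terms with the query relative to its own length is listed first. The vocabulary, and hence the length of every vector, grows with the number of distinct words in the collection, which hints at why the cost of such a model matters as the collection gets larger.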

©2008 by COMAP, Inc.
The UMAP Journal 29.4
14 pages
