A Case Study in Web Search using TREC Algorithms: Paper from WWW10 by Google employees Amit Singhal and Marcin Kaszkiel.
Building a Distributed Full-Text Index for the Web: Paper from WWW10 by Sergey Melnik, Sriram Raghavan, Beverly Yang, and Hector Garcia-Molina of the Computer Science Department at Stanford University.
Dynamic Data Mining: Exploring Large Rule Spaces by Sampling: Paper by Sergey Brin and Lawrence Page, available in Postscript, PDF, and plain text formats.
Finding Near-replicas of Documents on the Web: By Narayanan Shivakumar and Hector Garcia-Molina. Available in Postscript format.
Method for Node Ranking in a Linked Database: United States Patent 7,058,628, granted to Lawrence Page. It incorporates material from two earlier patents covering the PageRank system used by Google.
Papers by Googlers: Google supplies a partial list of papers written by people now at Google.
The Anatomy of a Large-Scale Hypertextual Web Search Engine: The foundational paper by Sergey Brin and Lawrence Page describing the prototype of the Google search engine, including its PageRank link-analysis algorithm.
The Nature of Meaning in the Age of Google: Paper by Terrence A. Brooks on how search engines are changing the way we understand the world around us.
The PageRank Citation Ranking: Bringing Order to the Web: Stanford paper by Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd describing PageRank, a static (query-independent) ranking computed at indexing time that interprets each link as a vote for the page it points to. Available in Postscript, PDF, and plain text formats.
Topic-Sensitive PageRank: Taher H. Haveliwala's paper for the 11th International World Wide Web Conference proposes computing a set of PageRank vectors, each biased toward a particular topic, so that a page's importance can be measured with respect to that topic.
United States Patent 6,526,440: Ranking search results by reranking the results based on local inter-connectivity. Inventor: Krishna Bharat; assignee: Google.
WWW2003: Detecting Near-replicas on the Web by Content and Hyperlink Analysis: Paper by Ernesto Di Iorio et al. proposing a technique for finding lists of similar documents, based on a pair of signatures that take into account both the document contents and the hyperlink structure.
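The PageRank entries above describe a link as a "vote" and, in the topic-sensitive variant, a ranking biased toward a particular topic. The sketch below illustrates both ideas with a generic power iteration; it is not taken from any of the listed papers or patents. The function name `pagerank`, the `teleport` weighting parameter, the 0.85 damping factor, and the toy graph are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, iterations=100, teleport=None):
    """Illustrative power-iteration PageRank (hypothetical helper).

    links: dict mapping each page to the pages it links to.
    teleport: optional dict of non-negative weights (hypothetical parameter)
      that biases the random jump toward chosen pages, in the spirit of
      topic-sensitive PageRank. Defaults to a uniform jump over all pages.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    # Deduplicate out-links so each link casts exactly one vote.
    out = {p: sorted(set(links.get(p, ()))) for p in pages}

    if teleport is None:
        jump = {p: 1.0 / n for p in pages}
    else:
        total = sum(teleport.get(p, 0.0) for p in pages)
        if total <= 0:
            raise ValueError("teleport weights must have a positive sum")
        jump = {p: teleport.get(p, 0.0) / total for p in pages}

    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Random-jump component, distributed according to the jump vector.
        new = {p: (1 - damping) * jump[p] for p in pages}
        # Each page splits its current score evenly among its out-links.
        for p in pages:
            if out[p]:
                share = damping * rank[p] / len(out[p])
                for t in out[p]:
                    new[t] += share
        # Pages with no out-links pass their score on via the jump vector.
        dangling = sum(rank[p] for p in pages if not out[p])
        for p in pages:
            new[p] += damping * dangling * jump[p]
        rank = new
    return rank


graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
plain = pagerank(graph)
biased = pagerank(graph, teleport={"B": 1.0})  # jump only to page B
print(plain)
print(biased)
```

With `teleport` left unset, every page is equally likely as a jump target and the result is ordinary PageRank; passing topic weights shifts score toward the favored pages and whatever they link to. The scores remain a probability distribution because the jump, link, and dangling-page contributions together conserve total mass.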