Distributed search engine
title: "Distributed search engine" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["internet-search-engines", "internet-service-providers"] topic_path: "general/internet-search-engines" source: "https://en.wikipedia.org/wiki/Distributed_search_engine" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0
A distributed search engine is a search engine with no central server. Unlike traditional centralized search engines, tasks such as crawling, data mining, indexing, and query processing are distributed among several peers in a decentralized manner, with no single point of control.
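As an illustration of how indexing and query processing might be split among peers, the following minimal sketch assigns each search term to a peer by hashing the term, so that no single node holds the whole index. The article does not prescribe a partitioning scheme; the hash-based assignment and the names `peer_for_term`, `Peer`, and `Network` are assumptions made for this example, and real systems handle peers joining and leaving, replication, and ranking, all of which are omitted here.

```python
import hashlib
from collections import defaultdict


def peer_for_term(term: str, num_peers: int) -> int:
    """Map a term to a peer number with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_peers


class Peer:
    """One node holding a slice of the inverted index: term -> document ids."""

    def __init__(self) -> None:
        self.index = defaultdict(set)

    def add(self, term: str, doc_id: str) -> None:
        self.index[term].add(doc_id)

    def lookup(self, term: str) -> set:
        # Return a copy so callers cannot mutate the peer's index.
        return set(self.index.get(term, set()))


class Network:
    """In-process stand-in for a peer-to-peer network (illustration only)."""

    def __init__(self, num_peers: int) -> None:
        self.peers = [Peer() for _ in range(num_peers)]

    def _owner(self, term: str) -> Peer:
        return self.peers[peer_for_term(term, len(self.peers))]

    def index_document(self, doc_id: str, text: str) -> None:
        # Each distinct term is stored only on the peer that owns it.
        for term in set(text.lower().split()):
            self._owner(term).add(term, doc_id)

    def search(self, query: str) -> set:
        # Ask the owning peer of every query term, then intersect the results.
        terms = query.lower().split()
        if not terms:
            return set()
        postings = [self._owner(term).lookup(term) for term in terms]
        return set.intersection(*postings)
```

In this sketch, indexing "distributed search engine" as one document and "centralized search engine" as another, then searching for "distributed search", returns only the first document, even though the terms may be stored on different peers.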
History
Presearch
Main article: Presearch (search engine)
Started in 2017, Presearch is a search engine built around an ERC20 token (PRE) and powered by a distributed network of community-operated nodes that aggregate results from a variety of sources. This network powers the searches at presearch.com. It is planned as a precursor to a system in which each node collaborates on a global decentralized index. Presearch averages 5 million searches per day and has 2.2 million registered users. On September 1, 2021, Presearch was added as a default option to the search engine list on Android in the EU. On May 27, 2022, Presearch officially transitioned from its testnet to a mainnet, meaning that all search traffic through the service now runs over Presearch's decentralized network of volunteer-run nodes.
YaCy
On December 15, 2003, Michael Christen announced development of a P2P-based search engine, eventually named YaCy, on the heise online forums.
Seeks
Seeks was an open-source web search proxy and collaborative distributed tool for web search. It has had no usable release since 2016.
InfraSearch
In April 2000, several programmers (including Gene Kan and Steve Waterhouse) built a prototype P2P web search engine based on Gnutella called InfraSearch. The technology was later acquired by Sun Microsystems and incorporated into the JXTA project. It was meant to run inside the participating websites' databases, creating a P2P network that could be accessed through the InfraSearch website.
Opencola
On May 31, 2000, Steelbridge Inc. announced development of OpenCOLA, a collaborative, distributed, open-source search engine. It runs on the user's computer, crawls the web pages and links the user puts in their opencola folder, and shares the resulting index over its P2P network.
Faroo
In February 2001, Wolf Garbe published the idea of a peer-to-peer search engine. He started the Faroo prototype in 2004 and released it in 2005.
Goals
The goals of building a distributed search engine include:
- to create an independent search engine powered by the community;
- to make the search operation open and transparent by relying on open-source software;
- to distribute the advertising revenue to node maintainers, which may help create more robust web infrastructure;
- to allow researchers to contribute to the development of open-source and publicly maintainable ranking algorithms and to oversee the training of the algorithm parameters.
Challenges
- The amount of data to be processed is enormous. The size of the visible web is estimated at 5 PB spread across roughly 10 billion pages.
- The latency of the distributed operation must be competitive with the latency of commercial search engines.
- A mechanism is needed to prevent malicious users from corrupting the distributed data structures or the ranking.
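To put the data-volume estimate above in perspective, the following back-of-the-envelope sketch computes the average page size implied by 5 PB over 10 billion pages, and the storage each peer would need if that data were spread evenly across a network. The peer count and replication factor are hypothetical values chosen only for illustration, and a petabyte is assumed to mean 10^15 bytes.

```python
PB = 10**15  # assumption: decimal petabyte, in bytes

WEB_BYTES = 5 * PB        # estimated size of the visible web (from the text)
WEB_PAGES = 10 * 10**9    # estimated number of pages (from the text)


def average_page_bytes(total_bytes: int, pages: int) -> int:
    """Average size of one page, in bytes."""
    return total_bytes // pages


def bytes_per_peer(total_bytes: int, num_peers: int, replication: int = 1) -> int:
    """Storage per peer if the data is spread evenly.

    `replication` is the number of copies kept of every item.
    """
    return total_bytes * replication // num_peers


# Hypothetical network: 100,000 peers, each item stored on 3 peers.
page_size = average_page_bytes(WEB_BYTES, WEB_PAGES)
per_peer = bytes_per_peer(WEB_BYTES, num_peers=100_000, replication=3)
print(f"average page: {page_size / 1000:.0f} kB")
print(f"per peer:     {per_peer / 10**9:.0f} GB")
```

Under these assumptions the average page is about 500 kB, and each of the 100,000 hypothetical peers would store about 150 GB of raw data before any index overhead or compression.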
References
- (2021-09-01). "Google Adds Presearch As A Default Option on Android Devices in EU".
- Kan, Michael. (2022-05-26). "The Next Google? Decentralized Search Engine 'Presearch' Exits Testing Phase".
::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::