Free web crawlers
80legs
80legs is a web crawling service that allows its users to create and run web crawls through its software as a service platform.
cURL
cURL is a computer software project providing a library and command-line tool for transferring data using various protocols.
DataparkSearch
DataparkSearch is a search engine designed to organize search within a website, group of websites, intranet or local system.
GWget
GWget is a free graphical frontend for Wget.
Heritrix
Heritrix is the Internet Archive’s web crawler, which was specially designed for web archiving.
HTTrack
HTTrack is a free and open source Web crawler and offline browser, developed by Xavier Roche and licensed under the GNU General Public License.
libwww
libwww is a highly modular client-side web API for Unix and Windows, and is also the name of the reference implementation of this API.
Methabot
Methabot is a scriptable web crawler designed for flexibility and speed.
mnoGoSearch
mnoGoSearch is an open source search engine for Unix-like computer systems written in C. It is distributed under the GNU General Public License and designed to organize search within a website,...
Nutch
Nutch is an effort to build an open source web search engine based on Lucene Java for the search and index component.
Wget
GNU Wget (or just Wget, formerly Geturl) is a computer program that retrieves content from web servers, and is part of the GNU Project.
YaCy
YaCy (read "ya see") is a free distributed search engine, built on principles of peer-to-peer (P2P) networks.