Distributed Proofreaders

Web-based proofreading project


title: "Distributed Proofreaders" type: doc version: 1 created: 2026-02-28 author: "Wikipedia contributors" status: active scope: public tags: ["collaborative-projects", "crowdsourcing", "distributed-computing-projects", "human-based-computation", "internet-properties-established-in-2000", "mass-digitization", "proofreading"] description: "Web-based proofreading project" topic_path: "technology/computing" source: "https://en.wikipedia.org/wiki/Distributed_Proofreaders" license: "CC BY-SA 4.0" wikipedia_page_id: 0 wikipedia_revision_id: 0

::summary Web-based proofreading project ::

::data[format=table title="Infobox website"]

| Field | Value |
|---|---|
| Name | Distributed Proofreaders |
| Logo | Distributed Proofreaders logo.svg (300px) |
| Logo alt text | "Distributed Proofreaders" set in blue serif text, with the second word beginning at the bottom right of the first one. Below the second word, "Preserving History One Page at a Time." in black serif text. |
| Screenshot | Distributed Proofreaders.png |
| Screenshot caption | Screenshot of the proofreading interface on Distributed Proofreaders |
| Commercial | No |
| Type | Not-for-profit |
| Languages | English, French, German (3) |
| Registration | Optional |
| Content license | Public domain |
| Programming language | PHP |
| Country of origin | United States |
| Owner | Distributed Proofreaders Foundation (DPF) |
| Founder | Charles Franks |
| General manager | Linda Hamilton |
| Current status | Active |
| OCLC | 1087497129 |
::


Distributed Proofreaders (commonly abbreviated as DP or PGDP) is a web-based project that supports the development of e-texts for Project Gutenberg by allowing many people to work together in proofreading drafts of e-texts for errors. By December 2025, the site had digitized 50,000 titles.

History

Distributed Proofreaders was founded by Charles Franks in 2000 as an independent site to assist Project Gutenberg. Distributed Proofreaders became an official Project Gutenberg site in 2002.

On 8 November 2002, Distributed Proofreaders was featured on Slashdot ("slashdotted"), and more than 4,000 new members joined in one day. The resulting influx of new proofreaders and software developers helped to increase both the quantity and the quality of e-text production.

In 2006, the Distributed Proofreaders Foundation was formed to provide Distributed Proofreaders with its own legal entity and not-for-profit status, separate from Project Gutenberg. The founding trustees were Charles Franks, Juliet Sutherland, and Gregory B. Newby.

By 2009, DP-contributed e-texts made up more than half of the works in Project Gutenberg. In July 2015, the 30,000th e-text produced by Distributed Proofreaders was posted to Project Gutenberg.

Proofreading process

Because DP's servers are located in the United States, works must be cleared by Project Gutenberg as being in the public domain under United States copyright law before they can be proofread and eventually published.

Public domain works, typically books with expired copyright, are scanned by volunteers or sourced from digitization projects, and the images are run through optical character recognition (OCR) software. Because OCR software is imperfect, the resulting text invariably contains errors. To correct them, pages are made available to volunteers via the Internet, with the original page image and the recognized text displayed side by side.{{cite conference |author1=Gentry, Craig |author2=Ramzan, Zulfikar |author3=Stuart Stubblebine | title=Secure Distributed Human Computation | book-title=Financial cryptography and data security: 9th International Conference | page=329 | date=February 28 – March 3, 2005 | volume=3570 | series=Lecture Notes in Computer Science |editor=Andrew S. Patrick |editor2=Moti Yung |editor2-link=Moti Yung | location=Roseau, The Commonwealth of Dominica | publisher=Springer | isbn=3-540-26656-9 | url=https://books.google.com/books?id=JegO2ly7IccC&pg=PA329 | doi=10.1145/1064009.1064026 }} Each page is presented to multiple volunteers, who enter corrections. The result is a combined dataset with as few errors as possible. This approach spreads the time-consuming work of error correction across many people, much as distributed computing spreads a computation across many machines.

A post-processor combines the pages and prepares the text for uploading to Project Gutenberg.
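The workflow described above (OCR text corrected by volunteers, then joined by a post-processor) can be modeled in a few lines. The following Python sketch is purely illustrative. It is not DP's software, which is written in PHP. Several details are hypothetical assumptions rather than facts from the source: the functions `run_passes` and `combine_pages`, the idea that each volunteer acts as a text-to-text function, the assumption that a page passes through volunteers one after another, and the sample OCR page. The sketch uses the standard library's `difflib.ndiff` to count how many lines each pass changed. This is one simple way to watch errors being worked out of a page.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical model: a proofreading pass takes a page's current text
# and returns the volunteer's corrected version.
ProofreadingPass = Callable[[str], str]


def run_passes(ocr_text: str, passes: List[ProofreadingPass]) -> Tuple[str, List[int]]:
    """Send one page's OCR text through proofreading passes, one after another.

    Returns the final text and, for each pass, the number of lines it changed.
    (Sequential passes are an assumption of this sketch.)
    """
    text = ocr_text
    lines_changed = []
    for proofread in passes:
        corrected = proofread(text)
        # ndiff prefixes each line that appears only in the corrected text with "+ ".
        diff = difflib.ndiff(text.splitlines(), corrected.splitlines())
        lines_changed.append(sum(1 for line in diff if line.startswith("+ ")))
        text = corrected
    return text, lines_changed


def combine_pages(pages: List[str]) -> str:
    """Join proofread pages into one text, roughly as a post-processor might."""
    return "\n\n".join(page.strip() for page in pages)


# Hypothetical sample: OCR output with a "b"-for-"h" misreading.
SAMPLE_OCR_PAGE = "CHAPTER I.\nTbe ship left port at dawn,\nbound for tbe northern seas."


def fix_capital_tbe(text: str) -> str:
    """First (hypothetical) volunteer: corrects the capitalized misreading."""
    return text.replace("Tbe", "The")


def fix_lower_tbe(text: str) -> str:
    """Second (hypothetical) volunteer: corrects the lowercase misreading."""
    return text.replace("tbe", "the")


if __name__ == "__main__":
    final_text, per_pass = run_passes(SAMPLE_OCR_PAGE, [fix_capital_tbe, fix_lower_tbe])
    print(per_pass)  # [1, 1]
    print(combine_pages([final_text, "CHAPTER II.\nThe voyage began."]))
```

Treating each pass as a plain function keeps the sketch independent of any user interface. A real system would capture each volunteer's edits from the side-by-side page view instead.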

Besides custom software created to support the project, DP also runs a forum and a wiki for project coordinators and participants.

Related projects

DP Europe

In January 2004, Distributed Proofreaders Europe was launched, hosted by Project Rastko in Serbia. The site could process text in Unicode (UTF-8) encoding. The books it proofread centered on European culture, with a considerable proportion of non-English texts, including Hebrew, Arabic, Urdu, and many other languages. DP Europe had produced 787 e-texts, the last of these in November 2011.

DP Canada

In December 2007, Distributed Proofreaders Canada launched to support the production of e-books for Project Gutenberg Canada and to take advantage of shorter Canadian copyright terms. Although it was established by members of the original Distributed Proofreaders site, it is a separate entity. All its projects are posted to Faded Page, its book archive website. In addition, it supplies books to Project Gutenberg Canada and, where copyright laws are compatible, to the original Project Gutenberg.

Milestones

The source for many of these entries is the DP Timeline.

::data[format=table]

| Milestone | Date | E-text | Source |
|---|---|---|---|
| First | 1 October 2000 | The Odyssey, Homer, Lang tr. (first pages for proofreading) | |
| 1,000th | 19 February 2003 | Tales of St. Austin's, P. G. Wodehouse | |
| 2,000th | 3 September 2003 | Hamlet: the "Bad Quarto", William Shakespeare | |
| 3,000th | 14 January 2004 | The Anatomy of Melancholy, Robert Burton | |
| 4,000th | 6 April 2004 | Aventures du Capitaine Hatteras, Jules Verne | |
| 5,000th | 24 August 2004 | A Short Biographical Dictionary of English Literature, John William Cousin | |
| 10,000th | 9 March 2007 | (See 10,000th e-book below.) | |
| 15,000th | 12 May 2009 | Philosophical Transactions of the Royal Society, Vol. 1 (1666), various, Henry Oldenburg (editor) | |
| 20,000th | 10 April 2011 | (See 20,000th e-book below.) | |
| 25,000th | 10 April 2013 | The Art and Practice of Silver Printing, H. P. Robinson and Capt. Abney | (10 April 2013). "A Silver Anniversary: 25,000 Titles posted to Project Gutenberg!". Pgdp.net. https://blog.pgdp.net/2013/04/10/a-silver-anniversary-25000-titles-posted/ (accessed 20 October 2025) |
| 30,000th | 7 July 2015 | Graded Literature Readers: Fourth Book, Harry Pratt Judson and Ida C. Bender (editors) | (7 July 2015). "Celebrating 30,000 Titles". Pgdp.net. https://blog.pgdp.net/2015/07/07/celebrating-30000-titles/ (accessed 20 October 2025) |
| 35,000th | 26 January 2018 | Shores of the Polar Sea, a Narrative of the Arctic Expedition of 1875–1876 | (26 January 2018). "Celebrating 35,000 Titles". Pgdp.net. https://blog.pgdp.net/2018/01/26/celebrating-35000-titles/ (accessed 20 October 2025) |
| 40,000th | 10 October 2020 | All four volumes of London Labour and the London Poor, Henry Mayhew | (10 October 2020). "Celebrating 40,000 Titles". Pgdp.net. https://blog.pgdp.net/2020/10/10/celebrating-40000-titles |
| 45,000th | 18 January 2023 | Down the Mackenzie and Up the Yukon in 1906, Elihu Stewart | (18 January 2023). "Celebrating 45,000 Titles". Pgdp.net. https://blog.pgdp.net/2023/01/18/celebrating-45000-titles/ |
| 50,000th | 7 December 2025 | A Dictionary of the Art of Printing, William Savage | |
::

10,000th e-book

On 9 March 2007, Distributed Proofreaders announced the completion of more than 10,000 titles. In celebration, a collection of fifteen titles was published.

20,000th e-book

On 10 April 2011, the 20,000th book milestone was celebrated with a group release of bilingual books:

  • The Renaissance in Italy–Italian Literature, Vol 1, John Addington Symonds (English with Italian)
  • Märchen und Erzählungen für Anfänger; erster Teil, H. A. Guerber (German with English)
  • Gedichte und Sprüche, Walther von der Vogelweide (Middle High German (–1500) with German)
  • Studien und Plaudereien im Vaterland, Sigmon Martin Stern (German with English)
  • Caos del Triperuno, Teofilo Folengo (Italian with Latin)
  • Niederländische Volkslieder, Hoffmann von Fallersleben (German with Dutch)
  • A "San Francisco", Salvatore Di Giacomo (Italian with Neapolitan)
  • O' voto, Salvatore Di Giacomo (Italian with Neapolitan)
  • De Latino sine Flexione & Principio de Permanentia, Giuseppe Peano (1858–1932) (Latin with Latino sine Flexione)
  • Cappiddazzu paga tuttu—Nino Martoglio, Luigi Pirandello (Italian with Sicilian)
  • The International Auxiliary Language Esperanto, George Cox (English with Esperanto)
  • Lusitania: canti popolari portoghesi, Ettore Toci (Italian with French)

30,000th e-book

On 7 July 2015, the 30,000th book milestone was celebrated with a group of thirty texts, one of which was numbered 30,000:

  • Graded Literature Readers: Fourth Book, edited by Harry Pratt Judson and Ida C. Bender (1900)

40,000th e-book

On 10 October 2020, the 40,000th book milestone was celebrated with the completion of a four-volume work, London Labour and the London Poor, by Henry Mayhew.

50,000th e-book

On 7 December 2025, the 50,000th book milestone was celebrated with the posting of A Dictionary of the Art of Printing by William Savage.

References

  1. "Distributed Proofreaders". github.com.
  2. jandac. (2025-12-07). "Celebrating 50,000 Titles | Hot off the Press". Distributed Proofreaders.
  3. Lessig, Lawrence. (2009). "Remix: Making Art and Commerce Thrive in the Hybrid Economy". Penguin.
  4. (11 August 2025). "Project Gutenberg".
  5. "Gutenberg:Volunteers' Voices". Project Gutenberg.
  6. (12 November 2002). "Distributed Proofreading's slashdotting". Boing Boing.
  7. (2025-06-18). "Distributed Proofreaders Foundation History".
  8. Last, John. (October 31, 2025). "Obituary: Project Gutenberg CEO Greg Newby helped put a trove of literature online".
  9. (2003). "2003 Joint Conference on Digital Libraries, 2003. Proceedings". IEEE Comput. Soc.
  10. Piotrowski, Michael. (2022-05-31). "Natural Language Processing for Historical Texts". Springer Nature.
  11. (2009-10-15). "Security Protocols: 14th International Workshop, Cambridge, UK, March 27-29, 2006, Revised Selected Papers". Springer Science & Business Media.
  12. Lebert, Marie. (November 4, 2010). "Distributed Proofreaders, producteur des livres du Projet Gutenberg, a 10 ans". Actualitté.
  13. Lebert, Marie. (November 5, 2010). "Distributed Proofreaders just celebrated its 10th anniversary".
  14. "DP Timeline".
  15. (April 10, 2011). "Distributed Proofreaders celebrates 20,000 books posted". Distributed Proofreaders. http://blog.pgdp.net/2011/04/09/distributed-proofreaders-celebrates-20000-books-posted/ (archived 2011-06-19).
  16. "Distributed Proofreaders • View topic - 30,000 Unique Titles Preserved!". Pgdp.net.
  17. (10 October 2020). "Celebrating 40,000 Titles". Pgdp.net.

::callout[type=info title="Wikipedia Source"] This article was imported from Wikipedia and is available under the Creative Commons Attribution-ShareAlike 4.0 License. Content has been adapted to SurfDoc format. Original contributors can be found on the article history page. ::
