Spam weapon helps preserve books
Many websites use an automated test to tell computers and humans apart when signing up to an account or logging in.
Now a team at Carnegie Mellon University (CMU) is using this test to help decipher words in books that machines cannot read, by letting websites use those words to authenticate log-ins.
The team is digitising old books and manuscripts supplied by the Internet Archive, a non-profit organisation. It uses Optical Character Recognition (OCR) software to examine scanned images of the texts and turn them into digital text files that computers can store and search.
The only reliable way to decode the words that the software cannot read is for a human to examine them individually - a mammoth task, since CMU processes thousands of pages of text every month.
Thanks to the adoption of these tests, known as reCAPTCHAs, by popular websites like Facebook, Twitter and StumbleUpon, the system is helping to decipher about one million words every day for CMU's book archiving project, according to Luis von Ahn.
"There's no danger of us running out of words," says von Ahn. "There's still about 100 million books to be digitised, which at the current rate will take us about 400 years to complete."
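The figures von Ahn quotes can be cross-checked with a little arithmetic. The sketch below works out how many human-deciphered words per book the 400-year estimate would imply. It assumes a 365-day year and that the rate of one million words a day stays constant; the function name is purely illustrative and does not come from the project.

```python
# Figures quoted in the article.
WORDS_PER_DAY = 1_000_000        # words deciphered each day
BOOKS_REMAINING = 100_000_000    # books still to be digitised
YEARS_TO_COMPLETE = 400          # von Ahn's estimate

# Assumption: a 365-day year and a constant daily rate.
DAYS_PER_YEAR = 365


def implied_words_per_book(words_per_day, books, years, days_per_year=DAYS_PER_YEAR):
    """Return the average number of human-deciphered words per book
    implied by the quoted rate, backlog and time estimate."""
    total_words = words_per_day * days_per_year * years
    return total_words / books


if __name__ == "__main__":
    per_book = implied_words_per_book(WORDS_PER_DAY, BOOKS_REMAINING, YEARS_TO_COMPLETE)
    print(f"Implied words needing a human per book: {per_book:.0f}")
```

On these assumptions the quoted numbers imply roughly 1,460 words per book that need a human reader, which fits a system where only the words the software fails on are passed to people.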