Getting Spammed? Help Scan a Book!

Humans are apparently much better than OCR scanners at decoding words, so Carnegie Mellon University is putting the unreadable words online for the world to decipher, all in the interest of enhancing their digitizing efforts for the Internet Archive.

They’ve set up ReCAPTCHA, a free CAPTCHA service that lets webmasters add spam-defeating forms to their websites. What’s the connection? Well, you’ve seen those small forms that force you to type in a word before you can submit? On a ReCAPTCHA form, the CAPTCHA image contains a second word: one that an OCR scanner couldn’t read well enough to decipher while scanning a book for the Archive.

If a website user decodes the first word successfully, the system assumes that they also decoded the second word correctly, and that reading becomes a candidate for being marked as deciphered. The system then sends the second word out in a second tier of CAPTCHAs, and if every one of those comes back with the same reading, the word is considered decoded and sent back to the database.
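For the curious, the two-word check described above can be sketched in a few lines of Python. This is only an illustration of the idea, not ReCAPTCHA's actual code: the class name `UnknownWordPool`, the `submit` method, the number of required readings, and the choice to throw away conflicting readings and start over are all assumptions made for the example.

```python
from collections import defaultdict

# Assumed value: the post does not say how many matching readings are needed.
REQUIRED_READINGS = 3


class UnknownWordPool:
    """Hypothetical tally of human readings for words the OCR could not read."""

    def __init__(self, required=REQUIRED_READINGS):
        self.required = required
        self.readings = defaultdict(list)  # word_id -> readings collected so far
        self.decoded = {}                  # word_id -> agreed reading

    def submit(self, control_answer, control_guess, word_id, unknown_guess):
        """Return True if the user passes the CAPTCHA, False otherwise."""
        # The known (control) word decides whether the user is a human.
        if control_guess.strip().lower() != control_answer.lower():
            return False  # failed the known word, so the second reading is ignored

        if word_id in self.decoded:
            return True  # this word is already settled

        # The user got the known word right, so their second reading is trusted
        # as a candidate.
        votes = self.readings[word_id]
        votes.append(unknown_guess.strip().lower())

        if len(votes) >= self.required:
            if len(set(votes)) == 1:
                # Every reading agrees: treat the word as decoded.
                self.decoded[word_id] = votes[0]
            # Assumption: on disagreement, discard the readings and start over.
            del self.readings[word_id]
        return True
```

With `required=2`, two users who both pass the known word and both type "books" for the mystery word would leave `decoded` holding that reading, while a user who fails the known word contributes nothing.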

Their tagline? STOP SPAM. READ BOOKS.

Student club at the iSchool@UBC


Alex Garnett
