Google Releases Tesseract OCR Open Source Software

SearchEngineWatch announces “Google Opens Tesseract OCR Software”, which is exciting news for those of us who scan or want to convert a lot of documents to text:

The Google Code Blog announced that Google has “re-released” the Tesseract OCR software to the open source community. OCR, optical character recognition, is the technology for converting text on a physical paper into computer based text. So if you have a ton of papers you typed up in your college days and you want them stored in digital format, you can use OCR to translate those documents for you.

OCR (Optical Character Recognition) converts image scans of documents into text. Bitmap, TIFF, and other scanned image files can be imported into the program, and the software crawls through the images to detect recognizable letters of the alphabet.
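For the curious, here is a rough sketch of what converting one scanned page might look like when driven from Python. It assumes the `tesseract` command-line program is installed and on your PATH, and that it is called in its basic form, `tesseract <image> <output base>`, which writes the recognized text to `<output base>.txt`. The function names and the file name `scan.tif` are my own placeholders, not anything from the Tesseract project.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(image_path, output_base):
    """Return the argument list for a basic Tesseract run (image in, text file out)."""
    return ["tesseract", str(image_path), str(output_base)]


def ocr_image(image_path):
    """OCR one scanned image and return its text, or None if Tesseract is not installed."""
    if shutil.which("tesseract") is None:
        return None  # assumption: the executable is named "tesseract"
    image = Path(image_path)
    output_base = image.with_suffix("")  # hypothetical example: scan.tif -> scan
    subprocess.run(build_command(image, output_base), check=True)
    # Tesseract appends ".txt" to the output base name.
    return Path(str(output_base) + ".txt").read_text(encoding="utf-8")
```

Calling `ocr_image("scan.tif")` would then hand back whatever text the program recognized on that page, ready for proofreading.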

There are limitations to OCR programs, but their ability to produce nearly accurate results is amazing. I used Nuance’s OmniPage to convert hundreds and hundreds of pages that some of my relatives typed on an old manual typewriter about their life stories, and though it tried to make letters out of ink marks and the occasional coffee stain, the results were quite accurate. It even left the misspelled words misspelled, allowing me to choose whether to keep their phonetic attempts at spelling or correct them.

SourceForge.net has the download site for Tesseract OCR, and I’ll be installing it soon and putting it through its paces. I’ll report on how it does. If you have used it or are familiar with OCR programs, I’d love your input and experiences.

About Lorelle VanFossen

Lorelle VanFossen hosts the Family History Blog, covering her ancestors and related family members. She is one of the top bloggers in the world and host of Lorelle on WordPress, providing WordPress and blogging tips for bloggers of all levels. A popular keynote speaker and trainer, she is also editor, producer, contributor, and official disruptive thinker for Bitwire Media, which includes WordCast, Making My Life Network, Stories of Our Journeys, Life on the Road, WordCast Conversations, and the very popular WordCast Podcast.