

Searching for PDFs and Multimedia with Google

November 1st, 2008 | Posted in web news


Google rarely includes scanned documents in its search results because it cannot determine the nature of their content. According to Google, this will change: the company plans to use advanced optical character recognition (OCR) software to make PDF search a reality.

The OCR technology will convert scanned documents into equivalent text that can be indexed and searched. It will also aid Google's Book Search project, which is scanning the entire book collections of the world’s major libraries at a rate of 3,000 books per day.
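
To make the indexing step concrete, here is a minimal sketch in Python of how text recovered from scanned documents could be made searchable. It is not Google's method, only a toy inverted index; the file names, the sample text, and the helper functions `tokenize`, `build_index`, and `search` are all hypothetical, and the OCR step itself is assumed to have already produced plain text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical OCR output: in practice this text would come from an OCR engine.
ocr_text = {
    "scan_001.pdf": "Annual report on library book collections",
    "scan_002.pdf": "Optical character recognition for scanned books",
}

index = build_index(ocr_text)
print(search(index, "scanned books"))
```

Once the scanned pages exist as text, a query only has to look up each word in the index and intersect the results, which is why the conversion to text is the step that makes such documents searchable at all.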

This step highlights how cumbersome search retrieval currently is. At present, multimedia material can be searched only if it is tagged with text. Ideally, a search query would return text, audio, video, etc., even when no text tag is present.

Adobe made some advances with this type of search when it revealed in July that search engines could use its Flash player to index Flash files.

1 Comment

  1.
    EasyProfitPack // November 4th, 2008 at 6:19 am

    Well that is news!

    I think this is why Google is untouchable!

    Well this makes our search a bit more variant!
