
Searching non-plain-text files


Can anyone point me to instructions or advice on
opening and reading files that are not plain text:

word processing docs, PDF, PS, image files,
even compiled code.

I am writing a search function to search file systems
and don't know much about the formatting of
non-plain-text files.

The immediate concern is line breaks in word
processing docs, PDF and PS files.

Then detecting compiled code files so I can
leave them alone. This type of file might not
have a suffix to consider.
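
A minimal sketch of one possible approach, assuming it is acceptable
to read the first few kilobytes of each file: compare the leading
bytes against a small table of well-known "magic numbers" and treat
anything containing a NUL byte as binary. The names sniff_file and
MAGIC_SIGNATURES are hypothetical, and the table is only an
illustrative subset, not a complete list of formats.

```python
# Hypothetical table: leading bytes of a few common formats.
# This is a small illustrative subset, not an exhaustive list.
MAGIC_SIGNATURES = {
    b"\x7fELF": "ELF executable or object",
    b"MZ": "DOS/Windows executable",
    b"\xca\xfe\xba\xbe": "Java class file or Mach-O universal binary",
    b"%PDF-": "PDF document",
    b"%!PS": "PostScript document",
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
}


def sniff_file(path, sample_size=4096):
    """Guess a file's type from its content, ignoring its suffix."""
    # Binary mode avoids any decoding errors on non-text data.
    with open(path, "rb") as f:
        head = f.read(sample_size)

    # First try the known signatures at the very start of the file.
    for magic, label in MAGIC_SIGNATURES.items():
        if head.startswith(magic):
            return label

    # Heuristic assumption: plain text rarely contains NUL bytes,
    # while compiled code almost always does.
    if b"\x00" in head:
        return "unknown binary"
    return "probably text"
```

A search routine could call sniff_file(path) before opening a file
and skip anything reported as an executable or "unknown binary".
The NUL-byte test is only a heuristic: UTF-16 text, for example,
would be misclassified as binary.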

Then the various image files that might be

Even suffixes aren't a guarantee of the content.


Jeff K.