Re: [OT] scanned files are large in size




On Thu 03 Jan 2019 at 14:07:15 (+0100), Siard wrote:
> David Wright wrote:
> > So I can't understand your objection to wrapping a scanned image into
> > a PDF container, which makes a lot of data handling a lot easier than
> > would otherwise be the case.
> 
> After scanning, an image almost always needs editing. Crop, rotate to
> correct a skewed horizon, remove specks, adjust brightness and contrast.

In that case it sounds as if PDF would be the wrong format for you to
save in. I hope your scanner has a more appropriate choice available.

> Gimp can open a pdf, but not in its original resolution, so there is
> loss of quality.  Pdfimages can extract the image first, but its
> original format (tiff? jpg? pnm?) remains unclear then, so there is a
> conversion, again causing loss of quality (AFAIU).

Not knowing your model of scanner, I can make no comment. People
presumably investigate how to obtain the highest quality scan from
whichever they buy, and in a format that is appropriate for them.

Here, if I were working directly on the bits of raw image, I would
choose PDF colour uncompressed 600dpi from which pdfimages yields
PPMs (type P6), which are easy to handle, unlike the lossy JPEG.
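
To illustrate why P6 files are easy to handle, here is a minimal Python
sketch, not anything pdfimages itself provides. It reads the plain-text
header of a binary PPM and works out how many bytes of raw pixel data
should follow. The function names are my own, and the A4 pixel
dimensions in the usage note are an assumption.

```python
import io


def read_ppm_header(stream):
    """Return (width, height, maxval) from a binary PPM (P6) stream.

    The stream is left positioned at the first byte of pixel data.
    """
    def next_token():
        token = b""
        while True:
            ch = stream.read(1)
            if not ch:
                raise ValueError("unexpected end of file in PPM header")
            if ch == b"#":
                # A comment runs to the end of the line; treat it as whitespace.
                stream.readline()
                ch = b"\n"
            if ch.isspace():
                if token:
                    return token
            else:
                token += ch

    if next_token() != b"P6":
        raise ValueError("not a binary PPM (P6) file")
    width, height, maxval = (int(next_token()) for _ in range(3))
    return width, height, maxval


def pixel_data_size(width, height, maxval):
    """Bytes of raw RGB data: 3 samples per pixel, 1 or 2 bytes per sample."""
    bytes_per_sample = 1 if maxval < 256 else 2
    return width * height * 3 * bytes_per_sample


# A tiny two-pixel image held in memory, standing in for a real file.
sample = b"P6\n# tiny test image\n2 1\n255\n" + bytes([255, 0, 0, 0, 0, 255])
stream = io.BytesIO(sample)
width, height, maxval = read_ppm_header(stream)
pixels = stream.read()
```

After the header there is nothing but uncompressed RGB samples, so the
length of `pixels` matches `pixel_data_size(width, height, maxval)`.
The same arithmetic shows why such scans are large: assuming an A4 page
at 600dpi comes to about 4960 x 7016 pixels, `pixel_data_size(4960,
7016, 255)` gives roughly 104 million bytes per page.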

> > Other examples would be postprocessing with programs like pdftk and
> > pdfjam.
> 
> Those programs cannot edit images.

No, they're really for working at the level of pages. But some of the
things they do can be considered as "editing", like scaling, masking,
watermarking, straightening up (though one could be forgiven for
failing to find that option). These are the sorts of things that
commercial office workers might expect to do. (I haven't bothered
to mention collation, 90° rotations, and so on.) Office users are
likely a much bigger target market for all-in-one devices, as well as
for cheaper scanners.

I'm guessing that serious image manipulators buy much more versatile
up-market scanners, just as pro digital photographers expect their
cameras to be able to output raw image data. Some of the things they
do could be considered fraudulent in an office environment!

> > An obvious example was already mentioned: put a document into the
> > ADF, press the button, obtain one file containing the entire
> > document. [...] Would you really send a scanned document to a
> > company/institution as a multitude of image attachments instead of
> > a single PDF?
> 
> That should be the final stage of the process, not the beginning!
> You can use img2pdf to put the images in a pdf container, without
> affecting the image quality.

That would be a disaster for office productivity.

Cheers,
David.