Re: [OT] scanned files are large in size




On Fri 04 Jan 2019 at 17:26:07 (+0000), Brian wrote:
> On Wed 02 Jan 2019 at 22:56:22 -0500, kamaraju kusumanchi wrote:
> > On Wed, Jan 2, 2019 at 9:23 PM David Wright <deblis@xxxxxxxxxxxxxxxxx> wrote:
> > > On Wed 02 Jan 2019 at 14:44:14 (+0000), Brian wrote:
> > > >
> > > > I'm intrigued; I hadn't realised that conversion of the scanned image
> > > > for some vendors' devices took place on the device itself. How do you
> > > > know this happens? It is the frontend to SANE (xsane or scanimage, for
> > > > example) which I've always associated with image acquisition and conversion.
> > >
> > > It really is rather easy. You insert a USB stick into the scanner,
> > > press scan, and later observe that a JPEG or PDF file has appeared
> > > on the stick, as appropriate.
> > 
> > Yes, that is precisely what I did. Stick a USB into the scanner and
> > press the scan button.
> 
> My HP Envy 4520 has no such button. There is an option for scanning to
> the computer, but software is required on the computer to do that and
> HPLIP does not provide it.
> 
> Anyway, I managed to persuade the device to give me the PDF it would
> have sent to a USB stick if the facility had existed (the device has
> Apple's AirScan). If it matters, the PDF does not have any Creator or
> Publisher information and doesn't contain any embedded or subset fonts.

It sounds as if that is enough to convince you that the device, not
the computer, is doing the conversion. Any method that prevents the
two from privately passing information to one another outside the
delivered file would serve; a USB stick, or email, is just the most
obvious.

> Scanned at a resolution of 600:
> 
> brian@desktop:~$ pdfimages -list out.pdf
> page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
> --------------------------------------------------------------------------------------------
>    1     0 image    5100  6600  gray    1   8  jpeg   no         1  0   600   600 2090K 6.4%
> 
> ps2pdf reduces the 2090K by about 50% to 1051K.
> 
> A different scanner device and source document, of course, and maybe
> different methods of PDF production, so I wouldn't read too much into
> this.

Proving whether any compression applied is lossless is more difficult,
because pdfimages appears to say nothing about what processing was
carried out in extracting an image from the PDF. I have assumed that
scanning "compressed" means that lossy compression is applied, whereas
scanning "uncompressed" means that lossless compression is applied.
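
As a rough sanity check on the figures quoted above, the arithmetic
behind the pdfimages "ratio" column and the ps2pdf saving can be
reproduced in a few lines of Python. This is only a sketch: the
variable names are mine, and it assumes that the "K" in the size column
means 1024 bytes and that "ratio" is stored size divided by raw
(uncompressed) image size.

```python
# Figures copied from the quoted "pdfimages -list" line and the ps2pdf remark.
width_px, height_px = 5100, 6600        # width and height columns
components, bits_per_component = 1, 8   # comp and bpc columns (8-bit gray)
ppi = 600                               # x-ppi and y-ppi columns
stored_kib = 2090                       # size column; assumes K = 1024 bytes
ps2pdf_kib = 1051                       # size reported after ps2pdf

# Raw size of the bitmap before any compression.
raw_bytes = width_px * height_px * components * bits_per_component // 8
stored_bytes = stored_kib * 1024

# Assumed meaning of the "ratio" column: stored size over raw size.
ratio_pct = 100 * stored_bytes / raw_bytes

# Saving achieved by ps2pdf relative to the scanner's PDF image.
reduction_pct = 100 * (1 - ps2pdf_kib / stored_kib)

# Physical page size implied by the pixel dimensions and resolution.
page_in = (width_px / ppi, height_px / ppi)

print(f"raw image:        {raw_bytes} bytes")
print(f"stored/raw:       {ratio_pct:.1f}%")
print(f"ps2pdf reduction: {reduction_pct:.1f}%")
print(f"page size:        {page_in[0]} x {page_in[1]} in")
```

Under those assumptions the numbers hang together: the stored image
comes out at about 6.4% of the raw bitmap, which matches the ratio
column, and ps2pdf removes a shade under half of it. None of this
shows which processing was applied, only that the reported sizes are
self-consistent.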

Cheers,
David.