
Re: [OT] scanned files are large in size




On Tue, Jan 01, 2019 at 12:34:38PM -0500, kamaraju kusumanchi wrote:
> A scanned document from Canon pixma mx870 printer is significantly
> larger compared to the same document scanned on a different scanner.
> When I look at both the images side by side on a PC, there is no
> visual difference between the two. I am trying to understand the
> underlying cause and fix it if possible.
> 
> As shown below, scanned_in_office.pdf is 332 KB, scanned_on_mx870.pdf is 1.7 MB.
> 
> % ls -al scanned_in_office.pdf scanned_on_mx870.pdf
> -rw-r--r-- 1 rajulocal rajulocal  331796 Jan  1 11:54 scanned_in_office.pdf
> -rw-r--r-- 1 rajulocal rajulocal 1775460 Jan  1 11:48 scanned_on_mx870.pdf
> 
> Both are scanned at 600 dpi. The only difference I see is in the bpc
> and enc fields.

Yep. One image is encoded as CCITT (aka Group 4, aka fax [1]), a bilevel
(1 bpc) format that is passable for low-res B&W images, but not so much for
hi-res or color (or grayscale) ones. It compresses much worse than the
other, which is JPEG, a format expressly made for hi-res and color (or
grayscale) images.

OTOH, CCITT is lossless and JPEG lossy ;-)

> Questions:
> 1) Does the large file size have anything to do with the printer
> itself? Is there anything I can do (e.g. update the driver/firmware or
> something)?

That depends on what is encoding the images: does the scanner itself
"make" the PDF? Or some software, computer-side?
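
One rough way to find out is to look at the /Producer and /Creator entries
of each PDF's Info dictionary, which usually name the firmware or program
that wrote the file. A minimal Python sketch, assuming those entries are
stored as plain literal strings in an unencrypted file (the function name
pdf_origin is made up for illustration):

```python
import re

# Heuristic only: matches literal-string values such as "/Producer (Some Tool)".
# Hex strings, escaped or nested parentheses, and Info dictionaries stored
# inside compressed object streams are NOT handled.
_INFO_RE = re.compile(rb"/(Producer|Creator)\s*\(([^()]*)\)")


def pdf_origin(data: bytes) -> dict:
    """Return the /Producer and /Creator strings found in raw PDF bytes."""
    return {
        key.decode("ascii"): value.decode("latin-1")
        for key, value in _INFO_RE.findall(data)
    }


# Usage (file name taken from the original post):
#   with open("scanned_on_mx870.pdf", "rb") as fh:
#       print(pdf_origin(fh.read()))
```

An empty result only means the heuristic found nothing, not that the file
carries no such metadata.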

> 2) Is the difference in image sizes due to the bpc (1 vs. 8) or
> encoding (ccitt vs jpeg) fields?

CCITT vs JPEG, yes.
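
To put the bpc part of the question in numbers, here is a rough calculation
of the uncompressed size of one page at 600 dpi. It assumes a US Letter page
(8.5 x 11 in) and a single image channel, neither of which is stated in this
thread:

```python
# Assumptions (not from the thread): US Letter page, one image channel.
DPI = 600
WIDTH_IN, HEIGHT_IN = 8.5, 11.0


def raw_bytes(bpc: int) -> int:
    """Uncompressed size in bytes of one page, rows padded to whole bytes."""
    width_px = round(WIDTH_IN * DPI)
    height_px = round(HEIGHT_IN * DPI)
    row_bytes = (width_px * bpc + 7) // 8  # each row ends on a byte boundary
    return row_bytes * height_px


for bpc in (1, 8):
    print(f"{bpc} bpc: {raw_bytes(bpc):,} bytes uncompressed")

# File sizes in bytes, copied from the ls output in the original post.
print(f"file size ratio: {1775460 / 331796:.2f}")
```

Under these assumptions both files are well below even the 1 bpc raw figure,
so in either case most of the saving comes from the encoder.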

> 3) If yes, how to change them?

Hmmm. I don't know yet whether you have to talk to your scanner
or to your scan software...

Cheers

[1] https://en.wikipedia.org/wiki/Group_4_compression
-- tomás
