File size is usually expressed in bytes, kilobytes (kb), or megabytes (mb). A byte generally represents a single character, digit, or symbol (including a space) of data. Each byte is composed of 8 bits. A bit is represented as "0" (off) or "1" (on) and is the simplest unit a computer operates on. This is referred to as a "binary" system of representation; it sometimes helps to think of a bit as an on/off switch or a true/false value. Thus a byte might be represented as 00000000 or 11111111 or any other 8-digit combination of zeroes and ones, which gives 256 possible values. In the ASCII character encoding, the letter "a" is 01100001.
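
The byte-to-bits relationship described above can be checked with a short Python sketch. The helper names `char_to_bits` and `bits_to_char` are illustrative only, and the sketch assumes single-byte (ASCII-range) characters.

```python
def char_to_bits(ch: str) -> str:
    """Return the 8-digit binary string for a character that fits in one byte."""
    code = ord(ch)  # numeric code of the character, e.g. 97 for "a"
    if code > 255:
        raise ValueError("character does not fit in a single byte")
    return format(code, "08b")  # zero-padded to 8 binary digits


def bits_to_char(bits: str) -> str:
    """Convert an 8-digit binary string back into its character."""
    return chr(int(bits, 2))


print(char_to_bits("a"))         # 01100001
print(bits_to_char("01100001"))  # a
print(2 ** 8)                    # 256 distinct values fit in one byte
```

Each of the 8 positions can be 0 or 1, which is why a single byte can hold 2^8 = 256 different values.
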
There are 1,024 bytes in a kilobyte and 1,024 kilobytes in a megabyte. Thus a 1 kb document contains 1,024 bytes of data: the characters of text plus other information that describes the document's formatting and other characteristics, so that it can be opened and used by a software application such as Adobe Acrobat or Microsoft Word.
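
As a minimal sketch of this conversion, the following Python functions divide by 1,024 at each step. The function names are hypothetical; the byte counts used in the example are taken from the comparison table later in this document.

```python
BYTES_PER_KB = 1024  # bytes in one kilobyte, as defined above
KB_PER_MB = 1024     # kilobytes in one megabyte


def bytes_to_kb(num_bytes: int) -> float:
    """Convert a size in bytes to kilobytes."""
    return num_bytes / BYTES_PER_KB


def bytes_to_mb(num_bytes: int) -> float:
    """Convert a size in bytes to megabytes."""
    return num_bytes / (BYTES_PER_KB * KB_PER_MB)


print(f"{bytes_to_kb(7076):.3f}")     # 6.910  (PDF converted from a txt file)
print(f"{bytes_to_kb(24064):.3f}")    # 23.500 (Word file, single letter "a")
print(f"{bytes_to_mb(1048576):.3f}")  # 1.000  (1,024 x 1,024 bytes)
```
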
Images are represented on your screen as pixels (dots of color), but they can be stored in different formats requiring widely different file sizes. Each image format uses a certain number of bits per pixel to define the color of each pixel on the screen. Black and white images require less space than grayscale or color images because fewer bits are needed to uniquely describe each color: one bit per pixel is enough for black and white, while 8-bit grayscale or color uses a full byte per pixel. Images can be expressed in many formats, and some large file formats, such as TIFF, are "lossless": every pixel (dot of color on your screen) gets its own set of bits to describe it. Colors and other factors being equal, a 100x100 pixel image (total = 10,000 pixels) requires about 100 times the amount of space to store as a 10x10 pixel image (total = 100 pixels). By comparison, a single character of text that occupies a 10x10 pixel space on your screen usually requires only one byte to represent it.
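
The pixel arithmetic can be checked with a short sketch. It estimates only the raw pixel data and assumes no compression, no file headers, and no row padding, so real files will differ from these figures. The function names are illustrative.

```python
def pixel_count(width: int, height: int) -> int:
    """Total number of pixels in an image."""
    return width * height


def raw_image_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Bytes needed for the pixel data alone, rounded up to a whole byte.

    Assumption: uncompressed data with no headers or row padding.
    """
    total_bits = pixel_count(width, height) * bits_per_pixel
    return (total_bits + 7) // 8  # integer ceiling of total_bits / 8


small = pixel_count(10, 10)    # 100 pixels
large = pixel_count(100, 100)  # 10,000 pixels
print(large // small)          # 100 times as many pixels

print(raw_image_bytes(10, 10, 1))    # 13    bytes: black and white, 1 bit per pixel
print(raw_image_bytes(10, 10, 8))    # 100   bytes: 8-bit grayscale or color
print(raw_image_bytes(100, 100, 8))  # 10000 bytes: 8-bit, 100x100
```

Multiplying both dimensions by 10 multiplies the pixel count, and therefore the raw storage, by 100.
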
If you place scanned TIFF images inside a PDF file, you will find that the amount of space required for the new PDF file exceeds the amount of space occupied by the TIFF images alone. This is because information is embedded in the PDF file to describe how to view and interpret the TIFF images inside a PDF viewer, as well as information (metadata) to describe the file itself.
You will find that a full page of electronic text is significantly smaller than a scanned TIFF image of that same text when it is added into a PDF file.
Type of File | Bytes | Kilobytes (kb) | Comments |
---|---|---|---|
TXT (Notepad text) file | 1 | .001 | This is the simplest, smallest file for storing text. No formatting can be preserved other than fixed spacing. |
PDF file (converted from Notepad txt file) | 7,076 | 6.910 | The difference in size over the txt file above represents what is added to make it a PDF file. |
Microsoft Word file, single letter "a" | 24,064 | 23.500 | MS Word documents have a lot more formatting and other information embedded into the file than a simple txt file does. Much of this added code would transfer with the file into a PDF conversion. |
TIFF (lossless format) 8 bit 10x10 pixels of letter "a" (same size on screen as original text) | 1,790 | 1.748 | This is the format most scanners generate by default. Because it is "lossless" it can preserve the full size and resolution of the scanned original in each pixel. |
PDF file containing 8 bit TIFF 10x10 of letter "a" (same size on screen as original text) | 10,724 | 10.473 | The difference in size over the TIFF file above represents what is added to make it a PDF file. |
TIFF (lossless format) black and white 10x10 pixels of letter "a" (same size on screen as original text) | 270 | .264 | By stripping out colors (reducing colorspace) you can generate a smaller TIFF, but you need to remember to tell Adobe to treat it as a black and white image in the PDF to benefit from a reduced file size. |
GIF (a "lossy" format) 10x10 of letter "a" (same size on screen as original text) | 58 | .057 | A format that reduces the number of colors used to no more than 256 and compresses the file by using a form of coded "shorthand" to indicate blocks of similar, adjacent pixels instead of storing information for each pixel individually. A visually complicated GIF may be no smaller than an equivalent TIFF. |
TIFF (lossless format) 100x100 pixels 8 bit color | 2,902 | 2.834 | This image is 10X larger in each dimension than the ones above (100 times as many pixels), for comparison purposes. |
TIFF (lossless format) 100x100 pixels grayscale | 3,470 | 3.389 | This image is 10X larger in each dimension than the ones above (100 times as many pixels), for comparison purposes. |
GIF (a "lossy" format) 100x100 pixels of letter "a" | 754 | .736 | This image is 10X larger in each dimension than the ones above (100 times as many pixels), for comparison purposes. |
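
The overhead that a PDF wrapper adds can be worked out from the byte counts in the table. The sketch below is only arithmetic on those figures; the dictionary keys and the `overhead` helper are hypothetical names chosen for illustration.

```python
# Byte counts copied from the table above.
sizes = {
    "txt": 1,
    "pdf_from_txt": 7076,
    "tiff_8bit_10x10": 1790,
    "pdf_with_tiff_8bit": 10724,
}


def overhead(container_bytes: int, content_bytes: int) -> int:
    """Bytes added by wrapping the content in a container file."""
    return container_bytes - content_bytes


# Extra bytes added when each item is stored inside a PDF.
print(overhead(sizes["pdf_from_txt"], sizes["txt"]))                    # 7075
print(overhead(sizes["pdf_with_tiff_8bit"], sizes["tiff_8bit_10x10"]))  # 8934

# The same letter "a" stored as an 8-bit TIFF versus plain text.
print(sizes["tiff_8bit_10x10"] // sizes["txt"])                         # 1790
```

In both cases the PDF structure itself accounts for several thousand bytes, regardless of how small the content is.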