Adobe’s Portable Document Format (PDF) is a universal file format for representing documents. PDF documents can be viewed or printed independently of the application software, hardware, and operating system used to create the original file. Originally developed by Adobe and now standardized by ISO (as ISO 32000), PDF is one of the most widely used document formats and is commonly used for downloadable and printable documents and forms on the Web.
At the top level, a PDF file has a relatively straightforward structure made of four parts. The Header identifies the file as a PDF and states the PDF version it conforms to. The Body holds the objects that make up the document, such as text, fonts, images, and other media. The Cross-Reference Table is an index of pointers used to locate the individual objects within the Body. The Trailer tells a reader where to find the Cross-Reference Table and key objects such as the document catalog. Metadata such as the author name and creation date is not stored in the Header; it lives in objects in the Body, such as the document information dictionary that the Trailer’s /Info entry points to.
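To see how the four parts fit together, the following Python sketch assembles a tiny one-page PDF in memory. The function name build_minimal_pdf and the object contents (a bare catalog, a page tree, and one empty page) are illustrative assumptions; real PDF writers also emit content streams, fonts, and metadata.

```python
def build_minimal_pdf() -> bytes:
    """Assemble a tiny one-page PDF in memory: Header, Body, xref table, Trailer."""
    # Header: version line plus a comment of high-byte characters marking the file as binary.
    out = bytearray(b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n")

    # Body: three indirect objects (catalog, page tree, one empty page).
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>",
    ]
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset where "num 0 obj" begins
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"

    # Cross-reference table: entry 0 heads the free list, then one 20-byte entry per object.
    xref_offset = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f\r\n"
    for off in offsets:
        out += b"%010d 00000 n\r\n" % off

    # Trailer: entry count, catalog reference, and the byte offset of the xref table.
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


pdf = build_minimal_pdf()
```

The offsets recorded while writing the Body are exactly what the xref table stores, and the number after startxref points at the xref keyword, which is how a reader finds its way through the file.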

HEADER SECTION
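The “Header” of a PDF file consists of one or two lines. The first begins with %PDF- followed by the version number (for example, %PDF-1.3). The optional second line is a comment containing bytes with values of 128 or higher, which signals to file-transfer tools that the file contains binary data. To inspect the header, you can use either a hex editor or the xxd command: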
$ xxd temp.pdf | head -n 1
0000000: 2550 4446 2d31 2e33 0a25 c4e5 f2e5 eba7  %PDF-1.3.%......
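A PDF reader performs essentially this check on the first bytes of a file. The Python sketch below (read_pdf_header is a hypothetical helper, not a library API) extracts the version and tests for the optional binary-marker comment. For simplicity it assumes the header starts at byte 0 and only inspects the four bytes right after the second line’s %.

```python
import re

# "%PDF-" followed by a version such as 1.3 or 2.0, then an end-of-line marker.
HEADER_RE = re.compile(rb"%PDF-(\d\.\d)(?:\r\n|\r|\n)")


def read_pdf_header(data: bytes) -> tuple[str, bool]:
    """Return (version, has_binary_marker) for the first bytes of a PDF file."""
    match = HEADER_RE.match(data)
    if match is None:
        raise ValueError("not a PDF: missing %PDF-x.y header line")
    version = match.group(1).decode("ascii")

    # Optional second header line: a comment whose next four bytes are all >= 128.
    rest = data[match.end():]
    has_binary_marker = (
        len(rest) >= 5 and rest[:1] == b"%" and all(b >= 128 for b in rest[1:5])
    )
    return version, has_binary_marker


# The 16 bytes shown in the xxd dump above.
first_line = bytes.fromhex("255044462d312e330a25c4e5f2e5eba7")
print(read_pdf_header(first_line))  # ('1.3', True)
```

To check a real file, pass in the first few bytes read in binary mode, for example open("temp.pdf", "rb").read(1024).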
BODY SECTION
The “body” section of a PDF file stores its contents as a series of indirect objects. Fonts, pages, and sampled images are all examples of document parts that are represented by objects.
Since PDF version 1.5, the Body can also contain object streams: compressed streams that each hold their own sequence of indirect objects.
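To illustrate the “N G obj ... endobj” syntax of an indirect object (object number, generation number, contents), here is a deliberately naive Python sketch that scans an uncompressed Body with a regular expression. The helper name list_indirect_objects is hypothetical. It cannot see objects packed inside object streams, and the word endobj appearing inside binary stream data could confuse it, which is why real parsers locate objects through the cross-reference table instead.

```python
import re

# "N G obj ... endobj": object number, generation number, then the object's contents.
INDIRECT_OBJ_RE = re.compile(rb"\b(\d+)\s+(\d+)\s+obj\b(.*?)\bendobj", re.DOTALL)


def list_indirect_objects(data: bytes) -> list[tuple[int, int, bytes]]:
    """Naively list (object number, generation, contents) for objects stored directly in the Body."""
    return [
        (int(m.group(1)), int(m.group(2)), m.group(3).strip())
        for m in INDIRECT_OBJ_RE.finditer(data)
    ]


body = (
    b"%PDF-1.4\n"
    b"1 0 obj\n<< /Type /Catalog /Pages 2 0 R >>\nendobj\n"
    b"2 0 obj\n<< /Type /Pages /Kids [] /Count 0 >>\nendobj\n"
)
for num, gen, contents in list_indirect_objects(body):
    print(num, gen, contents)
```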
CROSS-REFERENCE TABLE SECTION
The Cross-Reference Table (xref) is critical to every PDF file. Readers use it as an index to find all of the file’s indirect objects. A single PDF file can contain multiple xref tables as a result of incremental saves or linearization.
Because it records where each object is stored, the table enables random access to indirect objects, so a reader does not need to scan the entire file to find a specific one. The table consists of one fixed-length, single-line entry (exactly 20 bytes) per object, giving that object’s byte offset within the file.
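Because every entry has the same length, a reader can jump directly to the entry for object n at entries_start + 20 * (n - first), where first is the first object number in the subsection, and then seek straight to the object. The Python sketch below demonstrates this on a small in-memory file. lookup_offset and read_object are hypothetical helpers, and the sketch assumes a single xref subsection whose entries start at a known byte position.

```python
import io

ENTRY_LEN = 20  # every xref entry is exactly 20 bytes, including its 2-byte end-of-line


def lookup_offset(f, entries_start: int, first_obj: int, obj_num: int) -> int:
    """Jump straight to obj_num's xref entry (no scanning) and return the object's byte offset."""
    f.seek(entries_start + (obj_num - first_obj) * ENTRY_LEN)
    offset, _generation, kind = f.read(ENTRY_LEN).split()
    if kind != b"n":
        raise KeyError(f"object {obj_num} is free")
    return int(offset)


def read_object(f, offset: int, chunk_size: int = 64) -> bytes:
    """Seek to an object's offset and read forward only until its 'endobj' keyword."""
    f.seek(offset)
    buf = b""
    while b"endobj" not in buf:
        chunk = f.read(chunk_size)
        if not chunk:
            raise ValueError("unterminated object")
        buf += chunk
    return buf[: buf.index(b"endobj") + len(b"endobj")]


# Build a small file in memory: two objects, then a single-subsection xref table.
body = b"%PDF-1.4\n1 0 obj\n(Hello)\nendobj\n2 0 obj\n(World)\nendobj\n"
entries = b"0000000000 65535 f\r\n" + b"".join(
    b"%010d 00000 n\r\n" % body.index(b"%d 0 obj" % n) for n in (1, 2)
)
xref_header = b"xref\n0 3\n"
f = io.BytesIO(body + xref_header + entries)
entries_start = len(body) + len(xref_header)

print(read_object(f, lookup_offset(f, entries_start, 0, 2)))
```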
Here is the kind of output you should see if you open a PDF file in a text editor and search for ‘xref’ (only the first three entries are shown):
xref
0 271
0000000000 65535 f
0000000015 00000 n
0000000102 00000 n
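The line directly after the xref keyword is a subsection header. Its first number is the object number of the first entry in the subsection, and its second is the number of entries that follow. In this example, the subsection starts at object 0 and contains 271 entries.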
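Each following line is one entry, made up of:
- a 10-digit byte offset of the object from the beginning of the file
- a 5-digit generation number (an object may have multiple revisions)
- a keyword showing whether the object is in use (‘n’) or free (‘f’).
For free entries, the first field holds the object number of the next free object instead of an offset. Entry 0 is always free, heads this list of free objects, and has generation number 65535.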
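The entry layout above can be read with a few lines of Python. This sketch (parse_xref_section is a hypothetical name) handles only the classic plain-text xref table, not the compressed cross-reference streams introduced in PDF 1.5. The sample reuses the three entries shown above, with the subsection count shortened to 3 so that it matches them.

```python
def parse_xref_section(text: str) -> dict[int, tuple[int, int, bool]]:
    """Parse a classic 'xref' section into {object number: (field1, generation, in_use)}.

    For in-use ('n') entries field1 is the byte offset; for free ('f') entries
    it is the object number of the next free object.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "xref":
        raise ValueError("section must start with the 'xref' keyword")
    entries = {}
    i = 1
    # A section may hold several subsections, each introduced by "first count".
    while i < len(lines) and not lines[i].startswith("trailer"):
        first, count = map(int, lines[i].split())
        for k in range(count):
            field1, generation, kind = lines[i + 1 + k].split()
            entries[first + k] = (int(field1), int(generation), kind == "n")
        i += 1 + count
    return entries


sample = """xref
0 3
0000000000 65535 f
0000000015 00000 n
0000000102 00000 n
"""
print(parse_xref_section(sample))
```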
When objects in a PDF are modified and the file is saved incrementally, a new xref table may be appended to reflect those changes. As a result, multiple xref tables can coexist in a single PDF file, and for any object listed in more than one of them, the entry in the most recent table takes precedence.
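The “most recent wins” rule can be sketched as follows. Each trailer written by an incremental save carries a /Prev entry giving the byte offset of the previous xref section, so a reader starts at the newest section and walks backward, keeping the first entry it sees for each object. In this Python sketch, resolve_xref_chain is a hypothetical helper, the sections dictionary stands in for tables already parsed from a file, and the offsets 4000 and 6000 are made-up values.

```python
def resolve_xref_chain(sections, start):
    """Merge xref sections by following /Prev links from the newest one.

    sections maps an xref section's byte offset to (entries, prev_offset), where
    prev_offset is None for the original table. Newer sections are visited first,
    so their entries win over older entries for the same object number.
    """
    merged = {}
    seen = set()
    offset = start
    while offset is not None:
        if offset in seen:
            raise ValueError("cycle in /Prev chain")
        seen.add(offset)
        entries, prev_offset = sections[offset]
        for obj_num, entry in entries.items():
            merged.setdefault(obj_num, entry)  # keep the newer entry if one was already seen
        offset = prev_offset
    return merged


# Entries are (offset, generation, in_use); object 2 was rewritten and object 3 added.
sections = {
    4000: ({0: (0, 65535, False), 1: (15, 0, True), 2: (102, 0, True)}, None),
    6000: ({2: (5000, 0, True), 3: (5100, 0, True)}, 4000),
}
print(resolve_xref_chain(sections, start=6000))
```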
Taken together, these tables provide an entry for every object in the document, which is what allows fast random access to any object in the file.
TRAILER SECTION
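Finally, the “Trailer” section tells a reader where the Cross-Reference Table begins and where to find certain special objects. It consists of the trailer keyword and the trailer dictionary, followed by the startxref keyword, the byte offset of the last xref section, and the %%EOF end-of-file marker. A conforming reader therefore starts reading a PDF file from its end, so that it can quickly locate the cross-reference table and special objects such as the document catalog.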
Example:
trailer
<< /Size 22 /Root 2 0 R /Info 1 0 R >>
startxref
24212
%%EOF
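In this example, /Size 22 is the total number of entries in the cross-reference table, /Root 2 0 R refers to the document catalog (object 2, generation 0), /Info 1 0 R refers to the document information dictionary, and 24212 is the byte offset of the last xref section. The Python sketch below reads a file from its end in the way just described. read_trailer is a hypothetical helper; it assumes the trailer lies within the last 1024 bytes and captures only integer and indirect-reference values, so entries such as /ID arrays are skipped.

```python
import re


def read_trailer(data: bytes, tail_size: int = 1024) -> tuple[int, dict[str, str]]:
    """Read a file from its end: return (startxref offset, simple trailer entries)."""
    tail = data[-tail_size:]
    pos = tail.rfind(b"startxref")
    if pos == -1:
        raise ValueError("startxref keyword not found near end of file")
    xref_offset = int(tail[pos + len(b"startxref"):].split()[0])

    entries = {}
    # Files that use cross-reference streams (PDF 1.5+) have no 'trailer' keyword.
    tpos = tail.rfind(b"trailer", 0, pos)
    if tpos != -1:
        # Only integer values and indirect references ("2 0 R") are captured here.
        for key, value in re.findall(rb"/(\w+)\s+(\d+\s+\d+\s+R|\d+)", tail[tpos:pos]):
            entries[key.decode("ascii")] = value.decode("ascii")
    return xref_offset, entries


example = b"""trailer
<< /Size 22 /Root 2 0 R /Info 1 0 R >>
startxref
24212
%%EOF
"""
print(read_trailer(example))
```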
INCREMENTAL SAVES
PDF supports incremental saves (also called incremental updates): changes are appended to the end of the file, so the original data is left unmodified.
After an incremental save, the file still contains the original Header, Body, Cross-Reference Table, and Trailer. Each update appends a new Body section holding the added or changed objects, a new xref section describing which objects were added, modified, or deleted, and a new Trailer whose /Prev entry points back to the previous xref section.
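An incremental update can be sketched in Python as below. append_incremental_update is a hypothetical helper that replaces one object by appending a new copy of it, a one-entry xref section, and a new trailer with /Prev. It assumes a classic xref table with its trailer within the last 1024 bytes, and that the replaced object has generation number 0.

```python
import re


def append_incremental_update(data: bytes, obj_num: int, new_contents: bytes) -> bytes:
    """Append a new version of obj_num without modifying any of the existing bytes."""
    tail = data[-1024:]
    pos = tail.rfind(b"startxref")
    prev_xref = int(tail[pos + len(b"startxref"):].split()[0])
    # Reuse the newest trailer's /Size and /Root values (last match in the tail).
    size = int(re.findall(rb"/Size\s+(\d+)", tail)[-1])
    root = re.findall(rb"/Root\s+(\d+\s+\d+\s+R)", tail)[-1]

    out = bytearray(data)
    if not out.endswith(b"\n"):
        out += b"\n"

    # New Body section: the replacement keeps its object number and (assumed) generation 0.
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + new_contents + b"\nendobj\n"

    # New xref section with a single one-entry subsection for the changed object.
    xref_offset = len(out)
    out += b"xref\n%d 1\n%010d 00000 n\r\n" % (obj_num, obj_offset)

    # New trailer: /Prev links back to the previous xref section.
    out += b"trailer\n<< /Size %d /Root %s /Prev %d >>\n" % (
        max(size, obj_num + 1), root, prev_xref)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


body = b"%PDF-1.4\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
base = body + (
    b"xref\n0 2\n0000000000 65535 f\r\n0000000009 00000 n\r\n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n%d\n%%%%EOF\n" % len(body)
)
updated = append_incremental_update(base, 1, b"<< /Type /Catalog /Lang (en-US) >>")
print(updated.decode("latin-1"))
```

The original bytes survive unchanged as a prefix of the result; only new sections are added after the old %%EOF marker.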
