PDF Processing with Python
Learn how to merge, split, extract text, and manipulate PDFs using Python. Complete guide with practical examples.
Getting Started with PDFs in Python
Python makes it easy to work with PDF files, whether you need to combine multiple PDFs, extract specific pages, or read text from documents. The examples in this guide use the PyPDF2 library, which you can install with pip:
pip install PyPDF2
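PyPDF2 is a pure-Python library that handles the merging, splitting, and text-extraction tasks covered in this guide. The project now continues under the name pypdf; the examples here target PyPDF2.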
Merging Multiple PDFs
Combine several PDF files into one document:
from PyPDF2 import PdfMerger
# Create merger object
merger = PdfMerger()
# Add PDF files
merger.append('document1.pdf')
merger.append('document2.pdf')
merger.append('document3.pdf')
# Save merged PDF
merger.write('merged_document.pdf')
merger.close()
print("✅ PDFs merged successfully!")
Splitting PDFs
Extract specific pages from a PDF:
from PyPDF2 import PdfReader, PdfWriter
# Read the PDF
reader = PdfReader('document.pdf')
# Extract pages 1-3 (zero-based indices 0-2)
writer = PdfWriter()
for page_num in range(0, 3):
    writer.add_page(reader.pages[page_num])
# Save extracted pages
with open('extracted_pages.pdf', 'wb') as output_file:
    writer.write(output_file)
print("📄 Pages extracted successfully!")
Extracting Text from PDFs
Read and extract text content from PDF documents:
from PyPDF2 import PdfReader
# Open PDF
reader = PdfReader('document.pdf')
# Extract text from all pages
text = ""
for page in reader.pages:
    # Fall back to an empty string if a page yields no text
    text += (page.extract_text() or "") + "\n"
print(text)
# Save to text file
with open('extracted_text.txt', 'w', encoding='utf-8') as f:
    f.write(text)
Advanced PDF Operations
- 🔒 Password Protection — Add passwords to secure your PDFs
- 🔄 Rotating Pages — Change page orientation programmatically
- 💧 Adding Watermarks — Brand your documents automatically
- 📊 Extracting Images — Pull images from PDF documents
- ✂️ Cropping Pages — Trim pages to specific dimensions