It’s not often, but there will be situations where you need to extract images from a PDF document. For instance, you might be working on a presentation and need images from a research paper. Maybe you’re a graphic designer and need to reuse logos or images from the client’s PDF brochure. Or perhaps you’re a student creating custom notes and in need of images from the textbook.
Whatever the case may be, saving all images from a PDF document is a simple task. In this tutorial, I will show you how to extract images from a PDF document using pdfcpu, a free and open-source tool. Let’s get started.
Steps to Extract Images from PDF
Since Windows doesn’t have a native option, we will use a free and open-source PDF tool called pdfcpu. It is a powerful command-line tool for PDF processing. Here’s how.
Step 1: Download pdfcpu
First, go to pdfcpu’s official GitHub page. Scroll down to the Assets section and download the latest “Windows_x86_64.zip” file for 64-bit Windows systems.
Step 2: Extract the downloaded pdfcpu file
Find the downloaded zip file in the Downloads folder, right-click on it, and select “Extract All.”
When prompted, click the “Extract” button. This will extract the zip to a separate folder in the same directory.
(Optional) For ease of use, rename the extracted folder to “pdfcpu.” This step is not required, but it makes navigating in Command Prompt easier.
Step 3: Open Command Prompt
Press the Start button, search for “Command Prompt,” and click “Open.”
Step 4: Navigate to the pdfcpu directory in Command Prompt
In the Command Prompt, run the following command to go to the pdfcpu directory. Make sure to replace the dummy path with the actual folder path. Working from this directory makes it easier to run the command in the next step.
cd /d "C:\path\to\pdfcpu\folder"
Step 5: Execute pdfcpu command to extract images from PDF
Next, run the following command, replacing the dummy PDF path and dummy output directory path with the actual paths.
pdfcpu extract -mode image "C:\path\to\file.pdf" "C:\path\to\output\folder"
For example, after replacing both dummy paths, the command will look like this:
pdfcpu extract -mode image "D:\WindowsLoop\PDFs\catalogue.pdf" "D:\WindowsLoop\PDFs\PDF Images"
As soon as you run the command, pdfcpu will extract images from the given PDF file and save them to the output directory.
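If you need to repeat this for several PDFs, the same command can be driven from a script. The following is a minimal Python sketch, not something that ships with pdfcpu. It assumes pdfcpu.exe can be found from the current directory or through PATH, and the function names build_extract_command and extract_images are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_extract_command(pdf_path, output_dir, exe="pdfcpu"):
    """Return the argument list for: pdfcpu extract -mode image <pdf> <outdir>."""
    return [exe, "extract", "-mode", "image", str(pdf_path), str(output_dir)]


def extract_images(pdf_path, output_dir, exe="pdfcpu"):
    """Run pdfcpu and return its exit code (0 is assumed to mean success)."""
    # Create the output folder first in case it does not exist yet.
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    result = subprocess.run(build_extract_command(pdf_path, output_dir, exe))
    return result.returncode


if __name__ == "__main__":
    # Example paths from this tutorial; replace them with your own.
    print(build_extract_command(r"D:\WindowsLoop\PDFs\catalogue.pdf",
                                r"D:\WindowsLoop\PDFs\PDF Images"))
```

Passing the arguments as a list means paths with spaces, such as “PDF Images,” do not need extra quoting.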
Step 6: Verify extracted images
To verify the extracted images, open File Explorer (press Windows key + E) and navigate to the output directory. It should contain all the images extracted from the PDF file.
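If you prefer to check the result without opening File Explorer, a short script can list what landed in the output folder. This is only a sketch: the function name list_extracted_images is hypothetical, and the set of file extensions is my assumption about what pdfcpu typically writes, so adjust it if your files use other types.

```python
from pathlib import Path

# Assumed extensions; pdfcpu picks the file type for each image.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".webp"}


def list_extracted_images(output_dir):
    """Return the sorted names of image files found directly in output_dir."""
    return sorted(
        p.name
        for p in Path(output_dir).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

An empty list suggests that nothing was extracted or that the output path is not the one you passed to pdfcpu.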
Troubleshooting Steps
Here are some common errors you might encounter while using the pdfcpu tool and how to fix them.
Error: ‘pdfcpu’ is not recognized as an internal or external command, operable program, or batch file
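If you see this error, Command Prompt cannot find pdfcpu.exe in the current directory. Make sure the path you used with the cd command in Step 4 points to the folder where you extracted pdfcpu and that the folder contains the “pdfcpu.exe” file. To verify, open File Explorer and navigate to the folder. You should see the pdfcpu.exe file in it.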
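The same check can be scripted. Here is a small sketch that looks for the executable in a given folder and, failing that, on PATH. The function name locate_pdfcpu is hypothetical, and the default file name assumes the executable is called pdfcpu.exe as described above.

```python
import shutil
from pathlib import Path


def locate_pdfcpu(folder, exe_name="pdfcpu.exe"):
    """Return the path to pdfcpu as a string, or None if it cannot be found."""
    candidate = Path(folder) / exe_name
    if candidate.is_file():
        return str(candidate)
    # Fall back to searching the folders listed in the PATH variable.
    return shutil.which("pdfcpu")
```

If this returns None, neither the folder nor PATH contains the executable, which matches the situation that triggers the error.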
Error: The system cannot find the path specified
If you see the error “The system cannot find the path specified,” it usually means there’s a problem with the paths for the PDF file or the output directory. To fix this, follow these steps:
- First, make sure the path to your PDF file is correct and that the file exists there.
- Next, check that the path for the output directory is correct and that the directory already exists, as “pdfcpu” won’t create it for you.
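Both checks can be done up front before calling pdfcpu. The sketch below reports a missing PDF and creates the output folder when it does not exist yet. The function name prepare_paths is made up for this example.

```python
from pathlib import Path


def prepare_paths(pdf_path, output_dir):
    """Return a list of problems; an empty list means both paths look fine."""
    problems = []
    if not Path(pdf_path).is_file():
        problems.append(f"PDF not found: {pdf_path}")
    out = Path(output_dir)
    if out.exists() and not out.is_dir():
        problems.append(f"Output path is not a folder: {output_dir}")
    else:
        # Create the folder (and any missing parent folders) if needed.
        out.mkdir(parents=True, exist_ok=True)
    return problems
```

Run it with the same two paths you plan to pass to pdfcpu and fix anything it reports before running the extract command.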
Encrypted PDFs
If the PDF document you are trying to extract images from is encrypted, you first need to decrypt it. If you try to extract images without decrypting, you will see the error message “pdfcpu: please provide the correct password.” To decrypt a PDF file, run the following command, replacing <user-password> with the actual password used to open the PDF file, and the dummy input and output PDF paths with the actual paths.
pdfcpu decrypt -upw <user-password> "C:\path\to\input.pdf" "C:\path\to\output.pdf"
After decrypting, follow the steps described in the tutorial to extract images.
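For scripted use, the decrypt command can be assembled the same way as any other command line. This is a minimal sketch that mirrors the command shown above. The function names build_decrypt_command and decrypt_pdf are hypothetical, and it assumes pdfcpu can be found from the current directory or through PATH.

```python
import subprocess


def build_decrypt_command(password, input_pdf, output_pdf, exe="pdfcpu"):
    """Return the argument list for: pdfcpu decrypt -upw <pw> <in> <out>."""
    return [exe, "decrypt", "-upw", password, str(input_pdf), str(output_pdf)]


def decrypt_pdf(password, input_pdf, output_pdf, exe="pdfcpu"):
    """Run pdfcpu and return its exit code (0 is assumed to mean success)."""
    command = build_decrypt_command(password, input_pdf, output_pdf, exe)
    return subprocess.run(command).returncode
```

Once decrypt_pdf finishes successfully, point the extract command at the decrypted output file rather than the original.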
pdfcpu is unable to extract images
There are a couple of scenarios where pdfcpu is unable to extract images from a PDF file. They are as follows:
- Unsupported image formats: Some PDFs contain vector graphics, which are drawn on the page rather than stored as image objects, or images in specialized formats like JBIG2 or JPEG2000, which may not be supported by PDF tools such as pdfcpu.
- Embedded objects: If the images are embedded in other objects, such as interactive content, multimedia content, form XObjects, inline images, etc., pdfcpu may ignore them or extract them incorrectly.
Wrapping Up — Saving Images from PDF
As you can see, extracting and saving images from PDF files is easier than you might think, thanks to pdfcpu. While using the tool, make sure the paths are correct and the PDF file isn’t encrypted. If you encounter errors, the troubleshooting section above covers the most common ones and how to fix them.
If you have any questions or need help, comment below. I will answer.