PDF password cracking using Python
- Balaji
- Jan 31, 2021
Preface: Python learning
I've been trying to learn Python for the past few months. I've been in and out, but I dedicated at least one hour a day to learning Python. So why Python? I don't know. All of a sudden, a year back, I got interested in a course on Python by Chuck, and that is where I began loving this language. Wow, I don't know why, but it seems so natural to code in Python. It's something like gliding down a fast lane on a highway with no obstacles. So many things that we need to
While learning, we get a feel for the beauty and intricacies of Python as we go along with the tutorials and walkthroughs. Interesting as all the data manipulation features of this versatile language may seem, where to put them into practice was a question that didn't even come to mind.
The Opportunity
Finally, a month back, I had to open a password-protected PDF file. As usual, I first looked for software that cracks(!) the password, but it was way too costly for a single PDF file. So I thought, let's do it myself... Hmm, where to start? I needed a PDF library that could handle this. Okay, I decided Python would be the language and began searching for libraries, and in the first few search results I found pikepdf.
The Python library and Cracking methodology
Okay, now what? Exploring the documentation, I found a simple way to pass the password to the open method... wow, that was getting interesting. But I don't know the password: I need to guess and enter it, and I don't know how many guesses it will take. So the best way out was a brute force attack or a dictionary attack.
A brute force attack requires great computing power and might run for days on end. It tries all possible combinations for the given password length and character set. To give you an easily understandable example, assume the password is a 4-digit numeric PIN, XXXX. This alone gives us 10^4 = 10,000 combinations. To spell it out: 4 is the length of the password and 10 is the number of possible digits, anything from 0 to 9. Here we are talking about any number from 0000 to 9999. Now think of an alphanumeric password with at least 8 characters, and imagine it sprinkled with special characters. Well, this is not for me.
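To make the PIN arithmetic concrete, here is a small sketch using only the standard library. `brute_force` and `search_space` are illustrative helper names of my own, not part of pikepdf. The generator enumerates every combination with `itertools.product`, and `search_space` shows how quickly the count explodes once we move from 10 digits to 62 letters and digits.

```python
import itertools
import string


def brute_force(charset: str, length: int):
    """Yield every string of `length` characters drawn from `charset`."""
    for combo in itertools.product(charset, repeat=length):
        yield "".join(combo)


def search_space(charset_size: int, length: int) -> int:
    """Worst-case number of guesses: charset_size ** length."""
    return charset_size ** length


# 4-digit PIN: 10 possible digits, length 4 -> 10^4 candidates
pins = list(brute_force(string.digits, 4))
print(len(pins), pins[0], pins[-1])  # 10000 0000 9999

# 8 characters from upper/lower-case letters plus digits (62 symbols),
# before any special characters are added
print(search_space(len(string.ascii_letters + string.digits), 8))  # 218340105584896
```

Going from 10,000 to over 218 trillion candidates, without even adding special characters, is why brute force is the last resort here.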
A dictionary attack is similar to brute force, except that we need not build every combination. Instead, the system picks from a list that we already have, or from what we know the user is likely to use, and tries each entry in the dictionary as the password. It's always advisable to start with a dictionary attack and fall back to brute force only if that fails.
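The "dictionary first, brute force later" order can be sketched as a single stream of guesses. This is a minimal illustration under assumptions: `check` is a hypothetical callable standing in for "try to open the PDF with this password" (in practice it would wrap `pikepdf.open`), and `candidates`/`crack` are names I made up for the example.

```python
import itertools
from typing import Callable, Iterable, Iterator, Optional


def candidates(dictionary: Iterable[str], charset: str, length: int) -> Iterator[str]:
    """Yield dictionary entries first, then every brute-force combination."""
    # Dictionary phase: cheap, targeted guesses
    for word in dictionary:
        yield word.strip()
    # Brute-force phase: exhaustive fallback over charset ** length strings
    for combo in itertools.product(charset, repeat=length):
        yield "".join(combo)


def crack(check: Callable[[str], bool], dictionary: Iterable[str],
          charset: str, length: int) -> Optional[str]:
    """Return the first guess that check() accepts, or None if nothing works."""
    for guess in candidates(dictionary, charset, length):
        if check(guess):
            return guess
    return None
```

For example, `crack(lambda p: p == "0042", ["1234", "0000"], "0123456789", 4)` misses both dictionary entries and then finds `"0042"` during the brute-force phase. Because `candidates` is a generator, the brute-force combinations are only produced if the dictionary runs dry.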
Another reason I chose a dictionary attack was that I knew the length of the password and the pattern it should follow. In my case it was 11 digits, with the first five digits being the last five digits of a mobile number and the rest the date of birth. It's pretty easy to construct a list of possible items: since I was able to narrow down the mobile number, the only unknown element was the date of birth. Building this dictionary list also needed some code; I will cover how to generate a dictionary file in my next blog.
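Until that post, here is one possible way to build such a list; it is not necessarily how I will do it there. It assumes the date of birth is written as six digits in DDMMYY form (11 total digits minus the 5-digit prefix leaves 6), and the prefix `"12345"` and the year range are placeholders, not real values. `dob_candidates` and `write_dictionary` are hypothetical helper names.

```python
from datetime import date, timedelta


def dob_candidates(prefix, start_year, end_year):
    """Return prefix + DDMMYY for every calendar day in [start_year, end_year]."""
    candidates = []
    day = date(start_year, 1, 1)
    last = date(end_year, 12, 31)
    while day <= last:
        # e.g. prefix "12345" + 1 Jan 2000 -> "12345" + "010100"
        candidates.append(prefix + day.strftime("%d%m%y"))
        day += timedelta(days=1)
    return candidates


def write_dictionary(path, candidates):
    """Write one candidate per line, the format the cracking loop reads."""
    with open(path, "w") as f:
        for candidate in candidates:
            f.write(candidate + "\n")
```

Something like `write_dictionary("pwlistnewidea.txt", dob_candidates("12345", 1970, 2005))` would produce roughly 13,000 lines, which is tiny compared with brute-forcing all 10^11 possible 11-digit strings. Using `datetime` also means invalid dates such as 31 February are never generated.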
Python pikepdf + Dictionary Attack:
So we know what we are going to do and what we want to accomplish. We need to loop through a dictionary list, fetch each item in the list, and try to open the PDF using that item as the password. If it succeeds, we know the password; otherwise, we move on to the next item in the dictionary list.
So, basically:
- Create a dictionary list (I will cover this in my next blog). For now, type what you think the password might be into a text file, one item per line.
- Open the dictionary file and loop through each of the items.
- Use each item to try to open the PDF. If it opens, we are good; otherwise, move on to the next item.
The code:
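```python
import pikepdf
from termcolor import colored

# Dictionary file: one candidate password per line
with open("pwlistnewidea.txt") as file:
    for password in file:
        password = password.strip()  # drop the newline and surrounding spaces
        try:
            # Opening succeeds only when the password is correct
            with pikepdf.open("8535470900131102020.pdf", password=password) as pdf:
                print(colored("Password Found: {}".format(password), "green"))
                break
        except pikepdf.PasswordError:
            # Wrong password: report it and try the next entry
            print(colored("Password Elusive: {}".format(password), "red"))
            continue
```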
For those new to Python: there are no braces or begin...end keywords for code blocks. Instead, every statement in a block must be indented by the same amount.
How the code works:
- Import the library required to handle PDF files, in our case pikepdf.
- I've used the colored function from the termcolor module to print ANSI-colored output to the terminal.
- Open the dictionary file (pwlistnewidea.txt in my case), which contains a generated set of possible passwords, one item per line.
- Next is the for loop, which reads the file line by line and assigns each line to password.
- We try to open the PDF file using password (we strip it to remove leading and trailing whitespace, including the newline at the end of each line).
- If it's a match, print a success message and break out of the loop, which ends the program. A wrong password makes pikepdf raise an exception, so we print a failure message and continue with the for loop.

See how I've used termcolor to print messages in different colors. This is similar to the chalk library in Node.js.

This is a simple but real-life application of Python code. Though Python isn't specifically meant for these activities, see how easy it is to loop through each item in the file, and how simple the syntax is.
Disclaimer:
This is a use case that might come up if you forget a password. Unauthorized use of password cracking may be illegal, and you are responsible for what you do :-)