To start this assignment, download this zip file.
The following guide pages cover material needed for this assignment:
Project 5 - Cipher
Background
Julius Caesar was known (among many things) for encoding important military instructions using what became known as the Caesar Cipher.
The Caesar Cipher was a simple mapping from one letter of the alphabet to another: A -> D, B -> E, C -> F, etc.
A cipher that performs a simple character substitution is called a simple substitution cipher. Many other simple substitution ciphers have since been invented, each one using a different strategy for coming up with the letter mappings.
Text that has not been encoded is called plaintext. Text that has been encoded is called ciphertext.
These ciphers are easy to generate and easy to use. They are also fairly easy to break.
One strategy for breaking a simple substitution cipher is to do a character frequency analysis.
In regular English text, each letter is used at a fairly consistent frequency. By analyzing how frequent a letter is in plaintext and ciphertext, you can create reasonable predictions for which characters map to which.
Instructions
In this project, you will implement a simple substitution cipher using a randomized alphabet. You will use that cipher to encode and decode some messages. Finally, you will perform a character frequency analysis on encoded and decoded text as a first step towards breaking the cipher.
1. build_cipher.py
Write a program called build_cipher.py
that generates a randomized simple substitution cipher which it saves to CSV file.
The program should have the following command-line arguments:
- The name of the CSV file in which to save the cipher
The output file should have the following format:
- each line is a mapping from one letter to another, separated by a comma (no spaces)
For example:
a,x
b,y
c,z
TIPS
- Use
random.sample(alphabet, len(alphabet))
to generate a randomized version of the alphabet - You can
import string
and then usealphabet = string.ascii_lowercase
- Or you can simply hardcode the alphabet:
alphabet = 'abcdefg...'
- Or you can simply hardcode the alphabet:
You can test your file using:
$ python build_cipher.py my_cipher.csv
2. encode.py
Write a program named encode.py
that takes
- A CSV file containing a simple substitution cipher (in the format described above)
- An input file containing plaintext to encode
- An output file to write with the ciphertext
This program then encodes the input file and writes the output file using the provided cipher.
Characters that are not in the cipher should be preserved.
Preserve casing of the input characters; for example, i -> x
and I -> X
.
You can test your file using:
$ python encode.py test_files/cipher.csv test_files/message.txt my_encoded.txt
3. decode.py
Write a program named decode.py
that takes
- A CSV file containing a simple substitution cipher (in the format described above)
- An input file containing ciphertext to decode
- An output file to write with the decoded plaintext
This program then decodes the input file and writes the output file using the provided cipher.
As with encode.py
, preserve casing and characters not found in the cipher.
NOTE
The cipher CSV is the same one as used by encode.py
.
To decode using this cipher, the cipher will need to be inverted.
You can test your file using:
$ python decode.py test_files/cipher.csv test_files/message.encoded.expected.txt my_decoded.txt
4. frequency.py
Write a program named frequency.py
that performs a character-frequency analysis on an input file.
Every distinct character in the file should be counted, and the final counts for each character should be divided by the total number of characters in the file. Also convert any uppercase characters to lowercase.
The frequencies should be rounded to 3 decimal places.
The program should print a dictionary mapping each character to its associated frequency.
Example
For a file containing
a
BbB
ccc
The frequencies should be
{'a': 0.111, 'b': 0.333, 'c': 0.333, '\n': 0.222}
Another file might have
a
bb
The frequencies should be
{'a': 0.25, 'b': 0.5, '\n': 0.25}
You can test your file by creating your own file with a few characters:
$ python frequency.py my_short_file.txt
Testing
The test_files
folder contains several files you can use to test your code.
These files are used by the autograder.
You are also encouraged to use your own files to test these programs.
- Generate a cipher CSV
- Encode and decode custom messages
- Perform your own frequency analysis
Grading
Activity | Points |
---|---|
build_cipher.py | 10 |
encode.py | 30 |
decode.py | 30 |
frequency.py | 30 |
Just for fun
More ciphers
There are many other ways to generate a simple substitution cipher. Which ones can you implement?
For example, the ROT ciphers (e.g. ROT13) use a rotation to generate the cipher, and you only need the rotation factor to decode the message.
Crack the code
Can you use the character frequency analysis to derive the cipher from a ciphertext?
Try building a custom cipher CSV by hand based on the frequencies you see. How well does the custom cipher decode the message?
Here is a ciphertext you can practice on:
Bmansopvwopimah! Gmv uorz tiaihuzd puz bicuzs csmfzbp, puz wohp csmfzbp mt BH110.
Lz umcz gmv uorz zafmgzd puih bwohh.
Oad xmsz ixcmspoapwg, lz umcz gmv uorz nsmla tsmx puih zkczsizabz.
Lz nirz gmv hmxz cospian lmsdh mt mvs csmcuzp pm bmahidzs:
I cwzod lipu gmv pm poez buosnz mt gmvs pzhpixmag mt Fzhvh Busihp.
Lmse tms ip. Mla ip. Bosz tms ip.
Avspvsz ip hm puop ip liww nsml.
Puza lopbu tms xisobwzh pm uoccza ia gmvs witz.
--Svhhzww X. Azwhma
Oh gmv bmxz vapm Busihp liww oww gmvs uzosp wmrz Uix ia toipu,
Uz liww ywzhh gmv oad czstmsx xisobwzh ia gmvs witz.
Xog gmv uorz Uih ywzhhianh lipu gmv ia gmvs tvpvsz zadzormsh.
Hiabzszwg, BH 110 iahpsvbpmsh oad POh