Hacking Vigenère Cipher

Ciphertext-only Attack on the Vigenère Cipher

  Author: Joyce_BY, all rights reserved. Contact by email: [email protected]

Background

What is the Vigenère cipher?

The Vigenère cipher is a method of encrypting alphabetic text by using a series of interwoven Caesar ciphers, based on the letters of a keyword. It is a form of polyalphabetic substitution.
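To make the definition concrete, here is a minimal sketch of the encryption step. The function name vigenere_encrypt is my own choice for illustration, and the sketch assumes both the text and the key consist only of uppercase letters A-Z.

```python
def vigenere_encrypt(plaintext: str, key: str) -> str:
    """Encrypt uppercase A-Z text with a repeating uppercase key (assumed input format)."""
    out = []
    for i, ch in enumerate(plaintext):
        # Each key letter selects one Caesar shift: A = 0, B = 1, ..., Z = 25.
        shift = ord(key[i % len(key)]) - ord('A')
        out.append(chr((ord(ch) - ord('A') + shift) % 26 + ord('A')))
    return ''.join(out)


print(vigenere_encrypt('ATTACKATDAWN', 'LEMON'))  # LXFOPVEFRNHR
```

Letters at positions 0, 5, 10, ... are all shifted by the same key letter, which is exactly what the attack below exploits.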

Start

  • What do we have? A string of seemingly meaningless ciphertext.
  • What is our aim? To crack the cipher and recover readable plaintext.
  • What should we do? Guess the key length first, then guess the key.

Cryptanalysis

How to guess the key length?

There are several ways to do this.

The Kasiski examination takes advantage of the fact that repeated words are, by chance, sometimes encrypted using the same key letters, leading to repeated groups in the ciphertext.

To guess the key length with the Kasiski test, follow these steps:

  • Find all repeated sequences and record their positions.
  • Compute the distances between occurrences of each repeated sequence, then factor those distances.
  • The factors shared by most of the distances are candidates for the true key length.
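The three steps can be sketched roughly as follows. This is only an illustration, not the code used later in this post: the function name kasiski_factor_counts, the sequence length of 3 and the upper bound of 10 on the key length are all assumptions of mine.

```python
from collections import Counter


def kasiski_factor_counts(ciphertext: str, seq_len: int = 3, max_key_len: int = 10) -> Counter:
    """Count how often each candidate key length divides a distance between repeats."""
    # Step 1: record every position of every sequence of length seq_len.
    positions = {}
    for i in range(len(ciphertext) - seq_len + 1):
        positions.setdefault(ciphertext[i:i + seq_len], []).append(i)

    # Steps 2 and 3: take distances between consecutive occurrences and tally their factors.
    counts = Counter()
    for pos in positions.values():
        for a, b in zip(pos, pos[1:]):
            distance = b - a
            for f in range(2, max_key_len + 1):
                if distance % f == 0:
                    counts[f] += 1
    return counts


# 'ABC' repeats at distance 9, so 3 and 9 are the only candidates here.
print(kasiski_factor_counts('ABCDEFGHIABC'))
```

On a real ciphertext, the candidates with the highest counts are the ones worth trying first. Small factors such as 2 tend to be over-represented, so the counts are a hint rather than a verdict.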

The Friedman test uses the index of coincidence, which measures the unevenness of the ciphertext letter frequencies, to break the cipher. It relies on two known values: the probability kp that two randomly chosen letters of the source language are the same (around 0.067 for monocase English), and the probability kr of a coincidence for a uniform random selection from the alphabet (1/26 ≈ 0.0385 for English). The key length can then be estimated as

  (kp - kr) / (ko - kr)

where ko is the observed coincidence rate:

  ko = sum(i=1 to c) { ni * (ni - 1) } / (N * (N - 1))

Here c is the size of the alphabet (26 for English), N is the length of the text, and n1 to nc are the observed ciphertext letter counts, as integers.
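The two formulas translate almost directly into code. The sketch below uses hypothetical function names (index_of_coincidence, friedman_key_length) and takes kp and kr as parameters with the English values as defaults. The estimate is usually not an integer and can be quite rough on short texts.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    """Observed coincidence rate ko = sum(ni * (ni - 1)) / (N * (N - 1))."""
    n = len(text)
    counts = Counter(text)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def friedman_key_length(text: str, kp: float = 0.067, kr: float = 1 / 26) -> float:
    """Estimate the key length as (kp - kr) / (ko - kr).

    Raises ZeroDivisionError if ko happens to equal kr exactly.
    """
    ko = index_of_coincidence(text)
    return (kp - kr) / (ko - kr)


# Tiny hand-checkable case: 'AABB' gives (2*1 + 2*1) / (4*3) = 1/3.
print(index_of_coincidence('AABB'))
```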

Index of coincidence by column: a better approach for repeating-key ciphers is to copy the ciphertext into the rows of a matrix with as many columns as the assumed key length, compute the index of coincidence of each column separately, and average the results. When that is done for each candidate key length, the highest average I.C. corresponds to the most likely key length.
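A minimal sketch of this column-wise approach is shown below. The function names and the default upper bound of 10 are assumptions for illustration. Instead of building the matrix explicitly, the slice ciphertext[i::key_length] picks out column i directly.

```python
from collections import Counter


def index_of_coincidence(text: str) -> float:
    n = len(text)
    if n < 2:
        return 0.0  # too short to contain any pair of letters
    return sum(c * (c - 1) for c in Counter(text).values()) / (n * (n - 1))


def average_column_ic(ciphertext: str, key_length: int) -> float:
    """Average I.C. over the columns obtained for an assumed key length."""
    columns = [ciphertext[i::key_length] for i in range(key_length)]
    return sum(index_of_coincidence(col) for col in columns) / key_length


def best_key_length(ciphertext: str, max_key_len: int = 10) -> int:
    """Return the candidate key length with the highest average column I.C."""
    return max(range(1, max_key_len + 1),
               key=lambda k: average_column_ic(ciphertext, k))


# Toy input with an obvious period of 2: both columns are constant.
print(best_key_length('ABABABAB', max_key_len=3))  # 2
```

One thing to watch for: multiples of the true key length also produce columns that are each encrypted with a single key letter, so they score about as high. When several candidates are close, the smallest one is usually the better guess.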

How to guess the key once we have the key length?

This is simple. First, split the ciphertext into a matrix whose rows are as long as the key, so that every column is encrypted with the same key letter; each column is then an ordinary Caesar cipher. Second, note that shifting the letters of a text shifts its letter frequencies by the same amount. Therefore, for every candidate shift g from 0 to 25, compute Mg with the formula below; the shift whose Mg is closest to 0.065 is the key letter for that column.

Formula: Mg = sum(i=0 to 25) pi * f((i+g) mod 26), where pi is the known frequency of letter i in English text and fj is the observed frequency of letter j in that column.
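The claim that shifting the text just shifts the frequencies is easy to check on a small example. In the sketch below, the helper names (letter_frequencies, caesar_shift, mg) and the sample sentence are my own choices, and the sample's own frequencies stand in for the English table p.

```python
def letter_frequencies(text: str) -> list:
    """Relative frequency of each letter A-Z in an uppercase text."""
    counts = [0] * 26
    for ch in text:
        counts[ord(ch) - ord('A')] += 1
    return [c / len(text) for c in counts]


def caesar_shift(text: str, k: int) -> str:
    return ''.join(chr((ord(ch) - ord('A') + k) % 26 + ord('A')) for ch in text)


def mg(p: list, f: list, g: int) -> float:
    """Mg = sum over i of p[i] * f[(i + g) mod 26]."""
    return sum(p[i] * f[(i + g) % 26] for i in range(26))


plain = 'THEQUICKBROWNFOXJUMPSOVERTHELAZYDOG'
k = 7
f_plain = letter_frequencies(plain)
f_cipher = letter_frequencies(caesar_shift(plain, k))

# Rotating the ciphertext frequencies left by k realigns them with the plaintext ones.
print(f_cipher[k:] + f_cipher[:k] == f_plain)  # True

# Mg peaks at the true shift, because the two distributions line up there.
print(max(range(26), key=lambda g: mg(f_plain, f_cipher, g)))  # 7
```

With a real column of ciphertext, p is the English frequency table instead, and the value of Mg at the true shift lands near 0.065.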

Coding

What approach did I use?

  • I roughly follow the Friedman test, with a small trick to simplify the algorithm: I shift the ciphertext against itself and count how many positions hold the same letter in both copies. The letters that overhang at the start and at the end have nothing to compare against, so they are ignored; only the overlapping middle part is compared.
  • To find the key, I use the method described above, which is introduced in the book Cryptography: Theory and Practice.
  • After getting the key, we apply the decryption rule of the shift cipher to each letter to get the plaintext.
    • mi = Dk(ci) = ci - ki (mod 26)
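As a quick sanity check of the decryption rule (subtract the key letter from the ciphertext letter, modulo 26), here is a small standalone sketch. The function name vigenere_decrypt is hypothetical, and the input is assumed to be uppercase A-Z only.

```python
def vigenere_decrypt(ciphertext: str, key: str) -> str:
    """Recover the plaintext letter by letter: m = c - k (mod 26)."""
    out = []
    for i, ch in enumerate(ciphertext):
        shift = ord(key[i % len(key)]) - ord('A')
        # Python's % always returns a non-negative result, so no "+ 26" is needed.
        out.append(chr((ord(ch) - ord('A') - shift) % 26 + ord('A')))
    return ''.join(out)


print(vigenere_decrypt('LXFOPVEFRNHR', 'LEMON'))  # ATTACKATDAWN
```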

The code is as follows

Skeleton:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# python 3.7.0
# environment: windows 10
# authored by Joyce_BY, all rights reserved.
# contact by email: [email protected]

# decryption function:
def hack_vigenere(ciphertext):
    # guess key length:
    key_length = get_key_len(ciphertext)
    # guess key by using key length:
    key = get_key(ciphertext, key_length)
    print('KEY:', ''.join(key))
    # get plaintext: subtract the key letter from each ciphertext letter (mod 26)
    chars = []
    for i, ch in enumerate(ciphertext):
        c = chr((ord(ch) - ord(key[i % key_length])) % 26 + 65)
        chars.append(c)
    return ''.join(chars)

if __name__ == '__main__':
    ciphertext = 'KCCPKBGUFDPHQTYAVINRRTMVGRKDNBVFDETDGILTXRGUDDKOTFMBPVGEGLTGCKQRACQCWDNAWCRXIZAKFTLEWRPTYCQKYVXCHKFTPONCQQRHJVAJUWETMCMSPKQDYHJVDAHCTRLSVSKCGCZQQDZXGSFRLSWCWSJTBHAFSIASPRJAHKJRJUMVGKMITZHFPDISPZLVLGWTFPLKKEBDPGCEBSHCTJRWXBAFSPEZQNRWXCVYCGAONWDDKACKAWBBIKFTIOVKCGGHJVLNHIFFSQESVYCLACNVRWBBIREPBBVFEXOSCDYGZWPFDTKFQIYCWHJVLNHIQIBTKHJVNPIST'
    plaintext = hack_vigenere(ciphertext.upper())
    print('The plaintext is:\n', plaintext)

Details in functions:

# find the length of the key: 
def get_key_len(ciphertext): 
    key_length = 1
    maxcount = 0 
    # here we assume the key length is between 1 and 10, 
    # try different shifts and count coincident letter numbers,
    # the shift linked to the maxcount is the most probable key length:
    print('\n|Shifts|Counts|')
    print('---------------')
    for shift in range (1,11): 
        count = 0
        for i in range (shift, len(ciphertext) - shift): 
            if(ciphertext[i] == ciphertext[shift+i]):
                count += 1
        # print shifts and counts, left-align    
        print('|{:<6}|{:<6}|'.format(shift, count))
        # check the max count and linked shift: 
        if count > maxcount: 
            maxcount = count 
            key_length = shift 
    # show maxcount and linked key length: 
    print('maxcount:{}; key length:{}'.format(maxcount, key_length))
    return key_length

# find key:
def get_key(ciphertext, key_length): 
    key = []
    # the dist of frequencies of letters in English: 
    alpha_fre =[0.08167,0.01492,0.02782,0.04253,0.12705,0.02228,0.02015,0.06094,0.06996,0.00153,0.00772,0.04025,0.02406,0.06749,0.07507,0.01929,0.0009,0.05987,0.06327,0.09056,0.02758,0.00978,0.02360,0.0015,0.01974,0.00074]
    # get the matrix of the ciphertext: 
    matrix = make_matrix(ciphertext, key_length)
    
    # for each group, calculate the shift:
    for group in range(key_length):
        gap = 1.0
        key_i = ''
        # calculate f_i:
        f_i = get_group_fre(matrix, key_length, group)
        # compute Mg for every shift g and keep the shift whose Mg is closest to 0.065:
        for g in range(26):
            Mg = 0.0
            for j in range(26):
                Mg += f_i[j] * alpha_fre[j]
            # compare only after the sum over all 26 letters is complete:
            if abs(0.065 - Mg) < gap:
                gap = abs(0.065 - Mg)
                key_i = chr(g + 65)
            # rotate by 1:
            f_i = f_i[1:] + f_i[:1]
        key.append(key_i)
    return key

Further details on functions:

def make_matrix(ciphertext, key_length): 
    matrix = []
    # get the group text 
    temp = ciphertext
    while temp: 
        matrix.append(temp[:key_length])
        temp = temp[key_length:]
    return matrix

def get_group_fre(matrix, key_length, group):
    # count how often each letter A-Z occurs in column `group`
    ch = [0] * 26
    fre = []
    group_len = 0
    for row in matrix:
        # the last row may be shorter than key_length
        if group < len(row):
            ch[ord(row[group]) - 65] += 1
            group_len += 1
    # convert the counts to relative frequencies
    for j in range(26):
        fre.append(ch[j] / group_len)
    return fre

Running result is shown as follows