Huffman Encoding¶
This module implements functionalities relating to Huffman encoding and decoding.
AUTHOR:
- Nathann Cohen (2010-05): initial version.
Classes and functions¶
-
class
sage.coding.source_coding.huffman.
Huffman
(source)¶ Bases:
sage.structure.sage_object.SageObject
This class implements the basic functionalities of Huffman codes.
It can build a Huffman code from a given string, or from the information of a dictionary associating to each key (the elements of the alphabet) a weight (most of the time, a probability value or a number of occurrences).
INPUT:
source
– can be either- A string from which the Huffman encoding should be created.
- A dictionary that associates to each symbol of an alphabet a numeric
value. If we consider the frequency of each alphabetic symbol, then
source
is considered as the frequency table of the alphabet with each numeric (non-negative integer) value being the number of occurrences of a symbol. The numeric values can also represent weights of the symbols. In that case, the numeric values are not necessarily integers, but can be real numbers.
In order to construct a Huffman code for an alphabet, we use exactly one of the following methods:
- Let
source
be a string of symbols over an alphabet and feedsource
to the constructor of this class. Based on the input string, a frequency table is constructed that contains the frequency of each unique symbol insource
. The alphabet in question is then all the unique symbols insource
. A significant implication of this is that any subsequent string that we want to encode must contain only symbols that can be found insource
. - Let
source
be the frequency table of an alphabet. We can feed this table to the constructor of this class. The tablesource
can be a table of frequencies or a table of weights.
Examples:
sage: from sage.coding.source_coding.huffman import Huffman, frequency_table sage: h1 = Huffman("There once was a french fry") sage: for letter, code in h1.encoding_table().iteritems(): ....: print("'{}' : {}".format(letter, code)) 'a' : 0111 ' ' : 00 'c' : 1010 'e' : 100 'f' : 1011 'h' : 1100 'o' : 11100 'n' : 1101 's' : 11101 'r' : 010 'T' : 11110 'w' : 11111 'y' : 0110
We can obtain the same result by “training” the Huffman code with the following table of frequency:
sage: ft = frequency_table("There once was a french fry"); ft {' ': 5, 'T': 1, 'a': 2, 'c': 2, 'e': 4, 'f': 2, 'h': 2, 'n': 2, 'o': 1, 'r': 3, 's': 1, 'w': 1, 'y': 1} sage: h2 = Huffman(ft)
Once
h1
has been trained, and hence possesses an encoding table, it is possible to obtain the Huffman encoding of any string (possibly the same) using this code:sage: encoded = h1.encode("There once was a french fry"); encoded '11110110010001010000111001101101010000111110111111010001110010110101001101101011000010110100110'
We can decode the above encoded string in the following way:
sage: h1.decode(encoded) 'There once was a french fry'
Obviously, if we try to decode a string using a Huffman instance which has been trained on a different sample (and hence has a different encoding table), we are likely to get some random-looking string:
sage: h3 = Huffman("There once were two french fries") sage: h3.decode(encoded) ' wehnefetrhft ne ewrowrirTc'
This does not look like our original string.
Instead of using frequency, we can assign weights to each alphabetic symbol:
sage: from sage.coding.source_coding.huffman import Huffman sage: T = {"a":45, "b":13, "c":12, "d":16, "e":9, "f":5} sage: H = Huffman(T) sage: L = ["deaf", "bead", "fab", "bee"] sage: E = [] sage: for e in L: ....: E.append(H.encode(e)) ....: print(E[-1]) 111110101100 10111010111 11000101 10111011101 sage: D = [] sage: for e in E: ....: D.append(H.decode(e)) ....: print(D[-1]) deaf bead fab bee sage: D == L True
-
decode
(string)¶ Decode the given string using the current encoding table.
INPUT:
string
– a string of Huffman encodings.
OUTPUT:
- The Huffman decoding of
string
.
EXAMPLES:
This is how a string is encoded and then decoded:
sage: from sage.coding.source_coding.huffman import Huffman sage: str = "Sage is my most favorite general purpose computer algebra system" sage: h = Huffman(str) sage: encoded = h.encode(str); encoded '00000110100010101011000011101010011100101010011011011100111101110010110100001011011111000001110101010001010110011010111111011001110100101000111110010011011100101011100000110001100101000101110101111101110110011000101011000111111101101111010010111001110100011' sage: h.decode(encoded) 'Sage is my most favorite general purpose computer algebra system'
TESTS:
Of course, the string one tries to decode has to be a binary one. If not, an exception is raised:
sage: h.decode('I clearly am not a binary string') Traceback (most recent call last): ... ValueError: Input must be a binary string.
-
encode
(string)¶ Encode the given string based on the current encoding table.
INPUT:
string
– a string of symbols over an alphabet.
OUTPUT:
- A Huffman encoding of
string
.
EXAMPLES:
This is how a string is encoded and then decoded:
sage: from sage.coding.source_coding.huffman import Huffman sage: str = "Sage is my most favorite general purpose computer algebra system" sage: h = Huffman(str) sage: encoded = h.encode(str); encoded '00000110100010101011000011101010011100101010011011011100111101110010110100001011011111000001110101010001010110011010111111011001110100101000111110010011011100101011100000110001100101000101110101111101110110011000101011000111111101101111010010111001110100011' sage: h.decode(encoded) 'Sage is my most favorite general purpose computer algebra system'
-
encoding_table
()¶ Returns the current encoding table.
INPUT:
- None.
OUTPUT:
- A dictionary associating an alphabetic symbol to a Huffman encoding.
EXAMPLES:
sage: from sage.coding.source_coding.huffman import Huffman sage: str = "Sage is my most favorite general purpose computer algebra system" sage: h = Huffman(str) sage: T = sorted(h.encoding_table().items()) sage: for symbol, code in T: ....: print("{} {}".format(symbol, code)) 101 S 00000 a 1101 b 110001 c 110000 e 010 f 110010 g 0001 i 10000 l 10011 m 0011 n 110011 o 0110 p 0010 r 1111 s 1110 t 0111 u 10001 v 00001 y 10010
-
tree
()¶ Returns the Huffman tree corresponding to the current encoding.
INPUT:
- None.
OUTPUT:
- The binary tree representing a Huffman code.
EXAMPLES:
sage: from sage.coding.source_coding.huffman import Huffman sage: str = "Sage is my most favorite general purpose computer algebra system" sage: h = Huffman(str) sage: T = h.tree(); T Digraph on 39 vertices sage: T.show(figsize=[20,20]) <BLANKLINE>
-
sage.coding.source_coding.huffman.
frequency_table
(string)¶ Return the frequency table corresponding to the given string.
INPUT:
string
– a string of symbols over some alphabet.
OUTPUT:
- A table of frequency of each unique symbol in
string
. Ifstring
is an empty string, return an empty table.
EXAMPLES:
The frequency table of a non-empty string:
sage: from sage.coding.source_coding.huffman import frequency_table sage: str = "Stop counting my characters!" sage: T = sorted(frequency_table(str).items()) sage: for symbol, code in T: ....: print("{} {}".format(symbol, code)) 3 ! 1 S 1 a 2 c 3 e 1 g 1 h 1 i 1 m 1 n 2 o 2 p 1 r 2 s 1 t 3 u 1 y 1
The frequency of an empty string:
sage: frequency_table("") {}