Huffman Coding in Swift
up vote
1
down vote
favorite
I’m learning Swift and decided the Huffman Coding algorithm would be a good exercise while learning this new language. Below is a working version that encodes a string and decodes a string.
Here’s an example of how the API is used:
let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
let decode = huffEncoded.decode()
- I’d love to learn of anything I’m not doing that would be more swift-like.
- I’m wondering if my choice of using a class with a lot of static functions is a good approach or if a struct would be better. They seem like similar solutions but I’m not 100% sure the real difference past structs are value types and classes are reference types.
- I'm also not confident in how I used GCD, but I think that using a concurrent queue will be helpful if encoding or decoding a large string since this isn't updating a view.
- I’m happy to hear any suggestions on my approach to the algorithm itself
- My next step is to make this into a framework so it can be used by other code. Does my approach scale for this?
Thanks for any help!
class Huffman {
private var key = [String: String]()
private(set) var code = [String]()
lazy private var queue: DispatchQueue = {
return DispatchQueue(label: "huffman-queue", qos: .userInitiated, attributes: .concurrent)
}()
init(_ input: String) {
self.key = Huffman.getKey(for: input)
self.code = Huffman.encode(for: input, with: self.key)
}
func decode() -> String {
var word = ""
queue.sync { [unowned self] in
var reverseKey = [String:String]()
for (k, v) in self.key {
reverseKey[v] = k
}
for prefix in self.code {
if let letter = reverseKey[prefix] {
word += letter
}
}
}
return word
}
static private func getKey(for input: String) -> [String: String] {
// sort letter frequency by decreasing count
let sortedFrequency = Array(input)
.reduce(into: [String: Int](), { freq, char in
let letter = String(char)
return freq[letter] = freq[letter, default: 0] + 1
})
.sorted(by: {$0.1 > $1.1})
// create queue of initial Nodes
let queue = sortedFrequency.map{ Node(name: $0.key, value: $0.value)}
// generate key by traversing tree
return Huffman.generateKey(for: Huffman.createTree(with: queue), prefix: "")
}
static private func encode(for input: String, with key: [String: String]) -> [String] {
var code = [String]()
let queue = DispatchQueue(label: "huffman-encode-queue", qos: .userInitiated, attributes: .concurrent)
queue.sync {
for char in input {
if let prefix = key[String(char)] {
code.append(prefix)
}
}
}
return code
}
static private func generateKey(for node: Node, prefix: String) -> [String: String] {
var key = [String: String]()
if let left = node.left, let right = node.right {
key.merge(generateKey(for: left, prefix: prefix + "0"), uniquingKeysWith: {current,_ in current})
key.merge(generateKey(for: right, prefix: prefix + "1"), uniquingKeysWith: {current,_ in current})
}else {
key[node.name] = prefix
}
return key
}
static private func createTree(with queue: [Node]) -> Node {
var queue = queue
// until we have 1 root node, get subtree of least frequency
while queue.count > 1 {
let last = queue.count - 1
let node1 = queue[last]
let node2 = queue[last - 1]
// if we have a third then compare frequency to second
if let node3 = queue[safe: last - 2], node3.value + node2.value < node2.value + node1.value {
queue.remove(at: last - 1)
queue.remove(at: last - 2)
queue.append(Huffman.createRoot(with: node2, and: node3))
} else {
queue.removeLast()
queue.removeLast()
queue.append(Huffman.createRoot(with: node1, and: node2))
}
}
return queue[0]
}
static private func createRoot(with first: Node, and second: Node) -> Node {
return Node(name: "(first.name)(second.name)", value: first.value + second.value, left: first, right: second)
}
}
class Node {
var name: String
var value: Int
var left: Node?
var right: Node?
init(name: String, value: Int, left: Node? = nil, right: Node? = nil) {
self.name = name
self.value = value
self.left = left
self.right = right
}
}
extension Collection {
subscript (safe index: Index) -> Element? {
return indices.contains(index) ? self[index] : nil
}
}
algorithm swift compression
add a comment |
up vote
1
down vote
favorite
I’m learning Swift and decided the Huffman Coding algorithm would be a good exercise while learning this new language. Below is a working version that encodes a string and decodes a string.
Here’s an example of how the API is used:
let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
let decode = huffEncoded.decode()
- I’d love to learn of anything I’m not doing that would be more swift-like.
- I’m wondering if my choice of using a class with a lot of static functions is a good approach or if a struct would be better. They seem like similar solutions but I’m not 100% sure the real difference past structs are value types and classes are reference types.
- I'm also not confident in how I used GCD, but I think that using a concurrent queue will be helpful if encoding or decoding a large string since this isn't updating a view.
- I’m happy to hear any suggestions on my approach to the algorithm itself
- My next step is to make this into a framework so it can be used by other code. Does my approach scale for this?
Thanks for any help!
class Huffman {
private var key = [String: String]()
private(set) var code = [String]()
lazy private var queue: DispatchQueue = {
return DispatchQueue(label: "huffman-queue", qos: .userInitiated, attributes: .concurrent)
}()
init(_ input: String) {
self.key = Huffman.getKey(for: input)
self.code = Huffman.encode(for: input, with: self.key)
}
func decode() -> String {
var word = ""
queue.sync { [unowned self] in
var reverseKey = [String:String]()
for (k, v) in self.key {
reverseKey[v] = k
}
for prefix in self.code {
if let letter = reverseKey[prefix] {
word += letter
}
}
}
return word
}
static private func getKey(for input: String) -> [String: String] {
// sort letter frequency by decreasing count
let sortedFrequency = Array(input)
.reduce(into: [String: Int](), { freq, char in
let letter = String(char)
return freq[letter] = freq[letter, default: 0] + 1
})
.sorted(by: {$0.1 > $1.1})
// create queue of initial Nodes
let queue = sortedFrequency.map{ Node(name: $0.key, value: $0.value)}
// generate key by traversing tree
return Huffman.generateKey(for: Huffman.createTree(with: queue), prefix: "")
}
static private func encode(for input: String, with key: [String: String]) -> [String] {
var code = [String]()
let queue = DispatchQueue(label: "huffman-encode-queue", qos: .userInitiated, attributes: .concurrent)
queue.sync {
for char in input {
if let prefix = key[String(char)] {
code.append(prefix)
}
}
}
return code
}
static private func generateKey(for node: Node, prefix: String) -> [String: String] {
var key = [String: String]()
if let left = node.left, let right = node.right {
key.merge(generateKey(for: left, prefix: prefix + "0"), uniquingKeysWith: {current,_ in current})
key.merge(generateKey(for: right, prefix: prefix + "1"), uniquingKeysWith: {current,_ in current})
}else {
key[node.name] = prefix
}
return key
}
static private func createTree(with queue: [Node]) -> Node {
var queue = queue
// until we have 1 root node, get subtree of least frequency
while queue.count > 1 {
let last = queue.count - 1
let node1 = queue[last]
let node2 = queue[last - 1]
// if we have a third then compare frequency to second
if let node3 = queue[safe: last - 2], node3.value + node2.value < node2.value + node1.value {
queue.remove(at: last - 1)
queue.remove(at: last - 2)
queue.append(Huffman.createRoot(with: node2, and: node3))
} else {
queue.removeLast()
queue.removeLast()
queue.append(Huffman.createRoot(with: node1, and: node2))
}
}
return queue[0]
}
static private func createRoot(with first: Node, and second: Node) -> Node {
return Node(name: "(first.name)(second.name)", value: first.value + second.value, left: first, right: second)
}
}
class Node {
var name: String
var value: Int
var left: Node?
var right: Node?
init(name: String, value: Int, left: Node? = nil, right: Node? = nil) {
self.name = name
self.value = value
self.left = left
self.right = right
}
}
extension Collection {
subscript (safe index: Index) -> Element? {
return indices.contains(index) ? self[index] : nil
}
}
algorithm swift compression
1
Here is an alternative implementation, you might glean some ideas from it.
– Carpsen90
Dec 4 at 22:30
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Sᴀᴍ Onᴇᴌᴀ
Dec 5 at 21:07
add a comment |
up vote
1
down vote
favorite
up vote
1
down vote
favorite
I’m learning Swift and decided the Huffman Coding algorithm would be a good exercise while learning this new language. Below is a working version that encodes a string and decodes a string.
Here’s an example of how the API is used:
let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
let decode = huffEncoded.decode()
- I’d love to learn of anything I’m not doing that would be more swift-like.
- I’m wondering if my choice of using a class with a lot of static functions is a good approach or if a struct would be better. They seem like similar solutions but I’m not 100% sure the real difference past structs are value types and classes are reference types.
- I'm also not confident in how I used GCD, but I think that using a concurrent queue will be helpful if encoding or decoding a large string since this isn't updating a view.
- I’m happy to hear any suggestions on my approach to the algorithm itself
- My next step is to make this into a framework so it can be used by other code. Does my approach scale for this?
Thanks for any help!
class Huffman {
private var key = [String: String]()
private(set) var code = [String]()
lazy private var queue: DispatchQueue = {
return DispatchQueue(label: "huffman-queue", qos: .userInitiated, attributes: .concurrent)
}()
init(_ input: String) {
self.key = Huffman.getKey(for: input)
self.code = Huffman.encode(for: input, with: self.key)
}
func decode() -> String {
var word = ""
queue.sync { [unowned self] in
var reverseKey = [String:String]()
for (k, v) in self.key {
reverseKey[v] = k
}
for prefix in self.code {
if let letter = reverseKey[prefix] {
word += letter
}
}
}
return word
}
static private func getKey(for input: String) -> [String: String] {
// sort letter frequency by decreasing count
let sortedFrequency = Array(input)
.reduce(into: [String: Int](), { freq, char in
let letter = String(char)
return freq[letter] = freq[letter, default: 0] + 1
})
.sorted(by: {$0.1 > $1.1})
// create queue of initial Nodes
let queue = sortedFrequency.map{ Node(name: $0.key, value: $0.value)}
// generate key by traversing tree
return Huffman.generateKey(for: Huffman.createTree(with: queue), prefix: "")
}
static private func encode(for input: String, with key: [String: String]) -> [String] {
var code = [String]()
let queue = DispatchQueue(label: "huffman-encode-queue", qos: .userInitiated, attributes: .concurrent)
queue.sync {
for char in input {
if let prefix = key[String(char)] {
code.append(prefix)
}
}
}
return code
}
static private func generateKey(for node: Node, prefix: String) -> [String: String] {
var key = [String: String]()
if let left = node.left, let right = node.right {
key.merge(generateKey(for: left, prefix: prefix + "0"), uniquingKeysWith: {current,_ in current})
key.merge(generateKey(for: right, prefix: prefix + "1"), uniquingKeysWith: {current,_ in current})
}else {
key[node.name] = prefix
}
return key
}
static private func createTree(with queue: [Node]) -> Node {
var queue = queue
// until we have 1 root node, get subtree of least frequency
while queue.count > 1 {
let last = queue.count - 1
let node1 = queue[last]
let node2 = queue[last - 1]
// if we have a third then compare frequency to second
if let node3 = queue[safe: last - 2], node3.value + node2.value < node2.value + node1.value {
queue.remove(at: last - 1)
queue.remove(at: last - 2)
queue.append(Huffman.createRoot(with: node2, and: node3))
} else {
queue.removeLast()
queue.removeLast()
queue.append(Huffman.createRoot(with: node1, and: node2))
}
}
return queue[0]
}
static private func createRoot(with first: Node, and second: Node) -> Node {
return Node(name: "(first.name)(second.name)", value: first.value + second.value, left: first, right: second)
}
}
class Node {
var name: String
var value: Int
var left: Node?
var right: Node?
init(name: String, value: Int, left: Node? = nil, right: Node? = nil) {
self.name = name
self.value = value
self.left = left
self.right = right
}
}
extension Collection {
subscript (safe index: Index) -> Element? {
return indices.contains(index) ? self[index] : nil
}
}
algorithm swift compression
I’m learning Swift and decided the Huffman Coding algorithm would be a good exercise while learning this new language. Below is a working version that encodes a string and decodes a string.
Here’s an example of how the API is used:
let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
let decode = huffEncoded.decode()
- I’d love to learn of anything I’m not doing that would be more swift-like.
- I’m wondering if my choice of using a class with a lot of static functions is a good approach or if a struct would be better. They seem like similar solutions but I’m not 100% sure the real difference past structs are value types and classes are reference types.
- I'm also not confident in how I used GCD, but I think that using a concurrent queue will be helpful if encoding or decoding a large string since this isn't updating a view.
- I’m happy to hear any suggestions on my approach to the algorithm itself
- My next step is to make this into a framework so it can be used by other code. Does my approach scale for this?
Thanks for any help!
class Huffman {
private var key = [String: String]()
private(set) var code = [String]()
lazy private var queue: DispatchQueue = {
return DispatchQueue(label: "huffman-queue", qos: .userInitiated, attributes: .concurrent)
}()
init(_ input: String) {
self.key = Huffman.getKey(for: input)
self.code = Huffman.encode(for: input, with: self.key)
}
func decode() -> String {
var word = ""
queue.sync { [unowned self] in
var reverseKey = [String:String]()
for (k, v) in self.key {
reverseKey[v] = k
}
for prefix in self.code {
if let letter = reverseKey[prefix] {
word += letter
}
}
}
return word
}
static private func getKey(for input: String) -> [String: String] {
// sort letter frequency by decreasing count
let sortedFrequency = Array(input)
.reduce(into: [String: Int](), { freq, char in
let letter = String(char)
return freq[letter] = freq[letter, default: 0] + 1
})
.sorted(by: {$0.1 > $1.1})
// create queue of initial Nodes
let queue = sortedFrequency.map{ Node(name: $0.key, value: $0.value)}
// generate key by traversing tree
return Huffman.generateKey(for: Huffman.createTree(with: queue), prefix: "")
}
static private func encode(for input: String, with key: [String: String]) -> [String] {
var code = [String]()
let queue = DispatchQueue(label: "huffman-encode-queue", qos: .userInitiated, attributes: .concurrent)
queue.sync {
for char in input {
if let prefix = key[String(char)] {
code.append(prefix)
}
}
}
return code
}
static private func generateKey(for node: Node, prefix: String) -> [String: String] {
var key = [String: String]()
if let left = node.left, let right = node.right {
key.merge(generateKey(for: left, prefix: prefix + "0"), uniquingKeysWith: {current,_ in current})
key.merge(generateKey(for: right, prefix: prefix + "1"), uniquingKeysWith: {current,_ in current})
}else {
key[node.name] = prefix
}
return key
}
static private func createTree(with queue: [Node]) -> Node {
var queue = queue
// until we have 1 root node, get subtree of least frequency
while queue.count > 1 {
let last = queue.count - 1
let node1 = queue[last]
let node2 = queue[last - 1]
// if we have a third then compare frequency to second
if let node3 = queue[safe: last - 2], node3.value + node2.value < node2.value + node1.value {
queue.remove(at: last - 1)
queue.remove(at: last - 2)
queue.append(Huffman.createRoot(with: node2, and: node3))
} else {
queue.removeLast()
queue.removeLast()
queue.append(Huffman.createRoot(with: node1, and: node2))
}
}
return queue[0]
}
static private func createRoot(with first: Node, and second: Node) -> Node {
return Node(name: "(first.name)(second.name)", value: first.value + second.value, left: first, right: second)
}
}
class Node {
var name: String
var value: Int
var left: Node?
var right: Node?
init(name: String, value: Int, left: Node? = nil, right: Node? = nil) {
self.name = name
self.value = value
self.left = left
self.right = right
}
}
extension Collection {
subscript (safe index: Index) -> Element? {
return indices.contains(index) ? self[index] : nil
}
}
algorithm swift compression
algorithm swift compression
edited Dec 5 at 21:07
Sᴀᴍ Onᴇᴌᴀ
8,11461751
8,11461751
asked Dec 4 at 19:10
Turnipdabeets
1265
1265
1
Here is an alternative implementation, you might glean some ideas from it.
– Carpsen90
Dec 4 at 22:30
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Sᴀᴍ Onᴇᴌᴀ
Dec 5 at 21:07
add a comment |
1
Here is an alternative implementation, you might glean some ideas from it.
– Carpsen90
Dec 4 at 22:30
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Sᴀᴍ Onᴇᴌᴀ
Dec 5 at 21:07
1
1
Here is an alternative implementation, you might glean some ideas from it.
– Carpsen90
Dec 4 at 22:30
Here is an alternative implementation, you might glean some ideas from it.
– Carpsen90
Dec 4 at 22:30
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Sᴀᴍ Onᴇᴌᴀ
Dec 5 at 21:07
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Sᴀᴍ Onᴇᴌᴀ
Dec 5 at 21:07
add a comment |
1 Answer
1
active
oldest
votes
up vote
1
down vote
accepted
Concurrency
In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.
Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).
Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.
The API
In its current form, the class seems of limited use to me.
let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to
- retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,
- retrieve the compressed string as a sequence of zeros and ones,
- an initializer which takes a (previously created) Huffman tree,
- a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.
Simplifications and other remarks
All properties in class Node are never mutated, and can be declared as constants (with let).
In func decode() the reverse dictionary mapping can be created with
let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))
instead of a loop. The following loop can be more simply done with compactMap():
let word = code.compactMap({ reverseKey[$0] }).joined()
In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:
let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})
In func encode(), compactMap() can be used again instead of a for loop:
let code = input.compactMap( { key[String($0)] } )
In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.
The Huffman tree algorithm
Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes
"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001
with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.
Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
– Turnipdabeets
Dec 4 at 21:47
@Turnipdabeets: Here is an example: You start with weights1, 1, 1, 1, 1, 1. Then the last two are joined:1, 1, 1, 1, 2. Then1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
– Martin R
Dec 4 at 22:22
Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
– Turnipdabeets
Dec 5 at 15:50
I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array forcodeor maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance varcodeis readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
– Turnipdabeets
Dec 5 at 15:51
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
1
down vote
accepted
Concurrency
In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.
Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).
Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.
The API
In its current form, the class seems of limited use to me.
let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to
- retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,
- retrieve the compressed string as a sequence of zeros and ones,
- an initializer which takes a (previously created) Huffman tree,
- a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.
Simplifications and other remarks
All properties in class Node are never mutated, and can be declared as constants (with let).
In func decode() the reverse dictionary mapping can be created with
let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))
instead of a loop. The following loop can be more simply done with compactMap():
let word = code.compactMap({ reverseKey[$0] }).joined()
In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:
let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})
In func encode(), compactMap() can be used again instead of a for loop:
let code = input.compactMap( { key[String($0)] } )
In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.
The Huffman tree algorithm
Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes
"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001
with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.
Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
– Turnipdabeets
Dec 4 at 21:47
@Turnipdabeets: Here is an example: You start with weights1, 1, 1, 1, 1, 1. Then the last two are joined:1, 1, 1, 1, 2. Then1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
– Martin R
Dec 4 at 22:22
Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
– Turnipdabeets
Dec 5 at 15:50
I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array forcodeor maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance varcodeis readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
– Turnipdabeets
Dec 5 at 15:51
add a comment |
up vote
1
down vote
accepted
Concurrency
In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.
Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).
Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.
The API
In its current form, the class seems of limited use to me.
let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to
- retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,
- retrieve the compressed string as a sequence of zeros and ones,
- an initializer which takes a (previously created) Huffman tree,
- a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.
Simplifications and other remarks
All properties in class Node are never mutated, and can be declared as constants (with let).
In func decode() the reverse dictionary mapping can be created with
let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))
instead of a loop. The following loop can be more simply done with compactMap():
let word = code.compactMap({ reverseKey[$0] }).joined()
In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:
let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})
In func encode(), compactMap() can be used again instead of a for loop:
let code = input.compactMap( { key[String($0)] } )
In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.
The Huffman tree algorithm
Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes
"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001
with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.
Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
– Turnipdabeets
Dec 4 at 21:47
@Turnipdabeets: Here is an example: You start with weights1, 1, 1, 1, 1, 1. Then the last two are joined:1, 1, 1, 1, 2. Then1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
– Martin R
Dec 4 at 22:22
Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
– Turnipdabeets
Dec 5 at 15:50
I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array forcodeor maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance varcodeis readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
– Turnipdabeets
Dec 5 at 15:51
add a comment |
up vote
1
down vote
accepted
up vote
1
down vote
accepted
Concurrency
In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.
Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).
Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.
The API
In its current form, the class seems of limited use to me.
let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to
- retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,
- retrieve the compressed string as a sequence of zeros and ones,
- an initializer which takes a (previously created) Huffman tree,
- a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.
Simplifications and other remarks
All properties in class Node are never mutated, and can be declared as constants (with let).
In func decode() the reverse dictionary mapping can be created with
let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))
instead of a loop. The following loop can be more simply done with compactMap():
let word = code.compactMap({ reverseKey[$0] }).joined()
In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:
let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})
In func encode(), compactMap() can be used again instead of a for loop:
let code = input.compactMap( { key[String($0)] } )
In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.
The Huffman tree algorithm
Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes
"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001
with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.
Concurrency
In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.
Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).
Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.
The API
In its current form, the class seems of limited use to me.
let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to
- retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,
- retrieve the compressed string as a sequence of zeros and ones,
- an initializer which takes a (previously created) Huffman tree,
- a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.
Simplifications and other remarks
All properties in class Node are never mutated, and can be declared as constants (with let).
In func decode() the reverse dictionary mapping can be created with
let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))
instead of a loop. The following loop can be more simply done with compactMap():
let word = code.compactMap({ reverseKey[$0] }).joined()
In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:
let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})
In func encode(), compactMap() can be used again instead of a for loop:
let code = input.compactMap( { key[String($0)] } )
In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.
The Huffman tree algorithm
Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes
"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001
with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.
edited Dec 4 at 22:53
answered Dec 4 at 21:37
Martin R
15.4k12264
15.4k12264
Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
– Turnipdabeets
Dec 4 at 21:47
@Turnipdabeets: Here is an example: You start with weights1, 1, 1, 1, 1, 1. Then the last two are joined:1, 1, 1, 1, 2. Then1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
– Martin R
Dec 4 at 22:22
Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
– Turnipdabeets
Dec 5 at 15:50
I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array forcodeor maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance varcodeis readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
– Turnipdabeets
Dec 5 at 15:51
add a comment |
Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
– Turnipdabeets
Dec 4 at 21:47
@Turnipdabeets: Here is an example: You start with weights1, 1, 1, 1, 1, 1. Then the last two are joined:1, 1, 1, 1, 2. Then1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
– Martin R
Dec 4 at 22:22
Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
– Turnipdabeets
Dec 5 at 15:50
I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array forcodeor maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance varcodeis readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
– Turnipdabeets
Dec 5 at 15:51
Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
– Turnipdabeets
Dec 4 at 21:47
Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
– Turnipdabeets
Dec 4 at 21:47
@Turnipdabeets: Here is an example: You start with weights
1, 1, 1, 1, 1, 1. Then the last two are joined: 1, 1, 1, 1, 2. Then 1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.– Martin R
Dec 4 at 22:22
@Turnipdabeets: Here is an example: You start with weights
1, 1, 1, 1, 1, 1. Then the last two are joined: 1, 1, 1, 1, 2. Then 1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.– Martin R
Dec 4 at 22:22
Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
– Turnipdabeets
Dec 5 at 15:50
Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
– Turnipdabeets
Dec 5 at 15:50
I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array for
code or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance var code is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.– Turnipdabeets
Dec 5 at 15:51
I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array for
code or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance var code is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.– Turnipdabeets
Dec 5 at 15:51
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209018%2fhuffman-coding-in-swift%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
1
Here is an alternative implementation, you might glean some ideas from it.
– Carpsen90
Dec 4 at 22:30
Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Sᴀᴍ Onᴇᴌᴀ
Dec 5 at 21:07