Huffman Coding in Swift











up vote
1
down vote

favorite












I’m learning Swift and decided the Huffman Coding algorithm would be a good exercise while learning this new language. Below is a working version that encodes a string and decodes a string.



Here’s an example of how the API is used:



let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
let decode = huffEncoded.decode()



  1. I’d love to learn of anything I’m not doing that would be more swift-like.

  2. I’m wondering if my choice of using a class with a lot of static functions is a good approach or if a struct would be better. They seem like similar solutions but I’m not 100% sure the real difference past structs are value types and classes are reference types.

  3. I'm also not confident in how I used GCD, but I think that using a concurrent queue will be helpful if encoding or decoding a large string since this isn't updating a view.

  4. I’m happy to hear any suggestions on my approach to the algorithm itself

  5. My next step is to make this into a framework so it can be used by other code. Does my approach scale for this?


Thanks for any help!



class Huffman {
private var key = [String: String]()
private(set) var code = [String]()
lazy private var queue: DispatchQueue = {
return DispatchQueue(label: "huffman-queue", qos: .userInitiated, attributes: .concurrent)
}()

init(_ input: String) {
self.key = Huffman.getKey(for: input)
self.code = Huffman.encode(for: input, with: self.key)
}

func decode() -> String {
var word = ""
queue.sync { [unowned self] in
var reverseKey = [String:String]()
for (k, v) in self.key {
reverseKey[v] = k
}

for prefix in self.code {
if let letter = reverseKey[prefix] {
word += letter
}
}
}
return word
}

static private func getKey(for input: String) -> [String: String] {
// sort letter frequency by decreasing count
let sortedFrequency = Array(input)
.reduce(into: [String: Int](), { freq, char in
let letter = String(char)
return freq[letter] = freq[letter, default: 0] + 1
})
.sorted(by: {$0.1 > $1.1})
// create queue of initial Nodes
let queue = sortedFrequency.map{ Node(name: $0.key, value: $0.value)}
// generate key by traversing tree
return Huffman.generateKey(for: Huffman.createTree(with: queue), prefix: "")
}

static private func encode(for input: String, with key: [String: String]) -> [String] {
var code = [String]()
let queue = DispatchQueue(label: "huffman-encode-queue", qos: .userInitiated, attributes: .concurrent)
queue.sync {
for char in input {
if let prefix = key[String(char)] {
code.append(prefix)
}
}
}
return code
}

static private func generateKey(for node: Node, prefix: String) -> [String: String] {
var key = [String: String]()
if let left = node.left, let right = node.right {
key.merge(generateKey(for: left, prefix: prefix + "0"), uniquingKeysWith: {current,_ in current})
key.merge(generateKey(for: right, prefix: prefix + "1"), uniquingKeysWith: {current,_ in current})
}else {
key[node.name] = prefix
}
return key
}

static private func createTree(with queue: [Node]) -> Node {
var queue = queue
// until we have 1 root node, get subtree of least frequency
while queue.count > 1 {
let last = queue.count - 1
let node1 = queue[last]
let node2 = queue[last - 1]

// if we have a third then compare frequency to second
if let node3 = queue[safe: last - 2], node3.value + node2.value < node2.value + node1.value {
queue.remove(at: last - 1)
queue.remove(at: last - 2)
queue.append(Huffman.createRoot(with: node2, and: node3))
} else {
queue.removeLast()
queue.removeLast()
queue.append(Huffman.createRoot(with: node1, and: node2))
}
}
return queue[0]
}

static private func createRoot(with first: Node, and second: Node) -> Node {
return Node(name: "(first.name)(second.name)", value: first.value + second.value, left: first, right: second)
}

}

class Node {
var name: String
var value: Int
var left: Node?
var right: Node?

init(name: String, value: Int, left: Node? = nil, right: Node? = nil) {
self.name = name
self.value = value
self.left = left
self.right = right
}
}

extension Collection {
subscript (safe index: Index) -> Element? {
return indices.contains(index) ? self[index] : nil
}
}









share|improve this question




















  • 1




    Here is an alternative implementation, you might glean some ideas from it.
    – Carpsen90
    Dec 4 at 22:30










  • Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Sᴀᴍ Onᴇᴌᴀ
    Dec 5 at 21:07















up vote
1
down vote

favorite












I’m learning Swift and decided the Huffman Coding algorithm would be a good exercise while learning this new language. Below is a working version that encodes a string and decodes a string.



Here’s an example of how the API is used:



let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
let decode = huffEncoded.decode()



  1. I’d love to learn of anything I’m not doing that would be more swift-like.

  2. I’m wondering if my choice of using a class with a lot of static functions is a good approach or if a struct would be better. They seem like similar solutions but I’m not 100% sure the real difference past structs are value types and classes are reference types.

  3. I'm also not confident in how I used GCD, but I think that using a concurrent queue will be helpful if encoding or decoding a large string since this isn't updating a view.

  4. I’m happy to hear any suggestions on my approach to the algorithm itself

  5. My next step is to make this into a framework so it can be used by other code. Does my approach scale for this?


Thanks for any help!



class Huffman {
private var key = [String: String]()
private(set) var code = [String]()
lazy private var queue: DispatchQueue = {
return DispatchQueue(label: "huffman-queue", qos: .userInitiated, attributes: .concurrent)
}()

init(_ input: String) {
self.key = Huffman.getKey(for: input)
self.code = Huffman.encode(for: input, with: self.key)
}

func decode() -> String {
var word = ""
queue.sync { [unowned self] in
var reverseKey = [String:String]()
for (k, v) in self.key {
reverseKey[v] = k
}

for prefix in self.code {
if let letter = reverseKey[prefix] {
word += letter
}
}
}
return word
}

static private func getKey(for input: String) -> [String: String] {
// sort letter frequency by decreasing count
let sortedFrequency = Array(input)
.reduce(into: [String: Int](), { freq, char in
let letter = String(char)
return freq[letter] = freq[letter, default: 0] + 1
})
.sorted(by: {$0.1 > $1.1})
// create queue of initial Nodes
let queue = sortedFrequency.map{ Node(name: $0.key, value: $0.value)}
// generate key by traversing tree
return Huffman.generateKey(for: Huffman.createTree(with: queue), prefix: "")
}

static private func encode(for input: String, with key: [String: String]) -> [String] {
var code = [String]()
let queue = DispatchQueue(label: "huffman-encode-queue", qos: .userInitiated, attributes: .concurrent)
queue.sync {
for char in input {
if let prefix = key[String(char)] {
code.append(prefix)
}
}
}
return code
}

static private func generateKey(for node: Node, prefix: String) -> [String: String] {
var key = [String: String]()
if let left = node.left, let right = node.right {
key.merge(generateKey(for: left, prefix: prefix + "0"), uniquingKeysWith: {current,_ in current})
key.merge(generateKey(for: right, prefix: prefix + "1"), uniquingKeysWith: {current,_ in current})
}else {
key[node.name] = prefix
}
return key
}

static private func createTree(with queue: [Node]) -> Node {
var queue = queue
// until we have 1 root node, get subtree of least frequency
while queue.count > 1 {
let last = queue.count - 1
let node1 = queue[last]
let node2 = queue[last - 1]

// if we have a third then compare frequency to second
if let node3 = queue[safe: last - 2], node3.value + node2.value < node2.value + node1.value {
queue.remove(at: last - 1)
queue.remove(at: last - 2)
queue.append(Huffman.createRoot(with: node2, and: node3))
} else {
queue.removeLast()
queue.removeLast()
queue.append(Huffman.createRoot(with: node1, and: node2))
}
}
return queue[0]
}

static private func createRoot(with first: Node, and second: Node) -> Node {
return Node(name: "(first.name)(second.name)", value: first.value + second.value, left: first, right: second)
}

}

class Node {
var name: String
var value: Int
var left: Node?
var right: Node?

init(name: String, value: Int, left: Node? = nil, right: Node? = nil) {
self.name = name
self.value = value
self.left = left
self.right = right
}
}

extension Collection {
subscript (safe index: Index) -> Element? {
return indices.contains(index) ? self[index] : nil
}
}









share|improve this question




















  • 1




    Here is an alternative implementation, you might glean some ideas from it.
    – Carpsen90
    Dec 4 at 22:30










  • Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Sᴀᴍ Onᴇᴌᴀ
    Dec 5 at 21:07













up vote
1
down vote

favorite









up vote
1
down vote

favorite











I’m learning Swift and decided the Huffman Coding algorithm would be a good exercise while learning this new language. Below is a working version that encodes a string and decodes a string.



Here’s an example of how the API is used:



let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
let decode = huffEncoded.decode()



  1. I’d love to learn of anything I’m not doing that would be more swift-like.

  2. I’m wondering if my choice of using a class with a lot of static functions is a good approach or if a struct would be better. They seem like similar solutions but I’m not 100% sure the real difference past structs are value types and classes are reference types.

  3. I'm also not confident in how I used GCD, but I think that using a concurrent queue will be helpful if encoding or decoding a large string since this isn't updating a view.

  4. I’m happy to hear any suggestions on my approach to the algorithm itself

  5. My next step is to make this into a framework so it can be used by other code. Does my approach scale for this?


Thanks for any help!



class Huffman {
private var key = [String: String]()
private(set) var code = [String]()
lazy private var queue: DispatchQueue = {
return DispatchQueue(label: "huffman-queue", qos: .userInitiated, attributes: .concurrent)
}()

init(_ input: String) {
self.key = Huffman.getKey(for: input)
self.code = Huffman.encode(for: input, with: self.key)
}

func decode() -> String {
var word = ""
queue.sync { [unowned self] in
var reverseKey = [String:String]()
for (k, v) in self.key {
reverseKey[v] = k
}

for prefix in self.code {
if let letter = reverseKey[prefix] {
word += letter
}
}
}
return word
}

static private func getKey(for input: String) -> [String: String] {
// sort letter frequency by decreasing count
let sortedFrequency = Array(input)
.reduce(into: [String: Int](), { freq, char in
let letter = String(char)
return freq[letter] = freq[letter, default: 0] + 1
})
.sorted(by: {$0.1 > $1.1})
// create queue of initial Nodes
let queue = sortedFrequency.map{ Node(name: $0.key, value: $0.value)}
// generate key by traversing tree
return Huffman.generateKey(for: Huffman.createTree(with: queue), prefix: "")
}

static private func encode(for input: String, with key: [String: String]) -> [String] {
var code = [String]()
let queue = DispatchQueue(label: "huffman-encode-queue", qos: .userInitiated, attributes: .concurrent)
queue.sync {
for char in input {
if let prefix = key[String(char)] {
code.append(prefix)
}
}
}
return code
}

static private func generateKey(for node: Node, prefix: String) -> [String: String] {
var key = [String: String]()
if let left = node.left, let right = node.right {
key.merge(generateKey(for: left, prefix: prefix + "0"), uniquingKeysWith: {current,_ in current})
key.merge(generateKey(for: right, prefix: prefix + "1"), uniquingKeysWith: {current,_ in current})
}else {
key[node.name] = prefix
}
return key
}

static private func createTree(with queue: [Node]) -> Node {
var queue = queue
// until we have 1 root node, get subtree of least frequency
while queue.count > 1 {
let last = queue.count - 1
let node1 = queue[last]
let node2 = queue[last - 1]

// if we have a third then compare frequency to second
if let node3 = queue[safe: last - 2], node3.value + node2.value < node2.value + node1.value {
queue.remove(at: last - 1)
queue.remove(at: last - 2)
queue.append(Huffman.createRoot(with: node2, and: node3))
} else {
queue.removeLast()
queue.removeLast()
queue.append(Huffman.createRoot(with: node1, and: node2))
}
}
return queue[0]
}

static private func createRoot(with first: Node, and second: Node) -> Node {
return Node(name: "(first.name)(second.name)", value: first.value + second.value, left: first, right: second)
}

}

class Node {
var name: String
var value: Int
var left: Node?
var right: Node?

init(name: String, value: Int, left: Node? = nil, right: Node? = nil) {
self.name = name
self.value = value
self.left = left
self.right = right
}
}

extension Collection {
subscript (safe index: Index) -> Element? {
return indices.contains(index) ? self[index] : nil
}
}









share|improve this question















I’m learning Swift and decided the Huffman Coding algorithm would be a good exercise while learning this new language. Below is a working version that encodes a string and decodes a string.



Here’s an example of how the API is used:



let huffEncoded = Huffman("MISSISSIPPI_RIVER!")
let decode = huffEncoded.decode()



  1. I’d love to learn of anything I’m not doing that would be more swift-like.

  2. I’m wondering if my choice of using a class with a lot of static functions is a good approach or if a struct would be better. They seem like similar solutions but I’m not 100% sure the real difference past structs are value types and classes are reference types.

  3. I'm also not confident in how I used GCD, but I think that using a concurrent queue will be helpful if encoding or decoding a large string since this isn't updating a view.

  4. I’m happy to hear any suggestions on my approach to the algorithm itself

  5. My next step is to make this into a framework so it can be used by other code. Does my approach scale for this?


Thanks for any help!



class Huffman {
private var key = [String: String]()
private(set) var code = [String]()
lazy private var queue: DispatchQueue = {
return DispatchQueue(label: "huffman-queue", qos: .userInitiated, attributes: .concurrent)
}()

init(_ input: String) {
self.key = Huffman.getKey(for: input)
self.code = Huffman.encode(for: input, with: self.key)
}

func decode() -> String {
var word = ""
queue.sync { [unowned self] in
var reverseKey = [String:String]()
for (k, v) in self.key {
reverseKey[v] = k
}

for prefix in self.code {
if let letter = reverseKey[prefix] {
word += letter
}
}
}
return word
}

static private func getKey(for input: String) -> [String: String] {
// sort letter frequency by decreasing count
let sortedFrequency = Array(input)
.reduce(into: [String: Int](), { freq, char in
let letter = String(char)
return freq[letter] = freq[letter, default: 0] + 1
})
.sorted(by: {$0.1 > $1.1})
// create queue of initial Nodes
let queue = sortedFrequency.map{ Node(name: $0.key, value: $0.value)}
// generate key by traversing tree
return Huffman.generateKey(for: Huffman.createTree(with: queue), prefix: "")
}

static private func encode(for input: String, with key: [String: String]) -> [String] {
var code = [String]()
let queue = DispatchQueue(label: "huffman-encode-queue", qos: .userInitiated, attributes: .concurrent)
queue.sync {
for char in input {
if let prefix = key[String(char)] {
code.append(prefix)
}
}
}
return code
}

static private func generateKey(for node: Node, prefix: String) -> [String: String] {
var key = [String: String]()
if let left = node.left, let right = node.right {
key.merge(generateKey(for: left, prefix: prefix + "0"), uniquingKeysWith: {current,_ in current})
key.merge(generateKey(for: right, prefix: prefix + "1"), uniquingKeysWith: {current,_ in current})
}else {
key[node.name] = prefix
}
return key
}

static private func createTree(with queue: [Node]) -> Node {
var queue = queue
// until we have 1 root node, get subtree of least frequency
while queue.count > 1 {
let last = queue.count - 1
let node1 = queue[last]
let node2 = queue[last - 1]

// if we have a third then compare frequency to second
if let node3 = queue[safe: last - 2], node3.value + node2.value < node2.value + node1.value {
queue.remove(at: last - 1)
queue.remove(at: last - 2)
queue.append(Huffman.createRoot(with: node2, and: node3))
} else {
queue.removeLast()
queue.removeLast()
queue.append(Huffman.createRoot(with: node1, and: node2))
}
}
return queue[0]
}

static private func createRoot(with first: Node, and second: Node) -> Node {
return Node(name: "(first.name)(second.name)", value: first.value + second.value, left: first, right: second)
}

}

class Node {
var name: String
var value: Int
var left: Node?
var right: Node?

init(name: String, value: Int, left: Node? = nil, right: Node? = nil) {
self.name = name
self.value = value
self.left = left
self.right = right
}
}

extension Collection {
subscript (safe index: Index) -> Element? {
return indices.contains(index) ? self[index] : nil
}
}






algorithm swift compression






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 5 at 21:07









Sᴀᴍ Onᴇᴌᴀ

8,11461751




8,11461751










asked Dec 4 at 19:10









Turnipdabeets

1265




1265








  • 1




    Here is an alternative implementation, you might glean some ideas from it.
    – Carpsen90
    Dec 4 at 22:30










  • Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Sᴀᴍ Onᴇᴌᴀ
    Dec 5 at 21:07














  • 1




    Here is an alternative implementation, you might glean some ideas from it.
    – Carpsen90
    Dec 4 at 22:30










  • Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
    – Sᴀᴍ Onᴇᴌᴀ
    Dec 5 at 21:07








1




1




Here is an alternative implementation, you might glean some ideas from it.
– Carpsen90
Dec 4 at 22:30




Here is an alternative implementation, you might glean some ideas from it.
– Carpsen90
Dec 4 at 22:30












Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Sᴀᴍ Onᴇᴌᴀ
Dec 5 at 21:07




Please do not update the code in your question to incorporate feedback from answers, doing so goes against the Question + Answer style of Code Review. This is not a forum where you should keep the most updated version in your question. Please see what you may and may not do after receiving answers.
– Sᴀᴍ Onᴇᴌᴀ
Dec 5 at 21:07










1 Answer
1






active

oldest

votes

















up vote
1
down vote



accepted










Concurrency



In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.



Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).



Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.



The API



In its current form, the class seems of limited use to me.



let huffEncoded = Huffman("MISSISSIPPI_RIVER!")


builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to




  • retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,

  • retrieve the compressed string as a sequence of zeros and ones,

  • an initializer which takes a (previously created) Huffman tree,

  • a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.


Simplifications and other remarks



All properties in class Node are never mutated, and can be declared as constants (with let).



In func decode() the reverse dictionary mapping can be created with



let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))


instead of a loop. The following loop can be more simply done with compactMap():



let word = code.compactMap({ reverseKey[$0] }).joined()


In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:



let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})


In func encode(), compactMap() can be used again instead of a for loop:



let code = input.compactMap( { key[String($0)] } )


In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.



The Huffman tree algorithm



Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes



"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001


with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.






share|improve this answer























  • Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
    – Turnipdabeets
    Dec 4 at 21:47










  • @Turnipdabeets: Here is an example: You start with weights 1, 1, 1, 1, 1, 1. Then the last two are joined: 1, 1, 1, 1, 2. Then 1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
    – Martin R
    Dec 4 at 22:22










  • Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
    – Turnipdabeets
    Dec 5 at 15:50












  • I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array for code or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance var code is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
    – Turnipdabeets
    Dec 5 at 15:51











Your Answer





StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














draft saved

draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209018%2fhuffman-coding-in-swift%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
1
down vote



accepted










Concurrency



In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.



Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).



Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.



The API



In its current form, the class seems of limited use to me.



let huffEncoded = Huffman("MISSISSIPPI_RIVER!")


builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to




  • retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,

  • retrieve the compressed string as a sequence of zeros and ones,

  • an initializer which takes a (previously created) Huffman tree,

  • a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.


Simplifications and other remarks



All properties in class Node are never mutated, and can be declared as constants (with let).



In func decode() the reverse dictionary mapping can be created with



let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))


instead of a loop. The following loop can be more simply done with compactMap():



let word = code.compactMap({ reverseKey[$0] }).joined()


In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:



let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})


In func encode(), compactMap() can be used again instead of a for loop:



let code = input.compactMap( { key[String($0)] } )


In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.



The Huffman tree algorithm



Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes



"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001


with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.






share|improve this answer























  • Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
    – Turnipdabeets
    Dec 4 at 21:47










  • @Turnipdabeets: Here is an example: You start with weights 1, 1, 1, 1, 1, 1. Then the last two are joined: 1, 1, 1, 1, 2. Then 1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
    – Martin R
    Dec 4 at 22:22










  • Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
    – Turnipdabeets
    Dec 5 at 15:50












  • I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array for code or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance var code is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
    – Turnipdabeets
    Dec 5 at 15:51















up vote
1
down vote



accepted










Concurrency



In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.



Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).



Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.



The API



In its current form, the class seems of limited use to me.



let huffEncoded = Huffman("MISSISSIPPI_RIVER!")


builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to




  • retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,

  • retrieve the compressed string as a sequence of zeros and ones,

  • an initializer which takes a (previously created) Huffman tree,

  • a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.


Simplifications and other remarks



All properties in class Node are never mutated, and can be declared as constants (with let).



In func decode() the reverse dictionary mapping can be created with



let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))


instead of a loop. The following loop can be more simply done with compactMap():



let word = code.compactMap({ reverseKey[$0] }).joined()


In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:



let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})


In func encode(), compactMap() can be used again instead of a for loop:



let code = input.compactMap( { key[String($0)] } )


In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.



The Huffman tree algorithm



Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes



"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001


with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.






share|improve this answer























  • Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
    – Turnipdabeets
    Dec 4 at 21:47










  • @Turnipdabeets: Here is an example: You start with weights 1, 1, 1, 1, 1, 1. Then the last two are joined: 1, 1, 1, 1, 2. Then 1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
    – Martin R
    Dec 4 at 22:22










  • Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
    – Turnipdabeets
    Dec 5 at 15:50












  • I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array for code or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance var code is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
    – Turnipdabeets
    Dec 5 at 15:51













up vote
1
down vote



accepted







up vote
1
down vote



accepted






Concurrency



In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.



Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).



Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.



The API



In its current form, the class seems of limited use to me.



let huffEncoded = Huffman("MISSISSIPPI_RIVER!")


builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to




  • retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,

  • retrieve the compressed string as a sequence of zeros and ones,

  • an initializer which takes a (previously created) Huffman tree,

  • a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.


Simplifications and other remarks



All properties in class Node are never mutated, and can be declared as constants (with let).



In func decode() the reverse dictionary mapping can be created with



let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))


instead of a loop. The following loop can be more simply done with compactMap():



let word = code.compactMap({ reverseKey[$0] }).joined()


In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:



let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})


In func encode(), compactMap() can be used again instead of a for loop:



let code = input.compactMap( { key[String($0)] } )


In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.



The Huffman tree algorithm



Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes



"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001


with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.






share|improve this answer














Concurrency



In the encode and decode method you use queue.sync() to dispatch code to a concurrent queue – which means that the current queue is blocked until the computation is finished. If that is the main queue in an iOS or macOS application, UI updates and event handling are blocked for that time.



Therefore using a dispatch queue does not really help here. Also the Huffman encoder cannot know in which context it is executed (such as “user initiated”).



Dispatching the code with GCD – if necessary – should be done by the caller of these methods, not in the Huffman class itself.



The API



In its current form, the class seems of limited use to me.



let huffEncoded = Huffman("MISSISSIPPI_RIVER!")


builds the Huffman tree and encodes the given string and stores the results in private member variables. The only thing that I can do with the result is to decode it again. What I am missing are methods to




  • retrieve the generated Huffman tree so that it can be stored (perhaps in some serialized form) for later decompression,

  • retrieve the compressed string as a sequence of zeros and ones,

  • an initializer which takes a (previously created) Huffman tree,

  • a decode methods which takes a previously compressed string and decodes it, using the given Huffman tree.


Simplifications and other remarks



All properties in class Node are never mutated, and can be declared as constants (with let).



In func decode() the reverse dictionary mapping can be created with



let reverseKey = Dictionary(uniqueKeysWithValues: zip(key.values, key.keys))


instead of a loop. The following loop can be more simply done with compactMap():



let word = code.compactMap({ reverseKey[$0] }).joined()


In func getKey() it is not necessary to create an array from the given string. The return in the closure of the reduce() call is misleading: An assignment does not return anything. += can be used for the increment. Using $n.value instead of $n.1$ in the sort predicate emphasizes that the dictionary is sorted according to its values:



let sortedFrequency = input.reduce(into: [String: Int](), { freq, char in
freq[String(char), default: 0] += 1
})
.sorted(by: {$0.value > $1.value})


In func encode(), compactMap() can be used again instead of a for loop:



let code = input.compactMap( { key[String($0)] } )


In func createTree() a “safe subscript” method (defined as an extension
of the Collection protocol) is used to access the third last queue element if it exists. Testing if queue.count >= 3 would seem more clear to me.



The Huffman tree algorithm



Your algorithm does not generate the optimal code. The reason is that at each step it only considers the last two or three nodes, not the two nodes with the minimal total weight. Here is an example: For the string
"ABCDEFGH" (i.e. 8 distinct characters with equal freqency) your
program generates the codes



"D": 01
"G": 11
"F": 001
"E": 101
"A": 0001
"C": 0000
"H": 1000
"B": 1001


with an average code length of 26/8 = 3.25. (Your output can be different because hash values are randomized since Swift 4.2.) The optimal tree in this case would be a balanced binary tree, where every code has length 3.







share|improve this answer














share|improve this answer



share|improve this answer








edited Dec 4 at 22:53

























answered Dec 4 at 21:37









Martin R

15.4k12264




15.4k12264












  • Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
    – Turnipdabeets
    Dec 4 at 21:47










  • @Turnipdabeets: Here is an example: You start with weights 1, 1, 1, 1, 1, 1. Then the last two are joined: 1, 1, 1, 1, 2. Then 1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
    – Martin R
    Dec 4 at 22:22










  • Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
    – Turnipdabeets
    Dec 5 at 15:50












  • I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array for code or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance var code is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
    – Turnipdabeets
    Dec 5 at 15:51


















  • Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
    – Turnipdabeets
    Dec 4 at 21:47










  • @Turnipdabeets: Here is an example: You start with weights 1, 1, 1, 1, 1, 1. Then the last two are joined: 1, 1, 1, 1, 2. Then 1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
    – Martin R
    Dec 4 at 22:22










  • Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
    – Turnipdabeets
    Dec 5 at 15:50












  • I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array for code or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance var code is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
    – Turnipdabeets
    Dec 5 at 15:51
















Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
– Turnipdabeets
Dec 4 at 21:47




Thanks! I need to spend some time thinking through all your suggestions. Wouldn't the last two or three nodes always be of minimal total weight if I've sorted in that order?
– Turnipdabeets
Dec 4 at 21:47












@Turnipdabeets: Here is an example: You start with weights 1, 1, 1, 1, 1, 1. Then the last two are joined: 1, 1, 1, 1, 2. Then 1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
– Martin R
Dec 4 at 22:22




@Turnipdabeets: Here is an example: You start with weights 1, 1, 1, 1, 1, 1. Then the last two are joined: 1, 1, 1, 1, 2. Then 1, 1, 2, 2. Now the first two should be joined, but your algorithm looks only at the last three. – Also for nodes with equal weight the one with lower height should be chosen.
– Martin R
Dec 4 at 22:22












Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
– Turnipdabeets
Dec 5 at 15:50






Thanks for that example. I’ll rework on the weights. Can you elaborate on the API improvements? An initializer that takes a previously created Huffman tree is a great idea. But is it better to retrieve and initialize with the tree or the key that I generate from the tree? My thought was that the key is O(n) look up so better than the tree.
– Turnipdabeets
Dec 5 at 15:50














I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array for code or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance var code is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
– Turnipdabeets
Dec 5 at 15:51




I was also thinking maybe the encrypted code should contain the tree/key rather than separate them… maybe as the first element in the array for code or maybe return a struct with these two values. What are your thoughts about keeping the values together like that? The instance var code is readable but its an array of zeros and ones. Is there a benefit of having a string of zeros and ones vs an array? I’ll have to google around for compressing the string or array of strings.
– Turnipdabeets
Dec 5 at 15:51


















draft saved

draft discarded




















































Thanks for contributing an answer to Code Review Stack Exchange!


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


Use MathJax to format equations. MathJax reference.


To learn more, see our tips on writing great answers.





Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


Please pay close attention to the following guidance:


  • Please be sure to answer the question. Provide details and share your research!

But avoid



  • Asking for help, clarification, or responding to other answers.

  • Making statements based on opinion; back them up with references or personal experience.


To learn more, see our tips on writing great answers.




draft saved


draft discarded














StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f209018%2fhuffman-coding-in-swift%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown





















































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown

































Required, but never shown














Required, but never shown












Required, but never shown







Required, but never shown







Popular posts from this blog

Terni

A new problem with tex4ht and tikz

Sun Ra