Print words in decreasing order of frequency in Scala
Given a string print its words in decreasing order of frequency.
Example:
i/p - "aa bbb ccc aa ddd aa ccc"
o/p - aa,ccc,ddd,bbb
Scala:
import collection.mutable.PriorityQueue
import collection.mutable.HashMap
object FindWordFreq extends App {
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
val words = "aa bbb ccc aa ddd aa ccc" split " +"
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
println(myHeap.dequeueAll.map(_.w).mkString(","))
}
I do think that we can just do with Priority Queue (and get rid of map), but could not come up with very clean code with that.
algorithm strings interview-questions functional-programming scala
add a comment |
Given a string print its words in decreasing order of frequency.
Example:
i/p - "aa bbb ccc aa ddd aa ccc"
o/p - aa,ccc,ddd,bbb
Scala:
import collection.mutable.PriorityQueue
import collection.mutable.HashMap
object FindWordFreq extends App {
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
val words = "aa bbb ccc aa ddd aa ccc" split " +"
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
println(myHeap.dequeueAll.map(_.w).mkString(","))
}
I do think that we can just do with Priority Queue (and get rid of map), but could not come up with very clean code with that.
algorithm strings interview-questions functional-programming scala
add a comment |
Given a string print its words in decreasing order of frequency.
Example:
i/p - "aa bbb ccc aa ddd aa ccc"
o/p - aa,ccc,ddd,bbb
Scala:
import collection.mutable.PriorityQueue
import collection.mutable.HashMap
object FindWordFreq extends App {
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
val words = "aa bbb ccc aa ddd aa ccc" split " +"
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
println(myHeap.dequeueAll.map(_.w).mkString(","))
}
I do think that we can just do with Priority Queue (and get rid of map), but could not come up with very clean code with that.
algorithm strings interview-questions functional-programming scala
Given a string print its words in decreasing order of frequency.
Example:
i/p - "aa bbb ccc aa ddd aa ccc"
o/p - aa,ccc,ddd,bbb
Scala:
import collection.mutable.PriorityQueue
import collection.mutable.HashMap
object FindWordFreq extends App {
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
val words = "aa bbb ccc aa ddd aa ccc" split " +"
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
println(myHeap.dequeueAll.map(_.w).mkString(","))
}
I do think that we can just do with Priority Queue (and get rid of map), but could not come up with very clean code with that.
algorithm strings interview-questions functional-programming scala
algorithm strings interview-questions functional-programming scala
edited 1 hour ago
Jamal♦
30.2k11116226
30.2k11116226
asked Dec 22 at 1:02
vikrant
476
476
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
Using just a priority queue
Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.
Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.
Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:
Use UpperCamelCase for class names
Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq
should be WordFreq
.
Use Scala syntax consistently
Scala let's you replace the dot and parentheses with a space in some case. x method param
is equivalent to x.method(param)
. In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"
), sometimes use just part of it c compareTo(that.c)
and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))
). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.
Embrace immutability
Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:
- https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures
- https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html
Prefer Ordering
over Ordered
Ordered
binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering
lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.
So instead of
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
I would write
case class WordFreq(w: String, c: Int)
val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
(x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)
you could make the mostFrequentFirstWordFreqOrdering
implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq
I would prefer to stay explicit and pass the ordering by-hand.
Use sorted/sortBy instead of hand-crafting a heap sort
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
myHeap.dequeueAll
Is basically a heap sort implementation. It could be replaced by the sorted
or the sortBy
methods of SeqLike
. It could look like
wordCount.toSeq.sortBy{case (word, count) => -count}
or if you find the -
and inelegant way of reversing the order, alternatives are:
wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)
or
wordCount.toSeq.sortBy{case (word, count) => count}.reverse
the last one being less efficient in terms of allocations.
countBy
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
can be replaced with
words.groupBy(x => x).mapValues(_.length)
In fact I observe this pattern so often, that in some project I have an "extension method" countBy
added to Traversable like this:
implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
def countBy[K, That](accessor: A => K): Map[K, Int] =
traversable.groupBy(accessor).mapValues(_.length)
}
Separate calculations from making side effects
The line println(myHeap.dequeueAll.map(_.w).mkString(","))
does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w)
and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).
Split by more than just spaces
In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W
is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+"
would be better than " +"
, although most likely there would still be some edge-cases where it would go wrong.
Final result
After all the suggested modifications the code could look like this:
object FindWordFreq extends App {
val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
val wordCounts = words.groupBy(x => x).mapValues(_.length)
val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
println(wordsByDescendingNumberOfOccurrences.mkString(","))
}
add a comment |
Your Answer
StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");
StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");
StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});
function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});
}
});
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210147%2fprint-words-in-decreasing-order-of-frequency-in-scala%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using just a priority queue
Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.
Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.
Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:
Use UpperCamelCase for class names
Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq
should be WordFreq
.
Use Scala syntax consistently
Scala let's you replace the dot and parentheses with a space in some case. x method param
is equivalent to x.method(param)
. In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"
), sometimes use just part of it c compareTo(that.c)
and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))
). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.
Embrace immutability
Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:
- https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures
- https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html
Prefer Ordering
over Ordered
Ordered
binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering
lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.
So instead of
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
I would write
case class WordFreq(w: String, c: Int)
val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
(x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)
you could make the mostFrequentFirstWordFreqOrdering
implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq
I would prefer to stay explicit and pass the ordering by-hand.
Use sorted/sortBy instead of hand-crafting a heap sort
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
myHeap.dequeueAll
Is basically a heap sort implementation. It could be replaced by the sorted
or the sortBy
methods of SeqLike
. It could look like
wordCount.toSeq.sortBy{case (word, count) => -count}
or if you find the -
and inelegant way of reversing the order, alternatives are:
wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)
or
wordCount.toSeq.sortBy{case (word, count) => count}.reverse
the last one being less efficient in terms of allocations.
countBy
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
can be replaced with
words.groupBy(x => x).mapValues(_.length)
In fact I observe this pattern so often, that in some project I have an "extension method" countBy
added to Traversable like this:
implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
def countBy[K, That](accessor: A => K): Map[K, Int] =
traversable.groupBy(accessor).mapValues(_.length)
}
Separate calculations from making side effects
The line println(myHeap.dequeueAll.map(_.w).mkString(","))
does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w)
and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).
Split by more than just spaces
In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W
is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+"
would be better than " +"
, although most likely there would still be some edge-cases where it would go wrong.
Final result
After all the suggested modifications the code could look like this:
object FindWordFreq extends App {
val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
val wordCounts = words.groupBy(x => x).mapValues(_.length)
val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
println(wordsByDescendingNumberOfOccurrences.mkString(","))
}
add a comment |
Using just a priority queue
Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.
Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.
Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:
Use UpperCamelCase for class names
Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq
should be WordFreq
.
Use Scala syntax consistently
Scala let's you replace the dot and parentheses with a space in some case. x method param
is equivalent to x.method(param)
. In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"
), sometimes use just part of it c compareTo(that.c)
and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))
). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.
Embrace immutability
Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:
- https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures
- https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html
Prefer Ordering
over Ordered
Ordered
binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering
lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.
So instead of
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
I would write
case class WordFreq(w: String, c: Int)
val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
(x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)
you could make the mostFrequentFirstWordFreqOrdering
implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq
I would prefer to stay explicit and pass the ordering by-hand.
Use sorted/sortBy instead of hand-crafting a heap sort
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
myHeap.dequeueAll
Is basically a heap sort implementation. It could be replaced by the sorted
or the sortBy
methods of SeqLike
. It could look like
wordCount.toSeq.sortBy{case (word, count) => -count}
or if you find the -
and inelegant way of reversing the order, alternatives are:
wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)
or
wordCount.toSeq.sortBy{case (word, count) => count}.reverse
the last one being less efficient in terms of allocations.
countBy
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
can be replaced with
words.groupBy(x => x).mapValues(_.length)
In fact I observe this pattern so often, that in some project I have an "extension method" countBy
added to Traversable like this:
implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
def countBy[K, That](accessor: A => K): Map[K, Int] =
traversable.groupBy(accessor).mapValues(_.length)
}
Separate calculations from making side effects
The line println(myHeap.dequeueAll.map(_.w).mkString(","))
does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w)
and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).
Split by more than just spaces
In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W
is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+"
would be better than " +"
, although most likely there would still be some edge-cases where it would go wrong.
Final result
After all the suggested modifications the code could look like this:
object FindWordFreq extends App {
val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
val wordCounts = words.groupBy(x => x).mapValues(_.length)
val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
println(wordsByDescendingNumberOfOccurrences.mkString(","))
}
add a comment |
Using just a priority queue
Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.
Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.
Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:
Use UpperCamelCase for class names
Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq
should be WordFreq
.
Use Scala syntax consistently
Scala let's you replace the dot and parentheses with a space in some case. x method param
is equivalent to x.method(param)
. In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"
), sometimes use just part of it c compareTo(that.c)
and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))
). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.
Embrace immutability
Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:
- https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures
- https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html
Prefer Ordering
over Ordered
Ordered
binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering
lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.
So instead of
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
I would write
case class WordFreq(w: String, c: Int)
val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
(x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)
you could make the mostFrequentFirstWordFreqOrdering
implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq
I would prefer to stay explicit and pass the ordering by-hand.
Use sorted/sortBy instead of hand-crafting a heap sort
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
myHeap.dequeueAll
Is basically a heap sort implementation. It could be replaced by the sorted
or the sortBy
methods of SeqLike
. It could look like
wordCount.toSeq.sortBy{case (word, count) => -count}
or if you find the -
and inelegant way of reversing the order, alternatives are:
wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)
or
wordCount.toSeq.sortBy{case (word, count) => count}.reverse
the last one being less efficient in terms of allocations.
countBy
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
can be replaced with
words.groupBy(x => x).mapValues(_.length)
In fact I observe this pattern so often, that in some project I have an "extension method" countBy
added to Traversable like this:
implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
def countBy[K, That](accessor: A => K): Map[K, Int] =
traversable.groupBy(accessor).mapValues(_.length)
}
Separate calculations from making side effects
The line println(myHeap.dequeueAll.map(_.w).mkString(","))
does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w)
and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).
Split by more than just spaces
In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W
is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+"
would be better than " +"
, although most likely there would still be some edge-cases where it would go wrong.
Final result
After all the suggested modifications the code could look like this:
object FindWordFreq extends App {
val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
val wordCounts = words.groupBy(x => x).mapValues(_.length)
val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
println(wordsByDescendingNumberOfOccurrences.mkString(","))
}
Using just a priority queue
Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.
Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.
Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:
Use UpperCamelCase for class names
Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq
should be WordFreq
.
Use Scala syntax consistently
Scala let's you replace the dot and parentheses with a space in some case. x method param
is equivalent to x.method(param)
. In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"
), sometimes use just part of it c compareTo(that.c)
and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))
). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.
Embrace immutability
Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:
- https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures
- https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html
Prefer Ordering
over Ordered
Ordered
binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering
lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.
So instead of
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
I would write
case class WordFreq(w: String, c: Int)
val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
(x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)
you could make the mostFrequentFirstWordFreqOrdering
implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq
I would prefer to stay explicit and pass the ordering by-hand.
Use sorted/sortBy instead of hand-crafting a heap sort
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
myHeap.dequeueAll
Is basically a heap sort implementation. It could be replaced by the sorted
or the sortBy
methods of SeqLike
. It could look like
wordCount.toSeq.sortBy{case (word, count) => -count}
or if you find the -
and inelegant way of reversing the order, alternatives are:
wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)
or
wordCount.toSeq.sortBy{case (word, count) => count}.reverse
the last one being less efficient in terms of allocations.
countBy
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
can be replaced with
words.groupBy(x => x).mapValues(_.length)
In fact I observe this pattern so often, that in some project I have an "extension method" countBy
added to Traversable like this:
implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
def countBy[K, That](accessor: A => K): Map[K, Int] =
traversable.groupBy(accessor).mapValues(_.length)
}
Separate calculations from making side effects
The line println(myHeap.dequeueAll.map(_.w).mkString(","))
does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w)
and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).
Split by more than just spaces
In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W
is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+"
would be better than " +"
, although most likely there would still be some edge-cases where it would go wrong.
Final result
After all the suggested modifications the code could look like this:
object FindWordFreq extends App {
val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
val wordCounts = words.groupBy(x => x).mapValues(_.length)
val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
println(wordsByDescendingNumberOfOccurrences.mkString(","))
}
answered 7 hours ago
Sumik
111
111
add a comment |
add a comment |
Thanks for contributing an answer to Code Review Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
Use MathJax to format equations. MathJax reference.
To learn more, see our tips on writing great answers.
Some of your past answers have not been well-received, and you're in danger of being blocked from answering.
Please pay close attention to the following guidance:
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210147%2fprint-words-in-decreasing-order-of-frequency-in-scala%23new-answer', 'question_page');
}
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown