Print words in decreasing order of frequency in Scala












0















Given a string print its words in decreasing order of frequency.



Example:



i/p - "aa bbb  ccc aa ddd aa ccc"
o/p - aa,ccc,ddd,bbb



Scala:



import collection.mutable.PriorityQueue
import collection.mutable.HashMap

object FindWordFreq extends App {
case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
override def compare(that: wordFreq): Int = c compareTo(that.c)
}
val words = "aa bbb ccc aa ddd aa ccc" split " +"
val wordCount = HashMap[String, Int]()
for (word <- words) {
wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
}
val myHeap = PriorityQueue[wordFreq]()
wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
println(myHeap.dequeueAll.map(_.w).mkString(","))
}


I do think that we can just do with Priority Queue (and get rid of map), but could not come up with very clean code with that.










share|improve this question





























    0















    Given a string print its words in decreasing order of frequency.



    Example:



    i/p - "aa bbb  ccc aa ddd aa ccc"
    o/p - aa,ccc,ddd,bbb



    Scala:



    import collection.mutable.PriorityQueue
    import collection.mutable.HashMap

    object FindWordFreq extends App {
    case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
    override def compare(that: wordFreq): Int = c compareTo(that.c)
    }
    val words = "aa bbb ccc aa ddd aa ccc" split " +"
    val wordCount = HashMap[String, Int]()
    for (word <- words) {
    wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
    }
    val myHeap = PriorityQueue[wordFreq]()
    wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
    println(myHeap.dequeueAll.map(_.w).mkString(","))
    }


    I do think that we can just do with Priority Queue (and get rid of map), but could not come up with very clean code with that.










    share|improve this question



























      0












      0








      0








      Given a string print its words in decreasing order of frequency.



      Example:



      i/p - "aa bbb  ccc aa ddd aa ccc"
      o/p - aa,ccc,ddd,bbb



      Scala:



      import collection.mutable.PriorityQueue
      import collection.mutable.HashMap

      object FindWordFreq extends App {
      case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
      override def compare(that: wordFreq): Int = c compareTo(that.c)
      }
      val words = "aa bbb ccc aa ddd aa ccc" split " +"
      val wordCount = HashMap[String, Int]()
      for (word <- words) {
      wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
      }
      val myHeap = PriorityQueue[wordFreq]()
      wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
      println(myHeap.dequeueAll.map(_.w).mkString(","))
      }


      I do think that we can just do with Priority Queue (and get rid of map), but could not come up with very clean code with that.










      share|improve this question
















      Given a string print its words in decreasing order of frequency.



      Example:



      i/p - "aa bbb  ccc aa ddd aa ccc"
      o/p - aa,ccc,ddd,bbb



      Scala:



      import collection.mutable.PriorityQueue
      import collection.mutable.HashMap

      object FindWordFreq extends App {
      case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
      override def compare(that: wordFreq): Int = c compareTo(that.c)
      }
      val words = "aa bbb ccc aa ddd aa ccc" split " +"
      val wordCount = HashMap[String, Int]()
      for (word <- words) {
      wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
      }
      val myHeap = PriorityQueue[wordFreq]()
      wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
      println(myHeap.dequeueAll.map(_.w).mkString(","))
      }


      I do think that we can just do with Priority Queue (and get rid of map), but could not come up with very clean code with that.







      algorithm strings interview-questions functional-programming scala






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited 1 hour ago









      Jamal

      30.2k11116226




      30.2k11116226










      asked Dec 22 at 1:02









      vikrant

      476




      476






















          1 Answer
          1






          active

          oldest

          votes


















          0














          Using just a priority queue



          Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.



          Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.



          Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:



          Use UpperCamelCase for class names



          Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq should be WordFreq.



          Use Scala syntax consistently



          Scala let's you replace the dot and parentheses with a space in some case. x method param is equivalent to x.method(param). In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"), sometimes use just part of it c compareTo(that.c) and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.



          Embrace immutability



          Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:




          • https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures

          • https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html


          Prefer Ordering over Ordered



          Ordered binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.



          So instead of



          case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
          override def compare(that: wordFreq): Int = c compareTo(that.c)
          }


          I would write



          case class WordFreq(w: String, c: Int)
          val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
          (x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)


          you could make the mostFrequentFirstWordFreqOrdering implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq I would prefer to stay explicit and pass the ordering by-hand.



          Use sorted/sortBy instead of hand-crafting a heap sort



          val myHeap = PriorityQueue[wordFreq]()
          wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
          myHeap.dequeueAll


          Is basically a heap sort implementation. It could be replaced by the sorted or the sortBy methods of SeqLike. It could look like



          wordCount.toSeq.sortBy{case (word, count) => -count}


          or if you find the - and inelegant way of reversing the order, alternatives are:



          wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)


          or



          wordCount.toSeq.sortBy{case (word, count) => count}.reverse


          the last one being less efficient in terms of allocations.



          countBy



          val wordCount = HashMap[String, Int]()
          for (word <- words) {
          wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
          }


          can be replaced with



          words.groupBy(x => x).mapValues(_.length)


          In fact I observe this pattern so often, that in some project I have an "extension method" countBy added to Traversable like this:



          implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
          def countBy[K, That](accessor: A => K): Map[K, Int] =
          traversable.groupBy(accessor).mapValues(_.length)
          }


          Separate calculations from making side effects



          The line println(myHeap.dequeueAll.map(_.w).mkString(",")) does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w) and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).



          Split by more than just spaces
          In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+" would be better than " +", although most likely there would still be some edge-cases where it would go wrong.



          Final result



          After all the suggested modifications the code could look like this:



          object FindWordFreq extends App {
          val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
          val wordCounts = words.groupBy(x => x).mapValues(_.length)
          val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
          println(wordsByDescendingNumberOfOccurrences.mkString(","))
          }





          share|improve this answer





















            Your Answer





            StackExchange.ifUsing("editor", function () {
            return StackExchange.using("mathjaxEditing", function () {
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
            });
            });
            }, "mathjax-editing");

            StackExchange.ifUsing("editor", function () {
            StackExchange.using("externalEditor", function () {
            StackExchange.using("snippets", function () {
            StackExchange.snippets.init();
            });
            });
            }, "code-snippets");

            StackExchange.ready(function() {
            var channelOptions = {
            tags: "".split(" "),
            id: "196"
            };
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function() {
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled) {
            StackExchange.using("snippets", function() {
            createEditor();
            });
            }
            else {
            createEditor();
            }
            });

            function createEditor() {
            StackExchange.prepareEditor({
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader: {
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            },
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            });


            }
            });














            draft saved

            draft discarded


















            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210147%2fprint-words-in-decreasing-order-of-frequency-in-scala%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            0














            Using just a priority queue



            Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.



            Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.



            Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:



            Use UpperCamelCase for class names



            Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq should be WordFreq.



            Use Scala syntax consistently



            Scala let's you replace the dot and parentheses with a space in some case. x method param is equivalent to x.method(param). In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"), sometimes use just part of it c compareTo(that.c) and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.



            Embrace immutability



            Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:




            • https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures

            • https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html


            Prefer Ordering over Ordered



            Ordered binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.



            So instead of



            case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
            override def compare(that: wordFreq): Int = c compareTo(that.c)
            }


            I would write



            case class WordFreq(w: String, c: Int)
            val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
            (x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)


            you could make the mostFrequentFirstWordFreqOrdering implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq I would prefer to stay explicit and pass the ordering by-hand.



            Use sorted/sortBy instead of hand-crafting a heap sort



            val myHeap = PriorityQueue[wordFreq]()
            wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
            myHeap.dequeueAll


            Is basically a heap sort implementation. It could be replaced by the sorted or the sortBy methods of SeqLike. It could look like



            wordCount.toSeq.sortBy{case (word, count) => -count}


            or if you find the - and inelegant way of reversing the order, alternatives are:



            wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)


            or



            wordCount.toSeq.sortBy{case (word, count) => count}.reverse


            the last one being less efficient in terms of allocations.



            countBy



            val wordCount = HashMap[String, Int]()
            for (word <- words) {
            wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
            }


            can be replaced with



            words.groupBy(x => x).mapValues(_.length)


            In fact I observe this pattern so often, that in some project I have an "extension method" countBy added to Traversable like this:



            implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
            def countBy[K, That](accessor: A => K): Map[K, Int] =
            traversable.groupBy(accessor).mapValues(_.length)
            }


            Separate calculations from making side effects



            The line println(myHeap.dequeueAll.map(_.w).mkString(",")) does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w) and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).



            Split by more than just spaces
            In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+" would be better than " +", although most likely there would still be some edge-cases where it would go wrong.



            Final result



            After all the suggested modifications the code could look like this:



            object FindWordFreq extends App {
            val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
            val wordCounts = words.groupBy(x => x).mapValues(_.length)
            val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
            println(wordsByDescendingNumberOfOccurrences.mkString(","))
            }





            share|improve this answer


























              0














              Using just a priority queue



              Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.



              Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.



              Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:



              Use UpperCamelCase for class names



              Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq should be WordFreq.



              Use Scala syntax consistently



              Scala let's you replace the dot and parentheses with a space in some case. x method param is equivalent to x.method(param). In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"), sometimes use just part of it c compareTo(that.c) and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.



              Embrace immutability



              Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:




              • https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures

              • https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html


              Prefer Ordering over Ordered



              Ordered binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.



              So instead of



              case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
              override def compare(that: wordFreq): Int = c compareTo(that.c)
              }


              I would write



              case class WordFreq(w: String, c: Int)
              val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
              (x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)


              you could make the mostFrequentFirstWordFreqOrdering implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq I would prefer to stay explicit and pass the ordering by-hand.



              Use sorted/sortBy instead of hand-crafting a heap sort



              val myHeap = PriorityQueue[wordFreq]()
              wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
              myHeap.dequeueAll


              Is basically a heap sort implementation. It could be replaced by the sorted or the sortBy methods of SeqLike. It could look like



              wordCount.toSeq.sortBy{case (word, count) => -count}


              or if you find the - and inelegant way of reversing the order, alternatives are:



              wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)


              or



              wordCount.toSeq.sortBy{case (word, count) => count}.reverse


              the last one being less efficient in terms of allocations.



              countBy



              val wordCount = HashMap[String, Int]()
              for (word <- words) {
              wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
              }


              can be replaced with



              words.groupBy(x => x).mapValues(_.length)


              In fact I observe this pattern so often, that in some project I have an "extension method" countBy added to Traversable like this:



              implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
              def countBy[K, That](accessor: A => K): Map[K, Int] =
              traversable.groupBy(accessor).mapValues(_.length)
              }


              Separate calculations from making side effects



              The line println(myHeap.dequeueAll.map(_.w).mkString(",")) does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w) and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).



              Split by more than just spaces
              In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+" would be better than " +", although most likely there would still be some edge-cases where it would go wrong.



              Final result



              After all the suggested modifications the code could look like this:



              object FindWordFreq extends App {
              val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
              val wordCounts = words.groupBy(x => x).mapValues(_.length)
              val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
              println(wordsByDescendingNumberOfOccurrences.mkString(","))
              }





              share|improve this answer
























                0












                0








                0






                Using just a priority queue



                Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.



                Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.



                Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:



                Use UpperCamelCase for class names



                Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq should be WordFreq.



                Use Scala syntax consistently



                Scala let's you replace the dot and parentheses with a space in some case. x method param is equivalent to x.method(param). In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"), sometimes use just part of it c compareTo(that.c) and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.



                Embrace immutability



                Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:




                • https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures

                • https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html


                Prefer Ordering over Ordered



                Ordered binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.



                So instead of



                case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
                override def compare(that: wordFreq): Int = c compareTo(that.c)
                }


                I would write



                case class WordFreq(w: String, c: Int)
                val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
                (x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)


                you could make the mostFrequentFirstWordFreqOrdering implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq I would prefer to stay explicit and pass the ordering by-hand.



                Use sorted/sortBy instead of hand-crafting a heap sort



                val myHeap = PriorityQueue[wordFreq]()
                wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
                myHeap.dequeueAll


                Is basically a heap sort implementation. It could be replaced by the sorted or the sortBy methods of SeqLike. It could look like



                wordCount.toSeq.sortBy{case (word, count) => -count}


                or if you find the - and inelegant way of reversing the order, alternatives are:



                wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)


                or



                wordCount.toSeq.sortBy{case (word, count) => count}.reverse


                the last one being less efficient in terms of allocations.



                countBy



                val wordCount = HashMap[String, Int]()
                for (word <- words) {
                wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
                }


                can be replaced with



                words.groupBy(x => x).mapValues(_.length)


                In fact I observe this pattern so often, that in some project I have an "extension method" countBy added to Traversable like this:



                implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
                def countBy[K, That](accessor: A => K): Map[K, Int] =
                traversable.groupBy(accessor).mapValues(_.length)
                }


                Separate calculations from making side effects



                The line println(myHeap.dequeueAll.map(_.w).mkString(",")) does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w) and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).



                Split by more than just spaces
                In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+" would be better than " +", although most likely there would still be some edge-cases where it would go wrong.



                Final result



                After all the suggested modifications the code could look like this:



                object FindWordFreq extends App {
                val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
                val wordCounts = words.groupBy(x => x).mapValues(_.length)
                val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
                println(wordsByDescendingNumberOfOccurrences.mkString(","))
                }





                share|improve this answer












                Using just a priority queue



                Yes, it's possible, although you would need a special implementation of priority queue, which internally is a hash map + a heap. This trick is commonly used in Dijkstra's algorithm. See https://stackoverflow.com/a/17581306/1370445.



                Unlike in Dijkstra's algorithm, in your task, you don't need to be able to alternately update values on the heap and remove the top element. So I would discourage you from using this data structure, and encourage to instead focus on achieving high code quality (including low code volume), using the data structures and operations available in Scala's standard library.



                Here are my recommendations regarding possible improvements to the code, some of them obsolete the others in the context of this specific piece of code, but I think it's worth to mention them anyway:



                Use UpperCamelCase for class names



                Scala has an official naming convention (https://docs.scala-lang.org/style/naming-conventions.html) which recommends to name classes with UpperCamelCase, so wordFreq should be WordFreq.



                Use Scala syntax consistently



                Scala let's you replace the dot and parentheses with a space in some case. x method param is equivalent to x.method(param). In the snippet of code you've provided you sometimes use that feature ("aa bbb ccc aa ddd aa ccc" split " +"), sometimes use just part of it c compareTo(that.c) and sometimes you don't use it (myHeap.enqueue(wordFreq(w,c))). I would recommend you to decide which style of code you prefer (I prefer dots and parentheses) and use it consistently, at least within a single source file.



                Embrace immutability



                Scala provides two sets of collections, mutable and immutable. In most cases it's preferable to use immutable collection as they make reasoning easier and eliminate whole class of bugs. I won't dive deeper into that topic, as it is covered in detail by many articles on the internet, for example:




                • https://www.quora.com/What-are-the-advantages-and-disadvantages-of-immutable-data-structures

                • https://www.yegor256.com/2014/06/09/objects-should-be-immutable.html


                Prefer Ordering over Ordered



                Ordered binds together the data structure and the way of ordering/comparing them. This is reasonable for thing which have some notion of "natural order" like Strings, or Integers. Ordering lets you deliver the way of ordering separately from the data structure declaration, resulting in more composable code.



                So instead of



                case class wordFreq(w: String, c: Int) extends Ordered[wordFreq] {
                override def compare(that: wordFreq): Int = c compareTo(that.c)
                }


                I would write



                case class WordFreq(w: String, c: Int)
                val mostFrequentFirstWordFreqOrdering: Ordering[WordFreq] =
                (x: WordFreq, y: WordFreq) => y.c.compareTo(x.c)


                you could make the mostFrequentFirstWordFreqOrdering implicit so that the compiler will pick it up automatic, but since there's more than one reasonable way to sort WordFreq I would prefer to stay explicit and pass the ordering by-hand.



                Use sorted/sortBy instead of hand-crafting a heap sort



                val myHeap = PriorityQueue[wordFreq]()
                wordCount.toSeq foreach { case (w,c) => myHeap.enqueue(wordFreq(w,c)) }
                myHeap.dequeueAll


                Is basically a heap sort implementation. It could be replaced by the sorted or the sortBy methods of SeqLike. It could look like



                wordCount.toSeq.sortBy{case (word, count) => -count}


                or if you find the - and inelegant way of reversing the order, alternatives are:



                wordCount.toSeq.sorted(Ordering.by[(String, Int), Int]{case (word, count) => count}.reverse)


                or



                wordCount.toSeq.sortBy{case (word, count) => count}.reverse


                the last one being less efficient in terms of allocations.



                countBy



                val wordCount = HashMap[String, Int]()
                for (word <- words) {
                wordCount += (word -> (wordCount.getOrElse(word,0) + 1))
                }


                can be replaced with



                words.groupBy(x => x).mapValues(_.length)


                In fact I observe this pattern so often, that in some project I have an "extension method" countBy added to Traversable like this:



                implicit class RichTraversable[+A](val traversable: Traversable[A]) extends AnyVal {
                def countBy[K, That](accessor: A => K): Map[K, Int] =
                traversable.groupBy(accessor).mapValues(_.length)
                }


                Separate calculations from making side effects



                The line println(myHeap.dequeueAll.map(_.w).mkString(",")) does two things. It finishes the process of sorting the results myHeap.dequeueAll.map(_.w) and prints them in human readable format println(results.mkString(",")). Two things should be done in two lines, or better, two methods, or better functions (..., but functional IO is a longer topic).



                Split by more than just spaces
                In real-world text words may be separated by more than just spaces - new-lines, commas, semicolons, etc.. W is a regex pattern for "non-word character" (see https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html). Using "\W+" would be better than " +", although most likely there would still be some edge-cases where it would go wrong.



                Final result



                After all the suggested modifications the code could look like this:



                object FindWordFreq extends App {
                val words = "aa bbb ccc, aa dddnaarccc" split "\W+"
                val wordCounts = words.groupBy(x => x).mapValues(_.length)
                val wordsByDescendingNumberOfOccurrences = wordCounts.toSeq.sortBy{case (_, count) => count}.reverse
                println(wordsByDescendingNumberOfOccurrences.mkString(","))
                }






                share|improve this answer












                share|improve this answer



                share|improve this answer










                answered 7 hours ago









                Sumik

                111




                111






























                    draft saved

                    draft discarded




















































                    Thanks for contributing an answer to Code Review Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    Use MathJax to format equations. MathJax reference.


                    To learn more, see our tips on writing great answers.





                    Some of your past answers have not been well-received, and you're in danger of being blocked from answering.


                    Please pay close attention to the following guidance:


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid



                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.


                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function () {
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210147%2fprint-words-in-decreasing-order-of-frequency-in-scala%23new-answer', 'question_page');
                    }
                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown







                    Popular posts from this blog

                    Сан-Квентин

                    8-я гвардейская общевойсковая армия

                    Алькесар