Data Structures for Counting Duplicates and using std::vector::erase

Problem

Dupe detection for a vector of ints. I simply want a count of the unique input values that occurred at least twice. The goal is to count a dupe only once and to ignore that value if another dupe of it is seen later. A test input could look something like this: vector<int> test = { 4,5,9,6,9,9,6,3,4 };

Looking for Feedback on

Looking for basic feedback on the data structures I'm using and the possibility of using the vector erase method to iterate and take advantage of the space allocated to my numbers vector instead of using a map to not count dups more than once. Any C++ 11 or 17 features I can take advantage of here too?

int countDuplicates(vector<int> numbers) {
    int dups = 0;
    set<int> s;
    map<int, int> m;
    for (int n : numbers) {
      if (s.insert(n).second == false && m.find(n) == m.end()) {
          dups++;
          m.insert(pair<int, int>(n,0));
          // better to remove from vector than increase space with the map?
          // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());
        } else {
          s.insert(n);
        }
    }

    return dups;
}

edited Nov 29 at 0:05

asked Nov 28 at 21:05

greg

29017

Not really enough for a full answer, but m.insert(pair<int, int>(n, 0)) can be replaced with simply m.emplace(n, 0) saving you from writing out the pair constructor.
– Kyle
Nov 29 at 8:04

c++ vectors hash-map set


2 Answers
accepted

Basic Algorithm

At least if I understand the intent correctly, you simply want a count of the unique input characters that occurred at least twice.

In that case, I think I'd do something like this:

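#include <algorithm>
#include <map>
#include <vector>

int count_dupes(std::vector<int> const &inputs) {
    std::map<int, int> counts;

    // operator[] value-initializes a missing entry to 0 before the increment
    for (auto i : inputs)
        ++counts[i];

    return std::count_if(counts.begin(), counts.end(),
                         [](auto const &p) { return p.second >= 2; });
}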

I'd also consider using an array instead of a map, as outlined in an answer to an earlier question: https://codereview.stackexchange.com/a/208502/489 --but this can depend on the range of values you're dealing with. With a 16-bit int, it's no problem at all on most machines. With a 32-bit int (and no other constraints on values) it's still possible on many machines, but probably impractical. For arbitrary 64-bit int, an array won't be practical.
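To make the array idea a bit more concrete, here is a minimal sketch for the easy case. It assumes every input fits in 16 bits, and the function name count_dupes_u16 and the use of std::uint16_t are my own choices for illustration, not something from the question. A small saturating counter per possible value is enough, since all we care about is "seen once" versus "seen at least twice".

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: counts distinct values that occur at least twice.
// Assumption: every input is in the range 0..65535.
int count_dupes_u16(std::vector<std::uint16_t> const &inputs) {
    // One counter per possible value: 0 = unseen, 1 = seen once, 2 = seen twice or more.
    std::vector<unsigned char> seen(65536, 0);
    int dupes = 0;

    for (auto v : inputs) {
        // Bump the counter only until it reaches 2; count the value exactly
        // once, at the moment its second occurrence shows up.
        if (seen[v] < 2 && ++seen[v] == 2)
            ++dupes;
    }
    return dupes;
}
```

The table costs 64 KiB no matter how short the input is, which is the trade-off to weigh against the map.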

Parameter Passing

Right now, you're passing the input by value. This means when you call the function with some vector, a copy of the original vector will normally be made and passed to the function. As a general rule, something like a vector that's potentially large and slow to copy should be passed by reference to const, as shown in the code above.

Logical Comparisons

Comparing a Boolean value to true or false is generally a poor idea. if (x == true) is equivalent to if (x) and if (x == false) is equivalent to if (!x). Normally, if it's Boolean in nature, a variable should be given a name that reflects that nature, and should be used directly rather than being compared to true or false. For example, if (s.insert(n).second == false) would be better written as: if (!s.insert(n).second).

Some people (understandably, I guess) prefer to use the written form: if (not s.insert(n).second). I've written C and C++ long enough that I have no difficulty with reading ! as meaning "not", but especially if the code may be read by people less accustomed to programming, it may make more sense to use the words instead of symbols.
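As a rough illustration of that advice applied to the loop from the question, here is one way it might look. This is only a sketch: the names count_duplicates_sets, seen, counted and first_time are mine, and I've swapped the map for a second set since the mapped value was never used.

```cpp
#include <set>
#include <vector>

// Sketch of the question's approach with the Boolean results used directly.
int count_duplicates_sets(std::vector<int> const &numbers) {
    std::set<int> seen;     // every value encountered so far
    std::set<int> counted;  // values already counted as duplicates
    int dups = 0;

    for (int n : numbers) {
        // insert() reports through .second whether the value was newly added
        bool const first_time = seen.insert(n).second;

        // Count a repeated value only the first time it is recognised as a dupe.
        if (!first_time && counted.insert(n).second)
            ++dups;
    }
    return dups;
}
```

Giving the insert result a name like first_time means the condition reads as a sentence, with no comparison against false needed.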

Formatting/Indentation

At least to me, this indentation looks a bit odd:

  if (s.insert(n).second == false && m.find(n) == m.end()) {
      dups++;
      m.insert(pair<int, int>(n,0));
      // better to remove from vector than increase space with the map?
      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());
    } else {
      s.insert(n);
    }

If you use indentation like that consistently, I guess it's not necessarily terrible, but I think more people are accustomed to something more like this:

  if (s.insert(n).second == false && m.find(n) == m.end()) {
      dups++;
      m.insert(pair<int, int>(n,0));
      // better to remove from vector than increase space with the map?
      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());
  } else {
      s.insert(n);
  }

...where each closing brace is vertically aligned with the beginning of the block it closes. As a side note, there are almost endless debates about the efficacy of various bracing styles. I'm not going to advocate for or against any of the well-known styles, but I think there's a fair amount to be gained from picking a style that's well known and then using it consistently. I don't see much to gain from a style that's different from what almost everybody else uses.

edited Nov 29 at 15:16

answered Nov 28 at 23:24

Jerry Coffin

27.8k460125

If my understanding is correct, count_if iterates over counts and increments p when it finds a unique input that occurred at least twice. ++counts[i] is a very clean map update; I've not seen this before and it took me a moment to understand, but both the key and value are updated. Agree on the logical comparison feedback. I've gone back and forth between false and !. I've used false more often for readability, but I'm at the point in my life where brevity and speed are becoming more important. The brace indent was a paste error, but good catch, and it raised my awareness of this detail.
– greg
Nov 29 at 0:04

Yes, ++counts[i] will insert a new record for i if it hasn't been inserted yet. That newly inserted record will have its count at 0. The ++ will then increment the current count. count_if just counts the number of elements in a collection that meet the specified criteria, so it basically just counts and returns the number of items for which your predicate returned true.
– Jerry Coffin
Nov 29 at 1:39

It would also be quite easy to do with the original version, as it makes a copy. One could then sort copied vector, apply std::unique and get std::distance between begin and returned iterator. It is not really certain which version is better though. I've written this comment with relation to the one above, but it seems to be deleted now.
– Incomputable
Nov 29 at 11:37

You might want to attach a caveat to the suggestion to use an array, since int has a much wider range of values than char (so we're not in quite the same context as that other question).
– Toby Speight
Nov 29 at 11:49

@Incomputable: yes, it was my comment, I had some doubts about its validity after reading the original question again. It might have been justified. / sadly std::unique doesn't do the job if what we need to count is the number of elements appearing at least twice
– papagaga
Nov 29 at 11:57
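To illustrate the last point in this comment thread, here is a small sketch of the sort + std::unique + std::distance idea (the function name count_distinct is made up for the example). As far as I can tell it yields the number of distinct values, which is a different quantity from the number of values that appear at least twice.

```cpp
#include <algorithm>
#include <iterator>
#include <vector>

// Takes the vector by value, so the caller's data is left untouched.
int count_distinct(std::vector<int> v) {
    std::sort(v.begin(), v.end());
    // std::unique keeps the first element of each run of equal neighbours
    // and returns the end of the shortened range.
    auto new_end = std::unique(v.begin(), v.end());
    return static_cast<int>(std::distance(v.begin(), new_end));
}
```

For the test input from the question, { 4,5,9,6,9,9,6,3,4 }, this gives 5 (the distinct values are 3, 4, 5, 6 and 9), whereas only 3 of them (4, 6 and 9) are duplicated.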

I don't agree with @JerryCoffin on two counts: algorithm and parameter passing, the latter being a consequence of the former. That's why I submit this extra review, even if @JerryCoffin's has already been accepted, and even if I agree with the other points he made.

When you design an algorithm, especially in C++, you want it to be as efficient as possible, in as many situations as possible. It's a good idea to take a look at existing algorithms in the standard library to see how that can be achieved, all the more when there is an algorithm there that is closely related to the one you're designing: std::unique, which removes all but the first of each run of consecutive equivalent elements. What's interesting is 1) that it is meant to operate on a sorted range and 2) that it modifies the input sequence: this makes it optimal when the input sequence is already sorted, and also when it's disposable. Can we benefit from std::unique's interface in our largely similar problem? I would say so:

#include <algorithm>
#include <functional>  // std::not_equal_to

template <typename Iterator>
int count_duplicates(Iterator first, Iterator last) {
    // requires a sorted range
    int count = 0;
    while (true) {
        // find the start of the next run of equal elements
        first = std::adjacent_find(first, last);
        if (first == last) return count;
        // skip to the last element of that run
        first = std::adjacent_find(++first, last, std::not_equal_to<>());
        ++count;
    }
}

Let's now compare with @JerryCoffin's proposed solution, which allocates memory for a std::map and then has in all cases a complexity of O(n*log(n)) for populating it + O(n) for counting elements with a frequency higher than 1:

if the input range is already sorted, this algorithm has O(n) complexity, which is better

if the input range is disposable but not sorted, this algorithm has the same complexity (O(n*log(n)) for prior sorting and O(n) for counting), but doesn't allocate memory and has better cache locality

if the input is neither sorted nor disposable, we have the same complexity and memory requirements (we need to copy the input range) but we keep the better cache locality
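The last two cases above could be wrapped roughly as follows. The wrapper names are hypothetical, and count_duplicates is repeated only so that the snippet compiles on its own; in the already-sorted case you would simply call it directly on the range.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Same algorithm as above; requires a sorted range.
template <typename Iterator>
int count_duplicates(Iterator first, Iterator last) {
    int count = 0;
    while (true) {
        first = std::adjacent_find(first, last);
        if (first == last) return count;
        first = std::adjacent_find(++first, last, std::not_equal_to<>());
        ++count;
    }
}

// Disposable input: sort in place, no extra allocation.
int count_duplicates_in_place(std::vector<int> &numbers) {
    std::sort(numbers.begin(), numbers.end());
    return count_duplicates(numbers.begin(), numbers.end());
}

// Input that must be preserved: taking the parameter by value makes the copy.
int count_duplicates_copy(std::vector<int> numbers) {
    return count_duplicates_in_place(numbers);
}
```

Note that in the last case pass-by-value is deliberate: the function needs its own copy to sort anyway.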

On the other hand it lacks the possibility of relying on a more efficient structure to count the occurrences of each element, such as an array or a hash table. We could then theoretically go from O(n*log(n)) to O(n) when looking for duplicates. But I'm still unconvinced because those data structures would be oversized if the input range has a small alphabet.
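For comparison, a hash-table variant might look like the sketch below. It is the same counting approach with std::unordered_map swapped in for std::map; the name count_dupes_hashed and the reserve call are my additions. Insertion is expected O(1) on average, so the whole thing is O(n) on average, at the price of the extra allocation discussed above.

```cpp
#include <algorithm>
#include <unordered_map>
#include <vector>

int count_dupes_hashed(std::vector<int> const &inputs) {
    std::unordered_map<int, int> counts;
    // Upper bound on the number of distinct keys; avoids rehashing while filling.
    counts.reserve(inputs.size());

    for (int v : inputs)
        ++counts[v];

    // count_if returns a signed difference type, hence the explicit cast.
    return static_cast<int>(
        std::count_if(counts.begin(), counts.end(),
                      [](auto const &p) { return p.second >= 2; }));
}
```

Whether this actually beats sorting would have to be measured on realistic inputs.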

EDIT: I think I've read the submitted code and the question a bit too fast. If what we need is not only to count the elements appearing at least twice, but also to erase the other elements of the vector, then the solution is different, even if most building blocks remain:

#include <algorithm>
#include <functional>  // std::not_equal_to
#include <iostream>
#include <iterator>    // std::next
#include <utility>     // std::move
#include <vector>

template <typename Iterator>
Iterator one_of_duplicates(Iterator first, Iterator last) {
    // requires a sorted input
    auto current = first;
    while (true) {
        // find a duplicated element, move it behind 'first'
        // and find the next different element
        current = std::adjacent_find(current, last);
        if (current == last) return first;
        std::cerr << *current << std::endl;  // debug trace, printed before the move
        if (first != current) *first = std::move(*current);  // avoid self-move
        ++first;
        // resume from the second element of the run, which was not moved from
        current = std::adjacent_find(std::next(current), last, std::not_equal_to<>());
    }
}

int main() {
    std::vector<int> data = { 0, 1, 2, 3, 4, 5, 1, 2, 2, 3, 5, 5, 5 };
    std::sort(data.begin(), data.end());
    data.erase(one_of_duplicates(data.begin(), data.end()), data.end());
    for (auto i : data) std::cout << i << ',';
}

edited Nov 29 at 15:40

answered Nov 29 at 11:26

papagaga

4,089221

add a comment |

Your Answer

StackExchange.ifUsing("editor", function () {
return StackExchange.using("mathjaxEditing", function () {
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
});
});
}, "mathjax-editing");

StackExchange.ifUsing("editor", function () {
StackExchange.using("externalEditor", function () {
StackExchange.using("snippets", function () {
StackExchange.snippets.init();
});
});
}, "code-snippets");

StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "196"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});

}
});

draft saved

draft discarded

Sign up or log in

StackExchange.ready(function () {
StackExchange.helpers.onClickDraftSave('#login-link');
});

Post as a guest

Name

Required, but never shown

StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f208648%2fdata-structures-for-counting-duplicates-and-using-stdvectorerase%23new-answer', 'question_page');
}
);

Post as a guest

Name

Required, but never shown

2 Answers
2

active

oldest

votes

2 Answers
2

active

oldest

votes

up vote
7
down vote

accepted

Basic Algorithm

At least if I understand the intent correctly, you simply want a count of the unique input characters that occurred at least twice.

In that case, I think I'd do something like this:

int count_dupes(std::vector<int> const &inputs) { 

    std::map<int, int> counts;



    for (auto i : inputs)

        ++counts[i];



    return std::count_if(counts.begin(), counts.end(),

                         (auto const &p) { return p.second >= 2; });

}

Parameter Passing

Logical Comparisons

Formatting/Indentation

At least to me, this indentation looks a bit odd:

  if (s.insert(n).second == false && m.find(n) == m.end()) {

      dups++;

      m.insert(pair<int, int>(n,0));

      // better to remove from vector than increase space with the map?

      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());          

    } else {

      s.insert(n);

    }

If you use indentation like that consistently, I guess it's not necessarily terrible, but I think more people are accustomed to something more like this:

  if (s.insert(n).second == false && m.find(n) == m.end()) {

      dups++;

      m.insert(pair<int, int>(n,0));

      // better to remove from vector than increase space with the map?

      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());          

  } else {

      s.insert(n);

  }

edited Nov 29 at 15:16

answered Nov 28 at 23:24

Jerry Coffin

27.8k460125

If my understanding is correct, count_if iterates over counts and increments p when it finds a unique input that occurred at least twice. ++counts[i] is a very clean map update, I've not seen this before and took me a moment to understand, but both the key and value are updated. Agree on logical comparison feedback. I've gone back and fort with false and !, I've used false more often for readability, but I'm at the point in my life we're brevity and speed are becoming more important. Brace indent was a paste error, but good catch and raised my awareness of this detail.
– greg
Nov 29 at 0:04

1

Yes, ++counts[i] will insert a new record for i if it hasn't been inserted yet. That newly inserted record will have its count at 0. The ++ will then increment the current count. count_if just counts the number of elements in a collection that meet the specified criteria, so it basically just counts and returns the number of items for which your predicate returned true.
– Jerry Coffin
Nov 29 at 1:39

2

It would also be quite easy to do with the original version, as it makes a copy. One could then sort copied vector, apply std::unique and get std::distance between begin and returned iterator. It is not really certain which version is better though. I've written this comment with relation to the one above, but it seems to be deleted now.
– Incomputable
Nov 29 at 11:37

You might want to attach a caveat to the suggestion to use an array, since int has a much wider range of values than char (so we're not in quite the same context as that other question).
– Toby Speight
Nov 29 at 11:49

1

@Incomputable: yes, it was my comment, I had some doubts about its validity after reading the original question again. It might have been justified. / sadly std::unique doesn't do the job if what we need to count is the number of elements appearing at least twice
– papagaga
Nov 29 at 11:57

add a comment |

up vote
7
down vote

accepted

Basic Algorithm

At least if I understand the intent correctly, you simply want a count of the unique input characters that occurred at least twice.

In that case, I think I'd do something like this:

int count_dupes(std::vector<int> const &inputs) { 

    std::map<int, int> counts;



    for (auto i : inputs)

        ++counts[i];



    return std::count_if(counts.begin(), counts.end(),

                         (auto const &p) { return p.second >= 2; });

}

Parameter Passing

Logical Comparisons

Formatting/Indentation

At least to me, this indentation looks a bit odd:

  if (s.insert(n).second == false && m.find(n) == m.end()) {

      dups++;

      m.insert(pair<int, int>(n,0));

      // better to remove from vector than increase space with the map?

      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());          

    } else {

      s.insert(n);

    }

If you use indentation like that consistently, I guess it's not necessarily terrible, but I think more people are accustomed to something more like this:

  if (s.insert(n).second == false && m.find(n) == m.end()) {

      dups++;

      m.insert(pair<int, int>(n,0));

      // better to remove from vector than increase space with the map?

      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());          

  } else {

      s.insert(n);

  }

edited Nov 29 at 15:16

answered Nov 28 at 23:24

Jerry Coffin

27.8k460125

If my understanding is correct, count_if iterates over counts and increments p when it finds a unique input that occurred at least twice. ++counts[i] is a very clean map update, I've not seen this before and took me a moment to understand, but both the key and value are updated. Agree on logical comparison feedback. I've gone back and fort with false and !, I've used false more often for readability, but I'm at the point in my life we're brevity and speed are becoming more important. Brace indent was a paste error, but good catch and raised my awareness of this detail.
– greg
Nov 29 at 0:04

1

Yes, ++counts[i] will insert a new record for i if it hasn't been inserted yet. That newly inserted record will have its count at 0. The ++ will then increment the current count. count_if just counts the number of elements in a collection that meet the specified criteria, so it basically just counts and returns the number of items for which your predicate returned true.
– Jerry Coffin
Nov 29 at 1:39

2

It would also be quite easy to do with the original version, as it makes a copy. One could then sort copied vector, apply std::unique and get std::distance between begin and returned iterator. It is not really certain which version is better though. I've written this comment with relation to the one above, but it seems to be deleted now.
– Incomputable
Nov 29 at 11:37

You might want to attach a caveat to the suggestion to use an array, since int has a much wider range of values than char (so we're not in quite the same context as that other question).
– Toby Speight
Nov 29 at 11:49

1

@Incomputable: yes, it was my comment, I had some doubts about its validity after reading the original question again. It might have been justified. / sadly std::unique doesn't do the job if what we need to count is the number of elements appearing at least twice
– papagaga
Nov 29 at 11:57

add a comment |

up vote
7
down vote

accepted

Basic Algorithm

At least if I understand the intent correctly, you simply want a count of the unique input characters that occurred at least twice.

In that case, I think I'd do something like this:

int count_dupes(std::vector<int> const &inputs) { 

    std::map<int, int> counts;



    for (auto i : inputs)

        ++counts[i];



    return std::count_if(counts.begin(), counts.end(),

                         (auto const &p) { return p.second >= 2; });

}

Parameter Passing

Logical Comparisons

Formatting/Indentation

At least to me, this indentation looks a bit odd:

  if (s.insert(n).second == false && m.find(n) == m.end()) {

      dups++;

      m.insert(pair<int, int>(n,0));

      // better to remove from vector than increase space with the map?

      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());          

    } else {

      s.insert(n);

    }

If you use indentation like that consistently, I guess it's not necessarily terrible, but I think more people are accustomed to something more like this:

  if (s.insert(n).second == false && m.find(n) == m.end()) {

      dups++;

      m.insert(pair<int, int>(n,0));

      // better to remove from vector than increase space with the map?

      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());          

  } else {

      s.insert(n);

  }

edited Nov 29 at 15:16

answered Nov 28 at 23:24

Jerry Coffin

27.8k460125

Basic Algorithm

At least if I understand the intent correctly, you simply want a count of the unique input characters that occurred at least twice.

In that case, I think I'd do something like this:

int count_dupes(std::vector<int> const &inputs) { 

    std::map<int, int> counts;



    for (auto i : inputs)

        ++counts[i];



    return std::count_if(counts.begin(), counts.end(),

                         (auto const &p) { return p.second >= 2; });

}

Parameter Passing

Logical Comparisons

Formatting/Indentation

At least to me, this indentation looks a bit odd:

  if (s.insert(n).second == false && m.find(n) == m.end()) {

      dups++;

      m.insert(pair<int, int>(n,0));

      // better to remove from vector than increase space with the map?

      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());          

    } else {

      s.insert(n);

    }

If you use indentation like that consistently, I guess it's not necessarily terrible, but I think more people are accustomed to something more like this:

  if (s.insert(n).second == false && m.find(n) == m.end()) {

      dups++;

      m.insert(pair<int, int>(n,0));

      // better to remove from vector than increase space with the map?

      // numbers.erase(remove(numbers.begin(), numbers.end(), n), numbers.end());          

  } else {

      s.insert(n);

  }

edited Nov 29 at 15:16

answered Nov 28 at 23:24

Jerry Coffin

27.8k460125

edited Nov 29 at 15:16

answered Nov 28 at 23:24

Jerry Coffin

27.8k460125

answered Nov 28 at 23:24

Jerry Coffin

27.8k460125

answered Nov 28 at 23:24

Jerry Coffin

27.8k460125

If my understanding is correct, count_if iterates over counts and increments p when it finds a unique input that occurred at least twice. ++counts[i] is a very clean map update, I've not seen this before and took me a moment to understand, but both the key and value are updated. Agree on logical comparison feedback. I've gone back and fort with false and !, I've used false more often for readability, but I'm at the point in my life we're brevity and speed are becoming more important. Brace indent was a paste error, but good catch and raised my awareness of this detail.
– greg
Nov 29 at 0:04

1

Yes, ++counts[i] will insert a new record for i if it hasn't been inserted yet. That newly inserted record will have its count at 0. The ++ will then increment the current count. count_if just counts the number of elements in a collection that meet the specified criteria, so it basically just counts and returns the number of items for which your predicate returned true.
– Jerry Coffin
Nov 29 at 1:39

2

It would also be quite easy to do with the original version, as it makes a copy. One could then sort copied vector, apply std::unique and get std::distance between begin and returned iterator. It is not really certain which version is better though. I've written this comment with relation to the one above, but it seems to be deleted now.
– Incomputable
Nov 29 at 11:37

You might want to attach a caveat to the suggestion to use an array, since int has a much wider range of values than char (so we're not in quite the same context as that other question).
– Toby Speight
Nov 29 at 11:49

1

@Incomputable: yes, it was my comment, I had some doubts about its validity after reading the original question again. It might have been justified. / sadly std::unique doesn't do the job if what we need to count is the number of elements appearing at least twice
– papagaga
Nov 29 at 11:57

add a comment |

If my understanding is correct, count_if iterates over counts and increments p when it finds a unique input that occurred at least twice. ++counts[i] is a very clean map update, I've not seen this before and took me a moment to understand, but both the key and value are updated. Agree on logical comparison feedback. I've gone back and fort with false and !, I've used false more often for readability, but I'm at the point in my life we're brevity and speed are becoming more important. Brace indent was a paste error, but good catch and raised my awareness of this detail.
– greg
Nov 29 at 0:04

1

Yes, ++counts[i] will insert a new record for i if it hasn't been inserted yet. That newly inserted record will have its count at 0. The ++ will then increment the current count. count_if just counts the number of elements in a collection that meet the specified criteria, so it basically just counts and returns the number of items for which your predicate returned true.
– Jerry Coffin
Nov 29 at 1:39

2

It would also be quite easy to do with the original version, as it makes a copy. One could then sort copied vector, apply std::unique and get std::distance between begin and returned iterator. It is not really certain which version is better though. I've written this comment with relation to the one above, but it seems to be deleted now.
– Incomputable
Nov 29 at 11:37

You might want to attach a caveat to the suggestion to use an array, since int has a much wider range of values than char (so we're not in quite the same context as that other question).
– Toby Speight
Nov 29 at 11:49

1

@Incomputable: yes, it was my comment, I had some doubts about its validity after reading the original question again. It might have been justified. / sadly std::unique doesn't do the job if what we need to count is the number of elements appearing at least twice
– papagaga
Nov 29 at 11:57

If my understanding is correct, count_if iterates over counts and increments p when it finds a unique input that occurred at least twice. ++counts[i] is a very clean map update, I've not seen this before and took me a moment to understand, but both the key and value are updated. Agree on logical comparison feedback. I've gone back and fort with false and !, I've used false more often for readability, but I'm at the point in my life we're brevity and speed are becoming more important. Brace indent was a paste error, but good catch and raised my awareness of this detail.
– greg
Nov 29 at 0:04

Yes, ++counts[i] will insert a new record for i if it hasn't been inserted yet. That newly inserted record will have its count at 0. The ++ will then increment the current count. count_if just counts the number of elements in a collection that meet the specified criteria, so it basically just counts and returns the number of items for which your predicate returned true.
– Jerry Coffin
Nov 29 at 1:39

It would also be quite easy to do with the original version, as it makes a copy. One could then sort copied vector, apply std::unique and get std::distance between begin and returned iterator. It is not really certain which version is better though. I've written this comment with relation to the one above, but it seems to be deleted now.
– Incomputable
Nov 29 at 11:37

You might want to attach a caveat to the suggestion to use an array, since int has a much wider range of values than char (so we're not in quite the same context as that other question).
– Toby Speight
Nov 29 at 11:49

@Incomputable: yes, it was my comment, I had some doubts about its validity after reading the original question again. It might have been justified. / sadly std::unique doesn't do the job if what we need to count is the number of elements appearing at least twice
– papagaga
Nov 29 at 11:57

add a comment |

up vote
5
down vote

#include <algorithm>



template <typename Iterator>

int count_duplicates(Iterator first, Iterator last) {

    // requires a sorted range

    int count = 0;

    while (true) {

        first = std::adjacent_find(first, last);

        if (first == last) return count;

        first = std::adjacent_find(++first, last, std::not_equal_to<>());

        ++count;

    }

}

If the input range is already sorted, this algorithm has O(n) complexity, which is better.

If the input range is disposable but not sorted, this algorithm has the same complexity (O(n*log(n)) for the prior sort and O(n) for the counting), but it doesn't allocate memory and has better cache locality.

If the input is neither sorted nor disposable, we have the same complexity and memory requirements (we need to copy the input range), but we keep the better cache locality.

#include <algorithm>
#include <functional>
#include <iostream>
#include <iterator>
#include <utility>
#include <vector>

template <typename Iterator>
Iterator one_of_duplicates(Iterator first, Iterator last) {
    // requires a sorted input
    auto current = first;
    while (true) {
        // find a duplicated element, move it behind 'first'
        // and find the next different element
        current = std::adjacent_find(current, last);
        if (current == last) return first;
        std::cerr << *current << std::endl;  // debug trace, printed before the element is moved from
        if (first != current) *first = std::move(*current);  // avoid self-move
        ++first;
        // start one past 'current' so a moved-from element is never compared
        current = std::adjacent_find(std::next(current), last, std::not_equal_to<>());
    }
}

int main() {
    std::vector<int> data = { 0, 1, 2, 3, 4, 5, 1, 2, 2, 3, 5, 5, 5 };
    std::sort(data.begin(), data.end());
    // keep exactly one copy of each value that occurred at least twice
    data.erase(one_of_duplicates(data.begin(), data.end()), data.end());
    for (auto i : data) std::cout << i << ',';  // prints 1,2,3,5,
}

edited Nov 29 at 15:40

answered Nov 29 at 11:26

papagaga

4,089221


