If I need to read lots of files, will it get faster if I break the problem into multiple threads? [closed]












-4














I need some help. I had an interview with NetApp recently for a C++ role (they do big data storage systems). I wrote some code to answer an interview question. My response from them was “You failed”. It was very difficult to get feedback, as it usually is after failing an interview. After some very polite begging for feedback I got a little bit. But it still didn’t quite make sense.



Here’s the Problem:



Given a bunch of files in a directory, read them all and count the words. Create a bunch of threads to read the files in parallel. The consensus at NetApp (people who know a lot about storage) is that it should get faster with more threads. I think in most circumstances you are so I/O bound that it will get slower after 1 or 2 threads. I just don’t see how it’s possible to get faster unless you are under some known special circumstances (like a SAN or maybe RAID arrays). Even in those cases the number of sequential channels to the disk saturates and you are I/O bound again after only a few threads.
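To make the setup concrete, here is a minimal sketch of the task as described, assuming C++17 and `std::filesystem`. It is not the code from the repository: `CountWords` and `CountWordsInDirectory` are hypothetical names, a "word" is assumed to be any whitespace-separated token, and only regular files directly inside the directory are read.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <istream>
#include <string>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: count whitespace-separated tokens in a stream.
std::size_t CountWords(std::istream& in)
{
    std::size_t words = 0;
    std::string token;
    while (in >> token)
        ++words;
    return words;
}

// Hypothetical driver: count words in every regular file directly under
// dir, using threadCount worker threads. Each worker claims the next
// unread file through a shared atomic index, so no file is read twice.
std::size_t CountWordsInDirectory(const fs::path& dir, unsigned threadCount)
{
    assert(threadCount > 0);

    std::vector<fs::path> files;
    for (const auto& entry : fs::directory_iterator(dir))
        if (entry.is_regular_file())
            files.push_back(entry.path());

    std::atomic<std::size_t> nextFile{0};
    std::atomic<std::size_t> totalWords{0};

    auto worker = [&]() {
        std::size_t local = 0; // per-thread tally, merged once at the end
        for (;;)
        {
            const std::size_t i = nextFile.fetch_add(1);
            if (i >= files.size())
                break;
            std::ifstream in(files[i]);
            local += CountWords(in);
        }
        totalWords += local;
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < threadCount; ++t)
        threads.emplace_back(worker);
    for (auto& th : threads)
        th.join();

    return totalWords.load();
}
```

Calling `CountWordsInDirectory(dir, 1)` and `CountWordsInDirectory(dir, 8)` should return the same total; whether the second call finishes sooner depends on the storage device and on how much of the data is already cached, which is the point in dispute.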



I think my code was great (of course). I’ve been writing C++ for many years. I think I know some things about what makes good code. It should have passed on style alone. Hehe. As a general rule, performance optimizations are not something you should guess at; they should be tested and measured. I only had limited time to run experiments. But now I’m curious.
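One way to test rather than guess is to time the same job at several thread counts. The following is a minimal sketch of such a harness, assuming the job can be expressed as a callable that takes the thread count; `MeasureByThreadCount` is a hypothetical name, not something from the repository.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical harness: run work(n) once for each n in threadCounts and
// return (thread count, elapsed milliseconds) pairs in the same order.
std::vector<std::pair<unsigned, double>> MeasureByThreadCount(
    const std::vector<unsigned>& threadCounts,
    const std::function<void(unsigned)>& work)
{
    assert(work); // an empty std::function cannot be called

    std::vector<std::pair<unsigned, double>> results;
    for (unsigned n : threadCounts)
    {
        // steady_clock is monotonic, so wall-clock adjustments
        // cannot distort the interval.
        const auto start = std::chrono::steady_clock::now();
        work(n);
        const auto stop = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::milli> elapsed = stop - start;
        results.emplace_back(n, elapsed.count());
    }
    return results;
}
```

A caveat when using something like this for file reads: after the first run the operating system has usually cached the files in memory, so later runs mostly measure CPU and memory speed rather than the disk. Comparing thread counts fairly needs either a cold cache for each run or a data set larger than available RAM.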



The code is in my GitHub account here. https://github.com/MenaceSan/CountTextWords



Anyone have any opinions on this? Shed some light on what they might have been thinking? Any other criticisms of the code?



I base part of my opinion on this:



https://stackoverflow.com/questions/902425/does-multithreading-make-sense-for-io-bound-operations



A small sample of the code:



namespace SSFI
{
    class cThreadFileReader : public cThreadBase
    {
        // Read a file on a separate thread.
        // Q: We don't bother reading single files on more than one thread at a time. Assume files are serial on a single device. SAN array would make this NOT true.

    protected:
        void FlushWord();
        void ReadFile(const fsx::path& filePath);
        virtual void Run();

    private:
        std::string _word;

    public:
        cThreadFileReader(cApp& app)
            : cThreadBase(app)
        {
        }
    };
}
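The sample above only shows declarations. As an illustration of how a `_word` buffer and a `FlushWord()` step could fit together, here is a minimal self-contained sketch. It is an assumption about the general technique, not the repository's implementation: `WordAccumulator`, `Feed`, `Finish` and `Counts` are hypothetical names, and a word is assumed to be a run of ASCII letters or digits, compared case-insensitively.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>
#include <string_view>

// Hypothetical accumulator, not the repository's cThreadFileReader.
class WordAccumulator
{
public:
    // Feed a chunk of text. Letters and digits extend the current word;
    // any other character ends it. State carries over between chunks,
    // so a word split across two reads is still counted once.
    void Feed(std::string_view chunk)
    {
        for (unsigned char ch : chunk)
        {
            if (std::isalnum(ch))
                _word.push_back(static_cast<char>(std::tolower(ch)));
            else
                FlushWord();
        }
    }

    // Call at end of file so a trailing word is not lost.
    void Finish() { FlushWord(); }

    const std::map<std::string, std::size_t>& Counts() const { return _counts; }

private:
    // Record the buffered word, if any, and reset the buffer.
    void FlushWord()
    {
        if (_word.empty())
            return;
        ++_counts[_word];
        _word.clear();
    }

    std::string _word;
    std::map<std::string, std::size_t> _counts;
};
```

With one accumulator per thread there is no shared state while reading; the per-thread maps would only need to be merged once all threads have finished.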




















closed as off-topic by vnp, Deduplicator, Jamal Dec 21 at 2:52


This question appears to be off-topic. The users who voted to close gave this specific reason:


  • "Authorship of code: Since Code Review is a community where programmers improve their skills through peer review, we require that the code be posted by an author or maintainer of the code, that the code be embedded directly, and that the poster know why the code is written the way it is." – vnp, Deduplicator, Jamal

If this question can be reworded to fit the rules in the help center, please edit the question.









  • 3




    I feel your pain. However, in this exchange we only review code directly embedded in the question.
    – vnp
    Dec 21 at 1:49










  • Oh, stink, OK, thanks. I reposted it on Stack Overflow since I really care more about the concept than the code in this case (although I like the code too): stackoverflow.com/questions/53878291/…
    – Menace
    Dec 21 at 2:34


















c++ performance multithreading file-system c++17






asked Dec 21 at 1:43









Menace
