If I need to read lots of files, will it get faster if I break the problem into multiple threads? [closed]
I need some help. I had an interview with NetApp recently for a C++ role (they do big data storage systems). I wrote some code to answer an interview question. My response from them was “You failed”. It was very difficult to get feedback, as it usually is after failing an interview. After some very polite begging for feedback I got a little bit. But it still didn’t quite make sense.
Here’s the Problem:
Given a bunch of files in a directory, read them all and count the words. Create a bunch of threads to read the files in parallel. The consensus at NetApp (people who know a lot about storage) is that it should get faster with more threads. I think in most circumstances you are so I/O bound that it will get slower after 1 or 2 threads. I just don’t see how it’s possible to get faster unless you are under some known special circumstances (like a SAN or maybe RAID arrays). Even in those cases the number of sequential channels to the disk saturates and you are I/O bound again after only a few threads.
I think my code was great (of course). I’ve been writing C++ for many years. I think I know some things about what makes good code. It should have passed on style alone. Hehe. As a general rule, performance optimizations are not something you should guess at; they should be tested and measured. I only had limited time to run experiments. But now I’m curious.
The code is in my GitHub account here. https://github.com/MenaceSan/CountTextWords
Anyone have any opinions on this? Shed some light on what they might have been thinking? Any other criticisms of the code?
I base part of my opinion on this:
https://stackoverflow.com/questions/902425/does-multithreading-make-sense-for-io-bound-operations
A small sample of the code:
    namespace SSFI
    {
        class cThreadFileReader : public cThreadBase
        {
            // Read a file on a separate thread.
            // Q: We don't bother reading single files on more than one thread at a time. Assume files are serial on a single device. SAN array would make this NOT true.
        protected:
            void FlushWord();
            void ReadFile(const fsx::path& filePath);
            virtual void Run();
        private:
            std::string _word;
        public:
            cThreadFileReader(cApp& app)
                : cThreadBase(app)
            {
            }
        };
    }
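Since the claim about thread count is something to measure rather than guess at, here is a minimal sketch of how it could be tested. It is not the code from the repository above: the names `CountWordsInFile`, `CountWordsInDir`, and `TimeRunMs` are hypothetical, a "word" is assumed to be any whitespace-separated token, and only regular files directly inside the directory are read. Workers pull file indexes from a shared atomic counter, so the thread count can be varied without changing anything else.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <filesystem>
#include <fstream>
#include <string>
#include <thread>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical helper: count whitespace-separated tokens in one file.
std::size_t CountWordsInFile(const fs::path& filePath)
{
    std::ifstream in(filePath);
    std::size_t count = 0;
    std::string word;
    while (in >> word)
    {
        ++count;
    }
    return count;
}

// Count words in all regular files directly inside `dir`,
// sharing the file list among `threadCount` worker threads.
std::size_t CountWordsInDir(const fs::path& dir, unsigned threadCount)
{
    std::vector<fs::path> files;
    for (const auto& entry : fs::directory_iterator(dir))
    {
        if (entry.is_regular_file())
        {
            files.push_back(entry.path());
        }
    }

    std::atomic<std::size_t> next{0};   // index of the next unclaimed file
    std::atomic<std::size_t> total{0};  // running word count across threads

    auto worker = [&]
    {
        for (;;)
        {
            const std::size_t i = next.fetch_add(1);
            if (i >= files.size())
            {
                break;
            }
            total += CountWordsInFile(files[i]);
        }
    };

    if (threadCount == 0)
    {
        threadCount = 1;  // assumption: treat 0 as "single thread"
    }
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threadCount; ++t)
    {
        pool.emplace_back(worker);
    }
    for (auto& th : pool)
    {
        th.join();
    }
    return total.load();
}

// Time one full run in milliseconds; the word count is returned via `wordsOut`.
double TimeRunMs(const fs::path& dir, unsigned threadCount, std::size_t& wordsOut)
{
    const auto start = std::chrono::steady_clock::now();
    wordsOut = CountWordsInDir(dir, threadCount);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling `TimeRunMs` with 1, 2, 4, 8, ... threads on the same directory gives a rough scaling curve. One caveat when interpreting it: after the first pass the operating system has probably cached the files, so later runs mostly measure CPU and memory rather than the disk. A fair comparison needs a cold cache for every run, or a data set larger than RAM.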
c++ performance multithreading file-system c++17
closed as off-topic by vnp, Deduplicator, Jamal♦ Dec 21 at 2:52
This question appears to be off-topic. The users who voted to close gave this specific reason:
- "Authorship of code: Since Code Review is a community where programmers improve their skills through peer review, we require that the code be posted by an author or maintainer of the code, that the code be embedded directly, and that the poster know why the code is written the way it is." – vnp, Deduplicator, Jamal
If this question can be reworded to fit the rules in the help center, please edit the question.
I feel your pain. However, in this exchange we only review code directly embedded in the question.
– vnp
Dec 21 at 1:49
Oh, stink, OK, thanks. I reposted it on Stack Overflow since I really care more about the concept than the code in this case (although I like the code too). stackoverflow.com/questions/53878291/…
– Menace
Dec 21 at 2:34
asked Dec 21 at 1:43
Menace