Extract specific positions from a char array












6












$begingroup$


I have two arrays:



char input = "ughIuytLikeretC";


and



bool mask = {
false, false, false, true, false,
false, false, true, true, true,
true, false, false, false, true,
};


My function takes these two arrays and returns the characters in input whose positions are true in mask such that in this example, the result being ILikeC.



#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

char *filtArray(char input, bool mask, char *filtered) {
int j = 0;
int i;
for (i = 0; input[i]; i++) {
filtered[j] = input[i];
j += mask[i];
}

filtered[j] = 0;
return filtered;
}


filtArray will run on billions of "input" strings of constant length and "mask" will be the same for all "input"s.










share|improve this question











$endgroup$












  • $begingroup$
    It's bad practice to write data to a result array that shouldn't be there. You write the contents to filtered regardless of if it should be there or not, then overwrite it. Avoid this. Suppose there is no correct data - you will then corrupt the result array. I would also avoid using booleans for arithmetic, it's quite ugly and hard to read.
    $endgroup$
    – Lundin
    Dec 20 '18 at 15:58










  • $begingroup$
    @Lundin: I'm not clear on what you mean when you say the array "shouldn't be there." Seems to me it should always exist since that result is the point of calling the function in the first place.
    $endgroup$
    – Edward
    Dec 20 '18 at 16:52










  • $begingroup$
    Please fix the indentation in your sample code.
    $endgroup$
    – Reinderien
    Dec 20 '18 at 17:30










  • $begingroup$
    Is it a hard requirement that mask be an array of booleans? Because that's quite inefficient. You really should be using a binary mask in an integer.
    $endgroup$
    – Reinderien
    Dec 20 '18 at 17:32
















6












$begingroup$


I have two arrays:



char input = "ughIuytLikeretC";


and



bool mask = {
false, false, false, true, false,
false, false, true, true, true,
true, false, false, false, true,
};


My function takes these two arrays and returns the characters in input whose positions are true in mask such that in this example, the result being ILikeC.



#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

char *filtArray(char input, bool mask, char *filtered) {
int j = 0;
int i;
for (i = 0; input[i]; i++) {
filtered[j] = input[i];
j += mask[i];
}

filtered[j] = 0;
return filtered;
}


filtArray will run on billions of "input" strings of constant length and "mask" will be the same for all "input"s.










share|improve this question











$endgroup$












  • $begingroup$
    It's bad practice to write data to a result array that shouldn't be there. You write the contents to filtered regardless of if it should be there or not, then overwrite it. Avoid this. Suppose there is no correct data - you will then corrupt the result array. I would also avoid using booleans for arithmetic, it's quite ugly and hard to read.
    $endgroup$
    – Lundin
    Dec 20 '18 at 15:58










  • $begingroup$
    @Lundin: I'm not clear on what you mean when you say the array "shouldn't be there." Seems to me it should always exist since that result is the point of calling the function in the first place.
    $endgroup$
    – Edward
    Dec 20 '18 at 16:52










  • $begingroup$
    Please fix the indentation in your sample code.
    $endgroup$
    – Reinderien
    Dec 20 '18 at 17:30










  • $begingroup$
    Is it a hard requirement that mask be an array of booleans? Because that's quite inefficient. You really should be using a binary mask in an integer.
    $endgroup$
    – Reinderien
    Dec 20 '18 at 17:32














6












6








6


1



$begingroup$


I have two arrays:



char input = "ughIuytLikeretC";


and



bool mask = {
false, false, false, true, false,
false, false, true, true, true,
true, false, false, false, true,
};


My function takes these two arrays and returns the characters in input whose positions are true in mask such that in this example, the result being ILikeC.



#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

char *filtArray(char input, bool mask, char *filtered) {
int j = 0;
int i;
for (i = 0; input[i]; i++) {
filtered[j] = input[i];
j += mask[i];
}

filtered[j] = 0;
return filtered;
}


filtArray will run on billions of "input" strings of constant length and "mask" will be the same for all "input"s.










share|improve this question











$endgroup$




I have two arrays:



char input = "ughIuytLikeretC";


and



bool mask = {
false, false, false, true, false,
false, false, true, true, true,
true, false, false, false, true,
};


My function takes these two arrays and returns the characters in input whose positions are true in mask such that in this example, the result being ILikeC.



#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>

char *filtArray(char input, bool mask, char *filtered) {
int j = 0;
int i;
for (i = 0; input[i]; i++) {
filtered[j] = input[i];
j += mask[i];
}

filtered[j] = 0;
return filtered;
}


filtArray will run on billions of "input" strings of constant length and "mask" will be the same for all "input"s.







performance beginner c strings






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Dec 21 '18 at 2:57









Jamal

30.3k11116226




30.3k11116226










asked Dec 20 '18 at 13:37









jregaladjregalad

313




313












  • $begingroup$
    It's bad practice to write data to a result array that shouldn't be there. You write the contents to filtered regardless of if it should be there or not, then overwrite it. Avoid this. Suppose there is no correct data - you will then corrupt the result array. I would also avoid using booleans for arithmetic, it's quite ugly and hard to read.
    $endgroup$
    – Lundin
    Dec 20 '18 at 15:58










  • $begingroup$
    @Lundin: I'm not clear on what you mean when you say the array "shouldn't be there." Seems to me it should always exist since that result is the point of calling the function in the first place.
    $endgroup$
    – Edward
    Dec 20 '18 at 16:52










  • $begingroup$
    Please fix the indentation in your sample code.
    $endgroup$
    – Reinderien
    Dec 20 '18 at 17:30










  • $begingroup$
    Is it a hard requirement that mask be an array of booleans? Because that's quite inefficient. You really should be using a binary mask in an integer.
    $endgroup$
    – Reinderien
    Dec 20 '18 at 17:32


















  • $begingroup$
    It's bad practice to write data to a result array that shouldn't be there. You write the contents to filtered regardless of if it should be there or not, then overwrite it. Avoid this. Suppose there is no correct data - you will then corrupt the result array. I would also avoid using booleans for arithmetic, it's quite ugly and hard to read.
    $endgroup$
    – Lundin
    Dec 20 '18 at 15:58










  • $begingroup$
    @Lundin: I'm not clear on what you mean when you say the array "shouldn't be there." Seems to me it should always exist since that result is the point of calling the function in the first place.
    $endgroup$
    – Edward
    Dec 20 '18 at 16:52










  • $begingroup$
    Please fix the indentation in your sample code.
    $endgroup$
    – Reinderien
    Dec 20 '18 at 17:30










  • $begingroup$
    Is it a hard requirement that mask be an array of booleans? Because that's quite inefficient. You really should be using a binary mask in an integer.
    $endgroup$
    – Reinderien
    Dec 20 '18 at 17:32
















$begingroup$
It's bad practice to write data to a result array that shouldn't be there. You write the contents to filtered regardless of if it should be there or not, then overwrite it. Avoid this. Suppose there is no correct data - you will then corrupt the result array. I would also avoid using booleans for arithmetic, it's quite ugly and hard to read.
$endgroup$
– Lundin
Dec 20 '18 at 15:58




$begingroup$
It's bad practice to write data to a result array that shouldn't be there. You write the contents to filtered regardless of if it should be there or not, then overwrite it. Avoid this. Suppose there is no correct data - you will then corrupt the result array. I would also avoid using booleans for arithmetic, it's quite ugly and hard to read.
$endgroup$
– Lundin
Dec 20 '18 at 15:58












$begingroup$
@Lundin: I'm not clear on what you mean when you say the array "shouldn't be there." Seems to me it should always exist since that result is the point of calling the function in the first place.
$endgroup$
– Edward
Dec 20 '18 at 16:52




$begingroup$
@Lundin: I'm not clear on what you mean when you say the array "shouldn't be there." Seems to me it should always exist since that result is the point of calling the function in the first place.
$endgroup$
– Edward
Dec 20 '18 at 16:52












$begingroup$
Please fix the indentation in your sample code.
$endgroup$
– Reinderien
Dec 20 '18 at 17:30




$begingroup$
Please fix the indentation in your sample code.
$endgroup$
– Reinderien
Dec 20 '18 at 17:30












$begingroup$
Is it a hard requirement that mask be an array of booleans? Because that's quite inefficient. You really should be using a binary mask in an integer.
$endgroup$
– Reinderien
Dec 20 '18 at 17:32




$begingroup$
Is it a hard requirement that mask be an array of booleans? Because that's quite inefficient. You really should be using a binary mask in an integer.
$endgroup$
– Reinderien
Dec 20 '18 at 17:32










2 Answers
2






active

oldest

votes


















4












$begingroup$

I see some things that may help you improve your code.



Provide complete code to reviewers



This is not so much a change to the code as a change in how you present it to other people. Without the full context of the code and an example of how to use it, it takes more effort for other people to understand your code. This affects not only code reviews, but also maintenance of the code in the future, by you or by others. One good way to address that is by the use of comments. Another good technique is to include test code showing how your code is intended to be used. Here's the test main I used for your code:



int main() {
const char *input[2] = {
"ughIuytLikeretC",
"xxxExxxdwarxxxd",
};
const bool mask = {
false, false, false, true, false,
false, false, true, true, true,
true, false, false, false, true,
};
char filt[100];
char maskstr[100];
// create the mask string
pmask(mask, maskstr);

printf("Orig: %snMask: %snFilt: %sn", input[0], maskstr, filtArray(input[0], mask, filt));
printf("Orig: %snMask: %snFilt: %sn", input[1], maskstr, filtArray(input[1], mask, filt));
for (int i = 0; i < 10000000; ++i) {
int n = rand() > RAND_MAX/2 ? 1 : 0;
printf("Orig: %snMask: %snFilt: %sn", input[n], maskstr, filtArray(input[n], mask, filt));
}
}


After it applies the function to two strings, it then iterates 10 million times, choosing one or the other test inputs randomly. This is for testing timing.



Use const where practical



The filtArray function does not (and should not) alter either the passed input or mask arrays and so both of those should be declared const.



char *filtArray(const char input, const bool mask, char *filtered) {


Consider bounds checking



If the input strings have already been validated for length, the function you have is OK, but in general, it's good to make sure there is enough room to copy the masked characters. If there isn't enough room, that's the recipe for a buffer overflow vulnerability and must be eliminated, either by the calling routine or by this one.



Consider a custom copy



If the same mask is used for billions of strings, it would probably make sense to do things differently. For example, one alternative might look like this:



#include <string.h>

char *filtArray(const char input, char *filtered) {
memcpy(&filtered[1], &input[7], 4);
filtered[0] = input[3];
filtered[5] = input[14];
filtered[6] = '';
return filtered;
}


Note that the mask is no longer used in this version, because the code has implemented it implicitly. This is less flexible but offers better performance. For 10 million strings on my machine, your original version takes about 1.3 seconds, while the version shown here takes around 1.0 seconds (redirecting the output to /dev/null on a Linux machine).



Use pointers rather than indexing for speed



Pointers are generally a faster way to access elements than using index variables. For example, your filtArray routine could be written like this:



char *filtArray(const char *input, const bool *mask, char *filtered) {
char *beginning = filtered;
for ( ; *input; ++input, ++mask) {
if (*mask) {
*filtered++ = *input;
}
}
*filtered = '';
return beginning;
}


Because you're just beginning, this may seem strange to you, but this kind of use of pointers is a very common idiom in C.



Compilers are good, but not quite that good yet



Because there's a tendency to assume the compiler will take care of it, here's compiler output comparison of the two approaches using gcc for ARM using the on-line compiler explorer: https://godbolt.org/z/Y0TeVX



As can be seen in this case, the generated assembly code for the pointer version is much shorter. Shorter code is usually faster (and it is in this case according to my testing) but not always. For those who are expert in compiler design: The typical improvement is as likely to be the elimination of extra live variables as for the use of pointers per se, but the effect is nonetheless real.



Measured timings



For each of the three variations, original, pointer, and memcpy, here are the measured times for 10 million iterations and the variances of each set of samples and the relative speed measured as the average speed compared with the average speed of the original expressed as a percentage. With no optimization:



$begin{array}{l|c|c|c}
{bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
hline
text{original}&1.344&0.01853&100.00% \
text{pointer}&1.244&0.01193&92.56% \
text{memcpy}&0.998&0.01177&74.26%
end{array}$



With -O2 optimization:



$begin{array}{l|c|c|c}
{bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
hline
text{original}&1.038&0.01462&100.00% \
text{pointer}&1.000&0.00135&96.34% \
text{memcpy}&0.948&0.00692&91.33%
end{array}$



These results were on a 64-bit Linux machine using gcc version 8.2.1. I look forward to seeing other measured timing results. Time is user time as measured by time -f %U (See https://linux.die.net/man/1/time for man page).






share|improve this answer











$endgroup$









  • 4




    $begingroup$
    "Pointers are generally a faster way to access elements than using index variables." This is very subjective. The opposite may as well be true, depending on system. We should not replace indexing with pointer arithmetic unless we have very good reasons - doing so for the sake of performance is pre-mature optimization. To truly optimize for speed, it might be better to drop the bool array in favour of true bit masks.
    $endgroup$
    – Lundin
    Dec 20 '18 at 16:08










  • $begingroup$
    It's not actually subjective, but based on measurement and experience. On my machine it's faster, and for many embedded systems compilers (which is what I use often) it's generally faster. But what matters is whether it's faster for the author of the code. Only measurement on that system and with real data will tell whether it's faster or not.
    $endgroup$
    – Edward
    Dec 20 '18 at 16:43










  • $begingroup$
    I've added more data and explanation to show why pointers are faster with this code.
    $endgroup$
    – Edward
    Dec 20 '18 at 18:06










  • $begingroup$
    The resulting assembly is still very dependent on compiler as well. Switching to using the intel compiler on godbolt results in indexing generating shorter assembly.
    $endgroup$
    – pseudonym117
    Dec 20 '18 at 19:57










  • $begingroup$
    @pseudonym117 At the risk of stating the obvious, the output assembly is always dependent on the compiler. In the example you mentioned, while there are fewer instructions, the assembled code is both longer and slower with the indexed version vs the pointer version. (42 bytes vs. 38 bytes for pointer version). godbolt.org/z/Sr1Bsq It's another example of why we must measure rather than guess.
    $endgroup$
    – Edward
    Dec 20 '18 at 20:34



















1












$begingroup$

The code compiled with no warnings and ran properly on first time which is great.



Let's see what can be improved.



Style



The indentation of the code seems a bit weird. I do not know if this is how it looked originally or if it got broken when it was copied here.



Also, it may be worth adding some documentation describing the inputs you are expecting (in your case, 3 arrays of same size, the first one being 0-terminated).



Do less



You use the mask only to know whether j is to be incremented. Actually, you could rewrite:



j += mask[i];


as



    if (mask[i])
j++;


which is more explicit but less concise.



The real benefic is when you realize than updating filtered can be done only when we have mask[i]. We can write:



    if (mask[i])
{
filtered[j] = input[i];
j++;
}


or the equivalent:



    if (mask[i])
filtered[j++] = input[i];


Null character



Instead of filtered[j] = 0;, you could use the Null Character which is equivalent here but more usual and write: filtered[j] = '';.



Signature



I am not sure if it is really useful to have the filtered value returned as it is already known by the calling function. Also, filterArray may be a better name.



Going further



Instead of definining a mask as an array of boolean, you could provide an array with the positions of the characters you are interested in.



In your case, you'd provide something like: {3, 7, 8, 9, 10, 14 }.



This could be less efficient because we'd perform a smaller number of iterations. Here, we'd iterate over 6 elements instead of 15.



The corresponding mask could be converted manually (which is what I did here) if it is for a known value or you could write a function to pre-process the mask. This seems to be relevant in your case as the same mask is used many times on different inputs.






share|improve this answer









$endgroup$













    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210044%2fextract-specific-positions-from-a-char-array%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    2 Answers
    2






    active

    oldest

    votes








    2 Answers
    2






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    4












    $begingroup$

    I see some things that may help you improve your code.



    Provide complete code to reviewers



    This is not so much a change to the code as a change in how you present it to other people. Without the full context of the code and an example of how to use it, it takes more effort for other people to understand your code. This affects not only code reviews, but also maintenance of the code in the future, by you or by others. One good way to address that is by the use of comments. Another good technique is to include test code showing how your code is intended to be used. Here's the test main I used for your code:



    int main() {
    const char *input[2] = {
    "ughIuytLikeretC",
    "xxxExxxdwarxxxd",
    };
    const bool mask = {
    false, false, false, true, false,
    false, false, true, true, true,
    true, false, false, false, true,
    };
    char filt[100];
    char maskstr[100];
    // create the mask string
    pmask(mask, maskstr);

    printf("Orig: %snMask: %snFilt: %sn", input[0], maskstr, filtArray(input[0], mask, filt));
    printf("Orig: %snMask: %snFilt: %sn", input[1], maskstr, filtArray(input[1], mask, filt));
    for (int i = 0; i < 10000000; ++i) {
    int n = rand() > RAND_MAX/2 ? 1 : 0;
    printf("Orig: %snMask: %snFilt: %sn", input[n], maskstr, filtArray(input[n], mask, filt));
    }
    }


    After it applies the function to two strings, it then iterates 10 million times, choosing one or the other test inputs randomly. This is for testing timing.



    Use const where practical



    The filtArray function does not (and should not) alter either the passed input or mask arrays and so both of those should be declared const.



    char *filtArray(const char input, const bool mask, char *filtered) {


    Consider bounds checking



    If the input strings have already been validated for length, the function you have is OK, but in general, it's good to make sure there is enough room to copy the masked characters. If there isn't enough room, that's the recipe for a buffer overflow vulnerability and must be eliminated, either by the calling routine or by this one.



    Consider a custom copy



    If the same mask is used for billions of strings, it would probably make sense to do things differently. For example, one alternative might look like this:



    #include <string.h>

    char *filtArray(const char input, char *filtered) {
    memcpy(&filtered[1], &input[7], 4);
    filtered[0] = input[3];
    filtered[5] = input[14];
    filtered[6] = '';
    return filtered;
    }


    Note that the mask is no longer used in this version, because the code has implemented it implicitly. This is less flexible but offers better performance. For 10 million strings on my machine, your original version takes about 1.3 seconds, while the version shown here takes around 1.0 seconds (redirecting the output to /dev/null on a Linux machine).



    Use pointers rather than indexing for speed



    Pointers are generally a faster way to access elements than using index variables. For example, your filtArray routine could be written like this:



    char *filtArray(const char *input, const bool *mask, char *filtered) {
    char *beginning = filtered;
    for ( ; *input; ++input, ++mask) {
    if (*mask) {
    *filtered++ = *input;
    }
    }
    *filtered = '';
    return beginning;
    }


    Because you're just beginning, this may seem strange to you, but this kind of use of pointers is a very common idiom in C.



    Compilers are good, but not quite that good yet



    Because there's a tendency to assume the compiler will take care of it, here's compiler output comparison of the two approaches using gcc for ARM using the on-line compiler explorer: https://godbolt.org/z/Y0TeVX



    As can be seen in this case, the generated assembly code for the pointer version is much shorter. Shorter code is usually faster (and it is in this case according to my testing) but not always. For those who are expert in compiler design: The typical improvement is as likely to be the elimination of extra live variables as for the use of pointers per se, but the effect is nonetheless real.



    Measured timings



    For each of the three variations, original, pointer, and memcpy, here are the measured times for 10 million iterations and the variances of each set of samples and the relative speed measured as the average speed compared with the average speed of the original expressed as a percentage. With no optimization:



    $begin{array}{l|c|c|c}
    {bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
    hline
    text{original}&1.344&0.01853&100.00% \
    text{pointer}&1.244&0.01193&92.56% \
    text{memcpy}&0.998&0.01177&74.26%
    end{array}$



    With -O2 optimization:



    $begin{array}{l|c|c|c}
    {bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
    hline
    text{original}&1.038&0.01462&100.00% \
    text{pointer}&1.000&0.00135&96.34% \
    text{memcpy}&0.948&0.00692&91.33%
    end{array}$



    These results were on a 64-bit Linux machine using gcc version 8.2.1. I look forward to seeing other measured timing results. Time is user time as measured by time -f %U (See https://linux.die.net/man/1/time for man page).






    share|improve this answer











    $endgroup$









    • 4




      $begingroup$
      "Pointers are generally a faster way to access elements than using index variables." This is very subjective. The opposite may as well be true, depending on system. We should not replace indexing with pointer arithmetic unless we have very good reasons - doing so for the sake of performance is pre-mature optimization. To truly optimize for speed, it might be better to drop the bool array in favour of true bit masks.
      $endgroup$
      – Lundin
      Dec 20 '18 at 16:08










    • $begingroup$
      It's not actually subjective, but based on measurement and experience. On my machine it's faster, and for many embedded systems compilers (which is what I use often) it's generally faster. But what matters is whether it's faster for the author of the code. Only measurement on that system and with real data will tell whether it's faster or not.
      $endgroup$
      – Edward
      Dec 20 '18 at 16:43










    • $begingroup$
      I've added more data and explanation to show why pointers are faster with this code.
      $endgroup$
      – Edward
      Dec 20 '18 at 18:06










    • $begingroup$
      The resulting assembly is still very dependent on compiler as well. Switching to using the intel compiler on godbolt results in indexing generating shorter assembly.
      $endgroup$
      – pseudonym117
      Dec 20 '18 at 19:57










    • $begingroup$
      @pseudonym117 At the risk of stating the obvious, the output assembly is always dependent on the compiler. In the example you mentioned, while there are fewer instructions, the assembled code is both longer and slower with the indexed version vs the pointer version. (42 bytes vs. 38 bytes for pointer version). godbolt.org/z/Sr1Bsq It's another example of why we must measure rather than guess.
      $endgroup$
      – Edward
      Dec 20 '18 at 20:34
















    4












    $begingroup$

    I see some things that may help you improve your code.



    Provide complete code to reviewers



    This is not so much a change to the code as a change in how you present it to other people. Without the full context of the code and an example of how to use it, it takes more effort for other people to understand your code. This affects not only code reviews, but also maintenance of the code in the future, by you or by others. One good way to address that is by the use of comments. Another good technique is to include test code showing how your code is intended to be used. Here's the test main I used for your code:



    int main() {
    const char *input[2] = {
    "ughIuytLikeretC",
    "xxxExxxdwarxxxd",
    };
    const bool mask = {
    false, false, false, true, false,
    false, false, true, true, true,
    true, false, false, false, true,
    };
    char filt[100];
    char maskstr[100];
    // create the mask string
    pmask(mask, maskstr);

    printf("Orig: %snMask: %snFilt: %sn", input[0], maskstr, filtArray(input[0], mask, filt));
    printf("Orig: %snMask: %snFilt: %sn", input[1], maskstr, filtArray(input[1], mask, filt));
    for (int i = 0; i < 10000000; ++i) {
    int n = rand() > RAND_MAX/2 ? 1 : 0;
    printf("Orig: %snMask: %snFilt: %sn", input[n], maskstr, filtArray(input[n], mask, filt));
    }
    }


    After it applies the function to two strings, it then iterates 10 million times, choosing one or the other test inputs randomly. This is for testing timing.



    Use const where practical



    The filtArray function does not (and should not) alter either the passed input or mask arrays and so both of those should be declared const.



    char *filtArray(const char input, const bool mask, char *filtered) {


    Consider bounds checking



    If the input strings have already been validated for length, the function you have is OK, but in general, it's good to make sure there is enough room to copy the masked characters. If there isn't enough room, that's the recipe for a buffer overflow vulnerability and must be eliminated, either by the calling routine or by this one.



    Consider a custom copy



    If the same mask is used for billions of strings, it would probably make sense to do things differently. For example, one alternative might look like this:



    #include <string.h>

    char *filtArray(const char input, char *filtered) {
    memcpy(&filtered[1], &input[7], 4);
    filtered[0] = input[3];
    filtered[5] = input[14];
    filtered[6] = '';
    return filtered;
    }


    Note that the mask is no longer used in this version, because the code has implemented it implicitly. This is less flexible but offers better performance. For 10 million strings on my machine, your original version takes about 1.3 seconds, while the version shown here takes around 1.0 seconds (redirecting the output to /dev/null on a Linux machine).



    Use pointers rather than indexing for speed



    Pointers are generally a faster way to access elements than using index variables. For example, your filtArray routine could be written like this:



    char *filtArray(const char *input, const bool *mask, char *filtered) {
    char *beginning = filtered;
    for ( ; *input; ++input, ++mask) {
    if (*mask) {
    *filtered++ = *input;
    }
    }
    *filtered = '';
    return beginning;
    }


    Because you're just beginning, this may seem strange to you, but this kind of use of pointers is a very common idiom in C.



    Compilers are good, but not quite that good yet



    Because there's a tendency to assume the compiler will take care of it, here's compiler output comparison of the two approaches using gcc for ARM using the on-line compiler explorer: https://godbolt.org/z/Y0TeVX



    As can be seen in this case, the generated assembly code for the pointer version is much shorter. Shorter code is usually faster (and it is in this case according to my testing) but not always. For those who are expert in compiler design: The typical improvement is as likely to be the elimination of extra live variables as for the use of pointers per se, but the effect is nonetheless real.



    Measured timings



    For each of the three variations, original, pointer, and memcpy, here are the measured times for 10 million iterations and the variances of each set of samples and the relative speed measured as the average speed compared with the average speed of the original expressed as a percentage. With no optimization:



    $begin{array}{l|c|c|c}
    {bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
    hline
    text{original}&1.344&0.01853&100.00% \
    text{pointer}&1.244&0.01193&92.56% \
    text{memcpy}&0.998&0.01177&74.26%
    end{array}$



    With -O2 optimization:



    $begin{array}{l|c|c|c}
    {bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
    hline
    text{original}&1.038&0.01462&100.00% \
    text{pointer}&1.000&0.00135&96.34% \
    text{memcpy}&0.948&0.00692&91.33%
    end{array}$



    These results were on a 64-bit Linux machine using gcc version 8.2.1. I look forward to seeing other measured timing results. Time is user time as measured by time -f %U (See https://linux.die.net/man/1/time for man page).






    share|improve this answer











    $endgroup$









    • 4




      $begingroup$
      "Pointers are generally a faster way to access elements than using index variables." This is very subjective. The opposite may as well be true, depending on system. We should not replace indexing with pointer arithmetic unless we have very good reasons - doing so for the sake of performance is pre-mature optimization. To truly optimize for speed, it might be better to drop the bool array in favour of true bit masks.
      $endgroup$
      – Lundin
      Dec 20 '18 at 16:08










    • $begingroup$
      It's not actually subjective, but based on measurement and experience. On my machine it's faster, and for many embedded systems compilers (which is what I use often) it's generally faster. But what matters is whether it's faster for the author of the code. Only measurement on that system and with real data will tell whether it's faster or not.
      $endgroup$
      – Edward
      Dec 20 '18 at 16:43










    • $begingroup$
      I've added more data and explanation to show why pointers are faster with this code.
      $endgroup$
      – Edward
      Dec 20 '18 at 18:06










    • $begingroup$
      The resulting assembly is still very dependent on compiler as well. Switching to using the intel compiler on godbolt results in indexing generating shorter assembly.
      $endgroup$
      – pseudonym117
      Dec 20 '18 at 19:57










    • $begingroup$
      @pseudonym117 At the risk of stating the obvious, the output assembly is always dependent on the compiler. In the example you mentioned, while there are fewer instructions, the assembled code is both longer and slower with the indexed version vs the pointer version. (42 bytes vs. 38 bytes for pointer version). godbolt.org/z/Sr1Bsq It's another example of why we must measure rather than guess.
      $endgroup$
      – Edward
      Dec 20 '18 at 20:34














    4












    4








    4





    $begingroup$

    I see some things that may help you improve your code.



    Provide complete code to reviewers



    This is not so much a change to the code as a change in how you present it to other people. Without the full context of the code and an example of how to use it, it takes more effort for other people to understand your code. This affects not only code reviews, but also maintenance of the code in the future, by you or by others. One good way to address that is by the use of comments. Another good technique is to include test code showing how your code is intended to be used. Here's the test main I used for your code:



    int main() {
    const char *input[2] = {
    "ughIuytLikeretC",
    "xxxExxxdwarxxxd",
    };
    const bool mask = {
    false, false, false, true, false,
    false, false, true, true, true,
    true, false, false, false, true,
    };
    char filt[100];
    char maskstr[100];
    // create the mask string
    pmask(mask, maskstr);

    printf("Orig: %snMask: %snFilt: %sn", input[0], maskstr, filtArray(input[0], mask, filt));
    printf("Orig: %snMask: %snFilt: %sn", input[1], maskstr, filtArray(input[1], mask, filt));
    for (int i = 0; i < 10000000; ++i) {
    int n = rand() > RAND_MAX/2 ? 1 : 0;
    printf("Orig: %snMask: %snFilt: %sn", input[n], maskstr, filtArray(input[n], mask, filt));
    }
    }


    After it applies the function to two strings, it then iterates 10 million times, choosing one or the other test inputs randomly. This is for testing timing.



    Use const where practical



    The filtArray function does not (and should not) alter either the passed input or mask arrays and so both of those should be declared const.



    char *filtArray(const char input, const bool mask, char *filtered) {


    Consider bounds checking



    If the input strings have already been validated for length, the function you have is OK, but in general, it's good to make sure there is enough room to copy the masked characters. If there isn't enough room, that's the recipe for a buffer overflow vulnerability and must be eliminated, either by the calling routine or by this one.



    Consider a custom copy



    If the same mask is used for billions of strings, it would probably make sense to do things differently. For example, one alternative might look like this:



    #include <string.h>

    char *filtArray(const char input, char *filtered) {
    memcpy(&filtered[1], &input[7], 4);
    filtered[0] = input[3];
    filtered[5] = input[14];
    filtered[6] = '';
    return filtered;
    }


    Note that the mask is no longer used in this version, because the code has implemented it implicitly. This is less flexible but offers better performance. For 10 million strings on my machine, your original version takes about 1.3 seconds, while the version shown here takes around 1.0 seconds (redirecting the output to /dev/null on a Linux machine).



    Use pointers rather than indexing for speed



    Pointers are generally a faster way to access elements than using index variables. For example, your filtArray routine could be written like this:



    char *filtArray(const char *input, const bool *mask, char *filtered) {
    char *beginning = filtered;
    for ( ; *input; ++input, ++mask) {
    if (*mask) {
    *filtered++ = *input;
    }
    }
    *filtered = '';
    return beginning;
    }


    Because you're just beginning, this may seem strange to you, but this kind of use of pointers is a very common idiom in C.



    Compilers are good, but not quite that good yet



    Because there's a tendency to assume the compiler will take care of it, here's compiler output comparison of the two approaches using gcc for ARM using the on-line compiler explorer: https://godbolt.org/z/Y0TeVX



    As can be seen in this case, the generated assembly code for the pointer version is much shorter. Shorter code is usually faster (and it is in this case according to my testing) but not always. For those who are expert in compiler design: The typical improvement is as likely to be the elimination of extra live variables as for the use of pointers per se, but the effect is nonetheless real.



    Measured timings



    For each of the three variations, original, pointer, and memcpy, here are the measured times for 10 million iterations and the variances of each set of samples and the relative speed measured as the average speed compared with the average speed of the original expressed as a percentage. With no optimization:



    $begin{array}{l|c|c|c}
    {bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
    hline
    text{original}&1.344&0.01853&100.00% \
    text{pointer}&1.244&0.01193&92.56% \
    text{memcpy}&0.998&0.01177&74.26%
    end{array}$



    With -O2 optimization:



    $begin{array}{l|c|c|c}
    {bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
    hline
    text{original}&1.038&0.01462&100.00% \
    text{pointer}&1.000&0.00135&96.34% \
    text{memcpy}&0.948&0.00692&91.33%
    end{array}$



    These results were on a 64-bit Linux machine using gcc version 8.2.1. I look forward to seeing other measured timing results. Time is user time as measured by time -f %U (See https://linux.die.net/man/1/time for man page).






    share|improve this answer











    $endgroup$



    I see some things that may help you improve your code.



    Provide complete code to reviewers



    This is not so much a change to the code as a change in how you present it to other people. Without the full context of the code and an example of how to use it, it takes more effort for other people to understand your code. This affects not only code reviews, but also maintenance of the code in the future, by you or by others. One good way to address that is by the use of comments. Another good technique is to include test code showing how your code is intended to be used. Here's the test main I used for your code:



    int main() {
    const char *input[2] = {
    "ughIuytLikeretC",
    "xxxExxxdwarxxxd",
    };
    const bool mask = {
    false, false, false, true, false,
    false, false, true, true, true,
    true, false, false, false, true,
    };
    char filt[100];
    char maskstr[100];
    // create the mask string
    pmask(mask, maskstr);

    printf("Orig: %snMask: %snFilt: %sn", input[0], maskstr, filtArray(input[0], mask, filt));
    printf("Orig: %snMask: %snFilt: %sn", input[1], maskstr, filtArray(input[1], mask, filt));
    for (int i = 0; i < 10000000; ++i) {
    int n = rand() > RAND_MAX/2 ? 1 : 0;
    printf("Orig: %snMask: %snFilt: %sn", input[n], maskstr, filtArray(input[n], mask, filt));
    }
    }


    After it applies the function to two strings, it then iterates 10 million times, choosing one or the other test inputs randomly. This is for testing timing.



    Use const where practical



    The filtArray function does not (and should not) alter either the passed input or mask arrays and so both of those should be declared const.



    char *filtArray(const char input, const bool mask, char *filtered) {


    Consider bounds checking



    If the input strings have already been validated for length, the function you have is OK, but in general, it's good to make sure there is enough room to copy the masked characters. If there isn't enough room, that's the recipe for a buffer overflow vulnerability and must be eliminated, either by the calling routine or by this one.



    Consider a custom copy



    If the same mask is used for billions of strings, it would probably make sense to do things differently. For example, one alternative might look like this:



    #include <string.h>

    char *filtArray(const char input, char *filtered) {
    memcpy(&filtered[1], &input[7], 4);
    filtered[0] = input[3];
    filtered[5] = input[14];
    filtered[6] = '';
    return filtered;
    }


    Note that the mask is no longer used in this version, because the code has implemented it implicitly. This is less flexible but offers better performance. For 10 million strings on my machine, your original version takes about 1.3 seconds, while the version shown here takes around 1.0 seconds (redirecting the output to /dev/null on a Linux machine).



    Use pointers rather than indexing for speed



    Pointers are generally a faster way to access elements than using index variables. For example, your filtArray routine could be written like this:



    char *filtArray(const char *input, const bool *mask, char *filtered) {
    char *beginning = filtered;
    for ( ; *input; ++input, ++mask) {
    if (*mask) {
    *filtered++ = *input;
    }
    }
    *filtered = '';
    return beginning;
    }


    Because you're just beginning, this may seem strange to you, but this kind of use of pointers is a very common idiom in C.



    Compilers are good, but not quite that good yet



    Because there's a tendency to assume the compiler will take care of it, here's compiler output comparison of the two approaches using gcc for ARM using the on-line compiler explorer: https://godbolt.org/z/Y0TeVX



    As can be seen in this case, the generated assembly code for the pointer version is much shorter. Shorter code is usually faster (and it is in this case according to my testing) but not always. For those who are expert in compiler design: The typical improvement is as likely to be the elimination of extra live variables as for the use of pointers per se, but the effect is nonetheless real.



    Measured timings



    For each of the three variations, original, pointer, and memcpy, here are the measured times for 10 million iterations and the variances of each set of samples and the relative speed measured as the average speed compared with the average speed of the original expressed as a percentage. With no optimization:



    $begin{array}{l|c|c|c}
    {bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
    hline
    text{original}&1.344&0.01853&100.00% \
    text{pointer}&1.244&0.01193&92.56% \
    text{memcpy}&0.998&0.01177&74.26%
    end{array}$



    With -O2 optimization:



    $begin{array}{l|c|c|c}
    {bf name}&{bf avg (s)}&{bf var (s)}&{bf relative}\
    hline
    text{original}&1.038&0.01462&100.00% \
    text{pointer}&1.000&0.00135&96.34% \
    text{memcpy}&0.948&0.00692&91.33%
    end{array}$



    These results were on a 64-bit Linux machine using gcc version 8.2.1. I look forward to seeing other measured timing results. Time is user time as measured by time -f %U (See https://linux.die.net/man/1/time for man page).







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited Dec 21 '18 at 19:44

























    answered Dec 20 '18 at 14:36









    EdwardEdward

    46.7k378212




    46.7k378212








    • 4




      $begingroup$
      "Pointers are generally a faster way to access elements than using index variables." This is very subjective. The opposite may as well be true, depending on system. We should not replace indexing with pointer arithmetic unless we have very good reasons - doing so for the sake of performance is pre-mature optimization. To truly optimize for speed, it might be better to drop the bool array in favour of true bit masks.
      $endgroup$
      – Lundin
      Dec 20 '18 at 16:08










    • $begingroup$
      It's not actually subjective, but based on measurement and experience. On my machine it's faster, and for many embedded systems compilers (which is what I use often) it's generally faster. But what matters is whether it's faster for the author of the code. Only measurement on that system and with real data will tell whether it's faster or not.
      $endgroup$
      – Edward
      Dec 20 '18 at 16:43










    • $begingroup$
      I've added more data and explanation to show why pointers are faster with this code.
      $endgroup$
      – Edward
      Dec 20 '18 at 18:06










    • $begingroup$
      The resulting assembly is still very dependent on compiler as well. Switching to using the intel compiler on godbolt results in indexing generating shorter assembly.
      $endgroup$
      – pseudonym117
      Dec 20 '18 at 19:57










    • $begingroup$
      @pseudonym117 At the risk of stating the obvious, the output assembly is always dependent on the compiler. In the example you mentioned, while there are fewer instructions, the assembled code is both longer and slower with the indexed version vs the pointer version. (42 bytes vs. 38 bytes for pointer version). godbolt.org/z/Sr1Bsq It's another example of why we must measure rather than guess.
      $endgroup$
      – Edward
      Dec 20 '18 at 20:34














    • 4




      $begingroup$
      "Pointers are generally a faster way to access elements than using index variables." This is very subjective. The opposite may as well be true, depending on system. We should not replace indexing with pointer arithmetic unless we have very good reasons - doing so for the sake of performance is pre-mature optimization. To truly optimize for speed, it might be better to drop the bool array in favour of true bit masks.
      $endgroup$
      – Lundin
      Dec 20 '18 at 16:08










    • $begingroup$
      It's not actually subjective, but based on measurement and experience. On my machine it's faster, and for many embedded systems compilers (which is what I use often) it's generally faster. But what matters is whether it's faster for the author of the code. Only measurement on that system and with real data will tell whether it's faster or not.
      $endgroup$
      – Edward
      Dec 20 '18 at 16:43










    • $begingroup$
      I've added more data and explanation to show why pointers are faster with this code.
      $endgroup$
      – Edward
      Dec 20 '18 at 18:06










    • $begingroup$
      The resulting assembly is still very dependent on compiler as well. Switching to using the intel compiler on godbolt results in indexing generating shorter assembly.
      $endgroup$
      – pseudonym117
      Dec 20 '18 at 19:57










    • $begingroup$
      @pseudonym117 At the risk of stating the obvious, the output assembly is always dependent on the compiler. In the example you mentioned, while there are fewer instructions, the assembled code is both longer and slower with the indexed version vs the pointer version. (42 bytes vs. 38 bytes for pointer version). godbolt.org/z/Sr1Bsq It's another example of why we must measure rather than guess.
      $endgroup$
      – Edward
      Dec 20 '18 at 20:34








    4




    4




    $begingroup$
    "Pointers are generally a faster way to access elements than using index variables." This is very subjective. The opposite may as well be true, depending on system. We should not replace indexing with pointer arithmetic unless we have very good reasons - doing so for the sake of performance is pre-mature optimization. To truly optimize for speed, it might be better to drop the bool array in favour of true bit masks.
    $endgroup$
    – Lundin
    Dec 20 '18 at 16:08




    $begingroup$
    "Pointers are generally a faster way to access elements than using index variables." This is very subjective. The opposite may as well be true, depending on system. We should not replace indexing with pointer arithmetic unless we have very good reasons - doing so for the sake of performance is pre-mature optimization. To truly optimize for speed, it might be better to drop the bool array in favour of true bit masks.
    $endgroup$
    – Lundin
    Dec 20 '18 at 16:08












    $begingroup$
    It's not actually subjective, but based on measurement and experience. On my machine it's faster, and for many embedded systems compilers (which is what I use often) it's generally faster. But what matters is whether it's faster for the author of the code. Only measurement on that system and with real data will tell whether it's faster or not.
    $endgroup$
    – Edward
    Dec 20 '18 at 16:43




    $begingroup$
    It's not actually subjective, but based on measurement and experience. On my machine it's faster, and for many embedded systems compilers (which is what I use often) it's generally faster. But what matters is whether it's faster for the author of the code. Only measurement on that system and with real data will tell whether it's faster or not.
    $endgroup$
    – Edward
    Dec 20 '18 at 16:43












    $begingroup$
    I've added more data and explanation to show why pointers are faster with this code.
    $endgroup$
    – Edward
    Dec 20 '18 at 18:06




    $begingroup$
    I've added more data and explanation to show why pointers are faster with this code.
    $endgroup$
    – Edward
    Dec 20 '18 at 18:06












    $begingroup$
    The resulting assembly is still very dependent on compiler as well. Switching to using the intel compiler on godbolt results in indexing generating shorter assembly.
    $endgroup$
    – pseudonym117
    Dec 20 '18 at 19:57




    $begingroup$
    The resulting assembly is still very dependent on compiler as well. Switching to using the intel compiler on godbolt results in indexing generating shorter assembly.
    $endgroup$
    – pseudonym117
    Dec 20 '18 at 19:57












    $begingroup$
    @pseudonym117 At the risk of stating the obvious, the output assembly is always dependent on the compiler. In the example you mentioned, while there are fewer instructions, the assembled code is both longer and slower with the indexed version vs the pointer version. (42 bytes vs. 38 bytes for pointer version). godbolt.org/z/Sr1Bsq It's another example of why we must measure rather than guess.
    $endgroup$
    – Edward
    Dec 20 '18 at 20:34




    $begingroup$
    @pseudonym117 At the risk of stating the obvious, the output assembly is always dependent on the compiler. In the example you mentioned, while there are fewer instructions, the assembled code is both longer and slower with the indexed version vs the pointer version. (42 bytes vs. 38 bytes for pointer version). godbolt.org/z/Sr1Bsq It's another example of why we must measure rather than guess.
    $endgroup$
    – Edward
    Dec 20 '18 at 20:34













    1












    $begingroup$

    The code compiled with no warnings and ran properly on first time which is great.



    Let's see what can be improved.



    Style



    The indentation of the code seems a bit weird. I do not know if this is how it looked originally or if it got broken when it was copied here.



    Also, it may be worth adding some documentation describing the inputs you are expecting (in your case, 3 arrays of same size, the first one being 0-terminated).



    Do less



    You use the mask only to know whether j is to be incremented. Actually, you could rewrite:



    j += mask[i];


    as



        if (mask[i])
    j++;


    which is more explicit but less concise.



    The real benefic is when you realize than updating filtered can be done only when we have mask[i]. We can write:



        if (mask[i])
    {
    filtered[j] = input[i];
    j++;
    }


    or the equivalent:



        if (mask[i])
    filtered[j++] = input[i];


    Null character



    Instead of filtered[j] = 0;, you could use the Null Character which is equivalent here but more usual and write: filtered[j] = '';.



    Signature



    I am not sure if it is really useful to have the filtered value returned as it is already known by the calling function. Also, filterArray may be a better name.



    Going further



    Instead of definining a mask as an array of boolean, you could provide an array with the positions of the characters you are interested in.



    In your case, you'd provide something like: {3, 7, 8, 9, 10, 14 }.



    This could be less efficient because we'd perform a smaller number of iterations. Here, we'd iterate over 6 elements instead of 15.



    The corresponding mask could be converted manually (which is what I did here) if it is for a known value or you could write a function to pre-process the mask. This seems to be relevant in your case as the same mask is used many times on different inputs.






    share|improve this answer









    $endgroup$


















      1












      $begingroup$

      The code compiled with no warnings and ran properly on first time which is great.



      Let's see what can be improved.



      Style



      The indentation of the code seems a bit weird. I do not know if this is how it looked originally or if it got broken when it was copied here.



      Also, it may be worth adding some documentation describing the inputs you are expecting (in your case, 3 arrays of same size, the first one being 0-terminated).



      Do less



      You use the mask only to know whether j is to be incremented. Actually, you could rewrite:



      j += mask[i];


      as



          if (mask[i])
      j++;


      which is more explicit but less concise.



      The real benefic is when you realize than updating filtered can be done only when we have mask[i]. We can write:



          if (mask[i])
      {
      filtered[j] = input[i];
      j++;
      }


      or the equivalent:



          if (mask[i])
      filtered[j++] = input[i];


      Null character



      Instead of filtered[j] = 0;, you could use the Null Character which is equivalent here but more usual and write: filtered[j] = '';.



      Signature



      I am not sure if it is really useful to have the filtered value returned as it is already known by the calling function. Also, filterArray may be a better name.



      Going further



      Instead of definining a mask as an array of boolean, you could provide an array with the positions of the characters you are interested in.



      In your case, you'd provide something like: {3, 7, 8, 9, 10, 14 }.



      This could be less efficient because we'd perform a smaller number of iterations. Here, we'd iterate over 6 elements instead of 15.



      The corresponding mask could be converted manually (which is what I did here) if it is for a known value or you could write a function to pre-process the mask. This seems to be relevant in your case as the same mask is used many times on different inputs.






      share|improve this answer









      $endgroup$
















        1












        1








        1





        $begingroup$

        The code compiled with no warnings and ran properly on first time which is great.



        Let's see what can be improved.



        Style



        The indentation of the code seems a bit weird. I do not know if this is how it looked originally or if it got broken when it was copied here.



        Also, it may be worth adding some documentation describing the inputs you are expecting (in your case, 3 arrays of same size, the first one being 0-terminated).



        Do less



        You use the mask only to know whether j is to be incremented. Actually, you could rewrite:



        j += mask[i];


        as



            if (mask[i])
        j++;


        which is more explicit but less concise.



        The real benefic is when you realize than updating filtered can be done only when we have mask[i]. We can write:



            if (mask[i])
        {
        filtered[j] = input[i];
        j++;
        }


        or the equivalent:



            if (mask[i])
        filtered[j++] = input[i];


        Null character



        Instead of filtered[j] = 0;, you could use the Null Character which is equivalent here but more usual and write: filtered[j] = '';.



        Signature



        I am not sure if it is really useful to have the filtered value returned as it is already known by the calling function. Also, filterArray may be a better name.



        Going further



        Instead of definining a mask as an array of boolean, you could provide an array with the positions of the characters you are interested in.



        In your case, you'd provide something like: {3, 7, 8, 9, 10, 14 }.



        This could be less efficient because we'd perform a smaller number of iterations. Here, we'd iterate over 6 elements instead of 15.



        The corresponding mask could be converted manually (which is what I did here) if it is for a known value or you could write a function to pre-process the mask. This seems to be relevant in your case as the same mask is used many times on different inputs.






        share|improve this answer









        $endgroup$



        The code compiled with no warnings and ran properly on first time which is great.



        Let's see what can be improved.



        Style



        The indentation of the code seems a bit weird. I do not know if this is how it looked originally or if it got broken when it was copied here.



        Also, it may be worth adding some documentation describing the inputs you are expecting (in your case, 3 arrays of same size, the first one being 0-terminated).



        Do less



        You use the mask only to know whether j is to be incremented. Actually, you could rewrite:



        j += mask[i];


        as



            if (mask[i])
        j++;


        which is more explicit but less concise.



        The real benefic is when you realize than updating filtered can be done only when we have mask[i]. We can write:



            if (mask[i])
        {
        filtered[j] = input[i];
        j++;
        }


        or the equivalent:



            if (mask[i])
        filtered[j++] = input[i];


        Null character



        Instead of filtered[j] = 0;, you could use the Null Character which is equivalent here but more usual and write: filtered[j] = '';.



        Signature



        I am not sure if it is really useful to have the filtered value returned as it is already known by the calling function. Also, filterArray may be a better name.



        Going further



        Instead of definining a mask as an array of boolean, you could provide an array with the positions of the characters you are interested in.



        In your case, you'd provide something like: {3, 7, 8, 9, 10, 14 }.



        This could be less efficient because we'd perform a smaller number of iterations. Here, we'd iterate over 6 elements instead of 15.



        The corresponding mask could be converted manually (which is what I did here) if it is for a known value or you could write a function to pre-process the mask. This seems to be relevant in your case as the same mask is used many times on different inputs.







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Dec 20 '18 at 14:19









        JosayJosay

        25.7k14087




        25.7k14087






























            draft saved

            draft discarded




















































            Thanks for contributing an answer to Code Review Stack Exchange!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid



            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.


            Use MathJax to format equations. MathJax reference.


            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function () {
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210044%2fextract-specific-positions-from-a-char-array%23new-answer', 'question_page');
            }
            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Сан-Квентин

            8-я гвардейская общевойсковая армия

            Алькесар