Processing a large list of strings


























I am working on an application in C# WPF which reads a large file containing a number of HL7 formatted reports. My application takes in the file, extracts any lines that start with OBX and stores them in a List. It then tries to extract a report header from each line, if one exists, based on a handful of rules:




  1. Ends with a ':'

  2. Is in all caps

  3. Is less than 6 words (not including words in brackets)

  4. Contains more than 4 characters

  5. May be on its own in a line or embedded into the content of the string (always at the start)


I have the algorithm down and it works, but I am dealing with files which can contain upwards of a million lines. My initial design took about 10-15 minutes to read and process around 1 million lines. Through hours of research, I was able to optimize the code a bit, bringing it down to a few minutes. However, I am hoping to optimize it even further in order to reduce the time it takes for the app to process the lines. This is where I need some help, as I do not know what else I can do to improve the performance of my code.

I was able to narrow down the bottleneck to this method, which does the header extraction from the collected strings. Below is the most recent version of my method, and it is as optimized as I can get it (hopefully it will be better with your help):



    private List<string> GetHeader(List<string> FileLines)
    {
        List<string> headers = new List<string>();
        foreach (string line in FileLines)
        {
            string header = string.Empty;
            //Checks if there is a ':' and assumes that anything before that is the header except if it contains a date or a report id
            if(Regex.IsMatch(header, @"\w{2,4}[/-]\w{2,3}[/-]\w{2,4}", RegexOptions.Compiled) || Regex.IsMatch(header, @"^\w+, \w{2} \d{5}-{0,1}\d{0,5}", RegexOptions.Compiled))
            {
                continue;
            }

            string nobrackets = Regex.Replace(line, @".*?\(.*?\)", string.Empty, RegexOptions.Compiled);
            if (line.IndexOf(':') != -1)
            {
                string nobracks = Regex.Replace(line.Substring(0, line.IndexOf(':') + 1), @"\(.*?\)", string.Empty, RegexOptions.Compiled);
                if (nobracks.Split(' ').Length < 5 && nobracks.Length > 6)
                {
                    headers.Add(line.Substring(0, line.IndexOf(':') + 1));
                    continue;
                }
            }

            //Checks if a string is larger then 5 words (not including brackets)
            if (!(nobrackets.Split(' ').Length < 5 && nobrackets.Length > 6))
                continue;
            //Checks if the string is in all CAPS
            char[] letter = nobrackets.ToCharArray();

            if(letter.All(l => char.IsUpper(l))){
                headers.Add(line);
                continue;
            }

            //Checks if the string is 5 words or less
            string temp = Regex.Replace(line, @"\(.*?\)", string.Empty, RegexOptions.Compiled);
            if (temp.Split(' ').Length < 6)
            {
                headers.Add(line);
            }

            //Checks for an all caps header embedded in a string
            bool caps = true;
            string[] word = line.Split(' ');
            int lastCapWordIndex = 0;
            for (int i = 0; i < word.Length && caps; i++)
            {
                char[] char_array = word[i].ToCharArray();

                if (!letter.All(l => char.IsUpper(l)))
                {
                    caps = false;
                    continue;
                }
                if (caps)
                    lastCapWordIndex++;
            }
            if (lastCapWordIndex > 0)
            {
                for (int i = 0; i < lastCapWordIndex; i++)
                {
                    header += " " + word[i];
                }
                headers.Add(header.Trim());
                continue;
            }
        }

        //final check for string with less then 4 characters
        string[] tempH = headers.ToArray();
        headers = new List<string>();
        foreach (string h in tempH)
        {
            if (h.Length > 4)
            {
                headers.Add(h);
            }
        }
        return headers;
    }





























  • You have many things you can improve here, most of them with only a marginal impact on performance, but one thing caught my eye: all those regexes. Can't you create a single static regex and use it for them all?
    – Adriano Repetti
    Dec 25 at 9:54










  • It would be great if you could provide some typical example lines and the expected output for each of them, both valid and invalid formats. Further, we need to see how you actually call the method. In other words, we need some context in order to fully understand what you are doing.
    – Henrik Hansen
    Dec 25 at 12:01












  • @AdrianoRepetti I initially had one large regex which used OR to determine if it matched one case or another. But I found that it ran slower than using many smaller regexes. Maybe I was doing something wrong? For example, I would have @"\w{2,4}[/-]\w{2,3}[/-]\w{2,4}|^\w+, \w{2} \d{5}-{0,1}\d{0,5}". But this ran a lot slower than splitting it into two regexes.
    – k-Rocker
    Dec 25 at 20:14










  • @HenrikHansen I have added an example to my question, please see the edit at the bottom. Hopefully this helps.
    – k-Rocker
    Dec 25 at 20:16










  • @k-Rocker: I don't see the change?
    – Henrik Hansen
    Dec 26 at 6:27
















c# performance regex wpf






edited Dec 25 at 8:08









Jamal

asked Dec 25 at 5:37









ShandowViper18

1 Answer














Let us first check the overall style of that method.




  • The name of the method doesn't match its return type. The method is named GetHeader but returns a List<string>, so GetHeaders would be a better name.

  • Based on the .NET Naming Guidelines, method parameters should be named using camelCase, so FileLines should be fileLines.

  • If the type of the right-hand side of an assignment is obvious, one should use var instead of the concrete type.

  • Stick to one coding style. Currently you are mixing styles in that method. Sometimes you place the opening brace { on the next line and sometimes on the same line. Sometimes you use braces {} for single-line if statements and sometimes you don't. Omitting braces for single-line if statements should be avoided, because it can lead to hidden and therefore hard-to-find bugs.




Now let's dig into the code.



This

    //Checks if there is a ':' and assumes that anything before that is the header except if it contains a date or a report id
    if(Regex.IsMatch(header, @"\w{2,4}[/-]\w{2,3}[/-]\w{2,4}", RegexOptions.Compiled) || Regex.IsMatch(header, @"^\w+, \w{2} \d{5}-{0,1}\d{0,5}", RegexOptions.Compiled))
    {
        continue;
    }

can be removed completely because it will always evaluate to false: header is still string.Empty at that point, so neither pattern can match.





The regexes you use for replacements and matching should be extracted to private static fields, e.g.

    private static Regex noBracketsRegex = new Regex(@".*?\(.*?\)", RegexOptions.Compiled);

and used like so

    string nobrackets = noBracketsRegex.Replace(line, string.Empty);
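
As a small illustration of such a field in use, here is a minimal sketch. It assumes the pattern is meant to strip parenthesised text, i.e. `\(.*?\)` with escaped parentheses; the class name, field name and sample line are made up for the example and are not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

static class BracketStripSketch
{
    // Built once and reused for every line, instead of handing the pattern
    // to the static Regex.Replace() on each call.
    // Assumption: the pattern should match text in parentheses.
    private static readonly Regex BracketsRegex =
        new Regex(@"\(.*?\)", RegexOptions.Compiled);

    // Hypothetical helper: removes every parenthesised part from a line.
    public static string StripBrackets(string line)
    {
        return BracketsRegex.Replace(line, string.Empty);
    }
}
```

Calling `BracketStripSketch.StripBrackets("CHEST (PA AND LATERAL):")` gives `"CHEST :"`. The space in front of the removed text stays, which is worth keeping in mind when the result is later split on ' ' to count words.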




This

    string nobrackets = Regex.Replace(line, @".*?\(.*?\)", string.Empty, RegexOptions.Compiled);
    if (line.IndexOf(':') != -1)
    {
        string nobracks = Regex.Replace(line.Substring(0, line.IndexOf(':') + 1), @"\(.*?\)", string.Empty, RegexOptions.Compiled);
        if (nobracks.Split(' ').Length < 5 && nobracks.Length > 6)
        {
            headers.Add(line.Substring(0, line.IndexOf(':') + 1));
            continue;
        }
    }

    //Checks if a string is larger then 5 words (not including brackets)
    if (!(nobrackets.Split(' ').Length < 5 && nobrackets.Length > 6))
        continue;

should be reordered. You call Regex.Replace() to compute nobrackets up front, although the result is never used if the innermost if condition is true and the loop continues. You should also store the result of line.IndexOf(':') in a variable; otherwise, if line contains a ':', you call IndexOf() twice, and three times if the innermost if is entered. Finally, the innermost condition should evaluate its cheapest part first, i.e. the Length check before the Split().





This

    string[] word = line.Split(' ');

should be renamed to words.





This

    for (int i = 0; i < word.Length && caps; i++)
    {
        char[] char_array = word[i].ToCharArray();

        if (!letter.All(l => char.IsUpper(l)))
        {
            caps = false;
            continue;
        }
        if (caps)
            lastCapWordIndex++;
    }

doesn't buy you anything. You already checked letter.All(l => char.IsUpper(l)) some lines above, and if it returned true you continue; the outermost loop. Hence the negated check in this loop is always true (it tests letter, not char_array), caps is set to false on the first iteration, and lastCapWordIndex will always be 0. In addition, a simple break; would be sufficient because the loop condition already checks for caps being true.



The following

    if (lastCapWordIndex > 0)
    {
        for (int i = 0; i < lastCapWordIndex; i++)
        {
            header += " " + word[i];
        }
        headers.Add(header.Trim());
        continue;
    }

can be removed as well because lastCapWordIndex will never be greater than 0, as stated above.
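
If the two loops were meant to pick up an all-caps header embedded at the start of a line (rule 5 in the question), each word would have to be tested on its own. The following is only a sketch of that idea, not the original author's method. It assumes a word counts as "all caps" when it contains at least one letter and no lower-case letter, so trailing punctuation such as ':' is ignored; the class and method names are hypothetical.

```csharp
using System;
using System.Linq;

static class LeadingCapsSketch
{
    // Hypothetical helper: returns the leading run of all-caps words of a line,
    // or an empty string if the first word is not all caps.
    public static string LeadingCapsWords(string line)
    {
        string[] words = line.Split(' ');
        int count = 0;
        foreach (string word in words)
        {
            // Assumption: "all caps" means at least one letter and no lower-case letter.
            bool isCaps = word.Any(c => char.IsLetter(c)) && !word.Any(c => char.IsLower(c));
            if (!isCaps)
            {
                break; // stop at the first word that is not all caps
            }
            count++;
        }
        // Join only the leading words that passed the test.
        return string.Join(" ", words, 0, count);
    }
}
```

With the made-up input `"CHEST XRAY: normal study"` this returns `"CHEST XRAY:"`, and for a line that starts with a lower-case word it returns an empty string. Using `break` here also removes the need for a separate `caps` flag.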





This

    //final check for string with less then 4 characters
    string[] tempH = headers.ToArray();
    headers = new List<string>();
    foreach (string h in tempH)
    {
        if (h.Length > 4)
        {
            headers.Add(h);
        }
    }
    return headers;

can be simplified by using a little bit of Linq like so

    return new List<string>(headers.Where(s => s.Length > 4));

In addition, the comment you placed above is lying because you filter out strings which are less than 5 characters long.
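
If a second list is not needed, the filter can also run in place. This is a minimal sketch, assuming it is acceptable to modify the headers list itself; `List<T>.RemoveAll` is a standard method, while the wrapper class and method name are only illustrative.

```csharp
using System;
using System.Collections.Generic;

static class FilterSketch
{
    // Hypothetical helper: drops every header with 4 or fewer characters
    // from the given list and returns the same list instance.
    public static List<string> DropShortHeaders(List<string> headers)
    {
        headers.RemoveAll(h => h.Length <= 4);
        return headers;
    }
}
```

Note the inverted condition: `Where` keeps the items that match (`Length > 4`), while `RemoveAll` deletes the items that match (`Length <= 4`).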





Implementing the mentioned points will lead to

    private static Regex noBracketsRegex = new Regex(@".*?\(.*?\)", RegexOptions.Compiled);
    private static Regex noBracksRegex = new Regex(@"\(.*?\)", RegexOptions.Compiled);
    private List<string> GetHeaders(List<string> fileLines)
    {
        var headers = new List<string>();
        foreach (string line in fileLines)
        {
            string header = string.Empty;

            int colonIndex = line.IndexOf(':');
            if (colonIndex != -1)
            {
                string nobracks = noBracksRegex.Replace(line.Substring(0, colonIndex + 1), string.Empty);
                if (nobracks.Length > 6 && nobracks.Split(' ').Length < 5)
                {
                    headers.Add(line.Substring(0, colonIndex + 1));
                    continue;
                }
            }

            string removedBracketsLine = noBracketsRegex.Replace(line, string.Empty);
            //Checks if a string is larger then 5 words (not including brackets)
            if (!(removedBracketsLine.Length > 6 && removedBracketsLine.Split(' ').Length < 5))
            {
                continue;
            }

            //Checks if the string is in all CAPS
            char[] letters = removedBracketsLine.ToCharArray();
            if (letters.All(l => char.IsUpper(l)))
            {
                headers.Add(line);
                continue;
            }

            //Checks if the string is 5 words or less
            string temp = noBracksRegex.Replace(line, string.Empty);
            if (temp.Split(' ').Length < 6)
            {
                headers.Add(line);
            }
        }
        return new List<string>(headers.Where(s => s.Length > 4));
    }

The names of the Regex fields could use a facelift, but you should pick them yourself because you know what they mean.






share|improve this answer





















    Your Answer





    StackExchange.ifUsing("editor", function () {
    return StackExchange.using("mathjaxEditing", function () {
    StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix) {
    StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
    });
    });
    }, "mathjax-editing");

    StackExchange.ifUsing("editor", function () {
    StackExchange.using("externalEditor", function () {
    StackExchange.using("snippets", function () {
    StackExchange.snippets.init();
    });
    });
    }, "code-snippets");

    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "196"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });






    ShandowViper18 is a new contributor. Be nice, and check out our Code of Conduct.










    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f210298%2fprocessing-a-large-list-of-strings%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    3














    Let us first check the overall style of that method.




    • The name of the method doesn't match the return type. The method is named GetHeader but it returns a List<string> hence GetHeaders would be a better name.

    • Based on the .NET Naming Guidelines method-parameters should be named using camelCase casing hence FileLines should be fileLines.

    • If the type of the right-hand-side of an assignment is obvious one should use var instead of the concrete type.


    • Stick to one coding style. Currently you are mixing styles in that method. Sometimes you place the opening braces { on the next line and sometimes you place it on the same line. Sometimes you use braces {} for single-line if statements and sometimes you don't. Omitting braces for single-line if statements should be avoided. Omitting barces can lead to hidden and therefor hard to find bugs.




    Now let's dig into the code.



    This




    //Checks if there is a ':' and assumes that anything before that is the header except if it contains a date or a report id
    if (Regex.IsMatch(header, @"\w{2,4}[/-]\w{2,3}[/-]\w{2,4}", RegexOptions.Compiled) || Regex.IsMatch(header, @"^\w+, \w{2} \d{5}-{0,1}\d{0,5}", RegexOptions.Compiled))
    {
        continue;
    }



    can be removed completely because it will always evaluate to false: header has just been initialized to string.Empty, and neither pattern can match an empty string.





    The regexes you use for replacements and matching should be extracted to private static fields, e.g.



    private static Regex noBracketsRegex = new Regex(@".*?\(.*?\)", RegexOptions.Compiled);


    and used like so



    string nobrackets = noBracketsRegex.Replace(line, string.Empty);
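As a minimal, self-contained sketch of that idea: the Regex is built once in a static readonly field and reused for every line. The class name BracketStripper, the method Strip, and the pattern (optional whitespace followed by text in parentheses) are assumptions made for this illustration and are not taken from your code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BracketStripper
{
    // Constructed once when the class is first used and then shared by all calls.
    // The pattern is an assumption for this sketch: optional whitespace plus "(...)".
    private static readonly Regex ParenthesizedText =
        new Regex(@"\s*\(.*?\)", RegexOptions.Compiled);

    // Returns the line with every parenthesized part removed.
    public static string Strip(string line)
    {
        return ParenthesizedText.Replace(line, string.Empty);
    }
}
```

Calling BracketStripper.Strip("FINDINGS (PRELIM): normal") yields "FINDINGS: normal"; a line without parentheses comes back unchanged.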




    This




    string nobrackets = Regex.Replace(line, @".*?\(.*?\)", string.Empty, RegexOptions.Compiled);
    if (line.IndexOf(':') != -1)
    {
        string nobracks = Regex.Replace(line.Substring(0, line.IndexOf(':') + 1), @"\(.*?\)", string.Empty, RegexOptions.Compiled);
        if (nobracks.Split(' ').Length < 5 && nobracks.Length > 6)
        {
            headers.Add(line.Substring(0, line.IndexOf(':') + 1));
            continue;
        }
    }

    //Checks if a string is larger then 5 words (not including brackets)
    if (!(nobrackets.Split(' ').Length < 5 && nobrackets.Length > 6))
        continue;



    should be reordered. You run the first Regex.Replace() even though the innermost if condition may be true, in which case its result (nobrackets) is never used. You should also store the result of line.IndexOf(':') in a variable; otherwise, if line contains a ':', you call IndexOf() twice, and three times if the innermost condition is true. Finally, the innermost condition should evaluate the cheapest check first: compare nobracks.Length before calling Split().
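A small sketch of both points, searching for the colon only once and putting the cheap length comparison in front of the allocating Split() call. The helper names ColonHeader, Prefix and LooksLikeHeader are hypothetical and only serve the illustration; the thresholds are the ones from your code.

```csharp
using System;

public static class ColonHeader
{
    // Returns the text up to and including the first ':', or null if there is none.
    public static string Prefix(string line)
    {
        int colonIndex = line.IndexOf(':'); // searched once, reused below
        return colonIndex == -1 ? null : line.Substring(0, colonIndex + 1);
    }

    // && short-circuits: Split (which allocates an array) only runs
    // when the cheap length comparison has already passed.
    public static bool LooksLikeHeader(string candidate)
    {
        return candidate.Length > 6 && candidate.Split(' ').Length < 5;
    }
}
```

For lines that fail the length check, no array is ever allocated, which may add up over a million lines.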





    This




    string[] word = line.Split(' ');



    should be renamed to words.





    This




    for (int i = 0; i < word.Length && caps; i++)
    {
        char[] char_array = word[i].ToCharArray();

        if (!letter.All(l => char.IsUpper(l)))
        {
            caps = false;
            continue;
        }
        if (caps)
            lastCapWordIndex++;
    }



    doesn't buy you anything. You already checked letter.All(l => char.IsUpper(l)) some lines above, and if it returned true you continue; the outermost loop. Hence, inside this loop !letter.All(...) is always true, caps is set to false on the first iteration, and lastCapWordIndex always stays 0. In addition, a simple break; would be sufficient instead of continue;, because the loop condition checks for caps being true anyway.
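If the intent of that loop was to count how many words at the start of the line are written in all caps (this is an assumption, the question doesn't say so explicitly), a sketch could look like the following. It checks each word itself rather than the letters of the whole line, and leaves the loop with break. The names CapsPrefix and CountLeadingCapsWords are made up for this example.

```csharp
using System;
using System.Linq;

public static class CapsPrefix
{
    // Counts the words at the start of the line that consist only of upper-case characters.
    public static int CountLeadingCapsWords(string line)
    {
        string[] words = line.Split(' ');
        int count = 0;
        foreach (string word in words)
        {
            // Empty entries (from double spaces) and any non-caps word end the prefix.
            if (word.Length == 0 || !word.All(char.IsUpper))
            {
                break;
            }
            count++;
        }
        return count;
    }
}
```

Note that char.IsUpper returns false for punctuation and digits, so a word such as "HISTORY:" would not count as all caps here; whether that is desired depends on your header rules.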



    The following




    if (lastCapWordIndex > 0)
    {
        for (int i = 0; i < lastCapWordIndex; i++)
        {
            header += " " + word[i];
        }
        headers.Add(header.Trim());
        continue;
    }



    can be removed as well, because lastCapWordIndex will never be greater than 0, as stated above.





    This




    //final check for string with less then 4 characters
    string[] tempH = headers.ToArray();
    headers = new List<string>();
    foreach (string h in tempH)
    {
        if (h.Length > 4)
        {
            headers.Add(h);
        }
    }
    return headers;



    can be simplified by using a little bit of LINQ like so



    return new List<string>(headers.Where(s => s.Length > 4));


    In addition, the comment you placed above is lying: it talks about strings with less than 4 characters, but the code removes strings with less than 5 characters.
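To make the boundary visible, here is a tiny sketch of the same filter as a standalone method. HeaderFilter and DropShort are hypothetical names used only for this example, and ToList() is used here in place of the List<string> constructor, which gives the same result.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HeaderFilter
{
    // Length > 4 keeps strings with 5 or more characters
    // and drops everything with 4 characters or less.
    public static List<string> DropShort(IEnumerable<string> headers)
    {
        return headers.Where(h => h.Length > 4).ToList();
    }
}
```

With the input "ABC:", "ABCD:" and "FINDINGS:", the four-character "ABC:" is dropped, while the five-character "ABCD:" is kept.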





    Implementing the mentioned points will lead to



    private static Regex noBracketsRegex = new Regex(@".*?\(.*?\)", RegexOptions.Compiled);
    private static Regex noBracksRegex = new Regex(@"\(.*?\)", RegexOptions.Compiled);
    private List<string> GetHeaders(List<string> fileLines)
    {
        var headers = new List<string>();
        foreach (string line in fileLines)
        {
            string header = string.Empty;

            int colonIndex = line.IndexOf(':');
            if (colonIndex != -1)
            {
                string nobracks = noBracksRegex.Replace(line.Substring(0, colonIndex + 1), string.Empty);
                if (nobracks.Length > 6 && nobracks.Split(' ').Length < 5)
                {
                    headers.Add(line.Substring(0, colonIndex + 1));
                    continue;
                }
            }

            string removedBracketsLine = noBracketsRegex.Replace(line, string.Empty);
            //Checks if a string is larger than 5 words (not including brackets)
            if (!(removedBracketsLine.Length > 6 && removedBracketsLine.Split(' ').Length < 5))
            {
                continue;
            }

            //Checks if the string is in all CAPS
            char[] letters = removedBracketsLine.ToCharArray();
            if (letters.All(l => char.IsUpper(l)))
            {
                headers.Add(line);
                continue;
            }

            //Checks if the string is 5 words or less
            string temp = noBracksRegex.Replace(line, string.Empty);
            if (temp.Split(' ').Length < 6)
            {
                headers.Add(line);
            }

        }
        return new List<string>(headers.Where(s => s.Length > 4));
    }


    The names of the Regex fields could use a facelift, but you should do that yourself because you know what they mean.






        answered 2 days ago









        Heslacher






















