PHP function to convert a Portuguese word from plural to singular












2












$begingroup$


I know this sounds really difficult, but it is actually easy.

I needed to convert a single Portuguese word from plural to singular. I know there's a proper name for that, but it escapes me.

The rules are simple; I compiled them from http://www.easyportuguese.com/portuguese-lessons/plural/ (applied in reverse):




  • If the word ends in a vowel followed by s, remove the s at the end

  • Words ending in ões, ães and ãos should end with ão

  • Words ending in is: remove the is and add l to the end
    Special case: accents should be removed, if needed. The only cases I saw were anéis and pastéis, which have to become anel and pastel.

  • Words ending in ns get the ns replaced with m

  • Words ending in [rsz]es should lose the es
    Special case: words ending in eses need the first e replaced with ê and the final es dropped, as in meses => mês

  • Some words are always used in the plural, like óculos, parabéns and férias; these are left untouched.


Here's the code:



function plural_to_singular($string)
{
    if(preg_match('/^(?:[oó]culos|parab[eé]ns|f[eé]rias)$/iu', $string))
    {
        return $string;
    }

    $regexes = array(
        '[õã]es' => 'ão',
        '[áó].*eis' => 'el',
        '[eé]is' => 'el',
        '([^eé])is' => '$1l',
        'ns' => 'm',
        'eses' => 'ês',
        '([rzs])es' => '$1',
        's' => ''
    );

    foreach($regexes as $fragment => $replace)
    {
        $regex = '/' . $fragment . '$/ui';
        if(preg_match($regex, $string))
        {
            return preg_replace($regex, $replace, $string);
        }
    }

    return $string;
}


You can try it at http://sandbox.onlinephpfunctions.com/code/7947a0efd16f361e89491e4a64f71b578d2278df with some test cases.





In your opinion, what can I improve?



Is there any obvious butchering or performance killer?





















$endgroup$
























      php strings regex i18n














      edited Dec 15 '16 at 18:44









      Mike Brant










      asked Dec 15 '16 at 14:47









      Ismael Miguel






















          2 Answers


















          1












          $begingroup$

          Other than for the simplicity of being able to apply all replacement rules easily, and with more maintainable code, I don't see an absolute need to use regex for this; simple string manipulation should work here and may be better from a performance standpoint.
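To make the "simple string manipulation" idea concrete, here is a minimal sketch of a regex-free variant. The function name plural_to_singular_plain and the suffix table are assumptions for illustration only: the table mirrors a subset of the question's rules, expects lowercase input, and leaves out the `[áó].*eis` rule.

```php
<?php
// Hypothetical regex-free variant; not the code from the question.
function plural_to_singular_plain(string $word): string
{
    // Words that are always used in the plural stay untouched.
    if (in_array($word, ['óculos', 'parabéns', 'férias'], true)) {
        return $word;
    }

    // Order matters: more specific suffixes must come first.
    $suffixes = [
        'ões'  => 'ão',
        'ães'  => 'ão',
        'éis'  => 'el',
        'eis'  => 'el',
        'is'   => 'l',
        'ns'   => 'm',
        'eses' => 'ês',
        'res'  => 'r',
        'zes'  => 'z',
        'ses'  => 's',
        's'    => '',
    ];

    foreach ($suffixes as $suffix => $replacement) {
        // Byte lengths are fine here: a UTF-8 suffix can only match
        // on a character boundary of a valid UTF-8 word.
        $length = strlen($suffix);
        if (strlen($word) >= $length && substr($word, -$length) === $suffix) {
            // First matching suffix wins; no further rules are applied.
            return substr($word, 0, -$length) . $replacement;
        }
    }

    return $word;
}

echo plural_to_singular_plain('meses'), "\n";
```

Whether this is actually faster than the regex loop would need to be measured; the sketch only shows that the first-match-wins logic does not depend on regex.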



          There is no reason for you to loop over the regex array and call preg_replace() for each pattern individually, as preg_replace() accepts arrays for both patterns and replacements.



          So you could easily do something like:



          preg_replace($pattern_array, $replacement_array, $string);


          I don't like your approach of building the regex pattern in two places. Why not define the entire pattern in the regex array? You might have something like this:



          $regex_config = array(
              'ão' => '/[õã]es$/iu',
              ...
          );
          $pattern_array = array_values($regex_config);
          $replacement_array = array_keys($regex_config);
          $result = preg_replace($pattern_array, $replacement_array, $string, 1);


          You also have a potential edge case you might need to address. What if the subject string is all caps? Since you use a case-insensitive match, you could end up with an all-caps plural word getting lowercase letters replaced into it. Should you really be case-insensitive here?
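A small sketch of that edge case, using the first rule from the question's table and CAMIÕES as an assumed all-caps input:

```php
<?php
// The /i flag lets the pattern match "ÕES", but the replacement string
// is inserted verbatim in lowercase, so the result is in mixed case.
$result = preg_replace('/[õã]es$/iu', 'ão', 'CAMIÕES');

echo $result, "\n"; // CAMIão
```

Dropping the i flag would leave all-caps input unchanged instead, so either way the caller has to decide how uppercase words should be treated.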



          Should your function name indicate that the function is only applicable to Portuguese?

















          $endgroup$













          • $begingroup$
            Won't your suggestion break for meses, which would return (since the s at the end is removed, as the last case, and is a required step)?
            $endgroup$
            – Ismael Miguel
            Dec 15 '16 at 19:50












          • $begingroup$
            @IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
            $endgroup$
            – Mike Brant
            Dec 16 '16 at 16:31










          • $begingroup$
            In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
            $endgroup$
            – Ismael Miguel
            Dec 16 '16 at 20:43










          • $begingroup$
            @IsmaelMiguel I see your meaning now. I updated my answer to use the limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where a second replacement would have been triggered based on an earlier replacement will not happen.
            $endgroup$
            – Mike Brant
            Dec 17 '16 at 14:20












          • $begingroup$
            Oh, yeah, the magical parameter that I always forget about! That's a great idea!
            $endgroup$
            – Ismael Miguel
            Dec 17 '16 at 17:26



















          0












          $begingroup$

          Let me start by saying that I have respect for Mike Brant and have been enjoying his posts for quite a while now. However, his answer to this question is not his finest.





          1. $regex_config cannot store the replacement values as associative keys unless the regex patterns that use the same replacement value are merged. This is not explained in the ... (yatta-yatta). The key clash would be on el.

          2. Simply adding 1 as the limit argument of preg_replace() is NOT going to provide the desired output. Declaring a replacement limit on the call will only limit the replacements PER array element. The damage is evident in this output: meses => mês = mê

          3. Most trivially, array_values() doesn't need to be called because preg_replace() is "key ignorant" regarding the array inputs.
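The first two points can be reproduced with a short sketch. The arrays below are a reduced, assumed subset of the question's rules, just enough to show the key clash and the chained replacements:

```php
<?php
// Replacement strings as keys: the second 'el' entry silently
// overwrites the first, so one pattern is lost.
$regex_config = [
    'el' => '/[áó].*eis$/iu',
    'el' => '/[eé]is$/iu',
];
echo count($regex_config), "\n"; // 1

// Patterns and replacements as parallel arrays.
$patterns     = ['/eses$/iu', '/([rzs])es$/iu', '/s$/iu'];
$replacements = ['ês',        '$1',             ''];

// The limit of 1 applies to each pattern separately, and each pattern
// runs on the output of the previous one: meses -> mês -> mê.
$chained = preg_replace($patterns, $replacements, 'meses', 1);
echo $chained, "\n"; // mê
```

This is why an early return after the first successful replacement is needed, rather than a single array-based call.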





          1. For this process to maintain accuracy, the function needs to return as soon as a replacement occurs on the input string. To avoid chaining multiple replacements, iterate the array of pattern-replacement pairs.

          2. You can avoid using capture groups and shorten your replacement strings in a couple of places by using the \K metacharacter (which restarts the full-string match). This way you don't need to use $1 or rewrite literals from the pattern into the replacement.

          3. If you need to preserve the case of all-caps input, you can check the last character of the incoming string. If it is uppercase, assume the whole string is in CAPS and call mb_strtoupper() on the result.

          4. I don't have a sample string to test against ~[áó].*eis$~iu, so I wonder whether this pattern is accurate; my Portuguese is not too sharp.

          5. After my implementation of \K you can see that two pairs of patterns are making the same replacement. If you don't expect to be making lots of future adjustments to this set of regex patterns, you could combine the patterns with a pipe. Here's what I mean: '~(?:[áó].*eis|[eé]is)$~iu' => 'el', and '~(?:[rzs]\Kes|s)$~iu' => ''


          6. I am using the regex patterns as the keys because they will all logically be unique. The same cannot be said about the replacement values (not without merging, anyhow).
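On point 4, one possible sample is automóveis (singular automóvel); the word choice is an assumption for illustration, not something taken from the question's test cases. The sketch below shows what the pattern does with it, next to a hypothetical \K variant that replaces only the trailing eis:

```php
<?php
// The pattern consumes everything from the first á/ó to the end of
// the word, so the stem after the accent is lost.
$original = preg_replace('~[áó].*eis$~iu', 'el', 'automóveis');
echo $original, "\n"; // automel

// Hypothetical variant: \K discards the part matched so far, so only
// the final "eis" is replaced.
$with_k = preg_replace('~[áó].*\Keis$~iu', 'el', 'automóveis');
echo $with_k, "\n"; // automóvel
```

Even with \K, the rule is not linguistically complete: fáceis, for example, comes from fácil, so a suffix rule alone cannot tell el from il.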





          Code:



          function is_allcaps($string)
          {
              $last_letter = mb_substr($string, -1, 1, 'UTF-8');
              return $last_letter === mb_strtoupper($last_letter, 'UTF-8');
              // otherwise use ctype_upper() and setlocale()
          }

          function plural_to_singular($string)
          {
              // quick return of "untouchables"
              if(preg_match('~^(?:[oó]culos|parab[eé]ns|f[eé]rias)$~iu', $string))
              {
                  return $string;
              }

              $regex_map = [
                  '~[õã]es$~iu' => 'ão',
                  '~(?:[áó].*e|[eé])is$~iu' => 'el',
                  '~[^eé]\Kis$~iu' => 'l',
                  '~ns$~iu' => 'm',
                  '~eses$~iu' => 'ês',
                  '~(?:[rzs]\Ke)?s$~iu' => ''
              ];

              // return as soon as one pattern makes a replacement
              foreach ($regex_map as $pattern => $replacement)
              {
                  $singular = preg_replace($pattern, $replacement, $string, 1, $count);
                  if ($count)
                  {
                      return is_allcaps($string) ? mb_strtoupper($singular) : $singular;
                  }
              }
              return $string;
          }

          $words = array(
              'óculos' => 'óculos',
              'papéis' => 'papel',
              'anéis' => 'anel',
              'PASTEIS' => 'PASTEL',
              'CAMIÕES' => 'CAMIÃO',
              'rodas' => 'roda',
              'cães' => 'cão',
              'meses' => 'mês',
              'vezes' => 'vez',
              'luzes' => 'luz',
              'cristais' => 'cristal',
              'canções' => 'canção',
              'nuvens' => 'nuvem',
              'alemães' => 'alemão'
          );

          foreach($words as $plural => $singular)
          {
              echo "$plural => $singular = " , plural_to_singular($plural) , "\n";
          }














          $endgroup$













          • $begingroup$
            @MikeBrant ping
            $endgroup$
            – mickmackusa
            23 mins ago












































          edited Dec 17 '16 at 14:14

























          answered Dec 15 '16 at 18:39









          Mike Brant












          • $begingroup$
            Won't your suggestion break for meses, which would return (since the s at the end is removed, as the last case, and is a required step)?
            $endgroup$
            – Ismael Miguel
            Dec 15 '16 at 19:50












          • $begingroup$
            @IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
            $endgroup$
            – Mike Brant
            Dec 16 '16 at 16:31










          • $begingroup$
            In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
            $endgroup$
            – Ismael Miguel
            Dec 16 '16 at 20:43










          • $begingroup$
            @IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
            $endgroup$
            – Mike Brant
            Dec 17 '16 at 14:20












          • $begingroup$
            Oh, yeah, the magical parameter that I always forget about! That's a great idea!
            $endgroup$
            – Ismael Miguel
            Dec 17 '16 at 17:26


















          • $begingroup$
            Won't your suggestion break for meses, which would return (since the s at the end is removed, as the last case, and is a required step)?
            $endgroup$
            – Ismael Miguel
            Dec 15 '16 at 19:50












          • $begingroup$
            @IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
            $endgroup$
            – Mike Brant
            Dec 16 '16 at 16:31










          • $begingroup$
            In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
            $endgroup$
            – Ismael Miguel
            Dec 16 '16 at 20:43










          • $begingroup$
            @IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
            $endgroup$
            – Mike Brant
            Dec 17 '16 at 14:20












          • $begingroup$
            Oh, yeah, the magical parameter that I always forget about! That's a great idea!
            $endgroup$
            – Ismael Miguel
            Dec 17 '16 at 17:26
















          $begingroup$
          Won't your suggestion break for meses, which would return (since the s at the end is removed, as the last case, and is a required step)?
          $endgroup$
          – Ismael Miguel
          Dec 15 '16 at 19:50






          $begingroup$
          Won't your suggestion break for meses, which would return (since the s at the end is removed, as the last case, and is a required step)?
          $endgroup$
          – Ismael Miguel
          Dec 15 '16 at 19:50














          $begingroup$
          @IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
          $endgroup$
          – Mike Brant
          Dec 16 '16 at 16:31




          $begingroup$
          @IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
          $endgroup$
          – Mike Brant
          Dec 16 '16 at 16:31












          $begingroup$
          In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
          $endgroup$
          – Ismael Miguel
          Dec 16 '16 at 20:43




          $begingroup$
          In other words, if I write the code the way you suggest, would it still work for words that end with s, but that are singular?
          $endgroup$
          – Ismael Miguel
          Dec 16 '16 at 20:43

























          0












          $begingroup$

          Let me start by saying that I respect Mike Brant and have enjoyed his posts for quite a while now. However, his answer to this question is not his finest.





          1. $regex_config cannot store the replacement values as associative keys unless the regex patterns that share a replacement value are merged, because duplicate keys overwrite each other. This is not explained in the ... (yatta-yatta). The key clash would be on el.

          2. Simply throwing 1 at the end of preg_replace() is NOT going to provide the desired output. Declaring a replacement limit on the call only limits the replacements PER array element, so a later pattern can still alter the result of an earlier one. The damage is evident in this output: meses => mês = mê

          3. Most trivially, array_values() doesn't need to be called because preg_replace() is "key ignorant" regarding the array inputs.
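The first two points can be reproduced in isolation. The snippet below is a minimal sketch, not the code from the other answer: the two-pattern array and the `$config` array are hypothetical reductions chosen only to show the behaviour.

```php
<?php
// Hypothetical two-pattern reduction, for illustration only.
$patterns     = ['~eses$~u', '~s$~u'];
$replacements = ['ês', ''];

// The limit of 1 applies to each pattern separately, so the second
// pattern still strips the "s" that the first replacement produced.
$result = preg_replace($patterns, $replacements, 'meses', 1);
echo $result, "\n"; // mê

// Duplicate string keys in an array literal overwrite each other,
// so only the last pattern keyed by 'el' survives.
$config = [
    'el' => '~[áó].*eis$~iu',
    'el' => '~[eé]is$~iu',
];
echo count($config), "\n"; // 1
```

This is why the replacement has to stop after the first pattern that matches, rather than relying on the limit argument.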





          1. For this process to maintain accuracy, there needs to be a return as soon as a replacement occurs on the input string. To avoid applying multiple replacements, iterate the array of pattern-replacement pairs and return on the first match.

          2. You can avoid using capture groups and shorten your replacement strings in a couple of places by implementing the \K metacharacter (which restarts the full-string match). This way you don't need to use $1 or rewrite literals from the pattern into the replacement.

          3. If you need to make the replacement process case-aware, you can check the last character of the incoming string. If it is uppercase, assume the whole string is in CAPS and call mb_strtoupper().

          4. I don't have a sample string to test against ~[áó].*eis$~iu, and my Portuguese is not too sharp, so I wonder whether this pattern is accurate/correct.

          5. After my implementation of \K you can see that two pairs of patterns are making the same replacement. If you don't expect to be making lots of future adjustments to this set of regex patterns, you could combine the patterns with a pipe. Here's what I mean: '~(?:[áó].*eis|[eé]is)$~iu' => 'el', and '~(?:[rzs]\Kes|s)$~iu' => ''


          6. I am using the regex patterns as the keys because they will all logically be unique. The same cannot be said about the replacement values (not without merging anyhow).
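To make the `\K` recommendation concrete, here is a small sketch comparing a capture-group pattern with its `\K` equivalent. The variable names are illustrative only; the patterns are the ones discussed in this thread, without the case-insensitive flag for brevity.

```php
<?php
// Capture-group version: the kept letter must be re-emitted via $1.
$with_group = preg_replace('~([rzs])es$~u', '$1', 'vezes');

// \K version: everything matched before \K is dropped from the reported
// match, so only "es" is replaced and no back-reference is needed.
$with_k = preg_replace('~[rzs]\Kes$~u', '', 'vezes');

// Same idea for the "is" => "l" rule.
$cristal = preg_replace('~[^eé]\Kis$~u', 'l', 'cristais');

echo $with_group, "\n"; // vez
echo $with_k, "\n";     // vez
echo $cristal, "\n";    // cristal
```

Both forms give the same result; the `\K` form simply keeps the replacement string free of back-references.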





          Code: (Demo)



          function is_allcaps($string)
          {
              // Heuristic: if the last letter is uppercase, treat the whole word as CAPS.
              // Otherwise use ctype_upper() and setlocale().
              $last_letter = mb_substr($string, -1, 1, 'UTF-8');
              return $last_letter === mb_strtoupper($last_letter, 'UTF-8');
          }

          function plural_to_singular($string)
          {
              // quick return of "untouchables"
              if (preg_match('~^(?:[oó]culos|parab[eé]ns|f[eé]rias)$~iu', $string))
              {
                  return $string;
              }

              // pattern => replacement; order matters, the first match wins
              $regex_map = [
                  '~[õã]es$~iu' => 'ão',
                  '~(?:[áó].*e|[eé])is$~iu' => 'el',
                  '~[^eé]\Kis$~iu' => 'l',
                  '~ns$~iu' => 'm',
                  '~eses$~iu' => 'ês',
                  '~(?:[rzs]\Ke)?s$~iu' => ''
              ];

              foreach ($regex_map as $pattern => $replacement)
              {
                  $singular = preg_replace($pattern, $replacement, $string, 1, $count);
                  if ($count)
                  {
                      // return immediately so no later pattern touches the result
                      return is_allcaps($string) ? mb_strtoupper($singular, 'UTF-8') : $singular;
                  }
              }
              return $string;
          }

          // plural => expected singular
          $words = array(
              'óculos' => 'óculos',
              'papéis' => 'papel',
              'anéis' => 'anel',
              'PASTEIS' => 'PASTEL',
              'CAMIÕES' => 'CAMIÃO',
              'rodas' => 'roda',
              'cães' => 'cão',
              'meses' => 'mês',
              'vezes' => 'vez',
              'luzes' => 'luz',
              'cristais' => 'cristal',
              'canções' => 'canção',
              'nuvens' => 'nuvem',
              'alemães' => 'alemão'
          );

          foreach ($words as $plural => $singular)
          {
              echo "$plural => $singular = " , plural_to_singular($plural) , "\n";
          }














          $endgroup$













          • $begingroup$
            @MikeBrant ping
            $endgroup$
            – mickmackusa
            23 mins ago
















          answered 25 mins ago









          mickmackusa











