PHP function to convert a Portuguese word from plural to singular
$begingroup$
I know, this sounds really difficult, but it is really easy.
I needed to convert a single Portuguese word in the plural into singular. I know there's a right name for that, but it is escaping me.
The rules are simple; I compiled them from http://www.easyportuguese.com/portuguese-lessons/plural/ (but applied in reverse):
- If the word ends in a vowel plus `s`, remove the `s` at the end
- Words ending in `ões`, `ães` and `ãos` should end with `ão`
- Words ending in `is`: remove the `is` and add `l` to the end
  Special case: accents should be removed, if needed. The only cases I saw were `anéis` and `pastéis`, which have to be `anel` and `pastel`.
- Words ending in `ns` get it replaced with `m`
- Words ending with `[rsz]es` should lose the `es`
  Special case: words ending in `eses` need the first `e` replaced with `ê`, like in `meses => mês`
- Some words are always used in the plural, like `óculos`, `parabéns` and `férias`.
Below, here's the code:
function plural_to_singular($string)
{
    if(preg_match('/^(?:[oó]culos|parab[eé]ns|f[eé]rias)$/iu', $string))
    {
        return $string;
    }

    $regexes = array(
        '[õã]es' => 'ão',
        '[áó].*eis' => 'el',
        '[eé]is' => 'el',
        '([^eé])is' => '$1l',
        'ns' => 'm',
        'eses' => 'ês',
        '([rzs])es' => '$1',
        's' => ''
    );

    foreach($regexes as $fragment => $replace)
    {
        $regex = '/' . $fragment . '$/ui';
        if(preg_match($regex, $string))
        {
            return preg_replace($regex, $replace, $string);
        }
    }

    return $string;
}
You can try it on http://sandbox.onlinephpfunctions.com/code/7947a0efd16f361e89491e4a64f71b578d2278df with some test cases.
In your opinion, what can I improve?
Is there any obvious butchering or performance killer?
php strings regex i18n
$endgroup$
asked Dec 15 '16 at 14:47 by Ismael Miguel
edited Dec 15 '16 at 18:44 by Mike Brant
2 Answers
$begingroup$
Apart from the convenience of applying all the replacement rules easily, and with more maintainable code, I don't see an absolute need to use regex for this. Simple string manipulation should be enough here and may be better from a performance standpoint.
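As a rough illustration of the non-regex route, here is a minimal sketch for rules that swap one literal suffix for another. The helper name `replace_suffix_sketch` is hypothetical (it appears nowhere in the question), and unlike the question's patterns it is case-sensitive and cannot express character classes such as `[õã]`.

```php
<?php
// Hypothetical helper, not from the question: swap a literal suffix without regex.
// Returns null when $word does not end with $suffix.
function replace_suffix_sketch(string $word, string $suffix, string $replacement): ?string
{
    $length = strlen($suffix);
    if ($length === 0 || strlen($word) < $length) {
        return null;
    }
    // Byte-wise comparison is fine for valid UTF-8: a complete suffix can only
    // match on a character boundary.
    if (substr($word, -$length) !== $suffix) {
        return null;
    }
    return substr($word, 0, strlen($word) - $length) . $replacement;
}

echo replace_suffix_sketch('nuvens', 'ns', 'm'), PHP_EOL;     // nuvem
echo replace_suffix_sketch('canções', 'ões', 'ão'), PHP_EOL;  // canção
var_dump(replace_suffix_sketch('vez', 'ns', 'm'));            // NULL
```

Whether this is actually faster than `preg_replace()` for a handful of short patterns would need measuring; the sketch only shows that the literal-suffix rules do not strictly require regex.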
There is no reason for you to loop over the regex array and preg_replace() each individually, as preg_replace() accepts arrays for both patterns and replacements.
So you could easily do something like:
preg_replace($pattern_array, $replacement_array, $string);
I don't like your approach of building the regex pattern in two places. Why not define the entire pattern in the regex array? You might have something like this:
$regex_config = array(
    'ão' => '/[õã]es$/iu',
    ...
);
$pattern_array = array_values($regex_config);
$replacement_array = array_keys($regex_config);
$result = preg_replace($pattern_array, $replacement_array, $string, 1);
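It may be worth checking how the array form behaves before relying on it. The sketch below uses two of the question's rules written as full patterns (the delimiters and the two-rule selection are my assumption, chosen to keep it short). With arrays, `preg_replace()` applies each pattern in turn to the result of the previous one, and the limit argument counts replacements per pattern rather than per call.

```php
<?php
// Two of the question's rules, in the question's order (full patterns assumed).
$patterns     = ['/eses$/iu', '/s$/iu'];
$replacements = ['ês', ''];

// No limit: 'meses' becomes 'mês' (first rule), then 'mê' (second rule).
$no_limit = preg_replace($patterns, $replacements, 'meses');

// Limit 1: at most one replacement per pattern, so both rules still fire.
$with_limit = preg_replace($patterns, $replacements, 'meses', 1);

echo $no_limit, PHP_EOL;   // mê
echo $with_limit, PHP_EOL; // mê
```

So stopping after the first rule that matches seems to need an explicit loop with an early return, as in the question's code.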
You also have a potential edge case you might need to address. What if the subject string is all caps? Since you use a case-insensitive match, you could end up with an all-caps plural word getting lowercase letters replaced into it. Should you really be case-insensitive here?
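A small sketch of that all-caps edge case, using the question's first rule written as a full pattern (assumed form). It assumes the source file is saved as UTF-8.

```php
<?php
// Case-insensitive: the suffix matches, but the lowercase replacement is spliced in.
$mixed = preg_replace('/[õã]es$/iu', 'ão', 'CAMIÕES');
echo $mixed, PHP_EOL; // CAMIão

// Case-sensitive: the all-caps word is left untouched instead.
$untouched = preg_replace('/[õã]es$/u', 'ão', 'CAMIÕES');
echo $untouched, PHP_EOL; // CAMIÕES
```

Neither result is a clean singular, so the function needs a deliberate policy for uppercase input either way.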
Should your function name indicate that the function is only applicable to Portuguese?
$endgroup$
$begingroup$
Won't your suggestion break for `meses`, which would return `mê` (since the `s` at the end is removed, as the last case, and is a required step)?
$endgroup$
– Ismael Miguel
Dec 15 '16 at 19:50
$begingroup$
@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
$endgroup$
– Mike Brant
Dec 16 '16 at 16:31
$begingroup$
In other words, if I write the code the way you suggest, would it still work for words that end with `s`, but that are singular?
$endgroup$
– Ismael Miguel
Dec 16 '16 at 20:43
$begingroup$
@IsmaelMiguel I see your meaning now. I updated my answer to use the limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where a second replacement would have been triggered based on an earlier replacement will not happen.
$endgroup$
– Mike Brant
Dec 17 '16 at 14:20
$begingroup$
Oh, yeah, the magical parameter that I always forget about! That's a great idea!
$endgroup$
– Ismael Miguel
Dec 17 '16 at 17:26
$begingroup$
Let me start by saying that I have respect for Mike Brant, and have been enjoying his posts for quite a while now. However, his answer to this question is not his finest.
- `$regex_config` cannot store the replacement values as associative keys unless the regex patterns that use the same replacement value are merged. This is not explained in the `...` (yatta-yatta). The key clash would be on `el`.
- Simply throwing `1` at the end of `preg_replace()` is NOT going to provide the desired output. Declaring a replacement limit on the call will only limit the replacements PER array element. The damage is evident in this output: meses => mês = mê
- Most trivially, `array_values()` doesn't need to be called because `preg_replace()` is "key ignorant" regarding the array inputs.
- For this process to maintain accuracy, there needs to be a `return` as soon as a replacement occurs on the input string. To avoid calling multiple replacements, iterate the array of pattern-replacement pairs.
- You can avoid using capture groups and shorten your replacement strings in a couple of places by implementing the `\K` metacharacter (restart fullstring match). This way you don't need to use `$1` or rewrite literals from the pattern into the replacement.
- If you need to add case-sensitivity to your replacement process, you can check the last character of the incoming string. If it is uppercase, assume the whole string is in CAPS and call `mb_strtoupper()`.
- I don't have a sample string to test against `~[áó].*eis$~iu`, but I wonder if this is accurate/correct and my Portuguese is not too sharp.
- After my implementation of `\K` you can see that two pairs of patterns are making the same replacement. If you don't expect to be making lots of future adjustments to this set of regex patterns, you could combine the patterns with a pipe. Here's what I mean: `'~(?:[áó].*eis|[eé]is)$~iu' => 'el',` and `'~(?:[rzs]\Kes|s)$~iu' => ''`
- I am using the regex patterns as the keys because they will all logically be unique. The same cannot be said about the replacement values (not without merging anyhow).
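Two of the points above can be checked in isolation. This is only a sketch: the variable names are mine, and the patterns are the question's rules written out in full.

```php
<?php
// Capture-group form from the question versus the \K form described above.
$with_group = preg_replace('~([^eé])is$~iu', '$1l', 'cristais');
$with_k     = preg_replace('~[^eé]\Kis$~iu', 'l', 'cristais');
echo $with_group, ' ', $with_k, PHP_EOL; // cristal cristal

// Replacement values as keys: a repeated key silently overwrites the earlier entry.
$by_replacement = [
    'el' => '/[áó].*eis$/iu',
    'el' => '/[eé]is$/iu',
];
echo count($by_replacement), PHP_EOL; // 1
```

With `\K`, everything matched before it is dropped from the reported match, so only `is` is replaced and the preceding letter never has to be written back.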
Code: (Demo)
function is_allcaps($string)
{
    // Only the last letter is inspected; the rest of the word is assumed to match it.
    $last_letter = mb_substr($string, -1, 1, 'UTF-8');
    return $last_letter === mb_strtoupper($last_letter, 'UTF-8');
    // otherwise use ctype_upper() and setlocale()
}

function plural_to_singular($string)
{
    // quick return of "untouchables"
    if(preg_match('~^(?:[oó]culos|parab[eé]ns|f[eé]rias)$~iu', $string))
    {
        return $string;
    }

    // pattern => replacement; order matters, the first match wins
    $regex_map = [
        '~[õã]es$~iu' => 'ão',
        '~(?:[áó].*e|[eé])is$~iu' => 'el',
        '~[^eé]\Kis$~iu' => 'l',
        '~ns$~iu' => 'm',
        '~eses$~iu' => 'ês',
        '~(?:[rzs]\Ke)?s$~iu' => ''
    ];

    foreach ($regex_map as $pattern => $replacement)
    {
        $singular = preg_replace($pattern, $replacement, $string, 1, $count);
        if ($count)
        {
            // restore capitals when the input looked like an all-caps word
            return is_allcaps($string) ? mb_strtoupper($singular) : $singular;
        }
    }

    return $string;
}

$words = array(
    'óculos' => 'óculos',
    'papéis' => 'papel',
    'anéis' => 'anel',
    'PASTEIS' => 'PASTEL',
    'CAMIÕES' => 'CAMIÃO',
    'rodas' => 'roda',
    'cães' => 'cão',
    'meses' => 'mês',
    'vezes' => 'vez',
    'luzes' => 'luz',
    'cristais' => 'cristal',
    'canções' => 'canção',
    'nuvens' => 'nuvem',
    'alemães' => 'alemão'
);

foreach($words as $plural => $singular)
{
    echo "$plural => $singular = " , plural_to_singular($plural) , "\n";
}
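On the open question about the `[áó].*e` branch: here is a quick check of the second rule from the map above, run on its own. The test word `móveis` (plural of `móvel`) is my choice and is not part of the original test list.

```php
<?php
// Second rule from the map above, applied in isolation.
$pattern = '~(?:[áó].*e|[eé])is$~iu';

// The greedy .* swallows everything from the accented vowel to the final "eis".
$greedy = preg_replace($pattern, 'el', 'móveis', 1);
echo $greedy, PHP_EOL; // mel

// Words without á/ó go through the second alternative as intended.
$plain = preg_replace($pattern, 'el', 'papéis', 1);
echo $plain, PHP_EOL; // papel
```

So at least for this word the branch removes too much. Whether any real plural needs it is not something this thread settles.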
$endgroup$
$begingroup$
@MikeBrant ping
$endgroup$
– mickmackusa
23 mins ago
First answer: answered Dec 15 '16 at 18:39 by Mike Brant, edited Dec 17 '16 at 14:14
$begingroup$
Won't your suggestion break formeses
, which would returnmê
(since thes
at the end is removed, as the last case, and is a required step)?
$endgroup$
– Ismael Miguel
Dec 15 '16 at 19:50
$begingroup$
@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
$endgroup$
– Mike Brant
Dec 16 '16 at 16:31
$begingroup$
In other words, if I write the code the way you suggest, would it still work for words that end withs
, but that are singular?
$endgroup$
– Ismael Miguel
Dec 16 '16 at 20:43
$begingroup$
@IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
$endgroup$
– Mike Brant
Dec 17 '16 at 14:20
$begingroup$
Oh, yeah, the magical parameter that I always forget about! That's a great idea!
$endgroup$
– Ismael Miguel
Dec 17 '16 at 17:26
add a comment |
$begingroup$
Won't your suggestion break formeses
, which would returnmê
(since thes
at the end is removed, as the last case, and is a required step)?
$endgroup$
– Ismael Miguel
Dec 15 '16 at 19:50
$begingroup$
@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
$endgroup$
– Mike Brant
Dec 16 '16 at 16:31
$begingroup$
In other words, if I write the code the way you suggest, would it still work for words that end withs
, but that are singular?
$endgroup$
– Ismael Miguel
Dec 16 '16 at 20:43
$begingroup$
@IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
$endgroup$
– Mike Brant
Dec 17 '16 at 14:20
$begingroup$
Oh, yeah, the magical parameter that I always forget about! That's a great idea!
$endgroup$
– Ismael Miguel
Dec 17 '16 at 17:26
$begingroup$
Won't your suggestion break for
meses
, which would return mê
(since the s
at the end is removed, as the last case, and is a required step)?$endgroup$
– Ismael Miguel
Dec 15 '16 at 19:50
$begingroup$
Won't your suggestion break for
meses
, which would return mê
(since the s
at the end is removed, as the last case, and is a required step)?$endgroup$
– Ismael Miguel
Dec 15 '16 at 19:50
$begingroup$
@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
$endgroup$
– Mike Brant
Dec 16 '16 at 16:31
$begingroup$
@IsmaelMiguel I didn't really speak to any specific pattern replacement logic, so I guess I don't understand your question.
$endgroup$
– Mike Brant
Dec 16 '16 at 16:31
$begingroup$
In other words, if I write the code the way you suggest, would it still work for words that end with
s
, but that are singular?$endgroup$
– Ismael Miguel
Dec 16 '16 at 20:43
$begingroup$
In other words, if I write the code the way you suggest, would it still work for words that end with
s
, but that are singular?$endgroup$
– Ismael Miguel
Dec 16 '16 at 20:43
$begingroup$
@IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
$endgroup$
– Mike Brant
Dec 17 '16 at 14:20
$begingroup$
@IsmaelMiguel I see your meaning now. I updated my answer to use limit parameter for the replacement. A limit value of 1 will limit the number of replacements that occur to 1, meaning cases where the a second replacement would have been triggered based on an earlier replacement will not happen.
$endgroup$
– Mike Brant
Dec 17 '16 at 14:20
$begingroup$
Oh, yeah, the magical parameter that I always forget about! That's a great idea!
$endgroup$
– Ismael Miguel
Dec 17 '16 at 17:26
$begingroup$
Oh, yeah, the magical parameter that I always forget about! That's a great idea!
$endgroup$
– Ismael Miguel
Dec 17 '16 at 17:26
add a comment |
$begingroup$
Let me start by saying, that I have respect for Mike Brant, and have been enjoying his posts for quite a while now. However, his answer to this question is not his finest.
$regex_config
can not store the the replacement values as associative keys unless the regex patterns that use the same replacement value are merged. This is not explained in the...
(yatta-yatta). The key clash would be onel
.- Simply throwing
1
at the end ofpreg_replace()
is NOT going to provide the desired output. Declaring a replacement limit on the call will only limit the replacements PER array element. The damage is evident in this output: meses => mês = mê
- Most trivially,
array_values()
doesn't need to be called becausepreg_replace()
is "key ignorant" regarding the array inputs.
- For this process to maintain accuracy, there needs to be a
return
as soon as a replacement occurs on the input string. To avoid calling multiple replacements, iterate the array of pattern-replacement pairs. - You can avoid using capture groups and shorten your replacement strings in a couple places by implementing the
K
metacharacter (restart fullstring match). This way you don't need to use$1
or rewrite a literals from the pattern into the replacement. - If you need to add case-sensitivity to your replacement process, you can check the last character of the incoming string. If it is uppercase, assume the whole string is in CAPS and call
mb_strtoupper()
. - I don't have a sample string to test against
~[áó].*eis$~iu
, but I wonder if this is accurate/correct and my Portuguese is not too sharp. After my implementation of
K
you can see that two pairs of patterns are making the same replacement. If you don't expect to be making lots of future adjustments to this set of regex patterns, you could combine the patterns with a pipe. Here's what I mean:'~(?:[áó].*eis|[eé]is)$~iu' => 'el',
and'~(?:[rzs]Kes|s)$~iu' => ''
I am using the regex patterns as the keys because they will all logically be unique. the same cannot be said about the replacement values (not without merging anyhow).
Code: (Demo)
function is_allcaps($string)
{
$last_letter = mb_substr($string, -1, 1, 'UTF-8');
return $last_letter === mb_strtoupper($last_letter, 'UTF-8');
// otherwise use cytpe_upper() and setlocale()
}
function plural_to_singular($string)
{
// quick return of "untouchables"
if(preg_match('~^(?:[oó]culos|parab[eé]ns|f[eé]rias)$~iu', $string))
{
return $string;
}
$regex_map = [
'~[õã]es$~iu' => 'ão',
'~(?:[áó].*e|[eé])is$~iu' => 'el',
'~[^eé]Kis$~iu' => 'l',
'~ns$~iu' => 'm',
'~eses$~iu' => 'ês',
'~(?:[rzs]Ke)?s$~iu' => ''
];
foreach ($regex_map as $pattern => $replacement)
{
$singular = preg_replace($pattern, $replacement, $string, 1, $count);
if ($count)
{
return is_allcaps($string) ? mb_strtoupper($singular) : $singular;
}
}
return $string;
}
$words = array(
'óculos' => 'óculos',
'papéis' => 'papel',
'anéis' => 'anel',
'PASTEIS' => 'PASTEL',
'CAMIÕES' => 'CAMIÃO',
'rodas' => 'roda',
'cães' => 'cão',
'meses' => 'mês',
'vezes' => 'vez',
'luzes' => 'luz',
'cristais' => 'cristal',
'canções' => 'canção',
'nuvens' => 'nuvem',
'alemães' => 'alemão'
);
foreach($words as $plural => $singular)
{
echo "$plural => $singular = " , plural_to_singular($plural) , "n";
}
$endgroup$
$begingroup$
@MikeBrant ping
$endgroup$
– mickmackusa
23 mins ago
add a comment |
$begingroup$
Let me start by saying, that I have respect for Mike Brant, and have been enjoying his posts for quite a while now. However, his answer to this question is not his finest.
$regex_config
can not store the the replacement values as associative keys unless the regex patterns that use the same replacement value are merged. This is not explained in the...
(yatta-yatta). The key clash would be onel
.- Simply throwing
1
at the end ofpreg_replace()
is NOT going to provide the desired output. Declaring a replacement limit on the call will only limit the replacements PER array element. The damage is evident in this output: meses => mês = mê
- Most trivially,
array_values()
doesn't need to be called becausepreg_replace()
is "key ignorant" regarding the array inputs.
- For this process to maintain accuracy, there needs to be a
return
as soon as a replacement occurs on the input string. To avoid calling multiple replacements, iterate the array of pattern-replacement pairs. - You can avoid using capture groups and shorten your replacement strings in a couple places by implementing the
K
metacharacter (restart fullstring match). This way you don't need to use$1
or rewrite a literals from the pattern into the replacement. - If you need to add case-sensitivity to your replacement process, you can check the last character of the incoming string. If it is uppercase, assume the whole string is in CAPS and call
mb_strtoupper()
. - I don't have a sample string to test against
~[áó].*eis$~iu
, but I wonder if this is accurate/correct and my Portuguese is not too sharp. After my implementation of
K
you can see that two pairs of patterns are making the same replacement. If you don't expect to be making lots of future adjustments to this set of regex patterns, you could combine the patterns with a pipe. Here's what I mean:'~(?:[áó].*eis|[eé]is)$~iu' => 'el',
and'~(?:[rzs]Kes|s)$~iu' => ''
I am using the regex patterns as the keys because they will all logically be unique. the same cannot be said about the replacement values (not without merging anyhow).
Code: (Demo)
function is_allcaps($string)
{
    // only the last character is inspected; if it is uppercase, the whole word is assumed to be in CAPS
    $last_letter = mb_substr($string, -1, 1, 'UTF-8');
    return $last_letter === mb_strtoupper($last_letter, 'UTF-8');
    // otherwise use ctype_upper() and setlocale()
}
function plural_to_singular($string)
{
    // quick return of "untouchables"
    if (preg_match('~^(?:[oó]culos|parab[eé]ns|f[eé]rias)$~iu', $string))
    {
        return $string;
    }
    // order matters: the first pattern that matches wins
    $regex_map = [
        '~[õã]es$~iu' => 'ão',
        '~(?:[áó].*e|[eé])is$~iu' => 'el',
        '~[^eé]\Kis$~iu' => 'l',
        '~ns$~iu' => 'm',
        '~eses$~iu' => 'ês',
        '~(?:[rzs]\Ke)?s$~iu' => ''
    ];
    foreach ($regex_map as $pattern => $replacement)
    {
        $singular = preg_replace($pattern, $replacement, $string, 1, $count);
        if ($count)
        {
            // return immediately so no later pattern touches the result
            return is_allcaps($string) ? mb_strtoupper($singular, 'UTF-8') : $singular;
        }
    }
    return $string;
}
$words = array(
    'óculos' => 'óculos',
    'papéis' => 'papel',
    'anéis' => 'anel',
    'PASTEIS' => 'PASTEL',
    'CAMIÕES' => 'CAMIÃO',
    'rodas' => 'roda',
    'cães' => 'cão',
    'meses' => 'mês',
    'vezes' => 'vez',
    'luzes' => 'luz',
    'cristais' => 'cristal',
    'canções' => 'canção',
    'nuvens' => 'nuvem',
    'alemães' => 'alemão'
);
foreach ($words as $plural => $singular)
{
    echo "$plural => $singular = " , plural_to_singular($plural) , "\n";
}
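To make the \K point concrete, here is a small side-by-side sketch. It assumes nothing beyond preg_replace() itself; the variable names are illustrative only.

```php
<?php
// Capture-group form: the kept character is captured and written back via $1.
$with_group = preg_replace('~([^eé])is$~iu', '$1l', 'cristais');

// \K form: everything matched before \K is dropped from the reported match,
// so only "is" is replaced and no backreference is needed.
$with_k = preg_replace('~[^eé]\Kis$~iu', 'l', 'cristais');

// Same idea for the "[rzs]es" rule: only "es" is removed.
$vez = preg_replace('~[rzs]\Kes$~iu', '', 'vezes');

echo $with_group, "\n"; // cristal
echo $with_k, "\n";     // cristal
echo $vez, "\n";        // vez
```

Both forms give the same result; \K just keeps the replacement string free of backreferences and repeated literals.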
$endgroup$
$begingroup$
@MikeBrant ping
$endgroup$
– mickmackusa
23 mins ago
answered 25 mins ago
mickmackusa