Remove all spaces between Chinese words with regex
I would like to remove the spaces within Chinese text only.
My text: "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?"
Ideal output: "請把這裡的 10 多個字合併. Can you help me?"
var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace("/ /", "");
I have studied a similar question for Python, but its solution does not seem to work in my situation, so I am bringing my question here for help.
javascript regex
Are your spaces actually
or did you just guess?
– Justinas
19 hours ago
.replace(/ /g,'')
– Nitesh Virani
19 hours ago
Using the latest ECMAScript 2018 regex syntax, you may use s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')
– Wiktor Stribiżew
19 hours ago
Do you want to keep a space before 10 if there are 2 spaces between a Chinese char and the digits? If yes, check my approach.
– Wiktor Stribiżew
18 hours ago
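The two attempts above differ in how replace interprets its first argument. A minimal sketch using the question's sample string: a string pattern such as "/ /" is searched for literally (a slash, a space, a slash), a regex without the g flag replaces only the first match, and / /g removes every space, including the ones around 10 and in the English sentence, so neither gives the ideal output.

```javascript
const sample = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

// A string pattern is matched literally: the text contains no "/ /",
// so nothing changes.
const literal = sample.replace("/ /", "");

// A regex without the g flag replaces only the first space.
const firstOnly = sample.replace(/ /, "");

// With the g flag every space is removed, including those that should stay
// (around "10" and inside the English sentence).
const allSpaces = sample.replace(/ /g, "");

console.log(literal);   // unchanged
console.log(firstOnly); // 請把 這 裡 的 10 多 個 字 合 併. Can you help me?
console.log(allSpaces); // 請把這裡的10多個字合併.Canyouhelpme?
```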
asked 19 hours ago by Needa Hell (new contributor)
edited 14 hours ago by Boann
6 Answers
Getting to the Chinese char matching pattern
Using the Unicode Tools, the \p{Han} Unicode property class that matches any Chinese char can be translated into
[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D]
In ES6, to match a single Chinese char, it can be used as
/[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\u{20000}-\u{2A6D6}\u{2A700}-\u{2B734}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}]/u
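The u flag is what makes the \u{...} ranges in that class work: an astral character such as U+20000 is stored as two UTF-16 code units, but with u the class compares whole code points. A small check, using only the Extension B range from the class above:

```javascript
// U+20000 is the first character of CJK Unified Ideographs Extension B.
const astral = "\u{20000}";

// In UTF-16 it occupies two code units (a surrogate pair).
console.log(astral.length); // 2

// With the u flag, the class range is compared against whole code points.
const extB = /^[\u{20000}-\u{2A6D6}]$/u;
console.log(extB.test(astral)); // true

// A BMP character outside this particular range is not matched.
console.log(extB.test("\u4E00")); // false
```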
Transpiling it to ES5 with the ES2015 Unicode regular expression transpiler, we get the following pattern to match any Chinese char using a JS RegExp:
(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])
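Without the u flag the engine only sees UTF-16 code units, which is why the transpiled pattern spells astral ranges as a high-surrogate class followed by a low-surrogate class. A quick sketch comparing one such branch (taken from the pattern above) with a u-flag code point range on U+20000:

```javascript
const astral = "\u{20000}"; // stored as "\uD840\uDC00"

// ES5-compatible: one high surrogate followed by one low surrogate.
const es5Branch = /^(?:[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF])$/;

// ES2015+: the Extension B block expressed as a code point range.
const es6Range = /^[\u{20000}-\u{2A6D6}]$/u;

console.log(astral.charCodeAt(0).toString(16)); // d840
console.log(es5Branch.test(astral)); // true
console.log(es6Range.test(astral));  // true
```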
So, you may use
s.replace(/([\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])\s+(?=(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]))/g, '$1')
See the regex demo.
If your JS environment is ECMAScript 2018 compliant, you may use a shorter version:
s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')
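Unicode property escapes accept the long script name Han as well as the short alias Hani used above, and the property name Script can be shortened to sc. A sketch, assuming a runtime with ES2018 property escape support (for example a recent Node.js), showing that the three spellings give the same result on the question's sample:

```javascript
const s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";

// Three equivalent spellings of the Han script property escape.
const byName = s.replace(/(\p{Script=Han})\s+(?=\p{Script=Han})/gu, "$1");
const byAlias = s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, "$1");
const byShortProp = s.replace(/(\p{sc=Han})\s+(?=\p{sc=Han})/gu, "$1");

console.log(byName); // 請把這裡的 10 多個字合併. Can you help me?
console.log(byName === byAlias && byAlias === byShortProp); // true
```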
Pattern details
- (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char
- \s+ - any 1+ whitespaces (any Unicode whitespace)
- (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.
JS demo:
var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
FYI: if only one whitespace is expected between Chinese chars, remove + after \s.
– Wiktor Stribiżew
19 hours ago
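To see what dropping the + changes, here is a small wrapper around the ES2018 pattern. The name joinHanSpaces and its singleSpace flag are hypothetical, introduced only for this illustration. With a single \s, two spaces between Chinese chars are left alone, because after the first space the lookahead sees another space rather than a Chinese char:

```javascript
// Hypothetical helper: removes whitespace between two Han characters.
// singleSpace = true mimics the variant without "+" after \s.
function joinHanSpaces(text, singleSpace = false) {
  const ws = singleSpace ? "\\s" : "\\s+";
  const re = new RegExp("(\\p{Script=Hani})" + ws + "(?=\\p{Script=Hani})", "gu");
  return text.replace(re, "$1");
}

const doubled = "請  把 這"; // two spaces after 請, one after 把

console.log(joinHanSpaces(doubled));       // 請把這
console.log(joinHanSpaces(doubled, true)); // 請  把這
```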
I get "{ "message": "SyntaxError: invalid identity escape in regular expression", "filename": "stacksnippets.net/js", "lineno": 17, "colno": 22 }" when I run the snippet (using Firefox 62).
– Pac0
9 hours ago
@Pac0 Firefox has problems with "new" regexp, e.g. here
– Kamil Kiełczewski
9 hours ago
@Pac0 That is because of /(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu; FF does not support ECMAScript 2018 Unicode property classes. Chrome does.
– Wiktor Stribiżew
8 hours ago
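A regex literal containing \p{...} is normally rejected when the script is parsed, so an older engine never reaches a try/catch placed around it. One way around this, sketched below under the assumption that a BMP-only fallback is acceptable, is to build the pattern with the RegExp constructor after a runtime feature check. The helper names and the simplified fallback class are hypothetical, not part of the answer:

```javascript
// Returns true if the engine understands Unicode property escapes (ES2018).
function supportsPropertyEscapes() {
  try {
    new RegExp("\\p{Script=Hani}", "u");
    return true;
  } catch (e) {
    return false; // SyntaxError in engines without support
  }
}

// Simplified fallback: CJK Extension A plus the main CJK Unified Ideographs block.
const FALLBACK_HAN = "[\\u3400-\\u4DBF\\u4E00-\\u9FFF]";

function buildHanSpaceRegex() {
  if (supportsPropertyEscapes()) {
    return new RegExp("(\\p{Script=Hani})\\s+(?=\\p{Script=Hani})", "gu");
  }
  return new RegExp("(" + FALLBACK_HAN + ")\\s+(?=" + FALLBACK_HAN + ")", "g");
}

const s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
console.log(s.replace(buildHanSpaceRegex(), "$1"));
```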
Thanks Wiktor, I have compared the answers, and this seems to be the most detailed and working answer to my question.
– Needa Hell
3 hours ago
Using @Brett Zamir's solution on how to match Chinese characters in regex:
Javascript unicode string, chinese character but no punctuation
const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
const regex = new RegExp('([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d]) ([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d])* ', 'g');
const ret = str.replace(regex, '$1$2');
console.log(ret);
Schematically, it looks like:
([foo chinese chars]) ([foo chinese chars])*
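The schematic shape shows why the output depends on how the Chinese chars pair up: every match must end with a space, so chars are joined two at a time, and the space after the second char of a pair is consumed even when a non-Chinese word follows. A sketch using the simplified class [\u4E00-\u9FFF] in place of the full one, applied to the sample and to the string from bobble bubble's comment below, with a lookahead variant for comparison:

```javascript
const HAN = "[\\u4E00-\\u9FFF]"; // simplified stand-in for the full Chinese class

// Shape of the answer's regex: (Han) (Han)* followed by a mandatory space.
const pairwise = new RegExp("(" + HAN + ") (" + HAN + ")* ", "g");

// Lookahead variant: removes whitespace only when Han is on both sides.
const lookahead = new RegExp("(" + HAN + ")\\s+(?=" + HAN + ")", "g");

const sample = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
const edge = "請 的 10 多 個 a"; // string from the comments

console.log(sample.replace(pairwise, "$1$2")); // 請把這裡的 10 多個字合併. Can you help me?
console.log(edge.replace(pairwise, "$1$2"));   // 請的10 多個a
console.log(edge.replace(lookahead, "$1"));    // 請的 10 多個 a
```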
The output here doesn't match with the ideal output. Notice the space in front of the 10.
– holydragon
19 hours ago
You lose the space before the 10 in the middle of the Chinese words, but you still found the right way to select Chinese characters :p
– jonatjano
19 hours ago
I've edited my post to match your desire
– Grégory NEUT
19 hours ago
What about e.g. 請 的 10 多 個 a
– bobble bubble
18 hours ago
@GrégoryNEUT blabla isn't a common metasyntactic variable in English; you might want to use foo instead ;)
– Aaron
12 hours ago
The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC],
so you can use this regex, which selects Chinese characters followed by whitespace and ensures the whitespace is followed by a Chinese character via the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+):
([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)
and replace it with $1.
Demo
var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));
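Note that this range is limited to the BMP and the regex has no u flag, so Chinese chars outside the BMP, such as U+20000 from CJK Extension B, are not recognized and the spaces next to them stay. A sketch comparing it with the ES2018 \p{Script=Han} escape (assuming an engine that supports it) on a made-up string:

```javascript
const RANGE = "[\\u2E80-\\u2FD5\\u3190-\\u319f\\u3400-\\u4DBF\\u4E00-\\u9FCC]";
const bmpOnly = new RegExp("(" + RANGE + "+)\\s+(?=" + RANGE + "+)", "g");
const byScript = /(\p{Script=Han})\s+(?=\p{Script=Han})/gu;

const text = "把 \u{20000} 這"; // U+20000 sits between two BMP Han chars

console.log(text.replace(bmpOnly, "$1"));  // unchanged: both spaces stay
console.log(text.replace(byScript, "$1")); // both spaces removed
```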
Try this
str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
I got the range \u4E00-\u9FCC from here; it contains ~20000 chars (enough for daily usage)
var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
console.log(str);
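Because the first alternative removes any space in front of a Chinese char, whatever precedes it, the pattern also joins a Latin word to a following Chinese word. A quick check on the sample and on a made-up mixed string:

```javascript
const re = / ([\u4E00-\u9FCC])|([0-9]+ )/g;

const sample = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
const mixed = "hello 中文";

// An unmatched group is replaced with "", so "$1$2" keeps whichever group matched.
console.log(sample.replace(re, "$1$2")); // 請把這裡的 10 多個字合併. Can you help me?
console.log(mixed.replace(re, "$1$2"));  // hello中文
```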
The space in front of the 10 is missing.
– holydragon
19 hours ago
@holydragon it's fixed now
– Kamil Kiełczewski
19 hours ago
var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';
var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];
var isChinese = function (str) {
var charCode;
var flag;
var range;
for (var i = 0; i < str.length;) {
charCode = str.codePointAt(i);
flag = false;
for (var j = 0; j < chineseRange.length; j++) {
range = chineseRange[j];
if (charCode >= range[0] && charCode <= range[1]) {
flag = true;
break;
}
}
if (!flag) {
return false;
}
if (charCode <= 0xffff) {
i++;
} else {
i += 2;
}
}
return true;
}
// For more information about ischinese.js, see the repo on GitHub
// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js
// I wrote the code below to remove spaces between Chinese words
var spl = chine.trim().split(/\s+/);
var text = '';
for (var i = 0; i < spl.length; i++) {
if (isChinese(spl[i])) {
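// Guard: spl[i + 1] is undefined after the last word, and isChinese(undefined) would throw
if (i + 1 >= spl.length || !isChinese(spl[i + 1])) {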
text += spl[i] + ' ';
} else {
text += spl[i];
}
} else {
text += spl[i] + ' ';
}
}
console.log(text.trim()); // drop the trailing space added after the last word
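The manual index bookkeeping in isChinese (i++ or i += 2 depending on surrogate pairs) can be avoided with for...of, which iterates a string by code point. A sketch of an equivalent check; isChineseWord is a hypothetical name, and the ranges are those from the answer with the repeated [0xf900, 0xfaff] entry listed once:

```javascript
const CHINESE_RANGES = [
  [0x4e00, 0x9fff], [0x3400, 0x4dbf], [0x20000, 0x2a6df], [0x2a700, 0x2b73f],
  [0x2b740, 0x2b81f], [0x2b820, 0x2ceaf], [0xf900, 0xfaff], [0x3300, 0x33ff],
  [0xfe30, 0xfe4f], [0x2f800, 0x2fa1f]
];

// True if every code point of `word` falls inside one of the ranges.
function isChineseWord(word) {
  for (const ch of word) {
    const cp = ch.codePointAt(0);
    if (!CHINESE_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi)) {
      return false;
    }
  }
  return true;
}

console.log(isChineseWord("合併"));      // true
console.log(isChineseWord("10"));        // false
console.log(isChineseWord("\u{20000}")); // true (astral char handled as one code point)
```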
A block of code with no explanation and negligible comments does not make an ideal answer.
– Rich
10 hours ago
This might be useful in your scenario. (?<![ -~]) (?![ -~])
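[ -~] is the printable ASCII range (U+0020 space through U+007E tilde), so the pattern removes a single space only when neither neighbor is printable ASCII, which in the sample means a space between two Chinese chars. It relies on lookbehind (ES2018) and is not Chinese-specific: a space between any two non-ASCII characters, such as accented Latin letters, is removed too. A usage sketch:

```javascript
const sample = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
const asciiGuard = /(?<![ -~]) (?![ -~])/g;

// Removes a space only if neither neighbor is printable ASCII.
console.log(sample.replace(asciiGuard, "")); // 請把這裡的 10 多個字合併. Can you help me?

// Not Chinese-specific: precomposed accented letters are non-ASCII as well.
const accented = "caf\u00E9 \u00E9t\u00E9";
console.log(accented.replace(asciiGuard, "")); // caféété
```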
Answer "Getting to the Chinese char matching pattern" by Wiktor Stribiżew: answered 19 hours ago, edited 18 hours ago
FYI: if only one whitespace is expected between Chinese chars, remove+
afters
.
– Wiktor Stribiżew
19 hours ago
I get " { "message": "SyntaxError: invalid identity escape in regular expression", "filename": "stacksnippets.net/js", "lineno": 17, "colno": 22 }" When I run the snippet. (Using Firefox 62)
– Pac0
9 hours ago
@Pac0 firefox has problems with "new" regexp e.g. here
– Kamil Kiełczewski
9 hours ago
1
@Pac0 That is because of/(p{Script=Hani})s+(?=p{Script=Hani})/gu
, FF does not support ECMAScript 2018 Unicode property classes. Chrome does.
– Wiktor Stribiżew
8 hours ago
Thanks Wiktor, I have compared the answers. And this seems would be the most detailed and worked answer to my question.
– Needa Hell
3 hours ago
add a comment |
FYI: if only one whitespace is expected between Chinese chars, remove+
afters
.
– Wiktor Stribiżew
19 hours ago
I get " { "message": "SyntaxError: invalid identity escape in regular expression", "filename": "stacksnippets.net/js", "lineno": 17, "colno": 22 }" When I run the snippet. (Using Firefox 62)
– Pac0
9 hours ago
@Pac0 firefox has problems with "new" regexp e.g. here
– Kamil Kiełczewski
9 hours ago
1
@Pac0 That is because of/(p{Script=Hani})s+(?=p{Script=Hani})/gu
, FF does not support ECMAScript 2018 Unicode property classes. Chrome does.
– Wiktor Stribiżew
8 hours ago
Thanks Wiktor, I have compared the answers. And this seems would be the most detailed and worked answer to my question.
– Needa Hell
3 hours ago
FYI: if only one whitespace is expected between Chinese chars, remove
+
after s
.– Wiktor Stribiżew
19 hours ago
FYI: if only one whitespace is expected between Chinese chars, remove
+
after s
.– Wiktor Stribiżew
19 hours ago
I get " { "message": "SyntaxError: invalid identity escape in regular expression", "filename": "stacksnippets.net/js", "lineno": 17, "colno": 22 }" When I run the snippet. (Using Firefox 62)
– Pac0
9 hours ago
I get " { "message": "SyntaxError: invalid identity escape in regular expression", "filename": "stacksnippets.net/js", "lineno": 17, "colno": 22 }" When I run the snippet. (Using Firefox 62)
– Pac0
9 hours ago
@Pac0 firefox has problems with "new" regexp e.g. here
– Kamil Kiełczewski
9 hours ago
@Pac0 firefox has problems with "new" regexp e.g. here
– Kamil Kiełczewski
9 hours ago
1
1
@Pac0 That is because of
/(p{Script=Hani})s+(?=p{Script=Hani})/gu
, FF does not support ECMAScript 2018 Unicode property classes. Chrome does.– Wiktor Stribiżew
8 hours ago
@Pac0 That is because of
/(p{Script=Hani})s+(?=p{Script=Hani})/gu
, FF does not support ECMAScript 2018 Unicode property classes. Chrome does.– Wiktor Stribiżew
8 hours ago
Thanks Wiktor, I have compared the answers. And this seems would be the most detailed and worked answer to my question.
– Needa Hell
3 hours ago
Thanks Wiktor, I have compared the answers. And this seems would be the most detailed and worked answer to my question.
– Needa Hell
3 hours ago
add a comment |
Using @Brett Zamir soluce on how to match chinese character in regex
Javascript unicode string, chinese character but no punctuation
const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
const regex = new RegExp('([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d]) ([u4E00-u9FCCu3400-u4DB5uFA0EuFA0FuFA11uFA13uFA14uFA1FuFA21uFA23uFA24uFA27-uFA29]|[ud840-ud868][udc00-udfff]|ud869[udc00-uded6udf00-udfff]|[ud86a-ud86c][udc00-udfff]|ud86d[udc00-udf34udf40-udfff]|ud86e[udc00-udc1d])* ', 'g');
const ret = str.replace(regex, '$1$2');
console.log(ret);
It looks like :
([foo chinese chars]) ([foo chinese chars])*
2
The output here doesn't match with the ideal output. Notice the space in front of the 10.
– holydragon
19 hours ago
you lose the space before the 10 at the center of the chineses word but still you found the right way to select chinese characters :p
– jonatjano
19 hours ago
I've edited my post to match your desire
– Grégory NEUT
19 hours ago
What about e.g. 請 的 10 多 個 a?
– bobble bubble
18 hours ago
@GrégoryNEUT blabla isn't a common metasyntactic variable in English; you might want to use foo instead ;)
– Aaron
12 hours ago
edited 12 hours ago
answered 19 hours ago
Grégory NEUT
The range for Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC],
so you can use the following regex, which captures one or more Chinese characters followed by whitespace and uses the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) to ensure that a Chinese character comes next:
([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)
and replace the match with $1.
Demo
var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));
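The same idea can be packaged as a function, with the character class defined once so the capture and the lookahead cannot drift apart. This is a sketch in which CHINESE, RE and removeSpacesBetweenChinese are hypothetical names. A lookahead only has to see one character, so (?=[...]) behaves exactly like the (?=[...]+) used above. The input here is the one from the question rather than the demo string.

```javascript
// Character class from the answer above, defined once.
const CHINESE = '[\\u2E80-\\u2FD5\\u3190-\\u319f\\u3400-\\u4DBF\\u4E00-\\u9FCC]';

// A run of Chinese characters (captured), then whitespace, then a lookahead
// that requires a Chinese character without consuming it.
const RE = new RegExp('(' + CHINESE + '+)\\s+(?=' + CHINESE + ')', 'g');

function removeSpacesBetweenChinese(str) {
  return str.replace(RE, '$1');
}

console.log(removeSpacesBetweenChinese('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// prints "請把這裡的 10 多個字合併. Can you help me?"
```

Because \s+ is used, a tab or several spaces between two Chinese characters is removed as well. Whitespace next to a digit or Latin letter is left untouched.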
edited 19 hours ago
answered 19 hours ago
Pushpesh Kumar Rajwanshi
Try this
str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
I got the range \u4E00-\u9FCC from here; it contains about 20,000 characters (enough for everyday use).
var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');
console.log(str);
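Here is how the one-liner works, as an annotated sketch; the wrapper name removeSpaces is hypothetical. The first alternative removes a space that precedes a Chinese character. The second exists only to protect the space after a number. The first alternative does not check what comes before the space, so a space between Latin text and a Chinese character is removed too.

```javascript
function removeSpaces(str) {
  return str.replace(
    // Alternative 1: a space followed by a Chinese character (captured in $1);
    //   the space is dropped and the character is put back.
    // Alternative 2: a run of digits plus the space after it (captured in $2);
    //   it is put back unchanged, and because that space is consumed here,
    //   alternative 1 cannot remove it before the next Chinese character.
    / ([\u4E00-\u9FCC])|([0-9]+ )/g,
    // A group that did not take part in the match is replaced by an empty string.
    '$1$2'
  );
}

console.log(removeSpaces('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// prints "請把這裡的 10 多個字合併. Can you help me?"
console.log(removeSpaces('a 多')); // prints "a多"
```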
The space in front of the 10 is missing.
– holydragon
19 hours ago
@holydragon it's fixed now
– Kamil Kiełczewski
19 hours ago
edited 18 hours ago
answered 19 hours ago
Kamil Kiełczewski
var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';
// Code point ranges treated as Chinese (CJK ideographs, extensions, compatibility blocks)
var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0x2f800, 0x2fa1f]];
// Returns true if every code point of str falls in one of the ranges above
var isChinese = function (str) {
  var charCode;
  var flag;
  var range;
  for (var i = 0; i < str.length;) {
    charCode = str.codePointAt(i);
    flag = false;
    for (var j = 0; j < chineseRange.length; j++) {
      range = chineseRange[j];
      if (charCode >= range[0] && charCode <= range[1]) {
        flag = true;
        break;
      }
    }
    if (!flag) {
      return false;
    }
    // Code points above U+FFFF take two UTF-16 code units (a surrogate pair)
    if (charCode <= 0xffff) {
      i++;
    } else {
      i += 2;
    }
  }
  return true;
};
// For more information, see the is-chinese project on GitHub
// Credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js
// Split on whitespace, then glue neighbouring Chinese tokens back together
var spl = chine.trim().split(/\s+/);
var text = '';
for (var i = 0; i < spl.length; i++) {
  // Join without a space only when this token and the next one are both Chinese
  // (the bounds check avoids calling isChinese(undefined) on the last token)
  if (isChinese(spl[i]) && i + 1 < spl.length && isChinese(spl[i + 1])) {
    text += spl[i];
  } else {
    text += spl[i] + ' ';
  }
}
console.log(text.trim());
A block of code with no explanation and negligible comments does not make an ideal answer.
– Rich
10 hours ago
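The answer above works like this. It splits the trimmed string on whitespace and tests every token with isChinese, which requires all code points to lie inside the listed ranges. It then joins two neighbouring tokens without a space only when both are Chinese. Below is a more compact sketch of the same approach. It assumes an engine with ES2015 support, and HAN_RANGES, isChineseToken and joinChineseTokens are hypothetical names. Spreading a string iterates it by code point, which replaces the manual surrogate-pair bookkeeping.

```javascript
// Code point ranges treated as Chinese (same as the answer, duplicate removed).
const HAN_RANGES = [
  [0x4e00, 0x9fff], [0x3400, 0x4dbf], [0x20000, 0x2a6df], [0x2a700, 0x2b73f],
  [0x2b740, 0x2b81f], [0x2b820, 0x2ceaf], [0xf900, 0xfaff], [0x3300, 0x33ff],
  [0xfe30, 0xfe4f], [0x2f800, 0x2fa1f],
];

const isHanCodePoint = (cp) => HAN_RANGES.some(([lo, hi]) => cp >= lo && cp <= hi);

// A token is Chinese if it is non-empty and every code point is in a range.
// [...token] splits by code point, so surrogate pairs stay together.
const isChineseToken = (token) =>
  token.length > 0 && [...token].every((ch) => isHanCodePoint(ch.codePointAt(0)));

function joinChineseTokens(str) {
  const tokens = str.trim().split(/\s+/);
  let out = tokens[0];
  for (let i = 1; i < tokens.length; i++) {
    // No separator between two Chinese tokens, one space otherwise.
    const sep = isChineseToken(tokens[i - 1]) && isChineseToken(tokens[i]) ? '' : ' ';
    out += sep + tokens[i];
  }
  return out;
}

console.log(joinChineseTokens('請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?'));
// prints "請把這裡的 10 多個字合併 . Can you help me?"
```

Because whole tokens are tested, punctuation attached to a Chinese character defeats the check. With the question's original input, the token 併. is not all Chinese, so the result is 請把這裡的 10 多個字合 併. Can you help me?. This is presumably why the answer's sample string has a space before the period. Runs of whitespace are also collapsed into single spaces.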
edited 18 hours ago
answered 19 hours ago
Younes Zaidi
This might be useful in your scenario. (?<![ -~]) (?![ -~])
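Here is a sketch of how that pattern could be used with replace; the function name removeSpacesOutsideAscii is hypothetical. [ -~] is the printable ASCII range U+0020 to U+007E, so the pattern matches a single space that is neither preceded nor followed by printable ASCII. The lookbehind (?<!...) is ES2018 syntax, so older engines may reject it with a SyntaxError.

```javascript
// Remove a space only when neither neighbour is printable ASCII (U+0020-U+007E).
function removeSpacesOutsideAscii(str) {
  return str.replace(/(?<![ -~]) (?![ -~])/g, '');
}

console.log(removeSpacesOutsideAscii('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// prints "請把這裡的 10 多個字合併. Can you help me?"
```

The pattern does not look for Chinese specifically: any non-ASCII neighbour counts, so café 多 becomes café多. Two consecutive spaces are left alone, because each space sees the other as an ASCII neighbour.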
edited 16 hours ago
Sebastian Hofmann
answered 17 hours ago
Shantanu Patwardhan