Remove all spaces between Chinese words with regex












18















I would like to remove only the spaces between Chinese characters.



My text: "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?"



Ideal output: "請把這裡的 10 多個字合併. Can you help me?"



var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
str = str.replace("/ /", "");


I have looked at a similar question for Python, but the solution there does not seem to work in my situation, so I am asking here for help.
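For reference, here is a small sketch of what the attempt above actually does, next to the plain global-space regex that one of the comments below suggests. It uses the sample text from the question. Passing a string to replace makes it search for the literal characters "/ /", which never occur, while a regex literal with the g flag removes every space, including the ones around 10 and in the English sentence:

```javascript
// Sample text from the question.
const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

// A string argument is matched literally: "/ /" never occurs, so nothing changes.
const literal = str.replace("/ /", "");

// A real regex with the g flag removes every single space, not only the Chinese ones.
const allGone = str.replace(/ /g, "");

console.log(literal); // 請 把 這 裡 的 10 多 個 字 合 併. Can you help me?
console.log(allGone); // 請把這裡的10多個字合併.Canyouhelpme?
```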



















New contributor




Needa Hell is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
Check out our Code of Conduct.
















  • 2





    Are your spaces actually &nbsp;, or did you just use it as a guess?

    – Justinas
    19 hours ago











  • .replace(/ /g,'')

    – Nitesh Virani
    19 hours ago






  • 2





    Using the latest ECMAScript 2018 regex syntax, you may use s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')

    – Wiktor Stribiżew
    19 hours ago











  • Do you want to keep a space before 10 if there are 2 spaces between a Chinese char and the digits? If yes, check my approach.

    – Wiktor Stribiżew
    18 hours ago























javascript regex














edited 14 hours ago by Boann

asked 19 hours ago by Needa Hell
























6 Answers


















15














Getting to the Chinese char matching pattern



Using the Unicode Tools, the \p{Han} Unicode property class that matches any Chinese char can be translated into



[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\U00020000-\U0002A6D6\U0002A700-\U0002B734\U0002B740-\U0002B81D\U0002B820-\U0002CEA1\U0002CEB0-\U0002EBE0\U0002F800-\U0002FA1D]


In ES6, to match a single Chinese char, it can be used as



/[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9\u{20000}-\u{2A6D6}\u{2A700}-\u{2B734}\u{2B740}-\u{2B81D}\u{2B820}-\u{2CEA1}\u{2CEB0}-\u{2EBE0}\u{2F800}-\u{2FA1D}]/u


Transpiling it to ES5 with the ES2015 Unicode regular expression transpiler, we get the



(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])


pattern to match any Chinese char using JS RegExp.
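The astral ranges turn into surrogate-pair alternatives because, without the u flag, a JS regex sees a code point above U+FFFF as two UTF-16 code units. A small sketch of the standard UTF-16 encoding arithmetic (the helper name toSurrogates is made up for illustration) reproduces endpoints used in the ES5 pattern above, for example U+2A6D6 becoming \uD869\uDED6:

```javascript
// Split a supplementary-plane code point (U+10000..U+10FFFF) into its UTF-16 surrogate pair.
function toSurrogates(cp) {
  const offset = cp - 0x10000;           // 20-bit value
  const high = 0xD800 + (offset >> 10);  // top 10 bits -> high surrogate
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits -> low surrogate
  return [high, low];
}

const hex = (n) => n.toString(16).toUpperCase();

console.log(toSurrogates(0x20000).map(hex).join(' ')); // D840 DC00
console.log(toSurrogates(0x2A6D6).map(hex).join(' ')); // D869 DED6

// Without the u flag such a character is two code units long...
console.log('\u{20000}'.length); // 2
// ...while with the u flag a class range can match it as one character.
console.log(/^[\u{20000}-\u{2A6D6}]$/u.test('\u{20000}')); // true
```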



So, you may use



s.replace(/([\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D])\s+(?=(?:[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]))/g, '$1')


See the regex demo.



If your JS environment is ECMAScript 2018 compliant, you may use a shorter pattern:



s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1')


Pattern details





  • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


  • \s+ - one or more whitespace characters (any Unicode whitespace)


  • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.


JS demo:






var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
// ECMAScript 2018 only
console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
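Because the \p{...} syntax only parses in engines that support ES2018 Unicode property escapes, one option is to feature-detect it at runtime. This is a sketch, not part of the answer: the names buildHanSpaceRegex and joinHanSpaces are hypothetical, and the fallback uses only the BMP ranges \u3400-\u4DB5 and \u4E00-\u9FEF from the class above, so in older engines it will not join astral (Extension B and later) characters:

```javascript
// Build the "Han char, whitespace, Han char" regex, preferring \p{Script=Hani}.
function buildHanSpaceRegex() {
  try {
    // Built from a string so that engines without property escapes throw here
    // (a catchable SyntaxError) instead of failing to parse the whole script.
    return new RegExp('(\\p{Script=Hani})\\s+(?=\\p{Script=Hani})', 'gu');
  } catch (e) {
    // Fallback: BMP-only subset of the Han ranges listed above.
    return /([\u3400-\u4DB5\u4E00-\u9FEF])\s+(?=[\u3400-\u4DB5\u4E00-\u9FEF])/g;
  }
}

const HAN_SPACE_RE = buildHanSpaceRegex();

// Hypothetical helper: remove whitespace only when it sits between two Han characters.
// String.prototype.replace resets lastIndex on a global regex, so reuse is safe.
function joinHanSpaces(text) {
  return text.replace(HAN_SPACE_RE, '$1');
}

console.log(joinHanSpaces('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?
```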








edited 18 hours ago

answered 19 hours ago by Wiktor Stribiżew


























  • FYI: if only one whitespace is expected between Chinese chars, remove the + after \s.

    – Wiktor Stribiżew
    19 hours ago











  • I get " { "message": "SyntaxError: invalid identity escape in regular expression", "filename": "stacksnippets.net/js", "lineno": 17, "colno": 22 }" When I run the snippet. (Using Firefox 62)

    – Pac0
    9 hours ago













  • @Pac0 firefox has problems with "new" regexp e.g. here

    – Kamil Kiełczewski
    9 hours ago








  • 1





    @Pac0 That is because of /(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu: FF does not support ECMAScript 2018 Unicode property classes. Chrome does.

    – Wiktor Stribiżew
    8 hours ago













  • Thanks Wiktor, I have compared the answers, and this seems to be the most detailed and working answer to my question.

    – Needa Hell
    3 hours ago
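To illustrate the first comment's point about the + quantifier: with a single \s the pattern only fires when exactly one whitespace character separates two Han characters, so a double space is left alone. A quick comparison using the ES2018 pattern from the answer (so it assumes an engine with \p support; the sample string is made up):

```javascript
const text = '請  把 這'; // two spaces after 請, one after 把

// \s+ collapses any run of whitespace between two Han characters.
const plus = text.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1');

// A single \s requires the very next character to be Han, so "請  把" is skipped.
const single = text.replace(/(\p{Script=Hani})\s(?=\p{Script=Hani})/gu, '$1');

console.log(plus);   // 請把這
console.log(single); // 請  把這
```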



















18














Using @Brett Zamir's solution for matching Chinese characters in a regex:



Javascript unicode string, chinese character but no punctuation








const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

const regex = new RegExp('([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d]) ([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d])* ', 'g');

const ret = str.replace(regex, '$1$2');

console.log(ret);







It looks like:



([foo chinese chars]) ([foo chinese chars])*
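One way to see the issue raised in the comments below: because this pattern consumes the space after a pair, a Han character that ends a matched pair can swallow the space before a following number. The sketch uses only the BMP subset [\u4E00-\u9FCC] of the class above for brevity, and compares it with the lookahead style used in other answers, on the input suggested in the comments:

```javascript
const sample = '請 的 10 多 個 a'; // input suggested in the comments

// Simplified version of this answer's pattern (BMP range only): the trailing space is consumed.
const pairRe = /([\u4E00-\u9FCC]) ([\u4E00-\u9FCC])* /g;
// Lookahead version: the following Han char is checked but not consumed.
const lookRe = /([\u4E00-\u9FCC])\s+(?=[\u4E00-\u9FCC])/g;

console.log(sample.replace(pairRe, '$1$2')); // 請的10 多個a
console.log(sample.replace(lookRe, '$1'));   // 請的 10 多個 a
```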


























  • 2





    The output here doesn't match with the ideal output. Notice the space in front of the 10.

    – holydragon
    19 hours ago











  • You lose the space before the 10 in the middle of the Chinese words, but you still found the right way to select Chinese characters :p

    – jonatjano
    19 hours ago






  • 1





    I've edited my post to match your desire

    – Grégory NEUT
    19 hours ago






  • 1





    What about e.g. 請 的 10 多 個 a?

    – bobble bubble
    18 hours ago






  • 2





    @GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

    – Aaron
    12 hours ago



















5














The range of Chinese characters can be written as [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC], so you can use this regex, which matches a run of Chinese characters followed by whitespace and, through the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+), ensures that a Chinese character comes next:



([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)


Then replace the match with $1.



Demo






var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));
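A caveat worth checking: this range covers only BMP code points, so characters from CJK Extension B and later (above U+FFFF) are not joined. Assuming an ES2018 engine, comparing it with \p{Script=Han} shows the difference; U+20000 and U+20001 are used here only as sample astral Han characters:

```javascript
const astral = '\u{20000} \u{20001}'; // two Extension B characters separated by one space

// The answer's BMP-only range: surrogate code units fall outside it, so nothing matches.
const bmpRe = /([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g;
// Script-property version (needs the u flag): sees each astral char as one Han character.
const scriptRe = /(\p{Script=Han}+)\s+(?=\p{Script=Han})/gu;

console.log(astral.replace(bmpRe, '$1') === astral);                  // true (unchanged)
console.log(astral.replace(scriptRe, '$1') === '\u{20000}\u{20001}'); // true
```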









































    3














    Try this



    str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');


    I got the range \u4E00-\u9FCC from here; it contains ~20,000 characters (enough for daily use).






    var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
    str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

    console.log(str);
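The second alternative works by consuming a run of digits together with its trailing space and writing it back unchanged, so that space is never offered to the first alternative. A side effect is that the first alternative removes any space before a Chinese character, including one after a Latin word. A quick check (the mixed-language string is made up for illustration):

```javascript
const re = / ([\u4E00-\u9FCC])|([0-9]+ )/g;

const question = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
const mixed = 'Can you help 我?'; // hypothetical mixed-language input

// Unmatched groups are replaced with "", so each match keeps only the captured part.
console.log(question.replace(re, '$1$2')); // 請把這裡的 10 多個字合併. Can you help me?
console.log(mixed.replace(re, '$1$2'));    // Can you help我?
```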





























    • 3





      The space in front of the 10 is missing.

      – holydragon
      19 hours ago











    • @holydragon it's fixed now

      – Kamil Kiełczewski
      19 hours ago





















    1

















    var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

    var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];

    // Returns true if every code point of str falls inside one of the ranges above.
    var isChinese = function (str) {
      var charCode;
      var flag;
      var range;
      for (var i = 0; i < str.length;) {
        charCode = str.codePointAt(i);
        flag = false;
        for (var j = 0; j < chineseRange.length; j++) {
          range = chineseRange[j];
          if (charCode >= range[0] && charCode <= range[1]) {
            flag = true;
            break;
          }
        }
        if (!flag) {
          return false;
        }
        // Code points above U+FFFF take two UTF-16 code units.
        if (charCode <= 0xffff) {
          i++;
        } else {
          i += 2;
        }
      }
      return true;
    };
    // For more information about is-chinese, visit the project on GitHub
    // credit: https://github.com/alsotang/is-chinese/blob/master/ischinese.js

    // The loop below removes the spaces between Chinese words: a token is
    // followed by a space unless both it and the next token are Chinese.
    var spl = chine.trim().split(/\s+/);
    var text = '';
    for (var i = 0; i < spl.length; i++) {
      if (isChinese(spl[i])) {
        // Check the bound first: isChinese(undefined) would throw on the last token.
        if (i + 1 >= spl.length || !isChinese(spl[i + 1])) {
          text += spl[i] + ' ';
        } else {
          text += spl[i];
        }
      } else {
        text += spl[i] + ' ';
      }
    }
    console.log(text);
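The same split-and-rejoin idea can be written more compactly if ES2018 Unicode property escapes are available. This sketch (the names isHanToken and joinHanTokens are hypothetical) swaps the hand-written code point table for \p{Script=Han}, so its notion of "Chinese" differs slightly from the ranges above, and it also avoids the trailing space that the loop above leaves at the end:

```javascript
// A token counts as Chinese if it consists only of Han-script characters.
const isHanToken = (t) => /^\p{Script=Han}+$/u.test(t);

// Rejoin whitespace-separated tokens, dropping the separator only between two Han tokens.
function joinHanTokens(text) {
  const tokens = text.trim().split(/\s+/);
  let out = '';
  for (let i = 0; i < tokens.length; i++) {
    out += tokens[i];
    const next = tokens[i + 1];
    if (next === undefined) break; // last token: no trailing separator
    if (!(isHanToken(tokens[i]) && isHanToken(next))) out += ' ';
  }
  return out;
}

console.log(joinHanTokens('請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?'));
// 請把這裡的 10 多個字合併 . Can you help me?
```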


































    • A block of code with no explanation and negligible comments does not make an ideal answer.

      – Rich
      10 hours ago



















    1














    This might be useful in your scenario. (?<![ -~]) (?![ -~])
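Read literally, this matches a space with no printable-ASCII character (U+0020 to U+007E, the class [ -~]) on either side. A sketch of using it with replace; note that lookbehind needs an ES2018 engine, and that any non-ASCII neighbour counts, so spaces between accented Latin letters are removed too (the second sample is made up to show this):

```javascript
// A space whose neighbours on both sides are outside printable ASCII.
const re = /(?<![ -~]) (?![ -~])/g;

console.log('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'.replace(re, ''));
// 請把這裡的 10 多個字合併. Can you help me?

// "caf\u00E9 \u00FCber" is "café über": é and ü are non-ASCII, so the space goes too.
console.log('caf\u00E9 \u00FCber'.replace(re, '')); // caféüber
```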































      6 Answers
      6






      active

      oldest

      votes








      6 Answers
      6






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      15














      Getting to the Chinese char matching pattern



      Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into



      [u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]


      In ES6, to match a single Chinese char, it can be used as



      /[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u


      Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get



      (?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])


      pattern to match any Chinese char using JS RegExp.



      So, you may use



      s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')


      See the regex demo.



      If your JS environment is ECMAScript 2018 compliant you may use a shorter



      s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')


      Pattern details





      • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


      • s+ - any 1+ whitespaces (any Unicode whitespace)


      • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.


      JS demo:






      var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
      var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
      console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
      // ECMAScript 2018 only
      console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));








      share|improve this answer


























      • FYI: if only one whitespace is expected between Chinese chars, remove + after s.

        – Wiktor Stribiżew
        19 hours ago











      • I get " { "message": "SyntaxError: invalid identity escape in regular expression", "filename": "stacksnippets.net/js", "lineno": 17, "colno": 22 }" When I run the snippet. (Using Firefox 62)

        – Pac0
        9 hours ago













      • @Pac0 firefox has problems with "new" regexp e.g. here

        – Kamil Kiełczewski
        9 hours ago








      • 1





        @Pac0 That is because of /(p{Script=Hani})s+(?=p{Script=Hani})/gu, FF does not support ECMAScript 2018 Unicode property classes. Chrome does.

        – Wiktor Stribiżew
        8 hours ago













      • Thanks Wiktor, I have compared the answers. And this seems would be the most detailed and worked answer to my question.

        – Needa Hell
        3 hours ago
















      15














      Getting to the Chinese char matching pattern



      Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into



      [u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]


      In ES6, to match a single Chinese char, it can be used as



      /[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u


      Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get



      (?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])


      pattern to match any Chinese char using JS RegExp.



      So, you may use



      s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')


      See the regex demo.



      If your JS environment is ECMAScript 2018 compliant you may use a shorter



      s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')


      Pattern details





      • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


      • s+ - any 1+ whitespaces (any Unicode whitespace)


      • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.


      JS demo:






      var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
      var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
      console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
      // ECMAScript 2018 only
      console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));








      share|improve this answer


























      • FYI: if only one whitespace is expected between Chinese chars, remove + after s.

        – Wiktor Stribiżew
        19 hours ago











      • I get " { "message": "SyntaxError: invalid identity escape in regular expression", "filename": "stacksnippets.net/js", "lineno": 17, "colno": 22 }" When I run the snippet. (Using Firefox 62)

        – Pac0
        9 hours ago













      • @Pac0 firefox has problems with "new" regexp e.g. here

        – Kamil Kiełczewski
        9 hours ago








      • 1





        @Pac0 That is because of /(p{Script=Hani})s+(?=p{Script=Hani})/gu, FF does not support ECMAScript 2018 Unicode property classes. Chrome does.

        – Wiktor Stribiżew
        8 hours ago













      • Thanks Wiktor, I have compared the answers. And this seems would be the most detailed and worked answer to my question.

        – Needa Hell
        3 hours ago














      15












      15








      15







      Getting to the Chinese char matching pattern



      Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into



      [u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]


      In ES6, to match a single Chinese char, it can be used as



      /[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u


      Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get



      (?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])


      pattern to match any Chinese char using JS RegExp.



      So, you may use



      s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')


      See the regex demo.



      If your JS environment is ECMAScript 2018 compliant you may use a shorter



      s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')


      Pattern details





      • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


      • s+ - any 1+ whitespaces (any Unicode whitespace)


      • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.


      JS demo:






      var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
      var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
      console.log(s.replace(new RegExp('(' + HanChr + ')\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
      // ECMAScript 2018 only
      console.log(s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1'));








      share|improve this answer















      Getting to the Chinese char matching pattern



      Using the Unicode Tools, the p{Han} Unicode property class that matches any Chinese char can be translated into



      [u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9U00020000-U0002A6D6U0002A700-U0002B734U0002B740-U0002B81DU0002B820-U0002CEA1U0002CEB0-U0002EBE0U0002F800-U0002FA1D]


      In ES6, to match a single Chinese char, it can be used as



      /[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9u{20000}-u{2A6D6}u{2A700}-u{2B734}u{2B740}-u{2B81D}u{2B820}-u{2CEA1}u{2CEB0}-u{2EBE0}u{2F800}-u{2FA1D}]/u


      Transpiling it to ES5 using ES2015 Unicode regular expression transpiler, we get



      (?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])


      pattern to match any Chinese char using JS RegExp.



      So, you may use



      s.replace(/([u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D])s+(?=(?:[u2E80-u2E99u2E9B-u2EF3u2F00-u2FD5u3005u3007u3021-u3029u3038-u303Bu3400-u4DB5u4E00-u9FEFuF900-uFA6DuFA70-uFAD9]|[uD840-uD868uD86A-uD86CuD86F-uD872uD874-uD879][uDC00-uDFFF]|uD869[uDC00-uDED6uDF00-uDFFF]|uD86D[uDC00-uDF34uDF40-uDFFF]|uD86E[uDC00-uDC1DuDC20-uDFFF]|uD873[uDC00-uDEA1uDEB0-uDFFF]|uD87A[uDC00-uDFE0]|uD87E[uDC00-uDE1D]))/g, '$1')


      See the regex demo.



      If your JS environment is ECMAScript 2018 compliant you may use a shorter



      s.replace(/(p{Script=Hani})s+(?=p{Script=Hani})/gu, '$1')


      Pattern details





      • (CHINESE_CHAR_PATTERN) - Capturing group 1 ($1 in the replacement pattern): any Chinese char


      • s+ - any 1+ whitespaces (any Unicode whitespace)


      • (?=CHINESE_CHAR_PATTERN) - there must be a Chinese char immediately to the right of the current location.


      JS demo:






      var s = "請 把 這 裡 的 10 多 個 字 合 併. Can you help me?";
      var HanChr = "[\u2E80-\u2E99\u2E9B-\u2EF3\u2F00-\u2FD5\u3005\u3007\u3021-\u3029\u3038-\u303B\u3400-\u4DB5\u4E00-\u9FEF\uF900-\uFA6D\uFA70-\uFAD9]|[\uD840-\uD868\uD86A-\uD86C\uD86F-\uD872\uD874-\uD879][\uDC00-\uDFFF]|\uD869[\uDC00-\uDED6\uDF00-\uDFFF]|\uD86D[\uDC00-\uDF34\uDF40-\uDFFF]|\uD86E[\uDC00-\uDC1D\uDC20-\uDFFF]|\uD873[\uDC00-\uDEA1\uDEB0-\uDFFF]|\uD87A[\uDC00-\uDFE0]|\uD87E[\uDC00-\uDE1D]";
      console.log(s.replace(new RegExp('(' + HanChr + ')\\s+(?=(?:' + HanChr + '))', 'g'), '$1'));
      // ECMAScript 2018 only
      console.log(s.replace(/(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu, '$1'));
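      If the same code has to run in engines without Unicode property escapes (such as the Firefox 62 mentioned in the comments, which throws a SyntaxError on \p{...} with the u flag), one option is to feature-detect and fall back. A sketch, assuming a simplified BMP-only class is an acceptable fallback (it is narrower than the full pattern above and misses supplementary-plane ideographs); makeHanJoiner and joinHanCompat are hypothetical names:

```javascript
// Use \p{Script=Hani} when supported, otherwise a simplified BMP-only class
// (an assumption: it does not cover supplementary-plane ideographs).
function makeHanJoiner() {
  let re;
  try {
    // Engines without property escapes throw a SyntaxError here.
    re = new RegExp('(\\p{Script=Hani})\\s+(?=\\p{Script=Hani})', 'gu');
  } catch (e) {
    const han = '[\\u3400-\\u4DBF\\u4E00-\\u9FFF\\uF900-\\uFAFF]';
    re = new RegExp('(' + han + ')\\s+(?=' + han + ')', 'g');
  }
  // replace() with a global regex starts matching at index 0 every time,
  // so reusing the same RegExp object is safe.
  return (s) => s.replace(re, '$1');
}

const joinHanCompat = makeHanJoiner();
console.log(joinHanCompat('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
```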
















      edited 18 hours ago

























      answered 19 hours ago









      Wiktor Stribiżew













      • FYI: if only one whitespace is expected between Chinese chars, remove the + after \s.

        – Wiktor Stribiżew
        19 hours ago











      • I get " { "message": "SyntaxError: invalid identity escape in regular expression", "filename": "stacksnippets.net/js", "lineno": 17, "colno": 22 }" When I run the snippet. (Using Firefox 62)

        – Pac0
        9 hours ago













      • @Pac0 Firefox has problems with "new" regexp features, e.g. here

        – Kamil Kiełczewski
        9 hours ago








      • @Pac0 That is because of /(\p{Script=Hani})\s+(?=\p{Script=Hani})/gu: Firefox does not support ECMAScript 2018 Unicode property classes. Chrome does.

        – Wiktor Stribiżew
        8 hours ago













      • Thanks Wiktor, I have compared the answers, and this seems to be the most detailed working answer to my question.

        – Needa Hell
        3 hours ago

































      Using @Brett Zamir's solution for matching Chinese characters in a regex, from:



      Javascript unicode string, chinese character but no punctuation








      const str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';

      const regex = new RegExp('([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d]) ([\u4E00-\u9FCC\u3400-\u4DB5\uFA0E\uFA0F\uFA11\uFA13\uFA14\uFA1F\uFA21\uFA23\uFA24\uFA27-\uFA29]|[\ud840-\ud868][\udc00-\udfff]|\ud869[\udc00-\uded6\udf00-\udfff]|[\ud86a-\ud86c][\udc00-\udfff]|\ud86d[\udc00-\udf34\udf40-\udfff]|\ud86e[\udc00-\udc1d])* ', 'g');

      const ret = str.replace(regex, '$1$2');

      console.log(ret);







      Schematically, it looks like this (note the literal space at the very end):



      ([foo chinese chars]) ([foo chinese chars])*
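      A minimal sketch of that structure, assuming a simplified BMP-only class [\u4E00-\u9FCC] instead of the full one (joinPairs is a hypothetical name). It reproduces the ideal output for the question's string, and also shows the case raised in the comments: because the pattern always consumes the space after the second char, even when the next token is not Chinese, the spaces before 10 and a get removed.

```javascript
// Simplified BMP-only class (assumption); the answer's class also
// matches surrogate pairs. joinPairs is a hypothetical name.
const han = '[\\u4E00-\\u9FCC]';
// (han) space (han)* space  ->  $1$2
const re = new RegExp('(' + han + ') (' + han + ')* ', 'g');

function joinPairs(s) {
  // If group 2 did not participate, $2 is replaced by an empty string.
  return s.replace(re, '$1$2');
}

console.log(joinPairs('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?
console.log(joinPairs('請 的 10 多 個 a')); // input from the comments: 請的10 多個a
```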


























      • The output here doesn't match the ideal output. Notice the space in front of the 10.

        – holydragon
        19 hours ago











      • You lose the space before the 10 in the middle of the Chinese words, but you still found the right way to select Chinese characters :p

        – jonatjano
        19 hours ago






      • I've edited my post to match your desired output

        – Grégory NEUT
        19 hours ago






      • What about e.g. 請 的 10 多 個 a?

        – bobble bubble
        18 hours ago






      • @GrégoryNEUT blabla isn't a common metasyntactic variable in English, you might want to use foo instead ;)

        – Aaron
        12 hours ago
















      edited 12 hours ago

























      answered 19 hours ago









      Grégory NEUT






















      An approximate range for Chinese characters is [\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC], so you can use the regex below. It captures one or more Chinese characters followed by whitespace, and the lookahead (?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+) ensures that a Chinese character comes next:



      ([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)


      Then replace the match with $1.



      Demo






      var str = '請 把把把把把 這 裡裡裡裡裡 的 10 多多多多 個 字 合 併. Can you help me?';
      console.log(str.replace(/([\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)\s+(?=[\u2E80-\u2FD5\u3190-\u319f\u3400-\u4DBF\u4E00-\u9FCC]+)/g, "$1"));
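      The lookahead is what makes this work on runs of single characters. A sketch comparing it with a version that consumes the right-hand Chinese run instead (the range is copied from the answer; the + inside the lookahead is dropped here, since checking one following character is enough, and the variable names are only for illustration):

```javascript
// Range copied from the answer; names are only for illustration.
const C = '[\\u2E80-\\u2FD5\\u3190-\\u319f\\u3400-\\u4DBF\\u4E00-\\u9FCC]';

// Consumes the right-hand run, so that run cannot start the next match.
const consuming = new RegExp('(' + C + '+)\\s+(' + C + '+)', 'g');
// Only checks the right-hand char, leaving it for the next match.
const lookahead = new RegExp('(' + C + '+)\\s+(?=' + C + ')', 'g');

const input = '請 把 這 裡';
console.log(input.replace(consuming, '$1$2')); // 請把 這裡
console.log(input.replace(lookahead, '$1'));   // 請把這裡
```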








          edited 19 hours ago

























          answered 19 hours ago









          Pushpesh Kumar Rajwanshi





































              Try this:



              str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');


              I took the range \u4E00-\u9FCC from here; it covers about 21,000 characters (enough for everyday use).






              var str = '請 把 這 裡 的 10 多 個 字 合 併. Can you help me?';
              str = str.replace(/ ([\u4E00-\u9FCC])|([0-9]+ )/g, '$1$2');

              console.log(str);
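              How the two alternatives interact, as a sketch with the same regex (joinChinese is a hypothetical wrapper name): the first alternative deletes a space in front of a Chinese char, while the second consumes a number together with its trailing space, so the space before 多 in "10 多" is never offered to the first alternative. A side effect is that a space after any other non-Chinese token, such as a Latin word, is removed as well:

```javascript
// Same regex as the answer; joinChinese is a hypothetical wrapper name.
const re = / ([\u4E00-\u9FCC])|([0-9]+ )/g;

function joinChinese(s) {
  // Alternative 1, " X": drop the space, keep the Chinese char ($1).
  // Alternative 2, "10 ": keep the digits and their trailing space ($2);
  // because that space is consumed here, alternative 1 never sees it.
  return s.replace(re, '$1$2');
}

console.log(joinChinese('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// 請把這裡的 10 多個字合併. Can you help me?
console.log(joinChinese('Hi 你好')); // Hi你好: the space after a Latin word goes too
```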





























              • The space in front of the 10 is missing.

                – holydragon
                19 hours ago











              • @holydragon it's fixed now

                – Kamil Kiełczewski
                19 hours ago


























              edited 18 hours ago

























              answered 19 hours ago









              Kamil Kiełczewski

























              var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

              // Code point ranges treated as Chinese (CJK ideographs and their extensions,
              // compatibility ideographs, CJK Compatibility and CJK Compatibility Forms blocks)
              var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0x2f800, 0x2fa1f]];

              // Returns true if every code point of str falls into one of the ranges above
              var isChinese = function (str) {
                var charCode;
                var flag;
                var range;
                for (var i = 0; i < str.length;) {
                  charCode = str.codePointAt(i);
                  flag = false;
                  for (var j = 0; j < chineseRange.length; j++) {
                    range = chineseRange[j];
                    if (charCode >= range[0] && charCode <= range[1]) {
                      flag = true;
                      break;
                    }
                  }
                  if (!flag) {
                    return false;
                  }
                  // Code points above U+FFFF take two UTF-16 code units
                  if (charCode <= 0xffff) {
                    i++;
                  } else {
                    i += 2;
                  }
                }
                return true;
              };
              // For more information about is-chinese, see the project on GitHub
              // credit https://github.com/alsotang/is-chinese/blob/master/ischinese.js

              // Remove the spaces between Chinese words: split on whitespace, then join two
              // neighbouring tokens without a space only when both of them are Chinese
              var spl = chine.trim().split(/\s+/);
              var text = '';
              for (var i = 0; i < spl.length; i++) {
                // Guard against reading past the last token (isChinese(undefined) would throw)
                var nextIsChinese = i + 1 < spl.length && isChinese(spl[i + 1]);
                if (isChinese(spl[i]) && nextIsChinese) {
                  text += spl[i];
                } else {
                  text += spl[i] + ' ';
                }
              }
              text = text.trim(); // drop the space appended after the last token
              console.log(text);
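              The code above works on whitespace-separated tokens rather than on single characters. A more compact sketch of the same idea, assuming a simplified BMP-only test for Chinese tokens (isHanToken and joinHanTokens are hypothetical names, and the ranges are narrower than chineseRange). Note that it collapses any run of whitespace into a single space, and a space before non-Chinese punctuation such as " ." is kept:

```javascript
// Hypothetical helper: true if the token consists only of BMP CJK ideographs
// (a simplified assumption, narrower than the chineseRange table).
const isHanToken = (t) => /^[\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF]+$/.test(t);

function joinHanTokens(s) {
  const tokens = s.trim().split(/\s+/);
  let out = tokens[0];
  for (let i = 1; i < tokens.length; i++) {
    // Drop the gap only when the tokens on both sides are entirely Chinese.
    const glue = isHanToken(tokens[i - 1]) && isHanToken(tokens[i]) ? '' : ' ';
    out += glue + tokens[i];
  }
  return out;
}

console.log(joinHanTokens('請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?'));
// 請把這裡的 10 多個字合併 . Can you help me?
```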


































              • A block of code with no explanation and negligible comments does not make an ideal answer.

                – Rich
                10 hours ago
















              1

















              var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

              var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];

              var isChinese = function (str) {
              var charCode;
              var flag;
              var range;
              for (var i = 0; i < str.length;) {
              charCode = str.codePointAt(i);
              flag = false;
              for (var j = 0; j < chineseRange.length; j++) {
              range = chineseRange[j];
              if (charCode >= range[0] && charCode <= range[1]) {
              flag = true;
              break;
              }
              }
              if (!flag) {
              return false;
              }
              if (charCode <= 0xffff) {
              i++
              } else {
              i += 2
              }
              }
              return true;
              }
              // for more information about chinese.js visite this demo in Github
              //credit https://github.com/alsotang/is-chinese/blob/master/ischinese.js

              // I wrote this function to remove space between chinese word

              var spl = chine.trim().split(/s+/);
              var text = '';
              for (var i = 0; i < spl.length; i++) {
              if (isChinese(spl[i])) {
              if (!isChinese(spl[i + 1])) {
              text += spl[i] + ' ';
              } else {
              text += spl[i];
              }
              } else {
              text += spl[i] + ' ';
              }
              }
              console.log(text);








              share|improve this answer


























              • A block of code with no explanation and negligible comments does not make an ideal answer.

                – Rich
                10 hours ago














              1












              1








              1










              var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

              var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];

              var isChinese = function (str) {
              var charCode;
              var flag;
              var range;
              for (var i = 0; i < str.length;) {
              charCode = str.codePointAt(i);
              flag = false;
              for (var j = 0; j < chineseRange.length; j++) {
              range = chineseRange[j];
              if (charCode >= range[0] && charCode <= range[1]) {
              flag = true;
              break;
              }
              }
              if (!flag) {
              return false;
              }
              if (charCode <= 0xffff) {
              i++
              } else {
              i += 2
              }
              }
              return true;
              }
              // for more information about chinese.js visite this demo in Github
              //credit https://github.com/alsotang/is-chinese/blob/master/ischinese.js

              // I wrote this function to remove space between chinese word

              var spl = chine.trim().split(/s+/);
              var text = '';
              for (var i = 0; i < spl.length; i++) {
              if (isChinese(spl[i])) {
              if (!isChinese(spl[i + 1])) {
              text += spl[i] + ' ';
              } else {
              text += spl[i];
              }
              } else {
              text += spl[i] + ' ';
              }
              }
              console.log(text);








              share|improve this answer


















              var chine = '請 把 這 裡 的 10 多 個 字 合 併 . Can you help me?';

              var chineseRange = [[0x4e00, 0x9fff],[0x3400, 0x4dbf],[0x20000, 0x2a6df],[0x2a700, 0x2b73f],[0x2b740, 0x2b81f],[0x2b820, 0x2ceaf],[0xf900, 0xfaff],[0x3300, 0x33ff],[0xfe30, 0xfe4f],[0xf900, 0xfaff],[0x2f800, 0x2fa1f]];

              var isChinese = function (str) {
              var charCode;
              var flag;
              var range;
              for (var i = 0; i < str.length;) {
              charCode = str.codePointAt(i);
              flag = false;
              for (var j = 0; j < chineseRange.length; j++) {
              range = chineseRange[j];
              if (charCode >= range[0] && charCode <= range[1]) {
              flag = true;
              break;
              }
              }
              if (!flag) {
              return false;
              }
              if (charCode <= 0xffff) {
              i++
              } else {
              i += 2
              }
              }
              return true;
              }
              // for more information about chinese.js visite this demo in Github
              //credit https://github.com/alsotang/is-chinese/blob/master/ischinese.js

              // I wrote this function to remove space between chinese word

              var spl = chine.trim().split(/s+/);
              var text = '';
              for (var i = 0; i < spl.length; i++) {
              if (isChinese(spl[i])) {
              if (!isChinese(spl[i + 1])) {
              text += spl[i] + ' ';
              } else {
              text += spl[i];
              }
              } else {
              text += spl[i] + ' ';
              }
              }
              console.log(text);
              answered 19 hours ago by Younes Zaidi (edited 18 hours ago)

              • A block of code with no explanation and negligible comments does not make an ideal answer.

                – Rich
                10 hours ago
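              This might be useful in your scenario: (?<![ -~]) (?![ -~]). It matches a single space that has no printable ASCII character (the range [ -~], U+0020 to U+007E) directly before or after it, so spaces between two Chinese characters are removed while spaces next to digits, Latin letters, or ASCII punctuation are kept. The lookbehind (?<!...) requires ES2018 or later.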
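A minimal runnable sketch of the pattern above in JavaScript. The helper name removeSpacesBetweenCjk is hypothetical; the regex is the one from the answer, used with the g flag so that every matching space is replaced.

```javascript
// The answer's pattern: a space with no printable-ASCII character
// (U+0020..U+007E) immediately before or after it. Lookbehind needs ES2018+.
const CJK_SPACE = /(?<![ -~]) (?![ -~])/g;

// Hypothetical helper: delete every space that sits between two non-ASCII characters.
function removeSpacesBetweenCjk(text) {
  return text.replace(CJK_SPACE, '');
}

console.log(removeSpacesBetweenCjk('請 把 這 裡 的 10 多 個 字 合 併. Can you help me?'));
// → 請把這裡的 10 多個字合併. Can you help me?
```

Two caveats follow from the pattern itself: any non-ASCII character counts as a neighbour, not only Chinese ones, and a run of two or more spaces between Chinese characters is left untouched, because each of those spaces touches another space, which is itself inside [ -~].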
              answered 17 hours ago by Shantanu Patwardhan (edited 16 hours ago by Sebastian Hofmann)