How to bulk-rename files with invalid encoding or bulk-replace invalid encoded characters?











Score: 13 (favorited 4 times)












I have a Debian server on which I host music for an internet radio station. I have trouble with file names and paths because many of the files have an invalid encoding, for example:



./music/Bändname - Some Title - additional Info/B�ndname - 07 - This Title Is Cörtain, The EncÃding Not.mp3


Ideally, I would like to remove everything that is not a letter (A-Z, a-z), a digit (0-9), a dash (-), or an underscore (_). The result should look something like this:



./music/Bndname-SomeTitle-additionalInfo/Bndname-07-ThisTitleIsCrtain,TheEncdingNot.mp3


How can I achieve this for a large batch of files and directories?



I've seen this similar question: bulk rename (or correctly display) files with special characters



But that only fixes the encoding; I would prefer a stricter approach, as described above.










      linux batch encoding bulk














      edited Apr 13 '17 at 12:37 by Community
      asked Jan 18 '13 at 10:49 by Afri






















          3 Answers

















          Score: 13 (accepted)










          You're going to run into some problems if you want to rename files and directories at the same time. Renaming just a file is easy enough, but the directories need to be renamed as well. You can't simply run mv Motörhead/Encöding Motorhead/Encoding, since Motorhead won't exist at the time of the call.



          So, we need a depth-first traversal of all files and folders that, at each step, renames only the last path component of the current file or folder. The following works with GNU find and Bash 4.2.42 on my OS X.



          #!/usr/bin/env bash
          # Usage: rename.sh /some/path
          find "$1" -depth -print0 | while IFS= read -r -d '' file; do
            d="$( dirname "$file" )"
            f="$( basename "$file" )"
            # delete every character that is not a letter, digit, slash, dot, underscore, or hyphen
            new="${f//[^a-zA-Z0-9\/._-]/}"
            if [ "$f" != "$new" ]   # if equal, name is already clean, so leave alone
            then
              if [ -e "$d/$new" ]
              then
                echo "Notice: \"$new\" and \"$f\" both exist in $d:"
                ls -ld "$d/$new" "$d/$f"
              else
                echo mv "$file" "$d/$new"   # remove "echo" to actually rename things
              fi
            fi
          done


          You can change the pattern to new="${f//[\\\/:*?\"<>|]/}" if you instead want to strip only the characters that Windows cannot handle.



          Save this script as rename.sh and make it executable with chmod +x rename.sh. Then call it like ./rename.sh /some/path.



          Make sure to resolve any file name collisions (“Notice” announcements).



          If you're absolutely sure it does the right replacements, remove the echo from the script to actually rename things instead of just printing what it does.



          To be safe, I'd recommend testing this on a small subset of files first.





          Options explained



          To explain what goes on here:





          • -depth makes find process a directory's contents before the directory itself, so we can "roll up" everything from the end. By default, find also walks the tree depth-first, but it visits each directory before its contents.


          • -print0 ensures the find output is null-delimited, so we can read it with read -d '' into the file variable. Doing so helps us deal with all kinds of weird file names, including ones with spaces, and even newlines.

          • We'll get the directory of the file with dirname. Don't forget to always quote your variables properly, otherwise any path with spaces or globbing characters would break this script.

          • We'll get the actual filename (or directory name) with basename.

          • Then, we remove any invalid character from $f using Bash's string replacement capabilities. Invalid means anything that's not a lower- or uppercase letter, a digit, a slash (/), a dot (.), an underscore, or a hyphen.

          • If $f is already clean (the cleaned name is identical to the current name), skip it. This also avoids calling mv foo foo, which fails with an error.

          • If $new already exists in directory $d (i.e., another entry already has the cleaned name), issue a warning and leave the file alone, because mv would otherwise overwrite the existing file or move the entry into an existing directory of that name. Otherwise,

          • We finally rename the original file (or directory) to its new name.
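The substitution step can be tried on a single name before the script is pointed at a whole tree. The following is only a minimal sketch: clean_name is an illustrative helper name that does not appear in the script above, the slash is left out of the allowed set on the assumption that a basename never contains one, and LC_ALL=C is an added assumption so that a-z is matched byte by byte regardless of the system locale.

```shell
#!/usr/bin/env bash
# clean_name is a hypothetical helper, not part of the original script.
# It deletes every byte that is not an ASCII letter, digit, dot,
# underscore, or hyphen, and prints the result.
clean_name() {
  local LC_ALL=C   # assumption: byte-wise matching, so a-z means ASCII only
  local f=$1
  printf '%s\n' "${f//[^a-zA-Z0-9._-]/}"
}

clean_name 'with space'        # prints: withspace
clean_name 'with-hyphen.txt'   # prints: with-hyphen.txt
clean_name 'Motörhead'         # prints: Motrhead
```

Note that the offending characters are dropped, not transliterated: ö is removed rather than turned into o.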


          Since each step renames only the last component of a path, renaming Motörhead/Encöding to Motorhead/Encoding is done in two steps:




          1. mv Motörhead/Encöding Motörhead/Encoding

          2. mv Motörhead Motorhead


          This ensures all replacements are done in the correct order.
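To see the ordering for yourself, here is a small sketch that builds a throwaway tree and prints it the way the rename loop would receive it. The directory and file names (outer, inner, file.txt) are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Build a scratch tree with one nested directory and one file.
demo=$(mktemp -d)
mkdir -p "$demo/outer/inner"
touch "$demo/outer/inner/file.txt"

# -depth lists a directory's contents before the directory itself,
# so the deepest entry comes first and the top-level directory last.
order=$(cd "$demo" && find outer -depth)
printf '%s\n' "$order"
# prints:
#   outer/inner/file.txt
#   outer/inner
#   outer

rm -rf "$demo"
```

Because every entry is listed before its parent, a parent directory still has its old name at the moment each of its children is renamed.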





          Example files and test run



          Let's assume some files in a base folder called test:



          test
          test/Motörhead
          test/Motörhead/anöther_file.mp3
          test/Motörhead/Encöding
          test/Randöm
          test/Täst
          test/Täst/Töst
          test/with space
          test/with-hyphen.txt
          test/work
          test/work/resume
          test/work/résumé
          test/work/schedule


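          Here is the output from a run in debug mode (with the echo in front of the mv), i.e., the commands that would be called, and the collision warnings. Keep in mind that the script deletes the offending characters rather than transliterating them, so a real run prints targets such as Motrhead, not Motorhead: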



          mv test/Motörhead/anöther_file.mp3 test/Motörhead/another_file.mp3
          mv test/Motörhead/Encöding test/Motörhead/Encoding
          mv test/Motörhead test/Motorhead
          mv test/Randöm test/Random
          mv test/Täst/Töst test/Täst/Tost
          mv test/Täst test/Tast
          mv test/with space test/withspace
          Notice: "resume" and "résumé" both exist in test/work:
          -rw-r--r--  …  …  test/work/resume
          -rw-r--r--  …  …  test/work/résumé


          Notice the absence of messages for with-hyphen.txt, schedule, and test itself.

























          • You might want to add logic to handle the case where the destination of the mv already exists, which can happen (1) if you have files that are already clean (resulting in mv foo foo), or (2) if you have files with the same name except for the special characters (e.g., mv Encöding Encoding, where you already have an Encoding file in addition to Encöding).
            – Scott
            Jan 18 '13 at 21:00










          • Good idea, thanks. Any specific suggestions on what to do in that case? Granted – achieving this in a clean and sane manner is harder than it seems at first. If you have something, feel free to edit of course.
            – slhck
            Jan 18 '13 at 21:12










          • I don’t believe it makes sense to think about handling the collisions automatically –– just identify them to the user and let him handle them. I’ve edited your answer, as you suggested.
            – Scott
            Jan 19 '13 at 0:48










          • +1 for using the example with "Encöding" Too much fön!:-)
            – Marcel
            Mar 22 '14 at 21:25










          • After three years I still come back here. So useful! :-)
            – Afri
            Apr 16 '16 at 12:08


















          Score: 14













          I know that it's not exactly what you wanted, but if you know the original encoding, perhaps you can use convmv to change the encoding to UTF-8, which should fix most problems.



          This worked for me on a folder with some incorrectly encoded Polish filenames:



          convmv -f cp1250 -t utf8 -r .


          Note that, by default, this command doesn't actually rename anything and only shows what it would do; add the --notest option to really rename the files.
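Before converting anything, it can help to see which names contain non-ASCII bytes at all. The following is a minimal sketch that is independent of convmv; list_non_ascii is an illustrative name, and forcing LC_ALL=C is an assumption made so that find compares names byte by byte.

```shell
#!/usr/bin/env bash
# list_non_ascii is a hypothetical helper, not a convmv feature.
# It prints every path under "$1" whose last component contains a byte
# outside the printable ASCII range (space through tilde).
list_non_ascii() {
  LC_ALL=C find "$1" -name '*[! -~]*' -print
}

# Example call on the current directory:
list_non_ascii .
```

Names that show up here are the candidates for convmv or for the stricter rename script; names that do not show up are already plain ASCII.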

























          • For those who have a static set (or don't have a diverse mix of charsets), the convmv option is amazingly simple and perfect. For the OP, who potentially has a multitude of charsets, this could be merged with the other answer, since convmv seems to know whether or not it has encountered the correct format. By looping through the charsets listed by convmv --list, one would get them properly encoded.
            – user273265
            Nov 11 '13 at 20:14








          • By this I mean that if one runs a Debian server, as the OP does, one would certainly assume UTF-8 these days, in which case one can keep the original letters. I had a folder with some Nordic characters and used: convmv -t utf8 --nfc -f iso-8859-1 --notest -r . The --nfc was to conform to Linux rather than OS X or so; simply typing convmv lists the (useful) options.
            – user273265
            Nov 11 '13 at 20:14




















          Score: 0













          I know, you asked about renaming.



          But you can dodge the problem quite easily using software like MusicBrainz Picard.



          It is capable of identifying music (audio fingerprinting), downloading all the necessary data (including cover images, where available) from the huge MusicBrainz database, and moving the files around so that your collection fits any pattern you like. I've been using it for years and it has always worked perfectly with anything from Cyrillic to Arabic; and of course (at least for Latin-based scripts) it can also do the conversion to ASCII.



          With this approach it does not really matter how messy/badly named your collection really is, as long as the files are readable and complete.



          (Did I mention it's free? Both as in free speech and as in free beer? Both the software and the database..?)






            You're going to run into some problems if you want to rename files and directories at the same time. Renaming just a file is easy enough. But you want to make sure the directories are also renamed. You can't simply mv Motörhead/Encöding Motorhead/Encoding since Motorhead won't exist at the time of the call.



            So, we need a depth-first traversal of all files and folders, and then rename the current file or folder only. The following works with GNU find and Bash 4.2.42 on OS X.



            #!/usr/bin/env bash
            # Walk the tree depth-first so that children are renamed before their parents.
            find "$1" -depth -print0 | while IFS= read -r -d '' file; do
                d="$( dirname "$file" )"    # containing directory
                f="$( basename "$file" )"   # current file or directory name
                new="${f//[^a-zA-Z0-9/._-]/}"   # strip all disallowed characters
                if [ "$f" != "$new" ]   # if equal, name is already clean, so leave alone
                then
                    if [ -e "$d/$new" ]
                    then
                        echo "Notice: \"$new\" and \"$f\" both exist in $d:"
                        ls -ld "$d/$new" "$d/$f"
                    else
                        echo mv "$file" "$d/$new" # remove "echo" to actually rename things
                    fi
                fi
            done


            You may change the pattern to new="${f//[\/:*?\"<>|]/}" if you only want to remove the characters that Windows cannot handle.



            Save this script as rename.sh and make it executable with chmod +x rename.sh. Then, call it like ./rename.sh /some/path.



            Make sure to resolve any file name collisions (“Notice” announcements).



            If you're absolutely sure it does the right replacements, remove the echo from the script to actually rename things instead of just printing what it does.



            To be safe, I'd recommend testing this on a small subset of files first.





            Options explained



            To explain what goes on here:





            • -depth will ensure directories are recursed depth-first, so we can "roll up" everything from the end. By default, find processes a directory before its contents; with -depth, it processes the contents first.


            • -print0 ensures the find output is null-delimited, so we can read it with read -d '' into the file variable. Doing so helps us deal with all kinds of weird file names, including ones with spaces, and even newlines.

            • We'll get the directory of the file with dirname. Don't forget to always quote your variables properly, otherwise any path with spaces or globbing characters would break this script.

            • We'll get the actual filename (or directory name) with basename.

            • Then, we remove any invalid character from $f using Bash's string replacement capabilities. Invalid means anything that's not a lower- or uppercase letter, a digit, a slash (/), a dot (.), an underscore, or a hyphen.

            • If $f is already clean (the cleaned name is identical to the current name), skip it. You don't want to rename it, because, on some systems, mv foo foo causes a problem.

            • If $new already exists in directory $d (e.g., you have files named resume and résumé in the same directory), issue a warning and leave both entries alone.

            • Otherwise, we finally rename the original file (or directory) to its new name.


            Since each step only renames the last component of a path, renaming Motörhead/Encöding to Motorhead/Encoding is done in two steps:




            1. mv Motörhead/Encöding Motörhead/Encoding

            2. mv Motörhead Motorhead


            This ensures all replacements are done in the correct order.
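
            The steps above can also be wrapped in functions, which makes them easier to try out in a scratch directory. This is only a sketch under a few assumptions: clean_name and rename_tree are hypothetical helper names, the DRY_RUN variable is an invented switch, and the cleaning is done byte-wise under LC_ALL=C, so any non-ASCII character is simply deleted. The slash is left out of the allowed set here because a base name never contains one.

```bash
#!/usr/bin/env bash
# Sketch only; clean_name and rename_tree are hypothetical helper names.

# Delete every byte that is not an ASCII letter, digit, dot, underscore, or hyphen.
# The body runs in a subshell, so the LC_ALL=C setting does not leak to the caller.
clean_name() (
    LC_ALL=C
    printf '%s\n' "${1//[^a-zA-Z0-9._-]/}"
)

# Walk the tree depth-first and rename only the last path component of each entry.
# With DRY_RUN=1 (the default) the mv commands are printed instead of executed.
rename_tree() {
    local file d f new
    find "$1" -depth -print0 | while IFS= read -r -d '' file; do
        d=$(dirname "$file")
        f=$(basename "$file")
        new=$(clean_name "$f")
        [ "$f" = "$new" ] && continue          # already clean, nothing to do
        if [ -e "$d/$new" ]; then              # collision: leave both entries alone
            echo "Notice: \"$new\" and \"$f\" both exist in $d" >&2
        elif [ "${DRY_RUN:-1}" = 1 ]; then
            echo mv "$file" "$d/$new"          # dry run: only show the command
        else
            mv "$file" "$d/$new"
        fi
    done
}
```

            After sourcing this, rename_tree /some/path prints the planned mv commands, and setting DRY_RUN=0 first performs them. Because children are handled before their parents, a nested path such as "dir one/sub two" ends up as dirone/subtwo.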





            Example files and test run



            Let's assume some files in a base folder called test:



            test
            test/Motörhead
            test/Motörhead/anöther_file.mp3
            test/Motörhead/Encöding
            test/Randöm
            test/Täst
            test/Täst/Töst
            test/with space
            test/with-hyphen.txt
            test/work
            test/work/resume
            test/work/résumé
            test/work/schedule
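
            To reproduce this layout for a trial run, something like the following could be used. It is a sketch, not part of the original script: make_test_tree is a hypothetical helper name, and it assumes that entries with children are directories and that all other entries are plain files, which the listing above does not actually say.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build the example layout under the given base directory.
# Assumption: leaf entries are regular files, everything else is a directory.
make_test_tree() {
    local base=$1
    mkdir -p "$base/test/Motörhead" "$base/test/Täst" "$base/test/work"
    touch "$base/test/Motörhead/anöther_file.mp3" \
          "$base/test/Motörhead/Encöding" \
          "$base/test/Randöm" \
          "$base/test/Täst/Töst" \
          "$base/test/with space" \
          "$base/test/with-hyphen.txt" \
          "$base/test/work/resume" \
          "$base/test/work/résumé" \
          "$base/test/work/schedule"
}
```

            Calling make_test_tree "$(mktemp -d)" keeps the experiment away from real data.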


            Here is the output from a run in debug mode (with the echo in front of the mv),
            i.e., the commands that would be called, and the collision warnings:



            mv test/Motörhead/anöther_file.mp3 test/Motörhead/another_file.mp3
            mv test/Motörhead/Encöding test/Motörhead/Encoding
            mv test/Motörhead test/Motorhead
            mv test/Randöm test/Random
            mv test/Täst/Töst test/Täst/Tost
            mv test/Täst test/Tast
            mv test/with space test/withspace
            Notice: "resume" and "résumé" both exist in test/work:
            -rw-r--r--  …  …  test/work/resume
            -rw-r--r--  …  …  test/work/résumé


            Notice the absence of messages for with-hyphen.txt, schedule, and test itself.















            edited Nov 19 at 9:18

























            answered Jan 18 '13 at 15:44









            slhck









            • 1




              You might want to add logic to handle the case where the destination of the mv already exists, which can happen (1) if you have files that are already clean (resulting in mv foo foo), or (2) if you have files with the same name except for the special characters (e.g., mv Encöding Encoding, where you already have an Encoding file in addition to Encöding).
              – Scott
              Jan 18 '13 at 21:00










            • Good idea, thanks. Any specific suggestions on what to do in that case? Granted – achieving this in a clean and sane manner is harder than it seems at first. If you have something, feel free to edit of course.
              – slhck
              Jan 18 '13 at 21:12










            • I don’t believe it makes sense to think about handling the collisions automatically –– just identify them to the user and let him handle them. I’ve edited your answer, as you suggested.
              – Scott
              Jan 19 '13 at 0:48










            • +1 for using the example with "Encöding" Too much fön!:-)
              – Marcel
              Mar 22 '14 at 21:25










            • After three years I still come back here. So useful! :-)
              – Afri
              Apr 16 '16 at 12:08



























            I know that it's not exactly what you wanted, but if you know the original encoding, perhaps you can use convmv to change the encoding to UTF-8, which should fix most problems.



            This worked for me on a folder with some incorrectly encoded Polish filenames:



            convmv -f cp1250 -t utf8 -r .


            Note that this command doesn't actually rename anything; add the --notest option to really rename the files.
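
            Before picking a source encoding for -f, it can help to see which names are not valid UTF-8 in the first place. The following is a minimal sketch that assumes the iconv command-line tool is installed; is_valid_utf8 and list_invalid_names are hypothetical helper names and have nothing to do with convmv itself. It relies on iconv exiting with a non-zero status when its input is not valid in the declared source encoding.

```bash
#!/usr/bin/env bash
# Sketch only; both function names are hypothetical.

# Succeeds if the argument is a valid UTF-8 byte sequence.
is_valid_utf8() {
    printf '%s' "$1" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# Print every path below $1 whose last component is not valid UTF-8.
list_invalid_names() {
    local path
    find "$1" -depth -print0 | while IFS= read -r -d '' path; do
        is_valid_utf8 "$(basename "$path")" || printf '%s\n' "$path"
    done
}
```

            Running list_invalid_names . in the music folder would show only the entries that still need converting.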

























            • 1




              For those who have a static set (or don't have a diverse mix of charsets), the convmv option is amazingly simple and perfect. For the OP, who potentially has a multitude of charsets, this could be merged with the other answer, since convmv seems to know whether or not it has encountered the correct format. By looping through the charsets listed by convmv --list, one would get them properly encoded.
              – user273265
              Nov 11 '13 at 20:14








            • 1




              By this I mean: if one runs a Debian server, as the OP does, one would certainly assume UTF-8 these days, in which case one can keep the original letters. I had a folder with some Nordic characters and used: convmv -t utf8 --nfc -f iso-8859-1 --notest -r . The --nfc was to conform to Linux rather than OS X or so; simply typing convmv lists the (useful) options.
              – user273265
              Nov 11 '13 at 20:14

























            edited Aug 30 '13 at 19:18

























            answered Aug 30 '13 at 19:00









            mik01aj






















            I know, you asked about renaming.



            But you can dodge the problem quite easily using software like MusicBrainz Picard.



            It is capable of identifying music (audio fingerprinting), downloading all the necessary data (including cover images, where available) from the huge MusicBrainz database, and moving the files around so that your collection can fit any pattern you like. I've been using it for years and it has always worked perfectly with anything from Cyrillic to Arabic; and of course (at least for Latin-based scripts) it can also do the conversion to ASCII.



            With this approach it does not really matter how messy/badly named your collection really is, as long as the files are readable and complete.



            (Did I mention it's free? Both as in free speech and as in free beer? Both the software and the database?)
















                answered Oct 16 '15 at 4:45









                Alois Mahdal
