How does this script ensure that only one instance of itself is running?


























On 19 Aug 2013, Randal L. Schwartz posted this shell script, which was intended to ensure, on Linux, "that only one instance of [the] script is running, without race conditions or having to clean up lock files":



#!/bin/sh
# randal_l_schwartz_001.sh
(
    if ! flock -n -x 0
    then
        echo "$$ cannot get flock"
        exit 0
    fi
    echo "$$ start"
    sleep 10 # for testing. put the real task here
    echo "$$ end"
) < $0


It seems to work as advertised:



$ ./randal_l_schwartz_001.sh & ./randal_l_schwartz_001.sh
[1] 11863
11863 start
11864 cannot get flock
$ 11863 end

[1]+ Done ./randal_l_schwartz_001.sh
$


Here is what I do understand:




  • The script redirects (<) a copy of its own contents (i.e. from $0) to the STDIN (i.e. file descriptor 0) of a subshell.

  • Within the subshell, the script attempts to get a non-blocking, exclusive lock (flock -n -x) on file descriptor 0.


    • If that attempt fails, the subshell exits (and so does the main script, as there is nothing else for it to do).

    • If the attempt instead succeeds, the subshell runs the desired task.




Here are my questions:




  • Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of its own contents rather than, say, the contents of some other file? (I tried redirecting from a different file and re-running as above, and the execution order changed: the non-backgrounded task gained the lock before the background one. So, maybe using the file's own contents avoids race conditions; but how?)

  • Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of a file's contents, anyway?

  • Why does holding an exclusive lock on file descriptor 0 in one shell prevent a copy of the same script, running in a different shell, from getting an exclusive lock on file descriptor 0? Don't shells have their own, separate copies of the standard file descriptors (0, 1, and 2, i.e. STDIN, STDOUT, and STDERR)?


































  • What was your exact test process when you tried your experiment to redirect from a different file?
    – Freiheit
    Jan 3 at 22:23
















linux shell-script io-redirection subshell lock














edited Jan 3 at 20:24

























asked Jan 3 at 20:10









sampablokuper











3 Answers

































Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of its own contents rather than, say, the contents of some other file?




You could use any file, as long as all copies of the script use the same one.
Using $0 just ties the lock to the script itself: If you copy the script and modify it for some other use, you don't need to come up with a new name for the lock file. This is convenient.



If the script is called through a symlink, the lock is on the actual file, and not the link.



(Of course, if some process runs the script and gives it a made up value as the zeroth argument instead of the actual path, then this breaks. But that's rarely done.)




(I tried using a different file and re-running as above, and the execution order changed)




Are you sure that was because of the file used, and not just random variation? As with a pipeline, there's really no way to be sure in what order the commands get to run in cmd1 & cmd2. It's mostly up to the OS scheduler. I get random variation on my system.
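To see the variation for yourself, something like the following could tally which contender wins over a few rounds. This is only a sketch, assuming util-linux flock is installed; lockfile, results, try_lock and rounds are names made up for the demo, and a temporary file stands in for the script as the lock file.

```sh
#!/bin/sh
# Sketch: run the "background vs foreground" race a few times and tally winners.
# lockfile, results, try_lock and rounds are hypothetical names for this demo.
lockfile=$(mktemp)
results=$(mktemp)

try_lock() {
    (
        flock -n -x 9 || exit 0      # loser: give up quietly
        echo "$1" >> "$results"      # winner: record its label
        sleep 1                      # hold the lock long enough for the other to fail
    ) 9< "$lockfile"
}

rounds=3
i=0
while [ "$i" -lt "$rounds" ]; do
    try_lock bg & try_lock fg
    wait                             # let the background contender finish
    i=$((i + 1))
done

total=$(( $(wc -l < "$results") ))   # number of recorded winners
sort "$results" | uniq -c            # how often each label won; varies between runs
rm -f "$lockfile" "$results"
```

Each round has at least one winner, but which label wins is up to the scheduler, so the counts can differ from run to run.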




Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of a file's contents, anyway?




It looks like that's so that the shell itself holds an open file descriptor referring to the file description that carries the lock, instead of only the flock utility holding one. A lock made with flock(2) is released once all file descriptors referring to that file description are closed.



flock has two modes: it either takes a lock based on a file name and runs an external command (in which case flock itself holds the required open file descriptor), or it takes a file descriptor number from the outside, in which case the calling process is responsible for keeping it open.



Note that the contents of the file are not relevant here, and there are no copies made. The redirection to the subshell doesn't copy any data around in itself; it just opens a handle to the file.
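The two modes of flock can be compared side by side. The following is a rough sketch, assuming util-linux flock; lockfile and the variable names are invented for illustration, and a temporary file is used instead of $0.

```sh
#!/bin/sh
lockfile=$(mktemp)   # hypothetical stand-in for the lock file

# Mode 1: flock opens the file by name and holds the lock while the command
# runs; the lock goes away when flock exits.
mode1=$(flock -n -x "$lockfile" echo "ran under lock")

# Mode 2: the shell opens the file; flock only locks the inherited descriptor.
exec 9< "$lockfile"
flock -n -x 9

# The flock process above has already exited, but fd 9 is still open in this
# shell, so a second, by-name attempt is refused.
if flock -n -x "$lockfile" true; then
    still_locked=no
else
    still_locked=yes
fi

exec 9<&-            # closing the descriptor releases the lock
if flock -n -x "$lockfile" true; then
    released=yes
else
    released=no
fi

echo "mode 1: $mode1; locked while fd 9 open: $still_locked; released after close: $released"
rm -f "$lockfile"
```

In the second mode the lock outlives the flock process because the shell keeps the descriptor open, which is what the script in the question relies on.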




Why does holding an exclusive lock on file descriptor 0 in one shell prevent a copy of the same script, running in a different shell, from getting an exclusive lock on file descriptor 0? Don't shells have their own, separate copies of the standard file descriptors (0, 1, and 2, i.e. STDIN, STDOUT, and STDERR)?




Yes, but the lock is on the file, not the file descriptor. Only one opened instance of the file can hold the lock at a time.
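A small experiment can make this concrete: even within one shell, two separate opens of the same file compete for the one lock. This is a sketch only, assuming util-linux flock; lockfile, first and second are hypothetical names, and the descriptor numbers 8 and 9 are arbitrary.

```sh
#!/bin/sh
# Hypothetical demo: two separate opens of one file compete for a single lock.
lockfile=$(mktemp)          # stand-in lock file; any readable file would do

exec 8< "$lockfile"         # first open of the file
exec 9< "$lockfile"         # second, independent open of the same file

if flock -n -x 8; then first=acquired; else first=refused; fi
if flock -n -x 9; then second=acquired; else second=refused; fi

echo "fd 8: $first, fd 9: $second"

exec 8<&- 9<&-              # close both descriptors, releasing the lock
rm -f "$lockfile"
```

The descriptor numbers don't matter; what matters is that both descriptors lead to the same file.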





I think you should be able to do the same without the subshell, by using exec to open a handle to the lock file:



$ cat lock.sh
#!/bin/sh

exec 9< "$0"

if ! flock -n -x 9; then
    echo "$$/$1 cannot get flock"
    exit 0
fi

echo "$$/$1 got the lock"
sleep 2
echo "$$/$1 exit"

$ ./lock.sh bg & ./lock.sh fg ; wait; echo
[1] 11362
11363/fg got the lock
11362/bg cannot get flock
11363/fg exit
[1]+ Done ./lock.sh bg
























  • Using { } instead of ( ) would also work and avoid the subshell.
    – R..
    Jan 3 at 23:28










  • Further down in the comments on the G+ post, someone there also suggested roughly the same method using exec.
    – David Z
    Jan 4 at 5:10










  • @R.., oh, sure. But it's still ugly with the extra braces around the actual script.
    – ilkkachu
    2 days ago

































A file lock is attached to a file, through a file description. At a high level, the sequence of operations in one instance of the script is:




  1. Open the file to which the lock is attached (“the lock file”).

  2. Take a lock on the lock file.

  3. Do stuff.

  4. Close the lock file. This releases the lock that is attached to the file description created by opening a file.


Holding the lock prevents another copy of the same script from running because that's what locks do. As long as an exclusive lock on a file exists somewhere on the system, it's impossible to take a second lock on the same file, even through a different file description.



Opening a file creates a file description. This is a kernel object that doesn't have much direct visibility in programming interfaces. You access a file description indirectly via file descriptors, but normally you think of it as accessing the file (reading or writing its content or metadata). A lock is one of the attributes that belong to the file description rather than to the file or to a descriptor.



At the beginning, when a file is opened, the file description has a single file descriptor, but more descriptors can be created either by duplicating a descriptor (the dup family of system calls) or by forking a subprocess (after which both the parent and the child have access to the same file description). A file descriptor can be closed explicitly or when the process that it's in dies. When the last file descriptor attached to a file description is closed, the file description is closed.
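The sharing between duplicated descriptors can be observed from the shell. The following sketch assumes util-linux flock; lockfile, before and after are made-up names, the descriptor numbers are arbitrary, and a temporary file plays the role of the lock file.

```sh
#!/bin/sh
lockfile=$(mktemp)      # hypothetical stand-in lock file

exec 8< "$lockfile"     # opening the file creates a new file description
exec 9<&8               # fd 9 duplicates fd 8: same file description
flock -n -x 8           # take the lock through fd 8
exec 8<&-               # close fd 8; the description lives on through fd 9

exec 7< "$lockfile"     # a separate open, hence a separate file description
if flock -n -x 7; then before=acquired; else before=refused; fi

exec 9<&-               # last descriptor closed, so the lock is released
if flock -n -x 7; then after=acquired; else after=refused; fi

echo "while fd 9 was open: $before; after closing fd 9: $after"

exec 7<&-
rm -f "$lockfile"
```

Closing fd 8 alone does not release the lock, because fd 9 still refers to the same file description.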



Here's how the sequence of operations above affects the file description.




  1. The redirection <$0 opens the script file in the subshell, creating a file description. At this point there is a single file descriptor attached to the description: descriptor number 0 in the subshell.

  2. The subshell invokes flock and waits for it to exit. While flock is running, there are two descriptors attached to the description: number 0 in the subshell and number 0 in the flock process. When flock takes the lock, that sets a property of the file description. If another file description already has a lock on the file, flock cannot take the lock, since it's an exclusive lock.

  3. The subshell does stuff. Since it still has an open file descriptor on the description with the lock, that description keeps existing, and it keeps its lock since nobody ever removes the lock.

  4. The subshell dies at the closing parenthesis. This closes the last file descriptor on the file description that has the lock, so the lock disappears at this point.
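The steps above can be checked with an instrumented variant of the script. This is a sketch under assumptions rather than the original author's code: it assumes util-linux flock, locks a temporary file (lockfile, a made-up name) instead of $0, and probes the lock by name from a second flock invocation.

```sh
#!/bin/sh
lockfile=$(mktemp)      # hypothetical stand-in for the script file

# Steps 1 to 3: open the file on fd 0 of a subshell, lock it, then probe the
# lock through a separate open while the subshell is still alive.
during=$(
    (
        flock -n -x 0 || exit 1
        if flock -n -x "$lockfile" true; then echo free; else echo held; fi
    ) < "$lockfile"
)

# Step 4: the subshell has exited, so its fd 0 is closed and the lock is gone.
if flock -n -x "$lockfile" true; then after=free; else after=held; fi

echo "during the subshell: $during; after it exited: $after"
rm -f "$lockfile"
```

No explicit unlock is needed: the lock goes away as soon as the subshell exits and its descriptor is closed.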


The reason the script uses a redirection from $0 is that redirection is the only way to open a file in the shell, and keeping a redirection active is the only way to keep a file descriptor open. The subshell never reads from its standard input; it just needs to keep it open. In a language that gives direct access to the open and close calls, you could use



fd = open($0)
flock(fd, LOCK_EX)
do stuff
close(fd)


You can actually get the same sequence of operations in the shell if you do the redirection with the exec builtin.



exec <$0
flock -n -x 0
# do stuff
exec <&-


The script could use a different file descriptor if it wanted to keep accessing the original standard input.



exec 3<$0
flock -n -x 3
# do stuff
exec 3<&-


or with a subshell:



(
flock -n -x 3
# do stuff
) 3<$0


The lock doesn't have to be on the script file. It could be on any file that can be opened for reading (so it has to exist, it has to be a file type that can be read such as a regular file or a named pipe but not a directory, and the script process must have the permission to read it). The script file has the advantage that it's guaranteed to be present and readable (except in the edge case where it was deleted externally between the time the script was invoked and the time the script gets to the <$0 redirection).



As long as flock succeeds, and the script is on a filesystem where locks aren't buggy (some network filesystems such as NFS may be buggy), I don't see how using a different lock file could allow a race condition. I suspect a manipulation error on your part.





























  • There is a race condition: you can't control which instance of the script gets the lock. Fortunately, for almost all purposes, it doesn't matter.
    – Mark
    Jan 3 at 21:54






  • @Mark There's a race to the lock, but it isn't a race condition. A race condition is when timing can allow something bad to happen, such as two processes being in the same critical section at the same time. Not knowing which process will enter the critical section is expected nondeterminism, it is not a race condition.
    – Gilles
    Jan 3 at 21:57






  • Just FYI, the link in "file description" points to the Open Group specs index page rather than to a specific description of the concept, which is what I think you intended. Or you could also link your older answer here: unix.stackexchange.com/a/195164/85039
    – Sergiy Kolodyazhnyy
    Jan 4 at 1:33



































The file used for locking is unimportant; the script uses $0 because that's a file that is known to exist.



The order in which the locks are obtained will be more or less random, depending on how fast your machine is able to start the two tasks.



You may use any file descriptor, not necessarily 0. The lock is held on the file opened to the file descriptor, not the descriptor itself.



( flock -x 9 || exit 1
echo 'Locking for 5 secs'; sleep 5; echo 'Done' ) 9>/tmp/lock &


























    Your Answer








    StackExchange.ready(function() {
    var channelOptions = {
    tags: "".split(" "),
    id: "106"
    };
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function() {
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled) {
    StackExchange.using("snippets", function() {
    createEditor();
    });
    }
    else {
    createEditor();
    }
    });

    function createEditor() {
    StackExchange.prepareEditor({
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader: {
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    },
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    });


    }
    });














    draft saved

    draft discarded


















    StackExchange.ready(
    function () {
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f492324%2fhow-does-this-script-ensure-that-only-one-instance-of-itself-is-running%23new-answer', 'question_page');
    }
    );

    Post as a guest















    Required, but never shown

























    3 Answers
    3






    active

    oldest

    votes








    3 Answers
    3






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    22















    Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of its own contents rather than, say, the contents of some other file?




    You could use any file, as long as all copies of the script use the same one.
    Using $0 just ties the lock to the script itself: If you copy the script and modify it for some other use, you don't need to come up with a new name for the lock file. This is convenient.



    If the script is called through a symlink, the lock is on the actual file, and not the link.



    (Of course, if some process runs the script and gives it a made up value as the zeroth argument instead of the actual path, then this breaks. But that's rarely done.)




    (I tried using a different file and re-running as above, and the execution order changed)




    Are you sure that was because of the file used, and not just random variation? As with a pipeline, there's really no way to be sure in what order the commands get to run in cmd1 & cmd. It's mostly up to the OS scheduler. I get random variation on my system.




    Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of a file's contents, anyway?




    It looks like that's so that the shell itself holds a copy of the file description holding the lock, instead of just the flock utility holding it. A lock made with flock(2) is released when the file descriptors having it are closed.



    flock has two modes, either to take a lock based on a file name, and run an external command (in which case flock holds the required open file descriptor), or to take a file descriptor from the outside, so an outside process is responsible for holding it.



    Note that the contents of the file are not relevant here, and there are no copies made. The redirection to the subshell doesn't copy any data around in itself, it just opens a handle to the file.




    Why does holding an exclusive lock on file descriptor 0 in one shell prevent a copy of the same script, running in a different shell, from getting an exclusive lock on file descriptor 0? Don't shells have their own, separate copies of the standard file descriptors (0, 1, and 2, i.e. STDIN, STDOUT, and STDERR)?




    Yes, but the lock is on the file, not the file descriptor. Only one opened instance of the file can hold the lock at a time.





    I think you should be able to do the same without the subshell, by using exec to open a handle to the lock file:



    $ cat lock.sh
    #!/bin/sh

    exec 9< "$0"

    if ! flock -n -x 9; then
    echo "$$/$1 cannot get flock"
    exit 0
    fi

    echo "$$/$1 got the lock"
    sleep 2
    echo "$$/$1 exit"

    $ ./lock.sh bg & ./lock.sh fg ; wait; echo
    [1] 11362
    11363/fg got the lock
    11362/bg cannot get flock
    11363/fg exit
    [1]+ Done ./lock.sh bg





    share|improve this answer



















    • 1




      Using { } instead of ( ) would also work and avoid the subshell.
      – R..
      Jan 3 at 23:28










    • Further down in the comments on the G+ post, someone there also suggested roughly the same method using exec.
      – David Z
      Jan 4 at 5:10










    • @R.., oh, sure. But it's still ugly with the extra braces around the actual script.
      – ilkkachu
      2 days ago
















    22















    Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of its own contents rather than, say, the contents of some other file?




    You could use any file, as long as all copies of the script use the same one.
    Using $0 just ties the lock to the script itself: If you copy the script and modify it for some other use, you don't need to come up with a new name for the lock file. This is convenient.



    If the script is called through a symlink, the lock is on the actual file, and not the link.



    (Of course, if some process runs the script and gives it a made up value as the zeroth argument instead of the actual path, then this breaks. But that's rarely done.)




    (I tried using a different file and re-running as above, and the execution order changed)




    Are you sure that was because of the file used, and not just random variation? As with a pipeline, there's really no way to be sure in what order the commands get to run in cmd1 & cmd. It's mostly up to the OS scheduler. I get random variation on my system.




    Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of a file's contents, anyway?




    It looks like that's so that the shell itself holds a copy of the file description holding the lock, instead of just the flock utility holding it. A lock made with flock(2) is released when the file descriptors having it are closed.



    flock has two modes, either to take a lock based on a file name, and run an external command (in which case flock holds the required open file descriptor), or to take a file descriptor from the outside, so an outside process is responsible for holding it.



    Note that the contents of the file are not relevant here, and there are no copies made. The redirection to the subshell doesn't copy any data around in itself, it just opens a handle to the file.




    Why does holding an exclusive lock on file descriptor 0 in one shell prevent a copy of the same script, running in a different shell, from getting an exclusive lock on file descriptor 0? Don't shells have their own, separate copies of the standard file descriptors (0, 1, and 2, i.e. STDIN, STDOUT, and STDERR)?




    Yes, but the lock is on the file, not the file descriptor. Only one opened instance of the file can hold the lock at a time.





    I think you should be able to do the same without the subshell, by using exec to open a handle to the lock file:



    $ cat lock.sh
    #!/bin/sh

    exec 9< "$0"

    if ! flock -n -x 9; then
    echo "$$/$1 cannot get flock"
    exit 0
    fi

    echo "$$/$1 got the lock"
    sleep 2
    echo "$$/$1 exit"

    $ ./lock.sh bg & ./lock.sh fg ; wait; echo
    [1] 11362
    11363/fg got the lock
    11362/bg cannot get flock
    11363/fg exit
    [1]+ Done ./lock.sh bg





    share|improve this answer



















    • 1




      Using { } instead of ( ) would also work and avoid the subshell.
      – R..
      Jan 3 at 23:28










    • Further down in the comments on the G+ post, someone there also suggested roughly the same method using exec.
      – David Z
      Jan 4 at 5:10










    • @R.., oh, sure. But it's still ugly with the extra braces around the actual script.
      – ilkkachu
      2 days ago














    22












    22








    22







    Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of its own contents rather than, say, the contents of some other file?




    You could use any file, as long as all copies of the script use the same one.
    Using $0 just ties the lock to the script itself: If you copy the script and modify it for some other use, you don't need to come up with a new name for the lock file. This is convenient.



    If the script is called through a symlink, the lock is on the actual file, and not the link.



    (Of course, if some process runs the script and gives it a made up value as the zeroth argument instead of the actual path, then this breaks. But that's rarely done.)




    (I tried using a different file and re-running as above, and the execution order changed)




    Are you sure that was because of the file used, and not just random variation? As with a pipeline, there's really no way to be sure in what order the commands get to run in cmd1 & cmd. It's mostly up to the OS scheduler. I get random variation on my system.




    Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of a file's contents, anyway?




    It looks like that's so that the shell itself holds a copy of the file description holding the lock, instead of just the flock utility holding it. A lock made with flock(2) is released when the file descriptors having it are closed.



    flock has two modes, either to take a lock based on a file name, and run an external command (in which case flock holds the required open file descriptor), or to take a file descriptor from the outside, so an outside process is responsible for holding it.



    Note that the contents of the file are not relevant here, and there are no copies made. The redirection to the subshell doesn't copy any data around in itself, it just opens a handle to the file.




    Why does holding an exclusive lock on file descriptor 0 in one shell prevent a copy of the same script, running in a different shell, from getting an exclusive lock on file descriptor 0? Don't shells have their own, separate copies of the standard file descriptors (0, 1, and 2, i.e. STDIN, STDOUT, and STDERR)?




    Yes, but the lock is on the file, not the file descriptor. Only one opened instance of the file can hold the lock at a time.





    I think you should be able to do the same without the subshell, by using exec to open a handle to the lock file:



    $ cat lock.sh
    #!/bin/sh

    exec 9< "$0"

    if ! flock -n -x 9; then
    echo "$$/$1 cannot get flock"
    exit 0
    fi

    echo "$$/$1 got the lock"
    sleep 2
    echo "$$/$1 exit"

    $ ./lock.sh bg & ./lock.sh fg ; wait; echo
    [1] 11362
    11363/fg got the lock
    11362/bg cannot get flock
    11363/fg exit
    [1]+ Done ./lock.sh bg





    share|improve this answer















    Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of its own contents rather than, say, the contents of some other file?




    You could use any file, as long as all copies of the script use the same one.
    Using $0 just ties the lock to the script itself: If you copy the script and modify it for some other use, you don't need to come up with a new name for the lock file. This is convenient.



    If the script is called through a symlink, the lock is on the actual file, and not the link.



    (Of course, if some process runs the script and gives it a made up value as the zeroth argument instead of the actual path, then this breaks. But that's rarely done.)




    (I tried using a different file and re-running as above, and the execution order changed)




    Are you sure that was because of the file used, and not just random variation? As with a pipeline, there's really no way to be sure in what order the commands get to run in cmd1 & cmd. It's mostly up to the OS scheduler. I get random variation on my system.




    Why does the script need to redirect, to a file descriptor inherited by the subshell, a copy of a file's contents, anyway?




    It looks like that's so that the shell itself holds a copy of the file description holding the lock, instead of just the flock utility holding it. A lock made with flock(2) is released when the file descriptors having it are closed.



    flock has two modes, either to take a lock based on a file name, and run an external command (in which case flock holds the required open file descriptor), or to take a file descriptor from the outside, so an outside process is responsible for holding it.



    Note that the contents of the file are not relevant here, and there are no copies made. The redirection to the subshell doesn't copy any data around in itself, it just opens a handle to the file.




    Why does holding an exclusive lock on file descriptor 0 in one shell prevent a copy of the same script, running in a different shell, from getting an exclusive lock on file descriptor 0? Don't shells have their own, separate copies of the standard file descriptors (0, 1, and 2, i.e. STDIN, STDOUT, and STDERR)?




    Yes, but the lock is on the file, not the file descriptor. Only one opened instance of the file can hold the lock at a time.
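A quick way to convince yourself of this, as a sketch: open the same file twice in one shell and try to lock it through each descriptor. The temporary file and the descriptor numbers 8 and 9 are arbitrary choices for the demo.

```shell
#!/bin/sh
lockfile=$(mktemp)            # hypothetical stand-in for "$0"

exec 8< "$lockfile"           # first open of the file
exec 9< "$lockfile"           # second, independent open of the same file

if flock -n -x 8; then first=locked; else first=refused; fi
if flock -n -x 9; then second=locked; else second=refused; fi

echo "fd 8: $first, fd 9: $second"

exec 8<&- 9<&-                # closing the descriptors releases the lock
rm -f "$lockfile"
```

The second attempt is refused even though it uses a different descriptor number, and even though it comes from the same process: what matters is that both opens refer to the same file.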





    I think you should be able to do the same without the subshell, by using exec to open a handle to the lock file:



    $ cat lock.sh
    #!/bin/sh

    # Open the script itself on fd 9; the shell keeps it open until it exits.
    exec 9< "$0"

    if ! flock -n -x 9; then
        echo "$$/$1 cannot get flock"
        exit 0
    fi

    echo "$$/$1 got the lock"
    sleep 2
    echo "$$/$1 exit"

    $ ./lock.sh bg & ./lock.sh fg ; wait; echo
    [1] 11362
    11363/fg got the lock
    11362/bg cannot get flock
    11363/fg exit
    [1]+ Done ./lock.sh bg






    answered Jan 3 at 20:46 by ilkkachu (edited Jan 3 at 21:15)








    • Using { } instead of ( ) would also work and avoid the subshell.
      – R..
      Jan 3 at 23:28










    • Further down in the comments on the G+ post, someone there also suggested roughly the same method using exec.
      – David Z
      Jan 4 at 5:10










    • @R.., oh, sure. But it's still ugly with the extra braces around the actual script.
      – ilkkachu
      2 days ago




























    A file lock is attached to a file, through a file description. At a high level, the sequence of operations in one instance of the script is:




    1. Open the file to which the lock is attached (“the lock file”).

    2. Take a lock on the lock file.

    3. Do stuff.

    4. Close the lock file. This releases the lock that is attached to the file description created by opening a file.


    Holding the lock prevents another copy of the same script from running because that's what locks do. As long as an exclusive lock on a file exists somewhere on the system, it's impossible to take a second lock on that file, even through a different file description.



    Opening a file creates a file description. This is a kernel object that doesn't have much direct visibility in programming interfaces. You access a file description indirectly via file descriptors, but normally you think of it as accessing the file (reading or writing its content or metadata). A lock is one of the attributes that belong to the file description rather than to the file or to a descriptor.



    At the beginning, when a file is opened, the file description has a single file descriptor, but more descriptors can be created either by duplicating a descriptor (the dup family of system calls) or by forking a subprocess (after which both the parent and the child have access to the same file description). A file descriptor can be closed explicitly or when the process that it's in dies. When the last file descriptor attached to a file description is closed, the file description is closed.
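Here is a small sketch of that lifetime rule. The helper try_fresh_open and the descriptor numbers are made up for the demo, and the temporary file stands in for the script file.

```shell
#!/bin/sh
lockfile=$(mktemp)        # hypothetical stand-in for the script file

# try_fresh_open is an illustrative helper: it opens the file again
# (a new file description) and reports whether that open could take the lock.
try_fresh_open() {
    if ( flock -n -x 7 ) 7< "$lockfile"; then echo free; else echo held; fi
}

exec 8< "$lockfile"       # open: one description, one descriptor
exec 9<&8                 # dup: a second descriptor for the same description
flock -n -x 8             # the lock becomes a property of that description

exec 8<&-                 # close one descriptor; fd 9 keeps the description alive
after_first_close=$(try_fresh_open)

exec 9<&-                 # close the last descriptor; the description goes away
after_last_close=$(try_fresh_open)

echo "after closing fd 8: $after_first_close"
echo "after closing fd 9: $after_last_close"
rm -f "$lockfile"
```

Closing fd 8 does not release the lock, because fd 9 still refers to the same description. Only closing the last descriptor does.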



    Here's how the sequence of operations above affects the file description.




    1. The redirection <$0 opens the script file in the subshell, creating a file description. At this point there is a single file descriptor attached to the description: descriptor number 0 in the subshell.

    2. The subshell invokes flock and waits for it to exit. While flock is running, there are two descriptors attached to the description: number 0 in the subshell and number 0 in the flock process. When flock takes the lock, that sets a property of the file description. If another file description already has a lock on the file, flock cannot take the lock, since it's an exclusive lock.

    3. The subshell does stuff. Since it still has an open file descriptor on the description with the lock, that description keeps existing, and it keeps its lock since nobody ever removes the lock.

    4. The subshell dies at the closing parenthesis. This closes the last file descriptor on the file description that has the lock, so the lock disappears at this point.
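The four steps above can be watched in action with a sketch like the following. The function run_locked is a hypothetical wrapper, not something from the original script, and fd 3 and the temporary file are arbitrary. Nesting one call inside another gives a deterministic way to try the lock while it is held.

```shell
#!/bin/sh
lockfile=$(mktemp)   # hypothetical stand-in for "$0"

# run_locked is an illustrative helper, not part of the original script.
run_locked() {
    (
        # Step 2: try to lock the description behind fd 3.
        flock -n -x 3 || { echo "busy"; exit 0; }
        # Step 3: do the work while the subshell keeps fd 3 open.
        "$@"
    ) 3< "$lockfile"   # Step 1: open here; step 4: closed when the subshell exits
}

# The inner call runs while the outer subshell still holds the lock.
nested=$(run_locked run_locked echo "inner ran")
# The outer subshell has exited by now, so the lock is free again.
sequential=$(run_locked echo "ran")

echo "nested: $nested"
echo "sequential: $sequential"
rm -f "$lockfile"
```

The nested call opens the file again, gets a new file description, and is refused. The later call succeeds because nothing holds the old description any more.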


    The reason the script uses a redirection from $0 is that redirection is the only way to open a file in the shell, and keeping a redirection active is the only way to keep a file descriptor open. The subshell never reads from its standard input; it just needs to keep it open. In a language that gives direct access to the open and close calls, you could use



    fd = open($0)
    flock(fd, LOCK_EX)
    do stuff
    close(fd)


    You can actually get the same sequence of operations in the shell if you do the redirection with the exec builtin.



    exec <$0
    flock -n -x 0
    # do stuff
    exec <&-


    The script could use a different file descriptor if it wanted to keep accessing the original standard input.



    exec 3<$0
    flock -n -x 3
    # do stuff
    exec 3<&-


    or with a subshell:



    (
    flock -n -x 3
    # do stuff
    ) 3<$0


    The lock doesn't have to be on the script file. It could be on any file that can be opened for reading (so it has to exist, it has to be a file type that can be read such as a regular file or a named pipe but not a directory, and the script process must have the permission to read it). The script file has the advantage that it's guaranteed to be present and readable (except in the edge case where it was deleted externally between the time the script was invoked and the time the script gets to the <$0 redirection).
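If you'd rather use a dedicated lock file that may not exist yet, one option is to open it for appending instead of reading. This is only a sketch: the path is hypothetical, and the mktemp directory is there just to keep the demo self-contained. A real job would use a fixed location shared by all instances.

```shell
#!/bin/sh
# Demo only: a real job would use a fixed directory that every instance shares.
lockdir=$(mktemp -d)
lockfile="$lockdir/example-job.lock"   # hypothetical lock file name

# ">>" creates the file if it is missing and never truncates it,
# so the lock file does not need to exist beforehand.
exec 9>> "$lockfile"

if flock -n -x 9; then
    status="got the lock"
    # ... the real task would go here ...
else
    status="another instance holds the lock"
fi
echo "$$: $status"
```

flock doesn't care whether the descriptor was opened for reading or writing, so the rest of the script stays the same.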



    As long as flock succeeds, and the script is on a filesystem where locks aren't buggy (some network filesystems such as NFS may be buggy), I don't see how using a different lock file could allow a race condition. I suspect a manipulation error on your part.





























    • There is a race condition: you can't control which instance of the script gets the lock. Fortunately, for almost all purposes, it doesn't matter.
      – Mark
      Jan 3 at 21:54






    • @Mark There's a race to the lock, but it isn't a race condition. A race condition is when timing can allow something bad to happen, such as two processes being in the same critical section at the same time. Not knowing which process will enter the critical section is expected nondeterminism, it is not a race condition.
      – Gilles
      Jan 3 at 21:57






    • Just FYI, the link in "file description" points to Open Group specs index page rather than to specific description of the concept, which is what I think you intended to do. Or you could also link your older answer here as well unix.stackexchange.com/a/195164/85039
      – Sergiy Kolodyazhnyy
      Jan 4 at 1:33


























    answered Jan 3 at 21:05 by Gilles (edited 2 days ago)


























    The file used for locking is unimportant; the script uses $0 because that's a file that is known to exist.



    The order in which the locks are obtained will be more or less random, depending on how fast your machine is able to start the two tasks.



    You may use any file descriptor, not necessarily 0. The lock is held on the file opened on that descriptor, not on the descriptor itself.



    ( flock -x 9 || exit 1
    echo 'Locking for 5 secs'; sleep 5; echo 'Done' ) 9>/tmp/lock &















        answered Jan 3 at 20:49 by Kusalananda





























