Incremental backups with tar where current file has most recent and previous files only have different...











I am somewhat familiar with using tar's --listed-incremental flag to take incremental backups. The end result is a backup-0 file containing the first full backup, and then backup-1, backup-2, ..., backup-x containing the changes, in the order the backups were taken.
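For reference, the standard forward flow with --listed-incremental looks like this (a minimal sketch; all paths are illustrative):

```shell
# Minimal sketch of GNU tar's forward incrementals; paths are illustrative.
work=$(mktemp -d)
mkdir "$work/data" && touch "$work/data/file1"

# Level 0 (full backup): the snapshot file is created because it does not
# exist yet, and every file is archived.
tar --listed-incremental="$work/snapshot.snar" \
    -cjf "$work/backup-0.tar.bz2" -C "$work" data

# Level 1: only files changed since the level-0 dump are archived.
touch "$work/data/file2"
tar --listed-incremental="$work/snapshot.snar" \
    -cjf "$work/backup-1.tar.bz2" -C "$work" data
```

Note that tar updates the snapshot file on each run, which is exactly why the incrementals accumulate forward in time.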



In the past I have used rsync and hard links to make backups, where backup-0 is the current state and each backup-x folder has the files that were specific to that backup. Basically what is outlined at http://www.mikerubel.org/computers/rsync_snapshots/ and http://www.admin-magazine.com/Articles/Using-rsync-for-Backups/(offset).



I want to mimic that functionality with tar. I cannot use hard links because the tar files will ultimately be uploaded to a cloud provider that doesn't maintain/understand links. I also want to tar the backups because I can then encrypt them before they are uploaded to the cloud.
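The archive-then-encrypt step can be sketched as a single pipe (openssl is just an example tool here, and the inline passphrase is purely illustrative; any cipher tool that reads stdin would do):

```shell
# Sketch: archive and encrypt in one pipe before uploading. openssl is
# only an example cipher tool; the inline passphrase is illustrative.
work=$(mktemp -d)
mkdir "$work/data" && touch "$work/data/file1"

tar -cjf - -C "$work" data \
  | openssl enc -aes-256-cbc -pbkdf2 -pass pass:example-passphrase \
  > "$work/backup-0.tar.bz2.enc"

# Round trip: decrypt and list the archive contents.
openssl enc -d -aes-256-cbc -pbkdf2 -pass pass:example-passphrase \
  < "$work/backup-0.tar.bz2.enc" | tar -tjf -
```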



So the idea is to have a growing list of files like so:





  • backup-0.tar.bz2 - this is the current backup and will be the biggest because it is a full backup


  • backup-1.tar.bz2 - this is yesterday's backup but it will only have the files that are different from what is in current (backup-0.tar.bz2)


  • backup-2.tar.bz2 - this is the backup from two days ago but it will only have the files that are different from yesterday (backup-1.tar.bz2)


  • backup-3.tar.bz2 - ...


  • backup-4.tar.bz2 - ...


  • backup-5.tar.bz2 - ...


If that doesn't make sense, hopefully this will.



First time:




  1. $ touch /tmp/file1

  2. $ touch /tmp/file2

  3. make backup-0.tar.bz2


At this point backup-0.tar.bz2 has /tmp/file1 and /tmp/file2.



Second time:




  1. $ touch /tmp/file3

  2. $ rm /tmp/file2

  3. ..do the magic


At this point:





  • backup-0.tar.bz2 has /tmp/file1 and /tmp/file3


  • backup-1.tar.bz2 has /tmp/file2; it doesn't have file1 because file1 didn't change, so it's still in backup-0.tar.bz2


Third time:




  1. $ touch /tmp/file1

  2. $ touch /tmp/file4

  3. ..do the magic


At this point:





  • backup-0.tar.bz2 has /tmp/file1, /tmp/file3, and /tmp/file4


  • backup-1.tar.bz2 has /tmp/file1 because it was changed


  • backup-2.tar.bz2 has /tmp/file2


Like so:



|       | first time | second time | third time            |
|-------|------------|-------------|-----------------------|
| file1 | backup-0   | backup-0    | backup-0 and backup-1 |
| file2 | backup-0   | backup-1    | backup-2              |
| file3 |            | backup-0    | backup-0              |
| file4 |            |             | backup-0              |


The following is one way I figure I could approach it, but it seems horribly inefficient to me. Maybe there are features/flags I can use that would make it more efficient.




  1. first time = take backup-0

  2. second time


    1. rename backup-0 to backup-1

    2. take backup-0

    3. remove everything from backup-1 that matches backup-0



  3. third time


    1. rename backup-1 to backup-2

    2. rename backup-0 to backup-1

    3. take backup-0

    4. remove everything from backup-1 that matches backup-0



  4. fourth time


    1. rename backup-2 to backup-3

    2. rename backup-1 to backup-2

    3. rename backup-0 to backup-1

    4. take backup-0

    5. remove everything from backup-1 that matches backup-0




I feel like it's that last step (removing everything from backup-1 that matches backup-0) that is inefficient.
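A sketch of that rotation (the renames themselves are cheap; the final stripping step is the expensive part and is left as a comment):

```shell
# Sketch of the rotation described above; names are illustrative.
work=$(mktemp -d); cd "$work"
mkdir data && touch data/file1

# Pretend a previous run already produced a full backup-0.
tar -cjf backup-0.tar.bz2 data

# Today: shift every backup-N to backup-(N+1), oldest first.
for n in 5 4 3 2 1 0; do
  if [ -f "backup-$n.tar.bz2" ]; then
    mv "backup-$n.tar.bz2" "backup-$((n + 1)).tar.bz2"
  fi
done

# Take a fresh full backup as the new backup-0.
tar -cjf backup-0.tar.bz2 data

# The expensive step: remove from backup-1 everything whose content is
# unchanged in backup-0 (this requires comparing the two archives).
```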



My question is: how can I do this? If I use tar's --listed-incremental it'll do the reverse of what I am trying to do.


































  • How do I do this? If I use tar's --listed-incremental it'll do the reverse of what I am trying to do.
    – IMTheNachoMan
    Nov 17 at 5:52















linux backup tar

edited Nov 17 at 6:38
asked Nov 17 at 4:19
IMTheNachoMan
1 Answer































If I use tar's --listed-incremental it'll do the reverse of what I am trying.




It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:




  1. Rename backup-N to backup-(N+1) looping from Nmax down to 0.

  2. Restore full backup (now backup-1) to a temporary directory.

  3. Create backup-0 from the current data with a new snapshot file.

  4. Remove backup-1 (previous full backup).

  5. Treat the temporary directory as a "new" version. Create backup-1 as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).
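A sketch of the five steps above, assuming for simplicity that only backup-0 exists so far (all names and paths are illustrative):

```shell
# Sketch of the reversed procedure; paths are illustrative, and only one
# previous backup (backup-0) exists in this tiny demo.
work=$(mktemp -d); cd "$work"
mkdir current && touch current/file1

# A previous run left a full backup-0 plus its snapshot file.
tar --listed-incremental=snap.snar -cjf backup-0.tar.bz2 -C current .

touch current/file2     # something changes before the next run

# 1. Shift names: backup-0 becomes backup-1 (loop over N in a real script).
mv backup-0.tar.bz2 backup-1.tar.bz2

# 2. Restore the old full backup to a temporary directory.
mkdir old
tar --listed-incremental=/dev/null -xjf backup-1.tar.bz2 -C old

# 3. Create a new full backup-0 with a fresh snapshot file.
rm -f snap.snar
tar --listed-incremental=snap.snar -cjf backup-0.tar.bz2 -C current .

# 4. Remove the previous full backup.
rm backup-1.tar.bz2

# 5. Create backup-1 as an incremental of the *old* state, against a copy
#    of the new snapshot (tar updates the snapshot file it is given).
cp snap.snar snap-1.snar
tar --listed-incremental=snap-1.snar -cjf backup-1.tar.bz2 -C old .
```

One caveat worth knowing: files restored in step 2 get fresh inodes and ctimes, so tar may dump more of them in step 5 than strictly changed.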


You may wonder whether this will keep the old (retained) backup-N files coherent with the new ones. That is a reasonable doubt, since the manual says:




-g, --listed-incremental=FILE

Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.




So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N files every time you perform a full backup. But then:




When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.




This means that if you extract the backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter whether that state was obtained from a full or an incremental backup; the point is that the two states should be identical anyway. So it shouldn't matter whether you created the backup-M file based on a full backup (as you will do: every backup-M starts out as backup-1, where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).
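A sketch of that extraction mechanism with an ordinary level-0/level-1 pair (the same applies to the reversed chain; names are illustrative):

```shell
# Sketch: extraction does not inspect the snapshot contents, so /dev/null
# can be passed; archives just have to be applied in sequence.
work=$(mktemp -d); cd "$work"
mkdir data && touch data/a

tar --listed-incremental=s0.snar -cjf backup-a.tar.bz2 -C data .
touch data/b
cp s0.snar s1.snar
tar --listed-incremental=s1.snar -cjf backup-b.tar.bz2 -C data .

restore=$(mktemp -d)
for f in backup-a.tar.bz2 backup-b.tar.bz2; do
  tar --listed-incremental=/dev/null -xjf "$f" -C "$restore"
done
ls "$restore"    # both a and b are present
```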





I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1, and upload a fresh full backup-0 every time. If your data is huge, uploading a full backup every time will be a pain.



For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup a few times:




rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.




Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.



























  • Thanks. This could work, but it would require a lot of space to do the full extract to the temp directory. I have an idea for what I want and am working on a script. To back up: 1) dump an inventory of the files to back up, including mod time and size; 2) archive the files, including the inventory file. Then, later: 1) extract the inventory file from the archive; 2) take a new inventory; 3) compare the two files; 4) extract the files that differ and put them in a new archive.
    – IMTheNachoMan
    Nov 23 at 22:12
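The inventory approach from the comment could be sketched like this (GNU find is assumed for -printf; file names are illustrative, and paths containing spaces would need a more careful record format):

```shell
# Sketch of the inventory approach: record path + mtime + size, diff two
# inventories, and archive only what is new or changed.
work=$(mktemp -d); cd "$work"
mkdir data && touch data/file1

find data -type f -printf '%p %T@ %s\n' | sort > inv-old.txt

touch data/file2        # a change happens

find data -type f -printf '%p %T@ %s\n' | sort > inv-new.txt

# Lines only in the new inventory = new or modified files.
# (cut on spaces is fragile if paths contain spaces.)
comm -13 inv-old.txt inv-new.txt | cut -d' ' -f1 > changed.txt

tar -cjf backup-incr.tar.bz2 -T changed.txt
```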











Your Answer








StackExchange.ready(function() {
var channelOptions = {
tags: "".split(" "),
id: "3"
};
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function() {
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled) {
StackExchange.using("snippets", function() {
createEditor();
});
}
else {
createEditor();
}
});

function createEditor() {
StackExchange.prepareEditor({
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
imageUploader: {
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
},
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
});


}
});














 

draft saved


draft discarded


















StackExchange.ready(
function () {
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fsuperuser.com%2fquestions%2f1376154%2fincremental-backups-with-tar-where-current-file-has-most-recent-and-previous-fil%23new-answer', 'question_page');
}
);

Post as a guest















Required, but never shown

























1 Answer
1






active

oldest

votes








1 Answer
1






active

oldest

votes









active

oldest

votes






active

oldest

votes








up vote
0
down vote














If I use tar's --listed-incremental it'll do the reverse of what I am trying.




It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:




  1. Rename backup-N to backup-(N+1) looping from Nmax down to 0.

  2. Restore full backup (now backup-1) to a temporary directory.

  3. Create backup-0 from the current data with a new snapshot file.

  4. Remove backup-1 (previous full backup).

  5. Treat the temporary directory as a "new" version. Create backup-1 as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).


You may wonder if this will keep the old (kept) backup-N files coherent with the new ones. A reasonable doubt, since the manual says:




-g, --listed-incremental=FILE

Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.




So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N files every time you perform a full backup. But then:




When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.




This means if you extract backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M file based on a full backup (as you will do, every backup-M will start as backup-1 where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).





I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1 and upload a full new backup-0 every time. If your data is huge then uploading a full backup every time will be a pain.



For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup few times:




rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.




Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.






share|improve this answer





















  • Thanks. This could work but it would require a lot of space to do the full extract to the temp directory. I have an idea to do what I want and am working on a script. 1) dump inventory of files to backup including mod time and size 2) archive files, including inventory files then later 1) extract inventory file from archive 2) take new inventory file 3) compare two files 4) extract different files and put in new archive.
    – IMTheNachoMan
    Nov 23 at 22:12















up vote
0
down vote














If I use tar's --listed-incremental it'll do the reverse of what I am trying.




It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:




  1. Rename backup-N to backup-(N+1) looping from Nmax down to 0.

  2. Restore full backup (now backup-1) to a temporary directory.

  3. Create backup-0 from the current data with a new snapshot file.

  4. Remove backup-1 (previous full backup).

  5. Treat the temporary directory as a "new" version. Create backup-1 as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).


You may wonder if this will keep the old (kept) backup-N files coherent with the new ones. A reasonable doubt, since the manual says:




-g, --listed-incremental=FILE

Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.




So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N files every time you perform a full backup. But then:




When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.




This means if you extract backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M file based on a full backup (as you will do, every backup-M will start as backup-1 where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).





I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1 and upload a full new backup-0 every time. If your data is huge then uploading a full backup every time will be a pain.



For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup few times:




rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.




Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.






share|improve this answer





















  • Thanks. This could work but it would require a lot of space to do the full extract to the temp directory. I have an idea to do what I want and am working on a script. 1) dump inventory of files to backup including mod time and size 2) archive files, including inventory files then later 1) extract inventory file from archive 2) take new inventory file 3) compare two files 4) extract different files and put in new archive.
    – IMTheNachoMan
    Nov 23 at 22:12













up vote
0
down vote










up vote
0
down vote










If I use tar's --listed-incremental it'll do the reverse of what I am trying.




It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:




  1. Rename backup-N to backup-(N+1) looping from Nmax down to 0.

  2. Restore full backup (now backup-1) to a temporary directory.

  3. Create backup-0 from the current data with a new snapshot file.

  4. Remove backup-1 (previous full backup).

  5. Treat the temporary directory as a "new" version. Create backup-1 as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).


You may wonder if this will keep the old (kept) backup-N files coherent with the new ones. A reasonable doubt, since the manual says:




-g, --listed-incremental=FILE

Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.




So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N files every time you perform a full backup. But then:




When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.




This means if you extract backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M file based on a full backup (as you will do, every backup-M will start as backup-1 where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).





I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1, and upload a new full backup-0 every time. If your data is huge, uploading a full backup every time will be a pain.



For this reason it's advisable to have a "smart" server that can rebuild the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup a few times:




rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.




Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.
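
For a flavor of the tool, a typical session in the classic (1.x) command style looks roughly like this; the host name and paths are invented, and the commands are shown for illustration only:

```shell
# Push a backup over SSH; the target becomes a mirror plus reverse diffs.
rdiff-backup /home/me/data backuphost::/srv/mirror

# See which past states are available.
rdiff-backup --list-increments backuphost::/srv/mirror

# Recover a file as it was three days ago.
rdiff-backup -r 3D backuphost::/srv/mirror/somefile /tmp/somefile
```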






answered Nov 17 at 7:26









Kamil Maciorowski

  • Thanks. This could work but it would require a lot of space to do the full extract to the temp directory. I have an idea to do what I want and am working on a script. 1) dump inventory of files to backup including mod time and size 2) archive files, including inventory files then later 1) extract inventory file from archive 2) take new inventory file 3) compare two files 4) extract different files and put in new archive.
    – IMTheNachoMan
    Nov 23 at 22:12