Incremental backups with tar where current file has most recent and previous files only have different...
I am somewhat familiar with how to use tar's --listed-incremental flag to take incremental backups. The end result is a backup-0 file that has the first full backup, and then backup-1, backup-2, ..., backup-x with the changes in the order of the backups.
In the past I have used rsync and hard links to make backups where backup-0 is the current state and each backup-x folder has the files that were specific to that backup. Basically what is outlined at http://www.mikerubel.org/computers/rsync_snapshots/ and http://www.admin-magazine.com/Articles/Using-rsync-for-Backups/(offset).
I want to mimic that functionality with tar. I cannot use hard links because the tar files will ultimately be uploaded to a cloud provider that doesn't maintain/understand links. I also want to tar the backups because then I can encrypt them before they are uploaded to the cloud.
So the idea is to have a growing list of files like so:
- backup-0.tar.bz2 - this is the current backup and will be the biggest because it is a full backup
- backup-1.tar.bz2 - this is yesterday's backup, but it will only have the files that are different from what is in current (backup-0.tar.bz2)
- backup-2.tar.bz2 - this is the backup from two days ago, but it will only have the files that are different from yesterday (backup-1.tar.bz2)
- backup-3.tar.bz2 - ...
- backup-4.tar.bz2 - ...
- backup-5.tar.bz2 - ...
If that doesn't make sense, hopefully this will.
First time:
$ touch /tmp/file1
$ touch /tmp/file2
- make backup-0.tar.bz2
At this point backup-0.tar.bz2 has /tmp/file1 and /tmp/file2.
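With GNU tar, the first-time "make backup-0.tar.bz2" step is just a plain full archive, for example:

```shell
# First-time full backup of the two example files.
touch /tmp/file1 /tmp/file2
tar -cjf backup-0.tar.bz2 -C /tmp file1 file2
tar -tjf backup-0.tar.bz2    # lists file1 and file2
```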
Second time:
$ touch /tmp/file3
$ rm /tmp/file2
- ..do the magic
At this point:
- backup-0.tar.bz2 has /tmp/file1 and /tmp/file3
- backup-1.tar.bz2 has /tmp/file2; it doesn't have file1 because it didn't change, so it's in backup-0.tar.bz2
Third time:
$ touch /tmp/file1
$ touch /tmp/file4
- ..do the magic
At this point:
- backup-0.tar.bz2 has /tmp/file1, /tmp/file3, and /tmp/file4
- backup-1.tar.bz2 has /tmp/file1 because it was changed
- backup-2.tar.bz2 has /tmp/file2
Like so:
| | first time | second time | third time |
|-------|------------|-------------|-------------------------|
| file1 | backup-0 | backup-0 | backup-0 and backup-1 |
| file2 | backup-0 | backup-1 | backup-2 |
| file3 | | backup-0 | backup-0 |
| file4 | | | backup-0 |
I figured this is one way to approach it but it seems horribly inefficient to me. Maybe there are features/flags I can use that would make this more efficient.
- first time:
  - take backup-0
- second time:
  - rename backup-0 to backup-1
  - take backup-0
  - remove everything from backup-1 that matches backup-0
- third time:
  - rename backup-1 to backup-2
  - rename backup-0 to backup-1
  - take backup-0
  - remove everything from backup-1 that matches backup-0
- fourth time:
  - rename backup-2 to backup-3
  - rename backup-1 to backup-2
  - rename backup-0 to backup-1
  - take backup-0
  - remove everything from backup-1 that matches backup-0
I feel like it's that last step (remove everything from backup-1 that matches backup-0) that is inefficient.
My question is: how can I do this? If I use tar's --listed-incremental it'll do the reverse of what I am trying.
linux backup tar
How to do this? If I use tar's --listed-incremental it'll do the reverse of what I am trying.
– IMTheNachoMan Nov 17 at 5:52
1 Answer
If I use tar's --listed-incremental it'll do the reverse of what I am trying.
It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:
- Rename backup-N to backup-(N+1), looping from Nmax down to 0.
- Restore the full backup (now backup-1) to a temporary directory.
- Create backup-0 from the current data with a new snapshot file.
- Remove backup-1 (the previous full backup).
- Treat the temporary directory as a "new" version. Create backup-1 as an incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with the current data to the temporary one, so relative paths stay the same.)
You may wonder if this will keep the old (kept) backup-N files coherent with the new ones. A reasonable doubt, since the manual says:

-g, --listed-incremental=FILE
Handle new GNU-format incremental backups. FILE is the name of a snapshot file, where tar stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. If FILE does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level 0 dump). To create incremental archives of non-zero level N, create a copy of the snapshot file created during the level N-1, and use it as FILE.
So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild the backup-N files every time you perform a full backup. But then:

When listing or extracting, the actual contents of FILE is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use /dev/null in its place.
This means if you extract backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter if this state is obtained from a full or an incremental backup; the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M file based on a full backup (as you will do: every backup-M will start as backup-1, where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).
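Concretely, "going back in time" then means extracting in increasing sequence; with GNU tar that might look like this (archive names as in the question, restore/ a scratch directory):

```shell
# Replay archives newest-full-first to roll back to an older state.
# Per the manual, -g /dev/null satisfies --listed-incremental syntactically;
# incremental mode also lets tar apply any recorded member deletions.
mkdir -p restore
tar -xjf backup-0.tar.bz2 -g /dev/null -C restore
tar -xjf backup-1.tar.bz2 -g /dev/null -C restore
# ...continue with backup-2, backup-3, ... as far back as needed.
```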
I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1, and upload a full new backup-0 every time. If your data is huge then uploading a full backup every time will be a pain.
For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup a few times:

rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.
Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.
Thanks. This could work, but it would require a lot of space to do the full extract to the temp directory. I have an idea to do what I want and am working on a script: 1) dump an inventory of the files to back up, including mod time and size; 2) archive the files, including the inventory file. Then later: 1) extract the inventory file from the archive; 2) take a new inventory; 3) compare the two files; 4) extract the differing files and put them in a new archive.
– IMTheNachoMan Nov 23 at 22:12
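The inventory idea from this comment could start from something like the following sketch (the file names and tab-separated format are hypothetical; %T@ mtime and %s size are GNU find -printf directives):

```shell
# Hypothetical inventory: relative path, mtime (epoch), size for every file.
find /data -type f -printf '%P\t%T@\t%s\n' | sort > inventory.txt

# Later: take a new inventory and keep lines that appear only in it, i.e.
# files that are new or whose mtime/size changed.
find /data -type f -printf '%P\t%T@\t%s\n' | sort > inventory-new.txt
comm -13 inventory.txt inventory-new.txt | cut -f1 > changed-files.txt

# Archive only the changed files (paths relative to /data).
tar -cjf backup-new.tar.bz2 -C /data -T changed-files.txt
```

Note this detects changes only via mtime and size, so a modification that preserves both would be missed.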
add a comment |
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
0
down vote
If I use
tar
's--listed-incremental
it'll do the reverse of what I am trying.
It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:
- Rename
backup-N
tobackup-(N+1)
looping from Nmax down to 0. - Restore full backup (now
backup-1
) to a temporary directory. - Create
backup-0
from the current data with a new snapshot file. - Remove
backup-1
(previous full backup). - Treat the temporary directory as a "new" version. Create
backup-1
as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).
You may wonder if this will keep the old (kept) backup-N
files coherent with the new ones. A reasonable doubt, since the manual says:
-g
,--listed-incremental=FILE
Handle new GNU-format incremental backups.FILE
is the name of a snapshot file, wheretar
stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. IfFILE
does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level0
dump). To create incremental archives of non-zero levelN
, create a copy of the snapshot file created during the levelN-1
, and use it asFILE
.
So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N
files every time you perform a full backup. But then:
When listing or extracting, the actual contents of
FILE
is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use/dev/null
in its place.
This means if you extract backup-N
files in increasing sequence to get a state from some time ago, any backup-M
file (M>0) only expects a valid M-1
state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M
file based on a full backup (as you will do, every backup-M
will start as backup-1
where backup-0
is a full backup) or based on a chain of incremental backups (as the manual suggests).
I understand your point is to keep backup-0
as an up-to-date full backup and to be able to "go back in time" with backup-0
, backup-1
, backup-2
, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1
and upload a full new backup-0
every time. If your data is huge then uploading a full backup every time will be a pain.
For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup
few times:
rdiff-backup
backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.rdiff-backup
also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also,rdiff-backup
can operate in a bandwidth efficient manner over a pipe, likersync
.
Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.
Thanks. This could work but it would require a lot of space to do the full extract to the temp directory. I have an idea to do what I want and am working on a script. 1) dump inventory of files to backup including mod time and size 2) archive files, including inventory files then later 1) extract inventory file from archive 2) take new inventory file 3) compare two files 4) extract different files and put in new archive.
– IMTheNachoMan
Nov 23 at 22:12
add a comment |
up vote
0
down vote
If I use
tar
's--listed-incremental
it'll do the reverse of what I am trying.
It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:
- Rename
backup-N
tobackup-(N+1)
looping from Nmax down to 0. - Restore full backup (now
backup-1
) to a temporary directory. - Create
backup-0
from the current data with a new snapshot file. - Remove
backup-1
(previous full backup). - Treat the temporary directory as a "new" version. Create
backup-1
as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).
You may wonder if this will keep the old (kept) backup-N
files coherent with the new ones. A reasonable doubt, since the manual says:
-g
,--listed-incremental=FILE
Handle new GNU-format incremental backups.FILE
is the name of a snapshot file, wheretar
stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. IfFILE
does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level0
dump). To create incremental archives of non-zero levelN
, create a copy of the snapshot file created during the levelN-1
, and use it asFILE
.
So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N
files every time you perform a full backup. But then:
When listing or extracting, the actual contents of
FILE
is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use/dev/null
in its place.
This means if you extract backup-N
files in increasing sequence to get a state from some time ago, any backup-M
file (M>0) only expects a valid M-1
state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M
file based on a full backup (as you will do, every backup-M
will start as backup-1
where backup-0
is a full backup) or based on a chain of incremental backups (as the manual suggests).
I understand your point is to keep backup-0
as an up-to-date full backup and to be able to "go back in time" with backup-0
, backup-1
, backup-2
, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1
and upload a full new backup-0
every time. If your data is huge then uploading a full backup every time will be a pain.
For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup
few times:
rdiff-backup
backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.rdiff-backup
also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also,rdiff-backup
can operate in a bandwidth efficient manner over a pipe, likersync
.
Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.
Thanks. This could work but it would require a lot of space to do the full extract to the temp directory. I have an idea to do what I want and am working on a script. 1) dump inventory of files to backup including mod time and size 2) archive files, including inventory files then later 1) extract inventory file from archive 2) take new inventory file 3) compare two files 4) extract different files and put in new archive.
– IMTheNachoMan
Nov 23 at 22:12
add a comment |
up vote
0
down vote
up vote
0
down vote
If I use
tar
's--listed-incremental
it'll do the reverse of what I am trying.
It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:
- Rename
backup-N
tobackup-(N+1)
looping from Nmax down to 0. - Restore full backup (now
backup-1
) to a temporary directory. - Create
backup-0
from the current data with a new snapshot file. - Remove
backup-1
(previous full backup). - Treat the temporary directory as a "new" version. Create
backup-1
as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).
You may wonder if this will keep the old (kept) backup-N
files coherent with the new ones. A reasonable doubt, since the manual says:
-g
,--listed-incremental=FILE
Handle new GNU-format incremental backups.FILE
is the name of a snapshot file, wheretar
stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. IfFILE
does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level0
dump). To create incremental archives of non-zero levelN
, create a copy of the snapshot file created during the levelN-1
, and use it asFILE
.
So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N
files every time you perform a full backup. But then:
When listing or extracting, the actual contents of
FILE
is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use/dev/null
in its place.
This means if you extract backup-N
files in increasing sequence to get a state from some time ago, any backup-M
file (M>0) only expects a valid M-1
state to exist. It doesn't matter if this state is obtained from a full or incremental backup, the point is these states should be identical anyway. So it shouldn't matter if you created the backup-M
file based on a full backup (as you will do, every backup-M
will start as backup-1
where backup-0
is a full backup) or based on a chain of incremental backups (as the manual suggests).
I understand your point is to keep backup-0
as an up-to-date full backup and to be able to "go back in time" with backup-0
, backup-1
, backup-2
, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1
and upload a full new backup-0
every time. If your data is huge then uploading a full backup every time will be a pain.
For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup
few times:
rdiff-backup
backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup.rdiff-backup
also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also,rdiff-backup
can operate in a bandwidth efficient manner over a pipe, likersync
.
Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.
If I use
tar
's--listed-incremental
it'll do the reverse of what I am trying.
It's good you realize this. I can see upsides and downsides of either direction (I won't discuss them here). Technically it's possible to reverse the process:
- Rename
backup-N
tobackup-(N+1)
looping from Nmax down to 0. - Restore full backup (now
backup-1
) to a temporary directory. - Create
backup-0
from the current data with a new snapshot file. - Remove
backup-1
(previous full backup). - Treat the temporary directory as a "new" version. Create
backup-1
as incremental backup, providing the snapshot file from the previous step. (Note you need to change your working directory from the one with current data to the temporary one, so relative paths stay the same).
You may wonder if this will keep the old (kept) backup-N
files coherent with the new ones. A reasonable doubt, since the manual says:
-g
,--listed-incremental=FILE
Handle new GNU-format incremental backups.FILE
is the name of a snapshot file, wheretar
stores additional information which is used to decide which files changed since the previous incremental dump and, consequently, must be dumped again. IfFILE
does not exist when creating an archive, it will be created and all files will be added to the resulting archive (the level0
dump). To create incremental archives of non-zero levelN
, create a copy of the snapshot file created during the levelN-1
, and use it asFILE
.
So it suggests the snapshot file should be updated all the way from the full backup, as if you would need to rebuild backup-N
files every time you perform a full backup. But then:
When listing or extracting, the actual contents of
FILE
is not inspected, it is needed only due to syntactical requirements. It is therefore common practice to use/dev/null
in its place.
This means if you extract backup-N files in increasing sequence to get a state from some time ago, any backup-M file (M>0) only expects a valid M-1 state to exist. It doesn't matter whether this state is obtained from a full or an incremental backup; the point is these states should be identical anyway. So it shouldn't matter whether you created the backup-M file based on a full backup (as you will do: every backup-M will start as backup-1, where backup-0 is a full backup) or based on a chain of incremental backups (as the manual suggests).
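The /dev/null point can be demonstrated end to end. This small sketch (all paths and file names invented) creates a level-0 and a level-1 archive, then restores them in sequence passing /dev/null as the snapshot file:

```shell
#!/bin/sh
# Demonstrates the quoted manual point: when *extracting*, the snapshot
# file's contents are ignored, so /dev/null can be passed to -g.
# Paths and file names are invented for this example.
set -e
WORK=$(mktemp -d)
mkdir -p "$WORK/data" "$WORK/restore"
echo v1 > "$WORK/data/file1"

# Level-0 dump, then a change, then a level-1 dump (reusing the
# snapshot file in place, which tar updates after each run).
tar -cjf "$WORK/full.tar.bz2" -g "$WORK/snap" -C "$WORK/data" .
sleep 1
echo v2 > "$WORK/data/file1"
tar -cjf "$WORK/incr.tar.bz2" -g "$WORK/snap" -C "$WORK/data" .

# Restore: extract the archives in increasing sequence, with /dev/null
# standing in for the snapshot file.
tar -xjf "$WORK/full.tar.bz2" -g /dev/null -C "$WORK/restore"
tar -xjf "$WORK/incr.tar.bz2" -g /dev/null -C "$WORK/restore"
```

After the second extraction the restore directory holds the newer state, including deletions recorded in the incremental archive.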
I understand your point is to keep backup-0 as an up-to-date full backup and to be able to "go back in time" with backup-0, backup-1, backup-2, … If you want to keep these files in a "dumb" cloud service, you'll need to carefully rename them according to the procedure, replace backup-1 and upload a full new backup-0 every time. If your data is huge then uploading a full backup every time will be a pain.
For this reason it's advisable to have a "smart" server that can build the current full backup every time you upload a "past-to-present" incremental backup. I have used rdiff-backup a few times:
rdiff-backup backs up one directory to another, possibly over a network. The target directory ends up a copy of the source directory, but extra reverse diffs are stored in a special subdirectory of that target directory, so you can still recover files lost some time ago. The idea is to combine the best features of a mirror and an incremental backup. rdiff-backup also preserves subdirectories, hard links, dev files, permissions, uid/gid ownership, modification times, extended attributes, acls, and resource forks. Also, rdiff-backup can operate in a bandwidth efficient manner over a pipe, like rsync.
Please note the software hasn't been updated since 2009. I don't know if it's a good recommendation nowadays.
answered Nov 17 at 7:26
Kamil Maciorowski
Thanks. This could work, but it would require a lot of space to do the full extract to the temp directory. I have an idea to do what I want and am working on a script. First: 1) dump an inventory of the files to back up, including mod time and size; 2) archive the files, including the inventory file. Then later: 1) extract the inventory file from the archive; 2) take a new inventory; 3) compare the two files; 4) extract the differing files and put them in a new archive.
– IMTheNachoMan
Nov 23 at 22:12
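For what it's worth, the inventory idea from the comment could be sketched like this. The inventory format, the stat invocation and all file names are my own assumptions, not from the comment:

```shell
#!/bin/sh
# Sketch of the comment's idea: record path, mtime and size per file,
# then diff two inventories to find new/changed files. GNU stat assumed.
set -e
WORK=$(mktemp -d)
mkdir -p "$WORK/data"
echo a > "$WORK/data/file1"; echo b > "$WORK/data/file2"

inventory() {  # print "path mtime size" for every file under $1, sorted
  (cd "$1" && find . -type f -exec stat -c '%n %Y %s' {} \; | sort)
}

inventory "$WORK/data" > "$WORK/inv-old"
sleep 1
echo changed > "$WORK/data/file2"   # one file changes between backups

inventory "$WORK/data" > "$WORK/inv-new"

# Lines present only in the new inventory correspond to new or changed
# files; these are the candidates for the next incremental archive.
comm -13 "$WORK/inv-old" "$WORK/inv-new" | cut -d' ' -f1 > "$WORK/changed"
```

Deleted files would show up the other way round (comm -23), which the script would also need to record so a restore can remove them.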
How to do this? If I use tar's --listed-incremental it'll do the reverse of what I am trying.
– IMTheNachoMan
Nov 17 at 5:52