
I’ve mentioned a few times lately that I’m working on my backup plan for GNU/Linux. I started by looking at great free software tools like Samba’s rsync and GNU Tar, and I don’t think I need to look much further than them. There is also GNU Cpio, which I haven’t really investigated yet.

I may have more to say later about my rsync and tar adventures, but for today here’s something I came up with to emulate a feature of a tool I had in Windows, one I couldn’t find an equivalent for among the existing GNU tools. The xcopy DOS command lets you recursively copy files modified on or after a certain date by using the /D:date option.

your questions:

Why would you want to do this? Well, one reason would be for offsite backups. I regularly store backup discs in a bank safe deposit box. I then occasionally want to copy changed files from the last dropoff, encrypt them, and send to my gmail account or carry around on a USB thumb drive. I’d like to have a lightweight way to do this that didn’t rely on a system of incremental backups.

Doesn’t tar take care of this with the --after-date option? Yes, but I haven’t been able to figure out how to get it to exclude directories that are empty of files after the date. The only mention of “prune” in nearly one megabyte of documentation is in reference to the fruit. (The documentation is thorough and good; I wouldn’t be surprised if what I want is possible and I just missed it.)

If I have a hierarchy with hundreds of folders and only a handful of files have changed, I don’t want to store all of the empty directories in my tar file. (Nor do I want to extract them later if it becomes necessary.) It’s not that big of a deal, but I became motivated by the challenge of figuring out a way to do this, and kept scratching this (very minor) itch.

Well, why don’t you just use [some easy/obvious method]? Now you tell me! I’d be very interested to learn about built-in ways to accomplish what I’m trying to do, whether it’s copying files to a target directory or directly in to an archive file. I might feel silly if it’s really obvious or something I should have found in the documentation or my searches, but I wouldn’t regret the time I’ve spent cobbling together these scripts. I learned a lot about the tools and about bash scripting.

It’s often best for me to struggle with things to get the lessons embedded in my pulpy grey matter. In previous entries I’ve railed against the free software migration and learning process because my time is in such short supply, but I’m getting past that. It will take as long as it takes.

And I’m enjoying the learning. I’ve always known the Unix command line is a powerful tool, and learning how to make it go is fun for me. You can find a lot of information about shell scripting online, of course, but I also recommend the O’Reilly book Learning the bash Shell. It’s a good introduction and reference.

Ok, let’s get to it. Read below the fold for information about the cpafter.sh script and to download it if you think it will make your life complete…

cpafter.sh administrative preliminary

First things first, keep in mind my site disclaimer and the notice at the top of the script files. Run these scripts at your own risk. They are pretty simple and you can use your own judgment about how safe they are to run. (I think they are as safe and as risky as the cp command when used with the “recursive” and “force” options.)

I considered releasing these in to the public domain, since there isn’t much to them, but since one of my motivations in starting this site was to promote free software, I decided to release them under the GNU General Public License as a way of proudly stating my belief in the GNU philosophy. It may seem like a lot of thought and extra lines (for the license notification) to put in to a couple of clumsy shell scripts, but it makes me feel good to counter all the anti-innovative (antovative?!) proprietary claims out there with the statement that here is something free for anybody to use and copy.

question

How are we going to recursively copy files that have been modified after a certain date?

answer

We’ll use the find, mkdir, and cp commands as our building blocks.

find -daystart -mtime n

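Find’s -mtime n test matches files whose age, counted in whole 24-hour periods with any fraction dropped, is exactly n. Writing -n instead means “fewer than n periods” and +n means “more than n.” You can read the man page to see how the -mtime option works, but how about if we try some examples? I do much better with examples.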

Let’s say it’s 3pm today and you have three files:

  1. file_today_2pm
  2. file_yesterday_4pm
  3. file_yesterday_2pm

If you run find . -type f -mtime 0, you’ll find files #1 and #2.

If you run find . -type f -mtime 1, you’ll find file #3.

This is because -mtime causes find to look at things in discrete 24-hour time periods, starting from right now. At 3pm, files #1 and #2 are in the first time period (counting backwards), and #3 is in the next.

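If you wanted to see all files modified in the first two time periods (counting backwards!), you would use find . -type f -mtime -2, which would return files 1-3. The minus sign means “fewer than,” so -mtime -1 would match only the first period (files #1 and #2).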
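The time periods are easier to see with files of known age. Here is a minimal sketch that stands in for the 3pm example by creating files that are 1, 23, and 25 hours old and running find without -daystart. It assumes GNU touch (for -d) and GNU find (for -printf); demo_dir and the file names are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory; nothing here touches real data.
demo_dir=$(mktemp -d)
touch -d '1 hour ago'   "$demo_dir/file_1h_old"    # plays the role of file #1
touch -d '23 hours ago' "$demo_dir/file_23h_old"   # file #2
touch -d '25 hours ago' "$demo_dir/file_25h_old"   # file #3

# Age is counted in whole 24-hour periods, with fractions dropped.
period_0=$(find "$demo_dir" -type f -mtime 0  -printf '%f\n' | sort)
period_1=$(find "$demo_dir" -type f -mtime 1  -printf '%f\n' | sort)
under_2=$(find "$demo_dir" -type f -mtime -2 -printf '%f\n' | sort)

printf -- '-mtime 0:\n%s\n\n-mtime 1:\n%s\n\n-mtime -2:\n%s\n' \
	"$period_0" "$period_1" "$under_2"
rm -rf "$demo_dir"
```

The first two files land in period 0, the third in period 1, and -mtime -2 picks up all three.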

To make things cleaner, you might want to use time periods that start at midnight. You’d use the -daystart option for this, and:

find . -type f -daystart -mtime 0 would get you file #1.

find . -type f -daystart -mtime 1 would return files #2 and #3.

The cumulative “minus” modifier behaves differently than what I’d expect:

find . -type f -daystart -mtime -1 only gives you file #1.

find . -type f -daystart -mtime -2 returns all three files.

Once we find the files we care about, we’ll want to copy them. For this, we can call on the -exec option. I’ve used -exec for years with grep in HP-UX (their grep doesn’t have a recursive option), and now I had the chance to learn some more about it. -exec allows you to call another program to operate on each of the found files. Let’s look at the grep example first, since it’s simple:

find . -type f -exec grep some_string {} \;

This will take each file and tell grep to look for some_string in that file. {} is an argument for grep — it’s the “found” filename. The semi-colon terminates the grep command. (The man page for find says that both the ‘{}’ and ‘;’ may need to be escaped with a \backslash or quoted to protect them from expansion by the shell. In my experience in HP-UX and GNU/Linux, it’s just the semi-colon that needs the backslash.)
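To try -exec without touching real files, here is a small sketch using a scratch directory (demo_dir and the file names are hypothetical). It also shows the + terminator, which find accepts in place of \; to hand many file names to one grep invocation instead of starting grep once per file.

```shell
#!/usr/bin/env bash
# Scratch directory and file names are made up for illustration.
demo_dir=$(mktemp -d)
printf 'line with some_string\n' > "$demo_dir/has_it.txt"
printf 'nothing to see\n'        > "$demo_dir/lacks_it.txt"

# \; ends the command, so grep runs once for each found file.
# -l makes grep print only the names of files that contain a match.
per_file=$(find "$demo_dir" -type f -exec grep -l some_string {} \;)

# + appends as many found names as fit to a single grep command line.
batched=$(find "$demo_dir" -type f -exec grep -l some_string {} +)

echo "$per_file"
echo "$batched"
rm -rf "$demo_dir"
```

Both forms should report the same single file; the batched one just starts fewer processes.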

mkdir and cp

My HP-UX find-and-grep experience suggested the answer for my problem. I could use find -mtime to find my files and then call another command to copy them to the target hierarchy. Since cp (as far as I can tell) doesn’t allow you to force creation of parent directories if needed, I would need a second script to be called via -exec. It would simply make the target directory if necessary and then copy the file over.

logical operation

Let’s say we have:

~/test/
       file1.txt
       file2.txt
       apple/
             green.txt
             red.txt
       jupiter/
               moon/
                    io.txt
                    europa.txt
               red_spot/
                        storm.txt
~/test_target/

And we want to copy files modified in the past couple of days from “test” to “test_target”. Maybe that includes ~/test/apple/red.txt and ~/test/jupiter/moon/europa.txt. cpafter.sh will descend in to ~/test/ so that:

find . -type f -daystart -mtime -2 will give these results:

./apple/red.txt
./jupiter/moon/europa.txt

We’ll pass these via -exec to copy_it.sh along with our target_dir with the understanding that it will build those paths (if necessary) under ~/test_target/ and then copy the files.

let’s look at the scripts (finally)

download

cpafter.tar.gz (14 KB)
contains:

I added some command line option parsing and checks in an attempt to make the program more generally useful and robust, but haven’t spent a lot of time on this since I created it for my own itch and I don’t expect a large audience for this thing. In the spirit of the free and open source development process, I would be happy to entertain any improvements that are offered back to me.

cpafter.sh

Let’s take a look at selected parts of the script and its operation. Here’s the usage/help information for cpafter.sh:

Copies files modified on or after the given date from source dir
to target dir, creating any subdirectories as needed.

	usage: cpafter.sh [-vf] -a after_date_YYYYMMDD -s source_dir -t target_dir
		-v verbose
		-f force target dir creation or copying to non-empty target dir

To manage the parameters, I learned how to use the nice getopts built-in command. Unfortunately, getopts doesn’t handle long option names, but I can live with that for the moment. (There is also the older external command “getopt” that can be made to work with long option names.)

while getopts ":vfa:s:t:" opt
do
	case $opt in
		v  ) verbose=" -v " ;;
		f  ) force="true" ;;
		a  ) after_date=$OPTARG ;;
		s  ) from_dir=$OPTARG ;;
		t  ) to_dir=$OPTARG ;;
		\? ) echo -e "$usage"	# unrecognized option: show help and quit
		     exit 1 ;;
	esac
done
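
The leading colon in the option string puts getopts in its "silent" mode, where a missing argument shows up as a ":" case rather than an error message from the shell. A minimal sketch of how that could be handled, wrapped in a hypothetical parse_args function so it can be tried without the rest of the script (the function name and the messages are my own, not part of cpafter.sh):

```shell
#!/usr/bin/env bash
# parse_args is a hypothetical wrapper, not something from cpafter.sh.
parse_args() {
	local opt
	OPTIND=1	# restart option parsing on every call
	verbose="" force="" after_date="" from_dir="" to_dir=""
	while getopts ":vfa:s:t:" opt
	do
		case $opt in
			v  ) verbose=" -v " ;;
			f  ) force="true" ;;
			a  ) after_date=$OPTARG ;;
			s  ) from_dir=$OPTARG ;;
			t  ) to_dir=$OPTARG ;;
			:  ) echo "option -$OPTARG needs an argument" >&2
			     return 1 ;;
			\? ) echo "unknown option -$OPTARG" >&2
			     return 1 ;;
		esac
	done
}

parse_args -v -a 20070414 -s test -t test_target
echo "after=$after_date from=$from_dir to=$to_dir verbose=[$verbose]"
```

In silent mode OPTARG holds the offending option letter in both error cases, which makes for friendlier messages.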

My bash book goes in to these features, and I also found a lot of web pages explaining them. Try searching for [bash getopt getopts]. This is one of those things that slows you down: trying to learn the right way to do something. But I think it will be a good habit to get in to. Enough said on this since this post is supposed to be about copying files.

It seemed more natural and less ambiguous to pass in a date as an argument rather than the number of days, so the first thing we need to do is convert our YYYYMMDD formatted date in to a number for -mtime. Let’s skip by some error checking to the date stuff:

after_date_epochal=$(date -d "$after_date" +%s)

today=$(date +%Y%m%d)
today_epochal=$(date -d "$today" +%s)

date_dif=$(( (($after_date_epochal - $today_epochal) / 60 / 60 / 24) - 1))

$after_date_epochal is the number of seconds since the epoch (1 Jan 1970) for our “after date.” Then we get midnight of the current day as seconds also, and do some math to find the difference in days for our -mtime number, $date_dif. For a date in the past the result is negative, which is exactly the “minus” form that -mtime wants. That extra “- 1” on the end makes up for the unexpected (to me) extra one that we had to subtract in our example above.
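
The arithmetic is easier to check with fixed dates. This sketch wraps the same calculation in a hypothetical mtime_arg function that takes "today" as a second argument, and pins the time zone to UTC so a daylight saving change can't shorten a day. Both of those are my assumptions for the example; the script itself uses the real current date and local time. GNU date is assumed for -d.

```shell
#!/usr/bin/env bash
# mtime_arg is a hypothetical helper: mtime_arg AFTER_YYYYMMDD TODAY_YYYYMMDD
mtime_arg() {
	local after_s today_s
	after_s=$(TZ=UTC date -d "$1" +%s)   # midnight of the "after date"
	today_s=$(TZ=UTC date -d "$2" +%s)   # midnight of "today"
	# Whole days between the two midnights, minus the extra one for -mtime.
	echo $(( (after_s - today_s) / 60 / 60 / 24 - 1 ))
}

mtime_arg 20070414 20070416   # prints -3
mtime_arg 20070416 20070416   # prints -1
```

With -daystart, -mtime -3 covers today, yesterday, and the day before, so a date two days back is included. A date of today gives -1, which matches only today's files.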

Now for some funny business with our directories:

orig_dir=$PWD

cd "$to_dir"
to_dir=$PWD
cd "$orig_dir"

cd "$from_dir"

I wanted to make certain assumptions in copy_it.sh about the target dir which wouldn’t work if a path with no slashes was used for the target directory, and the above seemed like an easy (if not elegant) way to do it. Then we descend in to the “from dir,” which works whether it is relative or absolute because we first return to the original directory that the script started in.
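
A possibly tidier way to get the same absolute path is to do the cd inside a subshell, so the script's own working directory never changes. This is only a sketch of that alternative, not what cpafter.sh does; abs_dir and the scratch directory are hypothetical names.

```shell
#!/usr/bin/env bash
# abs_dir is a hypothetical helper, not part of cpafter.sh.
abs_dir() {
	# The parentheses run cd in a subshell, so the caller stays where it was.
	# Redirecting cd's output guards against it printing a path when CDPATH is set.
	( cd "$1" >/dev/null && pwd )
}

scratch=$(mktemp -d)
mkdir "$scratch/test_target"

# Resolve a relative name (no slashes) as if we had started in $scratch.
to_dir=$(cd "$scratch" && abs_dir test_target)
echo "$to_dir"
rm -rf "$scratch"
```

If the directory doesn't exist, cd fails and abs_dir returns a non-zero status, which a caller could check before going any further.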

Now! Let’s run our big find command:

find . -type f -daystart -mtime $date_dif -exec copy_it.sh $verbose -s {} -t "$to_dir" \;
find . -type l -daystart -mtime $date_dif -exec copy_it.sh $verbose -s {} -t "$to_dir" \;

I run it twice to look for -type f (regular files) and -type l (symbolic links). I think those are the only things I’d want to use this for. I don’t know of a way to search for both at the same time, which seems like it would be prettier.
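
One approach that appears to cover both in a single pass is to group the two tests with find's -o ("or") operator inside escaped parentheses. A minimal sketch on a scratch directory (demo_dir and the file names are made up, and I haven't wired this in to the real scripts):

```shell
#!/usr/bin/env bash
# Scratch directory and names are hypothetical.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub"
touch "$demo_dir/sub/regular.txt"
ln -s regular.txt "$demo_dir/sub/link.txt"

# \( ... -o ... \) matches regular files OR symbolic links; the backslashes
# keep the shell from interpreting the parentheses itself.
found=$(cd "$demo_dir" && find . \( -type f -o -type l \) -daystart -mtime -1 | sort)
echo "$found"
rm -rf "$demo_dir"
```

The directory itself is skipped because it is neither type, so only the file and the link created today are listed.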

copy_it.sh

Here’s the usage/help information for copy_it.sh:

Copies a file to a dir, using the path information from the file to
build a path from the given dir root if necessary. Meant to be used
with cpafter.sh.

	usage: copy_it.sh [-v] -s source_file -t target_directory_root
		-v verbose (just lists the source file)

And here is the meat of the script:

#regex -- does string start with dot slash?
if [[ ! "$from_file" =~ ^\./ ]]; then
	from_file="./$from_file"	#in case only a filename was given
fi

#return $from_file up to (but not including) last slash
add_to_target=${from_file%/*}

if [[ ! -d "$to_dir/$add_to_target" ]]; then

	mkdir -p "$to_dir/$add_to_target"
fi
cp -pdf "$curdir/$from_file" "$to_dir/$from_file"

mkdir -p causes any necessary parent directories to be created. cp -pdf preserves permissions, etc., doesn’t follow symbolic links, and forces a copy if the destination file cannot be opened (removes it and tries again). (Thanks to Trip for pointing out that $to_dir/$add_to_target in the mkdir line needs to be quoted to handle spaces in filenames. I guess I originally tested with spaces in filenames but not dirs.)
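
To see the ${from_file%/*} expansion and the mkdir/cp pair at work in isolation, here is a small sketch that uses scratch directories in place of ~/test and ~/test_target. The src_root and dst_root names are hypothetical, and GNU cp is assumed for the -d option.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for ~/test and ~/test_target.
src_root=$(mktemp -d)
dst_root=$(mktemp -d)
mkdir -p "$src_root/jupiter/moon"
echo "icy" > "$src_root/jupiter/moon/europa.txt"

from_file="./jupiter/moon/europa.txt"

# % removes the shortest match of the pattern /* from the end of the value,
# which drops the last slash and the file name after it.
add_to_target=${from_file%/*}
echo "$add_to_target"                      # ./jupiter/moon

mkdir -p "$dst_root/$add_to_target"        # create missing parents, quietly
cp -pdf "$src_root/$from_file" "$dst_root/$from_file"

copied=$(cat "$dst_root/jupiter/moon/europa.txt")
echo "$copied"                             # icy
rm -rf "$src_root" "$dst_root"
```

Since mkdir -p does nothing when the directory already exists, the -d test around it in the script is a convenience (it lets the verbose mode report new directories) rather than a necessity.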

run it

Using our earlier dir structure, let’s say we tried from our home dir:

cpafter.sh -v -a 20070414 -s test -t test_target

The verbose output would be:

copying from /home/username/test
        to /home/username/test_target

(mkdir) ./apple/
./apple/red.txt
(mkdir) ./jupiter/moon/
./jupiter/moon/europa.txt

And you would have:

~/test_target/apple/red.txt
~/test_target/jupiter/moon/europa.txt

holy cow, where did the time go?

Phew! It really takes a long time to write these things up. I often wonder if I should spend so much time but then it usually turns out someone finds the post useful so I feel like it was time well spent. Thank you for your patronage.

Update, 8 July 2007

I ran in to some trouble with differences in the behavior of the =~ regex matching operator between bash 3.2 (included with my Ubuntu 7.04/Feisty Fawn install) and bash 3.1 (included with Ubuntu 6.10/Edgy Eft, where I originally developed these scripts). I had seen warnings about this change in 3.2 and was able to make one thing compatible between the two versions, but I couldn’t make parentheses capture a match in ${BASH_REMATCH[1]} in version 3.2. I also saw some other strange differences in behavior. In the end, I worked out a different way to do things using the pattern matching operator ${variable%pattern}, and have included that in the updated copy_it.sh script.

While I was at it, I made a change to use the more modern and easier to use $(cmd) syntax for running an external command rather than the backtick method: `cmd`.

And I updated the license to use GPLv3.

The old tar.gz file can be found at cpafter-v01.tar.gz.

Update, 12 August 2007

Trip helpfully pointed out in comments that $to_dir/$add_to_target should be quoted for the mkdir command in the copy_it.sh script so that spaces and other odd characters will be handled correctly. This has been fixed.

Also, Wolfgang Murth (alwag at gmx dot at) very kindly submitted a change for specifying dates differently, by number of days instead of passing in the date as YYYYMMDD. I really appreciate the time and effort, but I’m going to hold off on adding something like that for now. It makes the script more flexible and it would handle copying things older than N days also, but I can’t think of enough scenarios where I would use this feature and I’d prefer to keep things simple if possible. But you can see for yourself: cpafter_alternate_date_option.sh. Thanks, Wolfgang!

The old tar.gz file can be found at cpafter-v02.tar.gz.

Update, 27 September 2007

I created a Google Code Project for these scripts at http://code.google.com/p/bash-cpafter/. I’m not sure if I’ll do much with it. I mostly just wanted to test out the project hosting there. It looks pretty good, offering an SVN repository and issues tracking and other goodies. I like having stuff here, but I may not want to run and maintain these more sophisticated features myself.