Git Website Codebase Management

This article explains how I manage my website codebase using Linux and Git. It assumes you have working familiarity with both.

Background

Despite being a longstanding Unix/Linux user, for over a decade I used Dreamweaver running on WinXP to enter raw article text and HTML, clean up source code, and transfer files to and from the site.

When Microsoft sealed the fate of WinXP and I realized Adobe had moved to a subscription model with the singular aim of helping their corporate officers buy more yachts, I knew it was time to move development to Linux.

And here I am. Today I write my content in the Atom editor and manage it with Git. I've used Git for software development projects since 2010 and in 2014 decided to use it for deployment as well.

Importing Codebase into Git

To begin tracking changes to my website codebase with Git, I changed into my top-level source directory, initialized a repository, added all the files in the tree to the staging area, and then committed them.

$ cd dvatp
$ git init
$ git add .
$ git commit -m "Initial checkin"

Routine Development in a Local Repository

I routinely use Git to add and commit changes to the codebase with:

$ cd dvatp
$ git add .
$ git commit -m "Another amazing change"

...where dvatp is the name of the top-level directory that contains my website codebase.

This naturally provides several benefits: the ability to monitor changes in the codebase as I work via:

$ git status

...the ability to monitor my commit history via:

$ git log

...and since Git makes creating and working with branches so easy, I can work on big projects in a development branch created for that purpose while I continue to commit regular (but non-critical) changes directly on the master branch so I can publish them quickly.

The "new feature" development process is as follows:

$ git checkout -b newfeature
# edit some files here
$ git add .
$ git commit
$ git checkout master
$ git merge newfeature

After I'm done with the development branches I generally delete them so I don't need to maintain them and continually merge in the changes made directly on the master branch.

$ git branch -d newfeature

Note that I use either -d or -D depending on how confident I am that I have successfully merged everything from the development branch into the master branch. If there's any question I'll typically use "-d" because Git will refuse to delete the branch and print an error if it contains unmerged changes, whereas "-D" deletes the branch regardless.
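The difference between the two flags is easy to see in a throwaway repository. The following is a minimal sketch, assuming a scratch directory created with mktemp and a made-up "demo" identity; none of it touches the real dvatp tree.

```shell
#!/bin/bash
# Scratch repository in a temporary directory (hypothetical; not the real site).
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Made-up identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q
# Force the branch name to master regardless of any init.defaultBranch setting.
git symbolic-ref HEAD refs/heads/master

echo "home" > index.html
git add .
git commit -q -m "Initial checkin"

# Create a branch holding one commit that is never merged.
git checkout -q -b newfeature
echo "draft" > draft.html
git add .
git commit -q -m "Unmerged work"
git checkout -q master

# -d refuses because newfeature is not fully merged into master.
if git branch -d newfeature 2>/dev/null; then
    safe_delete="deleted"
else
    safe_delete="refused"
fi

# -D deletes the branch regardless of its merge status.
git branch -D newfeature >/dev/null
remaining=$(git branch --list newfeature)

echo "git branch -d: $safe_delete"
echo "branches named newfeature after -D: ${remaining:-none}"
```

Because -D discards the only reference to the unmerged commit, it is worth reaching for only when the work is definitely no longer needed.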

Working with a Remote Repository

I originally created a remote repository simply to serve as a means to back up my code quickly and efficiently. And based on several years of experience I'd say that this works pretty well.

Creating a remote repository first involves creating a bare repository on the remote machine:

$ mkdir /home/doug/git/dvatp.git
$ cd /home/doug/git/dvatp.git
$ git init --bare

This is only done once per repository. Note that by convention I place all of my bare repositories in a subdirectory within my home directory called "git". I do no work in there. Its only purpose is to serve as a store of repositories.

Then on the local machine I change into the working directory of the repository, create a remote (effectively an easy-to-type alias for a specific repository on my remote host) and then push my code to it.

$ cd dvatp
$ git remote add lindvatp ssh://user@host.com:port/home/doug/git/dvatp.git
$ git push lindvatp master

The remote name "lindvatp" is my shorthand convention for describing both hostname (linode) and specific repository name (dvatp). The ":port" directive, incidentally, is only necessary if the remote host runs SSH on a non-standard port for security reasons.
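The whole sequence can be rehearsed on one machine before involving a real server. This is a minimal sketch under some assumptions: the "remote" is just a second directory under a mktemp sandbox, reached by a plain file path instead of an ssh:// URL, and the commit identity is made up. Only the remote name "lindvatp" and the repository name come from the steps above.

```shell
#!/bin/bash
# Everything lives in a throwaway sandbox; the paths are stand-ins.
sandbox=$(mktemp -d)
remote_repo="$sandbox/remote/git/dvatp.git"   # stand-in for the bare repo on the remote host
local_repo="$sandbox/local/dvatp"             # stand-in for the local working repository

# Made-up identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Remote" side: create the bare repository once.
mkdir -p "$remote_repo"
git init -q --bare "$remote_repo"

# Local side: a small repository with one commit on master.
mkdir -p "$local_repo"
cd "$local_repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master   # force the branch name to master
echo "<h1>dvatp</h1>" > index.html
git add .
git commit -q -m "Initial checkin"

# Register the remote under its alias and push master to it.
git remote add lindvatp "$remote_repo"
git push -q lindvatp master

# Both sides should now agree on the tip of master.
local_head=$(git rev-parse master)
remote_head=$(git --git-dir="$remote_repo" rev-parse refs/heads/master)
echo "local:  $local_head"
echo "remote: $remote_head"
```

If the two hashes match, the same commands should carry over to the real host once the file path is swapped for the ssh:// URL.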

Then with each change I make to the codebase I add one more command to push the changes to the remote repository. If I happen to be working in the master branch on the local repository this command looks like this:

$ git push lindvatp master

If I'm working in a development branch called "develop" the command would look like this:

$ git push lindvatp develop

If it's not obvious, the most important thing to do prior to a push is to verify that you're on the correct branch in the local repository or you may wind up doing something you don't intend. To make sure I don't do anything stupid I use a utility called git-aware-prompt that alters my bash prompt when working within a Git repository to display the current branch and a few other things. It looks like this in practice:

 doug@debian ~/dvatp/app/assets/ (master)*$

The name inside the parentheses is the current branch (in this case "master") and the asterisk means uncommitted changes exist somewhere in that repository.
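A prompt helps, but the branch name can also be taken out of the push command altogether. The function below is a hypothetical helper (the name push_current is my own invention, not part of Git or git-aware-prompt) that asks Git which branch is checked out and pushes exactly that branch, refusing to run on a detached HEAD.

```shell
#!/bin/bash
# Hypothetical helper: push whichever branch is currently checked out.
# The remote defaults to "lindvatp"; pass another name as the first argument.
push_current() {
    local remote=${1:-lindvatp}
    local branch
    # symbolic-ref fails when HEAD is detached, i.e. not on any branch.
    if ! branch=$(git symbolic-ref --quiet --short HEAD); then
        echo "Not on a branch (detached HEAD?); refusing to push." >&2
        return 1
    fi
    echo "Pushing $branch to $remote"
    git push "$remote" "$branch"
}
```

Run from inside the working directory, "push_current" would stand in for "git push lindvatp master" or "git push lindvatp develop", and it prints the branch name first so a mistake is at least visible.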

Automatic Deployment From Master Branch

To deploy the code I put a "post-receive" hook script in the remote repository's "hooks" directory so that when I push code to the live server the remote repository automatically checks out the master branch into a working directory that is referenced by the web server (and hence made live). This allows me to use Git not only as a VCS but as a deployment mechanism as well.

Initially the post-receive hook script looked like this:

#!/bin/bash
# this version causes a checkout on all pushes
git --work-tree=/home/doug/www/dvatp \
    --git-dir=/home/doug/git/dvatp.git checkout -f

That worked, of course, but it was far from perfect. Like most experienced users of Git I typically do most of my iterative development in what are called feature branches. These branches, while temporary in nature, contain code I need to back up until the code is merged into a generic "development" branch or the master branch itself. I back up these branches by pushing them as needed to the remote repository.

The problem is that each push to the remote server, regardless of the branch name, causes execution of the post-receive hook script, which would then cause a checkout of the master branch. While I don't normally stage code in the master branch (i.e. if it's there, it's ready to go), the idea that routine development could potentially affect production code did not sit well with me so I looked around for a solution and found it:

#!/bin/bash
# only update live code with a push to master branch
while read oldrev newrev refname
do
    branch=$(git rev-parse --symbolic --abbrev-ref "$refname")
    if [ "master" == "$branch" ]; then
        git --work-tree=/home/doug/www/dvatp \
            --git-dir=/home/doug/git/dvatp.git checkout -f
    fi
done

This version of the script checks whether each ref that just arrived is the master branch, and if so it performs the checkout to deploy the code. This prevents a checkout, and hence any change to the production code running on the live server, unless I'm pushing changes to the master branch.
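The branch filter can be checked end to end without touching the live server. What follows is a rough rehearsal, assuming throwaway directories under mktemp in place of /home/doug/git/dvatp.git and /home/doug/www/dvatp, a file path in place of the ssh:// URL, and a made-up commit identity. It installs the hook, pushes a feature branch, then pushes master, and records what lands in the stand-in web directory each time.

```shell
#!/bin/bash
# Rehearsal in throwaway directories; every path below is a stand-in.
sandbox=$(mktemp -d)
bare="$sandbox/git/dvatp.git"   # stand-in for /home/doug/git/dvatp.git
www="$sandbox/www/dvatp"        # stand-in for /home/doug/www/dvatp
work="$sandbox/dvatp"           # stand-in for the local working repository

# Made-up identity so commits work on a machine with no Git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

mkdir -p "$www"
git init -q --bare "$bare"
# "checkout -f" with no branch name deploys the bare repo's HEAD, so point it at master.
git --git-dir="$bare" symbolic-ref HEAD refs/heads/master

# Install the hook. Git only runs hooks that are executable.
mkdir -p "$bare/hooks"
cat > "$bare/hooks/post-receive" <<EOF
#!/bin/bash
# only update live code with a push to master branch
while read oldrev newrev refname
do
    branch=\$(git rev-parse --symbolic --abbrev-ref "\$refname")
    if [ "master" == "\$branch" ]; then
        git --work-tree="$www" --git-dir="$bare" checkout -f
    fi
done
EOF
chmod +x "$bare/hooks/post-receive"

# Local repository: one commit on master, one more on a feature branch.
git init -q "$work"
cd "$work" || exit 1
git symbolic-ref HEAD refs/heads/master
echo "live" > index.html
git add .
git commit -q -m "Initial checkin"
git remote add lindvatp "$bare"
git checkout -q -b newfeature
echo "draft" > draft.html
git add .
git commit -q -m "Work in progress"

# Pushing the feature branch should leave the web directory untouched.
git push -q lindvatp newfeature
after_feature=$(ls -A "$www")

# Pushing master should deploy master's files and nothing from the feature branch.
git push -q lindvatp master
after_master=$(ls -A "$www")

echo "after feature push: ${after_feature:-<empty>}"
echo "after master push:  $after_master"
```

One detail the rehearsal makes explicit: the hook file has to be executable (chmod +x), otherwise Git skips it and nothing is deployed.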

The post-receive script as shown is apparently not foolproof, however, as I have noticed one case where the codebase was not checked out following a push. This occurred shortly after (or in conjunction with) a push following a merge caused by forgetting to rebase (pull) from the remote repository before committing changes on my local repository. The workaround was simple -- I made an inconsequential change to a file, committed it, and then did another push. I'll come up with a solution for this eventually but for now I have an efficient way to track and deploy my code.
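As a possible alternative to the dummy commit, the same checkout the hook performs can be run by hand on the server. This is only a sketch: the function name redeploy is hypothetical, the default paths are the ones used in the hook above, and it names master explicitly rather than relying on the bare repository's HEAD. It does not address whatever caused the missed checkout in the first place.

```shell
#!/bin/bash
# Hypothetical helper: force the web server's work tree to match the tip of master.
# Arguments (both optional): the bare repository, then the deployment directory.
redeploy() {
    local git_dir=${1:-/home/doug/git/dvatp.git}
    local work_tree=${2:-/home/doug/www/dvatp}
    # -f overwrites files in the work tree that differ from master.
    git --work-tree="$work_tree" --git-dir="$git_dir" checkout -f master
}
```

Run on the remote host as "redeploy" with no arguments, this would refresh the live directory from whatever master currently points to, without adding a throwaway commit to the history.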

Wednesday, April 24, 2024