Purge A File (Or Directory) From A Git Repository

...and remove all traces from the entire history of the repo

Warning: the filter-branch command rewrites history! One side effect of removing a file from a commit is that the commit’s hash changes, along with the hashes of all subsequent commits. If you run this command on a shared branch, make absolutely certain that you understand the consequences before proceeding.

Sounded easy enough. Move a git repo from Server A to Server B. How hard could that be? But then:

remote: warning: Large files detected.
remote: error: File giant_file is 123.00 MB; this exceeds GitHub's file size limit of 100 MB

Well, okay, no problem. giant_file was deleted long ago and only exists in history. I’ll just adiós that badboy forever. How hard could that be? A little googling, and I find quite a few posts that all describe various ways to accomplish the same end - I saw two different pages on github.com alone. (Wtf, right?) So finally I settled on this:

git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch path/to/giant_file' HEAD
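
If you want to poke at that command before aiming it at a real repository, here is a minimal sketch in a throwaway repo. Everything in it is made up for illustration (the demo identity, the tiny stand-in for giant_file), and FILTER_BRANCH_SQUELCH_WARNING is only there to silence the notice that newer versions of Git print. It also shows the point of the warning up top: the tip commit's tree stays the same, but its hash does not.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice (and its delay) in newer Git

# Scratch repository; the identity below is a placeholder.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# Commit 1 adds the big file, commit 2 deletes it again.
echo "docs" > README
echo "pretend this is 123 MB" > giant_file
git add README giant_file
git commit -q -m "add giant_file"
git rm -q giant_file
git commit -q -m "delete giant_file"

old_tip=$(git rev-parse HEAD)
old_tree=$(git rev-parse 'HEAD^{tree}')

# --index-filter runs the quoted command against each commit's index, with no checkout.
# --cached restricts git rm to the index; --ignore-unmatch keeps commits
# that never contained the file from failing the filter.
git filter-branch --index-filter \
  'git rm -r --cached --ignore-unmatch giant_file' HEAD >/dev/null 2>&1

new_tip=$(git rev-parse HEAD)
new_tree=$(git rev-parse 'HEAD^{tree}')
commits_touching_file=$(git rev-list --count HEAD -- giant_file)

echo "old tip: $old_tip"
echo "new tip: $new_tip"
echo "tip tree unchanged: $([ "$old_tree" = "$new_tree" ] && echo yes || echo no)"
echo "commits on HEAD touching giant_file: $commits_touching_file"
```

The second commit never had its contents altered by the filter, yet it gets a new hash because its parent changed. That is the part that bites anyone who has already pulled the old history.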

It worked, but it was sloooooooooooooow. Maybe because my repository is huuuuuuuuuge, but it’s worth stating for the record that this process may not be as fast as, say, git commit -m "I did things". It took twenty minutes to remove the aforementioned badboy. But the command worked as expected and I learned a bit about how filter-branch works. So: yay! Okay, next up: git push. Yadda yadda yadda, and then:

remote: warning: Large files detected.
remote: error: File giant_file is 123.00 MB; this exceeds GitHub's file size limit of 100 MB

Not cool, man! NOT COOL AT ALL!

Well. Take another gander at that last argument in my filter-branch command. Yeahhhhh: HEAD. Can you see where I’m going with this? As the documentation tells us, that final argument is for the “rev list”:

All positive refs included by these options are rewritten. You may also specify options such as --all, but you must use -- to separate them from the git filter-branch options.

All I did was remove giant_file from the branch associated with my current HEAD, in this case master. My white whale was still lurking about elsewhere in history. This time I ran:

git filter-branch --index-filter 'git rm -r --cached --ignore-unmatch path/to/giant_file' -- --all
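
The difference between the two invocations is easy to reproduce. The sketch below is not my real repository: it assumes a scratch repo with a hypothetical second branch, old-feature, that still reaches the commit containing giant_file, and it counts how many commits on any branch still touch the file after each run. The -f is there because a second filter-branch run refuses to start while the backup refs from the first run still exist under refs/original.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice (and its delay) in newer Git

# Scratch repository; the identity and branch name are placeholders.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "docs" > README
echo "pretend this is 123 MB" > giant_file
git add README giant_file
git commit -q -m "add giant_file"
git branch old-feature            # a second branch that still reaches the file
git rm -q giant_file
git commit -q -m "delete giant_file"

filter='git rm -r --cached --ignore-unmatch giant_file'

# First attempt: only the history reachable from HEAD is rewritten.
git filter-branch -f --index-filter "$filter" HEAD >/dev/null 2>&1
after_head=$(git rev-list --count --branches -- giant_file)

# Second attempt: every ref is rewritten.
git filter-branch -f --index-filter "$filter" -- --all >/dev/null 2>&1
after_all=$(git rev-list --count --branches -- giant_file)

echo "commits still touching giant_file after HEAD run:  $after_head"
echo "commits still touching giant_file after --all run: $after_all"
```

Running the same git rev-list --count --branches -- path/to/giant_file check in a real repository before pushing is a cheap way to find out whether the file is still lurking on some branch.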

And then, twenty-some minutes and four foosball games later: git push.

Success! giant_file is now in git heaven and my repo is thriving happily on Server B.
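
One postscript on "all traces". The push only sends reachable history, but the local repository usually still holds the old objects, because filter-branch keeps backup refs under refs/original and the reflog remembers the old commits. The sketch below is a cleanup I did not need for the move itself, shown on a scratch repo with made-up contents: it deletes the backup refs, expires the reflog, and garbage-collects, then checks whether the blob is really gone.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skip the notice (and its delay) in newer Git

# Scratch repository; the identity below is a placeholder.
scratch=$(mktemp -d)
cd "$scratch"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

echo "docs" > README
echo "pretend this is 123 MB" > giant_file
git add README giant_file
git commit -q -m "add giant_file"
blob=$(git rev-parse HEAD:giant_file)    # object id of the file's contents
git rm -q giant_file
git commit -q -m "delete giant_file"

git filter-branch --index-filter \
  'git rm -r --cached --ignore-unmatch giant_file' -- --all >/dev/null 2>&1

# Reports whether the blob is still present in the object database.
blob_exists() { git cat-file -e "$blob" 2>/dev/null && echo yes || echo no; }
before_cleanup=$(blob_exists)

# 1. Drop the backups filter-branch left behind.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do git update-ref -d "$ref"; done
# 2. Forget the old commits recorded in the reflog.
git reflog expire --expire=now --all
# 3. Remove the now-unreachable objects.
git gc --quiet --prune=now

after_cleanup=$(blob_exists)
echo "blob present before cleanup: $before_cleanup"
echo "blob present after cleanup:  $after_cleanup"
```

These three steps throw away the safety net that would let you undo the rewrite, so they are best saved until the rewritten history has been checked.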

- THE END -