Normalizing line endings in git repositories

If you develop your project (happily) in a cave where all other tribe members use the same configuration, you may be oblivious to the raging line endings war that has cursed computing for decades. If another tribe member arrives with a different configuration, the line ending genes may mash up and create stronger files, if you consider mixed line endings genetically strong. Unfortunately, this causes severe insanity in the tribe, so you would like to set out on a path to eradicate the genetic variation and establish the supreme rule of the LF. Or the CRLF. I won't judge your preferences.

Two methods, two end results

Which method you choose depends on what you value most: preserving what is pushed upstream, or preserving the commit history.

If you care about upstream the most, you can follow GitHub's suggestion; the TL;DR version is to create a single commit that fixes all the line endings. This approach suits open-source projects, since you cannot rewrite your project's history without breaking every fork. The downside is that from now on, git blame will attribute every line whose ending changed to that one normalizing commit. For your convenience, the commands are reposted here; they assume the line-ending configuration you want (core.autocrlf or a .gitattributes file) is already in place and that you have no uncommitted changes, since git reset --hard discards them:

<code>git rm --cached -r .
git reset --hard
git add .
git commit -m "Normalize line endings"
</code>
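Before and after the normalizing commit, it can help to see which tracked files still carry CRLF or mixed endings. The following is a minimal sketch, assuming Git 2.8 or newer (for git ls-files --eol); the helper name list_non_lf_files is made up for illustration and is not part of git.

```shell
#!/usr/bin/env bash
# list_non_lf_files: hypothetical helper, not part of git.
# Prints the `git ls-files --eol` rows for tracked files whose index copy (i/)
# or working-tree copy (w/) has CRLF or mixed line endings.
list_non_lf_files() {
    git ls-files --eol |
        grep -E '^i/(crlf|mixed)[[:space:]]|^i/[^[:space:]]+[[:space:]]+w/(crlf|mixed)[[:space:]]'
}
```

Run it from inside the repository. It prints nothing and returns a non-zero status when no tracked file matches.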

If you want to preserve your commit history and have more control over your repo, you can use git filter-branch to replace the line endings in each offending commit. This is usually only practical for private repos, and you need to warn all project members, because everyone will have to re-clone or hard-reset onto the rewritten branches. I urge you to have a repository backup (hosted or local) prior to executing the steps below, since they rewrite history.

<code># filter all branches and run the ~/fix-eol.sh shell script
git filter-branch --tree-filter '~/fix-eol.sh' -- --all
</code>

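<code>#!/usr/bin/env bash
# ~/fix-eol.sh (must be executable: chmod +x ~/fix-eol.sh)
# convert all js, css, html, and txt files from DOS to UNIX line endings
# -print0 / -0 keep file names with spaces intact;
# -r (GNU xargs) skips fromdos when nothing matches
find . -type f -regex '.*\.\(js\|css\|html\|txt\)' -print0 | xargs -0 -r fromdos
</code>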
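If fromdos is not installed on your machine, the conversion itself is simple enough to sketch with sed. This is a hedged stand-in, not the fromdos implementation: the function name strip_cr is hypothetical, and it only removes a carriage return that sits directly before a line feed, leaving any other CR characters alone.

```shell
#!/usr/bin/env bash
# strip_cr FILE...: hypothetical stand-in for fromdos.
# Rewrites each file in place, turning CRLF line endings into LF.
strip_cr() {
    local cr tmp f
    cr=$(printf '\r')                    # a literal carriage return, portable across sed flavors
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        # Delete a CR only at the end of a line, then copy the result back
        # with cat so the original file keeps its permissions.
        sed "s/${cr}\$//" "$f" > "$tmp" && cat "$tmp" > "$f"
        rm -f "$tmp"
    done
}
```

It takes file names as arguments, for example strip_cr index.html style.css.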

Depending on the size of the repository, this command may take quite some time. The factors that determine the run time are:

the number of commits: since each commit may contain files with bad line endings, each one is checked out and processed. The more commits, the slower the process
the number of files that need to be converted: you can run the command over the complete repository, or only on specific folders
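To illustrate the second factor, here is a rough sketch of how the file search could be limited to a few folders. The function name list_eol_targets and the folder names src and docs are assumptions made for the example. It uses -name tests instead of the -regex from the script above, and it skips folders that are missing, since an older commit may predate a folder and a failing tree filter aborts the whole run.

```shell
#!/usr/bin/env bash
# list_eol_targets DIR...: hypothetical helper.
# Prints the NUL-separated js/css/html/txt files found under the given folders.
list_eol_targets() {
    local dir
    for dir in "$@"; do
        # Skip folders that do not exist in the commit being processed.
        [ -d "$dir" ] || continue
        find "$dir" -type f \
            \( -name '*.js' -o -name '*.css' -o -name '*.html' -o -name '*.txt' \) \
            -print0
    done
}

# Possible use inside fix-eol.sh (src and docs are example folder names):
#   list_eol_targets src docs | xargs -0 -r fromdos
```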

To hasten the process considerably, you can use an in-memory file system (the filter-branch docs even recommend this).

<code># do not create the directory yourself: filter-branch creates it and
# aborts if it already exists (unless you pass -f)
# run filter-branch with an in-memory temporary directory on /dev/shm
git filter-branch --tree-filter '~/fix-eol.sh' \
    -d /dev/shm/repo-temp -- --all</code>

That will process the repository significantly faster, provided the in-memory file system is large enough to hold the checked-out tree. The factors above still apply: in my experience, filtering 30,000 commits and about 50 files takes about 1.5 hours with a 4 GB in-memory directory. Your results may vary.
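Since /dev/shm is a Linux convention and is not available everywhere, a small wrapper can pick the temporary location for you. This is only a sketch: the function name pick_filter_tmpdir and the repo-temp-PID naming scheme are made up for illustration.

```shell
#!/usr/bin/env bash
# pick_filter_tmpdir: hypothetical helper.
# Prints a fresh path under /dev/shm when that is a writable directory,
# and falls back to $TMPDIR (or /tmp) otherwise. The path is not created.
pick_filter_tmpdir() {
    local base
    if [ -d /dev/shm ] && [ -w /dev/shm ]; then
        base=/dev/shm
    else
        base=${TMPDIR:-/tmp}
    fi
    # Append the shell's PID so repeated runs get different paths.
    printf '%s/repo-temp-%s\n' "$base" "$$"
}

# Possible use:
#   git filter-branch --tree-filter '~/fix-eol.sh' \
#       -d "$(pick_filter_tmpdir)" -- --all
```

On the fallback path the run is no faster than a default filter-branch, but the same command line works on both kinds of machine.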

Reflection

One nice thing about the second approach is that it can be run after the first: once history is filtered, the commit that normalized the line endings no longer changes anything and becomes obsolete. So you can start with a normalizing commit that preserves the upstream, and if the loss of blame history hinders your development, you can still decide to filter history later.