My Hugo blog has been hosted on GitHub for the past year. Over that time, Hugo version upgrades and several rounds of refactoring left behind many obsolete, fragmented files, and converting all of the blog's images to WebP generated a large number of junk files as well. As a result, the GitHub repository grew to nearly 1 GB, the size GitHub recommends staying under, so I decided to clean it up today.
Official GitHub Repository Size Limits
It is recommended to keep repositories small, ideally under 1 GB, and strongly recommended to keep them under 5 GB. Smaller repositories clone faster and are easier to use and maintain. If your repository excessively impacts our infrastructure, you may receive an email from GitHub Support asking you to take corrective action. We strive to be flexible, especially for large projects with many collaborators, and will work with you to find a solution whenever possible. You can effectively manage the size and overall health of your repository to prevent it from impacting our infrastructure.
Standard Slimming Methods
1. Clean Up Untracked Files
First, make sure all untracked files are cleaned up. git status lists them, and git clean removes them.
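A minimal sketch of this step, run in a throwaway repository so that nothing real is deleted. The file names (post.md, scratch.tmp, cache/) are made up for the demo; the dry run with -n is worth doing first, because git clean cannot be undone.

```shell
# Throwaway repository so the demo never touches a real project.
demo=$(mktemp -d)
cd "$demo"
git init -q
echo "tracked" > post.md
git add post.md
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add post"

# Create an untracked file and an untracked directory (hypothetical names).
echo "junk" > scratch.tmp
mkdir cache
echo "junk" > cache/thumb.tmp

git status --short   # untracked paths are listed with a ?? prefix
git clean -n -d      # dry run: prints what would be removed, deletes nothing
git clean -f -d      # actually removes untracked files and directories
```

Files matched by .gitignore are left alone unless -x is added, so a plain git clean -f -d will not touch ignored build output.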
2. Remove Unnecessary Large Files
If you have previously committed large files to the repository, they may occupy a significant amount of storage space. You can use git filter-repo to rewrite Git history and remove unnecessary files.
Install git filter-repo:
pip install git-filter-repo # Requires Python to be installed first
Remove large files:
git filter-repo --path <file-path> --invert-paths
For example, to delete a file named largefile.zip:
git filter-repo --path largefile.zip --invert-paths
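Before rewriting history it helps to know which files are actually taking up space. The sketch below is one common way to list the largest blobs in a repository's history using only built-in Git commands. It builds a small demo repository first; the file names and sizes are arbitrary, and the awk step assumes paths without spaces.

```shell
# Throwaway repository with one small and one comparatively large file.
demo=$(mktemp -d)
cd "$demo"
git init -q
echo "hello" > small.txt
head -c 200000 /dev/zero > largefile.zip
git add .
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Add files"

# Walk every object reachable from any ref, ask Git for its type and size,
# keep only blobs (file contents), and sort by size, largest first.
largest=$(git rev-list --objects --all \
  | git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' \
  | awk '$1 == "blob" { print $2, $3 }' \
  | sort -rn \
  | head -n 10)
echo "$largest"
```

Each output line is a size in bytes followed by a path, which can then be passed to git filter-repo --path ... --invert-paths.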
3. Compress the Git Repository
Git provides the git gc command to compress the repository and remove unnecessary objects.
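A sketch of a typical garbage-collection pass, assuming you want unreachable objects removed immediately instead of after Git's default grace period. The demo repository and the orphaned blob are created only to have something to collect.

```shell
# Throwaway repository with one commit.
demo=$(mktemp -d)
cd "$demo"
git init -q
echo "v1" > post.md
git add post.md
git -c user.name=demo -c user.email=demo@example.com commit -q -m "v1"

# Write a blob that no commit references, simulating leftover junk.
orphan=$(echo "orphaned content" | git hash-object -w --stdin)

git count-objects -vH                    # size before cleanup
git reflog expire --expire=now --all     # drop reflog entries that keep old objects alive
git gc --prune=now --aggressive --quiet  # repack and delete unreachable objects now
git count-objects -vH                    # size after cleanup
```

Expiring the reflog first matters after a history rewrite, since reflog entries otherwise keep the old commits reachable and gc will not delete them.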
4. Further Compress the Git Repository
Git provides git repack to pack objects in the repository, which can further optimize storage and performance.
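A sketch of a full repack on a small demo repository. The --depth and --window values here are illustrative assumptions, not tuned recommendations; larger values trade CPU time for a potentially smaller pack.

```shell
# Throwaway repository with a few revisions of the same file.
demo=$(mktemp -d)
cd "$demo"
git init -q
for i in 1 2 3; do
  echo "revision $i" >> post.md
  git add post.md
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "Revision $i"
done

# -a: pack everything into a single pack
# -d: delete redundant packs and loose objects afterwards
# -f: recompute deltas instead of reusing existing ones
git repack -a -d -f --depth=50 --window=250 -q

git count-objects -vH
packs=$(ls .git/objects/pack/*.pack | wc -l)
echo "pack files: $packs"
```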
5. Clean Up the Remote Repository
If you have already deleted unnecessary files and compressed the local repository, you may also need to clean up the remote repository.
Force push the local repository to the remote repository:
git push --force
Prune stale remote-tracking references from the local repository:
git remote prune origin
Summary
By cleaning up untracked files, removing unnecessary large files, compressing the Git repository, and cleaning up the remote repository, you can reduce the size of the Git repository to some extent.
After using the above methods, I found that the repository size reduction was still not significant, only dropping from 935MB to 880MB, which was far from the expected goal. Therefore, I had to take more drastic measures.
Non-Standard Cleanup Methods
For personal blogs, commit history is not particularly useful since it mainly consists of text and image information, and there’s rarely a need to revisit old records. Therefore, it’s sufficient to just clear the remote repository.
Delete and Recreate the Repository
Delete the repository on GitHub and recreate it, then connect your local repository to the new one and push.
This method is suitable for repositories that are not connected to other services. If the repository is connected to services like Vercel, Cloudflare, or other third-party services for deployment, it’s better not to use this method, as re-deploying can be troublesome. Instead, consider the reset method below.
Reset the Original Repository
Back up the local repository, either with a command or by manually copying the necessary folders.
git clone /d/hugo/user /d/hugo/user-backup # Source path, then backup path (placeholder names; use your own)
Create a new folder somewhere, add a random file, and set it up as a Git repository.
cd /d/hugo/new # Your file path
git init
Add the remote repository.
git remote add origin https://github.com/user/user.github.io
Reset the repository.
git add -A
git commit -m "Initial commit"
Force push to the original repository.
git push -f origin master
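To see what the reset does before trying it on a real repository, here is a sketch that runs the same sequence against a local bare repository standing in for GitHub. All paths are temporary and the file names are placeholders.

```shell
work=$(mktemp -d)
# A local bare repository plays the role of the GitHub remote.
git init -q --bare "$work/remote.git"

# "Old" repository with several commits, pushed to the stand-in remote.
git init -q "$work/old"
cd "$work/old"
for i in 1 2 3; do
  echo "post $i" > "post$i.md"
  git add -A
  git -c user.name=demo -c user.email=demo@example.com commit -q -m "Post $i"
done
git remote add origin "$work/remote.git"
git push -q origin HEAD:master

# Fresh repository with a single placeholder file and no history.
mkdir "$work/new"
cd "$work/new"
git init -q
echo "placeholder" > README.md
git remote add origin "$work/remote.git"
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Initial commit"

# Overwrite the remote branch with the single new commit.
git push -q -f origin HEAD:master

commits=$(git --git-dir="$work/remote.git" rev-list --count master)
echo "commits on remote master: $commits"
```

Pushing HEAD:master keeps the sketch independent of the local default branch name, which may be main or master depending on the Git configuration.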
Rebind and Upload to the Original Repository
Find the previously backed-up folder and repeat steps 2-5 above (initialize the repository, add the remote, commit, and force push) from that folder.
Summary
In Hugo, the public and resources folders are the most likely to generate junk files. I used to manage them by deleting them directly whenever I debugged with Hugo commands, but these folders contain thousands of files, which cluttered the Git history.
This time, I created a .gitignore file in the blog's root directory that excludes these two folders, with one entry for each of them.
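A sketch of what this looks like in practice, assuming the two folders were already committed at some point. In that case adding them to .gitignore is not enough on its own, because Git keeps tracking files that are already in the index; git rm --cached removes them from tracking while leaving them on disk. The demo content is made up.

```shell
# Throwaway site with source content and generated folders all tracked.
demo=$(mktemp -d)
cd "$demo"
git init -q
mkdir -p content public resources
echo "post" > content/post.md
echo "<html>" > public/index.html
echo "cache" > resources/img.webp
git add -A
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Track everything"

# Ignore Hugo's generated folders from now on.
printf 'public/\nresources/\n' > .gitignore

# Stop tracking the files that were already committed; they stay on disk.
git rm -r -q --cached public resources
git add .gitignore
git -c user.name=demo -c user.email=demo@example.com commit -q -m "Stop tracking generated folders"

tracked=$(git ls-files)
echo "$tracked"
```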
Tips
If the resources folder is not uploaded, it may result in longer deployment times on GitHub Actions or Vercel, as image conversion needs to be done on the server. However, if there are few images, the impact is minimal.