
How to Slim Down the GitHub Repository for a Hugo Blog

My Hugo blog has been hosted on GitHub for the past year. Over several Hugo upgrades and refactorings, a lot of stale, fragmented files accumulated. On top of that, converting all of the blog's images to WebP produced a large number of junk files. Together these bloated the GitHub repository until it was close to the 1 GB warning line, so today I decided to clean it up.

Official GitHub Repository Size Limits

It is recommended to keep repositories small, ideally under 1 GB, and strongly recommended to keep them under 5 GB. Smaller repositories clone faster and are easier to use and maintain. If your repository excessively impacts our infrastructure, you may receive an email from GitHub Support asking you to take corrective action. We strive to be flexible, especially for large projects with many collaborators, and will work with you to find a solution whenever possible. You can prevent your repository from impacting our infrastructure by effectively managing its size and overall health.
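To see how close a local clone is to these limits, and to compare sizes before and after each step below, git count-objects -vH reports the size of the object database. The small wrapper below keeps only the most useful lines; the function name repo_size is my own and purely illustrative.

```shell
# repo_size: hypothetical helper name; run it inside any Git working copy.
# Shows the loose-object count and size plus the packed size (size-pack),
# which roughly tracks what a fresh clone downloads. -H = human-readable units.
repo_size() {
  git count-objects -vH | grep -E '^(count|size|size-pack|size-garbage):'
}
```

The size GitHub displays is calculated on GitHub's side, so it may not match the local figure exactly.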


Standard Slimming Methods

1. Clean Up Untracked Files

First, clean up any untracked files. The following commands preview which files would be removed and then delete them:

git clean -n  # View which files will be deleted
git clean -f  # Delete untracked files
git clean -fd # Delete untracked files and directories

2. Remove Unnecessary Large Files

If you have previously committed large files to the repository, they can take up a significant amount of storage. Deleting them in a new commit is not enough, because every earlier commit still contains them. git filter-repo can rewrite the Git history to remove such files entirely.
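Before rewriting anything, it helps to know which files actually take up the space. A common recipe lists every blob reachable from any ref together with its size and path. Wrapped in a hypothetical helper called largest_blobs, it might look like the sketch below; it relies only on git rev-list and git cat-file, so it works without git filter-repo installed.

```shell
# largest_blobs [N]: hypothetical helper name. Prints the N largest blobs
# (default 10) anywhere in the history of all refs, biggest first, as
# "<bytes> <path>". awk keeps blob lines and strips the 5-char "blob " prefix.
largest_blobs() {
  local n="${1:-10}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
    awk '$1 == "blob" { print substr($0, 6) }' |
    sort -k1,1nr |
    head -n "$n"
}
```

For example, largest_blobs 20 shows the top twenty; any path it reports can then be passed to git filter-repo --path <file-path> --invert-paths. Files that were deleted long ago still show up, because their blobs are still in history.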

  1. Install git filter-repo:

    pip install git-filter-repo # Requires Python to be installed first
    
  2. Remove large files:

    git filter-repo --path <file-path> --invert-paths
    

    For example, to delete a file named largefile.zip:

    git filter-repo --path largefile.zip --invert-paths
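For a Hugo blog specifically, the same --path and --invert-paths options can drop the generated public and resources folders from every commit in one pass, since --path may be given several times and accepts directories. A sketch, wrapped in a hypothetical helper named strip_hugo_output_history; any extra arguments are passed straight through to git filter-repo.

```shell
# strip_hugo_output_history: hypothetical helper name.
# Rewrites ALL history so that public/ and resources/ never existed.
# The rewrite cannot be undone: work in a fresh clone and keep a backup.
# git filter-repo refuses to run in a non-fresh clone unless --force is given.
strip_hugo_output_history() {
  git filter-repo --path public/ --path resources/ --invert-paths "$@"
}
```

git filter-repo also offers --strip-blobs-bigger-than (for example --strip-blobs-bigger-than 10M), which drops every blob above a size threshold regardless of its path.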
    

3. Compress the Git Repository

The git gc command compresses the repository and removes unreachable objects.

git gc --prune=now --aggressive

4. Further Compress the Git Repository

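git repack packs the objects in the repository, which can further optimize storage and performance. Here -a packs everything into a single pack, -d deletes the packs that became redundant, and -f recomputes deltas instead of reusing existing ones.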

git repack -a -d -f
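One caveat with the two commands above: git gc --prune=now only deletes objects that nothing references, and reflog entries count as references. Objects from reset or rewritten history can therefore survive until their reflog entries expire (by default after 90 days for reachable entries and 30 days for unreachable ones). A sketch that expires the reflog first, under the hypothetical name deep_gc:

```shell
# deep_gc: hypothetical helper name.
# WARNING: expiring the reflog removes your local safety net for undoing
# resets, rebases and history rewrites. Make a backup first.
# Step 1 drops every reflog entry of every ref immediately; step 2 repacks
# aggressively and deletes the objects that are now unreachable.
deep_gc() {
  git reflog expire --expire=now --all &&
    git gc --prune=now --aggressive
}
```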

5. Clean Up the Remote Repository

Once the local history has been cleaned and compressed, the remote repository still holds the old history, so it needs to be updated as well.

  1. Force push the local repository to the remote repository:

    git push --force
    
  2. Prune stale remote-tracking references (this only tidies your local clone; it does not remove anything on GitHub):

    git remote prune origin
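Two details are easy to miss when pushing rewritten history. First, git filter-repo removes the origin remote after a rewrite, as a guard against accidental pushes, so it has to be added back. Second, a bare git push --force only updates a single branch. The sketch below re-adds the remote when it is missing and then overwrites every branch and tag; the function name push_rewritten is hypothetical, and the URL argument is yours to supply.

```shell
# push_rewritten URL: hypothetical helper name; URL is your repository URL.
# Re-adds "origin" if it is missing (git filter-repo removes it), then
# force-pushes all local branches and all tags to it.
push_rewritten() {
  local remote_url="$1"
  if ! git remote get-url origin >/dev/null 2>&1; then
    git remote add origin "$remote_url" || return 1
  fi
  git push --force --all origin &&
    git push --force --tags origin
}
```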
    

Summary

By cleaning up untracked files, removing unnecessary large files, compressing the Git repository, and cleaning up the remote repository, you can reduce the size of the Git repository to some extent.

After applying the methods above, the repository only shrank from 935 MB to 880 MB, far short of what I was aiming for, so I had to take more drastic measures.


Non-Standard Cleanup Methods

For a personal blog, commit history is of little value: it consists mostly of text and image changes, and there is rarely a reason to revisit old versions. Discarding the history on the remote repository is therefore good enough.

Delete and Recreate the Repository

Delete the repository on GitHub, create it again, then link the local project to the new repository and upload it.

This method suits repositories that are not connected to other services. If the repository is hooked up to Vercel, Cloudflare, or another third-party deployment service, avoid it, because deleting the repository breaks those connections and setting up the deployment again is tedious. Use the reset method below instead.

Reset the Original Repository

  1. Back up the local repository, either using commands or by manually copying the necessary folders.

     git clone /d/hugo/user /d/hugo/user-backup # Your file paths; the destination must differ from the source
    
  2. Create a new folder somewhere, add a random file, and set it up as a Git repository.

     mkdir -p /d/hugo/new && cd /d/hugo/new # Your file path
     echo "placeholder" > README.md         # Any placeholder file will do
     git init
    
  3. Add the remote repository.

     git remote add origin https://github.com/user/user.github.io # Your repository URL
    
  4. Commit everything as a single initial commit.

     git add -A
     git commit -m "Initial commit"
    
  5. Force push to the original repository.

     git push -f origin master # Use your actual branch name, e.g. main
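If you would rather not juggle a second folder, a commonly used alternative reaches the same single-commit history inside the existing working copy by way of an orphan branch, a branch that starts with no parent commits. A sketch, with the hypothetical helper name squash_to_single_commit and a temporary branch name, fresh-start, that is also my own choice:

```shell
# squash_to_single_commit [BRANCH]: hypothetical helper name.
# Replaces BRANCH (default: the current branch) with one commit holding the
# current working tree; files matched by .gitignore are left out.
# Steps: start a parentless branch (index and working tree are kept), empty
# the index so git add -A re-adds files according to .gitignore, commit,
# delete the old branch, and give the new branch the old name.
squash_to_single_commit() {
  local branch="${1:-$(git symbolic-ref --short HEAD)}"
  git checkout --orphan fresh-start &&
    git rm -r --cached -q . &&
    git add -A &&
    git commit -m "Initial commit" &&
    git branch -D "$branch" &&
    git branch -m "$branch"
}
```

Afterwards, force-push as in step 5. The old commits stay in the local object store until their reflog entries expire and git gc removes them.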
    

Rebind and Upload to the Original Repository

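Go to the previously backed-up folder and repeat steps 2-5 above, this time committing the real blog files instead of a placeholder. If the backup was made with git clone, delete its .git folder first; otherwise the old history is pushed right back.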

Summary

In Hugo, the public and resources folders are where junk files accumulate most. I used to handle them by simply deleting the folders while debugging with Hugo commands, but because they contain thousands of files, every rebuild cluttered the Git history.

This time, I created a .gitignore file in the blog's root directory to exclude both folders. Its content is as follows:

 public/
 resources/
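A .gitignore entry only affects files that Git is not tracking yet. If public/ or resources/ were already committed, they also need to be removed from the index; git rm --cached does that while leaving the files on disk. A sketch under the hypothetical name untrack_hugo_output, assuming the .gitignore above already exists:

```shell
# untrack_hugo_output: hypothetical helper name.
# Stops tracking Hugo's generated folders; the files stay on disk.
# Assumes .gitignore already lists public/ and resources/.
# --ignore-unmatch keeps git rm from failing when a folder is not tracked,
# and the commit only happens when the index actually changed.
untrack_hugo_output() {
  git rm -r --cached --ignore-unmatch --quiet public resources &&
    git add .gitignore &&
    { git diff --cached --quiet || git commit -m "Stop tracking generated Hugo output"; }
}
```

This only affects future commits; earlier commits still contain the folders until the history is rewritten or reset.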

Tips

Without the resources folder in the repository, deployments on GitHub Actions or Vercel may take longer, because the image conversion has to run on the build server. With only a few images, the impact is minimal.

All textual works on this website are protected by copyright, and the authors reserve all rights. Unless otherwise stated, the photos on this website are licensed under the CC BY-NC-ND 4.0 license.