Some GitHub projects have unreasonably large .git directories. Examples: conan/docs and Arduino.
conan/docs seems to have solved this.
Arduino
The total size of the Arduino repository, including the .git directory, is approximately 1.4 GB:
$ du -hs gitprj/Arduino
1.4G gitprj/Arduino
The checked-out data itself, however, is only about 65M:
$ du -hcs gitprj/Arduino/*
12K CONTRIBUTING.md
4.0K ISSUE_TEMPLATE.md
4.0K PULL_REQUEST_TEMPLATE.md
4.0K README.md
18M app
25M arduino-core
22M build
16K hardware
4.0K lib_sync
0B libraries
40K license.txt
65M total
The problem is that, several times, a wrong commit containing lots of accidental binaries was pushed and then corrected only by pushing another commit that deleted the erroneous files, instead of rewriting history once and keeping the repository clean.
Examples:
Accidental commits that were not cleaned up properly afterwards:
- 448222e4b6 adding about 192M of .class and object files
- 920212ee05 deleting about 192M of the same.
This deletes the files from your checked-out working directory, but not from the git repository: they stay in its history.
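To see which files are still taking up space in the history, you can list the largest blobs reachable from any ref. The following is only a sketch: the function name `largest_blobs` is made up for this example, and it assumes a reasonably recent git and standard Unix tools.

```shell
# List the largest blobs anywhere in history, biggest first,
# as "<size in bytes> <path>". Files deleted in later commits still show up.
# Usage: largest_blobs <repo-dir> [count]   (hypothetical helper name)
largest_blobs() {
    git -C "$1" rev-list --objects --all |
        git -C "$1" cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        grep '^blob ' |
        cut -d' ' -f2- |
        sort -rn |
        head -n "${2:-10}"
}
```

Called as `largest_blobs gitprj/Arduino 20`, this should surface files such as the accidentally committed binaries, even though they no longer exist in the working directory.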
Another bad habit of the ‘early’ days (well, until 2014): keeping the tool binaries inside the repository.
- starting with 21fe7f0a83
- until finally, in 2013, in fabbe45c81, tools started to be removed from the main repo.
So now almost 95% of what you get when cloning the Arduino repository is useless data: about 65M of 1.4 GB is actual content.
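The same comparison can be made for any clone by measuring the .git directory against the whole tree. This is a rough sketch: the function name `git_share` is invented here, and `du` reports disk usage, so the numbers vary a little between filesystems.

```shell
# Print what share of a clone's disk usage is taken by its .git directory.
# Usage: git_share <repo-dir>   (hypothetical helper name)
git_share() {
    total_kb=$(du -sk "$1" | cut -f1)      # whole clone, in KiB
    git_kb=$(du -sk "$1/.git" | cut -f1)   # history and metadata only, in KiB
    echo "$git_kb $total_kb" |
        awk '{ printf ".git: %d KiB of %d KiB (%.0f%%)\n", $1, $2, 100 * $1 / $2 }'
}
```

For example, `git_share gitprj/Arduino` prints one line with both sizes and the percentage.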
conan
The conan docs repository used to be really large. This was fixed sometime in early 2020.
The problem there was an accumulation of gh-pages commits. gh-pages is an exception to the rule: don’t rewrite history.
When I checked in 2019, the total repository size was 1.7 GB, for only 9.5M of real data.