Packfiles

Alternative to loose objects.

Stores multiple objects per file under:

  • .git/objects/pack/pack-<SHA>.pack
  • .git/objects/pack/pack-<SHA>.idx

Unlike loose objects, packfiles can store an object as a delta against another object (for example, a blob against another version of the same file). This is especially important because one-line changes to large blobs or trees are common.

Each packfile is self-contained: it includes the complete delta chain for every deltified object, so it can be unpacked independently of other packfiles.

Finding the optimal packing is probably an NP-complete problem, so Git uses heuristics instead: https://github.com/gitster/git/blob/master/Documentation/technical/pack-heuristics.txt

Normally packfiles only contain reachable objects.

The .idx file is just an index to speed up object lookup: it can be regenerated at any time from the .pack file with git index-pack.
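The speedup comes from the layout of the index: object names are stored sorted, next to a 256-entry "fanout" table where entry b counts the names whose first byte is at most b. A lookup then only binary-searches the small slice of names sharing the first byte, and the position found selects the object's offset in a parallel table. The following is an in-memory sketch of that idea under those assumptions; build_fanout and lookup are hypothetical names, not Git's C code.

```python
import bisect

def build_fanout(sorted_names: list[bytes]) -> list[int]:
    """fanout[b] = how many names have a first byte <= b."""
    counts = [0] * 256
    for name in sorted_names:
        counts[name[0]] += 1
    fanout, total = [], 0
    for c in counts:
        total += c          # turn per-byte counts into cumulative counts
        fanout.append(total)
    return fanout

def lookup(sorted_names: list[bytes], fanout: list[int], name: bytes):
    """Return the position of name in sorted_names, or None if absent."""
    first = name[0]
    lo = fanout[first - 1] if first > 0 else 0   # names with a smaller first byte
    hi = fanout[first]                           # end of the slice for this first byte
    i = bisect.bisect_left(sorted_names, name, lo, hi)
    if i < hi and sorted_names[i] == name:
        return i
    return None

# The three objects of the min-sane test repository.
names = sorted(bytes.fromhex(h) for h in (
    "07cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760",
    "496d6428b9cf92981dc9495211e6e1120fb6f2ba",
    "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391",
))
fanout = build_fanout(names)
print(lookup(names, fanout, bytes.fromhex("496d6428b9cf92981dc9495211e6e1120fb6f2ba")))  # 1
```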

Packfiles are also the format in which objects are sent over the network, e.g. by git fetch, git clone and git push.

Sources

man git-pack-objects

Packfile format

TODO

http://stefan.saasen.me/articles/git-clone-in-haskell-from-the-bottom-up/#pack_file_format
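As far as I understand the format, a pack starts with a 12-byte header (the signature PACK, then the version and the object count as big-endian 32-bit integers), followed by one entry per object, and ends with a 20-byte SHA-1 of everything before it. Each entry begins with a small header: bits 4 to 6 of the first byte hold the type, and the inflated size is split into the low 4 bits of that byte plus 7 bits per following byte while the high bit is set. The sketch below decodes only those pieces; the function names are hypothetical, and it does not inflate entry data or read delta base references.

```python
import hashlib
import struct

# Type codes found in pack entry headers (5 is unused).
OBJ_TYPES = {1: "commit", 2: "tree", 3: "blob", 4: "tag",
             6: "ofs_delta", 7: "ref_delta"}

def parse_pack_header(pack: bytes) -> tuple[int, int]:
    """Return (version, object_count) from the 12-byte pack header."""
    signature, version, count = struct.unpack(">4sII", pack[:12])
    if signature != b"PACK":
        raise ValueError("not a packfile")
    return version, count

def parse_entry_header(pack: bytes, offset: int) -> tuple[str, int, int]:
    """Decode the entry header at offset; return (type, inflated size, header length)."""
    byte = pack[offset]
    type_id = (byte >> 4) & 0x7
    size = byte & 0x0F            # low 4 bits of the size
    shift = 4
    pos = offset + 1
    while byte & 0x80:            # high bit set: another size byte follows
        byte = pack[pos]
        size |= (byte & 0x7F) << shift
        shift += 7
        pos += 1
    return OBJ_TYPES[type_id], size, pos - offset

def trailer_ok(pack: bytes) -> bool:
    """The last 20 bytes should be the SHA-1 of all preceding bytes."""
    return hashlib.sha1(pack[:-20]).digest() == pack[-20:]
```

Under this reading, the first entry of any pack sits at offset 12, right after the header.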

Delta format

This is the data that is stored in the delta entries of the packfile.

pack-objects

Low-level pack creation.

Starting from the min-sane test repository, run:

printf '07cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
496d6428b9cf92981dc9495211e6e1120fb6f2ba
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391' \
| git pack-objects --stdout

This will output the generated .pack to stdout.

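To write a .pack and .idx pair to files instead, pass a base name rather than --stdout. The object list is still read from stdin, so pipe the same printf output into: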

git pack-objects a

This will generate the .idx .pack pair with names a-<SHA>.{idx,pack}.

You can confirm that these files are identical to the ones generated by git repack.

The SHA in the file names is the SHA of TODO what? It is not the SHA of the file contents as computed by git hash-object:

git hash-object a-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack
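One reason this cannot match a plain checksum of the file: git hash-object does not hash the raw bytes, it hashes a "<type> <size>\0" header followed by the content, so its result differs from a bare SHA-1 of the same bytes. The sketch below computes both for the empty blob; git_object_id and plain_sha1 are names made up for illustration. It only explains why hash-object gives a different value, and does not by itself identify what the file name is derived from.

```python
import hashlib

def git_object_id(obj_type: str, data: bytes) -> str:
    """What git hash-object computes: SHA-1 of "<type> <size>\\0" followed by the data."""
    header = f"{obj_type} {len(data)}\0".encode()
    return hashlib.sha1(header + data).hexdigest()

def plain_sha1(data: bytes) -> str:
    """SHA-1 of the bytes alone, with no header."""
    return hashlib.sha1(data).hexdigest()

print(git_object_id("blob", b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(plain_sha1(b""))             # da39a3ee5e6b4b0d3255bfef95601890afd80709
```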

unpack-objects

Start from the min-sane test repository and run:

git repack
git prune-packed
mv .git/objects/pack/pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack .

Then feed the pack to unpack-objects, which reads it from standard input:

git unpack-objects < pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack

And tree .git/objects outputs:

.git/objects
|-- 07
|   `-- cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
|-- 49
|   `-- 6d6428b9cf92981dc9495211e6e1120fb6f2ba
|-- e6
|   `-- 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
|-- info
`-- pack

The unpacking only happens for objects that are not already present in the repository.
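Each object written back is an ordinary loose object: the zlib-compressed bytes of "<type> <size>\0<data>", stored in a directory named after the first two hex digits of its SHA-1 and a file named after the remaining 38. That is why the empty blob e69de29... reappears as e6/9de29... in the tree above. A small sketch of that mapping; loose_object is a hypothetical helper, and Git's zlib settings may differ, so the compressed bytes need not match Git's byte for byte.

```python
import hashlib
import zlib

def loose_object(obj_type: str, data: bytes) -> tuple[str, bytes]:
    """Return (path under .git/objects, file contents) for one loose object."""
    raw = f"{obj_type} {len(data)}\0".encode() + data
    sha = hashlib.sha1(raw).hexdigest()
    # The first two hex digits name the directory, the remaining 38 the file.
    return f"{sha[:2]}/{sha[2:]}", zlib.compress(raw)

path, contents = loose_object("blob", b"")
print(path)  # e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391
```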

unpack-file

Creates a file in the current directory, named like .merge_file_XXXXXX, containing the contents of the given blob:

git unpack-file e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

It prints the name of the created file.

repack

prune-packed

Porcelain.

Packs reachable objects, or tries to improve the efficiency of existing packs.

Example: start from the min-sane test repository. Its .git/objects contains the three usual loose objects, a commit, a tree and a blob:

.git/objects
|-- 07
|   `-- cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
|-- 49
|   `-- 6d6428b9cf92981dc9495211e6e1120fb6f2ba
|-- e6
|   `-- 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
|-- info
`-- pack

Now run:

git repack

Output:

Counting objects: 3, done.
Writing objects: 100% (3/3), done.
Total 3 (delta 0), reused 0 (delta 0)

Ha, this is what we see on git clone!

Now the objects look like:

.git/objects
|-- 07
|   `-- cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760
|-- 49
|   `-- 6d6428b9cf92981dc9495211e6e1120fb6f2ba
|-- e6
|   `-- 9de29bb2d1d6434b8b29ae775ad8c2e48c5391
|-- info
|   `-- packs
`-- pack
    |-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.idx
    `-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack

Notice how the loose objects were not removed, only packed.

To remove them, we can use prune-packed:

git prune-packed

And now the tree looks like:

.git/objects
|-- info
|   `-- packs
`-- pack
    |-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.idx
    `-- pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack

since all objects had been packed.
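Conceptually, prune-packed deletes every loose object whose SHA-1 is also found in a pack, then drops the two-character directories left empty, which matches the tree above. A rough Python equivalent, assuming you already have the set of packed SHA-1s (for example from git show-index); loose_object_ids and prune_packed are hypothetical names and this is not Git's implementation.

```python
import os

def loose_object_ids(objects_dir: str) -> set[str]:
    """SHA-1s of loose objects, stored as <objects_dir>/xx/<38 hex digits>."""
    ids = set()
    for entry in os.listdir(objects_dir):
        subdir = os.path.join(objects_dir, entry)
        # Loose objects live in two-character directories; this skips info/ and pack/.
        if len(entry) != 2 or not os.path.isdir(subdir):
            continue
        for name in os.listdir(subdir):
            if len(name) == 38:
                ids.add(entry + name)
    return ids

def prune_packed(objects_dir: str, packed_ids: set[str]) -> list[str]:
    """Delete loose objects whose SHA-1 is in packed_ids; return the removed SHA-1s."""
    removed = []
    for sha in sorted(loose_object_ids(objects_dir) & packed_ids):
        os.remove(os.path.join(objects_dir, sha[:2], sha[2:]))
        removed.append(sha)
    # Remove the two-character directories that are now empty.
    for sha in removed:
        subdir = os.path.join(objects_dir, sha[:2])
        if os.path.isdir(subdir) and not os.listdir(subdir):
            os.rmdir(subdir)
    return removed
```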

git gc by default does both repack and prune-packed, so we could have used it instead.

count-objects

Porcelain.

Counts unpacked (loose) objects and shows how much disk space they use.

Main application: deciding how much a repack or gc might benefit you.

Sample output of git count-objects -vH:

count: 6324
size: 25.58 MiB
in-pack: 108316
packs: 23
size-pack: 100.02 MiB
prune-packable: 518
garbage: 0
size-garbage: 0 bytes

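The fields are:

  • count: number of loose objects
  • size: disk space used by loose objects
  • in-pack: number of objects in packs
  • packs: number of packs
  • size-pack: disk space used by packs
  • prune-packable: number of loose objects that are also present in packs, which git prune-packed could remove
  • garbage: number of files in the object database that are neither valid loose objects nor valid packs
  • size-garbage: disk space used by those garbage files

-H only makes the sizes human readable.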
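To put a rough number on how much a repack might help, this sketch parses the key: value lines of git count-objects -v (or -vH) output and computes the fraction of all objects that are still loose; for the sample output above that is about 5.5%. The helper names and the use of this ratio are my own illustration, not something Git does: git gc --auto instead compares an estimate of the loose object count against the gc.auto setting (6700 by default).

```python
def parse_count_objects(output: str) -> dict[str, str]:
    """Split the "key: value" lines of git count-objects -v(H) output."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields

def loose_fraction(fields: dict[str, str]) -> float:
    """Share of all objects that are still loose (hypothetical heuristic)."""
    loose = int(fields["count"])
    packed = int(fields["in-pack"])
    return loose / (loose + packed)

sample = """count: 6324
size: 25.58 MiB
in-pack: 108316
packs: 23
size-pack: 100.02 MiB
prune-packable: 518
garbage: 0
size-garbage: 0 bytes"""
print(round(loose_fraction(parse_count_objects(sample)), 3))  # 0.055
```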

verify-pack

Check that an idx / pack pair is not corrupted:

git verify-pack .git/objects/pack/pack-<SHA>.idx

Exits with status 0 if OK.

For detailed information on each object in the pack, use -v.

Output of git verify-pack -v for the min-sane test repository after git gc:

07cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760 commit 110 90 12
496d6428b9cf92981dc9495211e6e1120fb6f2ba tree   29 40 102
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 blob   0 9 142
non delta: 3 objects
pack-f847933433935e81b3fee26eaa6002fdf05ad6a5.pack: ok

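The format is:

  • SHA
  • object type
  • uncompressed payload size
  • size of the entry in the packfile: the compressed data plus a small entry header. It can be larger than the payload for small objects because of zlib’s overhead.
  • offset into the packfile where the object is located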
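Those columns are easy to read programmatically. The sketch below turns one object line of git verify-pack -v output into a record, also accepting the optional depth and base columns of deltified entries, and checks that in the min-sane pack each entry ends exactly where the next one begins (12 + 90 = 102, 102 + 40 = 142). PackEntry and parse_verify_pack_line are hypothetical names.

```python
from typing import NamedTuple, Optional

class PackEntry(NamedTuple):
    sha: str
    obj_type: str
    size: int                   # uncompressed payload size
    size_in_pack: int           # bytes the entry occupies in the .pack
    offset: int                 # where the entry starts in the .pack
    depth: int = 0              # delta chain depth, 0 if not deltified
    base: Optional[str] = None  # base object SHA, for deltified entries

def parse_verify_pack_line(line: str) -> Optional[PackEntry]:
    """Parse one object line of git verify-pack -v; return None for summary lines."""
    parts = line.split()
    if len(parts) not in (5, 7) or len(parts[0]) != 40:
        return None
    sha, obj_type, size, size_in_pack, offset = parts[:5]
    entry = PackEntry(sha, obj_type, int(size), int(size_in_pack), int(offset))
    if len(parts) == 7:
        entry = entry._replace(depth=int(parts[5]), base=parts[6])
    return entry

output = """\
07cd7fe596afc90d9a2c9f7ae30b6b9e7a7b3760 commit 110 90 12
496d6428b9cf92981dc9495211e6e1120fb6f2ba tree   29 40 102
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 blob   0 9 142
non delta: 3 objects"""
entries = [e for e in map(parse_verify_pack_line, output.splitlines()) if e]
# Each entry ends exactly where the next one starts.
for prev, nxt in zip(entries, entries[1:]):
    assert prev.offset + prev.size_in_pack == nxt.offset
```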

For a more complex repository, the output could look something like:

2431da676938450a4d72e260db3bf7b0f587bbc1 commit 223 155 12
deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree   136 136 1178
d982c7cb2c2a972ee391a85da481fc1f9127a01d tree   6 17 1314 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
3c4e9cd789d88d8d89c1073707c3585e41b0e614 tree   8 19 1331 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506
b042a60ef7dff760008df33cee372b945b6e884e blob   22054 5799 1463
033b4468fa6b2a9547a70d88d1bbe8bf3f9ed0d5 blob   9 20 7262 1 b042a60ef7dff760008df33cee372b945b6e884e
1f7a7a472abf3dd9643fd615f6da379c4acb3e3a blob   10 19 7282
(many more lines like the above)
non delta: 15 objects
chain length = 1: 3 objects
chain length = 2: 1 object
.git/objects/pack/pack-978e03944f5c581011e6998cd0e9e30000905586.pack: ok

Note that some entries have 2 extra columns:

  • the depth of the object, i.e., how many deltas you have to resolve to get to it
  • the object to take the delta from

Those are deltified objects: their payload contains only a delta from another object.
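The depth column and the "chain length" summary at the end of the output are related: an object's depth is the number of base links to follow before reaching a non-deltified object. This small sketch recomputes such a summary from a sha to base mapping; chain_depth and chain_summary are hypothetical names, the wording is only modelled on the verify-pack output above, and the example uses abbreviated SHAs from it.

```python
from collections import Counter

def chain_depth(sha: str, bases: dict[str, str]) -> int:
    """How many deltas must be applied to rebuild sha (0 for a full object)."""
    depth = 0
    while sha in bases:
        sha = bases[sha]
        depth += 1
    return depth

def chain_summary(shas: list[str], bases: dict[str, str]) -> list[str]:
    """Histogram of chain depths, worded like the end of verify-pack -v output."""
    counts = Counter(chain_depth(sha, bases) for sha in shas)

    def n_objects(n: int) -> str:
        return f"{n} object" if n == 1 else f"{n} objects"

    lines = [f"non delta: {n_objects(counts.pop(0, 0))}"]
    for depth in sorted(counts):
        lines.append(f"chain length = {depth}: {n_objects(counts[depth])}")
    return lines

# Abbreviated SHAs from the sample: two trees stored as deltas against deef2.
bases = {"d982c": "deef2", "3c4e9": "deef2"}
print(chain_summary(["deef2", "d982c", "3c4e9"], bases))
# ['non delta: 1 object', 'chain length = 1: 2 objects']
```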

In the first part of the output, there are two kinds of line:

  • non-deltified objects, of the form:

    deef2e1b793907545e50a2ea2ddb5ba6c58c4506 tree   136 136 1178

    The three numbers at the end are, as in the format list above:

    • uncompressed payload size
    • size of the entry in the packfile
    • offset into the packfile
  • deltified objects, of the form:

    d982c7cb2c2a972ee391a85da481fc1f9127a01d tree   6 17 1314 1 deef2e1b793907545e50a2ea2ddb5ba6c58c4506

    The 1 is the depth of the delta chain, and deef2e1b... is the base object.

    Git tends to keep the most recent versions of files whole and store older versions as deltas against them, so deltas go backwards in history.

    Here, d982c is an older version of deef2. Note how small its entry is: only 17 bytes in the pack.

    You can still print the object as usual:

    git cat-file -p d982c7cb2c2a972ee391a85da481fc1f9127a01d

    even though only a delta was stored for it.

pack-redundant

Start from the packed min-sane test repository.

TODO

show-index

Plumbing.

Dumps the contents of a given .idx file, which overlaps with the information shown by git verify-pack -v.

Example:

git show-index < .git/objects/pack/pack-<SHA>.idx

Sample output:

133 496d6428b9cf92981dc9495211e6e1120fb6f2ba (0f49d649)
12 860e5247c071721c8e286c73c3633509c77cf538 (198b73d3)
173 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (6e760029)

The columns are: the offset of the object in the .pack file, the object SHA, and, for version 2 index files, the CRC32 of the object's packed data.
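For reference, my understanding of the version 2 .idx layout: the 4-byte magic \377tOc, a 4-byte version, a 256-entry fanout table, the sorted SHA-1s, one CRC32 per object, one 4-byte offset per object (with the high bit redirecting to a table of 8-byte offsets for large packs), then the pack and index checksums. This sketch decodes it into the same three columns show-index prints; parse_idx_v2 is a hypothetical name and version 1 indexes are not handled.

```python
import struct

IDX_MAGIC = b"\xfftOc"

def parse_idx_v2(idx: bytes) -> list[tuple[int, str, int]]:
    """Return (offset, sha_hex, crc32) per object, in index (sorted SHA) order."""
    if idx[:4] != IDX_MAGIC or struct.unpack(">I", idx[4:8])[0] != 2:
        raise ValueError("not a version 2 .idx file")
    fanout = struct.unpack(">256I", idx[8:8 + 256 * 4])
    n = fanout[255]                      # last fanout entry = total object count
    pos = 8 + 256 * 4
    shas = [idx[pos + 20 * i:pos + 20 * (i + 1)].hex() for i in range(n)]
    pos += 20 * n
    crcs = struct.unpack(f">{n}I", idx[pos:pos + 4 * n])
    pos += 4 * n
    offsets = list(struct.unpack(f">{n}I", idx[pos:pos + 4 * n]))
    pos += 4 * n                         # pos now points at the 64-bit offset table
    for i, off in enumerate(offsets):
        if off & 0x80000000:             # high bit set: index into the 8-byte table
            j = off & 0x7FFFFFFF
            offsets[i] = struct.unpack(">Q", idx[pos + 8 * j:pos + 8 * (j + 1)])[0]
    return list(zip(offsets, shas, crcs))

def show_index_line(offset: int, sha: str, crc: int) -> str:
    """Format one entry the way git show-index prints it."""
    return f"{offset} {sha} ({crc:08x})"
```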

index-pack

Plumbing.

Builds the .idx file for a given .pack:

git index-pack .git/objects/pack/pack-<SHA>.pack

Generates the .idx in the same directory as the .pack.
