HFS divides disk space into blocks of equal size. A block is identified by a 16-bit number, so a volume can contain at most 65,536 blocks, each occupying a whole number of sectors. Obviously, this scheme wastes a great deal of space on large volumes.
HFS+, in turn, uses 32 bits to number blocks, which lets it use the space far more rationally.
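The scale of the waste is easy to see with a little arithmetic (the 160 GB volume here is just an example figure):

```c
#include <stdint.h>

/* HFS identifies allocation blocks with 16-bit numbers, so a volume can
   have at most 65,536 of them; the block size must grow with the volume.
   HFS+ uses 32-bit block numbers, so a small 4 KB block stays practical. */
static uint64_t min_hfs_block_kb(uint64_t volumeBytes)
{
    return volumeBytes / 65536 / 1024;
}
/* For a 160 GB disk, the smallest possible HFS allocation block is about
   2384 KB, so even a 1-byte file occupies well over 2 MB on disk. */
```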
To manage data placement on a disk, HFS+ stores service information on it: metadata. The structures most important to the operation of the file system, and the most interesting for our data recovery endeavour, are the following:
- Volume header. Contains general information about the volume, for example the data block size and the locations of the other metadata structures on the disk.
- Allocation file. A bitmap marking the status of every block on the disk (1 — occupied, 0 — free).
- Catalog file. Holds most of the data about the location of files and folders on the disk.
- Extents overflow file. Contains location metadata that did not fit in the catalog file.
- Attributes file. Used for access control and other extended attributes.
- Journal file. Keeps records of the transactions performed on this volume.
The catalog file, extents overflow file, and attributes file are all B-trees.
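Every node of these B-trees begins with a node descriptor; its on-disk layout is documented in Apple's Technical Note TN1150 and can be declared like this (stdint type names substituted for Apple's):

```c
#include <stdint.h>

/* Descriptor that starts every B-tree node, per Apple's TN1150
   (on-disk layout: big-endian, tightly packed, 14 bytes). */
#pragma pack(push, 1)
typedef struct {
    uint32_t fLink;      /* next node of this kind, 0 if none */
    uint32_t bLink;      /* previous node of this kind        */
    int8_t   kind;       /* leaf, index, header or map node   */
    uint8_t  height;     /* depth of this node in the tree    */
    uint16_t numRecords; /* number of records in this node    */
    uint16_t reserved;
} BTNodeDescriptor;
#pragma pack(pop)
```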
The volume header is always located at a fixed offset of 1024 bytes from the start of the volume. It contains general information about the volume, including the locations of the other HFS+ service structures on the disk. For example: journalInfoBlock — the location of the journal, if the volume is journaled; allocationFile — the block where the allocation bitmap starts; catalogFile — the location of the catalog file.
Volume header HFS+ (fragment)
UInt32 fileCount;
UInt32 finderInfo;
Catalog file record (fragment)
UInt16 flags;
HFSCatalogNodeID fileID;
FXInfo finderInfo;
UInt32 textEncoding;
UInt32 reserved2;
The catalog file contains the metadata of files and folders as individual records; each file system node is identified by a unique catalog node ID (CNID). The most important information in a file's catalog record is its location data: eight extent records, each giving the starting block and the length in blocks of one contiguous portion of the file's fork. If eight extents are not enough, the remaining ones are stored in the extents overflow file.
Record of file’s fork
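The fork record referred to above is declared in hfs/hfs_format.h; a sketch of its on-disk layout, with stdint type names substituted for Apple's UInt32/UInt64, looks like this:

```c
#include <stdint.h>

/* On-disk layout of a file fork, as declared in hfs/hfs_format.h
   (stdint type names substituted for Apple's UInt32/UInt64). */
#pragma pack(push, 1)
typedef struct {
    uint32_t startBlock;  /* first allocation block of this extent */
    uint32_t blockCount;  /* number of contiguous blocks it covers */
} HFSPlusExtentDescriptor;

typedef HFSPlusExtentDescriptor HFSPlusExtentRecord[8];

typedef struct {
    uint64_t logicalSize;        /* fork size in bytes */
    uint32_t clumpSize;
    uint32_t totalBlocks;
    HFSPlusExtentRecord extents; /* the first eight extents; the rest
                                    go to the extents overflow file */
} HFSPlusForkData;
#pragma pack(pop)
```

To turn an extent into a byte range on the device, multiply both fields by the allocation block size from the volume header: byte offset = startBlock × blockSize, byte length = blockCount × blockSize.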
The HFS+ journal is a contiguous run of transaction blocks that is never moved and whose size never changes. In other words, it is a cyclic buffer of fixed size that contains HFS+ transaction records. One transaction can consist of one or several block lists; each list consists of a list header followed by the data.
Low-level access to file system and data recovery
There is much less information available about data recovery from Mac OS HFS and HFS+ than for other file systems, and recovering data from them is much harder. The difficulty is that HFS+ stores file-location metadata in B-trees: as soon as a file is deleted, the B-tree is immediately updated and the information about the deleted file's location is lost. With the release of Mac OS X 10.2 in August 2002, Apple improved HFS+ by adding a journal that records all file system changes as transaction blocks.
Journaling can be enabled or disabled by the user at any time. In Mac OS X 10.2 it is disabled by default; in Mac OS X 10.3 and later it is enabled by default, so on all modern machines running Mac OS X, changes to the file system are journaled. Note, however, that the journal was added to HFS+ not for data recovery but to preserve the integrity of the file system in exceptional situations. Even a simple user action causes many changes in the file system.
When a file is created, for example, the following happens:
- A node for the new file is added to the catalog file.
- The volume bitmap is changed to correctly reflect which blocks are occupied.
- Records are added to the extents overflow file if the file is too fragmented.
- The attributes file is updated.
- The volume header is updated to record the fact that the file system changed.
Any of these steps can leave the file system damaged if power is lost or removable media is pulled out midway. Journaling solves this problem.
A transaction in journaled HFS+ consists of the following steps:
1. Start the transaction by copying all intended changes into the journal buffer.
2. Write the journal buffer to disk.
3. Record the fact of the transaction in the journal header.
4. Carry out the planned changes.
5. Mark the transaction as completed.
When mounting an HFS+ file system, the system checks the journal for incomplete transactions; if any are found, the file system is repaired.
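The write-ahead idea behind these steps can be illustrated with a toy model (everything here is invented for the example and is not Apple's implementation):

```c
/* Toy write-ahead journal: log the change first, then apply it;
   on mount, committed entries are replayed. Illustrative only. */
#define JOURNAL_SLOTS 8
#define DISK_BLOCKS   16

static int disk[DISK_BLOCKS];
struct jentry { int block; int value; int committed; };
static struct jentry journal[JOURNAL_SLOTS];
static int jhead; /* ring buffer: old entries get overwritten */

static void journaled_write(int block, int value)
{
    journal[jhead] = (struct jentry){ block, value, 0 }; /* steps 1-3: log it */
    disk[block] = value;                                 /* step 4: apply     */
    journal[jhead].committed = 1;                        /* step 5: commit    */
    jhead = (jhead + 1) % JOURNAL_SLOTS;
}

static void replay(void)
{
    /* on mount: roll committed transactions forward */
    for (int i = 0; i < JOURNAL_SLOTS; ++i)
        if (journal[i].committed)
            disk[journal[i].block] = journal[i].value;
}
```

If the machine dies between logging and applying, the committed journal entry survives and replay() restores the change; an uncommitted entry is simply ignored.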
And finally, data recovery
How can the journal be used to recover deleted files? The following steps are needed:
- Read the volume header.
- Find the location of the catalog file from it.
- Read the catalog file into memory.
- Find the records of deleted files in it.
- If the blocks that belonged to the deleted files have not yet been overwritten (check against the bitmap), read them and recover the data.
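The bitmap check in the last step boils down to testing one bit. HFS+ stores the allocation bitmap most-significant-bit first, per TN1150 (block_is_used is a hypothetical helper name, not an Apple API):

```c
#include <stdint.h>

/* Check whether allocation block n is marked as used in the allocation
   file's bitmap. The most significant bit of byte 0 describes block 0,
   as documented in Apple's TN1150. */
static int block_is_used(const unsigned char *bitmap, uint32_t n)
{
    return (bitmap[n / 8] >> (7 - (n % 8))) & 1;
}
```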
It would seem everything is clear, but there is one problem: the journal has a limited size and its contents periodically wrap around. The journal of a Mac mini's boot volume usually wraps within 5-10 minutes; on a MacBook, within about 30 minutes; if the volume is used by Time Machine, the journal wraps every 20 seconds. In short, it is rewritten quite actively. That is why, before recovery, it is better to mount the volume read-only:
mount -t hfs -r /dev/diskXXXX /Volumes/MyVolume
This preserves the journal and prevents rewriting of the deleted files' blocks, which are marked as free in the volume's bitmap.
Let's see what can be done in the console. A good understanding of the HFS+ structure will let you analyze the journal and the volume bitmap with the standard dd utility, which can dump disk sectors to a file, although doing so is quite tedious.
Here is an example of reading one block at a given address:
sudo dd if=/dev/disk1 of=./evidence bs=4096 skip=4355500 count=1
If you have any assumptions about the file's contents, and especially if it is a text file, the following trick can be used:
sudo cat /dev/disk1 | strings -o | grep -i 'secret code' -C 5
If the blocks of the deleted file have not yet been overwritten, this approach can recover the file in full.
Even if the file's blocks have been overwritten, for example with a special utility like shred, its data may still remain in virtual memory, i.e. in the swap file. In Mac OS X the swap files are stored in /var/vm:
$ ls -al /var/vm
drwxr-xr-x 4 root wheel 136 Oct 14 10:50 .
drwxr-xr-x 24 root wheel 816 Oct 14 10:52 ..
drwx--x--x 18 root wheel 612 Oct 11 11:20 app_profile
-rw------T 1 root wheel 67108864 Oct 14 10:50 swapfile
Then, after analyzing the swap:
sudo strings -o /var/vm/swapfile | grep 'secret code' -C 2
you can find fragments of the file still hanging around in memory.
To access the file system's service structures from your own code, you can use the raw device files. As a rule, the EFI partition is located at /dev/rdisk0s1 and the HFS+ partition at /dev/rdisk0s2. In addition, hfs/hfs_format.h already contains declarations of the HFS+ data structures, which may be useful.
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <hfs/hfs_format.h>

void dump(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        printf("%02X ", buf[i]);
}

int main(int argc, char *argv[])
{
    HFSPlusVolumeHeader volume_header;
    // Open the raw device file
    int fd = open("/dev/rdisk0s2", O_RDONLY);
    // Header description is in hfs_format.h;
    // the header itself is located at offset 1024
    ssize_t rd = pread(fd, &volume_header, sizeof(volume_header), 0x400);
    if (rd == sizeof(volume_header))
        dump((unsigned char *)&volume_header, sizeof(volume_header));
    // Now we have all the necessary information about the volume.
    // Don't forget to close the device
    close(fd);
    return 0;
}
A program that works with raw devices has to be launched through sudo, since only administrators and sudoers are allowed such access.
Beginning with Mac OS X Leopard, the system includes Time Machine. This utility creates backups by recording all changes that take place in the system, allowing the user to restore the entire system, several files, or one particular file to its state at a given moment in time.
A separate disk should be allocated for Time Machine. Apple sells a special device, the Apple Time Capsule, which serves as a network disk dedicated to Time Machine backups; Time Machine can also use any USB or eSATA disk. On its first launch, Time Machine creates a folder on the specified backup disk containing all the data.
Thereafter Time Machine copies only the files that have changed. In other words, if Time Machine is used on a disk, recovering lost files is not a problem.
Information rarely vanishes into thin air. With a good understanding of how the file system works, one can recover even what seemed irretrievably lost, and learn quite a few new things about the user's life along the way.