Thursday, 21 May 2015 00:00

Digging out Apple data: Mac OS X data recovery


It is hard to believe, but so-called 'not advanced' users sometimes delete important data and then loudly regret it, and programmers end up dealing with the fallout. So let us look at how Mac OS X stores data on its drives and how that data can be recovered.

Intro

Apple developed its own file system for the Mac: HFS (Hierarchical File System). It was later superseded by HFS+, which has been used in Mac OS since version 8.1. The on-disk structure of an HFS+ volume is close to that of HFS, but there are notable differences.

Main features of the two systems:

Maximum file name length

• HFS: 31 characters
• HFS+: 255 characters

Filename encoding

• HFS: Mac Roman
• HFS+: Unicode

Catalog node size

• HFS: 512 bytes
• HFS+: 4 KB

Maximum file size

• HFS: 2^31 bytes
• HFS+: 2^63 bytes

As you can see, HFS+ was designed to use disk space more efficiently on large volumes and with fragmented files.

HFS+ insides

HFS divides disk space into equally sized allocation blocks. It uses 16 bits to identify a block, so there can be at most 65,536 of them, and each block spans a whole number of sectors. On large volumes this scheme wastes a great deal of space.

HFS+, in turn, uses 32 bits to number blocks, which lets it use the space far more rationally.

To manage data placement on disk, HFS+ stores service information, metadata, on the volume. The structures most important to the file system's operation, and most interesting for our data recovery endeavour, are the following:

  • Volume header. Contains general information about the volume, such as the allocation block size and the location of the other metadata structures on disk.
  • Allocation file. A bitmap recording the status of every block on the volume (1 = occupied, 0 = free).
  • Catalog file. Holds most of the information about where files and folders live on disk.
  • Extents overflow file. Holds extent records that do not fit in the catalog.
  • Attributes file. Used for access control and other extended attributes.
  • Journal file. Records the transactions performed on the volume.

The catalog file, extents overflow file and attributes file are all B-trees.

The volume header is always located at a fixed offset of 1024 bytes from the start of the volume. It contains general information about the volume, including the location of the other HFS+ service structures on disk. For example, journalInfoBlock points to the journal (if the volume is journaled), allocationFile to the start of the allocation bitmap, and catalogFile to the catalog file.

HFS+ volume header

struct HFSPlusVolumeHeader
{
    UInt16 signature;
    UInt16 version;
    UInt32 attributes;
    UInt32 lastMountedVersion;
    UInt32 journalInfoBlock;
    UInt32 createDate;
    UInt32 modifyDate;
    UInt32 backupDate;
    UInt32 checkedDate;
    UInt32 fileCount;
    UInt32 folderCount;
    UInt32 blockSize;
    UInt32 totalBlocks;
    UInt32 freeBlocks;
    UInt32 nextAllocation;
    UInt32 rsrcClumpSize;
    UInt32 dataClumpSize;
    HFSCatalogNodeID nextCatalogID;
    UInt32 writeCount;
    UInt64 encodingsBitmap;
    UInt32 finderInfo[8];
    HFSPlusForkData allocationFile;
    HFSPlusForkData extentsFile;
    HFSPlusForkData catalogFile;
    HFSPlusForkData attributesFile;
    HFSPlusForkData startupFile;
};
Catalog file record

struct HFSPlusCatalogFile
{
    SInt16 recordType;
    UInt16 flags;
    UInt32 reserved1;
    HFSCatalogNodeID fileID;
    UInt32 createDate;
    UInt32 contentModDate;
    UInt32 attributeModDate;
    UInt32 accessDate;
    UInt32 backupDate;
    HFSPlusPermissions permissions;
    FInfo userInfo;
    FXInfo finderInfo;
    UInt32 textEncoding;
    UInt32 reserved2;
    HFSPlusForkData dataFork;
    HFSPlusForkData resourceFork;
};
The catalog file stores file and folder metadata as individual records. The catalog node ID (CNID) is a unique number identifying a file system node. The most important information in a catalog record describes each fork's location on disk: up to eight extent records, each giving the starting block and the length in blocks of one contiguous fragment of the fork. If eight extents are not enough, the remaining extent records are kept in the extents overflow file.

File fork record

struct HFSPlusForkData
{
    UInt64 logicalSize;
    UInt32 clumpSize;
    UInt32 totalBlocks;
    HFSPlusExtentRecord extents;
};
The HFS+ journal is a contiguous run of transaction blocks that is never relocated and whose size never changes. In other words, it is a fixed-size circular buffer of HFS+ transaction records. One transaction can contain one or more block lists, each consisting of a list header followed by the data blocks themselves.

Low-level access to file system and data recovery

Far less information is available on recovering data from Mac OS HFS and HFS+ than from other file systems, and recovery is harder too. The difficulty is that HFS+ keeps file location metadata in B-trees: as soon as a file is deleted, the B-tree is immediately updated and the information about the file's location is lost. With the release of Mac OS X 10.2 in August 2002, Apple improved HFS+ by adding a journal that records all file system changes as transaction blocks.

The user can enable or disable journaling at any time. In Mac OS X 10.2 journaling is disabled by default; in 10.3 and later it is enabled by default, so on all modern Mac OS X machines file system changes are journaled. Note, however, that the journal was added to HFS+ not for data recovery but to preserve file system integrity in exceptional situations. Even a simple user action causes many changes in the file system.

Thus, when creating a file, for example, the following occurs:

  • A node for the new file is added to the catalog file.
  • The volume bitmap is updated to correctly reflect which blocks are now occupied.
  • Records are added to the extents overflow file if the file is too fragmented.
  • The attributes file is updated.
  • The volume header is updated to record the change to the file system.

Any of these steps can leave the file system damaged, because power loss or removal of the media can occur partway through. Journaling solves this problem.

A transaction in journaled HFS+ includes the following steps:

1. Start the transaction by copying all intended changes into the journal file.
2. Flush the journal buffer to disk.
3. Record the transaction in the journal header.
4. Carry out the planned changes.
5. Mark the transaction as completed.

When an HFS+ volume is mounted, the system checks the journal for incomplete transactions and, if any are found, repairs the file system.

And finally, data recovery

How can the journal be used to recover deleted files? Perform the following steps:

  1. Read the volume header.
  2. Find the location of the catalog file in it.
  3. Get access to the catalog file.
  4. Read it into memory.
  5. Find records of deleted files in it.
  6. If the blocks belonging to a deleted file have not yet been overwritten (check against the allocation bitmap), read them and recover the data.

It all seems clear, but there is one problem: the journal is limited in size and its contents periodically wrap around. The boot volume journal of a Mac mini typically wraps within 5-10 minutes, that of a MacBook within about 30 minutes, and on a volume used by Time Machine it can wrap every 20 seconds. In short, the journal is overwritten quite actively, so before recovery it is best to mount the volume read-only:

mkdir /Volumes/MyVolume

mount -t hfs -r /dev/diskXXXX /Volumes/MyVolume

This way we preserve the journal and prevent the blocks of deleted files, which the volume bitmap now marks as free, from being overwritten.

Let's see what can be done from the console. With a good understanding of the HFS+ structure, you can analyze the journal and the volume bitmap using the standard dd utility, which dumps disk sectors to a file, although doing so is quite tedious.

Here is an example of reading one block at a given address:

sudo dd if=/dev/disk1 of=./evidence bs=4096 skip=4355500 count=1
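Note that skip is counted in bs-sized units, so once bs equals the volume's allocation block size, the allocation block number can be passed to dd directly. The equivalent byte offset, using the numbers from the example above, is:

```shell
# block 4355500 of a volume with 4096-byte allocation blocks
echo $((4355500 * 4096))   # byte offset of the block on the partition
```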

If you have an idea of the file's contents, and especially if it is a text file, the following trick helps:

sudo cat /dev/disk1 | strings -o | grep -i 'secret code' -C 5

If the deleted file's blocks have not yet been overwritten, this approach can recover the file in full.

Even if the file's blocks have been overwritten, for example with a secure deletion utility such as shred, its data may still linger in virtual memory, i.e. in the swap files. In Mac OS X, swap is stored under /var/vm:

$ ls -al /var/vm
total 131072
drwxr-xr-x 4 root wheel 136 Oct 14 10:50 .
drwxr-xr-x 24 root wheel 816 Oct 14 10:52 ..
drwx--x--x 18 root wheel 612 Oct 11 11:20 app_profile
-rw------T 1 root wheel 67108864 Oct 14 10:50 swapfile
And then, having analyzed the swap:

sudo strings -o /var/vm/swapfile | grep 'secret code' -C 2

you may find fragments of the file still hanging around in it.

To access the file system's service structures from your own code, use the raw device files. As a rule, the EFI partition is /dev/rdisk0s1 and the HFS+ partition is /dev/rdisk0s2. In addition, hfs/hfs_format.h already contains declarations of the HFS+ data structures, which may come in handy.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <hfs/hfs_format.h>

void dump(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        printf("%02X ", buf[i]);
    printf("\n");
}

int main(int argc, char *argv[])
{
    // Open the raw device file (devopen from util.h can also be used)
    int fd = open("/dev/rdisk0s2", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    // The header layout is declared in hfs_format.h
    HFSPlusVolumeHeader volume_header;
    // The header is located at offset 1024
    if (pread(fd, &volume_header, sizeof(volume_header), 0x400) != sizeof(volume_header)) {
        perror("pread");
        close(fd);
        return 1;
    }
    // Now we have all the necessary information about the volume
    // (on-disk fields are big-endian, hence the byte swap)
    printf("%u\n", ntohl(volume_header.blockSize));
    dump((unsigned char *)&volume_header, sizeof(volume_header));
    // Don't forget to close the device
    close(fd);
    return 0;
}
Software that works with raw devices must be launched via sudo, since only administrators and sudoers may open them.

Time Machine

Starting with Mac OS X Leopard, the system ships with Time Machine. This utility creates backups by recording every change that takes place in the system, which lets the user restore the entire system, several files, or a single file to the state it was in at a given moment in time.

A separate disk should be allocated to Time Machine. Apple sells a dedicated device, the Apple Time Capsule, a network disk intended specifically for Time Machine backups, but Time Machine also works with any USB or eSATA disk. On first launch, Time Machine creates a folder on the designated backup disk containing all the data.

Thereafter, Time Machine copies only the files that have changed. In other words, if Time Machine is in use on a disk, recovering lost files is not a problem.

Outro

Information rarely vanishes into thin air. With a good understanding of how the file system works, one can recover even what seemed irretrievably lost, and learn quite a few new things about the user's life along the way.

Last modified on Thursday, 21 May 2015 13:39
Data Recovery Expert

Viktor S., Ph.D. (Electrical/Computer Engineering), was hired by DataRecoup, the international data recovery corporation, in 2012. Promoted to Engineering Senior Manager in 2010 and then to his current position, as C.I.O. of DataRecoup, in 2014. Responsible for the management of critical, high-priority RAID data recovery cases and the application of his expert, comprehensive knowledge in database data retrieval. He is also responsible for planning and implementing SEO/SEM and other internet-based marketing strategies. Currently, Viktor S., Ph.D., is focusing on the further development and expansion of DataRecoup’s major internet marketing campaign for their already successful proprietary software application “Data Recovery for Windows” (an application which he developed).