Tuesday, 19 May 2015 00:00

Digging for Apple Data: Learning how to recover data in Mac OS X


It's hard to believe, but so-called "novice" users do sometimes delete important data and then regret it. Then the code heads have to face the music, so let's look at how Mac OS X stores data on the media and how that data can be recovered.

Intro

Apple developed its own file system, HFS (Hierarchical File System), for use in the Mac OS. It has since been replaced by HFS+, which is used in Mac OS from version 8.1 onward. The HFS+ volume structure is similar to that of HFS, but the two differ in several ways.

The main characteristics of the two systems:

The length of a file name:

  • HFS: 31 characters
  • HFS+: 255 characters

File name encoding:

  • HFS: Mac Roman
  • HFS+: Unicode

Catalog node size:

  • HFS: 512 bytes
  • HFS+: 4 KB

Maximum file size:

  • HFS: 2^31 bytes
  • HFS+: 2^63 bytes

As you can see, HFS+ was designed to make better use of disk space on large volumes and to handle fragmented files more gracefully.

HFS+ from the inside

HFS divides the disk space into equal-sized allocation blocks. Blocks are identified by 16-bit numbers, so there can be at most 65,536 of them in total, and each block occupies a whole number of sectors. Obviously, such a scheme wastes a lot of space on large volumes.

HFS+, in turn, uses 32-bit values for block numbering, which allows the space to be used much more efficiently.
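
To get a feel for the scale of the waste, here is a quick back-of-the-envelope calculation (a minimal sketch; the 8 GB volume size is a hypothetical figure chosen for illustration):

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* HFS addresses at most 65,536 allocation blocks (16-bit numbering) */
    uint64_t volume_size = 8ULL * 1024 * 1024 * 1024;   /* an 8 GB volume */
    uint64_t block_size  = volume_size / 65536;         /* 131,072 bytes = 128 KB */
    /* Every file occupies a whole number of blocks, so even a 1-byte
       file consumes an entire 128 KB block on such a volume. */
    printf("minimum HFS block size: %llu bytes\n",
           (unsigned long long)block_size);
    return 0;
}

With 32-bit block numbers, the same 8 GB volume can use 4 KB blocks, so the per-file overhead shrinks by a factor of 32.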

To manage the arrangement of data on a drive, HFS+ also stores control structures on it, that is, metadata. The following are the most critical for the file system and the most interesting to us in the search for missing data:

  • Volume header. Contains general information about the volume, for example the allocation block size and the location of the other metadata files on the disk.
  • Allocation file. A bitmap in which the status of every block on the disk is recorded (1 - in use, 0 - free).
  • Catalog file. Stores most of the data on the arrangement of files and folders on the disk.
  • Extents overflow file. Contains additional allocation metadata that did not fit in the catalog file.
  • Attributes file. Used for access checks, extended attributes, and so on.
  • Journal file. Stores records of the transactions committed to this volume.

The catalog file, the extents overflow file, and the attributes file are all organized as B-trees.

The volume header is always stored at a fixed offset of 1024 bytes from the start of the volume. It contains information about the volume as a whole, including the location of the other key HFS+ structures on the disk. For example, journalInfoBlock points to the journal allocation information (if the volume has a journal), allocationFile describes where the block allocation bitmap starts, and catalogFile describes the location of the catalog file.

HFS+ volume header

struct HFSPlusVolumeHeader
{
    UInt16 signature;           /* 'H+' (kHFSPlusSigWord) */
    UInt16 version;
    UInt32 attributes;
    UInt32 lastMountedVersion;
    UInt32 journalInfoBlock;    /* where the journal info block lives */
    UInt32 createDate;
    UInt32 modifyDate;
    UInt32 backupDate;
    UInt32 checkedDate;
    UInt32 fileCount;
    UInt32 folderCount;
    UInt32 blockSize;           /* allocation block size in bytes */
    UInt32 totalBlocks;
    UInt32 freeBlocks;
    UInt32 nextAllocation;
    UInt32 rsrcClumpSize;
    UInt32 dataClumpSize;
    HFSCatalogNodeID nextCatalogID;
    UInt32 writeCount;
    UInt64 encodingsBitmap;
    UInt32 finderInfo[8];
    HFSPlusForkData allocationFile;  /* the block allocation bitmap */
    HFSPlusForkData extentsFile;
    HFSPlusForkData catalogFile;
    HFSPlusForkData attributesFile;
    HFSPlusForkData startupFile;
};
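
Before trusting anything read from offset 1024, it is worth verifying the signature field. Here is a minimal sketch of such a check, assuming the header has already been read into memory (as shown in the full example later in this article); note that all on-disk HFS+ fields are big-endian:

#include <hfs/hfs_format.h>
#include <libkern/OSByteOrder.h>
#include <stdbool.h>

/* Returns true if the buffer really holds an HFS+ volume header. */
static bool is_hfsplus_header(const HFSPlusVolumeHeader *vh)
{
    /* kHFSPlusSigWord is 'H+' (0x482B); swap from big-endian first */
    return OSSwapBigToHostInt16(vh->signature) == kHFSPlusSigWord;
}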

Catalog file record

struct HFSPlusCatalogFile
{
    SInt16 recordType;
    UInt16 flags;
    UInt32 reserved1;
    HFSCatalogNodeID fileID;    /* the file's CNID */
    UInt32 createDate;
    UInt32 contentModDate;
    UInt32 attributeModDate;
    UInt32 accessDate;
    UInt32 backupDate;
    HFSPlusPermissions permissions;
    FInfo userInfo;
    FXInfo finderInfo;
    UInt32 textEncoding;
    UInt32 reserved2;
    HFSPlusForkData dataFork;       /* where the file's contents live */
    HFSPlusForkData resourceFork;
};

The catalog file contains metadata about files and folders as separate records. A catalog node ID (CNID) is a unique number assigned to each object in the file system. The most important information in a catalog file record is the file allocation data: up to eight extents, each described by a starting block and the length in blocks of a contiguous portion of the file fork. If eight extents are not enough, the data on the remaining file fork portions goes into the extents overflow file.

File fork record

struct HFSPlusForkData
{
    UInt64 logicalSize;              /* fork size in bytes */
    UInt32 clumpSize;
    UInt32 totalBlocks;
    HFSPlusExtentRecord extents;     /* the first eight extents */
};
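
Since each fork record carries up to eight extents, mapping a file's on-disk byte ranges is a simple loop. Here is a minimal sketch (block_size is the allocation block size taken from the volume header; again, on-disk fields are big-endian):

#include <hfs/hfs_format.h>
#include <libkern/OSByteOrder.h>
#include <stdio.h>

/* Print the byte ranges covered by the first eight extents of a fork. */
static void print_fork_extents(const HFSPlusForkData *fork, uint32_t block_size)
{
    for (int i = 0; i < kHFSPlusExtentDensity; ++i) {   /* 8 extents */
        uint32_t start = OSSwapBigToHostInt32(fork->extents[i].startBlock);
        uint32_t count = OSSwapBigToHostInt32(fork->extents[i].blockCount);
        if (count == 0)
            break;                  /* unused slots are zero-filled */
        printf("extent %d: blocks %u-%u, bytes %llu-%llu\n",
               i, start, start + count - 1,
               (unsigned long long)start * block_size,
               (unsigned long long)(start + count) * block_size - 1);
    }
}

The printed block numbers can be fed straight into dd as the skip parameter, as shown further below.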

The journal in HFS+ is a contiguous run of transaction blocks that never moves and whose size does not change. In other words, it is a circular buffer of fixed size containing records of HFS+ transactions. A transaction may include one or several block lists, each consisting of a block list header followed by the data. There is nothing secret about any of this: HFS+ is an open format, and a more detailed description can be found on the official Apple website.
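
For orientation, here is a sketch of the on-disk journal header. The field names follow the layout published in Apple's open-source xnu kernel (bsd/vfs/vfs_journal.h), simplified here for illustration:

#include <stdint.h>

/* The journal is a circular buffer; start and end chase each other. */
typedef struct journal_header {
    int32_t magic;        /* 'JNLx' (0x4a4e4c78) */
    int32_t endian;       /* 0x12345678, reveals the writer's byte order */
    int64_t start;        /* byte offset of the oldest valid transaction */
    int64_t end;          /* byte offset where free space begins */
    int64_t size;         /* total size of the journal in bytes */
    int32_t blhdr_size;   /* size of each block list header */
    int32_t checksum;
    int32_t jhdr_size;    /* size of this header on disk */
} journal_header;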

Low-level access to the file system and data recovery

Far fewer manuals have been written on data recovery for HFS and HFS+ than for other file systems, and such recovery is harder to perform. The difficulty arises from the fact that HFS+ uses B-trees to store file allocation metadata: once a file is deleted, the B-tree is immediately updated, and the information about the deleted file is lost. With the release of Mac OS X 10.2 in August 2002, Apple improved HFS+ by adding a journal that records all file system changes in transaction blocks.

Journaling can be enabled or disabled by the user at any time. In Mac OS X 10.2 it is disabled by default; in Mac OS X 10.3 and later it is enabled by default. That is, on all modern systems running Mac OS X, file system changes are logged in the journal. Note, however, that the journal was added to HFS+ not to recover lost data, but to maintain file system integrity in the event of a failure. Even a simple user action generates many changes to the file system.

So, when you create a file, for example, the following happens:

  • A new file node is added to the catalog file.
  • The volume bitmap is changed to correctly reflect which blocks are in use.
  • Records are added to the extents overflow file if the file is heavily fragmented.
  • The attributes file is updated.
  • The volume header is updated to record the change to the file system.

Any of these changes can leave the file system corrupted, because a power failure or removal of the media may occur while they are in progress. Journaling solves exactly this problem.

A transaction in journaled HFS+ includes the following steps:

  1. Starting the transaction by copying all proposed changes into the journal file.
  2. Writing the journal buffer to the disk.
  3. Recording the transaction in the journal header.
  4. Carrying out the planned changes.
  5. Marking the transaction as completed.

When mounting an HFS+ file system, the system checks the journal for uncompleted transactions. If there are any, they are replayed and the file system is repaired.

Finally, we are recovering data!

How do we use the journal to recover deleted files? Follow these steps:

  1. Read the volume header.
  2. Get access to the catalog file.
  3. Find the journal file.
  4. Map it.
  5. Find records about deleted files in it.
  6. If the blocks associated with the deleted files have not been overwritten (check against the allocation bitmap, as sketched after this list), map them and recover the data.
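
Step 6 boils down to a bit test in the allocation file. Here is a minimal sketch, assuming the relevant part of the bitmap has been read into memory (in the allocation file, the first block is represented by the most significant bit of the first byte):

#include <stdint.h>
#include <stdbool.h>

/* 1 = block in use, 0 = free (and therefore possibly recoverable) */
static bool block_is_free(const uint8_t *bitmap, uint32_t block)
{
    return (bitmap[block / 8] & (0x80u >> (block % 8))) == 0;
}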

It all seems clear, but there is one problem: the journal size is limited, and its content is periodically overwritten. The journal of a Mac mini boot volume is typically overwritten within 5-10 minutes; the journal of a MacBook boot volume, within 30 minutes. If Time Machine is active on the volume, the journal can be overwritten every 20 seconds. In short, overwriting happens quickly, so we had better mount the volume read-only before attempting recovery:

mkdir /Volumes/MyVolume
mount -t hfs -r /dev/diskXXXX /Volumes/MyVolume

Doing so, we preserve the journal and also prevent the blocks of deleted files, which are marked as free in the volume bitmap, from being overwritten.
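
(To figure out which /dev/diskXXXX device to pass to the mount command above, diskutil list prints all disks and their partitions.)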

Many software products operate on the same principle, analyzing the HFS+ journal; they are easy to find on the Internet, so there is little sense in writing your own for this purpose.

Let's see what can be done manually in the console. With a good understanding of the HFS+ structure, you can analyze the journal and the volume bitmap in the console using the standard dd utility, which can dump parts of the disk to a file, although this is certainly very tedious :).

Here is an example of reading one block at a specific address (bs matches the volume's allocation block size, and skip is the block number):

sudo dd if=/dev/disk1 of=./evidence bs=4096 skip=4355500 count=1

If you have an idea of what the file contains, and especially if it is a text file, you can do this:

sudo cat /dev/disk1 | strings -o | grep -i 'secret code' -C 5

If the blocks of the deleted file have not been overwritten yet, this method can recover the file completely.

Even if the file blocks have been overwritten, for example by special tools similar to shred, file data may still remain in virtual memory, that is, in the swap files. In Mac OS X, the swap files live in /var/vm.

$ ls -al /var/vm
total 131072
drwxr-xr-x 4  root wheel 136      Oct 14 10:50 .
drwxr-xr-x 24 root wheel 816      Oct 14 10:52 ..
drwx--x--x 18 root wheel 612      Oct 11 11:20 app_profile
-rw------T 1  root wheel 67108864 Oct 14 10:50 swapfile

Then, having analyzed the swap:

sudo strings -o /var/vm/swapfile | grep 'secret code' -C 2

you can find fragments of the file that are still held in the swap.

You can use the raw-device files in your code to get access to the internal structures of the file system. As a rule, the EFI partition is found at /dev/rdisk0s1 and the HFS+ partition at /dev/rdisk0s2. In addition, there are ready-to-use definitions of the HFS+ data structures in hfs/hfs_format.h, which you might find useful.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <hfs/hfs_format.h>
#include <libkern/OSByteOrder.h>

/* Hex-dump a buffer byte by byte */
void dump(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        printf("%02X ", buf[i]);
    printf("\n");
}

int main(int argc, char *argv[])
{
    int fd = open("/dev/rdisk0s2", O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    /* The volume header sits at a fixed offset of 1024 (0x400) bytes */
    HFSPlusVolumeHeader volume_header;
    if (pread(fd, &volume_header, sizeof(volume_header), 0x400) != sizeof(volume_header))
        { perror("pread"); close(fd); return 1; }

    /* On-disk HFS+ fields are big-endian; convert before printing */
    printf("%u\n", OSSwapBigToHostInt32(volume_header.blockSize));
    dump((const unsigned char *)&volume_header, sizeof(volume_header));
    close(fd);
    return 0;
}
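
Assuming the listing is saved as hfsdump.c (a file name chosen here for illustration), it can be built and run like this:

clang -o hfsdump hfsdump.c
sudo ./hfsdump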

Utilities that interact with raw devices must be run via sudo, since only administrators and sudoers are allowed to open them.


Time Machine

Starting with Mac OS X Leopard, the system includes Time Machine. This utility backs up files, recording all changes in the file system, which allows the user to restore the whole system, several files, or a single file to the state it was in at a particular time. Time Machine requires a separate drive. Apple released a special device called Time Capsule, a network drive used specifically for Time Machine backups, but Time Machine can also use any USB or eSATA drive. When first launched, Time Machine creates a folder on the specified backup disk containing a copy of all the data; from then on it copies only changed files. In general, if Time Machine is in use for a disk, recovering lost data from it is easy.

Outro

Information very rarely disappears without a trace. Knowing how the file system operates, you can recover even what was considered irretrievably lost, and, along the way, learn a great deal of interesting things about the user's private life.

