- Detection. In detection mode, the algorithm scans the disk for file signatures known to the program. For example, a photo in the popular JPEG format begins with the bytes FF D8 FF, usually followed by a JFIF or Exif marker, and finding this signature indicates that a file is present on the disk.
- Identification. Detecting a signature is only the first step. Some signatures are so short that scanning produces false detections. Some signatures are shared by several file types, and some occur several times within the same file. Additional checks are therefore performed to determine the exact type of the file found, for example by cross-checking the data taken from the file's header against the data actually read from the disk.
- Analysis. To determine the exact size of the file, its header is read and analyzed. The result is the exact size of the file in bytes.
- Determining the location of the file on disk. The signature marks the beginning of the file, and the data read from the header gives its exact size. From these two results the program determines which disk sectors are presumably occupied by the file. Note that this determination rests on assumptions that do not always hold.
In particular, it is assumed that the entire file is stored as a single contiguous fragment, which is not always true because of disk fragmentation. In addition, some of the sectors may belong to other files. This is easy to verify if the file system is present, but if the file system is damaged or missing, one can only assume that all the data belongs to the file currently being recovered.
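The four steps above can be illustrated with a minimal Python sketch. It is not the algorithm of any particular recovery program. It uses the BMP format only because its two-byte signature ("BM") is short enough to produce false detections and its header stores the file size directly. The function name `carve_bmp_candidates`, the 512-byte sector size, and the assumption that files start on sector boundaries are all choices made for this illustration.

```python
import struct

SECTOR_SIZE = 512  # assumed for illustration; many disks use 4096-byte sectors
# DIB header sizes defined for common BMP variants.
KNOWN_DIB_HEADER_SIZES = {12, 40, 52, 56, 108, 124}


def carve_bmp_candidates(disk: bytes):
    """Yield (first_sector, sector_count, size_in_bytes) for plausible BMP files.

    Assumes files start on a sector boundary and are stored contiguously.
    """
    for offset in range(0, len(disk) - 18 + 1, SECTOR_SIZE):
        # Detection: look for the two-byte "BM" signature.
        if disk[offset:offset + 2] != b"BM":
            continue

        # Read the header fields that follow the signature (little-endian):
        # file size, reserved bytes, pixel data offset, DIB header size.
        file_size, reserved, data_offset, dib_size = struct.unpack_from(
            "<IIII", disk, offset + 2
        )

        # Identification: a two-byte signature is weak, so cross-check
        # the header fields against each other to reject false detections.
        if reserved != 0 or dib_size not in KNOWN_DIB_HEADER_SIZES:
            continue
        if not (14 + dib_size <= data_offset < file_size):
            continue

        # Analysis and location: the header gives the size in bytes; the
        # sectors are *assumed* to follow the first one without gaps.
        sector_count = -(-file_size // SECTOR_SIZE)  # ceiling division
        yield offset // SECTOR_SIZE, sector_count, file_size
```

Calling `list(carve_bmp_candidates(image))` on a raw disk image held in memory returns the candidate sector ranges. Whether those sectors really belong to the file cannot be confirmed without a file system, which is exactly the assumption described above.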
Content analysis limitations
Unfortunately, content analysis is not a panacea; it is rather a tool of last resort. When the file system is seriously damaged, it is the only way to recover at least some of the files.
Content analysis cannot recover all data, only files whose formats are in the database of the particular program. For example, data recovery programs usually contain information about more than 250 file formats, including the most popular ones such as DOC/DOCX, XLS/XLSX, JPEG/JPG, RAW and many others.
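As a rough illustration of what such a database might look like, the sketch below keeps a small table of well-known leading byte sequences and matches a sector against it. The table `SIGNATURES` and the function `match_signatures` are hypothetical names, and the five entries are only examples, not the contents of any real program's database.

```python
# Hypothetical signature table: (format label, bytes at the start of the file).
SIGNATURES = [
    ("JPEG", bytes.fromhex("FFD8FF")),
    ("PNG", bytes.fromhex("89504E470D0A1A0A")),
    ("PDF", b"%PDF-"),
    # DOCX and XLSX are ZIP archives, so they share the ZIP signature.
    ("ZIP container (DOCX, XLSX, plain ZIP)", b"PK\x03\x04"),
    # Legacy DOC and XLS files share the OLE2 compound file signature.
    ("OLE2 container (legacy DOC, XLS)", bytes.fromhex("D0CF11E0A1B11AE1")),
]


def match_signatures(sector: bytes) -> list[str]:
    """Return the labels of all known formats whose signature starts the sector."""
    return [label for label, magic in SIGNATURES if sector.startswith(magic)]
```

A sector that matches nothing in the table yields an empty list, so a file in an unlisted format is simply invisible to the scan. The container entries also show why a signature match alone does not settle the file type: telling a DOCX from an XLSX requires looking further into the data.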
Note that some file types cannot be recovered with content analysis in principle. In particular, encrypted files are deliberately created without any recurring, recognizable signatures. Many log files, binary formats and some databases also lack signatures, which makes it impossible to detect them on the disk.
Another limitation is disk fragmentation. As shown above, content analysis can recover only files that are stored as a single contiguous fragment. A fragmented file can be fully recovered only if its record in the file system is not damaged.
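The effect of fragmentation can be shown with a toy example. The sketch below is a simplified model, not real file system code: the "disk" is a byte string, sectors are assumed to be 512 bytes, and the helper names `read_contiguous` and `read_by_sector_list` are invented for this illustration. The first helper reads a file the way content analysis does, and the second reads it from a list of sector numbers such as an intact file system record would provide.

```python
SECTOR_SIZE = 512  # assumed sector size for this toy model


def read_contiguous(disk: bytes, first_sector: int, size: int) -> bytes:
    """Read `size` bytes starting at `first_sector`, assuming no fragmentation."""
    start = first_sector * SECTOR_SIZE
    return disk[start:start + size]


def read_by_sector_list(disk: bytes, sectors: list[int], size: int) -> bytes:
    """Read a file from an explicit sector list, as a file system record allows."""
    data = b"".join(
        disk[s * SECTOR_SIZE:(s + 1) * SECTOR_SIZE] for s in sectors
    )
    return data[:size]


# A two-sector file whose halves are separated by a sector of another file.
original = b"A" * SECTOR_SIZE + b"B" * SECTOR_SIZE
foreign = b"X" * SECTOR_SIZE
disk = original[:SECTOR_SIZE] + foreign + original[SECTOR_SIZE:]

carved = read_contiguous(disk, 0, len(original))
exact = read_by_sector_list(disk, [0, 2], len(original))

print(carved == original)  # False: the second half comes from the other file
print(exact == original)   # True: the sector list restores the file fully
```

The carved result has the right size and a valid beginning, yet half of its content belongs to another file, which is why a fragmented file cannot be fully restored from signatures alone.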