Recovery of RAID 6 with VMFS file system

A customer came to me with a failed RAID 6 array built on four SATA disks, of which only three were available. The situation is quite common: several months earlier (!) one HDD in the array had failed and was sent for replacement under warranty, so for all the remaining time the RAID ran in degraded mode on three drives. Then another hard disk failed (see below) and everything went down. At first they went to a rather big company that advertises data recovery from RAID, but, terrified by the price they were quoted, they brought the array to me.

The first rule for a failed RAID array is to copy every disk, sector by sector, to an image file or to a healthy HDD of at least the same capacity. Some people are too lazy to do this, and so they end up with half-alive drives that die completely while the array is being examined.
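To make the idea of a sector-by-sector copy concrete, here is a minimal Python sketch. The function name image_disk and the chunk size are illustrative assumptions, not part of any tool mentioned in this article; in real work a dedicated imaging tool with a hardware or software read-retry strategy is the better choice. The sketch only shows the principle: read in small chunks and zero-fill what cannot be read, so that every offset in the image matches the offset on the disk.

```python
import os

def image_disk(src_path, dst_path, chunk_size=64 * 512):
    """Copy src to dst chunk by chunk, zero-filling chunks that cannot be read.

    Returns the byte offsets of the chunks that raised a read error.
    """
    bad_offsets = []
    with open(src_path, "rb", buffering=0) as src, open(dst_path, "wb") as dst:
        total = src.seek(0, os.SEEK_END)  # size of the source in bytes
        offset = 0
        while offset < total:
            want = min(chunk_size, total - offset)
            try:
                src.seek(offset)
                data = src.read(want)
            except OSError:
                data = b""
                bad_offsets.append(offset)
            # Pad failed or short reads so image offsets stay aligned with the disk.
            dst.write(data.ljust(want, b"\x00"))
            offset += want
    return bad_offsets
```

The returned list of bad offsets is worth keeping next to the image: later it tells you which areas of the assembled array contain zero-filled holes rather than real data.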

As a lyrical digression, I can give a piece of advice to my comrades in arms. Before getting down to work, learn everything you can about the task from the customer. But keep in mind that the customer's story should be treated as “supposedly” this is RAID 6 (in my case) and “presumably” it had 4 drives. Situations where they say one thing and reality turns out to be quite another are common. So take this information into account, but do not rely on it as gospel truth; trust the results of your own diagnosis as well. Moreover, the fact that the drives (of whatever type) have been somewhere else and handled by someone else brings additional factors. For example, you may get a great headache after the work of Sergey Golovnyak (aka Sergol), widely known in narrow circles: disks may have been swapped for different ones, the whole user area may have been overwritten with 00h, and other “joys” may await you.

The next step is to determine the disk order, the block size, the rotation type and the number of disks in the array; if one disk (or, in the general case, more) is missing, you should determine its position as well. There are utilities that analyze all of this automatically, but they are useful only in the simplest cases such as RAID 0. If we deal with RAID 6 or a RAID 5 of nonstandard configuration (rotation shift and so on), you can rely only on your own eyes and the head attached to them.
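To illustrate what "rotation" means, the sketch below prints the role of every disk in the first stripes of a four-disk RAID 6. The rotation rule is an assumption modeled on the left-symmetric layout of Linux software RAID; a hardware controller may well rotate differently, which is exactly what has to be established during analysis. The function name raid6_stripe_roles is hypothetical.

```python
def raid6_stripe_roles(stripe, n_disks):
    """Return the role of each disk ("P", "Q", "D0", ...) in one stripe.

    Assumed rotation: P moves one disk to the left on every stripe,
    Q follows P, and the data blocks follow Q, wrapping around.
    """
    p = n_disks - 1 - (stripe % n_disks)
    q = (p + 1) % n_disks
    roles = [None] * n_disks
    roles[p], roles[q] = "P", "Q"
    for i in range(n_disks - 2):
        roles[(p + 2 + i) % n_disks] = f"D{i}"
    return roles

if __name__ == "__main__":
    for stripe in range(4):
        print(stripe, raid6_stripe_roles(stripe, 4))
```

Comparing a table like this with what is actually seen on the disks at successive block offsets is one way to confirm or reject a guessed rotation.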

It is good if the virtual machines used NTFS: then we can find MFT records and work out the parameters of the RAID array from them. In this case we can ignore possible fragmentation of the virtual machines, since about 1-2 thousand contiguous records are usually enough for analysis. It is worse if we deal with a file system like Ext3 or HFS. In such cases we have to look for recognizable fragments of data such as software or system logs, TXT or HTML files and the like.
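The search for MFT records can be sketched as follows. The code assumes 512-byte sectors and NTFS 3.1 or later, where each record starts with the signature FILE and stores its own number as a little-endian 32-bit value at offset 0x2C. The function name find_mft_records is hypothetical, and for a real disk image you would read the file in pieces rather than load it whole.

```python
import struct

SECTOR = 512  # assumed sector size in bytes

def find_mft_records(image_bytes):
    """Yield (byte_offset, record_number) for each sector that starts with 'FILE'."""
    for off in range(0, len(image_bytes) - SECTOR + 1, SECTOR):
        if image_bytes[off:off + 4] == b"FILE":
            # Record number as stored in the record header (NTFS 3.1+ assumption).
            (number,) = struct.unpack_from("<I", image_bytes, off + 0x2C)
            yield off, number
```

Run over each disk image in turn, this gives runs of consecutive record numbers. The length of a run hints at the block size, and the disk on which the numbering continues hints at the disk order and rotation.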

The XOR parity must be checked across all the disks in the array. For a chosen sector the check follows a formula such as (sector 0 of drive 0) XOR (sector 0 of drive 1) = sector 0 of drive 3. There is software that automates the process and shows the XOR test results graphically. If the XOR check fails over large areas, it is reasonable to assume that more than one disk is missing from the array, that the RAID type chosen for analysis is wrong, or that the array was rebuilt unsuccessfully.
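The XOR test itself is simple enough to sketch in a few lines of Python. The helper names xor_blocks and parity_matches are illustrative; the blocks are assumed to be equal-length byte strings read from the same sector offset on different disk images.

```python
def xor_blocks(blocks):
    """XOR equal-length byte strings together and return the result."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)

def parity_matches(data_blocks, parity_block):
    """True if the XOR of the data blocks equals the parity block."""
    return xor_blocks(data_blocks) == parity_block
```

For example, parity_matches([d0, d1], p) should hold for every sector of a healthy stripe; counting the sectors where it fails gives the same picture that the graphical tools draw.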

When recovering data from RAID 6, bear in mind that besides the XOR blocks the array also holds Reed-Solomon error correction blocks, which give the array higher fault tolerance. A detailed description of RAID types can be found in a separate article.
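As an illustration of how the second block differs from plain XOR, here is a sketch that computes both blocks for one stripe. It assumes the scheme used by Linux software RAID: arithmetic in GF(2^8) with the polynomial 0x11D and generator 2, so that Q = D0 + 2·D1 + 4·D2 + ... A hardware controller may use other constants, so treat the numbers as an assumption. The names gf_mul2 and pq_syndromes are hypothetical.

```python
def gf_mul2(x):
    """Multiply a byte by 2 in GF(2^8), reducing by the polynomial 0x11D."""
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
    return x

def pq_syndromes(data_blocks):
    """Return (P, Q) for the data blocks D0, D1, ... of one stripe."""
    size = len(data_blocks[0])
    p = bytearray(size)
    q = bytearray(size)
    # Horner's rule from the last block down: Q = D0 + 2*(D1 + 2*(D2 + ...)).
    for block in reversed(data_blocks):
        for i, value in enumerate(block):
            p[i] ^= value
            q[i] = gf_mul2(q[i]) ^ value
    return bytes(p), bytes(q)
```

With the first data block the multiplier is 1, so only the later data blocks are visibly "scrambled" in Q; this is part of why P and Q look different in a HEX editor.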

The block types are rather easy to tell apart by their appearance. The screenshots below show the differences in structure.

Normal MFT Record

XOR Block or Parity Block

Reed-Solomon Error Correction Block

The screenshots above show the contents of three sectors of the RAID 6 array located at the same offset. We visually determine where the parity blocks and the error correction codes are, and then the location and rotation of the data blocks can be identified by calculation or analysis. Keep in mind that in the case this article is based on, the RAID 6 array has only three disks out of four; one disk is physically absent. Thus, it is necessary to extrapolate the blocks of the missing RAID 6 disk when determining the rotation.
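Where the missing disk held a data block in a given stripe, that block can be recomputed from the surviving data block(s) and the XOR parity block of the same stripe. A minimal sketch, with the hypothetical name recover_missing_block and with the Reed-Solomon block deliberately left out of the calculation:

```python
def recover_missing_block(surviving_blocks):
    """XOR the surviving data blocks and the P block of one stripe.

    The result is the one data block that is missing from that stripe.
    Do not pass the Reed-Solomon (Q) block here.
    """
    missing = bytes(len(surviving_blocks[0]))
    for block in surviving_blocks:
        missing = bytes(a ^ b for a, b in zip(missing, block))
    return missing
```

In stripes where the missing disk held the parity or the error correction block, both data blocks are present and nothing has to be recomputed.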

Determining the block size is simple. Find visually where a block begins: for example, in a parity block of a large log file we see “garbage” in the HEX editor instead of readable content. Then move through the sectors one by one until the “garbage” block ends and a data block begins. The number of sectors from beginning to end is the block size. Remember that programs like R-Studio take the block size in kilobytes, while we have found a number of sectors. If the RAID array is assembled in such software, convert the block size accordingly.
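The conversion is plain arithmetic. The helper below assumes 512-byte sectors unless told otherwise; the name block_size_kb is illustrative.

```python
def block_size_kb(sectors, sector_bytes=512):
    """Convert a block size counted in sectors to kilobytes."""
    return sectors * sector_bytes / 1024
```

So a block that spans 128 sectors of 512 bytes is entered as 64 KB, and one that spans 32 sectors as 16 KB.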

Once the disk order is identified, the array can be assembled. There are quite a few programs for that: RAID Reconstructor, the above-mentioned R-Studio, UFS Explorer and so on.

We should take into account that with RAID 6 data recovery, as in this very case, we may face the so-called problem of an irrelevant (stale) disk. What does it mean? My explanation: initially the array consisted of X drives (I had drives No. 0-1-2-3). Then it began working with X – 1, and as a result disk No. 0 became irrelevant. If the array ran in that configuration for some time and is then assembled from all four drives instead of three, the recovered data will be partly damaged. Data written before the drive dropped out of the array will open without problems, but newer files will contain blocks of foreign data unrelated to the file.

A RAID 6 array may even have two irrelevant drives. If the RAID ran for a year in configuration X, then half a year in configuration X – 1 and then for a while in configuration X – 2, you can still get valid data, provided the missing drives are determined correctly.

After the RAID 6 array with the VMFS file system is assembled, the resulting image can be opened in Windows using the free Java driver fvmfs.jar, which makes the directory tree available over WebDAV so that it can be browsed in a web browser. We can check that the virtual machines work using VMware Workstation, then boot them and copy the necessary information over the local network between VMware and the PC to a designated storage device connected to the system.