Hard disks and several RAID levels


1 Basic working principles of the hard disk

1.1 Hard disk component structure diagram


1.2 Main parameter terms explained

Head: when a hard disk exchanges data, read operations are faster than write operations, so hard disk manufacturers developed heads with separate read and write elements.

Speed (Rotational Speed): the rotational speed of the spindle drive motor, that is, the maximum number of revolutions the platters can complete in one minute. The faster a hard disk spins, the faster it finds files and the higher its transfer rate. Common hard disk speeds on the market today are 5400 rpm, 7200 rpm, 10000 rpm, and 15000 rpm. In theory, the faster the better, because a high-speed hard disk shortens the average access time and the actual read/write time. However, a faster disk also generates more heat, which makes cooling harder. Mainstream hard disks now generally run at 7200 rpm or more. SCSI hard disk spindle speeds are generally 7200-10000 rpm, with the fastest SCSI hard disks reaching 15000 rpm.
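To make the link between spindle speed and waiting time concrete, here is a minimal sketch. It assumes the usual rule of thumb that, on average, the head waits half a revolution for the target sector; the function name `avg_rotational_latency_ms` is made up for this illustration and does not come from any library.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average wait for the target sector, assumed to be half a revolution, in ms."""
    ms_per_revolution = 60_000 / rpm  # 60,000 ms in one minute
    return ms_per_revolution / 2


# The four spindle speeds listed in the text
for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm:>5} rpm -> {avg_rotational_latency_ms(rpm):.2f} ms")
```

Going from 5400 rpm to 15000 rpm cuts this part of the access time from about 5.6 ms to 2 ms, which is why rotational speed matters for random access.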

Single-platter capacity: one of the important parameters of a hard disk, and to a certain extent it determines the grade of the disk. A hard disk is assembled from multiple storage platters, and single-platter capacity is the maximum amount of data that one platter can store.

Number of platters: platters are the medium that carries the data stored in a hard disk. A hard disk consists of several platters stacked together and separated from one another by spacer rings. The more platters a disk has, the thicker it is and the more heat it produces.

Random seek time (unit: ms): at different rotational speeds, the performance difference shows up directly in random read/write access time. The lower this value, the better. It is also the aspect of hard disk speed that is felt most directly in everyday use.

Average seek time: the time the hard disk takes to move the head to the designated track in search of the target data. It describes the disk's ability to read data and is measured in milliseconds. When single-platter capacity increases, the head makes fewer seeks and moves a shorter distance, so the average seek time falls and the hard disk becomes faster.

Data cache: high-speed memory inside the hard disk that acts as a buffer, temporarily holding data for reading and re-reading. Early hard disks had 512 KB to 2 MB of cache, and current mainstream SATA hard disks have 32 MB.

Track-to-track seek time (single track seek): the time the head takes to move from one track to the next, in milliseconds (ms).

Full seek time (max full seek): the total time from when the head starts moving until it finally finds the needed data block, in milliseconds (ms).

Mean time between failures (MTBF): the average time a hard disk runs from the start of operation until a failure occurs. A hard disk's MTBF is usually at least 30,000 or 40,000 hours.

1.3 Types of hard disk and their advantages and disadvantages

In chronological order of development, they are:

1.3.1 IDE hard disk


IDE (Integrated Drive Electronics) refers to a hard drive with the disk controller integrated into the drive itself, and is also the name of the drive's transfer interface. Another name for it is ATA (Advanced Technology Attachment), which refers to the same thing. It uses parallel transmission technology (PATA).

It generally uses a 16-bit data bus that sends 2 bytes per bus cycle, so a bandwidth of 100 MB/s requires the data bus to be clocked at 50 MHz. Ordinary IDE hard disks spin at 5400 or 7200 rpm. The transfer rate stalled at about 133 MB/s, and because of the limits of parallel technology, IDE has gradually been phased out.
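The arithmetic behind the 100 MB/s figure can be written out as a short sketch. It assumes one transfer per clock cycle and ignores protocol overhead; the function name is hypothetical and only serves this example.

```python
def parallel_bus_bandwidth_mb_s(bus_width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth of a parallel bus, assuming one transfer per clock cycle."""
    bytes_per_transfer = bus_width_bits // 8
    return bytes_per_transfer * clock_mhz  # bytes x million transfers/s = MB/s


# 16-bit bus at 50 MHz, the case described in the text
print(parallel_bus_bandwidth_mb_s(16, 50.0))
```

With a 16-bit bus, the only way to go faster is to raise the clock, which is where parallel signalling runs into its limits.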

1.3.2 SATA hard disk

A hard disk with a SATA (Serial ATA) port is also called a serial hard disk. SATA is named for its serial data transmission mode. During transmission, the data lines and signal lines are used independently, and the clock frequency stays independent as well, so compared with the earlier PATA, the SATA transfer rate can reportedly reach 30 times that of the parallel interface.

The early SATA-1 standard can reach 150 MB/s, the later SATA-2 standard can reach 300 MB/s, and the third-generation SATA-3 standard can reach 600 MB/s. Drive speed is typically 7200 rpm.
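These three figures follow from the serial line rate. The sketch below relies on two facts that the text above does not state: the nominal line rates of 1.5, 3, and 6 Gbit/s, and SATA's 8b/10b encoding, which puts 10 bits on the wire for every data byte. The dictionary and function names are invented for this example.

```python
# Nominal line rates in Gbit/s (an assumption of this sketch, not from the text above)
SATA_LINE_RATES_GBIT = {"SATA-1": 1.5, "SATA-2": 3.0, "SATA-3": 6.0}


def payload_mb_s(line_rate_gbit: float) -> float:
    """Usable throughput when every data byte costs 10 line bits (8b/10b)."""
    line_bits_per_second = line_rate_gbit * 1000  # in Mbit/s
    return line_bits_per_second / 10              # in MB/s


for name, rate in SATA_LINE_RATES_GBIT.items():
    print(f"{name}: {payload_mb_s(rate):.0f} MB/s")
```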

SATA hard disks support hot swapping, but when a disk is damaged the system cannot show which specific disk is bad, which makes hot swapping of little practical value. With a single thread or a few threads, performance is very good, but under multitasking or large data transfers performance drops sharply. The reason is the relatively low-end mechanical base.

1.3.3 SCSI hard disk

SCSI stands for Small Computer System Interface, an interface designed specifically for the storage units of small computer systems. The computer can send a command to a SCSI device, the drive then positions the head by moving the arm and transfers data between the disk and the cache, and the whole process runs in the background. Multiple commands can also be sent and executed at the same time, which suits applications with heavy I/O loads. The overall performance of a SCSI disk array is also significantly higher than that of an array based on ATA hard disks.

Mainstream SCSI hard drives use the Ultra 320 SCSI interface, which provides an interface transfer speed of 320 MB/s. The average seek time is 4-5 ms, CPU usage is low, parallel processing ability is strong, and data can be transferred asynchronously. Ordinary SCSI hard disks spin at 10000 or 15000 rpm, but they are expensive.

1.3.3 SAS hard disk

SAS (Serial Attached SCSI), or serially connected SCSI, is a new generation of SCSI technology. Like the now-popular Serial ATA (SATA) drives, it uses serial technology to obtain higher transfer speeds and improves internal space by shortening the connecting cables. SAS is a revolutionary development in SCSI technology.

It supports a transfer rate of 600 MB/s, and each SAS port provides 3 Gb/s of bandwidth, which is not much different from 4 Gb Fibre Channel, so SAS disks are comparable to FC hard disks. A SAS interface can connect not only SAS hard disks but is also compatible with SATA hard disks. The average seek time is 3-4 ms. The price, however, is high: compared with an Ultra 320 SCSI hard disk of the same capacity, a SAS disk costs about twice as much, and building a RAID group also requires buying a SAS card.

1.3.4 FC hard disk

FC (Fibre Channel) is usually thought of as an interconnection architecture between systems, or between a system and its subsystems. It connects systems by cable in a point-to-point (or switched) configuration. (The hard disk itself does not have an FC interface. The disk enclosure has the FC interface and is interconnected through optical fiber and optical switches.)

The transfer rate can reach 200-400 MB/s and the average seek time is about 3 ms. FC offers high-performance transmission and good stability, but it is extremely expensive. Outside very high-end enterprise applications it has essentially no market, and the rise of SAS has also put great pressure on FC.

1.3.5 SSD hard disk

A solid state drive (Solid State Disk or Solid State Drive), also known as an electronic disk or solid state electronic disk, is a hard disk made up of a control unit and solid state storage units (DRAM or flash chips).

It offers excellent shock resistance and chips with a wide working temperature range (-40 to 85 ℃). The cost is very high.

There are two kinds of SSD:

(1) Flash-based solid state drives (IDE Flash Disk, Serial ATA Flash Disk): this kind of SSD is portable, keeps its data without a power supply, and adapts to a variety of environments, but its service life is not very long. It is suitable for individual users.

(2) DRAM-based solid state drives: these follow the design of traditional hard disks, can be set up and managed as volumes by the file system tools of most operating systems, and provide FC and PCI interfaces. By application mode they fall into two kinds: SSD hard disks and SSD disk arrays.

2 RAID

The section above introduced some basics about hard disks, all of which can be found on the Internet. In practice, the biggest influences on the performance of our programs are the network and disk I/O. CPUs are already fast, and memory I/O has reached very high speeds (probably around 5 GB per second), but our data is stored on disk, and a running program needs to read and store data, so disk performance is one of the most influential factors (leaving the network aside for now).

Modern disks have two defects: poor I/O performance and poor stability. Here we only discuss them qualitatively. As for performance, if you are willing to spend money, you can buy more expensive hardware. As for stability, if a hard disk breaks down or is damaged, it can no longer be used, which is unthinkable in places with very high data storage requirements. For this reason a new technology was born: RAID.

2.1 RAID concept

A redundant array of independent disks (RAID, Redundant Array of Independent Disks), formerly known as a redundant array of inexpensive disks (RAID, Redundant Array of Inexpensive Disks), is also simply called a disk array. The basic idea is to combine a number of relatively inexpensive hard disks into one array whose performance matches or even exceeds that of a single expensive, large-capacity hard disk. Depending on the version chosen, RAID has one or more of the following advantages over a single disk: enhanced data integration, enhanced fault tolerance, and increased throughput or capacity. In addition, to the computer a disk array looks like a single hard disk or logical storage unit. RAID is divided into RAID-0, RAID-1, RAID-1E, RAID-5, RAID-6, RAID-7, RAID-10, RAID-50, and RAID-60.

A RAID level is evaluated mainly on three indicators: speed, disk utilization, and redundancy.

2.2 RAID0

Multiple disks are merged into one large disk with no redundancy and parallel I/O, which makes it the fastest level. If one (physical) disk is damaged, all data is lost.
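How striping spreads one logical disk over several physical disks can be sketched as a simple address mapping. The sketch assumes a stripe unit of exactly one block and round-robin placement; real controllers use larger, configurable stripe sizes, and the function name `raid0_locate` is made up for this example.

```python
from typing import List, Tuple


def raid0_locate(logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk)."""
    disk = logical_block % num_disks     # consecutive blocks go to consecutive disks
    offset = logical_block // num_disks  # each full round advances one block per disk
    return disk, offset


# Six consecutive logical blocks on a three-disk array
layout: List[Tuple[int, int]] = [raid0_locate(block, 3) for block in range(6)]
print(layout)
```

Because neighbouring blocks land on different disks, a large sequential request keeps all disks busy at once. The same mapping shows why one failed disk ruins everything: every file longer than a few blocks has pieces on every disk.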


2.3 RAID1

Two or more groups of N disks mirror one another. In some multithreaded operating systems read speed is very good, while write speed is slightly lower. Unless the primary disk and the mirror holding the same data are damaged at the same time, the array keeps operating normally as long as one disk works, so reliability is the highest. RAID 1 also has the highest unit cost of all RAID levels.


2.4 RAID2

An improved version of RAID 0. It uses Hamming code to encode the data, splits the encoded data into independent bits, and writes them to the disks. Because error correction code (ECC) is added to the data, the total volume is larger than the original data, but errors in the data can be corrected so that the output is guaranteed to be correct. The data transfer rate is quite high. RAID2 needs at least three disk drives to operate. Because multiple disks are needed to store the check and recovery information, RAID2 is more complex to implement, and it is rarely used in commercial environments.
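The error correction that RAID2 depends on can be illustrated with the small Hamming(7,4) code. The choice of this particular code is an assumption made for the example, since the text does not say which Hamming code a given implementation uses. In a RAID2-style layout, each of the 7 code bits would go to a different disk (4 data disks and 3 ECC disks). Both function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into 7 code bits laid out as positions 1..7."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code_bits):
    """Fix at most one flipped bit and return the 4 data bits."""
    c = list(code_bits)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # 1-based position of the bad bit, 0 if clean
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[4] ^= 1  # simulate one disk returning a wrong bit
print(hamming74_correct(stored) == word)
```

The syndrome points directly at the failing position, which is what lets the array correct the error on the fly rather than merely detect it.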


2.5 RAID3

Parallel transmission with parity. It uses bit interleaving (interleaved data storage) and can only detect errors, not correct them. It is mainly used where high throughput is required, such as graphics (including animation). It provides a good transfer rate for large amounts of continuous data, but for workloads that frequently perform large numbers of writes, the parity disk becomes the bottleneck. Protecting data with a single parity disk is not as safe as mirroring, but disk utilization is greatly improved. An implementation needs three or more drives, and write and read rates are very high. Because there are relatively few parity bits, the computing time is relatively short.


2.6 RAID4

An independent disk structure with parity. It is similar to RAID3, except that data is accessed by data block, that is, by disk, one disk at a time. Failure recovery is much more difficult than with RAID3, the controller is much harder to design, and data access efficiency is not very good. The host accesses the RAID card in units of blocks. When reading, RAID3 accesses all disks to get the data, whereas RAID4 needs to access only one disk. Since disk seek time is long, RAID4 copes more easily when large amounts of data are read, so its performance should be better. When writing, RAID3 can calculate the parity value directly and then write the data and parity to disk. RAID4 has to read the old data and the old parity value, use the old data, the old parity, and the new data to calculate the new parity value, and then write the new data and the new parity.
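The RAID4 write path described above (read old data and old parity, compute, write new data and new parity) works because XOR parity can be updated without touching the other data disks. A minimal sketch, assuming plain XOR parity over equal-sized blocks; the helper names are invented for the example.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def small_write_parity(old_data: bytes, old_parity: bytes, new_data: bytes) -> bytes:
    """New parity = old parity XOR old data XOR new data."""
    return xor_blocks(old_parity, old_data, new_data)


# One stripe: three data blocks plus a parity block on the dedicated parity disk
stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(*stripe)

# Overwrite the middle block using only the old block and the old parity
new_block = b"ZZZZ"
parity = small_write_parity(stripe[1], parity, new_block)
stripe[1] = new_block

print(parity == xor_blocks(*stripe))
```

XOR-ing the old data out of the parity and the new data in gives the same result as recomputing the parity from the whole stripe, at the cost of two reads and two writes.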


2.7 RAID5

An independent disk structure with distributed parity. It uses disk striping and is a storage solution that balances storage performance, data security, and storage cost. The data and the corresponding parity information are stored across the disks that make up the RAID5 array, with parity information and its corresponding data kept on different disks. When the data on one RAID5 disk is damaged, the remaining data and the corresponding parity information can be used to restore it. RAID 5 can be understood as a compromise between RAID 0 and RAID 1. Read efficiency is very high, write efficiency is average, and block-level collective access is efficient. However, it does not handle parallel data transmission well, and the controller is hard to design. Every write operation produces four actual read/write operations: two reads for the old data and old parity, and two writes for the new data and new parity. After a disk drops out, running efficiency falls drastically.
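The following sketch shows the two ideas in this paragraph: parity that moves to a different disk on every stripe, and rebuilding a lost block from the survivors. The rotation rule used here (parity starts on the last disk and moves one disk to the left per stripe) is only one possible layout and is an assumption of the example, as are all function names.

```python
from functools import reduce


def xor_columns(blocks):
    """Byte-wise XOR across a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def raid5_write_stripe(stripe_no, data_blocks, num_disks):
    """Return (blocks per disk, parity disk index) for one stripe."""
    assert len(data_blocks) == num_disks - 1
    parity_disk = (num_disks - 1 - stripe_no) % num_disks  # rotates each stripe
    blocks = list(data_blocks)
    blocks.insert(parity_disk, xor_columns(data_blocks))
    return blocks, parity_disk


def raid5_rebuild(blocks, failed_disk):
    """Recompute the block of a failed disk from all surviving blocks."""
    survivors = [block for i, block in enumerate(blocks) if i != failed_disk]
    return xor_columns(survivors)


blocks, parity_disk = raid5_write_stripe(0, [b"ab", b"cd", b"ef"], num_disks=4)
print(parity_disk, raid5_rebuild(blocks, failed_disk=1) == blocks[1])
```

Rebuilding needs a read from every surviving disk for each lost block, which is consistent with the sharp drop in efficiency once a disk has failed.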


2.8 RAID6

An independent disk structure with two kinds of distributed parity. Compared with RAID 5, RAID 6 adds a second independent block of parity information. The two independent parity systems use different algorithms, so data reliability is very high: even if two disks fail at the same time, the data remains usable. It is mainly used where data absolutely must not be wrong. The controller design is very complex and write speed is poor, because calculating the parity and verifying data correctness takes more time and causes extra load. RAID 6 needs four or more disks to take effect. Among the functions of hardware disk array cards, RAID 6 is also one of the most common array levels.
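The text does not say which two algorithms are meant. One common choice is plain XOR for the first parity (P) and a Reed-Solomon-style weighted sum over the finite field GF(2^8) for the second (Q). The sketch below assumes that scheme, with field polynomial 0x11D and generator 2, and shows how two lost data blocks can be rebuilt. All function names are made up for the example.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8), reducing by the polynomial 0x11D."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def gf_pow(a: int, n: int) -> int:
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result


def gf_inv(a: int) -> int:
    return gf_pow(a, 254)  # a^254 is the inverse of a non-zero byte


def pq_parity(data):
    """P = XOR of all blocks, Q = XOR of (2^i * block i) in GF(2^8)."""
    size = len(data[0])
    p, q = bytearray(size), bytearray(size)
    for i, block in enumerate(data):
        coeff = gf_pow(2, i)
        for j, byte in enumerate(block):
            p[j] ^= byte
            q[j] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def recover_two(data, x, y, p, q):
    """Rebuild lost data blocks x and y from the survivors plus P and Q."""
    zero = bytes(len(p))
    survivors = [zero if i in (x, y) else b for i, b in enumerate(data)]
    p_rest, q_rest = pq_parity(survivors)
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    denom_inv = gf_inv(gx ^ gy)
    dx, dy = bytearray(len(p)), bytearray(len(p))
    for j in range(len(p)):
        pxy = p[j] ^ p_rest[j]  # equals Dx ^ Dy
        qxy = q[j] ^ q_rest[j]  # equals 2^x*Dx ^ 2^y*Dy
        dx[j] = gf_mul(denom_inv, qxy ^ gf_mul(gy, pxy))
        dy[j] = pxy ^ dx[j]
    return bytes(dx), bytes(dy)


data = [b"disk0", b"disk1", b"disk2", b"disk3"]
p, q = pq_parity(data)
damaged = [data[0], None, data[2], None]  # disks 1 and 3 are lost
print(recover_two(damaged, 1, 3, p, q) == (data[1], data[3]))
```

The field arithmetic for Q is the extra computation the paragraph refers to: P alone is a cheap XOR, while Q needs a multiplication per byte.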


2.9 RAID10/01

RAID 1+0 mirrors first and then stripes the data: all the hard disks are divided into two groups, which form the minimum RAID 0 combination, and each of the two groups then operates as RAID 1.

RAID 0+1 follows the opposite procedure to RAID 1+0: it stripes first and then mirrors the data onto two groups of hard disks. All the hard disks are divided into two groups, which form the minimum RAID 1 combination, and each of the two groups operates as RAID 0.

In terms of performance, RAID 0+1 has faster read and write speeds than RAID 1+0.

In terms of reliability, when one hard disk in a RAID 1+0 array is damaged, the other three hard disks continue to operate. In RAID 0+1, as soon as one hard disk is damaged, the other hard disk in the same RAID 0 group also stops operating, leaving only two hard disks running, so reliability is lower.
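The reliability difference can be checked by enumerating every possible pair of failed disks in a four-disk array. The disk numbering is an assumption of this sketch: for RAID 1+0, disks (0, 1) and (2, 3) are the mirror pairs; for RAID 0+1, disks {0, 1} form one stripe set and {2, 3} its mirror. The function names are hypothetical.

```python
from itertools import combinations


def raid10_survives(failed, num_disks=4):
    """Array survives unless both disks of some mirror pair have failed."""
    pairs = [(i, i + 1) for i in range(0, num_disks, 2)]
    return all(not (a in failed and b in failed) for a, b in pairs)


def raid01_survives(failed, num_disks=4):
    """Array survives only if at least one whole stripe set is untouched."""
    half = num_disks // 2
    stripe_sets = [set(range(0, half)), set(range(half, num_disks))]
    return any(not (stripe_set & set(failed)) for stripe_set in stripe_sets)


def survival_count(survives, num_disks=4, failures=2):
    """Return (surviving cases, total cases) over all failure combinations."""
    combos = list(combinations(range(num_disks), failures))
    return sum(survives(combo, num_disks) for combo in combos), len(combos)


print("RAID 1+0:", survival_count(raid10_survives))
print("RAID 0+1:", survival_count(raid01_survives))
```

With four disks, RAID 1+0 survives 4 of the 6 possible double failures and RAID 0+1 only 2 of 6, because a single failure already takes a whole stripe set out of service.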



2.10 RAID50

RAID 50 combines RAID 5 and RAID 0: RAID 5 is built first and RAID 0 on top of it, that is, several RAID 5 groups are striped together. Because RAID 50 is based on RAID 5, and RAID 5 needs at least 3 hard disks, a RAID 50 built from multiple RAID 5 groups needs at least 6 hard disks. Taking the minimum six-disk configuration as an example, the 6 hard disks are first divided into 2 groups of 3, each group forming a RAID 5, and the two RAID 5 groups are then combined as RAID 0.

If 1 hard disk is damaged in any one or several of the underlying RAID 5 groups, the RAID 50 can still keep operating, but if 2 or more hard disks are damaged in any single RAID 5 group, the entire RAID 50 fails.

Because RAID 50 stripes multiple RAID 5 groups at the upper level, its performance is higher than that of pure RAID 5, while its capacity utilization is the same as RAID 5.
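The failure rule for RAID 50 can be expressed as a small check: count the failed disks per RAID 5 group and compare with what one group tolerates. The sketch assumes disks are numbered group by group; the function name and the `tolerance` parameter are invented for the example. Setting `tolerance=2` models groups that tolerate two failures each.

```python
def nested_parity_survives(failed_disks, num_groups, disks_per_group, tolerance=1):
    """True if no group has more failed disks than it can tolerate."""
    failures_per_group = [0] * num_groups
    for disk in failed_disks:
        failures_per_group[disk // disks_per_group] += 1
    return all(count <= tolerance for count in failures_per_group)


# Minimum RAID 50: 2 groups of 3 disks (disks 0-2 and 3-5)
print(nested_parity_survives([0, 3], num_groups=2, disks_per_group=3))  # one per group
print(nested_parity_survives([0, 1], num_groups=2, disks_per_group=3))  # two in one group
```

The first call reports survival and the second reports failure, matching the rule that one lost disk per group is fine but two in the same group is fatal.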


2.11 RAID60

RAID 60 combines RAID 6 and RAID 0: RAID 6 is built first and RAID 0 on top of it. In other words, two or more RAID 6 groups are striped together. RAID 6 needs at least 4 hard disks, so RAID 60 requires a minimum of 8 hard disks.

Because the bottom layer is RAID 6, RAID 60 allows up to 2 hard disks to be damaged in any RAID 6 group while the system keeps operating. As soon as 3 hard disks are damaged in any one underlying RAID 6 group, the whole RAID 60 fails, although the odds of that are very low.

Compared with a single RAID 6, RAID 60 stripes several RAID 6 groups at the upper level, so performance is higher. However, the entry threshold is high, and the low capacity utilization is the bigger problem.

2.12 Disk array comparison

| RAID level | Minimum number of disks | Number of fault-tolerant disks | Available capacity | Performance | Safety | Objective | Application industry |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | ≥2 | 0 | n | Highest | If one hard disk fails, the whole array fails | Maximum capacity and speed | 3D industrial real-time rendering, video editing cache |
| 1 | ≥2 | Half of the total number | Half of the total capacity | Slightly improved | Highest | Maximum security | Individual, enterprise backup |
| 10 | ≥4 | Half of the total number | Half of the total capacity | High | Highest | Combines the advantages of RAID 0 and RAID 1, theoretically faster | Large databases, servers |
| 5 | ≥3 | 1 | n-1 | High | High | Maximum capacity on a minimum budget | Individual, enterprise backup |
| 6 | ≥4 | 2 | n-2 | Slower than RAID 5 | Higher than RAID 5 | Same as RAID 5, but safer | Individual, enterprise backup |
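The capacity and minimum-disk columns can be turned into a small calculator. This is a sketch under stated assumptions: all disks are the same size, RAID 1 is counted as in the table (half of the total), and RAID 50 and RAID 60 lose one or two disks per group as described in sections 2.10 and 2.11. The function name and the `groups` parameter are hypothetical.

```python
MIN_DISKS = {"0": 2, "1": 2, "5": 3, "6": 4, "10": 4, "50": 6, "60": 8}


def usable_disks(level: str, n: int, groups: int = 2) -> float:
    """Number of disks' worth of usable capacity for n equal-sized disks."""
    if n < MIN_DISKS[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DISKS[level]} disks")
    if level == "0":
        return float(n)
    if level in ("1", "10"):
        return n / 2
    if level == "5":
        return float(n - 1)
    if level == "6":
        return float(n - 2)
    if level == "50":
        return float(n - groups)      # one parity disk per RAID 5 group
    return float(n - 2 * groups)      # level "60": two parity disks per group


for level in ("0", "1", "10", "5", "6", "50", "60"):
    print(f"RAID {level:>2} with 8 disks: {usable_disks(level, 8):.0f} usable")
```

For example, eight 1 TB disks yield 7 TB as RAID 5 but only 4 TB as RAID 10 or as a two-group RAID 60, which illustrates the trade-off between capacity and fault tolerance in the table.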


Posted by Sampson at November 12, 2013 - 10:22 PM