Thursday, August 12, 2010

Storage virtualization

Storage virtualization is a concept in IT systems administration referring to the abstraction (separation) of logical storage from physical storage, so that storage may be accessed without regard to its physical location or heterogeneous structure. This separation gives the systems administrator greater flexibility in how storage is managed for end users.

Storage virtualization uses address space mapping to achieve location independence by abstracting the physical location of the data. The virtualization system presents to the user a logical space for data storage and itself handles the process of mapping it to the actual physical location.

The virtualization software or device is responsible for maintaining a consistent view of all the mapping information for the virtualized storage. This mapping information is usually called meta-data and is stored as a mapping table. The virtualization software or device uses the meta-data to re-direct I/O requests.
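
As a rough illustration of that mapping idea (this is a minimal sketch, not any product's design; the names, the 1 MiB extent size and the table layout are assumptions), a virtualizer can be modelled in Python as a table from logical extents to physical locations, used to redirect each I/O address:

    # Minimal sketch of address space mapping: the "meta-data" is a table from
    # logical extents to (physical device, physical extent); redirect() uses it
    # to translate a logical byte offset into a physical location.
    EXTENT_SIZE = 1024 * 1024  # assumed 1 MiB mapping granularity

    class Virtualizer:
        def __init__(self):
            self.mapping = {}  # logical extent -> (device id, physical extent)

        def map_extent(self, logical_extent, device, physical_extent):
            self.mapping[logical_extent] = (device, physical_extent)

        def redirect(self, logical_offset):
            """Translate a logical byte offset into (device, physical byte offset)."""
            extent, offset_in_extent = divmod(logical_offset, EXTENT_SIZE)
            device, physical_extent = self.mapping[extent]
            return device, physical_extent * EXTENT_SIZE + offset_in_extent

    v = Virtualizer()
    v.map_extent(0, device="array-A", physical_extent=42)
    print(v.redirect(4096))  # -> ('array-A', 44044288)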

Benefits
Non-disruptive data migration

One of the major benefits of abstracting the host or server from the actual storage is the ability to migrate data while maintaining concurrent I/O access.

The host only knows about the logical disk (vdisk), so any change to the meta-data mapping is transparent to it. This means the actual data can be moved or replicated to another physical location without affecting the operation of any client. When the data has been copied or moved, the meta-data can simply be updated to point to the new location, thereby freeing up the physical storage at the old location.

The process of moving the physical location is known as data migration. Most implementations allow this to be done in a non-disruptive manner, that is, concurrently, while the host continues to perform I/O to the logical disk (vdisk).
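
A hedged sketch of how such a migration could proceed, building on the Virtualizer class from the earlier sketch (the copy_fn callback and the locking scheme are illustrative assumptions; real products also mirror in-flight writes to both locations during the copy, which is omitted here for brevity):

    import threading

    class MigratingVirtualizer(Virtualizer):  # Virtualizer as sketched above
        def __init__(self):
            super().__init__()
            self._lock = threading.Lock()  # guards the mapping table

        def migrate_extent(self, logical_extent, new_device, new_physical_extent, copy_fn):
            """Copy one extent to a new location, then switch the meta-data atomically."""
            old_location = self.mapping[logical_extent]
            copy_fn(old_location, (new_device, new_physical_extent))  # bulk data copy
            with self._lock:  # hosts never see a half-updated mapping
                self.mapping[logical_extent] = (new_device, new_physical_extent)
            return old_location  # the old physical extent can now be freed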

Improved utilization

The physical storage resources are aggregated into storage pools, from which the logical storage is created. More storage systems, which may be heterogeneous in nature, can be added as and when needed, and the virtual storage space will scale up by the same amount. The process is fully transparent to the applications using the storage infrastructure.

Utilization can be increased by virtue of the pooling and migration.

When all available storage capacity is pooled, system administrators no longer have to search for disks that have free space to allocate to a particular host or server. A new logical disk can be simply allocated from the available pool, or an existing disk can be expanded.

Pooling also means that all the available storage capacity can potentially be used. In a traditional environment, an entire disk would be mapped to a host. This may be larger than is required, thus wasting space. In a virtual environment, the logical disk (vdisk) is assigned only the capacity required by the host using it.

Storage can be assigned where it is needed at that point in time, reducing the need to guess how much a given host will need in the future. Using thin provisioning, the administrator can create a very large thin-provisioned logical disk, so the consuming system believes it has a very large disk from day one, while physical capacity is only drawn from the pool as data is actually written.
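
A minimal sketch of the thin-provisioning idea, assuming a simple shared free list as the pool (the names and the pool model are illustrative only, not any vendor's implementation):

    class ThinDisk:
        def __init__(self, logical_size_extents, pool):
            self.logical_size = logical_size_extents  # capacity the host believes it has
            self.pool = pool                          # shared free list of physical extents
            self.mapping = {}                         # logical extent -> physical extent

        def write(self, logical_extent, data):
            if logical_extent >= self.logical_size:
                raise ValueError("write beyond the logical size")
            if logical_extent not in self.mapping:    # first touch: allocate on demand
                self.mapping[logical_extent] = self.pool.pop()
            return self.mapping[logical_extent], data  # stand-in for the real write

    pool = list(range(1000))                                   # 1,000 physical extents actually free
    disk = ThinDisk(logical_size_extents=100_000, pool=pool)   # the host sees 100x that
    disk.write(12345, b"hello")
    print(len(disk.mapping), "physical extent(s) consumed")    # -> 1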

Fewer points of management

With storage virtualization, multiple independent storage devices, which may be scattered over a network, appear to be a single monolithic storage device that can be managed centrally.

However, traditional storage controller management is still required: the creation and maintenance of RAID arrays, including error and fault management.

Risks
Backing out a failed implementation

Once the abstraction layer is in place, only the virtualizer knows where the data actually resides on the physical medium. Backing out of a virtual storage environment therefore requires the reconstruction of the logical disks as contiguous disks that can be used in a traditional manner.

Most implementations provide some form of back-out procedure, and with the data migration services already in place a back-out is at least possible, though time-consuming.

Interoperability and vendor support

Interoperability is a key enabler for any virtualization software or device. It applies to the actual physical storage controllers and to the hosts, their operating systems, multi-pathing software and connectivity hardware.

Interoperability requirements differ based on the implementation chosen. For example, virtualization implemented within a storage controller adds no extra overhead to host-based interoperability, but requires additional support for other storage controllers if they are to be virtualized by the same software.

Switch-based virtualization may not require specific host interoperability if it uses packet-cracking techniques to redirect the I/O.

Network-based appliances have the highest level of interoperability requirements, as they have to interoperate with all devices, both storage and hosts.

Complexity

Complexity affects several areas:

  • Management of the environment: although a virtual storage infrastructure benefits from a single point of logical disk and replication service management, the physical storage must still be managed. Problem determination and fault isolation can also become complex, due to the abstraction layer.
  • Infrastructure design: traditional design ethics may no longer apply; virtualization brings a whole range of new ideas and concepts to think about.
  • The software or device itself: some implementations are more complex to design and code than others, network-based and especially in-band (symmetric) designs in particular, because these implementations handle the actual I/O requests and so latency becomes a concern.

Meta-data management

Information is one of the most valuable assets in today's business environments. Once the storage is virtualized, the meta-data become the glue in the middle. If the meta-data are lost, so, in effect, is all the actual data, as it would be virtually impossible to reconstruct the logical drives without the mapping information.

Any implementation must ensure its protection with appropriate levels of back-ups and replicas. It is important to be able to reconstruct the meta-data in the event of a catastrophic failure.

Meta-data management also has implications for performance. Any virtualization software or device must be able to keep all copies of the meta-data consistent (atomic) and quick to update. Some implementations restrict the ability to provide certain fast-update functions, such as point-in-time copies and caching, where very fast updates are required to keep the latency added to the actual I/O to a minimum.
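
One common-sense way to make mapping updates both atomic and recoverable is to journal each change before applying it. The sketch below assumes a simple JSON-lines journal file and invented names; it is purely illustrative, not any product's on-disk format:

    import json

    class JournaledMapping:
        def __init__(self, journal_path):
            self.journal_path = journal_path
            self.table = {}  # logical extent -> (device, physical extent)

        def update(self, logical_extent, device, physical_extent):
            record = {"extent": logical_extent, "device": device, "pextent": physical_extent}
            with open(self.journal_path, "a") as journal:  # 1. persist the intent first
                journal.write(json.dumps(record) + "\n")
                journal.flush()
            self.table[logical_extent] = (device, physical_extent)  # 2. then apply it

        def recover(self):
            """Rebuild the table after a crash by replaying the surviving journal."""
            self.table = {}
            with open(self.journal_path) as journal:
                for line in journal:
                    record = json.loads(line)
                    self.table[record["extent"]] = (record["device"], record["pextent"])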

Performance and scalability

In some implementations the performance of the physical storage can actually be improved, mainly due to caching. Caching, however, requires visibility of the data contained within the I/O request and so is limited to in-band (symmetric) virtualization software and devices. These implementations also directly influence the latency of an I/O request on a cache miss, because the I/O has to flow through the software or device. Assuming the software or device is efficiently designed, this impact should be minimal when compared with the latency associated with physical disk accesses.

Due to the nature of virtualization, the mapping of logical to physical requires some processing power and lookup tables. Therefore every implementation will add some small amount of latency.

In addition to response-time concerns, throughput has to be considered. The bandwidth into and out of the meta-data lookup software directly impacts the available system bandwidth. In asymmetric implementations, where the meta-data lookup occurs before the information is read or written, bandwidth is less of a concern, as the meta-data are a tiny fraction of the actual I/O size. In-band, symmetric, flow-through designs are directly limited by their processing power and connectivity bandwidth.
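
To make that distinction concrete, here is a rough sketch of the two data paths, reusing the Virtualizer from the earlier sketch; the Backend stub and all names are invented for illustration and do not describe any particular product:

    class Backend:
        """Stand-in for a physical storage controller."""
        def __init__(self):
            self.data = {}

        def read(self, offset, length):
            return self.data.get(offset, b"\x00" * length)

    backends = {"array-A": Backend()}

    def out_of_band_lookup(virtualizer, logical_offset):
        # Asymmetric path: the host asks only *where* the data is (a few bytes
        # of meta-data) and then transfers the bulk data directly to storage.
        return virtualizer.redirect(logical_offset)

    def in_band_read(virtualizer, logical_offset, length):
        # Symmetric path: the whole payload flows through the virtualizer
        # itself, so its processing power and ports cap the total throughput.
        device, physical_offset = virtualizer.redirect(logical_offset)
        return backends[device].read(physical_offset, length)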

Most implementations provide some form of scale-out model, where the inclusion of additional software or device instances provides increased scalability and potentially increased bandwidth. The performance and scalability characteristics are directly influenced by the chosen implementation.
