
TSINGHUA SCIENCE AND TECHNOLOGY, ISSN 1007-0214, 07/21, pp38-43, Volume 11, Number 1, February 2006

Design and Implementation of SCSI Target Emulator*

PAN Jiaming, SHU Jiwu**, ZHANG Suqin, ZHENG Weimin

Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China

Abstract: A SCSI target emulator is used in a storage area network (SAN) environment to simulate the behavior of a SCSI target, processing and responding to I/O requests issued by initiators. The SCSI target emulator described here works with general storage devices and with multiple transport protocols. It uses a protocol conversion module that translates SCSI requests for a variety of storage devices and supports multi-RAID-level configuration and storage virtualization. Moreover, the target emulator implements RAM caching, multi-queuing, and request merging to improve the I/O response speed of the general storage devices. For block sizes of 4 KB to 128 KB, the throughput of the target emulator is 150% higher for reads and 67% higher for writes than that of the existing emulator. With a block size of 16 KB, the I/O latency of the target emulator is only about 20% of that of the existing emulator.

Key words: storage area network (SAN); iSCSI; SCSI target emulator

Introduction

With the explosion in the amount of information, the performance and scalability of storage systems have become critical issues, along with system cost-effectiveness. The storage area network (SAN)[1] is a storage system model with excellent performance and scalability that has been widely used in recent years. FC-SAN is based on the fibre channel transport protocol, which allows SCSI commands and data to be transmitted over fibre channels. IP-SAN is usually based on the iSCSI protocol[2,3], which describes a means of transporting SCSI packets over TCP/IP, providing an interoperable solution that takes advantage of the existing Internet infrastructure and Internet management facilities and addresses distance limitations.

The SCSI protocol[4] is used to transmit data between initiators and targets. A traditional SCSI target is a SCSI device such as a SCSI disk or tape drive, but traditional SCSI targets cannot be used in a SAN environment.

Palekar et al.[5] proposed a SCSI target emulator for the SAN environment. The target emulator receives SCSI commands and data sent by initiators over the storage network, then processes and responds to them. The target emulator then passes the SCSI commands and data to the SCSI subsystem. However, the disadvantage of Palekar’s emulator is that it can only make use of devices with a SCSI interface.

The IntraStor system[6] is a three-tier storage system. The lowest tier consists of the IP disk controllers (IDCs) with the disks attached. The middle tier is the storage controller and manager (SCM). The upper tier is the application host. The application host accesses the SCM using the iSCSI protocol. The SCM accesses and manages the IDCs using a block-oriented transport protocol named xBlock. The disadvantage of the IntraStor system is that multiple protocol conversions may degrade the performance. Moreover, the IDCs must implement an independent protocol conversion for each storage device.

Received: 2004-04-15; revised: 2005-01-04

* Supported by the National High-Tech Research and Development (863) Program of China (No. 2001AA111110)

** To whom correspondence should be addressed. E-mail: shujw@tsinghua.edu.cn; Tel: 86-10-62795215

This paper introduces a target emulator with the goal of building a scalable, flexible, manageable, and cost-effective data storage system. The emulator was designed to work with multiple transport protocols, especially the iSCSI protocol, and with different storage devices. The target emulator implements a protocol conversion module which translates SCSI commands carried by different SCSI transport protocols for a variety of storage devices. Several techniques, including RAM caches, multiple queues, and request merging, are used to optimize the read and write performance.

The emulator improves on existing target emulators in several areas. First, the conversion module is able to convert SCSI requests for a variety of devices, so any disk can be used in the target. Second, the emulator is able to accept virtual disks as storage devices, such as MD RAID volumes[7] and LVM logical volumes[8]. Therefore, a target storage system based on this emulator can be configured with a multi-RAID-level disk array and storage virtualization. Finally, RAM caching, multiple queuing, and request merging are used to greatly improve the read and write speeds.

1 SAN System

The emulator is implemented on an in-house SAN system, TH-MSNS[9,10], in which the target simulator module is a key component[11]. The target simulator module shown in Fig. 1 runs on the target side, processing and responding to SCSI requests.

Fig. 1 SAN I/O path with STML

The SCSI target simulator module has a low-level, front-end, protocol-specific target driver that transmits SCSI commands and data. This driver hands off the SCSI commands and data to the SCSI target mid-level driver. The mid-level driver processes and executes the SCSI commands and hands the responses back to the low-level driver. The low-level driver is protocol specific, while the mid-level driver is accessible to multiple low-level drivers implementing different protocols.

2 SCSI Target Emulator Design

The target emulator design is based on the target simulator module of TH-MSNS. The mid-level driver is unchanged apart from a routine added to forward SCSI requests for general storage devices, so SCSI requests for SCSI devices are passed to the SCSI subsystem as before. The mid-level driver in the target emulator is compatible with the existing low-level drivers implementing various transport protocols.

The target emulator also has a SCSI protocol conversion module to process the SCSI requests for general storage devices. The conversion module is responsible for receiving SCSI requests for general storage devices from the mid-level driver and translating these requests for the devices. The actual reading from and writing to the storage devices are performed by the I/O handler module. Access to the general storage devices is accelerated by a RAM cache driver named dCache in the I/O handler module. In addition, multiple queuing and request merging are used to optimize reads and writes for the general storage devices. The target emulator architecture is shown in Fig. 2.

Fig. 2 SCSI target emulator architecture


The target emulator includes the following three key modules:

1) SCSI target mid-level module. This module receives the SCSI commands and data from the low-level drivers and returns the responses (data and/or status) back to the drivers. If the target of a SCSI request is a SCSI device, the mid-level module passes the request directly to the scsi_mod driver, which processes the command and sends it on to the corresponding SCSI target device. If the target of a SCSI request is a general storage device, the mid-level module forwards the request to the protocol conversion module.

2) SCSI protocol conversion module. This module is the bridge between the mid-level module and the general storage devices. It translates the requests received from the mid-level module for a variety of general storage devices and sets up block requests for these devices. Finally, this module passes the block requests to the I/O handler module.

3) I/O handler module. This module is responsible for actually performing the I/O operations. The module also manages the storage devices, such as ATA disks and virtual disks. Each disk has an independent request queue and an I/O operation thread, which results in multiple queues. The benefit of multiple queues is that independent storage devices can be read or written simultaneously, which improves the concurrent access performance. In addition, the I/O handler module merges the requests in the request queues where possible to reduce the frequency of disk accesses.

The dCache driver is a RAM cache driver implemented in the I/O handler module which uses a RAM buffer to cache data. The RAM buffer access time is much shorter than the magnetic disk access time. Therefore, the dCache considerably reduces the I/O response time, which improves the performance.

In addition, the I/O handler module provides a

configuration interface which interacts with

configuration tools so users can manage the storage

devices (e.g., register or un-register storage devices).

3 Implementation

The mid-level module in the target emulator is based on the mid-level module of the TH-MSNS target simulator module, so its implementation will not be described here. This section describes only the implementation of the protocol conversion module, the dCache driver, multiple queuing, and request merging.

3.1 SCSI protocol conversion module

When the mid-level module receives a SCSI command for one of the general storage devices from the low-level driver, the mid-level module sets up a SCSI request which is transmitted to the conversion module. Each storage device has a unique device ID in the target emulator. Each (target, logical unit number (LUN)) pair is mapped to a device ID in the conversion module. When the conversion module receives a SCSI request, the module first maps the (target, LUN) pair to the device ID. Then, the conversion module evaluates the SCSI request. If the request is a query-type request, the module fills the SCSI request with the desired device information and returns the request. For a read or write request, the conversion module parses the logical block address (LBA) and the transfer length from the SCSI command descriptor block[12,13]. Then, the conversion module combines the LBA, the transfer length, and the data into a structure called a block device request (BDR), which is sent to the I/O handler module. When the I/O handler module successfully completes reading or writing, the conversion module fills the request and returns it to the mid-level module.
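The read/write path described above can be illustrated with a minimal sketch. It assumes the 10-byte READ(10) and WRITE(10) command descriptor blocks defined in SBC, where bytes 2-5 carry the LBA and bytes 7-8 carry the transfer length, both big-endian. The names BlockDeviceRequest, DEVICE_MAP, and build_bdr, and the field layout of the BDR, are hypothetical; the paper names the BDR structure but does not list its fields.

```python
import struct
from dataclasses import dataclass

READ_10 = 0x28   # SBC opcode for READ(10)
WRITE_10 = 0x2A  # SBC opcode for WRITE(10)


@dataclass
class BlockDeviceRequest:
    """Hypothetical BDR layout (assumption; not taken from the paper)."""
    device_id: int
    is_write: bool
    lba: int
    num_blocks: int
    data: bytes = b""


# Hypothetical (target, LUN) -> device ID table kept by the conversion module.
DEVICE_MAP = {(0, 0): 1, (0, 1): 2}


def build_bdr(target: int, lun: int, cdb: bytes, data: bytes = b"") -> BlockDeviceRequest:
    """Map (target, LUN) to a device ID and parse a 10-byte read/write CDB."""
    device_id = DEVICE_MAP[(target, lun)]
    opcode = cdb[0]
    if opcode not in (READ_10, WRITE_10):
        raise ValueError(f"unsupported opcode {opcode:#x}")
    # Bytes 2-5: big-endian LBA. Bytes 7-8: big-endian transfer length in blocks.
    (lba,) = struct.unpack_from(">I", cdb, 2)
    (num_blocks,) = struct.unpack_from(">H", cdb, 7)
    return BlockDeviceRequest(device_id, opcode == WRITE_10, lba, num_blocks, data)


# READ(10) for 8 blocks starting at LBA 0x1000 on (target 0, LUN 1).
cdb = bytes([READ_10, 0, 0x00, 0x00, 0x10, 0x00, 0, 0x00, 0x08, 0])
bdr = build_bdr(0, 1, cdb)
print(bdr)
```

Query-type commands and the other read/write CDB sizes are left out of this sketch; a real conversion module would dispatch on the opcode before reaching this point.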

3.2 dCache driver

The dCache is a cache driver for general devices in the I/O handler module. The RAM buffer used by the dCache is allocated and initialized when the target emulator is set up. The buffer is organized as pages with a page size of 1 KB.

A table named pageTable in the dCache driver describes the pages. Each entry in the pageTable represents a page and includes the following major fields.

Status: indicates the page status as free, dirty, or valid, meaning respectively that the page is not in use, that the page contains data that has not been written to the disks, or that the page contains data that has been written to the disks. A free page can be allocated freely. A dirty page cannot be replaced, but a valid page can be. If there are no free pages, the dCache replaces a valid page. The replacement policy follows the LRU algorithm.

Counter: counter used by the LRU algorithm.

(DeviceID, LBA): corresponding device ID and page LBA.

Address: page start address.

The pages in the dCache are addressed by the (device ID, LBA) pair. A hash table named LBATable, keyed by the (device ID, LBA) pair, speeds up the search for the page corresponding to a given pair.

All free pages are organized as a linked list called free_list. In the same way, dirty pages are linked in dirty_list and valid pages are linked in valid_list. All contiguous pages are linked together.

Upon receiving a write request, the dCache searches the LBATable with the (device ID, LBA) pair. If an entry is found, the corresponding page is overwritten by the incoming data. Otherwise, the dCache gets a page from the free_list or the valid_list and copies the incoming data into the page. When the copy is complete, the dCache updates the related data structures and informs the protocol conversion module.

For a read request, the dCache searches the LBATable. If the data is found in the RAM buffer, the data is copied from the RAM buffer to the requesting buffer. Otherwise, the dCache forwards the BDR to the I/O handler module for reading and gets a page from the free_list or the valid_list. As the data is written into the requesting buffer, it is also written into the page.
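The write and read paths can be modelled with a small sketch. It is a toy under stated assumptions, not the driver itself: an OrderedDict stands in for both the LBATable lookup and the counter-based LRU ordering of valid pages, a plain counter stands in for free_list, and a dict named backing stands in for the disks. All class, method, and variable names are hypothetical.

```python
from collections import OrderedDict

PAGE_SIZE = 1024  # 1 KB pages, as stated in the paper (not used by this toy model)


class DCache:
    """Toy model of the dCache page states: free, valid (clean), dirty."""

    def __init__(self, num_pages, backing):
        self.free = num_pages        # number of free pages
        self.valid = OrderedDict()   # (device_id, lba) -> data, oldest first
        self.dirty = {}              # (device_id, lba) -> data not yet on disk
        self.backing = backing       # dict standing in for the disks

    def _take_page(self):
        """Get a page from the free pool, else evict the LRU valid page."""
        if self.free > 0:
            self.free -= 1
        elif self.valid:
            self.valid.popitem(last=False)  # least recently used valid page
        else:
            raise MemoryError("only dirty pages left; destage first")

    def write(self, key, data):
        if key in self.valid:
            del self.valid[key]      # the cached page is reused and becomes dirty
        elif key not in self.dirty:
            self._take_page()        # miss: allocate a page for the new data
        self.dirty[key] = data

    def read(self, key):
        if key in self.dirty:
            return self.dirty[key]
        if key in self.valid:
            self.valid.move_to_end(key)  # mark as most recently used
            return self.valid[key]
        data = self.backing[key]     # miss: read from the disk
        self._take_page()
        self.valid[key] = data       # cache the data as a valid page
        return data


disk = {(1, 0): b"a", (1, 1): b"b", (1, 2): b"c"}
cache = DCache(num_pages=2, backing=disk)
cache.read((1, 0))
cache.read((1, 1))
cache.read((1, 0))           # (1, 0) becomes the most recently used page
cache.read((1, 2))           # no free page: the LRU valid page (1, 1) is evicted
cache.write((1, 5), b"x")    # evicts (1, 0); the new page is dirty
```

Dirty pages are never chosen for eviction here, which mirrors the rule that a dirty page cannot be replaced until it has been written to the disks.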

The operation of moving data from a higher-level storage device to a lower-level storage device is defined as a destage operation. A destage thread was implemented to perform the destage tasks. The destage thread is initialized when the emulator starts up and continuously monitors the dCache. The destage thread remains asleep most of the time and is activated when the number of dirty pages exceeds the threshold value or when a storage device is not busy. When the number of dirty pages exceeds the threshold value, the destage thread migrates the data from the dirty pages to the disks and marks the page state as valid. When the destage thread detects no I/O operations to a storage device, it searches the dirty pages for this device and destages a page.
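The two activation conditions of the destage thread can be sketched as a single decision function. This is an illustration only: the paper does not say how many pages are written back when the threshold is exceeded, so the sketch assumes all dirty pages are flushed in that case. The function name, its parameters, and the use of plain dicts for the dirty pages, valid pages, and disk are hypothetical.

```python
def destage(dirty, valid, disk, threshold, idle_devices):
    """Write back dirty pages and mark them valid; return the keys destaged.

    dirty, valid, and disk map (device_id, lba) -> data.
    """
    if len(dirty) > threshold:
        # Over the threshold: flush every dirty page (assumption, see lead-in).
        keys = list(dirty)
    else:
        # Otherwise destage one dirty page for each idle device, if it has any.
        keys = []
        for dev in idle_devices:
            for key in dirty:
                if key[0] == dev:
                    keys.append(key)
                    break
    for key in keys:
        data = dirty.pop(key)
        disk[key] = data     # migrate the data to the disk
        valid[key] = data    # the page is now clean and may be replaced
    return keys


dirty = {(1, 0): b"a", (2, 0): b"b", (2, 1): b"c"}
valid, disk = {}, {}
first = destage(dirty, valid, disk, threshold=5, idle_devices=[2])
second = destage(dirty, valid, disk, threshold=1, idle_devices=[])
```

In the emulator this logic would run inside a thread that sleeps between checks; the sketch leaves the wake-up mechanism out.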

3.3 Multiple queuing and request merging

The concurrent access performance is improved by giving each storage device its own I/O request queue and I/O operation thread. The I/O request queue is a FIFO queue. When the dCache needs to read from or write to the disks, the dCache forwards the BDR to the corresponding queue according to its device ID. The I/O operation thread sleeps until the I/O handler module detects that the queue is not empty.
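A minimal sketch of the per-device queue and thread arrangement follows. It uses Python's queue.Queue, whose blocking get() plays the role of a thread that sleeps until its queue is not empty. The DeviceWorker class, the submit function, and the dict used as a stand-in for each disk are hypothetical names, and only writes are modelled.

```python
import queue
import threading


class DeviceWorker:
    """One FIFO request queue and one I/O thread per storage device."""

    def __init__(self, device_id, store):
        self.device_id = device_id
        self.store = store                  # dict standing in for the disk
        self.requests = queue.Queue()       # FIFO queue of (lba, data) writes
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            req = self.requests.get()       # blocks while the queue is empty
            if req is None:                 # sentinel: stop the worker
                self.requests.task_done()
                break
            lba, data = req
            self.store[lba] = data          # perform the write
            self.requests.task_done()


workers = {dev: DeviceWorker(dev, {}) for dev in (1, 2)}


def submit(device_id, lba, data):
    """Route a request to the queue of its device, as done by device ID."""
    workers[device_id].requests.put((lba, data))


submit(1, 0, b"a")
submit(2, 0, b"b")
submit(1, 1, b"c")
for w in workers.values():
    w.requests.join()                       # wait until each queue is drained
```

Because each device has its own thread, requests for device 1 and device 2 can proceed at the same time, while requests for a single device keep their arrival order.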

The I/O handler module uses request merging to preprocess the requests. The I/O handler module analyzes the requests according to their LBAs and merges requests that have no data dependencies on one another. If the LBAs of several write requests are the same, the I/O handler module discards the earlier requests and writes only the last request to the disks. If the LBAs of several write requests in the queue are contiguous, these requests are merged into one request so that the data is written to the disks together.
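The two merging rules for writes can be sketched as follows. The sketch assumes each queued write is a pair of a starting LBA and a list of per-block payloads, and it sorts the merged output by LBA, which is a simplification the paper does not describe. The function name merge_writes and the request representation are hypothetical.

```python
def merge_writes(pending):
    """Merge queued writes given as (lba, blocks) pairs in arrival order.

    Returns merged (lba, blocks) requests sorted by starting LBA.
    """
    latest = {}
    for lba, blocks in pending:
        for i, block in enumerate(blocks):
            latest[lba + i] = block          # a later write to the same LBA wins
    merged = []
    for lba in sorted(latest):
        if merged and merged[-1][0] + len(merged[-1][1]) == lba:
            merged[-1][1].append(latest[lba])    # contiguous: extend the request
        else:
            merged.append((lba, [latest[lba]]))  # gap: start a new request
    return merged


pending = [(10, [b"a"]), (11, [b"b"]), (10, [b"c"]), (20, [b"d"])]
result = merge_writes(pending)
```

Here the first write to LBA 10 is discarded in favor of the later one, LBAs 10 and 11 are combined into one request, and the write to LBA 20 stays separate.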

4 Performance Analysis

The SCSI target emulator was implemented on the TH-MSNS. The read/write throughput and the average I/O response time of the emulator were then compared with those of Palekar’s target emulator.

The tests used a single initiator and a single target connected by gigabit Ethernet. The initiator and target configurations are described in Table 1.

Table 1 Initiator and target configurations

          Initiator                                Target
CPU       Intel(R) Xeon(TM) CPU 2.40 GHz × 2       Intel(R) Xeon(TM) CPU 2.40 GHz
Memory    1 GB                                     1 GB
OS        Red Hat Linux 7.3, kernel 2.4.18-3smp    Red Hat Linux 7.3, kernel 2.4.22-up

The SCSI target system first used a software RAID-0 volume based on the SCSI target emulator. The tests evaluated the throughput as well as the average response time for block sizes of 4 KB, 16 KB, 256 KB, and 1024 KB. Then the SCSI target system was set up with a hardware RAID-0 volume based on Palekar’s target emulator, and the throughput and the average response time were evaluated for the same block sizes. Finally, the target emulator performance was evaluated with a RAM disk. Both RAID-0 volumes included seven SCSI disks. All reads and writes were sequential.

The evaluation results are shown in Figs. 3 and 4. The label “RAM” refers to the throughput for the RAM disk with the target emulator, “Target” refers to the throughput for the RAID-0 volume using the present target emulator, and “Palekar” refers to the throughput for the RAID-0 volume using Palekar’s target emulator.

Fig. 3 Target emulator read results

Figure 3 shows the throughput for sequential reads. The throughput with the target emulator for the software RAID-0 volume is about 50 MB/s, while the throughput with Palekar’s target emulator for the hardware RAID-0 volume is about 20 MB/s.

Figure 4 shows the throughput for sequential writes. The throughput with the target emulator for the software RAID-0 volume is about 50 MB/s, while the throughput with Palekar’s target emulator for the hardware RAID-0 volume is nearly 30 MB/s. The results illustrate that, at the same RAID level configuration, the read throughput is 150% higher while the write throughput is about 67% higher.
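The percentages follow from the approximate throughputs quoted above when the improvement is taken relative to Palekar’s emulator. A short check, using the rounded figures of 50, 20, and 30 MB/s and a hypothetical helper named improvement_pct:

```python
def improvement_pct(new, old):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


read_gain = improvement_pct(50, 20)    # reads: 50 MB/s versus about 20 MB/s
write_gain = improvement_pct(50, 30)   # writes: 50 MB/s versus nearly 30 MB/s
print(f"read: {read_gain:.0f}%, write: {write_gain:.0f}%")
```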

Fig. 4 Target emulator write results

The performance was also evaluated on a RAID-5 volume. The results showed that the throughput of sequential reads and writes reached up to 40 MB/s, while the total CPU utilization of the target emulator was less than 50%.

Table 2 shows the average response times for a

block size of 16 KB.

Table 2 Average response time with a block size of 16 KB (ms)

        RAM    Palekar-RAID-0   Target-RAID-0   Palekar-RAID-5   Target-RAID-5
Read    0.23   0.63             0.31            0.91             0.35
Write   0.22   0.55             0.29            0.93             0.37

The average response time is the average time between initiation and completion of an I/O operation. In the SAN environment, the response time (T_r) is the sum of the transport latency (T_t) and the I/O latency (T_{I/O}) of the target:

T_r = T_t + T_{I/O}.

The transport latencies in these evaluations are equal for the same block size, with the average response time for the RAM disk representing the transport latency. Thus, the I/O latencies can be calculated for all the configurations. For read requests, the average I/O latency of the present target emulator with the software RAID-0 volume was 20% of that of Palekar’s target emulator with the hardware RAID-0 volume. The average read latency for the RAID-5 volume was about 16% of Palekar’s result. The write I/O latency with the present target emulator was 23% of that of Palekar’s target emulator for the RAID-0 volume and 22% for the RAID-5 volume. The results show that the dCache, multiple queuing, and request merging are able to dramatically improve the performance.
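The calculation can be reproduced from Table 2 by subtracting the RAM-disk response time, taken as the transport latency, from each response time and then dividing the two I/O latencies. The sketch below uses the rounded table entries, so its ratios land close to the reported percentages rather than exactly on them; that the difference comes from rounding in the table is an assumption. The dictionary layout and the function name io_latency_ratio are hypothetical.

```python
# Average response times in ms for a block size of 16 KB (Table 2).
table2 = {
    "read": {"RAM": 0.23, "Palekar-RAID-0": 0.63, "Target-RAID-0": 0.31,
             "Palekar-RAID-5": 0.91, "Target-RAID-5": 0.35},
    "write": {"RAM": 0.22, "Palekar-RAID-0": 0.55, "Target-RAID-0": 0.29,
              "Palekar-RAID-5": 0.93, "Target-RAID-5": 0.37},
}


def io_latency_ratio(op, level):
    """Ratio of the target emulator's I/O latency to Palekar's, for one RAID level."""
    row = table2[op]
    t_t = row["RAM"]                          # transport latency T_t
    target = row[f"Target-{level}"] - t_t     # T_I/O = T_r - T_t
    palekar = row[f"Palekar-{level}"] - t_t
    return target / palekar


for op in ("read", "write"):
    for level in ("RAID-0", "RAID-5"):
        print(op, level, f"{io_latency_ratio(op, level):.0%}")
```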

5 Conclusions

This paper describes a SCSI target emulator that uses a SCSI protocol conversion module to translate SCSI requests for a variety of storage devices. The emulator uses a RAM cache, multiple queuing, and request merging to accelerate operations on the general storage devices. Evaluation results show that the read/write throughputs for the RAID-0 volume are up to about 50 MB/s and the read/write throughputs for the RAID-5 volume are about 40 MB/s. The read throughput is 150% higher than that of Palekar’s emulator, while the write throughput is 67% higher.

References

[1] Phillips B. Have storage area networks come of age? IEEE Computer, 1998, 31(7): 10-12.

[2] Satran J. iSCSI (Internet SCSI), draft-ietf-ips-iscsi-20.txt (January, 2003), http://ietf.org/html.charters/ipscharter.html.

[3] Meth K Z, Satran J. Design of the iSCSI protocol. In: Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS’03), 2003.

[4] ANSI. SCSI architecture model-2 (SAM-2), Sep. 2002, http://www.t10.org.

[5] Palekar A, Ganapathy N, Chadda A, Russell R. Design and implementation of a Linux SCSI target for storage area networks. In: Proceedings of the 5th Annual Linux Showcase & Conference, 2001.

[6] Wang P, Gilligan E R, Green H, Raubitschek J. IP SAN–From iSCSI to IP-addressable ethernet disks. In: Proceedings of the 20th IEEE/11th NASA Goddard Conference on Mass Storage Systems and Technologies (MSS’03), 2003.

[7] Østergaard J. The Software-RAID HOWTO (2002), http://www.tldp.org.

[8] Teigland D, Mauelshagen H. Volume managers in Linux. In: Proceedings of the Freenix Track: 2001 USENIX Annual, 2001.

[9] Shu Jiwu, Yao Jun, Fu Changdong, Zheng Weimin. Highly efficient FC-SAN based on load stream. In: Zhou Xingming, Jahnichen Stefan, Xu Ming, Cao Jiannong, eds. The Fifth International Workshop on Advanced Parallel Processing Technologies. Lecture Notes in Computer Science 2834. Berlin: Springer-Verlag, 2003: 31-40.

[10] Shu Jiwu, Yao Jun. Technical report: Design and implementation of TH-MSNS. Department of Computer Science and Technology, Tsinghua University, China, http://graduate.cs.tsinghua.edu.cn/~yaojun/, 2003. (in Chinese).

[11] Li Bigang, Shu Jiwu, Zheng Weimin. A design and implementation of target simulator module in SAN system. Tsinghua Science and Technology, 2005, 10(1): 122-127.

[12] ANSI. SCSI-3 Block Commands (SBC), Nov. 1997, http://www.t10.org.

[13] ANSI. SCSI Primary Commands-3 (SPC-3), Sep. 2002, http://www.t10.org.