TSINGHUA SCIENCE AND TECHNOLOGY, ISSN 1007-0214, 07/21, pp38-43, Volume 11, Number 1, February 2006
Design and Implementation of SCSI Target Emulator*
PAN Jiaming, SHU Jiwu**, ZHANG Suqin, ZHENG Weimin
Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China
Abstract: A SCSI target emulator is used in a storage area network (SAN) environment to simulate the
behavior of a SCSI target, processing and responding to I/O requests issued by initiators. The SCSI target
emulator works with general storage devices over multiple transport protocols. The target emulator uses a
protocol conversion module that translates SCSI requests for a variety of storage devices and
supports multi-RAID-level configurations and storage virtualization. Moreover, the target
emulator implements RAM caching, multi-queuing, and request merging to improve the I/O
response speed of the general storage devices. For block sizes of 4 KB to 128 KB, the throughput of the target
emulator is 150% higher for reads and 67% higher for writes than that of the
existing emulator. With a block size of 16 KB, the I/O latency of the target emulator is only about 20% of that of
the existing emulator.
Key words: storage area network (SAN); iSCSI; SCSI target emulator
Introduction
With the explosion in the amount of information, the
performance and scalability of storage systems have
become critical issues along with the system
cost-effectiveness. The storage area network (SAN)[1] is a
new storage system model with excellent performance
and scalability that has been widely used in recent
years. FC-SAN is based on the fibre channel transport
protocol, which allows SCSI commands and data to be
transmitted over fibre channels. IP-SAN is usually based
on the iSCSI protocol[2,3], which describes a means of
transporting SCSI packets over TCP/IP, providing
an interoperable solution that takes advantage of
the existing Internet infrastructure and Internet management
facilities and addresses distance limitations.
The SCSI protocol[4] is used to transmit data between
initiators and targets. A traditional SCSI target is a
SCSI device such as a SCSI disk or tape drive, but
traditional SCSI targets cannot be used in a SAN
environment.
Palekar et al.[5] proposed a SCSI target emulator for
the SAN environment. The target emulator receives SCSI
commands and data sent by initiators from the storage
network, then processes and responds to them. The
target emulator then passes the SCSI commands and
data to the SCSI subsystem. However, the
disadvantage of Palekar’s emulator is that it can only
make use of devices with a SCSI interface.
The IntraStor system[6] is a three-tier storage system.
The lowest system tier is the IP disk controllers (IDCs)
with the disks attached. The middle tier is the storage
controller and manager (SCM). The upper tier is the
application host. The application host accesses
the SCM using the iSCSI protocol. The SCM
accesses and manages the IDCs using a block-oriented
transport protocol named xBlock. The disadvantage of
the IntraStor system is that multiple protocol
conversions may degrade the performance. Moreover,
the IDCs must implement an independent protocol
conversion for each storage device.
Received: 2004-04-15; revised: 2005-01-04
* Supported by the National High-Tech Research and Development (863) Program of China (No. 2001AA111110)
** To whom correspondence should be addressed.
E-mail: shujw@tsinghua.edu.cn; Tel: 86-10-62795215
This paper introduces a target emulator with the goal
of building a scalable, flexible, manageable, and
cost-effective data storage system. The emulator was
designed to work with multiple transport protocols, especially
the iSCSI protocol, and with different storage devices.
The target emulator implements a protocol conversion
module which is able to translate SCSI commands
carried by different SCSI transport protocols to a
variety of storage devices. Many measures, such as
RAM caches, multiple queues, and request merging,
are used to optimize the reading and writing
performance.
The emulator improves on existing target emulators
in several areas. First, the conversion module is able to
convert SCSI requests for a variety of devices, so any
disk can be used in the target. Second, the emulator
is able to accept virtual disks as storage devices, such
as MD RAID volumes[7] and LVM logical volumes[8].
Therefore, a target storage system based on this
emulator can be configured with a
multi-RAID-level disk array and storage virtualization.
Finally, RAM caching, multiple queuing, and request
merging are used to improve the reading and
writing speeds, and thus the overall emulator performance.
1 SAN System
The emulator is implemented on an in-house SAN
system, TH-MSNS[9,10], in which the target simulator
module is a key component[11]. The target simulator
module shown in Fig. 1 runs on the target side,
processing and responding to SCSI requests.
Fig. 1 SAN I/O path with STML
The SCSI target simulator module is a low-level
front-end protocol-specific target driver that transmits
SCSI commands and data. The driver hands off the
SCSI commands and data to the SCSI target mid-level
driver. The mid-level driver processes and executes the
SCSI commands and hands the responses back to the
low-level driver. The low-level driver is protocol
specific while the mid-level driver is accessible to
multiple low-level drivers implementing different
protocols.
2 SCSI Target Emulator Design
The target emulator design is based on the target
simulator module of TH-MSNS. The mid-level driver
is largely unchanged: SCSI requests for SCSI
devices are passed to the SCSI subsystem as before,
and a routine is added to forward SCSI requests for
general storage devices. The mid-level driver in the
target emulator is compatible with the various existing
low-level drivers implementing various transport
protocols.
The target emulator also includes a SCSI protocol
conversion module to process the SCSI requests for
general storage devices. The conversion module is
responsible for receiving SCSI requests for general
storage devices from the mid-level driver and
translating these requests for the devices. The actual
reading from and writing to the storage devices are
performed by the I/O handler module. Access
to the general storage devices is accelerated by a
RAM cache driver named dCache in the I/O handler
module. In addition, measures such as multiple
queuing and request merging are used to optimize
reading and writing for the general storage devices.
The target emulator architecture is shown in Fig. 2.
Fig. 2 SCSI target emulator architecture
The target emulator includes the following three key
modules:
1) SCSI target mid-level module. This module
receives the SCSI commands and data from the
low-level drivers and returns the responses (data and/or
status) back to the driver. If the SCSI request target
is a SCSI device, the mid-level module passes the
request to the scsi_mod driver directly, which
processes the command and sends it onto the
corresponding SCSI target device. If the SCSI
command target is a general storage device, the
mid-level module forwards the request to the protocol
conversion module.
2) SCSI protocol conversion module. This
module is the bridge between the mid-level module
and the general storage devices. It translates the
requests received from the mid-level module to a
variety of general storage devices and sets up block
requests for general storage devices. Finally, this
module passes the block requests to the I/O handler
module.
3) I/O handler module. This module is
responsible for actually performing the I/O operations.
The module also manages the storage devices, such as
ATA disks and virtual disks. Each disk has an
independent request queue and an I/O operation thread,
which results in multiple queues. The benefit of
multiple queues is that the independent storage devices
can be read or written simultaneously, which improves
the concurrent access performance. In addition, the I/O
handler module will merge the requests in the request
queues if possible to reduce the storage disk access
frequency.
The dCache driver is a RAM cache driver
implemented in the I/O handler module which utilizes
a RAM buffer to cache data. The RAM buffer access
time is much shorter than the magnetic disk access time.
Therefore, the dCache considerably reduces the I/O
response time, which improves the
performance.
In addition, the I/O handler module provides a
configuration interface which interacts with
configuration tools so users can manage the storage
devices (e.g., register or un-register storage devices).
3 Implementation
The mid-level module in the target emulator is based
on the mid-level module of the TH-MSNS target
simulation module, so the implementation of the
mid-level module will not be described here. This
section only describes the implementation of the
protocol conversion module and the dCache driver, the
multiple queuing, and the request merging.
3.1 SCSI protocol conversion module
When the mid-level module receives a SCSI command
for one of the general storage devices from the
low-level driver, the mid-level module sets up a SCSI
request which is transmitted to the conversion module.
Each storage device has a unique device ID in the
target emulator. Each (target, logical unit number
(LUN)) pair is mapped to a device ID in the
conversion module. When the conversion module
receives a SCSI request, the module first maps the
(target, LUN) pair to the device ID. Then, the
conversion module evaluates the SCSI request. If the
request is a query-type request, it fills the SCSI request
with the desired device information and returns the
request. For a read or write request, the conversion
module parses the logical block address (LBA) and the transfer length from
the SCSI command descriptor block[12,13]. Then, the
conversion module combines the LBA, the transfer
length, and the data into a structure called a block
device request (BDR) which is sent to the I/O handler
module. When the I/O handler module successfully
completes reading or writing, the conversion module
fills the request and returns it to the mid-level module.
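The translation step described above can be illustrated with a minimal sketch in Python. It assumes the 10-byte READ(10)/WRITE(10) command descriptor block layout of SBC (opcode in byte 0, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). The names DEVICE_MAP, BlockDeviceRequest, and build_bdr, as well as the BDR field layout, are hypothetical; the paper does not list the actual BDR fields.

```python
import struct
from dataclasses import dataclass

READ_10, WRITE_10 = 0x28, 0x2A  # SBC opcodes for READ(10) and WRITE(10)


@dataclass
class BlockDeviceRequest:
    """Hypothetical BDR layout; the actual fields are not given in the paper."""
    device_id: int
    is_write: bool
    lba: int
    num_blocks: int
    data: bytes = b""


# Hypothetical mapping table: (target, LUN) -> device ID.
DEVICE_MAP = {(0, 0): 1, (0, 1): 2}


def build_bdr(target: int, lun: int, cdb: bytes, data: bytes = b"") -> BlockDeviceRequest:
    """Translate a 10-byte READ(10)/WRITE(10) CDB into a block device request."""
    device_id = DEVICE_MAP[(target, lun)]
    # Byte 0: opcode, byte 1: flags, bytes 2-5: LBA (big-endian),
    # byte 6: group number, bytes 7-8: transfer length, byte 9: control.
    opcode, _flags, lba, _group, length, _control = struct.unpack(">BBIBHB", cdb)
    if opcode not in (READ_10, WRITE_10):
        raise ValueError(f"unsupported opcode {opcode:#x}")
    return BlockDeviceRequest(device_id, opcode == WRITE_10, lba, length, data)
```

For example, a READ(10) CDB with bytes 2-5 equal to 00 00 10 00 and bytes 7-8 equal to 00 08 addresses 8 blocks starting at LBA 4096.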
3.2 dCache driver
The dCache is a cache driver for general devices in the
I/O handler module. The RAM buffer used by dCache
is allocated and initialized when the target emulator is
set up. The buffer is organized as pages with a page size
of 1 KB.
A table named pageTable in the dCache driver
describes the pages. Each entry in the pageTable
represents a page and includes the following major
fields.
Status: indicates the page status as free, dirty, or valid,
meaning respectively that the page is not in use, the page contains
data that has not been written to the disks, or the page
contains data that has been written to the disks. A free
page can be allocated freely, while a dirty page cannot
be replaced, but a valid page can be replaced. If there
are no free pages, the dCache will replace a valid page.
The replacement policy follows the LRU algorithm.
Counter: counter used by the LRU algorithm.
(DeviceID, LBA): corresponding device ID and
page LBA.
Address: page start address.
The pages in the dCache are addressed by the
(device ID, LBA) pair. A hash table named LBATable,
keyed by the (device ID, LBA) pair, speeds up
the search and returns the page corresponding
to that pair.
All free pages are organized as a linked list called
free_list. In the same way, dirty pages are linked with
dirty_list and valid pages are linked with valid_list. All
continuous pages are linked together.
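The page bookkeeping described above might be sketched as follows. This is an illustrative simplification, not the authors' implementation: the class and method names are hypothetical, a Python dict stands in for the LBATable, and the LRU valid page is found by scanning the table rather than by walking a linked valid_list.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional, Tuple

PAGE_SIZE = 1024  # 1 KB pages, as stated in the text


class Status(Enum):
    FREE = "free"
    DIRTY = "dirty"
    VALID = "valid"


@dataclass
class PageEntry:
    address: int                           # start offset of the page in the RAM buffer
    status: Status = Status.FREE
    counter: int = 0                       # last-use tick consulted by the LRU policy
    key: Optional[Tuple[int, int]] = None  # (device ID, LBA)


class PageTable:
    def __init__(self, num_pages: int) -> None:
        self.entries: List[PageEntry] = [
            PageEntry(address=i * PAGE_SIZE) for i in range(num_pages)
        ]
        self.free_list: List[PageEntry] = list(self.entries)
        self.lba_table: Dict[Tuple[int, int], PageEntry] = {}
        self.tick = 0

    def lookup(self, device_id: int, lba: int) -> Optional[PageEntry]:
        """Return the cached page for (device ID, LBA), or None on a miss."""
        return self.lba_table.get((device_id, lba))

    def allocate(self, device_id: int, lba: int, status: Status) -> Optional[PageEntry]:
        """Take a free page, else replace the least recently used valid page.

        Dirty pages are never replaced; None is returned if nothing is available.
        """
        if self.free_list:
            page = self.free_list.pop()
        else:
            valid = [p for p in self.entries if p.status is Status.VALID]
            if not valid:
                return None
            page = min(valid, key=lambda p: p.counter)
            del self.lba_table[page.key]   # the old mapping is no longer cached
        self.tick += 1
        page.key, page.counter, page.status = (device_id, lba), self.tick, status
        self.lba_table[page.key] = page
        return page
```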
Upon receiving a write request, the dCache searches
the LBATable with the (device ID, LBA) pair. If an
entry is found, the corresponding page is overwritten
by the incoming data. Otherwise, the dCache gets a
page from the free_list or the valid_list and copies the
incoming data into the page. When the copy is
complete, the dCache updates the related data structures
and informs the protocol conversion module.
For a read request, the dCache searches the
LBATable. If the data is found in the RAM buffer, the
data is copied from the RAM buffer to the requesting
buffer. Otherwise, the dCache forwards the BDR to the
I/O handler module for reading and gets a page from
the free_list or the valid_list. As the data is written into
the requesting buffer, it is also written into the page.
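The write and read paths can be summarized in a compact sketch. It is written under several assumptions: the class name DCacheSketch is hypothetical, a dict stands in for the disks, an OrderedDict stands in for the valid_list in LRU order, and a write that finds only dirty pages in a full buffer simply returns False, since the paper does not say how that case is handled.

```python
from collections import OrderedDict


class DCacheSketch:
    def __init__(self, capacity, backing):
        self.capacity = capacity      # number of pages in the RAM buffer
        self.backing = backing        # dict standing in for the disks
        self.valid = OrderedDict()    # clean pages, least recently used first
        self.dirty = {}               # pages not yet written to the disks

    def _make_room(self):
        """Ensure one page is available, replacing the LRU valid page if needed."""
        if len(self.valid) + len(self.dirty) < self.capacity:
            return True
        if not self.valid:
            return False              # only dirty pages remain; none is replaceable
        self.valid.popitem(last=False)
        return True

    def write(self, key, data):
        """Cache a write; key is a (device ID, LBA) pair."""
        if key in self.dirty:
            self.dirty[key] = data    # overwrite the cached page
            return True
        if key in self.valid:
            del self.valid[key]       # the page becomes dirty again
            self.dirty[key] = data
            return True
        if not self._make_room():
            return False
        self.dirty[key] = data
        return True

    def read(self, key):
        """Serve a read from the cache, or from the device on a miss."""
        if key in self.dirty:
            return self.dirty[key]
        if key in self.valid:
            self.valid.move_to_end(key)   # mark as most recently used
            return self.valid[key]
        data = self.backing[key]          # miss: read from the storage device
        if self._make_room():
            self.valid[key] = data        # keep a clean copy in the cache
        return data
```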
The operation of moving data from a higher-level
storage device to a lower-level storage device is
defined as a destage operation. A destage thread was
implemented to perform the destage tasks. The destage
thread is started when the emulator is set up and
continuously monitors the dCache. The destage thread
remains asleep most of the time. The destage thread is
activated when the number of dirty pages exceeds the
threshold value or a storage device is not busy. When
the number of dirty pages exceeds the threshold value,
the destage thread migrates the data from the dirty
pages to the disks and marks the page state as valid.
When the destage thread detects no I/O operations to a
storage device, it searches the dirty pages for this
device and destages a page.
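One pass of the destage logic could look like the sketch below. The function name and its arguments are hypothetical, and the sketch assumes that all dirty pages are flushed once the threshold is exceeded; the paper does not state how many pages are migrated in that case. Thread wake-up and sleep are omitted.

```python
def destage(dirty, valid, disk, threshold, idle_devices=()):
    """Move dirty pages to disk and mark them valid; return the flushed keys.

    dirty, valid, disk: dicts mapping (device_id, lba) -> page data.
    """
    if len(dirty) > threshold:
        # Assumption: flush every dirty page once the threshold is exceeded.
        keys = sorted(dirty)
    else:
        # Otherwise destage one page for each device that is currently idle.
        keys = []
        for device_id in idle_devices:
            for key in sorted(dirty):
                if key[0] == device_id:
                    keys.append(key)
                    break
    for key in keys:
        data = dirty.pop(key)
        disk[key] = data      # write the page to the storage device
        valid[key] = data     # the page is now clean and may be replaced
    return keys
```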
3.3 Multiple queuing and request merging
The concurrent access performance is improved by
giving each storage device its own I/O request queue
and I/O operation thread. The I/O request queue is a
FIFO. When the dCache needs to read from or write to the disks,
the dCache forwards the BDR to the corresponding
queue according to its device ID. The I/O operation
thread sleeps until the I/O handler module detects that
the queue is not empty.
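The per-device queue and thread arrangement can be sketched with the Python standard library. The class names DeviceWorker and IOHandler and the handler callback are hypothetical; a blocking queue.Queue.get() plays the role of the thread sleeping until its queue is non-empty.

```python
import queue
import threading


class DeviceWorker:
    """One FIFO request queue and one I/O thread for a single storage device."""

    def __init__(self, device_id, handler):
        self.device_id = device_id
        self.handler = handler             # callable performing the actual I/O
        self.requests = queue.Queue()      # FIFO request queue
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def _run(self):
        while True:
            bdr = self.requests.get()      # blocks while the queue is empty
            if bdr is None:                # sentinel: stop the worker
                self.requests.task_done()
                break
            self.handler(self.device_id, bdr)
            self.requests.task_done()


class IOHandler:
    def __init__(self, device_ids, handler):
        self.workers = {d: DeviceWorker(d, handler) for d in device_ids}

    def submit(self, device_id, bdr):
        """Forward a request to the queue selected by its device ID."""
        self.workers[device_id].requests.put(bdr)

    def drain(self):
        """Wait until every queued request has been processed."""
        for worker in self.workers.values():
            worker.requests.join()
```

Because each device has its own worker, a slow request on one device does not delay requests queued for another, while requests for the same device keep their arrival order.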
The I/O handler module uses request merging to
preprocess the requests. The I/O handler module
analyzes the data according to their LBAs and merges
requests without data correlation. If the LBAs of
several write requests are the same, the I/O handler
module discards the previous requests and only writes
the last request to the disks. If the LBAs of several
write requests in the queue are continuous, these
requests are merged into one request so that the data
will be written together to the disks.
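The two merging rules for write requests can be illustrated with a short sketch. It assumes each queued write is a (start LBA, list of per-block payloads) pair and that the merged requests may be issued in ascending LBA order; both the representation and the function name are hypothetical.

```python
def merge_writes(requests):
    """Merge queued writes given as (start_lba, [block, ...]) in arrival order.

    Returns a list of (start_lba, [block, ...]) runs sorted by LBA.
    """
    latest = {}
    for lba, blocks in requests:
        for offset, block in enumerate(blocks):
            # A later write to the same LBA supersedes the earlier one.
            latest[lba + offset] = block
    merged = []
    for lba in sorted(latest):
        if merged and merged[-1][0] + len(merged[-1][1]) == lba:
            merged[-1][1].append(latest[lba])   # contiguous: extend the current run
        else:
            merged.append((lba, [latest[lba]]))  # gap: start a new run
    return merged
```

For instance, writes to LBAs 10, 11, 10 (again), 20, and 12 collapse into one three-block write starting at LBA 10, carrying the newer data for LBA 10, plus a separate one-block write at LBA 20.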
4 Performance Analysis
The SCSI target emulator was implemented on the
TH-MSNS. The read/write throughput and the average
I/O response time of the emulator were then compared
with Palekar’s target emulator.
The testing used one initiator and one target
connected by Gigabit Ethernet. The initiator and target
configurations are described in Table 1.
Table 1 Initiator and target configurations
          Initiator                            Target
CPU       Intel(R) Xeon(TM) CPU 2.40 GHz × 2   Intel(R) Xeon(TM) CPU 2.40 GHz
Memory    1 GB                                 1 GB
OS        Red Hat Linux 7.3                    Red Hat Linux 7.3
Kernel    2.4.18-3smp                          2.4.22-up
The SCSI target system first used a software
RAID-0 volume based on the SCSI target emulator.
The tests evaluated the throughput as well as the
average response time for block sizes of 4 KB, 16 KB,
256 KB, and 1024 KB. Then the SCSI target system
was set up with a hardware RAID-0 volume based on
Palekar’s target emulator with the throughput and the
average response time evaluated for the same block
sizes. Finally, the target emulator performance was
evaluated with a RAM-DISK. Both RAID-0 volumes
included seven SCSI disks. All reads and writes were
sequential.
The evaluation results are shown in Figs. 3 and 4.
The label “RAM” refers to the throughput for the
RAM disk with the target emulator, “Target” refers to
the throughput for the RAID-0 volume using the present
target emulator, and “Palekar” refers to the throughput
for the RAID-0 volume using Palekar’s target
emulator.
Fig. 3 Target emulator read results
Figure 3 shows the throughput for sequential reads.
The throughput with the target emulator for the
software RAID-0 volume is about 50 MB/s, while the
throughput of Palekar’s target emulator for the
hardware RAID-0 volume is about 20 MB/s.
Figure 4 shows the throughput for sequential writes.
The throughput with the target emulator for the
software RAID-0 volume is about 50 MB/s, while the
throughput with Palekar’s target emulator for the
hardware RAID-0 volume is nearly 30 MB/s. The
results illustrate that at the same RAID level
configuration, the read performance is 150% better
while the write performance is about 67% better.
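As a worked check, the quoted percentages follow from the approximate throughputs given above (about 50 MB/s for the present emulator versus about 20 MB/s for reads and nearly 30 MB/s for writes with Palekar's emulator), taking the improvement as the relative gain over the baseline:

```latex
\frac{50 - 20}{20} = 1.5 = 150\%, \qquad \frac{50 - 30}{30} \approx 0.67 = 67\%
```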
Fig. 4 Target emulator write results
The performance was also evaluated on a RAID-5
volume. The results showed that the throughput of
sequential reads/writes reached up to 40 MB/s while
the total CPU utilization of the target emulator was less
than 50%.
Table 2 shows the average response times for a
block size of 16 KB.
Table 2 Average response time with a block size of 16 KB (ms)
        RAM    Palekar-RAID-0   Target-RAID-0   Palekar-RAID-5   Target-RAID-5
Read    0.23   0.63             0.31            0.91             0.35
Write   0.22   0.55             0.29            0.93             0.37
The average response time indicates the average
time between initiation and completion of an I/O
operation. In the SAN environment, the response time
(T_r) is the sum of the transport latency (T_t) and the I/O
latency (T_{I/O}) of the target:
T_r = T_t + T_{I/O}.
The transport latencies in these evaluations are equal
for the same block size, and the average response time
for the RAM disk is taken to represent the transport latency.
Thus, the I/O latencies can be calculated for all the
configurations. For read requests, the average I/O
latency of the present target emulator with the software
RAID-0 volume was 20% of the average read latency
of Palekar’s target emulator with the hardware RAID-0
volume. The average read latency for the RAID-5
volume was about 16% of Palekar’s result. The write
I/O latency with the present target emulator was 23%
of that of Palekar’s target emulator for the RAID-0 volume
and 22% for the RAID-5 volume. The results show
that the dCache, multiple queuing, and request merging
are able to dramatically improve the performance.
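The I/O latency calculation can be reproduced from the values in Table 2 by subtracting the RAM-disk response time, taken as the transport latency, from each configuration. The sketch below uses hypothetical function names and only the rounded table values. With these it gives ratios of roughly 20% and 18% for reads and about 21% for writes on both RAID levels, close to but not identical to the percentages quoted in the text, possibly because those were computed from unrounded measurements.

```python
# Average response times in ms for a 16 KB block size, copied from Table 2.
RESPONSE_MS = {
    "read": {"RAM": 0.23, "Palekar-RAID-0": 0.63, "Target-RAID-0": 0.31,
             "Palekar-RAID-5": 0.91, "Target-RAID-5": 0.35},
    "write": {"RAM": 0.22, "Palekar-RAID-0": 0.55, "Target-RAID-0": 0.29,
              "Palekar-RAID-5": 0.93, "Target-RAID-5": 0.37},
}


def io_latency(op, config):
    """I/O latency = response time minus transport latency (RAM-disk time)."""
    row = RESPONSE_MS[op]
    return row[config] - row["RAM"]


def latency_ratio(op, raid):
    """I/O latency of the present emulator relative to Palekar's emulator."""
    return io_latency(op, f"Target-{raid}") / io_latency(op, f"Palekar-{raid}")


if __name__ == "__main__":
    for op in ("read", "write"):
        for raid in ("RAID-0", "RAID-5"):
            print(f"{op:5s} {raid}: {latency_ratio(op, raid):.1%}")
```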
5 Conclusions
This paper describes a SCSI target emulator utilizing a
SCSI protocol conversion module to translate the SCSI
requests to a variety of storage devices. The emulator
uses a RAM cache, multiple queuing, and request
merging to accelerate the operations to the general
storage devices. Evaluation test results show that the
read/write throughputs for the RAID-0 volume are
about 50 MB/s and the read/write throughputs for
the RAID-5 volume are about 40 MB/s. The read
throughput is 150% higher than that of Palekar’s emulator,
while the write throughput is 67% higher.
References
[1] Phillips B. Have storage area networks come of age? IEEE
Computer, 1998, 31(7): 10-12.
[2] Satran J. iSCSI (Internet SCSI), draft-ietf-ips-iscsi-20.txt
(January, 2003), http://ietf.org/html.charters/ipscharter.html.
[3] Meth K Z, Satran J. Design of the iSCSI protocol. In:
Proceedings of the 20th IEEE/11th NASA Goddard
Conference on Mass Storage Systems and Technologies
(MSS’03), 2003.
[4] ANSI. SCSI architecture model-2 (SAM-2), Sep. 2002,
http://www.t10.org.
[5] Palekar A, Ganapathy N, Chadda A, Russell R. Design and
implementation of a Linux SCSI target for storage area
networks. In: Proceedings of the 5th Annual Linux
Showcase & Conference, 2001.
[6] Wang P, Gilligan E R, Green H, Raubitschek J. IP
SAN–From iSCSI to IP-addressable ethernet disks. In:
Proceedings of the 20th IEEE/11th NASA Goddard
Conference on Mass Storage Systems and Technologies
(MSS’03), 2003.
[7] Østergaard J. The Software-RAID HOWTO (2002),
http://www.tldp.org.
[8] Teigland D, Mauelshagen H. Volume managers in Linux.
In: Proceedings of the Freenix Track: 2001 USENIX
Annual, 2001.
[9] Shu Jiwu, Yao Jun, Fu Changdong, Zheng Weimin.
Highly efficient FC-SAN based on load stream. In: Zhou
Xingming, Jahnichen Stefan, Xu Ming, Cao Jiannong, eds.
The Fifth International Workshop on Advanced Parallel
Processing Technologies. Lecture Notes in Computer
Science 2834. Berlin: Springer-Verlag, 2003: 31-40.
[10] Shu Jiwu, Yao Jun. Technical report: Design and
implementation of TH-MSNS. Department of Computer
Science and Technology, Tsinghua University, China,
http://graduate.cs.tsinghua.edu.cn/~yaojun/, 2003. (in
Chinese).
[11] Li Bigang, Shu Jiwu, Zheng Weimin. A design and
implementation of target simulator module in SAN system.
Tsinghua Science and Technology, 2005, 10(1): 122-127.
[12] ANSI. SCSI-3 Block Commands (SBC), Nov. 1997,
http://www.t10.org.
[13] ANSI. SCSI Primary Commands-3(SPC-3), Sep. 2002,
http://www.t10.org.