Sun StorEdge™ A1000 and A3x00/A3500FC Best Practices Guide
Sun Microsystems, Inc.
4150 Network Circle
Santa Clara, CA 95054 U.S.A.
650-960-1300
Part No. 806-6419-14
November 2002, Revision A
Send comments about this document to: [email protected]
Copyright 2002 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved.
Sun Microsystems, Inc. has intellectual property rights relating to technology embodied in the product that is described in this document. In
particular, and without limitation, these intellectual property rights may include one or more of the U.S. patents listed at
http://www.sun.com/patents and one or more additional patents or pending patent applications in the U.S. and in other countries.
This document and the product to which it pertains are distributed under licenses restricting their use, copying, distribution, and
decompilation. No part of the product or of this document may be reproduced in any form by any means without prior written authorization of
Sun and its licensors, if any.
Third-party software, including font technology, is copyrighted and licensed from Sun suppliers.
Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in
the U.S. and in other countries, exclusively licensed through X/Open Company, Ltd.
Sun, Sun Microsystems, the Sun logo, AnswerBook2, docs.sun.com, Sun StorEdge, Ultra, Ultra Enterprise, RSM, SunSolve, Sun Enterprise, and
Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and in other countries.
All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and in other
countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges
the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun
holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN
LOOK GUIs and otherwise comply with Sun’s written license agreements.
Use, duplication, or disclosure by the U.S. Government is subject to restrictions set forth in the Sun Microsystems, Inc. license agreements and as
provided in DFARS 227.7202-1(a) and 227.7202-3(a) (1995), DFARS 252.227-7013(c)(1)(ii) (Oct. 1998), FAR 12.212(a) (1995), FAR 52.227-19, or
FAR 52.227-14 (ALT III), as applicable.
DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES,
INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT,
ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID.
Contents

Preface

1. Troubleshooting Overview
   1.1 A3x00/A3500FC Commandments
   1.2 Available Tools and Information
       1.2.1 Documentation
       1.2.2 Web Sites
       1.2.3 Internal Directory
       1.2.4 Obtaining the Latest Version of RAID Manager
       1.2.5 RAID Manager 6.0, 6.1 and 6.22 Are Not Supported
       1.2.6 Serial Cable
       1.2.7 RAID Manager 6.xx Architecture White Paper Available
   1.3 Tips for Filing a Bug
   1.4 FINs and FCOs

2. Hardware Installation and Configuration
   2.1 New Installation
       2.1.1 Battery Unit
       2.1.2 Power Cables
       2.1.3 Power Sequencer
       2.1.4 Local/Remote Switch
       2.1.5 SCSI and Fiber-Optic Cables
       2.1.6 SCSI ID, Loop ID, Controller, and Disk Tray Switch Settings
       2.1.7 World Wide Name (WWN)
   2.2 Adding or Moving Arrays to a Host With Existing A3x00 Arrays
   2.3 Adding Disks or Disk Trays
       2.3.1 Adding or Moving Disk Trays to Existing Arrays
       2.3.2 Adding or Moving Disk Drives to Existing Arrays
   2.4 Setting Up 2x7 and 3x15 Configurations, and Converting 1x5 to 2x7 or 3x15
   2.5 Sun StorEdge A3500/A3500FC Lite
   2.6 Cluster, Multi-Initiator, and SCSI Daisy Chaining Configurations
       2.6.1 Cluster Information
       2.6.2 Multi-Initiator Information
       2.6.3 SCSI Daisy Chaining Information
   2.7 Supported Configurations
       2.7.1 Maximum Server Configurations
       2.7.2 Onboard SOC+
       2.7.3 Second Port on the SOC+ Card
       2.7.4 Disk Drive Support Matrices
       2.7.5 Independent Controller/Box Sharing
       2.7.6 HBAs
   2.8 SCSI to FC-AL Upgrade

3. RAID Manager Installation and Configuration
   3.1 Installation and Configuration Tips, Tunable Parameters, and Settings
       3.1.1 Software Installation
       3.1.2 Software Configuration
       3.1.3 RAID Module Configuration
       3.1.4 Tunable Parameters and Settings
       3.1.5 Multi-Initiator/Clustering Environment
       3.1.6 Maximum LUN Support
   3.2 LUN Creation/RAID Level
       3.2.1 General Information
       3.2.2 LUN Numbers
       3.2.3 The Use of RAID Levels
       3.2.4 Cache Mirroring
       3.2.5 Reconstruction Rate
       3.2.6 Creation Process (Serial/Parallel) Time
       3.2.7 DacStor Size (Upgrades)
   3.3 LUN Deletion and Modification
   3.4 Controller and Other Settings
       3.4.1 NVSRAM Settings
       3.4.2 Parity Check Settings
           3.4.2.1 RAID Manager 6.1.1
           3.4.2.2 RAID Manager 6.22x
           3.4.2.3 Parity Repair
           3.4.2.4 Multi-host Environment

4. System Software Installation and Configuration
   4.1 Installation
       4.1.1 New Installation
       4.1.2 All Upgrades to RAID Manager 6.22 or 6.22.1
   4.2 Solaris Kernel Driver
       4.2.1 sd_max_throttle Settings
       4.2.2 Generating Additional Debug Information
   4.3 format and lad
       4.3.1 Volume Labeling
   4.4 Ghost LUNs and Ghost Devices
       4.4.1 Removing Ghost Drives
   4.5 Device Tree Rearranged
       4.5.1 Dynamic Reconfiguration Related Problems
           4.5.1.1 Workaround
   4.6 SNMP
   4.7 Interaction With Other Volume Managers
       4.7.1 VERITAS
           4.7.1.1 VERITAS Enabling and Disabling DMP
           4.7.1.2 HA Configuration Using VERITAS
           4.7.1.3 Adding or Moving Arrays Under VERITAS
       4.7.2 Solstice Disksuite (SDS)
       4.7.3 Sun Cluster
       4.7.4 High Availability (HA)
       4.7.5 Quorum Device

5. Maintenance and Service
   5.1 Verifying FRU Functionality
       5.1.1 Disk Drives
       5.1.2 Disk Tray
           5.1.2.1 RSM Tray
           5.1.2.2 D1000 Tray
       5.1.3 Power Sequencer
       5.1.4 SCSI Cables
       5.1.5 SCSI ID Jumper Settings
       5.1.6 SCSI Termination Power Jumpers
       5.1.7 LED Indicators
       5.1.8 Backplane Assembly
       5.1.9 D1000 FRUs
       5.1.10 Verifying the HBA
       5.1.11 Verifying the Controller Boards and Paths to the A3x00/A3500FC
       5.1.12 Controller Board LEDs
       5.1.13 Ethernet Port
   5.2 FRU Replacement
       5.2.1 HBA
       5.2.2 Interconnect Cables
       5.2.3 Power Cords
       5.2.4 Power Sequencer
       5.2.5 Hub
       5.2.6 Controller Card Guidelines
       5.2.7 Amount of Cache
       5.2.8 Battery Unit
       5.2.9 Cooling
       5.2.10 Disk Drives
       5.2.11 Disk Tray
       5.2.12 Midplanes
       5.2.13 Reset Configuration and sysWipe
   5.3 Software and Firmware Guidelines
       5.3.1 Firmware, Software, and Patch Information
       5.3.2 RAID Manager 6 Upgrade
       5.3.3 Firmware Upgrade

6. Troubleshooting Common Problems
   6.1 Controller Held in Reset, Causes, and How to Recover
       6.1.1 Reason Controllers Should be Failed
       6.1.2 Failing a Controller in Dual/Active Mode
       6.1.3 Replacing a Failed Controller
       6.1.4
   6.2 LUNs Not Seen
   6.3 Rebuilding a Missing LUN Without Reinitialization
       6.3.1 Setting the VKI_EDIT_OPTIONS
       6.3.2 Resetting the VKI_EDIT_OPTIONS
       6.3.3 Deleting a LUN With the RAID Manager GUI
       6.3.4 Recreating a LUN With the RAID Manager GUI
       6.3.5 Disabling the Debug Options
   6.4 Dynamic Reconfiguration
       6.4.1 Prominent Bugs
       6.4.2 Further Information
   6.5 Controller Failover and LUN Balancing Takes Too Long
   6.6 GUI Hang
   6.7 Drive Spin Up Failure, Drive Related Problems
   6.8 Phantom Controllers Under RAID Manager 6.22
   6.9 Boot Delay (Why Booting Takes So Long)
   6.10 Data Corruption and Known Problems
   6.11 Disconcerting Error Messages
   6.12 Troubleshooting Controller Failures

A. Reference
   A.1 Scripts and man Pages
   A.2 Template for Gathering Debug Information for CPRE/PDE
   A.3 RAID Manager Bootability Support for PCI/SBus Systems
   A.4 A3500/A3500FC Electrical Specifications
   A.5 Product Names
   Additional ASC/ASCQ Codes
Figures

FIGURE 2-1 SCSI Bus Length Calculation
FIGURE 2-2 Fibre Channel Connection With Long Wave GBIC Support
Tables

TABLE 1-1 A3x00/A3500FC Commandments - Thou Shalt
TABLE 1-2 A3x00/A3500FC Commandments - Thou Shalt Not
TABLE 1-3 Web Sites
TABLE 1-4 Terminal Emulation Functionality
TABLE 1-5 FINs Affecting the Sun StorEdge A1000 and RSM 2000/A3x00/A3500FC Product Family
TABLE 1-6 FCOs Affecting the Sun StorEdge RSM 2000/A3x00/A3500FC Product Family
TABLE 2-1 Server Configuration and Maximum Controller Modules Supported
TABLE 5-1 Controller Module SCSI ID Settings
TABLE A-1 A1000 Bootability on PCI-Based Hosts
TABLE A-2 A1000 Bootability on SBus-Based Hosts
TABLE A-3 A3x00 Bootability on PCI-Based Hosts
TABLE A-4 A3x00 Bootability on SBus-Based Hosts
TABLE A-5 Power Consumption Specifications
TABLE A-6 Product Name Matrix
TABLE A-7 NVSRAM Product ID
Preface
The Sun StorEdge A1000 and A3x00/A3500FC Best Practices Guide is intended for use
by experienced Sun™ engineering personnel (FE, SE, SSE, and CPRE) who have
received basic training on the Sun StorEdge™ A1000 and A3x00/A3500FC. It is not
intended to replace the existing documentation set, but rather to serve as a single
point of reference that provides some answers to questions relating to common
installation and service tasks. Further, it serves as a roadmap to more detailed
information already provided in the current documentation set and on Sun web
sites.
Before You Read This Book
To fully use the information in this document, you must have thorough knowledge of the topics discussed in all of the documents listed in "Related Documentation" on page xvi.
How This Book Is Organized
This manual is organized as follows:
Chapter 1 introduces some of the tools that are available to help troubleshoot the
Sun StorEdge A3x00/A3500FC disk array.
Chapter 2 provides some additional information, guidelines, and tips relating to the
installation and configuration of hardware.
Chapter 3 provides some additional information, guidelines, and tips relating to the
installation and configuration of RAID Manager.
Chapter 4 provides some additional information, guidelines, and tips relating to
installation and configuration of system software.
Chapter 5 provides maintenance and service information for verifying FRU
functionality, guidelines for replacing FRUs, and tips on upgrading to the latest
software and firmware levels.
Chapter 6 discusses some common problems encountered in the field and provides
additional information and tips for troubleshooting.
Appendix A contains the following reference information: a link to the RAID
Manager 6.22 README file, a supplementary listing of available man pages for
RAID Manager 6.22 commands, a template for gathering debug information, a
bootability support matrix, Sun StorEdge A3500/A3500FC electrical specifications,
and a product name matrix.
Using UNIX Commands
This document may not contain information on basic UNIX® commands and
procedures such as shutting down the system, booting the system, and configuring
devices.
See one or more of the following for this information:
■ Solaris Handbook for Sun Peripherals
■ AnswerBook2™ online documentation for the Solaris™ software environment
■ Other software documentation that you received with your system
Typographic Conventions

Typeface or Symbol | Meaning | Examples
AaBbCc123 | The names of commands, files, and directories; on-screen computer output | Edit your .login file. Use ls -a to list all files. % You have mail.
AaBbCc123 | What you type, when contrasted with on-screen computer output | % su  Password:
AaBbCc123 | Book titles, new words or terms, words to be emphasized | Read Chapter 6 in the User's Guide. These are called class options. You must be superuser to do this.
AaBbCc123 | Command-line variable; replace with a real name or value | To delete a file, type rm filename.
Shell Prompts

Shell | Prompt
C shell | machine_name%
C shell superuser | machine_name#
Bourne shell and Korn shell | $
Bourne shell and Korn shell superuser | #
Related Documentation

Application | Title | Part Number
Installation and Service | Sun StorEdge A3500/A3500FC Controller Module Guide | 805-4980
Installation and Service | Sun StorEdge A3500/A3500FC Hardware Configuration Guide | 805-4981
Installation | Sun StorEdge A3500/A3500FC Task Map | 805-4982
Installation and Service | Sun StorEdge A3x00 Controller FRU Replacement Guide | 805-7854
Installation | Sun StorEdge A3500FC Controller Upgrade Guide | 806-0479
Installation and Service | Sun StorEdge Expansion Cabinet Installation and Service Manual | 805-3067
Installation | Sun StorEdge RAID Manager 6.1.1 Update 2 Release Notes | 805-3656
Installation and Service | Sun StorEdge RAID Manager 6.1.1 Installation and Support Guide for Solaris | 805-4058
Installation and Service | Sun StorEdge RAID Manager 6.1.1 User's Guide | 805-4057
Installation and Service | Sun StorEdge RAID Manager 6.22 Installation and Support Guide for Solaris | 805-7756
Installation | Sun StorEdge RAID Manager 6.22 Release Notes | 805-7758
Installation and Service | Sun StorEdge RAID Manager 6.22 User's Guide | 806-0478
Release Notes | Sun StorEdge RAID Manager 6.22.1 Release Notes | 805-7758
Installation | Sun StorEdge RAID Manager 6.22 and 6.22.1 Upgrade Guide | 806-7792
Accessing Sun Documentation
You can view, print, or purchase a broad selection of Sun documentation, including
localized versions, at:
http://www.sun.com/documentation
Sun Welcomes Your Comments
Sun is interested in improving its documentation and welcomes your comments and
suggestions. You can email your comments to Sun at:
[email protected]
Please include the part number (806-6419-14) of your document in the subject line of your email.
CHAPTER 1

Troubleshooting Overview
This chapter introduces some of the tools that are available to help troubleshoot the
Sun StorEdge A3x00/A3500FC disk array, tips for filing a bug, and a listing of the
latest field information notices (FINs) and field change orders (FCOs).
This chapter contains the following topics:

■ Section 1.1, "A3x00/A3500FC Commandments" on page 1-2
■ Section 1.2, "Available Tools and Information" on page 1-3
■ Section 1.3, "Tips for Filing a Bug" on page 1-6
■ Section 1.4, "FINs and FCOs" on page 1-6
1.1 A3x00/A3500FC Commandments
Tables 1-1 and 1-2 contain PDE recommendations and tips that should be read and
followed prior to performing any installation or service tasks on the Sun StorEdge
A3x00/A3500FC disk array.
TABLE 1-1 A3x00/A3500FC Commandments - Thou Shalt

1. Read the RAID Manager 6 Release Notes and Early Notifier 20029.
2. Upgrade RAID Manager 6 software and firmware only if the controller module, LUNs, and disk drives are all in an optimal state.
3. Never replace the D1000/ESM card while the D1000 is powered on. Refer to the procedure in FIN I0670-1 for proper replacement.
4. Be trained before attempting to use serial port access.
5. Follow the Sun StorEdge A3500/A3500FC Hardware Configuration Guide for configuration changes between 1x5, 2x7, or 3x15.
6. Always keep the interface hardware, software, firmware, and patches current (refer to Early Notifier 20029).
7. Always reset the battery date on both controllers with raidutil after battery replacement.
8. Always upgrade the controller firmware after a RAID Manager 6 upgrade or installation of RAID Manager 6 patches.
9. Always sync up the controller firmware after a controller board replacement.
10. Follow procedures when replacing disk drives: fail and then revive. See Section 2.3.2, "Adding or Moving Disk Drives to Existing Arrays" on page 2-7.

TABLE 1-2 A3x00/A3500FC Commandments - Thou Shalt Not

1. Do not mix RSM2000 (with RSM trays) and A3000 (with D1000 trays) controllers in the same module.
2. Do not downgrade firmware unless the controller is at universal FRU level 2.5.6.32. See FIN I0553 for details.
3. Do not hot swap a controller that "owns" LUNs.
4. Do not move SIMMs from a failed controller to a new controller.
5. Do not perform boot -r while a controller is held in reset. See Section 6.1, "Controller Held in Reset, Causes, and How to Recover" on page 6-2.
6. Do not enable 16/32 LUN support unless it is necessary (refer to FIN I0589).
7. Do not run A3x00s in a production environment without a LUN 0.
8. Do not move disk drives between hardware arrays (A1000, RSM2000, A3x00, and A3500FC) or within the same array.
9. Do not enable DMP on VxVM pre-3.0.2 releases; the RAID Manager 6.22x/DMP compatibility issue is resolved in VxVM 3.0.2. Refer to the VxVM documentation for details.
10. Do not revive a disk drive if it has been failed by a controller.

1.2 Available Tools and Information
This section contains the following topics:

■ Section 1.2.1, "Documentation" on page 1-3
■ Section 1.2.2, "Web Sites" on page 1-4
■ Section 1.2.3, "Internal Directory" on page 1-4
■ Section 1.2.4, "Obtaining the Latest Version of RAID Manager" on page 1-4
■ Section 1.2.5, "RAID Manager 6.0, 6.1 and 6.22 Are Not Supported" on page 1-5
■ Section 1.2.6, "Serial Cable" on page 1-5
■ Section 1.2.7, "RAID Manager 6.xx Architecture White Paper Available" on page 1-6

1.2.1 Documentation
The current documentation set is listed in "Related Documentation" on page xvi.
The documentation set is also available on the following web sites:
http://edist.central
http://infoserver.central/data/syshbk
1.2.2 Web Sites
The internal and external web sites listed in TABLE 1-3 provide quick access to a wide
variety of relevant information.
TABLE 1-3 Web Sites

Web Site Name | URL
Sonoma Engineering | http://webhome.sfbay/A3x00
Network Storage | http://webhome.sfbay/networkstorage
NSTE (QAST) Group | http://webhome.sfbay/qast
OneStop Sun Storage Products | http://onestop.Eng/storage
Enterprise Services Storage ACES | http://trainme.east
Escalation Web Interface | http://sdn.sfbay/cgi-bin/access2
CPRE Group Europe | http://cte-www.uk
Disk Drive Models / FW | http://diskworks.ebay
Note – The Enterprise Services Storage ACES web page requires a login/password
for access to certain areas. Information is provided on the home page on how to
obtain a login account.
1.2.3 Internal Directory

The following internal directory contains the released versions of RAID Manager 6 software, firmware, and NVSRAMs:

net/artemas.ebay/global/archiva/archive/StorEdge_Products/sonoma
1.2.4 Obtaining the Latest Version of RAID Manager

RAID Manager 6.22x management software for the A1000, A3x00, and A3500FC disk arrays is available from the Sun Download Center (formerly called the Sun Software Shop), under the Download link at:

http://www.sun.com/storage/disk-drives/raid.html
Note – Before you attempt to install the RAID Manager software, be sure to read the
Sun StorEdge RAID Manager 6.22 and 6.22.1 Upgrade Guide and Early Notifier 20029
for the latest installation and operation procedures.
1.2.5 RAID Manager 6.0, 6.1 and 6.22 Are Not Supported

RAID Manager 6.0 and 6.1 have been superseded by newer versions of RAID Manager. RAID Manager 6.1.1 is supported only in cases involving data corruption or loss. Upgrade to RAID Manager 6.22.1 as soon as possible.
1.2.6 Serial Cable
Sun field service personnel who have undergone the proper training on serial port
access can obtain a serial port cable along with the Debug Guide from the Area
Technical Service Manager. If you need assistance with serial port functions, contact
the local Storage ACES or CPRE. CPRE has access to LSI Logic’s 24 hour support
line.
The serial port provides access to useful commands used to determine controller,
drive, and LUN status. It was originally intended to be used by developers.
Caution – Only trained and experienced Sun personnel should access the serial port, and you should have a copy of the latest Debug Guide. Certain commands can destroy customer data or configuration information, and no warning message appears when a potentially damaging command is executed.
The serial cable is an LSI Logic, Inc. proprietary DB-15 to DB-25 cable and comes with a DB-25 to DB-9 extension for PC interface. To connect to the serial port on an UltraSPARC™ machine, you need a DB-25 gender adapter. The gender adapter is not required for pre-UltraSPARC machines.
Note – Do not leave the serial port cable and Debug Guide with the customer. This
cable is for use by trained Sun personnel only.
If you use a PC to connect to the serial port of the disk array, you need terminal
emulation software. Also, you need to ensure that the Break functionality is
available. Although there are many different software applications that provide
terminal emulation, you will have the best results if you use the applications listed
in TABLE 1-4.
TABLE 1-4 Terminal Emulation Functionality

Operating Environment | Communication Software
Microsoft Windows 98/2000 | Procomm Plus
Solaris x86/SPARC™ | tip
Linux | dip

1.2.7 RAID Manager 6.xx Architecture White Paper Available
The white paper titled StorEDGE A3000/A1000 Controller Architecture (The BlackBox
behind the RAID Module) is available on the Sonoma Engineering web site:
http://webhome.sfbay/A3x00/HW/A3000_controller_paper.fm.ps
1.3 Tips for Filing a Bug

Refer to Section A.2, "Template for Gathering Debug Information for CPRE/PDE" on page A-3 for a template that should be used when filing a RAID Manager or A3x00/A3500FC related bug.
1.4 FINs and FCOs
For access to the latest field information notices (FINs) and field change orders
(FCOs), refer to the following web site:
http://sdpsweb.ebay/FIN_FCO
Tables 1-5 and 1-6 list the current FINs and FCOs affecting the Sun StorEdge RSM
2000/A3x00/A3500FC product family as of January 2001.
TABLE 1-5 FINs Affecting the Sun StorEdge A1000 and RSM 2000/A3x00/A3500FC Product Family

FIN Number | Release Date | Product | Description
I0310 | 07/09/97 | SSA RSM 21x, RSM Array 2000 | Updated - Failure to follow documented installation procedures to remove shipping brackets and reseat drives may cause multiple disk errors.
I0312 | 08/31/00 | RSM Array 2000 patches | Required set of patches for the Sun RAID Storage Manager Array 2000.
I0324 | 05/06/97 | SunVTS, RSM Array 2000 | Problem with SunVTS not probing LUNs located on an RSM Array 2000 system.
I0325 | 06/09/97 | RSM Array 2000, power sequencer | Power in the RSM Array 2000 will be on when the keyswitch is off and one of the power sequencer's circuit breakers is off.
I0328 | 06/13/97 | RSM Array 2000 disk configuration | Some RSM 2000s installed with 9.0 GB disk drives were configured as if they only had 4.0 GB disk drives attached.
I0332 | 06/20/97 | SSA 21x, RSM Array 2000 | INFORMATIONAL ONLY! Disk LEDs light or blink in an inconsistent manner during power up on some disk trays.
I0334 | 07/09/97 | SSA RSM Model 219, RSM Array 2000, Seagate ST19171WC 9.0 GB disk drive | "Cushioning damper" kit for Seagate ST19171WC 9.0 GB disk drives installed in a SPARCstorage Array RAID Storage Manager Model.
I0362 | 12/18/98 | RSM Array 2000 RM 6.1 | UPDATED FIN. Updated parity error recovery procedure.
I0368 | 04/10/98 | RSM Array 2000 controller board | UPDATED FIN. Memory error detection on the RSM Array 2000 controller logic board is ineffective.
I0440 | 10/29/98 | RM Rev 6.1.1 | RM 6.1.1 with Sun StorEdge A1000, A3000, or A3500 may experience status check problems.
I0441 | 10/30/98 | Sun StorEdge D1000 disk tray | New disk firmware revision 7063 for ISP boot failures on D1000s with Seagate ST39173WC 9 Gbyte disks.
I0473 | 03/09/99 | Solaris 7, RM 6.1.1, A3x00/A1000 | Solaris 7 upgrade recommendations for Sun StorEdge A3x00/A1000.
I0505 | 08/09/99 | RM 6.1.1 RAID LUN recovery | UPDATED FIN. A3x00/A1000 RAID 0 LUN recovery requires stopping I/O to the LUN.
I0509 | 06/30/99 | RM 6 and Solaris kernel corruption | RAID Manager large rdriver.conf files need a patch to prevent panic.
I0511 | 07/28/99 | VM DMP interferes with A3x00 RDAC | Enabling DMP with RDAC on A3x00 and A1000 may cause private regions to be lost.
I0520 | 04/24/01 | Quorum device in Sun Cluster | Servicing storage that contains a device that is used as a quorum device in a Sun Cluster environment.
I0531 | 01/07/00 | A3500FC parallel LUN modifications | UPDATED FIN. Sun StorEdge A3500FC with preconfigured LUNs may encounter errors if parallel LUN modifications are made using SBus host adapters.
I0536 | 09/09/00 | E10000 with A3x00 DR | E10000 systems with an A3x00 attached may encounter DR errors.
I0547 | 01/21/00 | UDWIS fcode | Intermittently, SCSI devices connected to a UDWIS card may not be usable after a reboot.
I0551 | 02/08/00 | Large Sun StorEdge A1000, A3000, A3500 configurations | Boot process and controller on-line process may take hours in systems with large Sun StorEdge A1000, A3000, or A3500 configurations.
I0552 | 01/24/00 | isp driver bug | SCSI devices (especially in a multi-hosted configuration) may go off-line after isp errors.
I0553 | 02/04/00 | A3x00 controller firmware | The firmware download procedure documented in the A3x00 Controller Replacement Guide may render the controller unusable.
I0557 | 05/23/00 | RM 6.1.1 firmware | Differences in LUN capacity after firmware upgrade on Sun StorEdge A3000 or RSM 2000 hardware RAID controllers.
I0566 | 06/12/00 | Solaris 8 with any Sun StorEdge A3x00/A1000 | Patch 108553-xx is required to run Solaris 8 with RM 6.22.
I0569 | 03/31/00 | isp fcode 1.28 probe error | UDWIS SBus host adapter with isp fcode 1.28 can cause "invalid command" during probe-scsi-all or boot -hv from a Sun StorEdge D1000.
I0573 | 09/28/00 | Storage Array A1000, A3x00 | Sun StorEdge A1000, A3000, A3500, or A3500FC requires the existence of LUN 0 for proper operation.
I0579 | 06/07/00 | UDWIS/SBus SCSI adapter | Systems with Ultra DWIS/SBus host adapters slow to a halt under heavy I/O loads and the console displays "SCSI transport failed" error messages.
I0586 | 06/22/00 | Sun StorEdge A3500FC connectivity | The hardware and software requirements to use the onboard SOC+ on the I/O boards to connect the A3500FC.
I0589 | 06/21/00 | Any RM 6 version | The glm.conf file must be modified to support more than 8 LUNs on any PCI-SCSI connected A1000 or A3x00 using any version of RM 6.
I0590 | 07/20/00 | Sun StorEdge A3500FC upgrade kit | The SCSI NVSRAM will overwrite the FC controller NVSRAM if the documented A3500 SCSI to FC upgrade procedure is followed.
I0594 | 09/08/00 | Sun StorEdge A3x00 and A3500 arrays | Some StorEdge A3500 products shipped with the StorEdge A3000 name tag lead customers to believe the wrong product was delivered.
I0612 | 09/08/00 | Sun StorEdge array configuration | RM6 may detect a non-existent 2MB dacstore on factory formatted hard disk drives.
I0613 | 09/20/00 | Sun StorEdge A1000 array | When replacing a failed controller board on the A1000, it is important to first verify the version of RAID Manager software, the controller firmware level, and the controller memory size.
I0619 | 10/12/00 | Sun StorEdge A1000 and A3x00 arrays | Proper procedures for booting from Sun StorEdge A1000 or A3x00 hardware RAID devices, including known issues and problems.
I0634 | 11/30/00 | Sun StorEdge A3x00 array controller failover | Sun StorEdge A3x00 array controller failover may cause a temporary delay in processing of pending disk I/O.
I0637 | 01/10/01 | Sun StorEdge A3500 array | A3500 controller powering up prior to Dilbert trays due to wrong connections of the L5, R5, L6, and R6 power sequencer cables.
I0643 | 01/10/01 | Sun StorEdge A3x00 with RAID Manager | Sun StorEdge A3x00 array configurations with RAID Manager 6.1.x may be susceptible to controller "deadlocks".
I0648 | 02/16/01 | Sun StorEdge A3x00 array | Disks in Sun StorEdge A3x00 arrays might go offline, making the devices unavailable.
I0653 | 05/08/01 | RSM 2000 array shelves | RSM array shelves may collapse under weights greater than 40 lbs. (18.2 kg), causing system damage or personal injury.
I0670 | 06/15/01 | Sun StorEdge D1000 ESM board and Sun StorEdge A3x00/A3500FC arrays | Procedure to replace the D1000 Environmental Service Module (ESM) board in A3x00/A3500FC arrays.
I0684 | 06/21/01 | RAID Manager 6.22 healthchk utility | The RAID Manager 6.22 healthchk utility might not report power supply or fan failures in Sun StorEdge A3x00 arrays, which might result in loss of availability.
I0685 | 07/18/02 | RAID Manager 6 software | Certain precautions need to be observed when upgrading the Solaris operating environment on systems with RAID Manager 6 software installed.
I0688 | 06/27/01 | RAID Manager 6.22 on Solaris 2.5.1 | NSTE (QAST) qualified RAID Manager 6.22 to run on Solaris 2.5.1.
I0698 | 07/12/01 | RAID Manager 6.22 patches 108834-09 and 108553-09 | Installing RAID Manager 6.22 patches 108834-09 and 108553-09 might generate false or misleading "unresponsive drives" warning messages when running the healthchk or drivutil commands, and report excessive false 9501 messages.
I0709 | 03/19/02 | Sun StorEdge A1000/A3x00/A3500FC arrays | After replacing a controller FRU on an A1000/A3x00/A3500FC with RM 6.22.1, the NVSRAM must be reloaded.
I0724 | 10/10/01 | Sun StorEdge A1000/D1000 and A3500/A3500FC arrays | 18.2 Gbyte and 36 Gbyte IBM disk drives may be susceptible to early life failures.
I0727 | 10/17/01 | Sun StorEdge A1000/A3x00 controllers | Recovering A1000/A3x00 controller C numbers after a device path change due to reboot -r.
I0736 | 10/26/01 | Sun StorEdge A3500FC controller | Current replacement procedures for an A3500FC controller in a clustered environment could result in the controller going off line.
I0738 | 04/17/02 | Sun StorEdge A3x00 array with RAID Manager | Access to raw partitions on Sun StorEdge A1000, A3x00, or A3500FC LUNs by non-root users, such as Oracle or Sybase, is not allowed with some patch levels of RAID Manager 6.22 and 6.22.1.
I0744 | 11/30/01 | RAID Manager 6.22.1 | RM 6.22.1 problems reported in BugId 4521759 have been fixed with patches 112126-01 for Solaris 8 and 112125-01 for other Solaris versions above Solaris 2.5.1.
I0782 | 03/01/02 | Sun StorEdge A1000/A3x00 arrays | Adding new disk drives to a StorEdge A1000 or A3x00 could cause RAID Manager to become unavailable.
I0786 | 03/04/02 | Sun StorEdge A1000/A3x00 arrays | A hot_add of an A1000/A3x00/A3500FC array may need to be followed by a reconfiguration reboot on Solaris 8.
I0809 | 04/10/02 | RM 6.22.1 | A Solaris 2.6 system with /usr on a separate slice may fail to boot after upgrade to RM 6.22.1.
I0825 | 05/10/02 | RM 6.22.1 on A1000/A3x00/A3500FC | Running automatic parityck on A1000/A3x00/A3500FC arrays with RAID Manager 6.22.1 does not repair problems.
I0828 | 05/21/02 | RM 6.22.1 | LUNs may become inaccessible after upgrading from RM 6.22 to 6.22.1 or after adding unformatted disk drives to RM 6.22.1.
I0845 | 06/25/02 | RAID Manager 6 | RAID Manager 6 may hang for 3-8 minutes when an IBM drive is in the failed state in a Sun StorEdge A1000/A3x00/A3500FC array.
I0860 | 08/02/02 | RAID Manager 6 with Solaris 9 | RAID Manager 6.22.1 with A1000/A3x00/A3500FC arrays needs a new patch revision for Solaris 9.
I0865 | TBD | Sun StorEdge A3500FC arrays | Extraneous physical device paths may appear.
TABLE 1-6 FCOs Affecting the Sun StorEdge RSM 2000/A3x00/A3500FC Product Family

FCO Number | Release Date | Product | Description
A0120 | 02/27/98 | RSM Array 2000 WD2S card | Power glitch may cause disks to go off-line on an RSM Array 2000 configured without an uninterruptible power supply (UPS).
A0162 | 03/09/00 | Power supply, A1000/D1000 | Sun StorEdge A1000 and D1000 power supplies fail with amber LED.
A0163 | 03/31/00 | UDWIS SBus host adapter, A1000/D1000, A3x00/A3500 | Systems with large quantities of UDWIS/SBus host adapters installed may not come up after reboot due to miscommunication between the SCSI host and the target.
A0164 | 05/12/00 | FC100/P card, A5x00/A3500 | Systems with FC100/P cards (375-0040-xx) having Molex optical transceivers may cause increasing numbers of loop errors, excessive LIPs, and SCSI parity errors as they age.
A0165 | 06/19/00 | RSM 2000/A3000 battery backup unit | Any battery backup unit over two years old may not have enough energy to hold data in the RSM 2000 or A3000 controller's write-back cache for 72 hours in the event of a power outage.
CHAPTER 2

Hardware Installation and Configuration
This chapter provides some additional information, guidelines, and tips relating to
the installation and configuration of hardware.
This chapter contains the following sections:

■ Section 2.1, "New Installation" on page 2-2
■ Section 2.2, "Adding or Moving Arrays to a Host With Existing A3x00 Arrays" on page 2-6
■ Section 2.3, "Adding Disks or Disk Trays" on page 2-6
■ Section 2.4, "Setting Up 2x7 and 3x15 Configurations, and Converting 1x5 to 2x7 or 3x15" on page 2-8
■ Section 2.5, "Sun StorEdge A3500/A3500FC Lite" on page 2-9
■ Section 2.6, "Cluster, Multi-Initiator, and SCSI Daisy Chaining Configurations" on page 2-10
■ Section 2.7, "Supported Configurations" on page 2-11
■ Section 2.8, "SCSI to FC-AL Upgrade" on page 2-14
2.1 New Installation
This section contains the following topics:

■ Section 2.1.1, "Battery Unit" on page 2-2
■ Section 2.1.2, "Power Cables" on page 2-2
■ Section 2.1.3, "Power Sequencer" on page 2-3
■ Section 2.1.4, "Local/Remote Switch" on page 2-3
■ Section 2.1.5, "SCSI and Fiber-Optic Cables" on page 2-3
■ Section 2.1.6, "SCSI ID, Loop ID, Controller, and Disk Tray Switch Settings" on page 2-4
■ Section 2.1.7, "World Wide Name (WWN)" on page 2-5

2.1.1 Battery Unit
For further information regarding the battery unit, see Section 1.3.1 “Battery Unit” in
the Sun StorEdge A3500/A3500FC Controller Module Guide.
Verify that the battery’s date of manufacture is not older than six months. The
battery’s date of manufacture can be found on the Battery Support Information label
located on top of the battery. Write down the date of installation on the Battery
Support Information label. Add two years to the date of installation and write down
the replacement date on the Battery Support Information label.
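If you track battery dates in a script, the replacement date is simply the installation date plus two years. The following minimal Python sketch is illustrative only and is not part of the original procedure:

from datetime import date

def battery_replacement_date(installed: date) -> date:
    """Return the replacement date: two years after installation."""
    try:
        return installed.replace(year=installed.year + 2)
    except ValueError:
        # Installed on Feb 29; fall back to Feb 28 of the target year.
        return installed.replace(year=installed.year + 2, day=28)

print(battery_replacement_date(date(2002, 11, 15)))  # prints 2004-11-15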
The battery unit undergoes self-diagnostics at power-up, which take approximately 15 minutes to complete. During this time you might see the SCSI message ASC/ASCQ A0/00, which indicates that write cache cannot be enabled. This message should go away within 15 minutes or so unless the battery unit was completely drained, in which case the battery requires additional time to recharge completely.
2.1.2 Power Cables
For further information and guidelines for connecting the AC power cables, refer to Chapter 3 in the Sun StorEdge A3500/A3500FC Hardware Configuration Guide. It is important that you follow these guidelines.

The controller must be powered on at the same time as, or after, the disk trays so that the controller registers the disk drives. The power cables to the controller should always be connected to one of the four outlets on the second sequenced group of the expansion cabinet power sequencer. If the cabinet's original factory configuration has not been changed, the cabinet should contain the correct power sequencer connections.
Note – At the bottom of the expansion cabinet are two power sequencers. The front
power sequencer is hidden behind the front key switch panel. Remove the front key
switch panel to gain access to the power sequencer’s power cable.
2.1.3 Power Sequencer
For further information regarding power sequencer configuration, refer to Chapter 3
in the Sun StorEdge A3500/A3500FC Hardware Configuration Guide.
Caution – To prevent the possibility of data loss due to an inadvertent power
shutdown of an expansion cabinet, ensure that in a 3x15 configuration the power
sequencers are daisy chained front to front and back to back. The Local/Remote
switch should be set to Remote.
2.1.4 Local/Remote Switch

The Local/Remote switch on each power sequencer is factory set to Remote (the default), which allows power on/off control of each power sequencer through the front key switch. If the Local/Remote switch is set to Local, each power sequencer is instead controlled by its own main power circuit breaker switch.
2.1.5 SCSI and Fiber-Optic Cables

For further details regarding the SCSI and fiber-optic cables, refer to the Sun StorEdge A3500/A3500FC Hardware Configuration Guide.

Each A3x00 is shipped from the factory with two 12-meter SCSI host cables. The recommended maximum SCSI bus length is 25 meters; exceeding this length can cause SCSI or data errors. When calculating the maximum SCSI bus length of a particular configuration, remember to include the internal SCSI bus of each device, which is 0.1 meter.
The example configuration in FIGURE 2-1 shows a daisy chained multi-initiator
configuration consisting of two hosts connected to two A3x00 modules.
The total SCSI bus length for this example is 24.4 M. It is calculated as follows: each
SCSI cable’s length (Cable no. 1 + Cable no. 2 + Cable no. 3) + the internal SCSI bus
length of each device (Host no. 1 + A3x00 no. 1 + A3x00 no. 2 + Host no. 2).
FIGURE 2-1 SCSI Bus Length Calculation (daisy chain: Host no. 1 (0.1 M), Cable no. 1 (8.0 M), A3x00 no. 1 (0.1 M), Cable no. 2 (8.0 M), A3x00 no. 2 (0.1 M), Cable no. 3 (8.0 M), Host no. 2 (0.1 M))
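The same arithmetic generalizes to any daisy chain. The following short Python sketch (illustrative only; the cable and device counts are taken from the FIGURE 2-1 example) totals the external cable lengths plus 0.1 M of internal bus per device and checks the result against the recommended 25-meter maximum:

# Check a daisy-chained SCSI configuration against the 25 M recommendation.
INTERNAL_BUS_M = 0.1   # internal SCSI bus length of each host or A3x00
MAX_BUS_M = 25.0       # recommended maximum SCSI bus length

def total_bus_length(cable_lengths_m, device_count):
    """Sum the external cable lengths plus each device's internal bus."""
    return sum(cable_lengths_m) + device_count * INTERNAL_BUS_M

# FIGURE 2-1 example: three 8.0 M cables, two hosts and two A3x00 modules.
length = total_bus_length([8.0, 8.0, 8.0], device_count=4)
print(length, "M", "within limit" if length <= MAX_BUS_M else "exceeds limit")
# prints: 24.4 M within limit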
Each A3500FC is shipped from the factory with one pair of 15 meter fiber-optic
cables. The product will support a multi-mode fiber-optic connection up to 500
meters long using short wave GBICs only. The use of long wave GBICs is supported
but the connection must be through a switch to the host processor (FIGURE 2-2). Refer
to the switch documentation for the maximum cable length for the long wave GBIC
connection.
FIGURE 2-2 Fibre Channel Connection With Long Wave GBIC Support (the A3500FC connects to a switch over a short wave GBIC connection, 500 M maximum; the switch connects to the host processor over a long wave GBIC connection)

2.1.6 SCSI ID, Loop ID, Controller, and Disk Tray Switch Settings
The factory default setting for controller A is SCSI ID 5 (T5) and for controller B is
SCSI ID 4 (T4). The SCSI cables connected within a SCSI daisy chain configuration
should be crossed to avoid SCSI ID conflicts (see Section 1.2.2 in the Sun StorEdge
A3500/A3500FC Hardware Configuration Guide).
If two disk trays have the same tray ID, the system reports a 98/01 ASC/ASCQ error code during boot up. In the 1x2 configuration, because two drive channels from a controller share one disk tray, a tray ID conflict is unavoidable; the 98/01 ASC/ASCQ error code is reported during boot up but has no impact on system performance.
In a Fibre Channel dual controller connection through two hubs, each hub should be
connected to one “A” controller and one “B” controller to avoid SCSI ID conflicts
(see Section 2.2.4 in the Sun StorEdge A3500/A3500FC Hardware Configuration Guide).
Also, see Section 2.3 “Setting the Loop ID” in the Sun StorEdge A3500/A3500FC
Hardware Configuration Guide for procedures to set the loop ID for A3500FC
controllers. This section also contains a detailed SCSI ID/loop ID reference chart.
For better redundancy, do not connect both controllers of the same controller module
to the same I/O board.
It is recommended that you set both controllers to active/active mode and balance the LUNs across both controllers for better performance. The maximum number of supported LUNs is the total number of LUNs between the two controllers. For example, if the maximum supported number of LUNs is 16, you can have 8 LUNs on controller A and 8 LUNs on controller B. If one controller goes offline, the surviving controller owns all 16 LUNs.
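The bookkeeping behind this recommendation can be pictured with a small Python sketch (purely illustrative; the controllers and RDAC perform this internally): LUNs are split evenly between the two controllers, and on a failure the surviving controller takes ownership of all of them.

# Illustrative model of active/active LUN balancing and controller failover.
max_luns = 16
owners = {"A": list(range(0, max_luns, 2)),   # 8 even-numbered LUNs on A
          "B": list(range(1, max_luns, 2))}   # 8 odd-numbered LUNs on B

def fail_over(failed, owners):
    """Move every LUN owned by the failed controller to the survivor."""
    surviving = "B" if failed == "A" else "A"
    owners[surviving] = sorted(owners[surviving] + owners.pop(failed))
    return surviving

survivor = fail_over("A", owners)
print(survivor, "now owns", len(owners[survivor]), "LUNs")  # B now owns 16 LUNs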
2.1.7 World Wide Name (WWN)

Each FC-AL device requires a unique World Wide Name (WWN) that identifies the device or controller on the loop. There is no hardware setting for the WWN. The WWN is a 16-byte hexadecimal value that can be retrieved as the logical unit WWN in the device identifier inquiry page (0x83) and is generated as follows:

■ The name (derived from the controller MAC address) of the storage array controller creating the volume, in the upper 8 bytes.
■ An auto-increasing number and time stamp, in the lower 8 bytes.
Note – When a controller is hot swapped, the upper 8 bytes of the WWN may not match a controller in the storage array.
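Given that layout, a WWN can be split into its two 8-byte halves for inspection. The Python sketch below is illustrative only, and the example value is hypothetical; it simply demonstrates the upper/lower split described above:

# Split a 16-byte WWN (as a hex string) into the halves described above.
wwn_hex = "600a0b80000f1a2b0000000412345678"  # hypothetical example value

wwn = bytes.fromhex(wwn_hex)
assert len(wwn) == 16
upper = wwn[:8]   # derived from the creating controller's MAC-based name
lower = wwn[8:]   # auto-increasing number and time stamp
print("upper 8 bytes:", upper.hex())
print("lower 8 bytes:", lower.hex())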
2.2 Adding or Moving Arrays to a Host With Existing A3x00 Arrays

When moving a disk array, ensure that the array being moved has firmware levels that match the new host. See "Upgrading Controller Firmware" in the Sun StorEdge RAID Manager 6.22 Release Notes. Because the firmware on the controller cannot be downgraded, except in the case of a universal FRU, you should not move an array to a host with a lower RAID Manager release. If the array being moved has more than 8 LUNs, be sure the new host has greater-than-8-LUN support turned on. See "Maximum LUN Support in Solaris 2.6 and Solaris 7 Environments" in the Sun StorEdge RAID Manager 6.22 Release Notes. Also, refer to "Adding or Moving Arrays Under VERITAS" on page 4-13 for information pertaining to VERITAS volumes.

You can move an A1000 or an A3x00 array from one server to another and still be able to access the data after the move.

You can perform most, but not all, of the RAID Manager 6 array management commands after the move. In particular, you will not be able to execute commands such as LUN creation or deletion; these can only be run on the host where the array was originally connected, because host ownership information stored in DacStore and on the LUNs prevents them from being executed elsewhere. The workaround is to update the host ownership information in the DacStore region, either with the storutil command or with the RAID Manager 6 GUI Select Module -> Edit function to change the module name; both update DacStore with the new owning host information. See the storutil(1M) man page for the available command-line options.
2.3 Adding Disks or Disk Trays

This section contains the following topics:

■ Section 2.3.1, "Adding or Moving Disk Trays to Existing Arrays" on page 2-7
■ Section 2.3.2, "Adding or Moving Disk Drives to Existing Arrays" on page 2-7
2.3.1 Adding or Moving Disk Trays to Existing Arrays
RAID Manager 6.22 has dynamic capacity expansion capability. If your RAID system
has not used all five drive side channels, you can add disk trays to it and expand the
capacity of the existing drive groups. The existing LUN capacity does not increase.
See “Configuring RAID Modules” in the Sun StorEdge RAID Manager 6.22 User’s
Guide. When expanding to a different configuration, ensure that it is a supported
configuration as documented in the Sun StorEdge A3500/A3500FC Hardware
Configuration Guide.
Note – Disk drives should not be interchanged between the A1000, A3500, and A3500FC. The controller may update its NVSRAM from the (inconsistent) DacStore on the foreign disks, causing an A3x00/A3500FC controller to start behaving like an A1000 or RSM 2000 controller. If this occurs, you need to reload the correct NVSRAM into each controller.

The A3000 back-end SCSI transfer rate is set to 20 MB/sec and the A3500 back-end SCSI transfer rate is set to 40 MB/sec. Interchanging these controllers can cause data corruption problems.
You need to power cycle the controller module for the new disk tray to be recognized. Be sure the drives contained in the new disk tray are not larger in capacity than the Global Hot Spare (GHS) drives, because a GHS will not spare a drive that has a higher capacity than itself. To connect the power and SCSI cables to the new disk tray, and to set the disk tray ID (and the option switch on D1000 trays), refer to the Sun StorEdge A3500/A3500FC Hardware Configuration Guide.
2.3.2 Adding or Moving Disk Drives to Existing Arrays
You can add new drives to empty slots in your existing array. To add new disk
drives:
1. Insert a disk drive into the tray.
2. Allow time for the drive to spin up (approximately 30 seconds).
3. Run a health check to ensure that the drive is not defective.
4. Fail the drive, then revive it to update DacStore on the drive.
5. Repeat steps 1 through 4 until all drives have been added.
6. New drives are now ready for LUN creation.
Note – Refer to Escalation no. 525788, bug 4334761. Refer to FIN I0612 for further
information.
Caution – If the drives you are adding to the array were previously owned by another controller module, either A1000 or A3x00/A3500FC, ensure that you pre-format the disk drives to wipe clean the old DacStore information before inserting them in an A3x00/A3500FC disk tray.
Caution – Do not randomly swap drives between drive slots or RAID systems. You
must use the Recovery Guru procedure to replace drives. Also see Section 5.2.10,
“Disk Drives” on page 5-14.
Caution – If you take out a disk drive, put a dummy drive in so proper cooling can
be maintained.
2.4 Setting Up 2x7 and 3x15 Configurations, and Converting 1x5 to 2x7 or 3x15
To connect the power and SCSI cables to the new disk tray, and to set the disk tray
ID (and the option switch on D1000 trays), refer to the Sun StorEdge A3500/A3500FC
Hardware Configuration Guide.
Because of the re-cabling, the drive DacStore will be erroneous. Some drives might fail or not appear, and ghost drives might appear. Do not attempt to fix these problems. You first need to delete all previous LUN configurations. A Reset Configuration may work, but the best approach is to issue a sysWipe command followed by sysReboot on each controller through the serial port.
Caution – Only trained and experienced Sun personnel should access the serial port, and you should have a copy of the latest Debug Guide. Certain commands can destroy customer data or configuration information, and no warning message appears when a potentially damaging command is executed.
Note – When you issue a sysWipe command you might see a message indicating
that sysWipe is being done in a background process. Wait for a message indicating
that sysWipe is complete before issuing a sysReboot command. Once the
configuration is reset and the previous DacStore is cleaned up, the drive status
should come up Optimal as long as the drive has no internal problem. sysWipe
should be run from each controller.
Note – An A3000 RAID system cannot be converted to an A3500 RAID system by
swapping disk trays because the controller NVSRAM is different. The A3x00
requires an entire controller module upgrade. Also, D1000 trays are not supported in
the A3000 56” cabinet. You will need to upgrade the cabinet.
2.5 Sun StorEdge A3500/A3500FC Lite
This package includes two rackmountable Sun StorEdge D1000s, a RAID module,
and parts necessary to connect to a host server.
Refer to the Sun StorEdge A3500/A3500FC Hardware Configuration Guide for cabling
instructions. Be sure the RAID controller is connected to power cords from
sequenced switch group 2.
Refer to the following web site for ordering information:

http://store.sun.com/docs/specials/bundlegroup.jhtml?catid=48293
Note – Since two drive channels share a disk tray in this configuration, it is recommended that a RAID 1 LUN configuration mirror between two drive trays. If you lose one D1000 disk tray, you lose two drive channels.
The A3500/A3500FC Lite disk trays were previously available with 9-GB disk
drives. Starting September 2000 only 18-GB disk drives will be available for
purchase.
2.6 Cluster, Multi-Initiator, and SCSI Daisy Chaining Configurations

This section contains the following topics:

■ Section 2.6.1, "Cluster Information" on page 2-10
■ Section 2.6.2, "Multi-Initiator Information" on page 2-11
■ Section 2.6.3, "SCSI Daisy Chaining Information" on page 2-11

2.6.1 Cluster Information
Refer to the Sun StorEdge A3500/A3500FC Hardware Configuration Guide for
instructions about cabling and setting up the SCSI ID and/or the FC loop ID.
The Sun Cluster home page is located on the following web site:
http://suncluster.eng/index.html
For detailed cluster configuration information, refer to the Sun Enterprise Cluster
System Hardware Site Preparation, Planning, and Installation Guide. You can find this
document on the following web site:
http://suncluster.eng.sun.com/engineering
For more details about controller replacement and restoring in a cluster
environment, refer to the Sun Cluster 3.0 Hardware Guide, “How to Replace a Failed
StorEdge A3500 Controller or Restore an Offline StorEdge A3500 Controller”. Sun
Cluster 3.0 documents are available on the following web site:
http://suncluster.eng/products/SC3.0
Specific Sun Cluster documents and relevant chapters within these documents are:
■ Sun Cluster 2.2: Sun Enterprise Cluster 3000/4000/5000/6000/10000 Hardware Service Manual, 805-6512, Chapter 3 "Hardware Troubleshooting" and Chapter 9 "Major Subassembly Replacement".
■ Sun Cluster 3.0: Sun Cluster 3.0 Hardware Guide, 806-1420, Chapter 7 "Installing, Configuring, and Maintaining a Sun StorEdge Disk Array".
2.6.2 Multi-Initiator Information

Hubs are required for connecting A3500FCs in cluster/multi-initiator configurations. The A3x00/A3500FC is supported with Sun Cluster 2.2 in a two-node cluster configuration.

In a multi-initiator (also known as multi-host) connection, both nodes need to be Sun SPARC servers. Only a multi-initiator configuration that runs Sun Cluster is supported by Sun. This applies to the A3x00 and A3500FC.
Refer to the following web site for details regarding a cluster support matrix:
http://suncluster.eng/support-matrix
Also refer to the following cluster FAQ web site:
http://suncluster.eng/sales/faq/#storage
2.6.3 SCSI Daisy Chaining Information
For further information regarding daisy chaining SCSI disk arrays, refer to the Sun
StorEdge A3500/A3500FC Hardware Configuration Guide. Ensure that the SCSI cables
connected from controller A to controller B are crossed to avoid SCSI ID conflict.
The equivalent to SCSI daisy chaining within an FC-AL loop configuration is a hub
connection. For further information on setting up a hub connection and the
controller loop ID, refer to the Sun StorEdge A3500/A3500FC Hardware Configuration
Guide.
2.7 Supported Configurations

This section contains the following topics:

■ Section 2.7.1, "Maximum Server Configurations" on page 2-12
■ Section 2.7.2, "Onboard SOC+" on page 2-12
■ Section 2.7.3, "Second Port on the SOC+ Card" on page 2-13
■ Section 2.7.4, "Disk Drive Support Matrices" on page 2-13
■ Section 2.7.5, "Independent Controller/Box Sharing" on page 2-13
■ Section 2.7.6, "HBAs" on page 2-13
2.7.1 Maximum Server Configurations
Table 2-1 lists the maximum number of controller modules that are supported for a
given server configuration.
TABLE 2-1 Server Configuration and Maximum Controller Modules Supported

Server | A3500FC Maximum Controller Modules (Direct) | A3500FC Maximum Controller Modules (Hub) | A3500 SCSI Maximum Controller Modules (Single) | A3500 SCSI Maximum Controller Modules (Daisy-Chained)
Sun Enterprise™ 10000 | 21 | 34 | 21 | 21
Sun Enterprise 6000/6500 | 21 | 34 | 21 | 36
Sun Enterprise 5000/5500 | 10 | 34 | 10 | 20
Sun Enterprise 4000/4500 | 10 | 34 | 10 | 20
Sun Enterprise 3500 | 6 | 24 | 6 | 12
Sun Enterprise 3000 | 4 | 16 | 4 | 8
Sun Enterprise 450 | 3 | 12 | 3 | 4
Sun Enterprise 250 | 2 | 8 | 2 | 2
Sun Ultra 80 | 2 | 8 | Not supported | Not supported
Sun Ultra 60 | 2 | 8 | Not supported | Not supported
2.7.2 Onboard SOC+
Onboard SOC+ is supported with the A3500FC on the E3x00 through E6x00 host
platforms. The following types of I/O boards have onboard SOC+ that are
supported:
■ I/O board with SOC+:
  ■ X2611 (501-4266-06) I/O type 4, 83 MHz Gigaplane.
  ■ X2612 (501-4883-05) I/O type 4, 83/90/100 MHz Gigaplane.
■ I/O Graphic board with SOC+:
  ■ X2622 (501-4884-05) I/O type 5, 83/90/100 MHz Gigaplane.
Both SOC+ connections on one I/O board can be used to connect to an A3500FC
concurrently. For better redundancy, do not connect both controllers of the same
controller module to the same I/O board.
The minimum firmware requirement for the supported I/O boards is 1.8.25.
Note – Refer to FIN I0586-1 for details.
2.7.3 Second Port on the SOC+ Card
Other Fibre Channel devices can be connected to the second port. Refer to FIN
I0586-1 for details.
2.7.4 Disk Drive Support Matrices
The following URL provides support information for all hard disk drives that are
used in the Sun StorEdge™ RSM Array 2000 and Sun StorEdge A1000, A3x00, and
A3500FC disk arrays:
■ http://webhome.sfbay/harddrivecafe/
2.7.5 Independent Controller/Box Sharing
Independent Controller is also known as Box Sharing. It enables the storage capacity
of an A3x00/A3500FC disk array to be shared by two independent hosts. See the
“Independent Controller Configuration” section in Chapter 2 in the Sun StorEdge
RAID Manager 6.22 User’s Guide for an overview of this function.
2.7.6 HBAs
Refer to the HBA support matrix in Early Notifier 20029 for a list of supported
HBAs.
2.8 SCSI to FC-AL Upgrade
Note – Refer to FIN I0590 and the latest version of the Sun StorEdge A3500/A3500FC
Controller Upgrade Guide for more information regarding this procedure. At the time
this document was prepared, the latest version of the Sun StorEdge A3500/A3500FC
Controller Upgrade Guide was part no. 806-0479-11.
You need to load NVSRAM to the A3500FC controllers as documented in FIN I0590
and in the Sun StorEdge A3500/A3500FC Controller Upgrade Guide. Be sure to load the
correct NVSRAM patch:
■ A3500 array with D1000 trays—patch no. 109232-01
■ A3000 array with RSM trays—patch no. 109233-01
If you have RM 6.22.1, use the NVSRAM in that release; it is later than the
NVSRAM in these two patches.
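As a minimal sketch, installing one of these NVSRAM patches with the standard
Solaris patch tools might look like the following (the staging directory is
illustrative, and any follow-on NVSRAM download steps are as documented in FIN
I0590 and the Controller Upgrade Guide):

# Install the NVSRAM patch for an A3500 array with D1000 trays
# (use 109233-01 instead for an A3000 array with RSM trays):
cd /var/tmp                  # assumes the patch was unpacked here
patchadd 109232-01
showrev -p | grep 109232     # verify that the patch is installed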
Follow the instructions provided in the Sun StorEdge A3500/A3500FC Controller
Module Guide to route fiber cables. The old SCSI controller front bezel does not give
adequate room for fiber-optic cable routing; discard it. Use the new A3500FC
controller face plate that comes with the A3500FC upgrade kit.
CHAPTER 3
RAID Manager Installation and Configuration
This chapter provides some additional information, guidelines, and tips relating to
the installation and configuration of an array.
This chapter contains the following sections:
■ Section 3.1, “Installation and Configuration Tips, Tunable Parameters, and
Settings” on page 3-2
■ Section 3.2, “LUN Creation/RAID Level” on page 3-5
■ Section 3.3, “LUN Deletion and Modification” on page 3-9
■ Section 3.4, “Controller and Other Settings” on page 3-9
3.1 Installation and Configuration Tips, Tunable Parameters, and Settings
This section contains the following topics:
■ Section 3.1.1, “Software Installation” on page 3-2
■ Section 3.1.2, “Software Configuration” on page 3-3
■ Section 3.1.3, “RAID Module Configuration” on page 3-3
■ Section 3.1.4, “Tunable Parameters and Settings” on page 3-3
■ Section 3.1.5, “Multi-Initiator/Clustering Environment” on page 3-4
■ Section 3.1.6, “Maximum LUN Support” on page 3-5
3.1.1 Software Installation
■ Upgrading from RAID Manager 6.0/RAID Manager 6.1 to RAID Manager 6.1.1
Update 2 or RAID Manager 6.22x.
  ■ A 2-MB DacStore is not supported on RAID Manager 6.22x. For existing LUNs
with a 2-MB DacStore running under RAID Manager 6.1.1/RAID Manager
6.22x, see FIN I0557 for recommendations and procedures.
■ Upgrading from RAID Manager 6.1 to RAID Manager 6.1.1 Update 2.
See the “Sun StorEdge RAID Manager Upgrade Procedure” section in the Sun
StorEdge RAID Manager 6.1.1 Update 2 Release Notes.
■ Installing RAID Manager 6.1.1 Update 2.
Refer to the Sun StorEdge RAID Manager 6.1.1 Installation and Support Guide for
Solaris.
■ Installing RAID Manager 6.22x.
Refer to the Sun StorEdge RAID Manager 6.22 Installation and Support Guide for
Solaris and the Sun StorEdge RAID Manager 6.1.1 Update 2 Release Notes.
Also, refer to the Sun StorEdge RAID Manager 6.22 and 6.22.1 Upgrade Guide (part
number 806-7792) for additional information when upgrading to RAID Manager
6.22x.
■ Ensure that you also install the latest controller firmware revision after upgrading
the RAID Manager software.
3.1.2 Software Configuration
■ RAID Manager 6.1.1—Refer to the Sun StorEdge RAID Manager 6.1.1 Installation
and Support Guide for details.
■ RAID Manager 6.22—Refer to the Sun StorEdge RAID Manager 6.22 Installation and
Support Guide for details.
■ If the default LUN 0 has to be resized (removed and recreated because the size is
too small), see FIN I0573 for the procedure.
■ When upgrading to RAID Manager 6.22, you may see warning messages
indicating that there are bad disk drives. This is normal and reflects a new feature
in RAID Manager 6.22 called Predictive Failure Analysis (PFA). Refer to Chapter 6
“Recovery” in the Sun StorEdge RAID Manager 6.22 User’s Guide for details.
3.1.3 RAID Module Configuration
There are three typical RAID Module configurations (refer to Chapter 2 in the Sun
StorEdge RAID Manager 6.1.1 User’s Guide or the Sun StorEdge RAID Manager 6.22
User’s Guide for details):
■ Single-Host Configuration (with dual-path access to the module).
■ Multi-Host Configuration (supported only with Sun Cluster software).
■ Independent Controller Configuration (no redundant path protection). Because
all arrays connected to the host lose failover protection, Independent Controller
Configuration should not be mixed with Single-Host Configuration or
Multi-Host Configuration.
3.1.4 Tunable Parameters and Settings
■ For a detailed description of the tunable parameters for RAID Manager 6, see the
man page rmparams (4). Terminate all RAID Manager GUI processes before
changing /etc/osa/rmparams; otherwise the GUI will core dump and exit. (A
minimal sketch of this sequence follows this list.)
■ Rdac_NoReconfig—used to control LUN ownership across boot -r reboots.
Refer to the README file of RAID Manager 6.1.1 patch number 106552-04 or later
for details. This variable is ignored by RAID Manager 6.22x.
■ On hosts using volume managers, tuning Rdac_RetryCount to provide quicker
notification of failures is recommended, since the volume manager will also do
retries. A value of 1 is suggested, just as it is for clusters.
■ System_MaxLunsPerController in rmparams: the higher the number, the longer it
takes to reboot or to invoke the GUI environment.
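A minimal sketch of the sequence above (the parameter value shown is only an
example; see rmparams (4) for the parameters valid on your release):

# 1. Make sure no RAID Manager GUI processes are running, otherwise
#    the GUI will core dump when rmparams changes:
ps -ef | grep -i rm6
# 2. Keep a backup copy, then edit the parameter, for example
#    Rdac_RetryCount=1 on hosts running a volume manager:
cp /etc/osa/rmparams /etc/osa/rmparams.save
vi /etc/osa/rmparams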
3.1.5 Multi-Initiator/Clustering Environment
■ Sun Cluster is the only clustering/multi-initiator environment tested and verified
by Sun with the A3x00. A number of parameters should be modified to run an
A3x00 under the Sun Cluster 2.1/Sun Cluster 2.2 environment. See
Rdac_RetryCount, Rdac_NoAltOffline, and Rdac_Fail_Flag in the RAID
Manager 6.1.1_u1 or 6.1.1_u2 patch number 106707-03 or later.
■ See the Sun Cluster documentation for specific Sun Cluster requirements.
For Sun Cluster 2.1, refer to the following web site:
http://suncluster.eng.sun.com/engineering
For Sun Cluster 2.2, refer to the following web site:
http://suncluster.eng.sun.com/engineering/SC2.2/fcs_docs/fcs_docs.html
For Sun Cluster 3.x, refer to the following web site:
http://suncluster.eng.sun.com/products/SC3.0/
■ Running rm6 diagnostic commands in a multi-host configuration
When rm6 diagnostic commands (for example, healthck and drivutil) are
run, lock files are used on the host to serialize execution and protect against these
commands being run simultaneously against an A1000/A3x00/A3500FC module.
This design means that it is safe to run multiple rm6 diagnostic commands at the
same time on the same host.
Running rm6 diagnostic commands from more than one host connected to a
shared A1000/A3x00/A3500FC at the same time, however, can cause controllers
to go offline, potentially affecting data availability.
This situation can occur when running explorers or third-party application
packages (for example, BMC Patrol) from multiple hosts with shared
A1000/A3x00/A3500FC storage at the same time.
Additionally, Sun Remote Services software runs rm6 utilities via the scripts
healthck.sh and drivutil.sh (included in the package SRSsymod).
In all of these cases, it is important to modify procedures or configurations to
ensure that rm6 diagnostic commands are never executed from more than one
host attached to a shared A1000/A3x00/A3500FC at the same time.
Running the rm6 GUI from more than one host attached to a shared
A1000/A3x00/A3500FC at the same time should also be avoided. Even when the
administrator has not actively executed diagnostic commands via the GUI, an
idle rm6 GUI may nevertheless be probing the A1000/A3x00/A3500FC, exposing
the customer to the same problem of controllers going offline.
3.1.6 Maximum LUN Support
The default setting for maximum LUN support is 8. If more than 8 LUNs are
required on each A3x00, refer to "Maximum LUN Support..." in the RAID Manager 6
Release Notes for details. Also see FIN I0589 for PCI HBAs.
Note – When performing a RAID Manager upgrade, if extended LUN support is
enabled, ensure that you reenable it during the upgrade as described in the Upgrade
Guide.
Caution – Do not use the add16lun.sh script found on the RAID Manager 6.1.1
CD-ROM on a PCI machine. Details are available in FIN I0589.
3.2 LUN Creation/RAID Level
This section contains the following topics:
■ Section 3.2.1, “General Information” on page 3-5
■ Section 3.2.2, “LUN Numbers” on page 3-6
■ Section 3.2.3, “The Use of RAID Levels” on page 3-6
■ Section 3.2.4, “Cache Mirroring” on page 3-6
■ Section 3.2.5, “Reconstruction Rate” on page 3-7
■ Section 3.2.6, “Creation Process (Serial/Parallel) Time” on page 3-8
■ Section 3.2.7, “DacStor Size (Upgrades)” on page 3-8
3.2.1 General Information
■ RAID Manager 6.22x has an “immediate LUN availability” feature: a LUN
becomes Optimal in a few minutes, while the rest of the initialization is processed
in the background. See Section 3.2.6, “Creation Process (Serial/Parallel) Time” on
page 3-8.
■ If LUN creation is scripted, see Section 3.2.6, “Creation Process (Serial/Parallel)
Time” on page 3-8.
■ For multi-initiator configurations, see Section 3.3, “LUN Deletion and
Modification” on page 3-9.
■ DacStore size is different between LUNs created under RAID Manager 6.0/RAID
Manager 6.1 and RAID Manager 6.1.1/RAID Manager 6.22x. See Section 3.2.7,
“DacStor Size (Upgrades)” on page 3-8.
■ LUN creation under RAID Manager 6.22x.
hot_add is a new command introduced in RAID Manager 6.22x and in patch
106552-04 for RAID Manager 6.1.1_u1/2. It cleans up the Solaris device tree by
running devfsadm (Solaris 8 and later) or the following set of commands:
drvconfig, devlinks, and disks. In RAID Manager 6.1.1_u1/2, hot_add was
available as /Tools/dr_hotadd.sh on the CD. (See the sketch following this
list.)
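A typical invocation after LUN creation, assuming the default RAID Manager
install location (verify the path on your system):

# Build the Solaris device nodes for newly created LUNs; on Solaris 8
# this runs devfsadm, on earlier releases drvconfig/devlinks/disks:
/usr/lib/osa/bin/hot_add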
3.2.2 LUN Numbers
■ LUN numbers are allocated in sequence by RAID Manager 6, starting from zero.
■ The boot device has to be LUN 0.
■ You must always have a LUN 0. Refer to FIN I0573-2.
3.2.3 The Use of RAID Levels
■ See the “RAID Level” section in Chapter 2 in the Sun StorEdge RAID Manager 6.1.1
User’s Guide or the Sun StorEdge RAID Manager 6.22 User’s Guide for the various
RAID levels available on the A3x00 RAID controller. The configuration GUI also
gives a description of the RAID level selected during LUN creation.
■ RAID 1 and RAID 5 take advantage of the hardware RAID controller to improve
performance and data availability.
■ A RAID 0 LUN does not offer any redundancy. If a single drive fails, all data on
the LUN is lost (see the “RAID Level” description in Chapter 2 in the Sun StorEdge
RAID Manager 6.1.1 User’s Guide or the Sun StorEdge RAID Manager 6.22 User’s
Guide).
3.2.4 Cache Mirroring
■ See the “Changing Cache Parameters” section under the “Maintenance and
Tuning” chapter in the Sun StorEdge RAID Manager 6.1.1 User’s Guide or the Sun
StorEdge RAID Manager 6.22 User’s Guide.
■ Write cache is always turned off for a short duration (~15 min.) after a reboot or
reset of the A3x00 controller module, until the battery is ready.
■ Write cache is disabled when the battery strength drops below 80%.
■ Write cache is off for all LUNs if the battery timer is greater than two years. See
Section 5.2.8, “Battery Unit” on page 5-13.
Caution – If write cache is turned on in dual/active mode, you must also have it
mirrored. Failure to do so may result in data corruption if a controller fails.
3.2.5 Reconstruction Rate
■ The reconstruction rate depends on the Reconstruction Rate setting and the I/O
load on the module. Refer to Chapter 7 “Maintenance and Tuning” in the Sun
StorEdge RAID Manager User’s Guide.
■ Under optimal conditions (system idle, with no other I/O to the A3x00), it takes
about two minutes to reconstruct a 1-GB (4+1) RAID 5 LUN under RAID Manager
6.22. If the LUN is active with host I/O requests, it can take up to 10 minutes to
reconstruct 1 GB of storage.
■ Reconstruction can start only if certain requirements have been met. If a
controller has failed, no reconstruction will start until the controller is Optimal.
■ Only one reconstruction can be in progress at a time for a given controller.
■ The reconstruction rate can vary by 50%, depending on the model of disk drive
in the trays.
■ LUNs can be reconstructed only by the controllers that own them.
■ The following sequence of events indicates how the LUN status changes:
1. Initially, in a (4+1) RAID 5 configuration, LUN X is Optimal.
2. RAID Manager 6 detects a bad drive in LUN X.
3. The status of LUN X then changes to Degraded.
If there is a GHS, LUN X status changes to Reconstructing; otherwise, LUN X
remains Degraded.
Note – The GHS must be of a size greater than or equal to the failed drive.
LUN X remains in the Reconstructing state until reconstruction with the GHS has
been completed, even though the bad disk has been replaced.
After the reconstruction with the GHS has been completed, the LUN status switches
to Optimal. Copyback to the replaced drive starts when the LUN status changes to
Optimal. New data continues to be written to the GHS until the copyback process is
complete. Then the GHS returns to the GHS pool.
■ For recovery under RAID Manager 6.1.1, refer to Chapter 5 in the Sun StorEdge
RAID Manager 6.1.1 User’s Guide for details.
■ For recovery under RAID Manager 6.22, refer to Chapter 6 in the Sun StorEdge
RAID Manager 6.22 User’s Guide for details.
Note – Do not revive a drive if it is failed by the controller. Refer to the Sun StorEdge
RAID Manager User’s Guide.
3.2.6 Creation Process (Serial/Parallel) Time
■ LUNs can be created with either the CLI or the GUI. Use the GUI to create LUNs
for better coordination between the various back-end utilities.
■ Be sure that an Optimal LUN 0 resides on one of the controllers and an Optimal
LUN exists on the other controller before you attempt parallel LUN creation. See
FIN I0573-02 or later for details on the importance of LUN 0. This condition is
especially important in the case of LUN creation via script with the CLI.
■ If you must create LUNs with the CLI, allow a delay of 1 to 3 minutes between
each CLI (raidutil) process so that device path creation on the Solaris side has
a chance to complete before the next one starts. (See the sketch following this
list.)
■ Parallel LUN creation means that multiple LUNs are in the formatting state as
reported by the GUI.
■ The creation of multiple LUNs in the same drive group is processed serially.
■ A limit of four LUNs can be created in parallel in each controller. Any more than
four are queued in the controller.
■ RAID Manager 6.1.1 takes 3 minutes to format 1 GB of storage under optimal
conditions. For example, a 10-GB RAID 5 (4+1) LUN takes 10 GB x 3 min/GB = 30
minutes. Refer to Chapter 3 in the Sun StorEdge RAID Manager 6.1.1 User’s Guide
for details and restrictions.
■ With RAID Manager 6.22x, a LUN becomes Optimal in 2 to 5 minutes while the
actual format takes place in the background. Refer to Chapter 4 in the Sun
StorEdge RAID Manager 6.22 User’s Guide for details and restrictions.
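If LUN creation is scripted, a sketch of the pacing looks like the following. The
raidutil arguments are placeholders only; see raidutil (1M) for the exact
creation options on your release:

#!/bin/sh
# Create LUNs one at a time, pausing between raidutil invocations so
# that Solaris device path creation can complete before the next one.
for opts in "-n 1" "-n 2" "-n 3"; do          # hypothetical options
    raidutil -c c1t5d0 $opts                  # c1t5d0 is a placeholder
    sleep 120                                 # wait 1-3 minutes
done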
3.2.7 DacStor Size (Upgrades)
Consider the following when upgrading from RAID Manager 6.0/RAID Manager
6.1 to RAID Manager 6.1.1 and later.
DacStore size is different between LUNs created under RAID Manager 6.0/6.1 vs.
RAID Manager 6.1.1/6.22x. This is an issue with customers who are running RAID
Manager 6.0 or RAID Manager 6.1 and want to upgrade to RAID Manager 6.1.1 or
RAID Manager 6.22x. See FIN I0557-2 for details.
RAID Manager 6.22x does not support a 2-MB DacStore; it supports only a 40-MB
DacStore.
3.3 LUN Deletion and Modification
■ See the “Guidelines for Creating or Deleting LUNs” section in the Sun StorEdge
RAID Manager 6.22 Release Notes for details on restrictions for LUN 0 removal.
Refer to FIN I0573-2 for more information regarding the serious consequences of
deleting LUN 0.
■ A number of new features are available to modify a LUN/drive group while the
LUN/drive group is in production. See the section “Modifying Drive Groups and
Logical Units” in Chapter 4 in the Sun StorEdge RAID Manager 6.22 User’s Guide
for details.
3.4 Controller and Other Settings
This section contains the following topics:
■ Section 3.4.1, “NVSRAM Settings” on page 3-9
■ Section 3.4.2, “Parity Check Settings” on page 3-10
3.4.1 NVSRAM Settings
See the "NVSRAM Settings" section in the appendix of the Sun StorEdge RAID
Manager 6.1.1 Installation Guide or the Sun StorEdge RAID Manager 6.22 Installation
Guide for more details.
■
■
The NVSRAM specifies the configuration of the controller. Some of the parameters
that the NVSRAM controls are:
■
Transfer rate between the controller and back-end SCSI enclosure.
■
Start of Day processing .
What version of NVSRAM to be used on a particular controller depends on the
disk trays and the RAID Manager 6 releases. Refer to the following file for further
details:
/net/artemas.ebay/global/archive/StorEdge_Products/
sonoma/nvsram/nvsram_versions
Caution – Modifying the NVSRAM settings via the nvutil (1M) command will
change the behavior of the controller. Use caution when executing this command.
3.4.2 Parity Check Settings
This section contains the following topics:
■ Section 3.4.2.1, “RAID Manager 6.1.1” on page 3-10
■ Section 3.4.2.2, “RAID Manager 6.22x” on page 3-10
■ Section 3.4.2.3, “Parity Repair” on page 3-11
■ Section 3.4.2.4, “Multi-host Environment” on page 3-11
3.4.2.1 RAID Manager 6.1.1
The RAID Manager 6.1.1 default setting is to run parityck (1M) once a day. This
setting can be modified via “Maintenance and Tuning.” See the “Changing
Automatic Parity Check/Repair Settings” section in Chapter 6 in the Sun StorEdge
RAID Manager 6.1.1 User’s Guide.
If the I/O subsystem is very busy, it is likely that the parityck process will not
complete within 24 hours. In this case, a cron job can be created to run parityck
once a week or once every two weeks. The goal is to run parityck periodically,
but there should never be multiple parityck processes running against the same
A3x00 module at any time, because parityck consumes I/O resources in the
controller.
Also refer to bug no. 4137421, which contains a script that can speed up the time
necessary to run parityck (it checks LUNs in parallel instead of serially).
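For example, a root crontab entry along the following lines runs the check weekly
(the path is typical for RAID Manager installs, and any module or option
arguments your release requires must be added; see parityck (1M)):

# Added with "crontab -e" as root: run parityck at 02:00 on Sundays.
0 2 * * 0 /usr/lib/osa/bin/parityck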
3.4.2.2 RAID Manager 6.22x
The RAID Manager 6.22x default setting for parityck (1M) is to run once a week.
A new option to parityck reports parity errors without repairing them. See the
man page parityck (1M) for details.
If parityck (1M) finds a mismatched data block and parity block, it reports the
mismatch (in /var/adm/messages and rmlog.log) and regenerates new parity
blocks.
You should run parityck with the “no repair” option, which is the default setting
in RAID Manager 6.22.1. See FIN I0825, which describes how to override the
default setting; attempting to override it from the GUI does not work as expected.
3.4.2.3 Parity Repair
With RAID 3 and RAID 5, data blocks are assumed to be good. Parity blocks are
regenerated by parityck (1M) with the proper options. See the man page
parityck (1M) for further details.
3.4.2.4 Multi-host Environment
■ Only one host in a cluster should be capable of running parityck.
■ Each host in a box sharing environment can run parityck.
CHAPTER 4
System Software Installation and Configuration
This chapter provides some additional information, guidelines, and tips relating to
installation and configuration of system software.
This chapter contains the following sections:
■ Section 4.1, “Installation” on page 4-2
■ Section 4.2, “Solaris Kernel Driver” on page 4-2
■ Section 4.3, “format and lad” on page 4-4
■ Section 4.4, “Ghost LUNs and Ghost Devices” on page 4-5
■ Section 4.5, “Device Tree Rearranged” on page 4-9
■ Section 4.6, “SNMP” on page 4-11
■ Section 4.7, “Interaction With Other Volume Managers” on page 4-12
4.1 Installation
This section contains the following topics:
■ Section 4.1.1, “New Installation” on page 4-2
■ Section 4.1.2, “All Upgrades to RAID Manager 6.22 or 6.22.1” on page 4-2
4.1.1 New Installation
■ RAID Manager 6.1.1—Refer to Chapter 1 in the Sun StorEdge RAID Manager 6.1.1
Installation and Support Guide for Solaris.
■ RAID Manager 6.22—See the “About the Installation Procedure” section in
Chapter 1 in the Sun StorEdge RAID Manager 6.22 Installation and Support Guide for
Solaris.
4.1.2 All Upgrades to RAID Manager 6.22 or 6.22.1
Refer to the Sun StorEdge RAID Manager 6.22 and 6.22.1 Upgrade Guide (part number
806-7792) at:
http://acts.ebay.sun.com/storage/A3500/RM622
4.2 Solaris Kernel Driver
This section contains the following topics:
■ Section 4.2.1, “sd_max_throttle Settings” on page 4-3
■ Section 4.2.2, “Generating Additional Debug Information” on page 4-3
Refer to the patch matrix (OS, driver, RAID Manager 6) outlined in Early Notifier
20029 for details. Also refer to Patchpro:
http://patchpro.ebay/servlet/com.sun.patchpro.servlet.PatchProServlet
RAID Manager 6.1.1 only supports SCSI interconnects (sd via SBus/UDWIS/isp
and PCI/glm).
RAID Manager 6.22 supports both SCSI and Fibre Channel interconnects to the
A3x00/A3500FC controller module. RAID Manager 6.22 supports only SCSI when
you are using the Solaris 2.5.1 11/97 operating environment (see FIN I0688). The
SCSI driver stack is the same as for RAID Manager 6.1.1. The driver stack for Fibre
Channel is:
■ SBus/soc+socal/sf/ssd
■ PCI/QLC2100/ifp/ssd
■ PCI/QLC220x/fcp/ssd
4.2.1 sd_max_throttle Settings
■ sd_max_throttle for the A3x00 is set by sd. Manually setting
sd_max_throttle in /etc/system is not necessary unless the current value is
too high. A good estimate is sd_max_throttle x number of LUNs per module
<= 180 for systems using a UDWIS/SBus host bus adapter (HBA); 180 is close to
the upper limit of command entries in the current generation of UDWIS/SBus
(SCSI) HBAs. sd_max_throttle has no effect on the current ssd. (A worked
example follows this list.)
■ Refer to FIN I0579-1 for further information.
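For example, with 20 LUNs per module behind UDWIS/SBus HBAs, 180 / 20 = 9,
so a minimal /etc/system entry would be (the value is illustrative for that
configuration only):

* Comments in /etc/system begin with an asterisk.
* 180 command entries / 20 LUNs per module = throttle of 9:
set sd:sd_max_throttle=9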
4.2.2 Generating Additional Debug Information
After setting sd_error_level = 0 or ssd_error_level = 0, the following
error messages may appear on the console or in /var/adm/messages:
Failed CDB:0xbe 0x1e 0x12 0x1 0xc1 0x0 0x10 0x0 0x0 0x0 0x0 0x0
/pci@1f,4000/scsi@4,1/sd@5,0 (sd50):
Sense Data:0x70 0x0 0x5 0x0 0x0 0x0 0x0 0x28 0x0 0x0 0x0 0x0 0x24
0x0 0x0 0x0 0x0 0x0 0xe 0x12
The messages are warnings that the target driver sd or ssd has tried a command
that did not work. Some versions, for example, try to read the wrong mode sense
page for power management data. You can safely ignore these messages.
Bug 4358075 describes error messages that appear when the RAID Manager 6
software probes mode page 2C on all array devices and receives errors from those
arrays that do not support page 2C. The bug 4358075 messages are expected and
are not seen unless sd_error_level = 0 or ssd_error_level = 0.
The following kernel variables can be set to capture more information about why a
controller failover occurred:
sd_error_level = 2 or 0 (for A3x00 SCSI)
ssd_error_level = 2 or 0 (for A3500FC)
RdacDebug = 1 (for both SCSI and FC)
You can set the variables in two ways:
■ With adb -kw.
■ By adding the variables to the end of /etc/system, followed by a reboot. See
the man page system (4) for further details.
With the variables set, all failed command descriptor block (CDB) and retry
commands will appear on the console and in /var/adm/messages. Be sure that
enough space is available for /var/adm if file system size is limited.
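A sketch of both methods follows. The adb form changes the running kernel
immediately but is lost at reboot; the /etc/system form persists across reboots
(the sd: module prefix applies to sd_error_level; for RdacDebug, verify which
kernel module owns the variable before using the /etc/system form):

# Temporary, on the running kernel:
echo 'sd_error_level/W 2' | adb -kw /dev/ksyms /dev/mem
echo 'RdacDebug/W 1' | adb -kw /dev/ksyms /dev/mem
# Persistent alternative: append to /etc/system and reboot, e.g.
#   set sd:sd_error_level=2     (use ssd: for the A3500FC)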
4.3 format and lad
Keeping the lad/RM6 ctd assignment in sync with format is neither necessary nor
practical.
lad/RM6 reports the current device path to the LUNs. The path changes when
LUNs are moved between controllers.
format reports the device path based on entries in /dev/[r]dsk. These entries are
created at the end of the LUN creation process. They are static, thus presenting a
fixed reference to the application no matter which controller owns the LUN.
If you want to keep the ctd# in sync, refer to the following white paper, available on
the Sonoma Engineering web site, for a RAID Manager 6.1 solution:
http://webhome.sfbay/A3x00/Sonoma/4084293
For a RAID Manager 6.22x solution for keeping the ctd# in sync, see the man page
rdac_address (4).
Note – format should report the LUNs as pseudo devices (for example,
/pseudo/rdnexus...). If the path indicates physical devices (for example,
/sbus@...), the LUNs were not built properly.
4.3.1 Volume Labeling
At the end of the LUN creation process, format (1M) is called to label the LUN with
a volume label. If the LUN creation process is interrupted or the LUN is created via
the serial port, a valid Solaris label may not exist on the LUN. In this case, label the
LUN manually using the format (1M) command.
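A minimal session outline (format is interactive; the LUN should appear as a
/pseudo/rdnexus... pseudo device, as noted in Section 4.3):

format                  # select the unlabeled LUN from the disk list
format> label           # write a Solaris volume label to the LUN
format> quit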
4.4 Ghost LUNs and Ghost Devices
The following sample procedure corrects a configuration with a LUN that has a
drive defined at location [15,15] (not valid). The drive is an Optimal drive/LUN, and
the device appears as an extra Global Hot Spare (GHS).
Caution – Only trained and experienced Sun personnel should access the serial
port. You should have a copy of the latest Debug Guide. There are certain commands
that can destroy customer data or configuration information. No warning messages
appear if a potentially damaging command has been executed.
Note – This serial port procedure should be performed from the controller that
owns the LUN, or a reboot will be necessary.
You must obtain the devnum of the ghost drive. Use vdShow <LUN#>, where
<LUN#> is the LUN that contains the extra disk. You will see one of the devnums
from this LUN among the devnums in the list of GHSs.
1. Perform a ghsList for information about the hot spare.
You must extract the dev pointer information for use in subsequent steps. Make
note of the output from ghsList.
-> ghsList
Information about two hot spares appears:
■ dev pointer=0x2b5348c is the address of the first ghost hot spare, under
GHS 0. Make a note of it.
■ dev pointer=0x2b4c3ec is the address of the second ghost hot spare, under
GHS 1.
GHS ENTRY 0
dev pointer=0x2b5348c (the address of the first Ghost Hot Spare)
devnum=2
state=2
status=0
flags 4000
GHS ENTRY 1
dev pointer=0x2b4c3ec
devnum 0002
state=2
status=0
flags 4000
value = 5 = 0x5
2. Remove the extra LUN that is part of the global hot spare list.
Use shell commands on a laptop connected directly to the RS-232 port.
-> m 0x dev pointer address,4
The memory locations are displayed 4 bytes at a time.
3. Alter byte two of the third long word of the dev pointer information to tell the
drive it is not a hot spare.
-> m 0x2b5348c,4 (modifying the phydev pointer in memory)
02b5348c: 02b5dd7c- (press Return to get to the next word)
02b53490: 00000002- (press Return to get to the next word)
02b53494: 00204000-00200000 (changing the 3rd long word of the phydev
structure to remove the 0x4000, but keeping the 1st portion)
02b53498: 00000000-. (enter a period “.” to end writing)
value = 1 = 0x1
4. Write the information to disk (DacStore).
-> isp cfgWritePhydevDef,0x dev pointer address
value = 45155160 = 0x2b10358
5. Modify the global hot spare list in memory.
This entry should be the dev pointer address from ghsList. Zero this location out.
-> m &globalHotSpare
02f09104: 02b5348c-02b4c3ec
(Put address of second GHS here)
(packing stack after removal of invalid entry)
02f09108: 02b4c3ec-00000000
(Zero out this location)
02f0910c: 00000000- .
(end the command with
a "." [period] and a return)
value = 1 = 0x1
When the globalHotSpare list is modified, the entries should be packed, removing
the false GHS pointer. If there is a 0 in the middle of the list, everything after the 0
will be ignored.
6. Write the information to disk (DacStore).
Use isp cfgSaveGHSDrives to save the information.
-> isp cfgSaveGHSDrives (Save GHS drive stack to DacStore)
value = 45155160 = 0x2b10358
7. Verify that only one GHS is shown.
-> ghsList
GHS ENTRY 0
dev pointer=0x2b4c3ec
devnum 0002
state=2
status=0
flags 4000
value = 5 = 0x5
4.4.1 Removing Ghost Drives
Use this procedure to remove ghost drives that are not Global hot spares. To remove
a phantom drive, perform the following steps through the controller shell.
Caution – Only trained and experienced Sun personnel should access the serial
port. You should have a copy of the latest Debug Guide. There are certain commands
that can destroy customer data or configuration information. No warning messages
appear if a potentially damaging command has been executed.
Note – This serial port procedure should be performed from the controller that
owns the LUN or a reboot may be necessary.
1. Enter the string:
-> cfgPhy ch,id
2. Write down the nextphy value.
3. Enter the string:
-> d &phyunits,6,4
Data is returned in columns; each column following the colon (:) corresponds to
one channel.
4. Determine the column that corresponds to the channel of the phantom drive, and
note its address. The channels in the shell are 0-relative.
5. Enter the string:
-> d 0x(ADDRESS from step 4),0x30,4
6. Locate the first column that contains numbers after the colon (:).
■ If the nextphy value copied in step 2 is shown, proceed to step 7.
■ If the nextphy value copied in step 2 is not shown, repeat steps 5 and 6 with the
value in the first column that contains numbers after the colon (:).
7. Enter the string:
-> m 0x(ADDRESS from step 4),4
8. After the dash (-), enter the nextphy value from step 2 and press enter.
9. Enter a period (.) and press enter.
10. Enter the string:
-> cfgPhy ch,id
Verify “number of phydevs = 0”.
11. Enter the string:
-> isp cfgSaveFailedDrives
4.5 Device Tree Rearranged
This section contains the following topics:
■ Section 4.5.1, “Dynamic Reconfiguration Related Problems” on page 4-10
■ Section 4.5.1.1, “Workaround” on page 4-10
Solaris device numbers for controllers, the c number in /dev/[r]dsk/cXtYdZs0,
are assigned by Solaris during reconfiguration reboots or device addition. It is not
always easy to predict how and when device numbers will be reallocated after a
reconfiguration boot is done following configuration changes.
If this happens unexpectedly, mount points in /etc/vfstab and in Volume
Manager can be lost. The file /etc/path_to_inst is intended to keep device
numbers the same across normal reboots. Bug 4118532 describes numerous details,
but note the following:
■ Do not start any software installation of RAID Manager 6 or Solaris without first
making sure that the device names are stable. See FIN I0727.
A reconfiguration reboot is the best way to do this. Device numbering is likely to
change if hardware has been added, removed, or disabled. Prior to Solaris 7, if
you did a reconfiguration boot with a failed controller, the c number was lost
because the disks (1M) program would remove such failed controllers. In Solaris
7 and later, disks won’t purge failed devices unless it is called as disks -C, or
devfsadm -C is used (Solaris 8).
■ With the change to devfsadm in Solaris 7 and later, the file /dev/cfg also keeps
bus numbers and may need to be removed before a reconfiguration reboot in
order to clear up persistent misnumbering. (See the sketch below.)
4.5.1 Dynamic Reconfiguration Related Problems
The first time you add an array, the /kernel/drv/rdriver.conf file serves much
the same purpose as the /kernel/drv/sd.conf file, except that rdriver takes
the place of sd. The rdriver reads this file and determines the number of LUNs
that it will allow to be configured in the /devices/pseudo/rdnexus@? device
tree. This is the same device tree where links are created from /dev/[r]dsk.
The /kernel/drv/rdriver.conf file contains two types of entries:
■ Specific LUN definitions that match each configured LUN. They appear at the
top of the file.
■ Entries that resemble those found in the /kernel/drv/sd.conf file and occur
at the bottom of the file. These entries cover only LUNs 0-7 in the default
rdriver.conf file.
The problem, then, is that when you DR Attach the A3500s, the specific or actual
LUN definitions are not placed into the /kernel/drv/rdriver.conf file until the
dr_hotadd.sh (RAID Manager 6.1.1) or hot_add (RAID Manager 6.22x) script is
run. This is too late: rdriver has already been loaded during the last reboot and
has already read the rdriver.conf file, which did not contain the new A3500
entries for LUNs higher than 8. So the new LUNs higher than 8 will not be
recognized without a reboot.
4.5.1.1 Workaround
Modify the entries at the bottom of the rdriver.conf file to allow targets 4 and 5
(or whichever targets your system uses) to accept more than 8 LUNs: 16 LUNs, for
example. You must reboot the system at least once to make this effective. However,
after the reboot, rdriver is loaded and ready to accept LUNs greater than 8
dynamically. You can then attach one or more A3500s, and when the dr_hotadd.sh
(RAID Manager 6.1.1) or hot_add (RAID Manager 6.22x) script is run, rdriver is
prepared to allow the creation of LUNs greater than 8, with the appropriate device
entries. A sketch of such entries follows.
■ Once you determine the maximum number of LUNs you want, complete the
configuration by running addXXlun.sh from the Tools directory of the CD-ROM.
An online version of the Tools directory is available from:
/net/artemas.ebay/global/archive/StorEdge_Products/
sonoma/rm_6.1.1_u2/FCS/Tools
or
/net/artemas.ebay/global/archive/StorEdge_Products/
sonoma/rm_6.22/Tools
Note – Adding support of more LUNs than you need extends the time required for
reboot and the response time of the RAID Manager 6 GUI because it has to scan all
the potential LUNs. See FIN I0551-1 or later.
■ Common problem—RAID Manager 6 is unable to communicate with the module,
but lad shows more than 8 LUNs.
Solution—Re-run addXXlun.sh, followed by boot -r or hot_add.
4.6 SNMP
See the "Setting Up SNMP Notification" section in Chapter 4 in the Sun StorEdge
RAID Manager 6.22 Installation and Support Guide for details.
Note – In order for SNMP to be properly configured, DNS must be enabled unless
you apply the workaround described in bug 4348634.
The SNMP trap data that is supported by the A3x00/A3500FC controllers is covered
in the MIB definition included with the RAID Manager host software.
See the following file:
/net/artemas.ebay/global/archive/StorEdge_Products/
sonoma/rm_6.22/rm6_22_FCS/Product/SUNWosau/reloc/lib/osa/
rm6traps.mib
Also refer to the RAID Manager software documentation online:
http://infoserver.central
4.7 Interaction With Other Volume Managers
This section contains the following topics:
■ Section 4.7.1, “VERITAS” on page 4-12
■ Section 4.7.2, “Solstice Disksuite (SDS)” on page 4-13
■ Section 4.7.3, “Sun Cluster” on page 4-13
■ Section 4.7.4, “High Availability (HA)” on page 4-13
■ Section 4.7.5, “Quorum Device” on page 4-14
4.7.1 VERITAS
This section contains the following topics:
■ Section 4.7.1.1, “VERITAS Enabling and Disabling DMP” on page 4-12
■ Section 4.7.1.2, “HA Configuration Using VERITAS” on page 4-13
■ Section 4.7.1.3, “Adding or Moving Arrays Under VERITAS” on page 4-13
The A3x00 has been tested and qualified with VERITAS Volume Manager (VxVM)
and SDS. Both VxVM and SDS are layered software running on top of rdriver.
The only A3x00 cluster environments that have been tested and qualified by Sun are
Sun Cluster 2.1 and Sun Cluster 2.2. Other cluster software (VERITAS Cluster Server
and FirstWatch, for example) has been neither tested nor qualified by Sun.
4.7.1.1 VERITAS Enabling and Disabling DMP
■ The following document, available on the Sonoma Engineering web site, provides
instructions for installing, running, and administering VxVM with the A3x00:
http://webhome.sfbay/A3x00/Sonoma/VM_A3x00_A1000.pdf
■ VERITAS VxVM supports DMP (Dynamic Multipathing). This feature conflicts
with the dual-path control of RAID Manager 6. VxVM release 3.0.2 addressed this
issue so that RAID Manager 6.22x can coexist with DMP on the same host. DMP
should be disabled under previous versions of VxVM. Refer to the Sun StorEdge
RAID Manager 6.22 Release Notes for instructions on disabling DMP.
4.7.1.2 HA Configuration Using VERITAS
If you have a problem running an A3x00 under a third-party cluster environment,
you can check with CPRE to see whether they have a VIP arrangement with the
third-party vendor to help you move forward.
Because Sun has no access to the source code of third-party cluster software,
debugging is problematic. In such a case, a SCSI or Fibre Channel analyzer trace
between the host and the A3x00 module would help to isolate whether the issue is
in the upper software layer or in the A3x00 layer.
4.7.1.3 Adding or Moving Arrays Under VERITAS
Under VERITAS, volumes show up in the vxdisk list output as online altused after
being moved to a new host. See SRDB no. 20907 for further details.
4.7.2 Solstice Disksuite (SDS)
The A3x00 is supported by Solstice Disksuite 4.1. See the following white paper,
available on the Sonoma Engineering web site:
http://webhome.sfbay/A3x00/Sonoma/SDS.ps
4.7.3 Sun Cluster
Refer to Section 2.6.1, “Cluster Information” on page 2-10.
4.7.4 High Availability (HA)
High Availability and Parallel Database were two distinct products offered by
SunSoft and SMCC. The functionality of these two products is available in Sun
Cluster 2.1. See Section 1.4.1 “High Availability and Parallel Database
Configurations” in Chapter 1. The document is available on the following web site:
http://suncluster.eng/engineering/SC2.2/fcs_docs/fcs_docs.html
4.7.5 Quorum Device
Quorum is a concept that is used in distributed systems, particularly in cluster
environments. The requirements and restrictions of a quorum device are specific to
the particular cluster environment. Refer to the following web sites for online
documentation:
http://suncluster.eng.sun.com/engineering/SC2.1
http://suncluster.eng/engineering/SC2.2/fcs_docs/fcs_docs.html
Using the Sun StorEdge A1000 or A3x00/A3500FC array as a quorum device is not
supported. See FIN I0520-02.
CHAPTER 5
Maintenance and Service
This chapter provides maintenance and service information for verifying FRU
functionality, guidelines for replacing FRUs, and tips on upgrading to the latest
software and firmware levels.
This chapter contains the following sections:
■ Section 5.1, “Verifying FRU Functionality” on page 5-2
■ Section 5.2, “FRU Replacement” on page 5-10
■ Section 5.3, “Software and Firmware Guidelines” on page 5-16
5.1 Verifying FRU Functionality
This section contains the following topics:
■ Section 5.1.1, “Disk Drives” on page 5-3
■ Section 5.1.2, “Disk Tray” on page 5-4
■ Section 5.1.3, “Power Sequencer” on page 5-5
■ Section 5.1.4, “SCSI Cables” on page 5-6
■ Section 5.1.5, “SCSI ID Jumper Settings” on page 5-7
■ Section 5.1.6, “SCSI Termination Power Jumpers” on page 5-7
■ Section 5.1.7, “LED Indicators” on page 5-7
■ Section 5.1.8, “Backplane Assembly” on page 5-7
■ Section 5.1.9, “D1000 FRUs” on page 5-7
■ Section 5.1.10, “Verifying the HBA” on page 5-8
■ Section 5.1.11, “Verifying the Controller Boards and Paths to the A3x00/A3500FC”
on page 5-8
■ Section 5.1.12, “Controller Board LEDs” on page 5-9
■ Section 5.1.13, “Ethernet Port” on page 5-10
The troubleshooting and replacement procedures for the following controller
module FRUs are documented in detail in the Sun StorEdge A3500/A3500FC
Controller Module Guide:
■ Battery Canister, F370-2434
■ Controller Fan, F370-2433
■ DC Power and Battery Harnesses, F565-1397
■ Power Supply, F370-2436
■ Power Supply Fan, F370-2432
■ Power Supply Housing, F370-2869
■ Mounting Rail, 370-3655
Note – If a fan failure message appears in the rmlog.log, replace the fan FRU that
was reported to have failed (controller or power supply fan) even if the fan appears
to be spinning. The fan circuitry has an RPM sensor which triggers the fan failure
message. The fan may continue to spin but in a degraded mode (at half speed).
Note – Remember to reset the battery date on both controllers after a battery
replacement. Refer to Chapter 6 in the Sun StorEdge RAID Manager 6.22 User’s Guide
and read the section “Recovering from Battery Failures” for details on resetting the
battery date.
Note – The power supplies have a thermal protection shutdown feature. To recover
from a power supply shutdown, see Section 7.1 “Recovering From a Power Supply
Shutdown” in the Sun StorEdge A3500/A3500FC Controller Module Guide.
5.1.1 Disk Drives
Refer to Chapter 3 in the Sun StorEdge RAID Manager 6.22 User’s Guide for
procedures to verify the status of each disk drive.
When a disk drive fails, the disk drive amber LED should be on. See Section 4.3.7
“Disk Drive Problem” in the Sun StorEdge A3500/A3500FC Controller Module Guide
for further details.
Note – RAID Manager Recovery Guru will lead you through a disk drive
replacement step by step. Do not deviate from this procedure, or you may end up
with ghost drives, ghost LUNs, or “drive has not been detected” problems. For the
same reason, do not swap disk drives while the controller module is powered off.
Make sure each disk drive has the supported firmware level. If there is a failure
affecting the entire disk tray, RAID Manager 6 Recovery Guru will report the drive
side channel failure. You need to first fix the drive side channel failure before any
drive can be reconstructed (see Section 5.2.10, “Disk Drives” on page 5-14).
Note – Do not swap drives between an A1000 and an A3x00 or you may end up
with DacStore and NVSRAM corruption.
Replacing a failed drive in a RAID 0 LUN requires special attention. Stop any
volume manager or upper-level software from accessing the LUN to prevent
possible data corruption. As soon as the drive is replaced, the LUN is described as
“optimal”; but because the data on a RAID 0 LUN is lost when a drive fails, you
must reformat the LUN. See page 10 of the Sun StorEdge RAID Manager 6.22.1
Release Notes for more information.
Never “revive” drives unless a LUN is optimal, as described in “Recovery Guru
Revive Option Is Removed” in the Sun StorEdge RAID Manager 6.22.1 Release Notes.
5.1.2 Disk Tray
This section contains the following topics:
■ Section 5.1.2.1, “RSM Tray” on page 5-5
■ Section 5.1.2.2, “D1000 Tray” on page 5-5
Several conditions can cause a disk tray to become inaccessible: a loose or defective
SCSI cable, a loose or defective SCSI terminator, a defective SCSI chip on the
controller board, or a defective component in the disk tray.
The problem can sometimes be difficult to isolate. Check the rmlog.log and system
logs for an error sense code or a FRU code. Check each cable connector and look for
bent pins. Ensure that each cable is properly connected.
A disk tray failure can cause all of the drives to report a failed status. RAID Manager
6 Recovery Guru can be used to determine the drive side channel failure. Individual
drives are not recoverable until the drive side channel failure status has been
resolved.
To perform further troubleshooting, you will need to have spare components on
hand: a controller FRU, SCSI cables, SCSI terminators, and the disk tray interface
board.
After the hardware component has been replaced, run a health check to verify the
status of the drive channel, drive tray, and each disk drive (a command-line sketch
appears at the end of this section). You might need to power cycle the controller
module (not the entire rack), even though RAID Manager 6 Recovery Guru
instructions indicate that you don’t have to power cycle the controller module if the
controller firmware level is 2.5.2 or higher.
You need to power cycle the controller module if any of the following conditions
occur:
■ The LUN reconstruction does not start.
Note – LUN reconstructions occur serially, one at a time.
■ Health check continues to report a drive side channel failure.
■ Any of the drives in the disk tray do not come out of the failed or unresponsive
state to reconstruct.
Power cycling of the controller module enables the controller to re-scan the disk
trays. If you continue to encounter error messages while attempting to bring a drive
back online (reconstruct), physically remove the drive, wait 30 seconds, and then
reinstall the drive. Then use RAID Manager 6 Recovery Guru to reconstruct the
drive. See FIN I0670 about replacing the ESM card.
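From the command line, a quick recheck after the repair might look like the
following (the -a form is a sketch; see healthck (1M) for the exact arguments on
your release):

# Re-check module status after the hardware replacement:
healthck -a          # or name a specific RAID module
lad                  # confirm that all LUNs and controllers are visible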
5.1.2.1 RSM Tray
A common point of failure on the RSM tray is the WD2S card, part number 370-2196
(older version) or part number 370-3375 (newer version). This card is located at the
point where the SCSI cable attaches to the RSM disk tray. The function of this card
is to convert Wide Differential SCSI to Single-Ended SCSI.
The other common point of failure on the RSM tray is the SEN card, part number
370-2195. The SEN card should have microcode rev 1.1. When you replace the
WD2S card, change the SCSI ID range from 0-6 (default) to 8-14, which is needed to
see the SCSI IDs properly in a Sun StorEdge A3x00 array. Failure to do so results in
a ghost drive [15,15].
5.1.2.2 D1000 Tray
The common point of failure on the D1000 tray is the differential SCSI controller
board, part number 375-0008.
The disk tray midplane is also a single point of failure, but only in rare cases. In
very rare cases a failed drive can tie up the entire SCSI bus, causing the remaining
drives to be rendered inaccessible. This problem can easily be mistaken for a bad
disk tray and can be somewhat difficult to isolate. Compare each drive’s LED
pattern; in some cases a failed drive will show a different LED pattern from the
remaining drives.
5.1.3 Power Sequencer
The power sequencer has two groups of four sequenced power outlets, designated
sequenced group 1 and sequenced group 2, and one unsequenced power outlet.
There is a four-second power-on delay between sequenced group 1 and sequenced
group 2. The unsequenced outlet remains on as long as the sequencer circuit
breaker is turned on and the sequencer is connected to an AC power source. When
one half of the rack does not receive power, it is usually the result of a power
sequencer failure.
The first thing to do is to verify that the power sequencer is plugged in to a 220 VAC
power source.
Note – At the bottom of the expansion cabinet are two power sequencers. The front
power sequencer is hidden behind the front key switch panel. Remove the front key
switch panel to gain access to the power sequencer’s power cable.
There is a Local/Remote switch located on the front panel of each power sequencer.
When the Local/Remote switch is set to Local, the sequenced outputs are controlled
by a circuit breaker located on the front panel of each power sequencer. When the
Local/Remote switch is set to Remote, the sequenced outputs are controlled by the
key switch located at the bottom front of the Expansion rack. When the
Local/Remote switch is set to OFF, power is removed from the sequenced outputs.
The unsequenced output is not affected by the position of the Local/Remote switch:
as long as the AC power cord is connected and AC power is available, the
unsequenced output is live.
One possible failure is that only one of the sequenced groups turns on. In this case
you will see only one power supply functioning in some of the disk trays installed in
the Expansion rack. Refer to Chapter 3 in the Sun StorEdge A3500/A3500FC Hardware
Configuration Guide and verify that each disk tray is properly configured.
In a 3x15 configuration, the power sequencers in each rack need to be daisy chained
front to front and back to back (See Section 3.5.2 “Connections Between Power
Sequencers” in the Sun StorEdge A3500/A3500FC Hardware Configuration Guide). The
Local/Remote switch on each power sequencer should be set to Remote. The 2x7 rack
has a front key switch that controls both racks. The 1x8 rack does not have a front
key switch.
5.1.4 SCSI Cables
The most common problem involving SCSI cables is bent pins on the connectors.
This usually occurs during a system installation. A typical indication of a defective
SCSI cable is an error message indicating SCSI parity errors and that the SCSI
transfer rate has been reduced or has switched from wide to narrow SCSI. A SCSI
cable that has failed on the host side usually results in a data path failure indication.
If the problem is with a failed SCSI cable on the drive side, the result is usually a
drive side channel failure indication. Currently there are no procedures for testing a
SCSI cable for failure other than to replace it with a new or known good cable.
Also, ensure that the SCSI bus length is within the recommended maximum of
25 m (see Section 2.1.5, “SCSI and Fiber-Optic Cables” on page 2-3).
5.1.5 SCSI ID Jumper Settings
The controller module SCSI ID can be changed, if necessary, by the use of jumpers.
The SCSI ID jumper block is located at the rear of the SCSI controller module. See
Section 2.3 “Verifying Controller Module ID Settings” in the Sun StorEdge
A3500/A3500FC Hardware Configuration Guide for detailed instructions. The factory
default settings are shown in TABLE 5-1.

TABLE 5-1    Controller Module SCSI ID Settings

Controller    SCSI ID
A             5
B             4

5.1.6 SCSI Termination Power Jumpers
SCSI Termination Power Jumpers
There are two SCSI termination power jumpers located at the rear of the SCSI
controller module. One is located below and to the left of the J11 connector (DIFF
SCSI ARRAY 1). The other is located below and to the right of the J4 connector (DIFF
SCSI HOST B). These jumpers enable the controller boards to supply SCSI
termination power to the host SCSI bus. Do not remove these jumpers.
5.1.7 LED Indicators
See Section 4.1 “Checking the Controller Module LEDs” in the Sun StorEdge
A3500/A3500FC Controller Module Guide for detailed information on checking the
controller module LEDs.
5.1.8 Backplane Assembly
Backplane failures are rare. There are no active devices on the backplane.
5.1.9 D1000 FRUs
Refer to the following web site for a listing of the D1000 FRUs:
http://infoserver.central/data/syshbk
5.1.10 Verifying the HBA
■ Refer to Early Notifier 20029 for the latest information regarding HBA support.
■ The UDWIS/SBus host bus adapter (HBA) should be at firmware level 1.28 or
higher (refer to FCO A0163-1 and FIN I0547 for further details).
■ The older SOC+ card, part number 501-3060, is not supported with the A3500FC.
Check the part number label located on the SBus connector to determine the part
number; do not rely on the output of prtdiag, which may provide the wrong
part number.
■ The newer SOC+ cards, 501-5202 and 501-5266, are supported with the A3500FC.
Be sure to have firmware level 1.13 (patch no. 109400-03 or higher) installed.
■ To use the onboard SOC+ you need to have firmware level 1.13 installed (patch
no. 103346-25 or higher). The two onboard SOC+ ports on one I/O board can be
used at the same time.
5.1.11 Verifying the Controller Boards and Paths to the A3x00/A3500FC
See Section 4.1.2 “Checking the Controller LEDs” in the Sun StorEdge
A3500/A3500FC Controller Module Guide for information on interpreting the controller
status LED pattern.
A controller held in reset (offline) does not necessarily mean that the controller is
defective. Rather, it indicates that the I/O between the controller and the host has
been interrupted. Several conditions can cause this: a defective I/O board, a
defective HBA, a defective host SCSI cable or bent pins on the cable, a defective
SCSI terminator, a defective controller (either one), or a controller that the user has
taken offline manually.
RAID Manager 6 Recovery Guru provides step by step instructions to assist you in
troubleshooting a data path failure or an offline controller. If you have already
replaced the controller board and the replacement controller still cannot be brought
online, this is a good indication that the data path failure is with another defective
component.
The following two commands are very helpful in troubleshooting data path failures:
rdacutil -U and rdacutil -u.
The rdacutil -U command unfails the alternate controller without sending I/O
through the data path to check the controller. The controller goes through SOD
self-diagnostics, which is not an extensive diagnostic test. If the controller passes
SOD, it should come online.
The rdacutil -u command unfails the alternate controller then attempts to
communicate with the alternate controller through the I/O path. This is how the
RAID Manager 6 GUI unfails a controller.
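A sketch of the sequence (the module argument is a placeholder; use the RAID
module or device name reported by lad/RM6):

# First try the cheap unfail, which runs only SOD diagnostics:
rdacutil -U <raid_module>
# If that succeeds, unfail through the I/O path, as the GUI does:
rdacutil -u <raid_module>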
Sometimes, by issuing these two commands, you can determine whether the failure
is internal or external to the controller board. If the controller is not at fault, you
will need spare parts readily available to perform further troubleshooting and to
isolate the defective component by substituting components one at a time. A
defective HBA is one possible cause. Verify that the firmware used in the HBA (for
example, firmware 1.28 or higher for a UDWIS/SBus HBA) is current. A defective
host SCSI cable or terminator is another possible cause of data path failure.
Sometimes you may see “reducing sync transfer rate” or “disabled wide SCSI
mode” error messages; the most common cause of these error messages is bent pins
on a SCSI cable connector. Also be aware that a SCSI terminator can be defective
even if the green LED on the SCSI terminator is on.
Note – Do not perform a boot -r while a controller is held in reset, because it will rearrange the device paths and may cause the controller not to appear in RAID Manager 6.
5.1.12
Controller Board LEDs
The controller module’s LEDs indicate the status of both the controller module and
its individual components. The green LEDs indicate normal operating status; amber
LEDs indicate a hardware fault. It is important that you check all the LEDs on the
front and back of the controller module when you turn on power. Besides fault
isolation, the LEDs can be used to determine if there is any I/O activity between the
host and the controller modules.
Here are a few things to consider when checking the LEDs for status:
■ If a Fast Write Cache operation or other I/O activity is in progress to the controller module (or attached drive units), you may see several green LEDs blinking, including the Fast Write Cache LED (on the front panel), controller FRU status LEDs, or applicable drive activity LEDs.
■ The green heartbeat LED on the controller FRUs blinks continuously.
■ See Section 4.1.2 “Checking the Controller LEDs” in the Sun StorEdge A3500/A3500FC Controller Module Guide for the LED pattern information.
■ An active controller will not have the same status LEDs lit as a passive controller.
■ If you just turned on power, the controller module’s green and amber LEDs will turn on and off intermittently. Wait until the controller module finishes powering up before you begin checking for faults.
5.1.13
Ethernet Port
The Ethernet port located on the back of the controller module is not supported.
5.2
FRU Replacement
This section contains the following topics:
■ Section 5.2.1, “HBA” on page 5-10
■ Section 5.2.2, “Interconnect Cables” on page 5-11
■ Section 5.2.3, “Power Cords” on page 5-11
■ Section 5.2.4, “Power Sequencer” on page 5-11
■ Section 5.2.5, “Hub” on page 5-12
■ Section 5.2.6, “Controller Card Guidelines” on page 5-12
■ Section 5.2.7, “Amount of Cache” on page 5-13
■ Section 5.2.8, “Battery Unit” on page 5-13
■ Section 5.2.9, “Cooling” on page 5-14
■ Section 5.2.10, “Disk Drives” on page 5-14
■ Section 5.2.11, “Disk Tray” on page 5-14
■ Section 5.2.12, “Midplanes” on page 5-15
■ Section 5.2.13, “Reset Configuration and sysWipe” on page 5-16
5.2.1
HBA
If the host server does not support hot swapping of the I/O boards, you will need to
shut down the host to replace an HBA. You should read the manual that comes with
the HBA and become familiar with the HBA installation procedures.
Note – There are several sections in the Sun Enterprise Cluster System Hardware
Service Manual that provide detailed procedures on how to disconnect a host and to
remove and replace an HBA or a RAID controller.
5.2.2
Interconnect Cables
■ Host SCSI cables
Stop all I/O activities to the corresponding data path before replacing a host SCSI cable.
■ SCSI cables
See Section 6.1 in the Sun StorEdge A3500/A3500FC Controller Module Guide for further details.
■ Fiber-optic cables
See Section 6.2 in the Sun StorEdge A3500/A3500FC Controller Module Guide for further details.
■ Controller module terminators
Stop all I/O activities to the corresponding controller module before replacing a controller module terminator.
■ Drive side SCSI cables
Stop all I/O activities to the controller module and power down the controller module before replacing the drive side SCSI cables. If the failure caused a disk tray to lose communication with the controllers, you will need to power cycle the controller module to re-establish communication with the disk tray.
5.2.3
Power Cords
■ Power cords from the power sequencer to the device trays
See Section 7.2 in the Sun StorEdge A3500/A3500FC Controller Module Guide for further details. You will need to power down the entire rack to perform this operation, remove the bottom disk tray to gain access to the power sequencer outlets, and remove the rack side panel to perform this service.
■ Power cords from the power sequencer to the AC source
See Section 5.3 in the Sun StorEdge Expansion Cabinet Installation and Service Manual for further details.
5.2.4
Power Sequencer
See Section 5.4 in the Sun StorEdge Expansion Cabinet Installation and Service Manual for further details. You will need to power off the entire rack to replace a power sequencer. Be sure each AC cord is plugged back into its original outlet position.
5.2.5
Hub
You need to stop all I/O activities to the hub before replacing it. Refer to the FC-100
Hub Installation and Service Manual for further details.
5.2.6
Controller Card Guidelines
■ With RAID Manager 6.22.1 or patches 109232 and 109233, there are new NVSRAMs. With a Sun StorEdge A1000, download the NVSRAM after the controller card is replaced. See FIN I0709.

Note – Because controller FRUs come with an NVSRAM, remember to always hot-swap A3x00/A3500FC controller FRUs to preserve the NVSRAM that exists on the failed controller. See FIN I0709.
■ To replace a failed controller board, see Section 6.3.1 in the Sun StorEdge A3500/A3500FC Controller Module Guide and refer to the Sun StorEdge A3x00 Controller FRU Replacement Guide. Also refer to FIN I0553-1.
■ Refer to Section 2.6.1, “Cluster Information” on page 2-10 for guidelines when replacing a controller within a cluster environment.
Note – SCSI controller FRUs are factory loaded with a universal firmware that can be upgraded or downgraded. Make sure to follow the procedures documented in the Sun StorEdge A3x00 Controller FRU Replacement Guide to download firmware to the newly replaced controller to match the supported RAID Manager version on the host. Firmware downgrades on controllers that are not universal FRUs can only be done through the serial port.
Caution – Do not use the RAID Manager GUI to downgrade the firmware. Use the command line to downgrade the firmware.
Caution – The possibility of controller “deadlock” exists with certain A3x00 and
RAID Manager 6.1.x configurations. Refer to FIN I0643-01 prior to performing any
firmware upgrade.
For a universal FRU firmware downgrade, you need to load appware first, then bootware. For a firmware upgrade, you need to load bootware first, then appware.
There are four different controller FRUs:
■ SCSI Controller Canister w/Memory (D1000), part number 540-3083
■ SCSI Controller Canister w/Memory (RSM), part number 540-3600
■ Fiber Controller Canister w/Memory (D1000), part number 540-4026
■ Fiber Controller Canister w/Memory (RSM), part number 540-4027
When returning a controller canister for repair, ensure that the memory SIMMs are returned with the controller canister. If a SCSI controller canister being returned has 128 MB of cache memory, order two memory FRUs, part number F370-2439, in addition to the replacement controller canister FRU.
Note – You might sometimes come across a controller with part number 375-0089-01. This controller has the TX CDC chip. It was shipped in the SCSI version of the A3500 with D1000 disk trays from July 1999 through October 1999. This controller works only with the SCSI versions of the A3500 (with D1000 disk trays) that have RAID Manager 6.1.1 Update 2 or higher. The replacement FRU for this controller is part number 540-3083. The TX controller should not be returned to the FRU supply inventory.
5.2.7
Amount of Cache
■ The SCSI controller FRU is configured at the factory (default) with 64 MB of cache memory per board.
■ The SCSI controller FRU cache memory can be upgraded in the field to 128 MB.
■ The FC controller FRU is configured at the factory (default) with 128 MB of cache memory per board.
■ Write cache in the A3x00 is normally mirrored between the two controllers, thus reducing the effective cache size available to a controller by half.
■ The amount of cache memory in the two controllers has to match before the Write Cache Mirroring parameters described in Chapter 7 of the Sun StorEdge RAID Manager 6.22 User’s Guide can be changed.
5.2.8
Battery Unit
See Section 7.4 “Replacing the Battery Unit” in the Sun StorEdge A3500/A3500FC
Controller Module Guide for detailed instructions on replacing the battery unit.
■ Make sure you reset the battery age on both controllers.
■ Check the battery date code label to make sure the new battery FRU has not exceeded its shelf life (12 months from date of manufacture).
■ The battery has a service life of two years. After two years, it needs to be replaced. A fresh battery guarantees that data saved in the controller’s cache memory is kept alive for up to the design specification of 72 hours.
■ See the sections “To Replace Old Batteries” and “To Replace New Batteries” in the Sun StorEdge RAID Manager 6.22.1 Release Notes for more information on replacing old and new batteries.
5.2.9
Cooling
See Section 7.1 “Recovering From a Power Supply Thermal Shutdown” in the Sun StorEdge A3500/A3500FC Controller Module Guide for further steps to take in the event of a cooling-related problem.
5.2.10
Disk Drives
Do not install disk drives from other arrays (for example, A1000, A3000, etc.) into an A3x00/A3500FC. This can cause DacStore corruption in the array and will require another download of the NVSRAM file to repair. For further information regarding NVSRAM, refer to the nvsram_versions file in the following internal directory:
/net/artemas.ebay/global/archive/StorEdge_Products/sonoma/nvsram/nvsram_versions
When you replace a disk drive, if it is not already in the failed state, make sure you use RAID Manager 6 to fail it before removing the disk drive. Then replace the disk drive and use the RAID Manager 6 GUI to bring the new disk drive into the configuration.
Swapping a drive without first using RAID Manager 6 to fail it, or removing and replacing a drive while the controller is powered off, can result in phantom disk drives or phantom LUNs. Removing a phantom drive requires serial port access. This applies to the Global Hot Spare (GHS) drive as well.
5.2.11
Disk Tray
Refer to Chapter 3 in the Sun StorEdge A1000/D1000 Installation, Operations and
Service Manual for detailed instructions on removing and replacing components.
Power to the disk tray needs to be turned off while replacing the following disk tray
components:
■ The D1000 interface board (see FIN I0670-1 for the proper procedure)
■ In the RSM tray: the SEN card and the WD2S card. Install a jumper at location ID3 to change the SCSI address range from 0-7 to 8-15.
■ The entire disk tray
RAID Manager 6 Recovery Guru reports a drive side channel failure when the failure affects the entire disk tray. After the hardware component has been replaced, run a health check to verify the status of the drive channel, drive tray, and each disk drive. You might need to power cycle the controller module (not the entire rack), even though the RAID Manager 6 Recovery Guru instructions indicate that you don’t have to power cycle the controller module if the controller firmware level is 2.5.2 or higher. You need to power cycle the controller module if any of the following conditions occur:
■ The LUN reconstruction does not start.

Note – LUN reconstructions occur serially, one at a time per controller.

■ Health check continues to report a drive side channel failure.
■ Any of the drives in the disk tray do not come out of the failed or unresponsive state to reconstruct.
Power cycling the controller module enables the controller to re-scan the disk
trays. If you continue to encounter error messages while attempting to bring a drive
back online (reconstruct), physically remove the drive, wait 30 seconds, and then
reinstall the drive. Then use RAID Manager 6 Recovery Guru to reconstruct the
drive.
Note – Remember to replace any dummy drives that were removed during disk
tray service. The dummy drives are important to maintain proper air flow and
cooling in the disk tray.
5.2.12
Midplanes
See Section 6.3.3 in the Sun StorEdge A3500/A3500FC Controller Module Guide to
remove and replace the controller card cage with the midplane.
5.2.13
Reset Configuration and sysWipe
Reset Configuration or sysWipe deletes all LUNs and brings the RAID system to a default state: active controller A, passive controller B, and one default 10-MB LUN 0.
Reset Configuration is a RAID Manager 6 procedure and sysWipe is a serial port
command. sysWipe wipes clean all prior DacStore data. You need to issue a
sysReboot after a sysWipe command is executed.
Caution – Only trained and experienced Sun personnel should access the serial
port. You should have a copy of the latest Debug Guide. There are certain commands
that can destroy customer data or configuration information. No warning messages
appear if a potentially damaging command has been executed.
Note – When you issue a sysWipe command you may see a message indicating
that sysWipe is being done in a background process. Wait for a follow-on message
indicating that sysWipe is completed before issuing a sysReboot command.
sysWipe should be run from each controller.
There are times when a known good controller A is held in reset by controller B, so you will not be able to access the shell to issue a sysWipe command for controller A. In this case, perform the following steps:
1. Physically remove controller A and controller B.
2. Insert controller A, but leave controller B out temporarily.
3. Issue a sysWipe command followed by a sysReboot command for controller A.
4. Physically remove controller A and insert controller B.
5. Issue a sysWipe command followed by a sysReboot command for controller B.
6. Insert controller A back into the system; the system should now be in a default state.
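A condensed serial-port transcript for one controller might look like the following sketch (the progress messages are paraphrased here; wait for the actual completion message before rebooting):

-> sysWipe
   (message indicating that sysWipe is being done in a background process)
   (message indicating that sysWipe is completed)
-> sysReboot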
5.3
Software and Firmware Guidelines
This section contains the following topics:
■ Section 5.3.1, “Firmware, Software, and Patch Information” on page 5-17
■ Section 5.3.2, “RAID Manager 6 Upgrade” on page 5-18
■ Section 5.3.3, “Firmware Upgrade” on page 5-18
Follow these general guidelines to simplify software and firmware upgrades.
■ Do not run an A3x00 with mixed firmware levels. FRU replacements for controller boards come at firmware 02.05.06.32. This level can be upgraded (or downgraded) to an appropriate level after installation. Always check the firmware level (a quick check from the host is sketched after this list).
■ When upgrading from RAID Manager 6.0 to RAID Manager 6.1.1 (or later), upgrading the firmware requires an intermediate step: the firmware must first be upgraded to 02.04.04, and from there it may be upgraded to 02.05.02. This procedure is clearly stated in the release notes.
■ When adding newer A3x00 hardware to older A3x00 hardware, all hardware can be run with the latest software. It is best, however, to migrate older firmware to match the newer firmware on the system.
■ When running RSM2000/A3000 units on the same system as A1000 units, firmware in the RSM2000/A3000 units must be at 02.05.02.04 or newer to maintain compatibility. A1000 units work with RAID Manager 6.1.1 and above.
■ The current Sun StorEdge RAID Manager 6.22.1 Release Notes are available on the following web site:
http://infoserver.central
■ Refer to the Sun StorEdge A1000 and A3x00 Installation Supplement on the following web site:
http://docs.sun.com:80/ab2/coll.266.1/ISUPPA1000A3X00/@Ab2PageView/85?
This document contains a workaround for a disk drive firmware download bug in the A3000 only. The workaround is a script that stops SEN card polling in the A3000 controller, downloads the new firmware, and then restarts SEN card polling.
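The sketch below is one quick way to check a controller's firmware level from the host. It assumes raidutil's -i (inquiry) option, and the device name is illustrative; substitute a name reported by lad:

raidutil -c c1t5d0 -i    # report firmware, bootware, and NVSRAM levels for the module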
5.3.1
Firmware, Software, and Patch Information
For detailed information regarding firmware, software, and patches, refer to Early Notifier 20029 located on the following web site:
http://sunsolve.Ebay.Sun.COM/cgi/retrieve.pl?doc=enotify%2F20029&zone_32=20029
Also refer to the PatchPro web site:
http://patchpro.ebay/servlet/com.sun.patchpro.servlet.PatchProServlet
5.3.2
RAID Manager 6 Upgrade
You should use the Sun StorEdge RAID Manager 6.22 and 6.22.1 Upgrade Guide (part
number 806-7792) to perform any upgrade to RAID Manager 6.22 or 6.22.1.
Refer to the Sun StorEdge RAID Manager 6.22 Installation and Support Guide for Solaris
for further details.
RAID Manager 6.1.1 does not support Solaris 8. Solaris 8 support requires RAID
Manager 6.22 and patch no. 108553. For Solaris 9, only RM 6.22.1 is supported.
RAID Manager 6.0 and RAID Manager 6.1 have a 2-MB DacStore. RAID Manager 6.1.1 and higher have a 40-MB DacStore. Refer to FIN I0557.
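Before an upgrade, it can help to confirm what is currently installed. A minimal sketch, using the RAID Manager package names listed in Appendix A and the Solaris 8 patch number cited above:

pkginfo -l SUNWosar SUNWosau SUNWosafw   # installed RAID Manager packages and versions
showrev -p | grep 108553                 # check whether the Solaris 8 support patch is present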
5.3.3
Firmware Upgrade
Caution – The possibility of controller “deadlock” exists with certain A3x00 and
RAID Manager 6.1.x configurations. Refer to FIN I0643-01 prior to performing any
firmware upgrade.
The firmware upgrade steps are as follows (also refer to Appendix A in the Sun
StorEdge RAID Manager 6.22 Installation and Support Guide for Solaris).
RM 6.0 (2.4.1d) to RM 6.1 (2.4.4.1) to Universal (2.5.6.32) to RM 6.22 (3.1.2.35)
In each of these steps, upgrade bootware first, then appware.
Chapter 7 “Maintenance and Tuning” in the Sun StorEdge RAID Manager 6.22 User’s
Guide provides specific information for downloading firmware in either Online or
Offline mode.
Note – The only valid time to downgrade firmware is from a universal FRU with
firmware level 2.5.6.32. Any other situation requires that it be done through the
serial port. Refer to FIN I0553-1 for further details.
Caution – Only trained and experienced Sun personnel should access the serial
port. You should have a copy of the latest Debug Guide. There are certain commands
that can destroy customer data or configuration information. No warning messages
appear if a potentially damaging command has been executed.
Note – If you are running Solaris 7 dated 11/99 and plan to upgrade the firmware
on an A3x00 controller, you need to ensure that patch no. 106541-10 (KJP 10) for
Solaris has been installed. Refer to Sun Early Notifier EN20029 and bug 4334814
for further details.
CHAPTER 6
Troubleshooting Common Problems
This chapter discusses some common problems encountered in the field and
provides additional information and tips for troubleshooting these problems.
This chapter contains the following sections:
■ Section 6.1, “Controller Held in Reset, Causes, and How to Recover” on page 6-2
■ Section 6.2, “LUNs Not Seen” on page 6-6
■ Section 6.3, “Rebuilding a Missing LUN Without Reinitialization” on page 6-7
■ Section 6.4, “Dynamic Reconfiguration” on page 6-11
■ Section 6.5, “Controller Failover and LUN Balancing Takes Too Long” on page 6-12
■ Section 6.6, “GUI Hang” on page 6-13
■ Section 6.7, “Drive Spin Up Failure, Drive Related Problems” on page 6-13
■ Section 6.8, “Phantom Controllers Under RAID Manager 6.22” on page 6-14
■ Section 6.9, “Boot Delay (Why Booting Takes So Long)” on page 6-15
■ Section 6.10, “Data Corruption and Known Problems” on page 6-16
■ Section 6.11, “Disconcerting Error Messages” on page 6-17
■ Section 6.12, “Troubleshooting Controller Failures” on page 6-17
6.1
Controller Held in Reset, Causes, and How to Recover
This section contains the following topics:
■ Section 6.1.1, “Reason Controllers Should be Failed” on page 6-2
■ Section 6.1.2, “Failing a Controller in Dual/Active Mode” on page 6-3
■ Section 6.1.3, “Replacing a Failed Controller” on page 6-4
The A3x00/A3500FC controllers do not detect controller failure and fail themselves. The host system (via the A3x00/A3500FC drivers) or the user must make the decision to fail a controller. Failing controllers is only possible in a system with redundant controllers.
The redundant array controller architecture was developed on the premise that the host system is best able to determine when a subsystem component has failed. A controller is failed if it is held in a hardware reset state. A user should fail a controller if there is cause for concern with regard to the controller’s hardware. If the controller is failed (that is, held in a hardware reset state), it will not be able to access any data on the disk drives.
6.1.1
Reason Controllers Should be Failed
■ Unresponsive controller
An array controller may become unresponsive as a result of a controller host chip failure, a loss of power to one of the controllers, or a controller hardware failure. The controller should always be reset and given adequate time to cycle through its reset logic before any further action is taken.
An unresponsive controller’s typical symptoms include selection time-outs and/or continuous command time-outs. The host should first attempt to revive the controller from a possible hung state via a bus reset. If this fails, the host should continue to access the configured LUN via the alternate controller, and fail this controller.
■ Obtrusive controller
An obtrusive array controller is one which interferes with the normal operation of its alternate. This may be the result of a failing data path component on one of the array controllers, an array controller drive side SCSI bus failure, or a failing disk drive that has not yet been marked failed.
The symptoms of an obtrusive array controller may include the successful completion of some commands, particularly non-data-access commands. Many data access commands may fail on one or both controllers in the subsystem. Other symptoms include frequent command time-outs amidst many successful command operations.
■ Failed Inter-controller Communication (ICON) Path
Redundant controllers rely on the ICON channel, which may be a dedicated Application-Specific Integrated Circuit (ASIC). The failed inter-controller communication path condition occurs when the communication path between the two array controllers has failed due to a communication line failure on one of the controllers. The controllers have the capability to function in active/passive mode without the use of the ICON channel.
Diagnostics are run on the ICON channel at power up; therefore, ICON channel failures may be detected at Start Of Day (SOD). If you are going through the serial port and such a failure was detected, the controller’s Diagnostic Manager will give the user the option to abort the power-up sequence and replace the controller.
The failure of the ICON channel may cause some of the following situations to arise:
■ Switching from any mode of operation to dual/active mode is not allowed.
■ Logical unit ownership transfers are not allowed.
■ Changes in drive status, for example from Optimal to Failed.
■ Changes in LUN status, for example from Degraded to Optimal.
■ The addition of a logical unit to one controller will not be seen by the other controller until the next reset or power up causes both controllers to read the array configuration stored on the disk drives.
■ The deletion of a logical unit owned by one controller will not be seen by the alternate controller until the next reset or power up.
6.1.2
Failing a Controller in Dual/Active Mode
If the host determines that a controller operating in dual/active mode has failed, it may hold this controller in reset. To prevent loss of access to the LUNs owned by this controller, the host must switch the mode of operation to active/failed (passive) mode.
The host should issue a mode select for the redundant controller page to the non-failed controller. This requires setting the alternate RDAC mode parameter to failed alternate controller (0x0C).
Upon receiving the mode select, the controller will:
■ Attempt to quiesce itself and its pair. New commands to either of the controllers will terminate with a check condition indicating that quiescence is in progress.
■ Write the new controller information to DacStore.
■ Hold the alternate controller in reset.
■ Reset the drive buses.
■ Reconfigure to become the active controller in active/passive mode.
■ Return status to the host for the mode select command.
The alternate controller is held in a hardware reset state, and is inaccessible to and
from the host.
6.1.3
Replacing a Failed Controller
Note – A controller that “owns” logical units should not be hot swapped. You
should either fail the controller prior to removal (preferable), or switch the controller
to active/passive mode.
Refer to FIN I0709 and Section 5.2.6, “Controller Card Guidelines” on page 5-12 for
more information on NVSRAM in the controllers.
A controller held in a hardware reset state (for example, failed) will have all of its LEDs on. A passive controller in active/passive mode flashes a pattern of 0xEE or 0x6E/0xEE (the module profile will indicate if it is in passive mode).
When a failed controller is replaced, the new controller will not automatically be made operational. It will remain in the hardware reset state (failed) until the good
alternate controller is directed to release it from this state.
A failed controller is unfailed when it is released from the hardware reset state. This
is done to allow the host to perform diagnostics on a previously failed controller, or
to release a newly replaced controller from reset.
If you have a controller failure and you are using the Recovery Guru, it is very
important to follow every single step as displayed through the popup windows
during the recovery process. Failure to do so can cause further problems with your
RAID module. Refer to Section 2.6.1, “Cluster Information” on page 2-10 for
guidelines when replacing a controller within a cluster environment.
After you have brought your controller back online, do a module profile and make sure that the previously failed controller is in active mode and not in passive mode. If it is in passive mode, go through the maintenance application and bring that controller back to active mode. Once you have done that, you might have to do some LUN rebalancing between the active controllers. Also check to make sure that the firmware level matches what is on the active controller; you might have to do a firmware upgrade on the replaced controller.
If you are going to use rdacutil -u/-U to unfail the controller and you use the device argument form, you must use the device name of the controller that is in active mode, not the failed controller's device name. The -U option will not do any checks and will just try to bring the controller back online through brute force.
It has been reported from the field that during a boot -r, one of the controllers is sometimes thrown into a reset (failed) mode. Because of this, if you are adding a new A3x00 or moving arrays around, the /dev/dsk and /dev/rdsk entries will not get built for the controller held in reset mode.
The simplest way to resolve this is to do the following:
■ Do a module profile on that A3x00 and get the cAtBdCsD device name for the good controller.
■ Execute rdacutil -u cAtBdCsD.
■ Once it has completed, do another module profile and make sure that the controller that was held in reset is active. If it is in passive mode, go through the maintenance application and make it active.
■ Execute boot -r; the paths for the controller that was held in reset mode will now be built.
■ Once the system is back up, execute lad and healthck -a.
Note – At this point, you might want to do some LUN balancing now that both
controllers are in active mode.
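A condensed version of this sequence, as a sketch (the device name is illustrative; use the active controller's name from the module profile; reboot -- -r from Solaris is equivalent to boot -r from the OpenBoot prompt):

rdacutil -u c2t4d0s0    # unfail, addressing the controller that is in active mode
reboot -- -r            # reconfiguration reboot rebuilds the /dev/dsk and /dev/rdsk paths
lad                     # verify both controllers and their LUNs are visible
healthck -a             # run a final health check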
6.1.4
Additional ASC/ASCQ Codes
ASC/ASCQ 0x3f/02 is not documented in either the LSI Software Interface
Specifications on the engineering web page or in the online file
/etc/raid/raidcode.txt. These codes are reported by the backend disk drives in the
array. These ASC/ASCQ codes can appear on the controller console, as viewed
through the serial ports. The message might look like:
interrupt: NOTE: interrupt: 00000000 29 00 00 00 0000interrupt:
interrupt: WARN: interrupt: Sense data from device 00100001:
SKEY/ASC/ASCQ 06/3f/02 interrupt
The additional ASC/ASCQ codes are the following:
■ 3f/00 – target operating conditions have changed
■ 3f/01 – microcode has been changed
■ 3f/02 – changed operating definition
■ 3f/03 – inquiry data has changed
6.2
LUNs Not Seen
There are many possible causes, but the usual scenario is after reconfiguration of the
system:
■ After an upgrade of RAID Manager 6; see bug 4118532.
■ After an upgrade of Solaris: usually sd.conf is lost, causing LUNs above 8 to no longer be seen. This is described in the Sun StorEdge RAID Manager 6.22 Release Notes.
■ With RAID Manager 6.22, upgrading to Solaris 8 requires patch no. 108553. If you were running with patch no. 108334, it should be removed first.
■ Removing or adding HBAs can cause some controllers not to be seen, as described in bug 4295322.
■ The creation of a LUN on the A3500FC in a multi-host environment requires a few extra steps to ensure that the device is properly built on both hosts. When a LUN is created with RAID Manager 6, the devices for RAID Manager and Solaris are properly configured. However, RAID Manager 6 does not configure the devices on the other host; you must configure the devices on the other host manually. See bug 4336091 for further details.
■ Problems with adding greater-than-8-LUN support can cause some or all of the LUNs not to be seen. FIN I0589-1 describes the proper way to increase the number of LUNs, especially using the glm HBA (PCI SCSI). Note that the add16lun.sh script delivered on the RAID Manager 6.1.1u1 CD-ROM was wrong and should not be used. See FIN I0589-1.
■ Having VERITAS Volume Manager 2.x installed and DMP enabled can cause loss of access to LUNs when a failover occurs. This configuration is not supported because it puts data at risk.
■ A related issue is loss of communication with the rdac module. rdac cuts off communication when it cannot access a LUN even though there is a good path in /dev/osa for it. This can happen if LUN creation is terminated prematurely, either by operator intervention (kill -9 or Control-C), by a system panic, or by a power failure. Running rdac_disks or performing a reconfiguration reboot usually corrects this problem.
■ Adding 17 LUNs to a module connected by the FC-PCI HBA, which only supports 16 LUNs, will cause this loss of communication too; see bug 4304898.
■ Adding drives from another A3x00 while the system is powered down can cause loss of LUN configuration, as described in bug 4133673.
6.3
Rebuilding a Missing LUN Without Reinitialization
This section covers the following topics:
■ Section 6.3.1, “Setting the VKI_EDIT_OPTIONS” on page 6-7
■ Section 6.3.2, “Resetting the VKI_EDIT_OPTIONS” on page 6-9
■ Section 6.3.3, “Deleting a LUN With the RAID Manager GUI” on page 6-9
■ Section 6.3.4, “Recreating a LUN With the RAID Manager GUI” on page 6-9
■ Section 6.3.5, “Disabling the Debug Options” on page 6-10
Rebuilding a missing LUN without reinitializing it can be dangerous, because data might be lost permanently. Use the following procedure only as a last resort, after all attempts at recovery have failed.
The process of recreating a missing LUN or deleting a LUN if needed requires you to
use the controller serial port and the RAID Manager GUI. You might need to recreate
a LUN when it is missing after a reboot of a controller.
Before you begin to recreate a LUN, make sure of the following:
■ A copy of the module profile from before the disaster is available before attempting to recreate a LUN.
■ Use the procedures for recreating a LUN only if you know that the data is still intact.
Note – If you are recreating LUN 0, LUN 0 is larger than 10 MB (for example, a 36-GB RAID 5 LUN), and the module profile shows that it is only 10 MB, stop. Do not proceed with this procedure; you will not be able to recover LUN 0.
6.3.1
Setting the VKI_EDIT_OPTIONS
If the system has dual controllers, set the VKI_EDIT_OPTIONS on both controllers
as follows.
1. At the RAID controller shell prompt, enter:
-> VKI_EDIT_OPTIONS
2. To enter insert mode, type:
i
Press Return or Enter.
3. Type:
writeZerosFlag=1
Press Return or Enter twice.
4. To enable debug options, type:
+
Press Return or Enter.
5. To quit, type:
q
Press Return or Enter.
6. To commit changes, type:
y
Press Return or Enter.
7. From the shell prompt type:
-> writeZerosFlag=1
8. From the shell prompt type:
-> writeZerosFlag
If the flag was set properly, the output should indicate value = 1, and you can proceed to Section 6.3.3, “Deleting a LUN With the RAID Manager GUI” on page 6-9 or Section 6.3.4, “Recreating a LUN With the RAID Manager GUI” on page 6-9. However, if the output says anything like “new value added to table,” something was done incorrectly within the VKI_EDIT_OPTIONS. Do not proceed. Re-enter the VKI_EDIT_OPTIONS and remove the statement previously entered, on both controllers.
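For reference, a successful check at the shell prompt looks roughly like the following. This transcript is illustrative only; the address printed will differ on your controller:

-> writeZerosFlag
writeZerosFlag = 0x2ce5a8: value = 1 = 0x1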
6.3.2
Resetting the VKI_EDIT_OPTIONS
1. To clear the settings, type:
c
2. To confirm, type:
y
3. Return to Section 6.3.1, “Setting the VKI_EDIT_OPTIONS” on page 6-7.
6.3.3
Deleting a LUN With the RAID Manager GUI
Delete a LUN only if you determine it is necessary for your configuration.
1. Select the RAID module containing the LUN you want to delete.
2. Select the drive group containing the LUN you want to delete in the module
information area.
3. Highlight the LUN to delete and press the Delete key.
Respond to all prompts to delete the LUN.
6.3.4
Recreating a LUN With the RAID Manager GUI
Use the module profile information gathered prior to the loss of a LUN to recreate
the exact LUN parameters. Pay particular attention to drive order, segment size and
caching parameters.
1. Launch the RAID Manager GUI.
2. From the Configuration screen, select a module from RAID Module.
When the LUN was deleted, all the drives assigned to that LUN should have been moved to the Unassigned drive area under module information.
3. Highlight the Unassigned drive icon.
Right-click the Unassigned icon and select Create LUN.
4. Select the appropriate input for the RAID Level, Number of Drives, and Number of LUNs options.
Create an exact replica from the module profile. As you select drives under Number of Drives, select the drives in the exact order used before the loss of the LUN.
5. Click Options and select the segment size and the caching parameters.
When you are finished, click OK. The Create LUN screen appears.
6. Click Create.
Creating the LUN takes a few minutes. The Configuration screen indicates that the RAID Manager is formatting the LUN and then indicates that the LUN is optimal.
7. After the LUN becomes optimal, exit RAID Manager 6 and shut down the host.
8. Power cycle the controllers.
9. Bring up the host.
The LUN should be in its original state.
10. Restart RAID Manager and verify the LUN is created.
6.3.5
Disabling the Debug Options
After you create the LUN, perform these steps from the serial port using the
VKI_EDIT_OPTIONS on both controllers to disable the Debug Options.
1. At the RAID controller shell prompt, enter:
-> VKI_EDIT_OPTIONS
2. To clear the settings, type:
c
Press Return or Enter.
3. To confirm, type:
y
4. To disable the options, type:
-
5. To confirm, type:
y
6. To quit, type:
q
7. To confirm, type:
y
8. When you are back at the prompt enter:
-> writeZerosFlag=0 sysReboot
or
-> sysReboot
6.4
Dynamic Reconfiguration
This section contains the following topics:
■ Section 6.4.1, “Prominent Bugs” on page 6-12
■ Section 6.4.2, “Further Information” on page 6-12
6.4.1
Prominent Bugs
■ Bug 4356814 - Dynamic reconfiguration fails with A3500FC, Leadville drivers, and QLogic 2202 on an E10000. The resolution of this bug demonstrates that dynamic reconfiguration works on an E10000 over a PCI bus using the QLogic 2202.
■ Bug 4330698 - Unable to detach (dynamic reconfiguration) a system board with an A3x00/A3500FC connected. This indicates a recent problem with dynamic reconfiguration under Solaris 2.6, probably related to patch levels, since dynamic reconfiguration worked on Solaris 2.6 when RAID Manager 6.22 was released in November 1999. If the dynamic reconfiguration operation includes moving non-pageable memory (the kernel cage), additional steps must be taken.
6.4.2
Further Information
Refer to FIN I0536-2 for further information on dynamic reconfiguration.
Also refer to the following web sites for guides on dynamic reconfiguration:
http://sunsolve5.sun.com/sunsolve/Enterprise-dr
http://esp.west/home/projects/ssp3_5/pubs/ngdrpubs.html
For an A3000 disk array, see the "Special Handling of Sun StorEdge A3000" section
under Chapter 2 "DR Configuration Issues" in the Sun Enterprise 10000 Dynamic
Reconfiguration User Guide, 806-2249-10 for additional information.
6.5
Controller Failover and LUN Balancing Takes Too Long
■ Controller failover can appear to take a long time, as described in FIN I0634-1. The latest isp driver is needed: 105600-15 (for Solaris 2.6 only). FIN I0551 describes editing sd.conf so that adding the controller back in doesn’t take so long, and to cut down the amount of time it takes to reboot (reducing drvconfig time).
■ There is another issue with Fibre Channel cable failures.
Bug 4338906 - Rdac takes a long time to disable a controller with a fiber problem. This bug describes a problem found in cluster testing.
Bug 4344061 - RAID Manager 6 application hangs for a powered-off RAID module; a reset loop corrects it. This bug describes recovery applications hanging when power is re-applied.
Both of these bugs might be due to a known problem with Vixel 1000 hubs, Sun’s only hub product as of October 2000. The hub doesn’t propagate the link failure back to the host, so if the path is lost on the array side of the hub, no notification is sent to the host. Resetting the loop via software will rectify the problem.
6.6
GUI Hang
Sometimes certain RAID Manager 6 applications such as Recovery or Maintenance will stop responding, either showing an hourglass or appearing to be dead. Arraymon polls the devices every few seconds, as well as when each application is started, and an unresponsive device can cause this. Killing the application by PID and restarting RAID Manager 6 is a reasonable workaround. For A3500FC-connected arrays, using luxadm or sansurfer to reset the FC loop is also effective.
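As a sketch of the loop-reset workaround with luxadm (the device path is illustrative; point it at a path on the affected loop):

luxadm -e forcelip /dev/rdsk/c2t4d0s2    # force a loop initialization (LIP) on the FC loop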
Also be aware that certain processes such as LUN reconstruction can take as long as
five minutes to complete.
6.7
Drive Spin Up Failure, Drive Related Problems
Sometimes drives in an A3x00 disk array fail without any apparent reason when the host is rebooted. Refer to bug 4253002.
This bug is fixed in RAID Manager 6.1.1 Update 2 with patch no. 106513-5, and in RAID Manager 6.22x.
There is another problem with drives failing to spin up due to the Calico chip in some Seagate drives. See patch no. 106817-3.
6.8
Phantom Controllers Under RAID Manager 6.22
There have been issues regarding the installation and configuration of Solaris with RAID Manager 6.22 and VERITAS Volume Manager. These issues involve instances of "phantom controllers" or device nodes, which can cause problems for your installation. To avoid these issues, perform the installation of your system in the following order:
1. Solaris:
a. Install Solaris.
b. Install required patches.
2. RAID Manager 6.22:
a. Run pkgadd to install the RAID Manager 6.22 packages.
b. Edit the rmparams file and change the line
"System_MaxLunsPerController=8" to the number of LUNs needed.
c. Run /etc/raid/bin/genscsiconf.
d. Edit the rdac_address file to configure how you want your LUNs distributed over the controllers, as well as to define which paths the system is allowed to “see”. Refer to the rdac_address man page for further details.
e. Run init 6.
3. Configure LUNs:
a. Run /etc/raid/bin/rm6.
b. Upgrade firmware (if needed but perform offline).
c. Set controller active/active (if needed).
d. Create LUNs.
4. VERITAS Volume Manager (VxVM):
a. Run pkgadd to install the VxVM 3.0.2 packages.
b. Run vxinstall to create a rootdg.
c. Run /etc/raid/bin/rdac_disk.
d. Run init 6.
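A condensed transcript of steps 2 and 3 above might look like the following sketch. The package names are the RAID Manager package names listed in Appendix A; the installation source and LUN count are examples:

pkgadd -d /cdrom/cdrom0 SUNWosar SUNWosau SUNWosafw   # install the RAID Manager 6.22 packages
vi /etc/raid/rmparams            # set System_MaxLunsPerController to the number of LUNs needed
/etc/raid/bin/genscsiconf
vi /etc/raid/rdac_address        # distribute LUNs over the controllers
init 6                           # reboot
/etc/raid/bin/rm6                # then upgrade firmware, set active/active, and create LUNs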
NOTES:
■ You can run the add16lun or add32lun script that comes with RAID Manager 6.22. It will do all the steps needed for 16- or 32-LUN support (rdriver.conf gets modified).
■ Another new command, rdac_disks, cleans up the device tree so there is no confusion between the VERITAS Volume Manager device tree and the /dev/osa device tree. If this step is omitted, you will likely find phantom controllers, and lad and format will use different paths, etc.
■ If you are using any version prior to VxVM 3.0.2, you must disable DMP as described in the Sun StorEdge RAID Manager 6.22 Release Notes or FIN I0511-2.
■ It is very common in the field for people to run the following commands: drvconfig, devlinks, disks, or add_drv. This will result in multiple RDAC links to a single device, and the names will not agree with the RAID Manager path names. This situation can be corrected by executing /etc/raid/bin/rdac_disks. You can also look at the rdac_disks man page for more information.
6.9
Boot Delay (Why Booting Takes So Long)
Several things can be done to reduce delays in booting, especially reconfiguration reboots and the drvconfig calls that are done by rdac_disks(1M) or add_disk(1M). drvconfig(1M) is also called when controllers are brought back online.
If you are using A3500FC disk arrays only, edit the rmparams file and remove the sd: entry from the variable Rdac_NativeScsiDrivers:sd:ssd:.
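In other words, assuming the FC-only case just described, the line in /etc/raid/rmparams changes as follows (shown before and after; only the sd entry is removed):

Rdac_NativeScsiDrivers:sd:ssd:     (before)
Rdac_NativeScsiDrivers:ssd:        (after)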
Also, clean up the sd.conf file so the SCSI device discovery process doesn’t have to
explicitly access each potential device. This is explained in FIN I0551.
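For example, a trimmed sd.conf might list only the targets and LUNs that are actually configured, rather than every possible combination. The entries below are illustrative only; follow FIN I0551 for the supported procedure:

# /kernel/drv/sd.conf -- probe only the devices that actually exist
name="sd" class="scsi" target=4 lun=0;
name="sd" class="scsi" target=4 lun=1;
name="sd" class="scsi" target=5 lun=0;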
Sometimes the number of device instances is so large that the drivers spend time looking for non-existent devices. One way to clean up these instances is to use disks -C or, under Solaris 8, devfsadm -C. This should not be done if the host is connected to controllers that are failed over, temporarily removed, or if the LUNs are not properly balanced, because the extra device links will be removed.
Another approach that has been taken for a couple of escalations is to remove the
line rdnexus scsi from /etc/driver_classes and replace class=scsi by
parent=rdnexus in rdriver.conf as described in RFE no. 4374861. However,
this approach has not been thoroughly tested.
Under Solaris 9, a delay of five to ten minutes has been reported. See bug 4630273.
6.10
Data Corruption and Known Problems
■ Fujitsu 4/9-GB disk drive firmware 2848 has a bug and should be replaced using patch no. 108873.
■ Turning off power to a disk tray when using a RAID Manager version prior to 6.22. Although this is not supported, it can be done accidentally. See bug 4307641. RAID Manager 6.22 with firmware 3.1.x has a fix for this problem.
■ RAID Manager 6.0 with firmware 2.4.x doesn’t properly handle internal memory failures. It has been EOL’ed; upgrade to at least RAID Manager 6.1.1 Update 2 immediately.
■ Removing a controller after a power failure in which cached data blocks are held on the controller.
■ A single point of failure is created if cache mirroring is disabled while write cache is enabled. Don’t use this combination. If the controller with a cached write block fails, the data is lost, as described in “Cache Mirroring” on page 3-6.
■ If an entire A3x00 disk array is powered down and VERITAS File System (VxFS) doesn’t automatically disable the file systems on the array, then those file systems need to be disabled manually or data corruption will occur. See bug 4326273.
■ There are also problems in the isp driver that can cause data corruption, so isp patch levels should be maintained; see bug 4113677.
■ There is an E10000 software problem with caching; upgrade Solaris 2.6 to patch no. KU-20.
■ There are potential data corruption issues in RAID Manager 6.0, 6.1, and 6.1.1 firmware when failover occurs. Make sure that at least patch no. 106513-4 is installed. See bug 4293936.

Note – Patch no. 106513-4 is not compatible with RAID Manager 6.0/6.1. Upgrade to at least RAID Manager 6.1.1 Update 2.

■ An error message with ASC/Q 0c/00 may appear to indicate data loss, but doesn’t in reality. See bug 4124793.
6.11
Disconcerting Error Messages
During ufsdump operations, the following errno 5 message may appear:
Apr 10 22:29:40 abc unix: WARNING: The Array driver is returning an
Errored I/O, with errno 5, on Module 1, Lun 1, sector 43261180
This message can be ignored if the error occurs only while ufsdump is running. Otherwise the error needs to be evaluated further. See bug 4234852 and bug 4289725 and the related escalations. This message is not encountered when using RAID Manager 6.22x. The root cause is ufsdump reading past the end of the partition, or reading 192 bytes, which is not a proper "block read".
Make sure that the file system being dumped is not mounted, and make sure it is clean by running fsck.
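A minimal sketch of those checks (the mount point and device names are illustrative):

umount /export/data             # ensure the file system is not mounted
fsck -m /dev/rdsk/c1t5d0s6      # sanity-check whether the file system is clean
fsck /dev/rdsk/c1t5d0s6         # run a full check if it is not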
The error message ASC/Q 0C/00 sense key 4,(6) indicates a write failure when it is actually only a battery warning, which may cause caching or mirroring to be disabled. Sometimes after power up, it takes a while for the battery to recharge. See raidcode.txt for the list of possibilities. Also see bug 4124793.
Another harmless message is:
unix: WARNING: kstat rcnt == 0 when exiting runq, please check
Upgrading to RM 6.22.1 will reduce the occurrence of this message, which can be safely ignored. See bug 4671354 for details.
6.12
Troubleshooting Controller Failures
Refer to the information listed under Technical Information at:
http://webhome.sfbay/A3x00
APPENDIX A
Reference
This appendix contains the following topics:
■ Section A.1, “Scripts and man Pages” on page A-2
■ Section A.2, “Template for Gathering Debug Information for CPRE/PDE” on page A-3
■ Section A.3, “RAID Manager Bootability Support for PCI/SBus Systems” on page A-4
■ Section A.4, “A3500/A3500FC Electrical Specifications” on page A-5
■ Section A.5, “Product Names” on page A-7
A.1
Scripts and man Pages
A number of scripts are available in the Tools directory of the released CD. The
README file in the Tools directory has a description of these scripts.
A sample copy of the README file is available in the following directory:
/net/artemas.ebay/export/releases/sonoma/rm_6.22/rm6_22_FCS \
/Tools/README
The following man pages provide supplementary information for RAID Manager 6.22 array management and administration.
■ arraymon
■ drivutil
■ fwutil
■ genscsiconf
■ healthck
■ hot_add
■ lad
■ logutil
■ nvutil
■ parityck
■ perfutil
■ raidcode
■ raidutil
■ rdac_disks
■ rdacutil
■ rdaemon
■ rm6
■ rmscript
■ storutil
■ symconf
■ rdac.7
A.2
Template for Gathering Debug Information for CPRE/PDE
The following template should be used when submitting information to engineering
regarding problems encountered in the field with the A3x00/A3500FC.
■ What is the current version of Solaris that is running on the host processor? Does the problem also occur on previous versions of Solaris, for example, Solaris 2.5.1, Solaris 7, Solaris 8, etc.?
■ Record the output of “Save Module Profile” from the RAID Manager 6 GUI. This output includes, but is not limited to, the following:
■ Controller bootware and firmware version, RAID Manager 6 version number
■ Product ID
■ Controller configuration - active/active, active/passive
■ LUN configuration - RAID level, number of pieces / piece assignment, size of LUN, caching parameters, controller assignment
■ A copy of /var/adm/messages and the approximate time when the problem was observed
■ A copy of /etc/raid/rmparams
■ System configuration needed to replicate the problem. A copy of the explorer data or the following:
■ A copy of the output from prtdiag
■ A copy of the output from prtconf -vp
■ A copy of /etc/system
■ A copy of /kernel/drv/sd.conf
■ A copy of /kernel/drv/rdriver.conf
■ A copy of /kernel/drv/rdnexus.conf
■ A copy of the output from showrev -p | sort +1 -2
■ A copy of the output from pkginfo -l SUNWosafw SUNWosar SUNWosau
■ Exact detailed steps and commands necessary to prepare the system for problem replication
■ Exact detailed steps needed to replicate the problem
■ Description of the problem observed and the approximate time necessary to replicate the problem
■ Name, phone number, and email address of a contact person who can answer LSI Logic, Inc. questions about the problem setup
■ The state of the components in the A3x00/A3500FC (for example, are there any failed controllers or drives, have any cables been disconnected, etc.)
■ A copy of the output from RAID Manager 6 health check
■ A copy of rmlog.log after it has been run through logutil
Note – The engineer that is working on low level A3x00/A3500FC firmware may
not be very familiar with low level system administration commands, details of the
configuration, or how the system operates. The engineer will require detailed
information to determine what the problem is. In general, if the problem can be
narrowed down to an easily reproduced scenario or one that doesn’t take a long
time to replicate, the engineer will have a better chance of duplicating the problem
in the lab.
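The files and outputs requested in the template above can be collected with a short script. The sketch below is a hypothetical helper, not a supported tool: every command and path in it comes from the template, but the output directory and file names are illustrative.

#!/bin/sh
# Gather A3x00/A3500FC debug data for CPRE/PDE into one directory.
DIR=/var/tmp/a3x00_debug.$$
mkdir -p $DIR
cp /var/adm/messages /etc/raid/rmparams /etc/system $DIR
cp /kernel/drv/sd.conf /kernel/drv/rdriver.conf /kernel/drv/rdnexus.conf $DIR
prtdiag                 > $DIR/prtdiag.out 2>&1
prtconf -vp             > $DIR/prtconf.out
showrev -p | sort +1 -2 > $DIR/patches.out
pkginfo -l SUNWosafw SUNWosar SUNWosau > $DIR/pkginfo.out
healthck -a             > $DIR/healthck.out
echo "Debug data collected in $DIR"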
A.3
RAID Manager Bootability Support for PCI/SBus Systems
The following tables provide the results of bootability testing of RAID Manager on
different versions of Solaris over PCI and SBus interfaces.
Note – The A3500FC on PCI and the A3500FC on SBus are not supported.
TABLE A-1  A1000 Bootability on PCI-Based Hosts

RAID Manager Version   Solaris 2.6   Solaris 2.7   Solaris 2.8 (02/2000)   Solaris 2.8 (07/2001)   Solaris 2.9
6.1.1                  Pass          Pass          Not Supported           Not Supported           Not Supported
6.22x                  Fail          Fail          Fail                    Fail                    Not Supported

TABLE A-2  A1000 Bootability on SBus-Based Hosts

RAID Manager Version   Solaris 2.6   Solaris 2.7   Solaris 2.8 (02/2000)   Solaris 2.8 (07/2001)   Solaris 2.9
6.1.1                  Pass          Pass          Not Supported           Not Supported           Not Supported
6.22x                  Pass          Pass          Pass                    Fail                    Not Supported
TABLE A-3  A3x00 Bootability on PCI-Based Hosts

RAID Manager Version   Solaris 2.6   Solaris 2.7   Solaris 2.8 (02/2000)   Solaris 2.8 (07/2001)   Solaris 2.9
6.1.1                  Pass          Pass          Not Supported           Not Supported           Not Supported
6.22x                  Pass          Pass          Fail                    Fail                    Not Supported
TABLE A-4  A3x00 Bootability on SBus-Based Hosts

RAID Manager Version   Solaris 2.6   Solaris 2.7   Solaris 2.8 (02/2000)   Solaris 2.8 (07/2001)   Solaris 2.9
6.1.1                  Pass          Pass          Not Supported           Not Supported           Not Supported
6.22x                  Pass          Pass          Pass                    Fail                    Not Supported
Note – A3500FC bootability is not supported.
A.4
A3500/A3500FC Electrical Specifications
Refer to Appendix A.1 “Initial Cold Start Surge Current Specifications” in the Sun StorEdge A3500/A3500FC Hardware Configuration Guide. Also refer to Appendix B.2 “Electrical Specifications” in the Sun StorEdge A3500/A3500FC Controller Module Guide.
The following table provides power consumption information for a given array
system configuration (minimum and maximum). The difference in power
consumption at 30˚C and 40˚C is due to the cooling fans spinning at a higher speed
at 40˚C.
TABLE A-5  Power Consumption Specifications

Configuration                                                        Power Consumption at 30˚C   Power Consumption at 40˚C
                                                                     (BTU/Watts)                 (BTU/Watts)
A3500 Lite using 9-GB or 18-GB disk drives (minimum configuration)   1602/469                    1803/528
A3500 Lite using 9-GB or 18-GB disk drives (maximum configuration)   2847/834                    3048/893
1x5 using 9-GB disk drives (minimum configuration)                   2131/624                    2634/772
1x5 using 9-GB disk drives (maximum configuration)                   6021/1764                   6524/1912
1x5 using 18-GB or 36-GB (1”) disk drives (minimum configuration)    2206/646                    2709/794
1x5 using 18-GB or 36-GB (1”) disk drives (maximum configuration)    6472/1896                   6975/2044
1x5 using 36-GB (1.6”) disk drives (minimum configuration)           2476/725                    2979/873
1x5 using 36-GB (1.6”) disk drives (maximum configuration)           5845/1712                   6348/1860
2x7 using 9-GB disk drives (minimum configuration)                   3889/1139                   4593/1345
2x7 using 9-GB disk drives (maximum configuration)                   8868/2598                   9572/2804
2x7 using 18-GB or 36-GB (1”) disk drives (minimum configuration)    4039/1183                   4744/1390
2x7 using 18-GB or 36-GB (1”) disk drives (maximum configuration)    9500/2783                   10204/2990
2x7 using 36-GB (1.6”) disk drives (minimum configuration)           4579/1341                   5283/1548
2x7 using 36-GB (1.6”) disk drives (maximum configuration)           8621/2526                   9324/2732
3x15 using 9-GB disk drives (minimum configuration)                  6393/1873                   7902/2315
3x15 using 9-GB disk drives (maximum configuration)                  18063/5293                  19573/5735
3x15 using 18-GB or 36-GB (1”) disk drives (minimum configuration)   6618/1939                   8126/2381
3x15 using 18-GB or 36-GB (1”) disk drives (maximum configuration)   19417/5689                  20925/6131
3x15 using 36-GB (1.6”) disk drives (minimum configuration)          7427/2176                   8935/2618
3x15 using 36-GB (1.6”) disk drives (maximum configuration)          17532/5137                  19041/5579
A.5
Product Names
The following table is a matrix that lists the rack product, controller, and software name for a disk array product before and after certain dates.

TABLE A-6  Product Name Matrix

                                           Rack Product Name   Controller Name Tag
Sun StorEdge A1000 Array                   SE A1000            SE A1000
RSM Disk Tray (before 04/98)               RSM 2000            RSM 2000
RSM Disk Tray (after 04/98)                SE A3000            SE A3000
D1000 Disk Tray (before 11/99)             SE A3500            SE A3000
D1000 Disk Tray (after 11/99)              SE A3500            SE A3500
Fibre Channel (Dilbert), released 11/99    SE A3500FC          SE A3500FC
Fibre Channel (Tabasco), released 11/99    SE A3500FC          SE A3500FC
Abbreviations:
■ SE A1000 - Sun StorEdge A1000 array
■ RSM 2000 - Sun RSM 2000 array
■ SE A3000 - Sun StorEdge A3000 array
■ SE A3500 - Sun StorEdge A3500 array
■ SE A3500FC - Sun StorEdge A3500FC array
■ SE A3500FCd - Sun StorEdge A3500FC with D1000 disk trays
■ SE A3500FCr - Sun StorEdge A3500FC with RSM disk trays
Definitions:
■ Rack Product Name—The marketing name for the rack. This name appears on the brochure or data sheet for the product.
■ Sun StorEdge Controller Name Tag—The name tag is located on the face plate of the controller.
TABLE A-7  Product Names

Product                                      NVSRAM Product ID Strings
Sun StorEdge A1000 array                     StorEDGE A1000
Sun StorEdge RSM 2000                        RSM Array 2000
Sun StorEdge A3000 array                     StorEDGE A3000
Sun StorEdge A3500 array                     StorEDGE A3000
Sun StorEdge A3500FC with RSM trays          StorEdgeA3500FCr
Sun StorEdge A3500FC with Dilbert trays      StorEdgeA3500FCd

■ Product ID (NVSRAM)—The product ID is set in NVSRAM. It is visible via the format command on LUN labels. It is also visible via the RAID Manager 6 GUI: select the Module Profile, select Controller Information, and look in the Product ID field.
Index
NUMERICS
98/01 ASC/ASCQ error code, 2–5
A
A3x00/A3500FC Commandments, 1–2
accessing the serial port, 1–5
ACES web site, 1–4
add_disk command, 6–15
adding
arrays, 2–6
arrays to a host with existing arrays, 2–6
arrays under VERITAS, 4–13
disk drives, 2–7
disk drives to existing arrays, 2–7
disk trays, 2–7
disk trays to existing arrays, 2–7
arrays, adding or moving, 2–6
ASC/ASCQ A0/00 error code, 2–2
ASC/Q 0C/00 sense key 4,(6) error code, 6–17
available documentation, 1–3
available tools and information, 1–3
B
backplane assembly, 5–7
battery support information label, 2–2
battery unit
checking, 2–2
replacement, 5–13
boot delay, 6–15
bootability support matrix, A–4
box sharing setup, 2–13
Break key, 1–6
bug filing hints, 1–6
C
cables
fiber-optic, 2–3
power, 2–2
SCSI, 2–3
cache
amount of, 5–13
configuration, 5–13
mirroring, 3–6
checking
battery unit, 2–2
controller module LEDs, 5–7
cluster
configurations, 2–10
information, 2–10
command
add_disk, 6–15
dip, 1–6
format, 4–4
fsck, 6–17
hot_add, 3–6
lad, 4–4
parityck, 3–11
rdac_disks, 6–15
rdacutil -U, 5–8
rdacutil -u, 5–8
storutil, 2–6
sysReboot, 2–9, 5–16
sysWipe, 2–9, 5–16
tip, 1–6
common problems, 6–1
configuration
cache, 5–13
hardware, 2–1
RAID Manager, 3–1
RAID module, 3–3
reset, 5–16
software, 3–3, 4–1
configurations
cluster, 2–10
multi-initiator, 2–10
SCSI, 2–10
connecting power cables for new installation, 2–2
controller
board LEDs, 5–9
card replacement, 5–12
failover taking too long, 6–12
held in reset, 6–2
settings, 3–9
switch settings, 2–4
controller and disk tray switch settings, 2–4
converting 1x5 to 2x7 or 3x15, 2–8
cooling related problems, 5–14
CPRE Group Europe web site, 1–4
creation of a LUN, 3–5
crossing SCSI cables, 2–4
D
dacstor size (upgrades), 3–8
data corruption
disconcerting error messages, 6–17
known problems, 6–16
Debug Guide, 1–5
debug information template, A–3
device
quorum, 4–14
tree rearranged, 4–9
devices
ghost, 4–5
dip command, 1–6
disconcerting error messages, 6–17
disk drive
adding disk drives to existing arrays, 2–7
dummy drive, 5–15
functionality, 5–3
moving disk drives to existing arrays, 2–7
related problems, 6–13
replacement, 5–14
spin up failure, 6–13
support matrix, 2–13
disk tray
common point of failure, 5–4
functionality, 5–4
replacement, 5–14
switch settings, 2–4
DMP
enabling and disabling, 4–12
documentation
obtaining, 1–3
web site, 1–3
downloading
documentation, 1–3
RAID Manager, 1–4
driver
Solaris kernel, 4–2
dynamic multipathing (DMP), 4–12
dynamic reconfiguration
further information, 6–12
prominent bugs, 6–12
related problems and workaround, 4–10
E
electrical specifications, A–5
Enterprise Services Storage ACES web site, 1–4
error code
98/01 ASC/ASCQ, 2–5
ASC/ASCQ A0/00, 2–2
ASC/Q 0C/00 sense key 4,(6), 6–17
error messages, 6–17
Escalation web site, 1–4
ethernet port, 5–10
extended LUN support, 3–5
F
failed controller replacement, 6–4
failing a controller in dual-active mode, 6–3
fan failure message, 5–2
FCOs, 1–6
fiber-optic cables, 2–3
FINs, 1–6
firmware
guidelines, 5–16
information, 5–17
upgrade steps, 5–18
format command, 4–4
FRU replacement, 5–10
fsck command, 6–17
G
GBIC support, 2–4
generating debug information, 4–3
ghost
devices, 4–5
LUNs, 4–5
GUI hang, 6–13
guidelines for replacing the controller card, 5–12
H
HA, 4–13
HA configuration using VERITAS software, 4–13
hardware installation and configuration, 2–1
HBA replacement, 5–10
HBA support matrix, 2–13
high availability, 4–13
hints for filing a bug, 1–6
hot_add command, 3–6
how to recover from a controller held in reset, 6–2
hub replacement, 5–12
I
independent controller setup, 2–13
information
cluster, 2–10
multi-initiator, 2–11
SCSI daisy chaining, 2–11
installation
hardware, 2–1
new, 2–2
RAID Manager, 3–1
software, 3–2, 4–1
internal directory, 1–4
L
labeling
volume, 4–5
lad command, 4–4
latest version of RAID Manager, 1–4
Linux, 1–6
local/remote switch, 2–3
long wave GBIC support, 2–4
loop ID, 2–4
LSI Logic web site, 1–4
LUN
balancing taking too long, 6–12
creation, 3–5
creation process time, 3–8
deletion and modification, 3–9
general information, 3–5
numbers, 3–6
LUNs
ghost, 4–5
maximum LUN support, 3–5
not seen, 6–6
M
maintenance information, 5–1
man pages, A–2
maximum LUN support, 3–5
maximum server configurations, 2–12
Microsoft Windows 98/2000, 1–6
midplane replacement, 5–15
mirroring, cache, 3–6
moving
arrays to a host with existing arrays, 2–6
arrays under VERITAS, 4–13
disk drives, 2–7
disk drives to existing arrays, 2–7
disk trays, 2–7
disk trays to existing arrays, 2–7
multi-initiator information, 2–11
N
Network Storage web site, 1–4
new
hardware installation, 2–2
software installation, 4–2
NVSRAM settings, 3–9
O
obtaining
Debug Guide, 1–5
RAID Manager, 1–4
serial cable, 1–5
obtrusive controller, 6–2
onboard SOC+, 2–12
OneStop Sun Storage Products web site, 1–4
P
parity check settings, 3–10
parityck command, 3–11
patch information, 5–17
PFA, 3–3
phantom
controllers under RAID Manager 6.22, 6–14
power
cables, 2–2
sequencer, 2–3
sequencer configuration for 3x15, 2–3
sequencer configuration for new installation, 2–3
sequencer replacement, 5–11
power consumption specifications, A–5
predictive failure analysis, 3–3
product name matrix, A–7
Q
QAST web site, 1–4
quorum device, 4–14
R
RAID
module configuration, 3–3
phantom controllers, 6–14
reconstruction rate, 3–7
use of RAID levels, 3–6
RAID Manager
6.0/6.1 not supported, 1–5
bootability support, A–4
commands, A–2
installation and configuration, 3–1
issues when upgrading from RAID Manager 6 to 6.11 or 6.22, 4–2
upgrade, 5–18
upgrading to 6.22, 1–5, 3–2
white paper, 1–6
rdac_disks command, 6–15
rdacutil -U command, 5–8
rdacutil -u command, 5–8
reason controllers should be failed, 6–2
reconstruction rate of RAIDs, 3–7
recovering from a power supply thermal shutdown, 5–14
reference, A–1
replacing
battery unit, 5–13
controller card, 5–12
disk drives, 5–14
disk tray, 5–14
failed controller, 6–4
FRUs, 5–10
HBA, 5–10
hub, 5–12
interconnect cables, 5–11
midplanes, 5–15
power cords, 5–11
power sequencer, 5–11
reset configuration, 5–16
rmlog.log fan failure message, 5–2
S
scripts, A–2
scripts and man pages, A–2
SCSI
A3x00 SCSI Lite, 2–9
cables, 2–3
common point of failure with cables, 5–6
configurations, 2–10
crossing SCSI cables, 2–4
daisy chaining information, 2–11
disabled wide SCSI mode, 5–9
ID, 2–4
ID jumper settings, 5–7
reducing sync transfer rate, 5–9
SCSI bus length calculation, 2–4
SCSI bus maximum bus length, 2–3
SCSI ID conflict, 2–5
SCSI to FC-AL upgrade, 2–14
termination power jumpers, 5–7
sd_max_throttle setting, 4–3
SDS, 4–13
second port on the SOC+ card, 2–13
sequencer
power, 2–3
serial cable, 1–5
serial port access, 1–5
service information, 5–1
setting up 2x7 and 3x15, 2–8
SNMP, 4–11
trap data, 4–11
SOC+, 2–12
software
configuration, 3–3, 4–1
guidelines, 5–16
information, 5–17
installation, 3–2, 4–1
new installation, 4–2
Solaris kernel driver, 4–2
Solaris x86, 1–6
Solstice DiskSuite (SDS), 4–13
Sonoma Engineering web site, 1–4
specifications
electrical, A–5
power consumption, A–5
Storage ACES web site, 1–4
storutil command, 2–6
Sun
Cluster information, 2–10
Download Center, 1–4
Software Shop, 1–4
StorEdge A3500/A3500FC electrical
specifications, A–5
StorEdge A3x00/A3500FC Lite, 2–9
StorEdge D1000 tray common point of
failure, 5–5
StorEdge RSM tray common point of failure, 5–5
supported configurations, 2–11
switch local/remote, 2–3
sysReboot command, 2–9, 5–16
sysWipe command, 2–9, 5–16
T
template for filing a bug, 1–6
template for gathering debug information, A–3
thermal shutdown, recovering, 5–14
tip command, 1–6
tools and information, 1–3
troubleshooting overview, 1–1
tunable parameters and settings, 3–3
U
unresponsive controller, 6–2
upgrading
firmware, 5–18
RAID Manager 6 to 6.11 or 6.22, 4–2
SCSI to FC-AL, 2–14
to RAID Manager 6.22, 1–5, 3–2
use of RAID levels, 3–6
V
verifying
amount of cache, 5–13
backplane assembly, 5–7
controller boards, 5–8
D1000 FRUs, 5–7
D1000 tray functionality, 5–5
FRU functionality, 5–2
functionality of the disk drives, 5–3
functionality of the disk tray, 5–4
functionality of the power sequencer, 5–5
HBA, 5–8
paths to the A3x00/A3500FC, 5–8
RSM tray functionality, 5–5
VERITAS
adding or moving arrays, 4–13
enabling and disabling Volume Manager
DMP, 4–12
Volume Manager, 4–12
volume
labeling, 4–5
W
web sites, 1–4
dynamic reconfiguration information, 6–12
edist home page, 1–3
Enterprise Services FIN & FCO Program, 1–6
Enterprise Services Storage ACES, 1–4
Escalation Web Interface, 1–4
firmware information, 5–17
LSI Logic, 1–4
Network Storage, 1–4
OneStop Sun Storage Products, 1–4
patch information, 5–17
PatchPro, 5–17
QAST Group, 1–4
RAID Manager 6.22 documentation, 5–17
RAID Manager 6.22 upgrades, 1–5, 3–2
RAID Manager software documentation, 4–11
software information, 5–17
Sonoma Engineering, 1–4
Sun Cluster 2.1 documentation, 4–14
Sun Cluster 2.2 documentation, 4–14
Sun Cluster 2.2 field Q&A, 2–11
Sun Cluster 3.0 documentation, 2–10
Sun Cluster documentation, 3–4
Sun Cluster download information, 3–4
Sun Cluster engineering technical docs & download information, 2–10
Sun Cluster home page, 2–10
Sun Cluster support matrix, 2–11
Sun Download Center, 1–4
Sun Field Engineer’s Handbook, 5–7
Sun Shopware Shop, 1–4
Sun StorEdge A1000 A3x00 installation supplement, 5–17
SunSolve Early Notifier home page, 5–17, 6–12
Sun StorEdge A3500FC Lite solution, 2–9
why booting takes so long, 6–15
World Wide Name (WWN), 2–5