Download ZNYX bh5700 User`s guide

Transcript
HP bh5700 ATCA 14-Slot Blade Server
Ethernet Switch Blade
First Edition
Manufacturing Part Number: AD171-9603A
June 2006
Ethernet Switch Blade User's Guide
release 3.2.2j
page ii
Legal Notices
The information in this document is subject to change without notice.
Hewlett-Packard makes no warranty of any kind with regard to this manual, including, but
not limited to, the implied warranties of merchantability and fitness for a particular
purpose. Hewlett- Packard shall not be held liable for errors contained herein or direct,
indirect, special, incidental or consequential damages in connection with the furnishing,
performance, or use of this material.
Restricted Rights Legend. Use, duplication or disclosure by the U.S. Government is subject
to restrictions as set forth in subparagraph (c) (1) (ii) of the Rights in Technical Data and
Computer Software clause at DFARS 252.227-7013 for DOD agencies, and subparagraphs
(c) (1) and (c) (2) of the Commercial Computer Software Restricted Rights clause at FAR
52.227-19 for other agencies.
Information in this document is provided in connection with Intel® products. No
license, express or implied, by estoppel or otherwise, to any intellectual property rights
is granted by this document. Except as provided in Intel’s Terms and Conditions of
Sale for such products, Intel assumes no liability whatsoever, and Intel disclaims any
express or implied warranty, relating to sale and/or use of Intel products including
liability or warranties relating to fitness for a particular purpose, merchantability, or
infringement of any patent, copyright or other intellectual property right. Intel
products are not intended for use in medical, life saving, or life sustaining applications.
Intel may make changes to specifications and product descriptions at any time, without
notice. HEWLETT-PACKARD COMPANY 3000 Hanover Street Palo Alto, California 94304
U.S.A.
Copyright Notice. Copyright ©2003 Hewlett-Packard Development Company, L.P.
Reproduction, adaptation, or translation of this document without prior written
permission is prohibited, except as allowed under the copyright laws.
Additional Copyright Notices. AdvancedTCA® is a registered trademark of the PCI
Industrial Computer Manufacturers Group. Linux® is a registered trademark of Linus
Torvalds. ZNYX Networks, RAIN, RAINlink, OpenArchitect®, CarrierClass and HotSwap
are trademarks or registered trademarks of ZNYX Networks in the United States and/or
other countries. All other marks, trademarks or service marks are the property of their
respective owners.
Ethernet Switch Blade User's Guide
release 3.2.2j
page iii
About the Ethernet Switch Blade Manual
This manual includes everything you need to begin using the HP Ethernet Switch Blade with
OpenArchitect software, Release 3.2.2j.
Ethernet Switch Blade User's Guide
release 3.2.2j
page iv
Table of Contents
Chapter 1 Overview of the Ethernet Switch Blade ...........................................................17
High Performance Embedded Switching...................................................................... 17
Advanced TCA® Compliant.........................................................................................17
OpenArchitect Switch Management............................................................................. 18
Extensible Customization of Routing Policies..............................................................18
Powerful CarrierClass Features.....................................................................................18
Ethernet Port Layout..................................................................................................... 18
Ethernet Switch Blade Port Configuration..................................................................19
Base switch Quick Reference.............................................................................. 19
Fabric Switch Quick Reference........................................................................... 19
OpenArchitect Switch Environment............................................................................. 20
OpenArchitect Software Structure................................................................................ 20
Chapter 2 Port Cabling and LED Indicators...................................................................... 23
Connecting the Cables...................................................................................................23
Console Port Cabling................................................................................................23
Connecting to the Console Port................................................................................23
Out of Band Ports (OOB Ports).................................................................................... 24
LED Reference......................................................................................................... 24
Chapter 3 High Availability Networking...........................................................................27
Surviving Partner.......................................................................................................... 27
VRRP........................................................................................................................28
zlmd.......................................................................................................................... 28
Switch Replacement and Reconfiguration............................................................... 29
zspconfig...................................................................................................................29
Example HA Switch Configuration..........................................................................30
Modifying zsp.conf on the Base switch...............................................................31
Modifying zsp_vlan.conf on the Fabric Switch...................................................35
Configuring Surviving Partner......................................................................................42
Central Authority.......................................................................................................... 43
Chapter 4 Fabric Switch Configuration ............................................................................ 46
Two switches, two consoles..........................................................................................46
Connecting to the Fabric Switch Console.....................................................................46
OpenArchitect Configuration Procedure.......................................................................46
Changing the Shell Prompt........................................................................................... 47
Default Configuration Scripts...................................................................................47
Example Configuration Scripts................................................................................ 47
Overview of OpenArchitect VLAN Interfaces.........................................................48
Tagging and Untagging VLANs..........................................................................48
Switch Port Interfaces..........................................................................................49
Layer 2 Switch Configuration.......................................................................................49
Using the S50layer2 Script.................................................................................. 50
Ethernet Switch Blade User's Guide
release 3.2.2j
page v
Rapid Spanning Tree................................................................................................ 50
To Enable Rapid Spanning Tree:.........................................................................51
Port Path Cost...................................................................................................... 51
Layer 3 Switch Configuration............................................................................. 52
Using the S50layer3 Script.................................................................................. 52
Layer 3 Routing Protocols with GateD ........................................................................54
Using the S55gatedRip1 Script................................................................................ 54
To Modify the GateD Scripts: ................................................................................. 56
Class of Service (COS) ................................................................................................ 57
Egress Queues.......................................................................................................... 57
Ingress Classification................................................................................................57
Marking and Re-marking......................................................................................... 58
Scheduling................................................................................................................ 58
ztmd Explained.................................................................................................... 58
zfilterd Explained..................................................................................................... 58
Running zfilterd........................................................................................................58
Restrictions on Implementation................................................................................59
Conflict Resolution..............................................................................................59
iptables and filtering............................................................................................ 60
Introduction..........................................................................................................60
Packet Walk......................................................................................................... 61
Filter Rules Specifications...................................................................................62
Specifying Source and Destination IP Addresses.................................................... 62
Specifying Protocol............................................................................................. 62
Specifying an ICMP Message Type.................................................................... 62
Specifying TCP or UDP ports............................................................................. 63
Specifying TCP flags...........................................................................................63
Specifying an Interface........................................................................................ 63
Filter Rule Targets............................................................................................... 63
Supported Targets................................................................................................63
Classical Targets..................................................................................................63
ZNYX Targets..................................................................................................... 63
ZACTION Examples........................................................................................... 64
Extensions to the default matches........................................................................64
tc and zqosd ............................................................................................................. 65
FIFO Queues (pfifo and bfifo disciplines)...........................................................65
PRIO and WRR queues....................................................................................... 67
The U32 Filter.......................................................................................................... 69
Combining Queuing Disciplines.............................................................................. 69
Handle Semantics................................................................................................ 70
COPS: Common Open Policy Service..........................................................................70
Protocol Architecture................................................................................................71
OpenArchitect PEP...................................................................................................71
Using pepd................................................................................................................72
Ethernet Switch Blade User's Guide
release 3.2.2j
page vi
Chapter 5 Fabric Switch Administration........................................................................... 73
Setting the Root Password............................................................................................ 73
Adding Additional Users...............................................................................................73
Setting up a Default Route............................................................................................ 74
Name Service Resolution..............................................................................................74
DHCP Client Configuration..........................................................................................74
DHCP Server Configuration......................................................................................... 74
Network Time Protocol (NTP) Client Configuration................................................... 75
Network File System (NFS) Client Configuration........................................................75
NFS Server Configuration.............................................................................................76
Connecting to the Switch Using FTP............................................................................77
ftpd Server Configuration............................................................................................. 77
Connecting to the Switch Using TFTP......................................................................... 77
TFTPD Server Configuration........................................................................................77
SNMP Agent................................................................................................................. 78
Supported MIBS.......................................................................................................78
Supported Traps........................................................................................................79
SNMP and OpenArchitect Interface Definitions......................................................80
ifStackTable Entries.............................................................................................81
SNMP Configuration................................................................................................81
SNMP Applications..................................................................................................82
Port Mirroring............................................................................................................... 82
Link and LED Control.................................................................................................. 83
Link Event Monitoring..................................................................................................83
Chapter 6 Fabric Switch Maintenance...............................................................................84
Overview of the OpenArchitect switch boot process....................................................84
Saving Changes.............................................................................................................86
Modifying Files and Updating the Switch.................................................................... 86
Recovering from a System Failure................................................................................86
System Boots with a Console Cable............................................................................. 86
Booting with the –i option.............................................................................................87
System Hangs During Boot...........................................................................................88
Booting the Duplicate Flash Image...............................................................................88
Upgrading the OpenArchitect Image............................................................................ 88
Upgrading or Adding Files............................................................................................89
Excluding Saving Files to Flash............................................................................... 89
Upgrading the Switch Driver........................................................................................ 89
Using apt-get................................................................................................................. 90
Chapter 7 Base Switch Configuration................................................................................91
Two switches, two consoles..........................................................................................91
Connecting to the Base Switch Console....................................................................... 91
OpenArchitect Configuration Procedure..................................................................91
Changing the Shell Prompt.......................................................................................92
Default Configuration Scripts..............................................................................92
Ethernet Switch Blade User's Guide
release 3.2.2j
page vii
Example Configuration Scripts............................................................................92
Overview of OpenArchitect VLAN Interfaces....................................................93
Tagging and Untagging VLANs..........................................................................94
Switch Port Interfaces..........................................................................................94
Layer 2 Switch Configuration.................................................................................. 94
Using the S50layer2 Script.................................................................................. 96
Rapid Spanning Tree................................................................................................ 96
To Enable Rapid Spanning Tree:.........................................................................96
Port Path Cost...................................................................................................... 97
Layer 3 Switch Configuration.................................................................................. 97
Using the S50layer3 Script.................................................................................. 98
Layer 3 Switch Using Multiple VLANs............................................................100
Using the S50multivlan Script...........................................................................100
To Modify the Layer 3 Multivlan Script ......................................................... 102
Modify the example script you copied into the /etc/rcZ.d directory. Adjust and
assign the number of IP addresses as applicable. In the example below, the IP
address is changed for the interface in the ifconfig command line of the script.
........................................................................................................................... 102
Layer 3 Routing Protocols with GateD ................................................................. 102
Using the Provided S55gatedRip1 Script.......................................................... 102
To Modify the GateD Scripts: .......................................................................... 104
Class of Service (COS) ..........................................................................................105
Egress Queues....................................................................................................105
Ingress Classification.........................................................................................105
Marking and Re-marking...................................................................................106
Scheduling......................................................................................................... 106
zcos......................................................................................................................... 106
zfilterd.....................................................................................................................106
ztmd................................................................................................................... 106
Running zfilterd................................................................................................. 107
Restrictions on Implementation.........................................................................107
Conflict Resolution............................................................................................107
iptables and filtering............................................................................................... 108
Introduction........................................................................................................109
Packet Walk....................................................................................................... 110
Filter Rules Specifications.................................................................................110
Specifying Source and Destination IP Addresses..............................................110
Specifying Protocol........................................................................................... 110
Specifying an ICMP Message Type.................................................................. 110
Specifying TCP or UDP ports........................................................................... 111
Specifying TCP flags.........................................................................................111
Specifying an Interface...................................................................................... 111
Filter Rule Targets............................................................................................. 111
Supported Targets..............................................................................................111
Ethernet Switch Blade User's Guide
release 3.2.2j
page viii
Classical Targets................................................................................................111
ZNYX Targets................................................................................................... 112
ZACTION Examples......................................................................................... 112
Extensions to the default matches......................................................................113
tc: Traffic Control..................................................................................................113
Strict Priority Qdisc................................................................................................113
Weighted Round Robin Qdisc................................................................................114
FIFO Queues (pfifo and bfifo disciplines).........................................................114
Fifo Qdiscs..............................................................................................................115
Using Filters to Direct Packets to a COS Queue....................................................115
Protocol ip.............................................................................................................. 115
Protocol arp............................................................................................................ 116
Protocol all..............................................................................................................116
Matching Specific Ingress Ports.............................................................................116
Advanced Filtering – Policing................................................................................117
Examples............................................................................................................118
Policing Actions..................................................................................................... 118
u32 match selectors used in filters.........................................................................119
zqosd.......................................................................................................................120
PRIO and WRR queues..................................................................................... 121
The U32 Filter........................................................................................................ 123
Combining Queuing Disciplines............................................................................ 124
Handle Semantics.............................................................................................. 124
COPS: Common Open Policy Service................................................................... 124
Protocol Architecture.........................................................................................125
OpenArchitect PEP............................................................................................126
Using pepd......................................................................................................... 126
Chapter 8 Base Switch Administration............................................................................128
Setting the Root Password......................................................................................128
Adding Additional Users........................................................................................128
Setting up a Default Route..................................................................................... 129
Name Service Resolution....................................................................................... 129
DHCP Client Configuration................................................................................... 129
DHCP Server Configuration...................................................................................129
Network Time Protocol (NTP) Client Configuration.............................................130
Network File System (NFS) Client Configuration.................................................130
NFS Server Configuration......................................................................................131
Connecting to the Switch Using FTP..................................................................... 131
ftpd Server Configuration.......................................................................................132
Connecting to the Switch Using TFTP...................................................................132
TFTPD Server Configuration................................................................................. 132
SNMP Agent.......................................................................................................... 132
Supported MIBS................................................................................................ 132
Supported Traps.................................................................................................134
Ethernet Switch Blade User's Guide
release 3.2.2j
page ix
SNMP and OpenArchitect Interface Definitions............................................... 134
ifStackTable Entries...........................................................................................135
SNMP Configuration......................................................................................... 135
SNMP Applications........................................................................................... 136
Port Mirroring.........................................................................................................136
Link and LED Control............................................................................................137
Link Event Monitoring........................................................................................... 137
Chapter 9 Base Switch Maintenance............................................................................... 138
Overview of the OpenArchitect switch boot process............................................. 138
Saving Changes...................................................................................................... 140
Modifying Files and Updating the Switch..............................................................140
Recovering from a System Failure......................................................................... 140
System Boots with a Console Cable.......................................................................140
Booting with the –i option......................................................................................141
System Hangs During Boot.................................................................................... 142
Booting the Duplicate Flash Image........................................................................ 142
Upgrading the OpenArchitect Image......................................................................142
Upgrading or Adding Files.....................................................................................143
Excluding Saving Files to Flash............................................................................. 143
Upgrading the Switch Driver..................................................................................143
Using apt-get.......................................................................................................... 144
Chapter 10 Connecting to the Ethernet Switch Blade..................................................... 145
Base Interface Hub System:........................................................................................ 145
Ethernet Interfaces: ................................................................................................145
Management Interfaces: ........................................................................................ 145
Fabric Interface Hub System: .....................................................................................146
Ethernet Interfaces: ................................................................................................146
Management Interfaces: ........................................................................................ 146
Connecting to the Base Interface................................................................................ 146
Base Interface Serial Port Connection....................................................................146
Base Interface Out-of-Band Ethernet Connection .................................................147
Connecting to the Fabric Interface .............................................................................148
Fabric Interface Serial Port Connection ................................................................ 148
Fabric Interface Out of Band Ethernet Connection ...............................................149
Chapter 11 Diagnosing a Failed Ethernet Switch Blade Activation ..............................150
Accessing the ShMM.................................................................................................. 152
Verifying Communications Between the ShMM and Switch................................ 152
Critical Threshold Error Reported..................................................................... 152
Analyzing Mstate information for the switch............................................................. 153
Checking the ekey Status From the Shelf Manager.................................................... 153
Chapter 12 Troubleshooting a Failed OpenArchitect Load.............................................155
Recovering from a System Failure .............................................................................157
Booting Without the Overlay File...............................................................................158
Ethernet Switch Blade User's Guide
release 3.2.2j
page x
Booting the Duplicate Flash Image ............................................................................159
Chapter 13 Network Configuration Problems ............................................................... 160
Interface Overview......................................................................................................160
Physical Interfaces..................................................................................................160
Default Base Interface Configuration.....................................................................161
24 port, Layer 2 Switching, single VLAN.........................................................161
Default Fabric Interface Configuration.................................................................. 163
Editing the S50layer2 script can change the Ethernet Switch Blade Fabric Interface
default configuration. The S50Layer2 script and included example scripts
(/etc/rcZ.d/examples) can be used as templates to create custom scripts. The default
S50layer2 script configures the switch accordingly:..............................................163
Configuration Troubleshooting...................................................................................165
Determining ekey status for a specific slot................................................................. 165
Querying Base Interface ekey Status......................................................................167
Querying Fabric Interface ekey Status................................................................... 168
Network Connectivity Troubleshooting......................................................................170
No Connection........................................................................................................170
Diminished Network Throughput...........................................................................170
Connecting to Devices with Fixed Port Speeds ......................................................... 170
External Fault LED..................................................................................................... 170
Network Tests............................................................................................................. 171
Ping Test ................................................................................................................171
Traceroute Test.......................................................................................................172
Chapter 14 Isolating Hardware Failures.......................................................................... 173
Hardware Subsystem...................................................................................................176
Testing the FlashROMs...............................................................................................177
Testing the Switch Fabric............................................................................................178
Link Status for a single port................................................................................... 178
Link Status for a range of ports.............................................................................. 178
Testing the onboard RAM...........................................................................................179
Testing the Control Processor..................................................................................... 180
Hardware Fault....................................................................................................... 181
Software Error................................................................................................... 181
Chapter 15 High Availability Troubleshooting............................................................... 183
Spontaneous Failover Activity....................................................................................183
Unexpected Fail-back Activity...............................................................................183
Chapter 16 Switch Firmware Overview.......................................................................... 184
Checking the switch firmware version........................................................................184
3.1 Fabric Interface............................................................................................185
Updating the Switch Firmware................................................................................... 186
BootLoader Firmware Upgrade:.............................................................................186
OpenArchitect Firmware Upgrade:........................................................................ 186
IPMC Firmware Upgrade:......................................................................................187
Ethernet Switch Blade User's Guide
release 3.2.2j
page xi
Chapter 17 Restoring the Factory Default Configuration................................................188
Chapter 18 Before Calling Support..................................................................................189
Appendix A Fabric Switch Command Man Pages........................................................ 191
vrrpconfig ...................................................................................................................192
vrrpd ........................................................................................................................... 194
zbootcfg ......................................................................................................................197
zconfig ........................................................................................................................199
zcos .............................................................................................................................207
zdog ............................................................................................................................211
zfilterd ........................................................................................................................ 213
zflash........................................................................................................................... 214
zl2, zl2mc, zl3host, zl3net, zvlan................................................................................ 216
zgvrpd .........................................................................................................................219
zl2d .............................................................................................................................221
zl3d .............................................................................................................................223
zlc ............................................................................................................................... 225
zlmd ............................................................................................................................228
zlogrotate ....................................................................................................................230
zmirror ........................................................................................................................231
zmnt.............................................................................................................................233
zpeer ........................................................................................................................... 235
zqosd .......................................................................................................................... 238
zrc ...............................................................................................................................240
zreg..............................................................................................................................241
zrld ............................................................................................................................. 243
zsnoopd ...................................................................................................................... 244
zspconfig .................................................................................................................... 246
zstack ..........................................................................................................................253
ztats............................................................................................................................. 258
zsync............................................................................................................................259
ztmd ............................................................................................................................261
brctl(8) ........................................................................................................................263
Appendix B Base Switch Command Man Pages...........................................................266
vrrpconfig ...................................................................................................................267
vrrpd ........................................................................................................................... 269
zbootcfg ......................................................................................................................272
zconfig ........................................................................................................................274
zcos .............................................................................................................................282
zdog ............................................................................................................................286
zffpcounter ................................................................................................................. 288
zfilterd......................................................................................................................... 292
zflash........................................................................................................................... 293
zgmrpd........................................................................................................................ 295
Ethernet Switch Blade User's Guide
release 3.2.2j
page xii
zgr................................................................................................................................297
zgvrpd..........................................................................................................................300
zl2d..............................................................................................................................302
zl3d..............................................................................................................................304
zlc ............................................................................................................................... 306
zlmd ............................................................................................................................308
zlogrotate ....................................................................................................................310
zmirror ........................................................................................................................311
zmnt.............................................................................................................................314
zpeer ........................................................................................................................... 316
zqosd........................................................................................................................... 319
zrc ...............................................................................................................................321
zreg..............................................................................................................................322
zrld ............................................................................................................................. 324
zsnoopd ...................................................................................................................... 325
zspconfig .................................................................................................................... 328
zstack ..........................................................................................................................336
ztats............................................................................................................................. 340
zsync............................................................................................................................341
ztmd.............................................................................................................................343
brctl(8).........................................................................................................................345
Appendix C Intelligent Platform Management Interface ..............................................348
ISwitch-ShMC Interaction.......................................................................................... 348
Peripheral Management Controller Functional Support............................................. 349
Sensor Reading Example........................................................................................350
Structure of Standard IPMI Commands: From BMC to PMC....................................352
Structure of Standard IPMI Responses: From PMC to BMC..................................... 352
Event Generator ........................................................................................................ 353
IPMB Event message format............................................................................. 353
IPMI Event Message Definitions.......................................................................353
Field Replaceable Unit Inventory Device.............................................................. 353
IPMB Override/Local Status - Event Data 3 for the IPMB link........................354
Table of Figures
Figure 1.1: Fabric Switch Elements...................................................................................20
Figure 1.2: OpenArchitect Software Structure.................................................................. 22
Figure 2.1: LED Reference................................................................................................ 25
Figure 3.1: Host HA Architecture......................................................................................27
Figure 4.1: Fabric VLANs................................................................................................. 48
Figure 4.2: Firewall Flow ................................................................................................ 61
Figure 4.3: COPS Network Architecture........................................................................... 70
Figure 6.1: ROM Devices in Open Architect.................................................................... 84
Figure 6.2: Boot Flow Chart.............................................................................................. 85
Ethernet Switch Blade User's Guide
release 3.2.2j
page xiii
Figure 6.3: Init Script Flow................................................................................................86
Figure 7.1: Multiple VLANs..............................................................................................94
Figure 7.2: Layer 2 Switch ................................................................................................95
Figure 7.3: Layer 3 Switch ................................................................................................99
Figure 7.4: Multiple VLAN Configuration......................................................................101
Figure 7.5: Firewall Flow ............................................................................................... 109
Figure 7.6: COPS Network Architecture........................................................................ 125
Figure 9.1: ROM Devices in OpenArchitect................................................................... 138
Figure 9.2: Booting up Process Flow..............................................................................139
Figure 9.3: Init Script Flow..............................................................................................140
Figure 10.1: Fabric and Base .......................................................................................... 145
Figure 10.2: Base Interface Serial Port............................................................................ 147
Figure 10.3: Fabric Interface Serial Ports........................................................................ 148
Figure 11.1: Ethernet Switch Blade Activation States.....................................................150
Figure 12.1: OpenArchitect Boot Process....................................................................... 156
Figure 12.2: ROM Devices in OpenArchitect................................................................. 157
Figure 18.1: ROM Devices in OpenArchitect................................................................. 190
Index of Tables
Table 5.1: Supported MIBs................................................................................................79
Table 5.2: Supported Traps................................................................................................80
Table 5.3: Link and SNMP Status..................................................................................... 81
Table 7.1: Port Path Cost................................................................................................... 97
Table 7.2: Policing Actions..............................................................................................119
Table 7.3: U Match Selectors...........................................................................................120
Table 8.1: Supported MIBs..............................................................................................134
Table 8.2: Supported Traps..............................................................................................134
Table 8.3: Physical Link Status on Base Switch..............................................................135
Table 11.1: Troubleshooting States................................................................................. 152
Table 13.1: Ethernet Switch Blade Backplane Interfaces (zre Ports)..............................160
Table 13.2: Additional Interfaces.................................................................................... 161
Table C.1.: IPMI M States............................................................................................... 349
Table C.2: PMC Controller Support................................................................................ 349
Table C.3: GetSensorReading..........................................................................................350
Table C.4: GetSensorResonse..........................................................................................351
Table C.5: Standard IPMI Commands.............................................................................352
Table C.6: Standard IPMI Responses.............................................................................. 352
Table C.7: Event Message Format...................................................................................353
Table C.8: SEEPROM Space...........................................................................................354
Table C.9.: IPMB Override Status Data.......................................................................... 355
Ethernet Switch Blade User's Guide
release 3.2.2j
page xiv
Ethernet Switch Blade User's Guide
release 3.2.2j
page 15
Ethernet Switch Blade User's Guide
release 3.2.2j
page 16
Chapter 1 Overview of the Ethernet Switch Blade
The Ethernet Switch Blade is a 72-port AdvancedTCA® Hub and providing Gigabit Ethernet.
Up to 14 ATCA node boards may be addressed via the PICMG 3.0 Base Interface and via the
ATCA PICMG 3.1 fabric . The Base and Fabric switching domains are kept totally separate, both
on the physical layer and the software layer. The Ethernet Switch Blade provides a tightly
integrated modular switching platform that enables high-density solutions.
The Ethernet Switch Blade is actually two separate switches, one for the Base ports and one for
the fabric ports. There are two OpenArchitect® operating system images, one for each switch,
allowing the maximum in separation between the control signaling and the data. The modular
design provides great flexibility and control.
Ethernet Switch Blades can support a 10 Gigabit Ethernet Inter-Switch Link (ISL) for the Fabric
Interfaces, and a Gigabit Ethernet ISL for the Base Interface switches. Depending on the version
of OpenArchitect used, the ISL for the Fabric Interface switches may be operated at 10 Gigabits
per second and provide stacking features.
Linux-based OpenArchitect 3 runs on the embedded processors, providing a comprehensive
package for the management of Layer 2 and Layer 3 packet switching. VLAN management and
Layer 2-7 packet classification are also included with a user-friendly interface. OpenArchitect can
be used with a variety of IP routing protocols.
As part of Advanced TCA, the switch incorporates the PICMG 3.0 Intelligent Platform
Management Interface (IPMI) standard for Field Replaceable Unit FRU) management by the
Shelf Manager.
High Performance Embedded Switching
The Ethernet Switch Blade with OpenArchitect combines the performance of silicon-based
switching fabric with flexibility of software-managed routing policies. It provides Base fabric
PICMC 3.0 (1 Gigabit Ethernet ) links to each of the payload slots, plus two to four PICMC 3.1
in-band GigE ports to each node card, and GigE links to management ports and the second
switch. The Ethernet Switch Blade maintains the forwarding table on silicon, providing the
capability to switch and route at full line rate performance on every port.
Advanced TCA® Compliant
The Advanced TCA® standard developed by the PCI Industrial Computer Manufacturer Group
defines an embedded Ethernet environment for high availability chassis. This environment
includes two switch fabric slots that create a dual star Ethernet network to the 14 Base node slots.
Placing the Ethernet Switch Blade in a hub slot provides embedded Ethernet services to each
node card of the chassis. A standard HA configuration is one Ethernet Switch Blade placed in
each of the two hub slots in a chassis for creation of a redundant, high availability system.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 17
OpenArchitect Switch Management
The OpenArchitect software component – open source Linux, IP protocol stack, control
applications and the OA Engine – runs on two embedded PowerPC microprocessors.
OpenArchitect provides extensive managed IP routing protocols and other open standards for
switch management. Examples include network services; Virtual Redundant Router Protocol;
Routing Information Protocol; Open Shortest Path First; Border Gateway Protocol; Quality of
Service and Class of Service; access control lists; Simple Network Management Protocol MIBs,
Common Open Policy Services and web.
Extensible Customization of Routing Policies
The OpenArchitect software environment enables rapid porting of other UNIX/Linux-based
protocols, including open source software conforming to RFCs and other standards. It also
enables the development of application-specific protocol configuration scripts.
Powerful CarrierClass Features
The Ethernet Switch Blade has High Availability hardware features for advanced
telecommunication applications. The switch implements the PICMG 3.0 Full Hotswap support.
This feature provides field replaceable capabilities so a switch can fail and be replaced without
impacting the operational performance of a chassis.
The PICMG 3.0 Intelligent Platform Management Interface (IPMI) standard is also supported.
IPMI uses message-based interfaces that monitor the physical health characteristics of the
Ethernet Switch Blade. The switch provides operational status information to an IPMI
management application. End customers benefit with advanced notice of potential problems.
The Ethernet Switch Blade also implements the Media Dependent Interface called Auto MDI-X.
Auto MDI-X allows connections to any device, switches, hubs, or systems using a regular
straight-through or crossover Cat 5 cable. The RJ-45 port will auto detect and switch MDI/MDIX modes. This IEEE standard makes cabling – especially between switches – faster and less error
prone.
E-Keying is supported by the Ethernet Switch Blade.
Ethernet Port Layout
The Ethernet Switch Blade has a total of 72 switched Gigabit Ethernet ports. The base fabric is
connected via 24 Gigabit Ethernet ports and the data fabric is connected via 48 Gigabit Ethernet
ports. The Ethernet Switch Blade is actually composed of two separate switches, one for Base
port activity and another for fabric port activity. The Base ports ( control and signaling) are
switched on the Base switch, and the fabric ports ( data ) are switched on the fabric switch, which
provides total separation between system management or control packets, and customer data
packets.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 18
Ethernet Switch Blade Port Configuration
Base switch Quick Reference
ShelfManager1
zre22
ShelfManager2
zre13
ISL channel ( Base node2 )
zre23
Base nodes 3-14
zre0-11
Base nodes 15,16
zre 20-21
Front panel
zre12, zre14, zre15
Fabric Switch Quick Reference
slot
zre numbers
3
zre0-3
4
zre4-7
5
zre8-11
6
zre12-15
7
zre16-19
8
zre24-27
9
zre28-29
10
zre30-31
11
zre32-33
12
zre34-35
13
zre36-37
14
zre38, zre39
15
zre40-41
16
zre42-43
Inter-switch Link (ISL)
zre51
Front panel
zre20-23
Ethernet Switch Blade User's Guide
release 3.2.2j
page 19
You will find the Ethernet Switch Blade has a straightforward installation and configuration.
UNIX or Linux system management skills and some understanding of network protocols will be
required. Configure the Ethernet Switch Blades to your networking application before you
begin using the OpenArchitect switch.
OpenArchitect Switch Environment
The key elements of the OpenArchitect environment include two embedded Linux operating
systems, OpenArchitect-specific applications and libraries, plus, an innovative switch hardware
design.
OpenArchitect hardware is in many ways similar to typical switch architectures. The primary
difference in OpenArchitect is that the PCI bus that interfaces with the embedded processor and
the switch fabric is at a higher performance level than a typical switch (see Figure 1.1: Fabric
Switch Elements). The use of PCI creates a pipe of significant bandwidth between the processor
and the switch fabric.
The embedded processors, running Linux and the OpenArchitect processes, control the flow of all
traffic by maintaining the switch forwarding tables. These tables define the flow of the switch
traffic. Because they are on the switching chips, packets proceed at line rate.
OpenArchitect Software Structure
Figure 1.1: Fabric Switch Elements
OpenArchitect is based on an embedded Linux operating system and includes a number of ZNYX
Networks-supplied modules. The key element is the Linux routing table, which is crucial in a
Ethernet Switch Blade User's Guide
release 3.2.2j
page 20
network-enabled Linux implementation.
The purpose of the routing table is to tell the packet forwarding software where to forward the
data packets. In Linux, the packet-forwarding algorithm is operated in software. Normally, the
routing tables are maintained by operator configuration and the various routing protocols that run
in the application environment of Linux.
OpenArchitect uses an innovative new approach for forwarding packets. It provides embedded
software daemons that replicate ( shadow) the Linux routing tables in the silicon-based
forwarding tables (see Figure 1.1: Fabric Switch Elements). In the OpenArchitect switching
environment, the switching chips do the real-time work in switching network packets. The switch
fabric consults its own forwarding tables for each incoming packet; and either filters or forwards
the packet to any egress port, the embedded CPU, or to any combination. The Linux routing
tables, running in software, are used to update the silicon-based tables. This provides both the
flexibility and control of the Linux software environment and the speed of dedicated switching
silicon.
The OpenArchitect environment includes additional features. For example, installing the
OpenArchitect switch gives you immediate implementation of Linux routing protocols. Also, you
have complete support of routing table updates and a standardized method for configuration.
Finally, you can quickly integrate bug fixes, protocol enhancements and additional protocol
implementations from the Linux community. You can also integrate OpenArchitect into other
Linux applications including VPN software, voice over IP protocols, Quality of Service, and
HTML configuration.
RAIN Management API (RMAPI) is a generic interface for passing control data. The
OpenArchitect libraries are implemented completely above RMAPI. The libraries provide a frontend to RMAPI to simplify application writing. Currently one library is implemented, a general
library called zlxlib. As the OpenArchitect application requirements grow, the existing library
will be expanded and additional libraries will be created.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 21
Linux
Application
Environment
OpenArchitect Application Level Software
(i.e., zconfig, zl3d, zl2d, zsync)
OpenArchitect Libraries
zlxlib and ztlib
Linux Application
Level Software
(routed, gated)
Linux
Kernel
Linux
Protocol
Stack
ZNYX RAIN Mgt API RMAPI
Linux
Routing
Tables
Open Architect Driver
PCI Bus
Switch
Fabric
Figure 1.2: OpenArchitect Software Structure
OpenArchitect applications are used to program and configure the Ethernet Switch Blade. These
applications are implemented above the libraries and RMAPI.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 22
Chapter 2 Port Cabling and LED Indicators
The PICMG 3.1 standard defines an embedded Ethernet environment for Telco chassis. This
environment includes two switch fabric slots that create a dual star Ethernet network to the
fourteen node slots. Placing the Ethernet Switch Blade in a hub slot provides embedded Ethernet
services to each node card across the Packet Switching Backplane of the chassis. A standard
configuration is to place a Ethernet Switch Blade in each hub slot creating a redundant, high
availability system. This chapter provides information on the Ethernet Switch Blade port
connectors and LED indicators.
Connecting the Cables
Your switch setup may require some or all of the following types of cables: 10/100/1000 Port
Cabling
Category 5 cabling is required for all external ports. Be sure that your cable length is within the
minimum and maximum length restrictions for the Ethernet, otherwise you could experience
signal or data loss. All copper GigE ports on the Ethernet Switch Blade are auto-MDI sensing
and will automatically determine whether or not an MDI (straight-through) or MDI-X (crossover)
cable is attached.
Console Port Cabling
The switch console can be accessed via one RJ-45 10/100 service port located on the front panel
of the Ethernet Switch Blade.
NOTE: There are two switch portions that make up a Ethernet Switch Blade unit. Each
switch portion, Base and fabric, has its own console ports, and requires its own console
cable or OOB Ethernet cable.
The RS-232 configured RJ-45 connector console port on the front panel can be used to recover
from a system failure. It is used for maintenance only, and is generally not connected. Use a HP
console cable (P/N A6900-63006) provided with the HP bh5700 ATCA 14-Slot Blade Server, in
combination with a Modem Eliminator cable, to access the switch software through the console
port. Refer to the HP bh5700 ATCA 14-Slot Blade Server Installation Guide for additional
information.
Connecting to the Console Port
To attach the console cable to the OpenArchitect Base or fabric switch:
1. Plug the RJ-45 end of the console cable into the RJ-45 Console Port on the front.
2. Connect the Modem Eliminator cable to the DB-9 connector on the console cable.
3. Connect the other end of the Modem Eliminator cable to a standard COM port (9600, n,
8, 1).
Ethernet Switch Blade User's Guide
release 3.2.2j
page 23
4. Reinsert the switch into the shelf chassis and power up.
Use a terminal emulation program to access the switch console.
Out of Band Ports (OOB Ports)
Each switch, fabric and Base, in a Ethernet Switch Blade unit has out-of-band (OOB) Ethernet
ports on the front panel. This is an alternative maintenance port supplying Ethernet connectivity
instead of serial connectivity and is connected only when performing switch maintenance
activities. Use ifconfig to bring up and configure the OOB ports. The OOB ports are 100 full
duplex, not auto-sensing. The front OOB port is eth0, and the rear (not implemented with this
release) is eth1.
LED Reference
See Figure 2.1 for a schematic view of the front of a typical Ethernet Switch Blade board. Note
that there are out-of-band ports, RS232 ports, a USB port, and 10 Gig egress ports (not
implemented in this release). In-band ports from the Base and fabric switches have LED status
lights controlled from the LED Mode button. Press the button successively to display the Base
switch ports, fabric switch ports 0-23, and finally the fabric switch ports 24-47. There are
separate LEDs for the out-of-band ports, and the ATCA status functions.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 24
Figure 2.1: LED Reference
Ethernet Switch Blade User's Guide
release 3.2.2j
page 25
Ethernet Switch Blade User's Guide
release 3.2.2j
page 26
Chapter 3 High Availability Networking
High availability networking is achieved by eliminating any single point of failure through
redundant connectivity: Redundant cables, switches and network interfaces for hardware,
combined with HA software solutions on both the hosts and switches to control the HA hardware
and maintain connectivity. An HA solution called Surviving Partner is provided on the switch.
For host-side HA, the most common solution is to use the Linux bonding driver. HA solutions
like the Linux bonding driver present a single, virtual interface to the protocol stack while
managing multiple physical links. Figure 3.1: Host HA Architecture shows the relation of the
protocol stack, a bonding driver and physical ports.
Figure 3.1: Host HA
Architecture
A failover between physical links can be made very quickly without requiring change to the IP or
MAC address of the virtual interface, effectively transparent from the applications point of view.
With redundant links from a switch (or switches) to the host, one link is maintained as the
ACTIVE link and the other as STANDBY. If the ACTIVE link were to go down, the STANDBY
becomes the new ACTIVE, while presenting the same virtual interface to the host.
NOTE: It is important that the bonding solution provide an active-backup mode. For the
Linux bonding driver set “mode == 1” see the http://sourceforge.net/projects/bonding/
documentation for more information. Use the recommendations for Linux kernel 2.4x not
2.6x.
Redundant connections provide an ACTIVE and STANDBY link to a switch, or provide
redundant links between more than one switch. In the case of more than one switch, a complete
HA solution requires a switch-based HA solution.
Surviving Partner
Surviving Partner is a switch-based HA solution. Surviving Partner runs on the switches to
provide transition of Layer 2 and Layer 3 switching functionality between two or more switches.
Surviving Partner is comprised of many interactive protocols and processes including VRRP,
zlmd, zlc, and others.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 27
VRRP
Since most end nodes use default router addresses, the change of the default router address during
a switch failover would require the end nodes to reconfigure. Layer 3 switches that failover must
maintain the default router address to maintain the end node's IP transparent failover. The Virtual
Router Redundancy Protocol (VRRP, RFC 2338) running in the Surviving Partner switches
provides transparent movement of the default router address. VRRP maintains the notion of a
Master switch and one or more Backup switches. This group of switches presents a virtual router
IP address that can be used by hosts on that net as their default route.
If a Backup switch determines the Master switch is no longer available, one of the Backup
switches will assume the role as Master. Physically, each switch maintains a link to the local
network. Only the Master switch answers to the default gateway, and the hosts on that net have
no need to relearn the router address.
In an HA configuration, the goal is to avoid any single point of failure. VRRP provides a good
mechanism to provide a static route for a local network, but a true HA configuration must also
provide redundant connections for the host. Providing a virtual router for the local network is not
enough. Take the simple case of two hosts on the local network with a connection to the virtual
router. Each host needs a connection to each physical switch participating in VRRP. In the
simplest configuration, each host would have one connection to the network. An HA solution
would include redundant connections from each host to each switch in the virtual router.
Combining the features of Surviving Partner on the switches and HA bonding drivers on the hosts
allows implementation of this true HA configuration.
zlmd
In addition to complete switch failover, single link failure must be properly handled. The Link
Monitor Daemon zlmd, monitors the link status of each port. If a link goes down, zlmd
communicates with the VRRP daemon (vrrpd) to change its priority. Changing the VRRP
priority results in movement of switching functionality. By combining zlmd with the zlc
application, links connected to hosts that have not failed can be deterministically moved to the
new master switch if desired. Supported modes include:
•
switch - The switch with the greatest number of UP links becomes the Master for all
VLANs under HA management.
•
Vlan - The switch with the greatest number of UP links in that particular VLAN becomes
the Master for that particular VLAN. If the switch has additional VLANs, they each
change independently.
•
Port - The Master will remain the Master for that particular VLAN until all ports in that
VLAN are down. The Backup then becomes the new Master for that VLAN. Failed links
move their connectivity through the Backup Switch and the switch interconnect to reach
the Master Switch. This option alleviates the need to move all nodes to a new switch just
because a single link goes down.
NOTE: All modes require inclusion of the interconnect in the VLAN. The ISL
connection between the two Base switches is port 23 for the Ethernet Switch Blade. The
ISL connection between the two fabric slots in port 51.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 28
Switch Replacement and Reconfiguration
When a switch fails, it must be replaced. The replacement switch will likely require proper
configuration. For transparent switch replacement, the newly replaced switch must learn its
configuration from its Surviving Partner.
In a simple failover scenario, Host A and Host B are configured with failover between two host
ports, one port connected to Switch A and the other connected to Switch B. Assume Switch A
provides connectivity between Host A and Host B. If Switch A fails, the active link on each host
moves over to the port connected to Switch B. Surviving Partner software on Switch B
recognizes that Switch A has failed, and assumes the role of switching traffic between Host A and
Host B. When the failed Switch A is replaced with a new Switch A', Switch A' will learn its
network configuration from the surviving partner Switch B. Switch A' is now ready as a backup
to Switch B in case of failure of Switch B.
This is achieved through the use of DHCP. When a switch becomes a VRRP Master, a DHCP
server is started with a pointer to a configuration file that contains configuration information for
its partners. The replacement switch comes up running DHCP client to retrieve its configuration.
Proper configuration of Surviving Partner requires coordinated configuration of many different
processes, including vrrpd, zlmd, zlc, and dhcpd. The daemon processes run scripts to
perform their actions. Because these scripts are complex and inter-dependent, a configuration
application called zspconfig is used to build them.
The basic steps to configuring Surviving Partner are:
1. Determine your desired configuration.
2. Modify the configuration file (/etc/rcZ.d/surviving_partner/zsp_DC.conf is
the default) to use as input to the configuration utility (zspconfig).
3. Configure startup scripts or other scripts such as gated routing scripts and vrrp
configuration scripts.
4. Run zspconfig on the Master system.
5. Run zspconfig –u on the Backup/Sibling system(s).
zspconfig
zspconfig performs the job of building the scripts based on a provided input file locally, or
from a remote machine. A text-based configuration file provides input to zspconfig. Example
configuration files are included on the switch in /etc/rcZ.d/surviving_partner. The
result of zspconfig is to create several configuration files and runtime shell scripts, and
optionally start the Surviving Partner processes. Scripts are generated for configuring VLANs,
starting the network, and starting the vrrpd and zlmd daemons.
zspconfig can also used by sibling backup switches to retrieve configuration from the
Surviving Partner and start the vrrpd and zlmd daemons. zspconfig is generally only run
once to configure Surviving Partner.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 29
The configuration and runtime scripts created are as follows:
•
S70Surviving_partner Switch initialization script that is run at boot time. This
script will restart the switch with the original configuration given to zspconfig.
Optionally, zspconfig will run this script from the initial invocation.
•
zsp.conf.<n> - zspconfig configuration file that contains the configuration of
the sibling backup switches. The <n> is used to distinguish potentially more than one
backup switch. This configuration file is placed in /tftpboot, and is retrieved via
DHCP during configuration of the backup switch by zspconfig with the “-u” option
or, by a replacement switch on boot up.
•
vrrpd.conf - Configuration script for the VRRP daemon. This configuration is
used when the S70Surviving_partner script launches vrrpd. There is a line in this file for
each virtual router address vrrpd will manage.
•
dhcpd.conf - Configuration script used by dhcpd when the switch becomes
master. dhcpd is also used to give replacement switches their configuration scripts.
Namely a zsp_.conf<n> file that can be input to zspconfig with the -u flag.
•
dhclient.conf - If zspconfig is executed with the -u flag, a dhclient.conf file
is created, and then dhclient is used to retrieve a zspconfig configuration file from the
/tftpboot area of the Master switch.
•
vrrpd.script - Runtime script that executes each time the vrrpd changes state.
This script starts and stops dhcpd, and toggles down RAINlink ports to force the
RAINlink nodes to a new Master switch.
•
zlmd.script - Runtime script executed by zlmd when a link goes up or down. This
script modifies the priority of the vrrpd that in turn may cause the VRRP Master to move
from one sibling switch to another.
After the scripts are created, zspconfig may run the S70Surviving_partner script to start
the Surviving Partner tasks. The tasks started are vrrpd, zlmd, and dhcpd.
The vrrpd and zlmd daemons run scripts to perform their actions. When vrrpd changes state
between Master and Backup, it runs a script that starts and stops dhcpd. When zlmd sees a link
go up or down, it runs a script that communicates with vrrpd via vrrpconfig.
Example HA Switch Configuration
The following walks through a basic Surviving Partner configuration typical for an HA setup.
Assume an HA chassis with multiple hosts, such as single-board CPUs, and two switches
configured for Surviving Partner. Each of the hosts has two Base Ethernet ports providing a link
to each of the Base switches and up to four fabric Ethernet ports providing links to each of the
Fabric switches.
Each host runs Linux bonding drivers (or ZNYX OA Node software with embedded RAINlink)
with the ports configured for failover. An interlink provides communication between the Base
switches. Another interlink provides communication between the Fabric switches.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 30
When using a Linux Bonding driver on the node card, the bonding driver should be configured
for Mode 1 (active/standby). See the Linux Bonding documentation at
http://sourceforge.net/projects/bonding/ for complete information.
The two Base switches will be configured as Surviving Partners, using VRRP to form a single
virtual interface to the hosts, as will the two Fabric switches. The ports can be configured many
different ways, with blocks of ports configured as vans. The configuration is set up in the zsp
configuration file, zsp.conf.
NOTE: The actual name on the system may change slightly from zsp.conf, depending on
current release requirements.
Modifying zsp.conf on the Base switch
An example file for setting up zspconfig on an Ethernet Switch Blade is
/etc/rcZ.d/surviving_partner/zsp.conf. The following will document the default settings.
NOTE: It is unlikely that any installation will use this default script in production. You will
have to modify it to suit your network design.
On Switch A (Master), make a backup copy of zsp.conf, and edit zsp.conf:
cd /etc/rcZ.d/surviving_partner/
cp zsp.conf zsp.conf.save
vi zsp.conf
The first section uses zconfig to create the VLANs. Many of these choices are determined by
the physical configuration of the switch and ATCA backplane. For instance, the Base switch
interconnect will always be port 23, and the shelf managers will be ports 22 and 13.
zconfig zhp0: vlan100 = zre23;
zconfig zhp1: vlan1 = zre0..11, zre20..21, zre23;
zconfig zre0..11, zre20..21 = untag1;
zconfig zre23 = untag100;
The next section sets up the physical IP addresses to use for the Master and the Backup switch.
The Master provides the addresses to the Backup on a first come, first serve basis. Note that the
physical IP address should be different from the virtual IP address that spans the pair of switches.
Once configured, the pair appears as one connection point to other hosts on the VLAN. You need
to supply an IP address for each interface on each switch. The first IP address on each line is the
Master and the second is the Backup.
sibling_addresses: zhp0 = 100.0.0.30, 100.0.0.31 netmask
255.0.0.0;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 31
sibling_addresses: zhp1 = 10.0.0.30, 10.0.0.31 netmask 255.0.0.0;
Now configure the virtual address for each sibling group. We are going to create a virtual
interface across one VLAN, but not for the interconnect. This provides a single point to
connect/route to the VLANs.
vrrp_virtual_address: zhp1 = 10.0.0.42 netmask 255.0.0.0;
Next come port definitions, as defined on the zspconfig man page. Since our hosts are
connected using the Linux bonding driver (or RAINlink), we will want to choose RAINlink on
each of the ports in VLANs on each switch, and interconnect for the interconnect port on each
switch,
The port definitions are:
interconnect - Ports connected between groups of Surviving Partner switches. VRRP
heartbeat messages are sent on the interconnect ports.
Crossconnect - Crossconnect ports are ports that are connected to other Surviving Partner
switches, that are not part of this Surviving Partner group. Crossconnect ports behave
differently then bonding driver/RAINlink ports. The links are not brought down
temporarily, and VRRP runs with the native MAC addresses to avoid MAC address
duplication with the other VRRP group.
RAINlink - Ports connected to bonding driver/RAINlink enabled nodes. These ports contain
virtual addresses managed by VRRP. And during a failover event, the links are toggled
down to force failover to the Master switch.
Route - Ports connected to upstream routers. VRRP does not manage virtual IP addresses for
these links. Routing protocols must be used to instruct up stream routers of a different path
to get to the VRRP managed networks.
monitor_only - Ports that are monitored but do not have a virtual address managed on them.
They will not have their links brought down temporarily during a failover scenario. These
ports are only monitored. If a problem occurs on this type of link it will cause a failover
scenario.
configure_only - Ports are configured as per the zconfig commands, but do not participate
in the high availability network. Problems on these links will not cause a switch failover.
interconnect: zhp0;
RAINlink: zre0..11, zre20..21;
Next come special modes for VRRP for use when more than one pair of Ethernet Switch Blades
are connected to another pair of Ethernet Switch Blades in a redundant configuration. The intent
of these modes is to provided Spanning Tree like capabilities eliminating network looks between
pairs of Surviving Partner configurations, as well as expedite address learning between the two
pairs of switches:
vrrp_mode: RAINlink_xmit_on_failover;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 32
#vrrp_mode: block_crossconnect;
The next sections determines the failover mode between the Surviving Partner switches. There
are three modes:
•
switch - Failover by switch. Failover from Master switch to Backup on any port
failure. The switch with the most links becomes the new Master. One port failure
will cause the switch to failover.
•
vlan - Failover by VLAN. The switch with the most up links in the VLAN becomes
the Master of that VLAN. When VLANs failover and all VLAN masters are not
located on a common switch, the interconnect link is used to carry data traffic, and
could become saturated. The use of the interconnect for data traffic in a failover
situation depends on the VLAN design, from one extreme where one VLAN could
contain all ports to one port per VLAN at the other extreme.
•
port - Failover by port. The Master switch will remain the Master until all ports in the
VLAN are down. The Backup then becomes the new Master for that VLAN.
Similar to VLAN failover, the interconnect link will carry data traffic in this mode,
when ports failover.
failover_mode: switch;
Next, you can set VRRP_msg_rate and default priority. VRRP_msg_rate is the time in
milliseconds between vrrp message transmissions over the interconnect link. The
vrrp_def_priority is the default priority for both switches. The value is set to 254 and
should not require change.
vrrp_msg_rate: 100;
# In milliseconds
vrrp_def_priority: 254;
The following optional entries provide a mechanism to propagate files and/or startup scripts to
sibling switches. An example might be startup scripts or scripts to configure gated. Example
scripts are included to start gated with RIP1, RIP2, or OSPF setup. You must use absolute path
names.
# start_script:
are moved
Allows the user to add files and scripts that
# to the slave switches when they do a zspconfig -u.
example might
# be the gated configuration script S55...
names are
# required.
more than
An
Absolute path
Multiple start_script commands can be used to move
# one file.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 33
#start_script:/etc/rcZ.d/SxxScript;
#start_script:/etc/rcZ.d/SyyScript;
# vrrpd_script:
during
Allows the user to add scripts to be executed
# vrrpd state transitions.
of the
These scripts are run from the end
# /etc/rcZ.d/surviving_partner/vrrpd.script file.
provided
# script must be well behaved.
delays it will
If it crashes, or hangs or
# effect the SurvivingPartner performance.
run in
# backround.
itself.
The user
The script is not
If this is needed, have your script background
#vrrpd_script: /etc/rcZ.d/surviving_partner/my_vrrpd_script;
#vrrpd_script: /etc/rcZ.d/surviving_partner/my_vrrpd_script2;
# gated_template:
the
Allows the user to provide a template for
# gated.conf file to be used by the sibling group.
#gated_template: /etc/rcZ.d/surviving_partner/gated.template
These entries are optional:
If you use the special failover modes vlan or port ( see above for details), you can also specify an
individual address to be the default master, that is, that a port or VLAN should run on a specific
switch when the vrrp priorities are equal between switches.
NOTE: VLAN or port mastering is not appropriate for switch mode and should not be
attempted.
When addresses designated 'master' failover, they will return to their Master switch, whenever the
link is repaired. If they are not designated 'master', they will remain at the backup switch after
repairs.
If both switches are equal in priority for a VLAN, then the switch with theIPaddress designated
'master' will become Master for that VLAN.
Add the keyword “(master)” after one of the sibling_addresses. The local address comes first.
sibling_addresses: zhp1=10.0.0.30(master), 10.0.0.31
netmask 255.0.0.0;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 34
Once the configuration files are complete, run the zspconfig utility on the Master to configure
all the scripts:
NOTE: This command can take 60 seconds or more with no screen output.
zspconfig –f zsp.conf
You will see output similar to this:
zspconfig -f zsp.conf
….
Would you like to install the Surviving Partner startup
script[y,n,?] y
Would you like to start the Surviving Partner daemons without
rebooting [y,n,?] y
Once configuration is complete, insure there are no superfluous S-type startup scripts in
/etc/rcZ.d, and zsync your switch to save your configuration.
Now go to the backup switch and run zspconfig –u to get the appropriate configuration
information from the Master,
zspconfig –u zhp0
Modifying zsp_vlan.conf on the Fabric Switch
An example file for setting up zspconfig on a Ethernet Switch Blade Fabric board is
/etc/rcZ.d/surviving_partner/zsp_vlan.conf. Reference the descriptions in the
previous section for descriptions of each configuration section.
# Sample configuration is based on the idea that there are
separate VLANs
# for the multiple connections to a slot.
#
#
zhp0: Interconnect VLAN
#
zhp1..4: Data interface VLANs, configured such that
Option 2
#
have
#
slots have 2 VLANs connected to them and Option 3 slots
4 VLANs connected to them.
#
Ethernet Switch Blade User's Guide
release 3.2.2j
page 35
# This script will likely need modification for your particular
# network setup.
#
# In this example the Egress ports, zre20..23 and zre48..50 are
# not managed by HA since how, or if, these ports are managed
by HA is
# dependent on the external devices they are connected to.
Non-HA
# egress ports can be brought up through conventional means by
adding
# an S-script to /etc/rcZ.d.
HA, they
If the ports are to be managed by
# can be added to an existing VLAN(zhp) or a new VLAN(zhp) can
be created
# If a new VLAN(zhp) is to be managed by HA, add a zconfig,
sibling_address,
# and vrrp_virtual_address configuration line and define the
port type as
# appropriate.
#
# The interconnect port is needed in the VLANs connected to the
# RAINlink ports.
#
zconfig zhp0: vlan100 = zre51;
zconfig zhp1: vlan1 = zre0, zre4, zre8, zre12, zre16, zre24,
zre28, zre30, zre32, zre34, zre36, zre38, zre40, zre42, zre51;
zconfig zhp2: vlan2 = zre1, zre5, zre9, zre13, zre17, zre25,
zre29, zre31, zre33, zre35, zre37, zre39, zre41, zre43, zre51;
zconfig zhp3: vlan3 = zre2, zre6, zre10, zre14, zre18, zre26,
zre51;
zconfig zhp4: vlan4 = zre3, zre7, zre11, zre15, zre19, zre27,
zre51;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 36
zconfig zre0, zre4, zre8, zre12, zre16, zre24, zre28, zre30,
zre32, zre34, zre36, zre38, zre40, zre42 = untag1;
zconfig zre1, zre5, zre9, zre13, zre17, zre25, zre29, zre31,
zre33, zre35, zre37, zre39, zre41, zre43 = untag2;
zconfig zre2, zre6, zre10, zre14, zre18, zre26 = untag3;
zconfig zre3, zre7, zre11, zre15, zre19, zre27 = untag4;
zconfig zre51 = untag100;
# Recommend using vrrp_mode RAINlink_xmit_on_failover.
zl3d zhp1 zhp2 zhp3 zhp4;
# First address is our address.
The remaining addresses
# are handed out to the siblings on a first come first serve
# basis in the order specified.
Each zconfig'ured interface
# should have sibling addresses specified.
sibling_addresses: zhp0 = 100.0.0.30, 100.0.0.31 netmask
255.0.0.0;
sibling_addresses: zhp1 = 10.0.0.30, 10.0.0.31 netmask 255.0.0.0;
sibling_addresses: zhp2 = 11.0.0.30, 11.0.0.31 netmask 255.0.0.0;
sibling_addresses: zhp3 = 12.0.0.30, 12.0.0.31 netmask 255.0.0.0;
sibling_addresses: zhp4 = 13.0.0.30, 13.0.0.31 netmask 255.0.0.0;
# The virtual address spans the sibling group, giving hosts and
routers
# a single point to connect to or a single point to use as a
router. A
# virtual address should not be specified for the interconnect
interface.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 37
vrrp_virtual_address: zhp1 = 10.0.0.42 netmask 255.0.0.0;
vrrp_virtual_address: zhp2 = 11.0.0.42 netmask 255.0.0.0;
vrrp_virtual_address: zhp3 = 12.0.0.42 netmask 255.0.0.0;
vrrp_virtual_address: zhp4 = 13.0.0.42 netmask 255.0.0.0;
# Port definitions
# Define to what the ports are connected.
be
# by zhp or zre name.
the
Specifications can
The zhp name is a shortcut to specify
# entire port group associated with that interface.
In the end
# these definitions are on a port by port basis. Note: zhp and
# zre names cannot be mixed on the same line.
#
# Shelf manager ports should be defined as monitor_only.
monitor_only ports
# are used in failover calculations, but the failover mechanism
is left
# to the software running on the shelf managers cards.
of the
The use
# term "crossconnect" in these HA scripts is not the same as
the use
# in ATCA shelf managers.
interconnect: zhp0;
RAINlink: zre0..19, zre24..43;
#############################
#############################
Special Modes
# VRRP modes
# The block_crossconnect mode causes the equivalent of STP
blocking on the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 38
# crossconnect ports of the VRRP Backup.
block_crossconnect mode is
The
# meant as a replacement for STP, however, the switches
connected to the
# crossconnect ports must be Ethernet Switch
Surviving Partner.
switches running
#
# The RAINlink_xmit_on_failover mode requires that the OpenNode
blades
# connected to RAINlink ports transmit a packet when failing
over, so that
# The Layer 2 tables will learn the new port/MACaddress
relationship. An
# example is the SNAP_BCAST_MODE in RAINlink or a gratuitous
ARP.
vrrp_mode: RAINlink_xmit_on_failover;
#vrrp_mode: block_crossconnect;
# failover modes
# switch-failover, VLAN-failover or port-failover are mutually
exlusive. They
# describe what occurs if a port fails.
if any port
For switch-failover,
# fails, all functionality of the current switch is moved to
the backup
# For vlan-failover, if a port fails in the vlan then all the
ports that
# are a member of that VLAN are failed over.
failover, each port
# can failover independently.
For port-
For vlan and port failover the
# interconnect will need to be used to maintain connectivity,
requiring
# all VLANs to include the interconnect ports.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 39
failover_mode: port;
# VRRP_msg_rate is the time in milliseconds between
transmissions
# VRRP messages on the interconnect.
requires the
The VRRP protocol
# absence of 3 VRRP messages before concluding that the remote
switch
# has failed.
siblings.
The msg_rate must match the msg_rate of all
# Anything other than multiples of seconds is non-conformant
# with the VRRP specification and will only run with ZNYX
supplied
# vrrpd.
vrrp_msg_rate: 100;
# In milliseconds
vrrp_def_priority: 254;
# start_script:
are moved
Allows the user to add files and scripts that
# to the slave switches when they do a zspconfig -u.
example might
# be the gated configuration script S55...
names are
# required.
more than
An
Absolute path
Multiple start_script commands can be used to move
# one file.
#start_script:/etc/rcZ.d/SxxScript;
#start_script:/etc/rcZ.d/SyyScript;
# board_synchronization_mode:
the Base and
Ethernet Switch Blade User's Guide
Coordinate the HA events between
release 3.2.2j
page 40
# Fabric portions of the 7100 switch.
is dependent on the
The actual coordination
# setting of the board_synchronization_mode and the
failover_mode. In
# switch failover_mode the number of up links in both switch
planes is
# considered.
all
In vlan and port failover mode they are not.
In
# failover_modes, if the data plane or fabric plane switch
reboots or
# power cycles, the HA partner will take mastership for all
VLANs in
# both planes.
"basic" is the
board_synchronization is off by default.
# only supported mode at this time.
in both
The same mode must be set
# the base and fabric switches.
#board_synchronization_mode: basic;
# vrrpd_script:
during
Allows the user to add scripts to be executed
# vrrpd state transitions.
of the
These scripts are run from the end
# /etc/rcZ.d/surviving_partner/vrrpd.script file.
provided
# script must be well behaved.
delays it will
If it crashes, or hangs or
# effect the SurvivingPartner performance.
run in
# backround.
itself.
The user
The script is not
If this is needed, have your script background
#vrrpd_script: /etc/rcZ.d/surviving_partner/my_vrrpd_script;
#vrrpd_script: /etc/rcZ.d/surviving_partner/my_vrrpd_script2;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 41
# gated_template:
the
Allows the user to provide a template for
# gated.conf file to be used by the sibling group.
#gated_template: /etc/rcZ.d/surviving_partner/gated.template
Once the configuration files are complete, run the zspconfig utility on the Master to configure
all the scripts:
NOTE: This command can take 60 seconds or more with no screen output.
zspconfig –f zsp.conf
You will see output similar to this:
zspconfig -f zsp.conf
….
Would you like to install the Surviving Partner startup
script[y,n,?] y
Would you like to start the Surviving Partner daemons without
rebooting [y,n,?] y
Once configuration is complete, insure there are no superfluous S-type startup scripts in
/etc/rcZ.d, and zsync your switch to save your configuration.
Now go to the backup switch and run zspconfig –u to get the appropriate configuration
information from the Master,
zspconfig –u zhp0
Configuring Surviving Partner
The S60SP_startup script is useful in setting up proper switch replacement. By factory
installing the S60SP_startup script in replacement switches, the replacement switches will boot
looking for a Master switch configuration. The S60SP_startup works as follows:
It first looks for a local file /etc/rcZ.d/surviving_partner/zsp.primary.conf. If this
file exists, it is used to configure the switch. Only the originally configured switch or Central
Authority should contain this file. See Central Authority later in this Chapter for more
information.
Next it uses zspconfig –u to attempt to contact a running Master switch to retrieve the proper
configuration. This is the normal case for a replacement switch.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 42
Finally, it lets the currently saved S70Surviving_Partner script execute. This case would be
the case of a power up of an already configured backup switch when the other HA switch is
unavailable. This case could occur after losing power to the entire chassis.
Central Authority
Modifications can be made to the S60SP_startup script to use a third machine running DHCP
that is not part of the Surviving Partner pair. The third machine is referred to as the Central
Authority.
Setup requires a DHCP daemon configuration file on the Central Authority and a dhclient
configuration file for each of the two Surviving Partner switches in the pair. The format of the
DHCP daemon configuration file is dependent on the machine and operating system being used.
An example can be obtained from the Surviving Partner primary switch in the location
/etc/rcZ.d/surviving_partner/dhcpd.conf.
This configuration will contain configuration for only one of the two Surviving Partner switches.
It must be edited. For example:
subnet 100.0.0.0 netmask 255.0.0.0 {
option broadcast-address 100.255.255.255;
host ZNYX1 {
fixed-address 100.0.0.31;
option dhcp-client-identifier "ZNYX";
option vendor-encapsulated-options
"zsp_conf.1";
}
}
A second host entry must be added with unique information.
subnet 100.0.0.0 netmask 255.0.0.0 {
option broadcast-address 100.255.255.255;
host PRIMARY {
fixed-address 100.0.0.30;
option dhcp-client-identifier "PRIMARY";
option vendor-encapsulated-options
Ethernet Switch Blade User's Guide
release 3.2.2j
page 43
"zsp.primary.conf";
}
host SECONDARY {
fixed-address 100.0.0.31;
"SECONDARY";
option dhcp-client-identifier
option vendor-encapsulated-options
"zsp.secondary.conf";
}
}
The zsp.primary.conf and zsp.secondary.conf files must be placed in the tftp
location on the machine, often /tftpboot. The zsp.primary.conf and
zsp.secondary.conf files can be retrieved from the Surviving Partner switches. This is the
configuration that will be given to the switches. It is recommended that the zsp.conf be taken
from the primary as follows:
The zsp.conf file created by hand on the primary is moved to /tftpboot/zsp.primary.conf on
the Central Authority.
Move /tftpboot/zsp_DC.conf.1 file on the primary created by zspconfig to
/tftpboot/zsp.secondary.conf on the Central Authority.
Create dhclient.conf files on the Surviving Partner switches. Examples can be found in
/etc/rcZ.d/surviving_partner/dhclient.conf. As an example:
send dhcp-client-identifier "ZNYX";
request vendor-encapsulated-options;
require vendor-encapsulated-options;
Modify the dhclient.conf file on the primary switch as follows:
send dhcp-client-identifier "PRIMARY";
request vendor-encapsulated-options;
require vendor-encapsulated-options;
Modify the dhclient.conf file on the secondary switch as follows:
send dhcp-client-identifier "SECONDARY";
Ethernet Switch Blade User's Guide
release 3.2.2j
page 44
request vendor-encapsulated-options;
require vendor-encapsulated-options;
The last step is to modify the startup scripts that run zspconfig to use the -c option. The -c
option allows you to provide a dhclient.conf script rather then having zspconfig create a
default. For example, the S60SP_startup script line that reads:
echo
y n | zspconfig -t 10 -su zhp0 > /dev/null 2>&1
Can be modified to
echo y n | zspconfig -c
/etc/rcZ.d/surviving_partner/dhclient.new.conf -t 10 -su zhp0
> /dev/null 2>&1
If you use S60SP_startup, the /etc/rcZ.d/surviving_partner/zsp.primary.conf file
should not exist. This way the S60SP_startup script will first look at the Central Authority. If
the Central Authority is down, then it will use its current configuration.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 45
Chapter 4 Fabric Switch Configuration
Two switches, two consoles
There are two separate switch portions in the Ethernet Switch Blade units, the base switch and the
fabric switch. The fabric switch handles the data traffic for the ATCA rack over ports 0-47. It
runs the Ethernet Switch Blade software. Two or four GigE connections are provided to node
cards using the ATCA backplane.
Connecting to the Fabric Switch Console
You can connect to the fabric switch console using a telnet connection or with a console
cable. Use the procedure below for a telnet connection. See Connecting to the Console Port,
for instructions.
Connect an Ethernet cable to the host and the switch. The OOB port is not active in the default
configuration. You can connect to the fabric OOB port on the front panel.
Work from a host on the 10.0.0.0 network.
The OpenArchitect switch is pre-configured with address 10.0.0.43. Telnet to 10.0.0.43.
telnet 10.0.0.43
After you are connected, enter the login name root. No password is required.
OpenArchitect login: root
ZX7100-OA<release no.>#
OpenArchitect Configuration Procedure
Layer 2 and Layer 3 switch configurations can be accomplished with a few simple commands.
Once you have configured your switch, the commands should be placed into a start up
configuration script. Like most Linux systems, the OpenArchitect switch boot process runs
initialization commands and scripts in /etc/init.d/. In particular, OpenArchitect runs
/etc/init.d/rcS which in turn executes all scripts located in /etc/rcZ.d starting with an
uppercase “S” in alphabetical order. Any configuration scripts you create should be named in the
standard Linux/Unix manner, starting with an uppercase “S” and numbered in the sequence you
would like them executed. The final step once the switch has been properly configured is to use
the zsync command to save all files into flash for reloading.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 46
Changing the Shell Prompt
You may use standard bash shell procedures to change the prompts on your base switches. Many
sites choose a system that distinguishes among the individual switches at their location. The same
rules apply for saving your choice (zsync) as for all other configuration changes.
Default Configuration Scripts
As shipped the following scripts are run from /etc/rcZ.d as the switch boots up:
NOTE: These default scripts will change in later releases. Use them as examples.
S20stack - Script that calls zstack to combine the two BCM56504 24-port switch fabric chips
into a single 48 port virtual switch. zstack must be run before any other switch configuration.
S50layer2 - Script that sets up a basic Layer 2 switch. All 48 ports are set up on one VLAN. This
configuration script is appropriate for a Ethernet Switch Blade. It may need to be modified for
other models.
Example Configuration Scripts
Example scripts are provided that can be used as templates. Use one of the scripts located in the
switch /etc/rcZ.d/examples directory to help you configure the switch. The default
configuration for the switch is located in the script file /etc/rcZ.d/S50layer2.
The following scripts are included (each is examined in more detail later in the appropriate
section describing common Layer 2 and Layer 3 configurations):
•
S50layer2 - Script which sets up a basic Layer 2 switch. All 48 ports are set up on one
VLAN. This is a copy of the script in /etc/rcZ.d that is loaded in the default
configuration.
•
S50layer3 - Script which sets up a basic Layer 3 switch. All 48 ports are set up on
individual IP networks (VLANs). Layer 3 switching is enabled.
•
S50multivlan - Script which sets up multiple untagged VLANs. (See Using the S50layer3
Script) Layer 3 switching is enabled.
•
S55gatedRip1 - Script which is used with a Layer 3 switch and calls the GateD daemon to
enable RIP 1 routing protocol.
•
S55gatedRip2 - Script which is used with a Layer 3 switch and calls the GateD daemon to
enable RIP 2 routing protocol.
•
S55gatedOspf - Script which is used with a Layer 3 switch and calls the GateD daemon to
enable OSPF routing protocol.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 47
Overview of OpenArchitect VLAN Interfaces
A zhp device is associated with one VLAN. zhp may have one or more physical ports and their
associated zre devices. A VLAN from the viewpoint of the switch is a logical mapping of ports
based on intended use. The primary purpose of a VLAN is to isolate traffic and enable
communication to flow more efficiently within groups of mutual interest. The switch is used to
bridge from one VLAN to another. Figure 4.1: Fabric VLANs is an example of a custom layer2
VLAN network structure in a fabric switch.
Figure 4.1: Fabric VLANs
In the Figure 4.1, four VLANs for each fabric switch are used to organize traffic. This is just one
example of how a layer 2 switch could be configured with the fabric switch.
Tagging and Untagging VLANs
The OpenArchitect switch is capable of switching VLAN tagged and untagged data packets.
VLAN tagged packets conform to the 802.1q specification and the packet header contains an
additional four bytes of VLAN tag information. A given port can be specified to accept VLAN
tagged or untagged traffic. Internally, all traffic for a particular VLAN is treated as tagged traffic.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 48
Switch Port Interfaces
For each switch port, OpenArchitect creates a separate interface with its own MAC address called
a ZNYX raw Ethernet (zre). After the initial power up, 48 zre interfaces are created, one for
each in band port. You cannot directly access or modify the zre interfaces.
During the initial power up of the switch, the default configuration creates a Layer 2 switch. The
Layer 2 configuration places the zre interfaces in one zhp interface. See Figure 4.1: Fabric
VLANs The number after zre represents the corresponding switch port number (that is, zre1
represents port 1 on the switch).
Layer 2 Switch Configuration
The steps to build a Layer 2 switch involve creating groups of switch ports in VLANs (Layer 2
switching domains) and bringing the interfaces up. zconfig creates the VLAN group of switch
ports as well as a network interface. Use ifconfig(1M) on the network interface to bring up
the VLAN group.
A startup script called /etc/rcZ.d/S50layer2 is executed at boot time creating one untagged
VLAN (zhp0) for all ports. The ISL is assigned its own VLAN. The interface to the host is then
assigned the IP address of 10.0.0.43 to allow access to the switch. The VLAN is assigned an IP
address. The S50layer2 script does the following:
## Create a single untagged vlan (i.e. interface), consisting
# of the 48 Gigabit Ethernet ports Layer 2 forwarding enabled
# Put the ISL in its own vlan to avoid loops
#
/usr/sbin/zconfig zhp0: vlan1=zre0..50
/usr/sbin/zconfig zre0..50=untag1
/usr/sbin/zconfig zhp1: vlan2=zre51
/usr/sbin/zconfig zre51=untag2
sleep 1
#
# Assign the ZNYX default IP address 10.0.0.43 to the
# zhp0 interface and start it
#
ifconfig zhp0 10.0.0.43 netmask 255.255.255.0 broadcast
10.0.0.255 up
Ethernet Switch Blade User's Guide
release 3.2.2j
page 49
ifconfig zhp1 0.0.0.0
#
# At this point the system will act as a Layer 2 switch
# across all ports. Also, the system will accept telnet()
# connections on 10.0.0.43 on any port.
Script(s) may then
# be run to reinitialize the system and modify its
# configuration.
Using the S50layer2 Script
The S50layer2 script can be used as an example, and edited to customize your Layer2 setup.
The default script may not match your physical port configuration. In that case you will have to
alter the script to suit your circumstances. For example, to reconfigure the IP address on your
Layer 2 switch,
Open the S50layer2 file in the Linux vi editor.
Change the IP address value listed under the Linux ifconfig(1M) command line.
Save your changes by running OpenArchitect zsync.
zsync
Reboot the switch.
Rapid Spanning Tree
The Rapid Spanning Tree Protocol (RSTP) configures a simply connected active topology from
the arbitrarily connected components of a Bridged Local Area Network. RSTP participants use a
simple dialog carried in packets called Bridge Protocol Data Units (BPDUs) for finding the
shortest path between two networks and for eliminating loops from the topology. If nodes
attached to ports fail or are added or deleted, the topology dynamically changes to accommodate
the new configuration. If your network topology is such that there is no real redundancy or
chance for loops, you do not need to turn on Spanning Tree.
zl2d is a shell script used to create Linux bridges consisting of the name of the previously
created zhp device or devices preceded with a "b" (for example, if you are creating a Bridge
device from zhp0, the resulting device would be bzhp0). zl2d then starts a background task
that monitors the port information of the Linux bridge at a specified interval and updates the
Spanning Tree state fields in the hardware when necessary.
brctl(8) is called by zl2d for configuring certain RSTP parameters. For an explanation of
these parameters, see the IEEE 802.1d specification, or reference the brctl(8) man page in
Appendix A. The following demonstrates a simple example of setting up a Layer 2 switch and
starting RSTP.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 50
To Enable Rapid Spanning Tree:
Create a VLAN containing the ports that will be a part of the Linux bridge running Rapid
Spanning Tree. This example will use ports 0-3 (untagged):
zconfig zhp0: vlan1=zre0..3
zconfig zre0..3=untag1
Create a bridge device from the zhp device,
zl2d start zhp0
A Bridge device named bzhp0 should now exist consisting of ports zre0 through zre3 with
Spanning Tree enabled. To view the bridge device, use the brctl command,
brctl show
brctl showbr bzhp0
Port Path Cost
Each port has an associated cost that contributes to the total cost of the path to the Root Bridge
when the port is the root port. The smaller the cost, the better the path. The Ethernet Switch Blade
uses the following IEEE 802.1D recommendations based on the connection speed of your port:
Port Path Cost
Link Speed
Recommended Value
Recommended Range
10 Mb/s
100
50-600
100 Mb/s
19
10-60
1 Gb/s
4
3-10
10 Gb/s
TBD
TBD
To change the port path, use the brctl setpathcost option. For example, to set the port
priority to a value consistent with a gigabit interface,
brctl setpathcost bzhp0 zre1 4
Ethernet Switch Blade User's Guide
release 3.2.2j
page 51
Layer 3 Switch Configuration
The previous section outlines the Layer 2 switch configuration that is automatically configured
when you initially bring up the OpenArchitect switch. In order to communicate between Layer2
interfaces, you must properly setup routing.
The steps to build a Layer 2 switch involve creating a group of switch ports in a VLAN (or Layer
2 switching domain) and bringing that interface up. zconfig creates the VLAN group of switch
ports as well as a network interface. Use ifconfig(1M) on the network interface to bring up
the VLAN group with Layer 2 switching. Layer 3 routing information is then used to route
between the Layer 2 network devices.
Take a simple example of two VLANs configured on the switch, each with four ports. First
teardown any existing configuration,
zconfig –t
Use zconfig to create two new VLANs, each with four ports, and untag them,
zconfig zhp0: vlan1=zre1..4
zconfig zre1..4=untag1
zconfig zhp1: vlan2=zre5..8
zconfig zre5..8=untag2
Now, use ifconfig to assign each zhp interface an IP address,
ifconfig zhp0 10.0.0.1
ifconfig zhp1 11.0.0.1
At this point, the Linux host has enough information to route between the networks of the directly
attached interfaces, 10.0.0.0 via zhp0, and 11.0.0.0 via zhp1.
The next step is to enable the zl3d daemon to move that routing information from the host to the
Ethernet Switch Blade switching tables in silicon. Once enabled, zl3d will monitor the Linux
routing tables for changes in configuration and update the switch silicon tables. Start zl3d to
update the switch tables:
zl3d zhp0 zhp1
The Ethernet Switch Blade switch is now configured as a Layer3 switch that can route between
two Layer2 devices in silicon.
Using the S50layer3 Script
To modify the configuration to a Layer 3 switch, remove the S50layer2 file from the
/etc/rcZ.d directory, and replace it with the example script file, S50layer3.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 52
In the S50layer3 script separate VLANs are set up for each port. The VLANs, are labeled as
zhp0..zhpn. Each VLAN is associated with an individual zre interface. There is always a
one to one connection between VLANs and zhp interfaces. Remember, zre and zhp interfaces
can begin with a zero value but a VLAN cannot (that is, zhp0 has zre0 on vlan1, zhp1 has
zre1 on vlan2). Each zhp interface is assigned a separate IP address in the example script.
The S50layer3 script executes the following commands:
•
Runs zconfig command to create 48 untagged VLANs (one for each switch port).
/usr/sbin/zconfig zhp0..47: vlan1..48=zre0+
/usr/sbin/zconfig zre0..47=untag1+
NOTE : Double periods (..) after vlan1 and untag1 are used to indicate a range of values.
The plus (+) sign after zre1 is a wildcard character that means auto-incremented and
causes each zhp interface to hold only one zre (that is, zhp0 has zre1 on vlan1,
zhp1 has zre1 on vlan2).
Runs the Linux ifconfig(1M) command for each interface to assign default IP addresses
(10.0.0.43-10.0.47.43), sets the netmask and brings up the interfaces.
ifconfig zhp0 10.0.00.42 netmask 255.255.255.0 up
ifconfig zhp1 10.0.01.42 netmask 255.255.255.0 up
ifconfig zhp2 10.0.02.42 netmask 255.255.255.0 up
.
.
.
ifconfig zhp21 10.0.45.42 netmask 255.255.255.0 up
ifconfig zhp22 10.0.46.42 netmask 255.255.255.0 up
ifconfig zhp23 10.0.47.42 netmask 255.255.255.0 up
•
Runs the OpenArchitect zl3d. The zl3d application monitors the Linux routing tables
and updates the switch routing tables for each interface configured above.
/usr/sbin/zl3d zhp0..47
zl3d initially creates and adds each zhp interface (VLAN) to the switch routing tables. The
zhp0..zhp47 is shorthand for the list of interfaces (zhp0, zhp1, …, zhp47) to monitor
with zl3d.
To Modify the Layer 3 Script
•
Modify the example script you copied into the /etc/rcZ.d directory. Adjust and assign
Ethernet Switch Blade User's Guide
release 3.2.2j
page 53
the number of IP addresses as applicable. In the example below, the IP address is changed
for the interface in the ifconfig command line of the script.
From:
ifconfig zhp0 10.0.0.43 netmask 255.255.255.0 broadcast
10.0.0.255 up
To:
ifconfig zhp0 193.08.1.1 netmask 255.255.255.0 broadcast
193.08.1.255 up
•
•
•
•
Adjust the number of zhp interfaces, that are added to the routing tables, depending on the
number of VLANs you are adding for your network. Include any other details, as
applicable.
Run the OpenArchitect zsync command to save your changes.
zsync
Reboot the switch.
After rebooting, your switch works from your customized Layer 3 configuration.
Layer 3 Routing Protocols with GateD
An advanced networking configuration may require using the GateD software platform for
deployment of Routing Information Protocols (RIP 1 or RIP 2) and Open Shortest Path First
(OSPF) protocols. Once you’ve configured your Layer2 and Layer3 devices, start gated.
Using the S55gatedRip1 Script
To use GateD protocol with the switch, you need to copy two files into the same directory as your
Layer 3 configuration file. From the /etc/rcZ.d/examples folder, copy the example script file
and its corresponding GateD configuration file (for example, S55gatedRip1 and
gated.conf.rip1).
The example startup script executes the following commands (S55gatedRip1 is used as an
example):
•
Starts GateD with Rip1 using gated.conf.rip1 as the configuration file:
/usr/sbin/gated –f /etc/rcZ.d/gated.conf.rip1
The GateD conf file specifies the following configuration commands:
•
Implements the passive function so GateD is prevented from rerouting information to a
different interface if insufficient information is received.
interface 10.0.0.43 passive
Ethernet Switch Blade User's Guide
release 3.2.2j
page 54
interface 10.0.1.42 passive
interface 10.0.2.42 passive
.
.
.
interface 10.0.13.42 passive
interface 10.0.14.42 passive
interface 10.0.15.42 passive
•
Defines the netmask used in the interface.
define 10.0.0.43 netmask 255.255.255.0;
define 10.0.1.42 netmask 255.255.255.0;
define 10.0.2.42 netmask 255.255.255.0;
.
.
.
define 10.0.13.42 netmask 255.255.255.0;
define 10.0.14.42 netmask 255.255.255.0;
define 10.0.15.42 netmask 255.255.255.0;
•
Sets the RIP1 protocol to open.
};
rip1 yes{
•
Shuts off sending and receiving packets from all interfaces.
interface all noripin noripout
•
Opens sending and receiving packets for selected interfaces.
interface 10.0.0.43 ripin ripout version 1;
interface 10.0.1.43 ripin ripout version 1;
interface 10.0.2.43 ripin ripout version 1;
.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 55
.
.
interface 10.0.13.43 ripin ripout version 1;
interface 10.0.14.43 ripin ripout version 1;
interface 10.0.15.43 ripin ripout version 1;
•
Imports routes learned through the RIP protocol.
import proto rip {
all;
};
•
Exports all directly connected routes and routes learned from the RIP protocol.
export proto rip {
proto direct }
all;
};
proto rip {
all;
};
To Modify the GateD Scripts:
Copy two GateD files, the OpenArchitect "S" file and its corresponding conf file, into the rcZ.d
directory (that is, S55gatedRip1 and gated.conf.rip1). Notice the files are placed in the
same directory as the Layer 3 configuration file.
For RIP1:
cp /etc/rcZ.d/examples/S55gatedRip1 /etc/rcZ.d
cp /etc/rcZ.d/examples/gated.conf.rip1 /etc/rcZ.d
Or for RIP2:
cp /etc/rcZ.d/examples/S55gatedRip2 /etc/rcZ.d
cp /etc/rcZ.d/examples/gated.conf.rip2 /etc/rcZ.d
Ethernet Switch Blade User's Guide
release 3.2.2j
page 56
Or for OSPF:
cp /etc/rcZ.d/examples/S55gatedOspf /etc/rcZ.d
cp /etc/rcZ.d/examples/gated.conf.ospf /etc/rcZ.d
Open and make configuration changes to the listed conf file to coincide with the current
Layer 3 configuration (that is, adjust IP addresses and number of interfaces available). See
GateD documentation if you have questions regarding the conf file.
• Run the OpenArchitect zsync command to save your changes. Be sure your changes are
correct:
Zsync
•
•
Reboot the switch. After rebooting, your switch operates as a Layer 3 switch with GateD
routing.
Class of Service (COS)
This following section provides information on using the OpenArchitect switch to provide Class
of Service (COS) support. The switching fabric architecture defines the scope of the COS
parameters. Some apply to an individual port, and others apply to the whole switch. It is
important for the user to understand the scope of the parameters to ensure that the expected
behavior occurs.
Egress Queues
The Ethernet Switch Blade fabric switch provides 1 to 8 COS queues per egress port, and for
packets destined to the CPU from the switching fabric. By default, a freshly booted
OpenArchitect switch has a single queue per egress port (and the CPU).
Ingress Classification
Incoming packets are mapped to queues based on their priority tags. The built-in behavior of the
Ethernet Switch Blade uses the 802.1p tag within a packet as the queue selector. There is one
COS to queue selector map per port.
By using the Linux iptables utility and zfilterd with ztmd, the queue selection can be
based on any information in the first 64 bytes of the IP packet header. The default OpenArchitect
switch behavior has all COS values mapping to a single queue on each of the egress ports.
A default priority for an untagged packet can be assigned for each port. By default, these
incoming priority values are all mapped to COS queue 0. To change the default priority for
untagged packets, or to define the mapping from priority values to COS queues, use the zcos
command (refer to Appendix A).
Ethernet Switch Blade User's Guide
release 3.2.2j
page 57
Marking and Re-marking
The OpenArchitect switch can mark or remark packets using the TOS field or 802.1p tag. This is
also controlled through the Linux iptables utility.
Scheduling
The servicing of configured queues by the switching fabric is referred to as scheduling. The
OpenArchitect switch has three built-in scheduling algorithms. The type of scheduling algorithm
used is implied, rather than being explicitly specified, based on the number of queues and which
options are configured. The following scheduling algorithms are provided:
First In First Out (FIFO) – When only one queue is configured per port, packets are serviced in
the order in which they arrive. This is the default for the OpenArchitect switch.
Strict Priority – This algorithm is used when more than one queue is provisioned on the port. The
highest priority queue, which is also the highest numbered one, is always serviced first (Example:
If four queues are configured, queue three is of higher priority than queue zero). As long as there
are packets in the highest priority queue, the lower priority queues are not serviced. The danger is
that higher priority traffic could block lower priority traffic.
Weighted Round Robin (WRR) – This algorithm is similar to Strict Priority scheduling, but it
provides fairness with quanta for each queue. Each queue is assigned a number of packets, known
as weight, that it is allowed to transmit before it yields to a lower priority queue. Note that with
WRR, the priorities of the queues are dependent on the weights allocated. A higher priority queue
with a smaller weight will get less wire-time than a lower priority queue configured with a larger
weight. The relative weights used for priority queues on a port can be set using the zcos
command (this is a switch-wide parameter).
ztmd Explained
ztmd is a traffic management daemon which accepts messages from traffic filtering and quality
of service applications and sets up the hardware.
zfilterd Explained
zfilterd is a daemon that intercepts filtering rules entered by the user via iptables, checks
them for validity and then passes them on to ztmd for entry in the switch.
Running zfilterd
Before starting zfilterd, ztmd must be running. Your can start both from within a script, or
directly from the command line. For example,
ztmd
zfilterd
iptables rules can be entered at any time. If your iptables filtering rules set is extensive,
Ethernet Switch Blade User's Guide
release 3.2.2j
page 58
you may want to move your set of iptables commands to a start up script to run upon
initialization. This could be accomplished by creating a standalone "S" script and placing that
script into / etc/rcZ.d.
Restrictions on Implementation
Several restrictions exist on the rules that can be implemented on the FFP hardware. These
include:
Actions
DROP the packet. ACCEPT the packet.
Output Port
Should be specified if the action is ACCEPT, if no output port is specified, an IRULE table entry
is generated for every port.
Field values
If specified as ranges, they must be on power of two boundaries.
Negation
Can only be used for icmp, tcp, or udp fields.
Fields supported are: Source IP address, destination IP address, IP protocol, TCP or UDP source
port or destination port, ICMP type, and TCP flags bits (such as SYN).
The input port and output port may also be specified as either zre<n>, where <n> is one of the
48 physical ports, or as zhp<n>, where the zhp interface used must be previously defined using
zconfig.
A restriction on the fields supported is the size of the IMASK table. There are only 16 entries per
port available, which means only 16 combinations of fields can be used at any time.
Conflict Resolution
There are differences from the expected behavior of implementing iptables in a host:
Although the rules are taken from the FORWARD and INPUT chains, they are applied to all
packets, including those destined for the local CPU. The order of application of the rules is not
necessarily the order in which they appear in the chains. If a rule uses a mask that is less
restrictive than another rule, it will be applied first. The last rule that is matched determines the
action that will take place. For example, the rules:
iptables -a FORWARD -i zhp3 -j DROP
smtp
iptables -a FORWARD -i zhp3 -o zhp1 -p tcp --dport
-j ACCEPT
result in SMTP packets received on any port in zhp3 to be sent for any port in zhp1; all other
packets from zhp3 would be dropped. The order of the two rules in the FORWARD chain does
not matter.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 59
On the other hand, in the following sequence of rules, the position of the rule that drops SYN
packets is important. Since the set of fields it examines is not a subset of the fields examined by
the ACCEPT rules, and visa versa, the ordering rule given above does not apply. In this case, the
order it is applied will be the same as its position in the FORWARD chain, and all packets which
are TCP SYN packets from zhp5 for zhp3 will be DROPPED, even if they also match one of
the ACCEPT rules.
iptables -a FORWARD -i zhp5 -o zhp3 -j DROP
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --sport smtp -j
ACCEPT
iptables -a FORWARD -i zhp5 -o zhp3 -p udp --sport domain -j
ACCEPT
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --sport domain -j
ACCEPT
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --sport www -j
ACCEPT
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --sport 23 -j
ACCEPT # rsync
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --syn -j DROP
iptables and filtering
iptables is a firewall management user-space utility used in conjunction with the Linux 2.4
kernels, and takes advantage of the netfilter 2.4 kernel code. iptables is extended with a few
more targets to support the hardware filtering functionality used in the chips on the Ethernet
Switch Blade (fabric board). Generally, all of the iptables functionality is usable with a few
minor extensions.
A more detailed source on iptables can be found at:
http://www.netfilter.org/
Almost all the contents described here are derived from there.
There are also many tutorials and iptables manipulation tools, both graphical and command
line. This is expressive of the Open Architect concept. A good place to start is:
http://freshmeat.net/search/?q=iptables
Introduction
Firewall rules are stored in tables. These tables are sometimes also known as firewall chains or
just chains. Tables normally store rules for what are known as hooks, which can be looked as
packet-path junctions. There are five defined hooks: PRE-ROUTE, POST-ROUTE, INPUT,
OUTPUT and FORWARDING. The example below illustrates the default chains on boot up.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 60
By default, INPUT, FORWARD and OUTPUT chains are installed on boot up. Additional rules
can be installed for the other chains. Additionally, one can write software extensions to add more
chains. Figure 4.2 provides an illustration of the Firewall Flow.
In
c
o
m
in
g
P
re
ro
u
te
In
p
u
t
R
o
u
tin
g
D
e
c
is
io
n
F
o
rw
a
rd
L
o
c
a
l
P
ro
c
e
s
s
P
o
s
t
R
o
u
te
O
u
tg
o
in
g
O
u
tp
u
t
Figure 4.2: Firewall Flow
When a packet reaches a circle in the diagram, that chain is examined to decide the fate of the
packet. Two basic fates of a packet are defined as DROP and ACCEPT. If the chain says to
DROP the packet, it is killed there; however, if the chain says to ACCEPT the packet, it
continues traversing the diagram, ultimately terminating at an application or getting forwarded
out of the box. There are additional actions which may be applied to packets. These are
described in the "Supported Targets" section.
A chain is a checklist of rules. Each rule is checked against the packet header and if a rule
matches, action is taken. If the rule doesn't match the packet, then the next rule in the chain is
consulted. Finally, if there are no more rules to consult, then the kernel looks at the chain default
policy to decide what to do. In a security-conscious system, this policy usually tells the kernel to
DROP the packet.
In the Ethernet Switch Blade product, both the FORWARD chain hook, and the INPUT chain
hook (packets destined for the CPU) are implemented in hardware. The rest of the hooks are in
software in the Linux kernel. An extension of the FORWARD hook also resides in software. It is
important to note that this is in sync with routing being implemented in hardware with software
assist for exception handling. Under general circumstances, when routing happens in hardware,
only the FORWARD chain is traversed. Under exceptional handling of an incoming packet, one
can force the full software traversal. As a router you do not really care about the other hooks
except in the situation where you have some special handling, in which case a policy would force
the packet to be sent to the CPU for further processing.
NOTE: This is also how one would extend the OA packet munging capabilities (for
example, introduce NAT).
Packet Walk
When a packet comes in via one of the interface ports, the Ethernet Switch Blade makes a routing
decision. If the packet was destined for the Ethernet Switch Blade fabric switch itself or if the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 61
send to CPU action is specified, it is sent to the INPUT chain for further processing. If there is no
valid way to forward the packet, it is dropped. If the switch is configured to forward the packet, it
is sent to the FORWARD chain.
Next the hardware FORWARD chain is walked. If there is a rule inserted that matches the packet
headers, then it is looked up next. The inserted policy will decide the packets fate.
In essence, a filter rule will be used to scan the packet data for certain characteristics. Upon a
match a selected 'target' is executed. The target decides what should happen to the packet.
Filter Rules Specifications
A rule could be added (-a) to a chain, deleted (-D) from a chain, replaced (-R) from a chain or
inserted (-I) in a specific position in a chain. Each rule specifies a set of conditions the packet
must meet, and what to do if it meets them ('what to do' is referred to as a `target').
Here's an example filter rule:
iptables -a FORWARD -p UDP -s 0/0 -d 10.0.0.1/32 --source-port
53 -j DROP
This adds to the FORWARD chain the rule: "If you see UDP packets (-p UDP) from anywhere
(-s 0/0) going to host 10.0.0.1 (-d 10.0.0.1/32) with a source port number 53 (--source-port 53)
then the target is to DROP (-j DROP). More details on rule specifications follow.
Specifying Source and Destination IP Addresses
Source ( -s, --source or --src) and destination (-d, --destination or --dst) IP addresses can be
specified in four ways. The most common way is to use the full name, such as localhost or
www.linuxhq.com. The second way is to specify the IP address such as 127.0.0.1.
Netmasks can be applied to IP addresses to specify ranges, like199.95.207.0/24 or
199.95.207.0/255.255.255.0 Both specify any IP address from 199.95.207.0 to 199.95.207.255
inclusive. To specify an all-inclusive IP address /0 can be used, like: -s or -d 0/0. The example
rule we use above applies this trick. Note however that the effect above is the same as not
specifying the -s option at all.
Specifying Protocol
The protocol can be specified with the -p (or --protocol) flag. Protocol can be a number (if you
know the numeric protocol values for IP) or a name for the special cases of TCP, UDP or ICMP.
Case does not matter, so tcp works as well as TCP.
Specifying an ICMP Message Type
If the protocol is ICMP, the --icmp-type option can be used to match a specific message type,
for example, --icmp-type ping
Ethernet Switch Blade User's Guide
release 3.2.2j
page 62
The type can be preceded by ! to match any message except the type listed, for example, -icmp-type ! 1
Specifying TCP or UDP ports
If the protocol is TCP or UDP, the -s ( or --sport) and -d (or --dport) options specify the
TCP or UDP ports to match.
A range of ports can be specified by giving the first and last ports separated by a :, as in -dport 0:1023. It is also possible to precede the port specification with a ! to match all ports
which are not included in the range, for example, --sport ! 0:1023. However, the range of
ports must be a power of two, starting with a port number which is a multiple of the range.
Specifying TCP flags
If the protocol is TCP, a match on particular TCP flags is specified by listing the flag names; for
example, -p tcp --syn.
Specifying an Interface
The -i (or --in-interface) and -o (or --out-interface) options specify the name of an
interface to match. An interface is the physical device the packet came in on (-i) or is going out
on (-o). You can use the ifconfig command to list the `up' interfaces (for example, working
at the moment).
As a special case, an interface name ending with a + will match all interfaces, whether they
currently exist or not, which begin with that string. For example, to specify a rule which matches
all zhp interfaces, the -i zhp+ option would be used.
Filter Rule Targets
As mentioned above the -j construct within a rule specifies which target is to be used in filter rule
to define a target.
Supported Targets
The following are the supported targets. The switch has many additional targets that are software
based (example Network Address Translation or generic connection tracking).
Classical Targets
DROP This drops the packet.
ACCEPT
Accepts the packet
ZNYX Targets
ZACTION
This is the ZNYX Action target.
Parameters for ZACTION:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 63
--drop
--accept
Drops the packet
Accepts the packet
--set-prio <val>Set the 802.1p priority to <val>
--use-prio <val>Use queue priority <val>
--copy-cpu Send the packet to the CPU.
installed chains traversal in software
--set-eport <val>
This will force the full
Redirect the packet to port <val>
--set-mport <val> Mirror the packet to port
<val>
--set-tos <val> Set the IP-Precedence bits in the TOS field of
the IP header to <val>
--set-dscp <val>Set the 6-bit DSCP
header to <val>.
in the TOS field of the IP
Options with any of these ZACTION parameters:
--counter <val> Increment classifier hit counter <val>
--arp
Not an action,
match only ARP packets.
-i option can be used to specify ingress port or VLAN,
-d specifies target IP address,
-p specifies arp operation as request (1) or response (2).
For arp response, the -o field can be used to specify the egress port.
ZACTION Examples
Send all tcp packets arriving on zhp5 out port 2:
iptables -a FORWARD -i zhp5 -p tcp -j ZACTION --set-eport 2
Send all tcp packets arriving on zhp5 to the CPU (software).
iptables -a FORWARD -i zhp5 -p tcp -j ZACTION --copy-cpu
Set the 802.1p priority to 3 on all tcp packets arriving on zhp5.
iptables -a FORWARD -i zhp5 -p tcp -j ZACTION --set-prio 3
Extensions to the default matches
These are described in the Linux packet filtering HOWTO at:
http://netfilter.org/documentation/index.html#documentation-howto
Ethernet Switch Blade User's Guide
release 3.2.2j
page 64
FORWARDING Chain supports all of them.
tc and zqosd
tc, which stands for Traffic Control, is a mechanism for enabling Quality of Service on Linux.
tc uses three functional objects: queuing disciplines, which comprise queuing and scheduling
algorithms such as FIFO queues, priority queues, RED queues, and token buckets; classes, which
are leafs in queuing discipline hierarchies; and filters, such as u32 filters and route filters. In
addition to these three building blocks, tc also includes policers and meters, which may be
associated with filters.
The functional elements of tc may be combined to produce complex QoS rules. For example, a
packet may be matched to a filter, metered, policed as in-profile or out-of-profile, remarked,
mapped to a FIFO queue, and transmitted by a priority scheduler. tc is very flexible in the data
paths that it allows.
The utility zqosd is a daemon that monitors Linux QoS policy and shadows the policy rules into
a hardware configuration. When zqosd is running, tc rules are translated into hardware rules.
NOTE: This document does not detail all of the capabilities of the tc command, rather it
explicitly mentions only features that are supported by OpenArchitect-based switches.
The examples that follow assume that the switch is running the standard Layer 2 start-up script,
/etc/rcZ.d/examples/S50layer2, with all ports placed in a single VLAN, zhp0. Note that
this assumption is implied only by the fact that changes to zhp0 are shown to configure all ports.
Neither tc nor zqosd is limited by the interface setup. Each utility works on either VLANs
(zhp) or ports (zre).
FIFO Queues (pfifo and bfifo disciplines)
The simplest configuration for tc involves no classes or filters, and only a single FIFO queue.
With tc, queue sizes may be specified in bytes or packets. The first example defines a packetlimited FIFO. This example begins with only tc and then illustrates tc in conjunction with
zqosd.
As a first step, confirm that no tc configuration is active on the switch, by listing any queue
disciplines:
tc qdisc ls
The command should return nothing. Now, add a single packet-limited FIFO queue to zhp0 and
confirm that it has been installed to software:
tc qdisc add dev zhp0 handle 100:0 root pfifo limit 32
tc qdisc ls
The output should display the following,
Ethernet Switch Blade User's Guide
release 3.2.2j
page 65
qdisc pfifo 100: dev zhp0 limit 32p
The tc command is applied to a device, so dev zhp0 must be specified. Note that a VLAN,
such as zhp0, and a port, such as zre0, are each treated as devices. Breakdown of the options:
handle 100:0
Defines the handle for the queuing discipline. This handle may be used to reference the pfifo
queue. Note that the handle is included with the output of the qdisc ls command.
(100:0 and 100: are equivalent in tc.) The choice of handle is significant for zqosd.
root
Tells tc that this is the base queuing discipline for the device, not a child of another queuing
discipline.
pfifo limit 32
Specifies a packet-limited FIFO queue with an upper bound of 32 packets.
Now, delete the queuing discipline from zhp0 and confirm that it has been removed:
tc qdisc del dev zhp0 root
tc qdisc ls
Thus far, tc has been used without zqosd. It is not sufficient to install software rules on the
OpenArchitect switch though, because the normal case is for packets to be switched in hardware.
For that reason, zqosd must be used to shadow tc configuration into hardware. Like
zfilterd, zqosd works with ztmd, which provides the actual hardware interaction.
If ztmd is not already running, start it:, then initiate the zqosd daemon with no parameters:
ztmd
zqosd
Now, repeat the same tc command as before, to install a packet-limited FIFO queue:
tc qdisc add dev zhp0 handle 100:0 root pfifo limit 32
When this command is processed, zqosd detects the state change and generates output.
For each port belonging to zhp0, the queue size has changed to 32 packets. Under the default
switch configuration, all ports other than the CPU port belong to zhp0; so all queues other than
the CPU queue are affected.
As before, remove the tc configuration with the command:
tc qdisc del dev zhp0 root
Note that zqosd detects this state change. In fact, examining the CoS configuration on the
switch reveals that the queue sizes have reverted to their default values.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 66
The byte-limited FIFO queue case differs only slightly from the packet-limited FIFO case. The
syntax is almost identical. In hardware the limit is based on 128-byte cells. The specified byte
limit is divided by 128 to determine the cell limit. Always specify a byte limit of at least 128
bytes to avoid setting the queue length to zero.
For example, to set the byte limit for zhp0 to 4096,
tc qdisc add dev zhp0 handle 100:0 root bfifo limit 4096
Tear down any installed rules before proceeding with the next example:
tc qdisc del dev zhp0 root
PRIO and WRR queues
The FIFO examples used a single queue for each interface. In fact, the Ethernet Switch Blade
fabric switch is capable of attaching 1 to 8 queues to each port, with either priority or weighted
round robin (WRR) scheduling, and classification based on a priority map.
In tc, the prio queuing discipline establishes multiple queues and specifies their associated
priority map. Although WRR support is not part of the standard tc distribution, it has been
added to the prio discipline.
The final example in this document illustrates WRR. A strict priority scheduler is a simpler case
that can be constructed easily from this example.
Examine the existing CoS settings on the switch, noting the number of queues per port, queue
sizes, scheduling parameters, and priority map. Each of these values changes with this test.
The full set of commands to install four queues, a priority map, and weights is as follows:
tc qdisc add dev zhp0 handle 100:0 root prio bands 4 priomap 1
2 2 2 3 3 3 3 1 1 1 1 1 1 1 1 wrr 1 2 4 6
tc qdisc add dev zhp0 parent 100:1 pfifo limit 120
tc qdisc add dev zhp0 parent 100:2 pfifo limit 100
tc qdisc add dev zhp0 parent 100:3 pfifo limit
80
tc qdisc add dev zhp0 parent 100:4 pfifo limit
60
The first command attaches a queuing discipline as the root discipline for zhp0, with a handle of
“100:0,” as in the FIFO cases. The “prio” option identifies the type of queuing discipline.
Priority scheduling implies multiple queues and the “bands 4” parameters specify that there are
four queues.
The priority map may be read from left to right as Priority n maps to Queue q, where n is the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 67
index of the list element (numbering from 0) and q is the value specified by that element. So, this
example would read:
Priority 0 maps to Queue 1
Priority 1 maps to Queue 2
Priority 2 maps to Queue 2
Priority 3 maps to Queue 2
Priority 4 maps to Queue 3
Note that the tc priority map applies to a 4-bit field. With the Ethernet Switch Blade, the
priority map refers to the 802.1p tag, which is a 3-bit field. When translating this tc rule to
hardware, only Priorities 0 through 7 are significant; the other eight priorities are ignored.
The parameters wrr 1 2 4 6 specify that WRR scheduling is being used and assigns a relative
weight to each queue. The weights are treated as numbers of packets to be sent from each queue.
In this example, if the queues have sufficient packets, queue 1 will have twice as many packets
sent as queue 0, queue 2 will have four times as many, and queue 3 will have six times as many.
wrr parameters are scaled such that the maximum value is no more than 15. values which would
be 0 are set to 1:
Queue 0 has a weight of 1000 bytes
Queue 1 has a weight of 2000 bytes
Queue 2 has a weight of 4000 bytes
Queue 3 has a weight of 6000 bytes
The remaining commands each define a packet-limited FIFO queue. As with all previous tc
examples, these queues are created on device zhp0. However, unlike all previous examples,
they are not created as root disciplines for the device. Instead, the “parent” option identifies them
as child queues of the prio discipline.
For example, “parent 100:1” identifies that queue as the first child of the prio discipline (Queue
0), because the prio discipline’s handle is 100:0.
After running each of those commands, again examine the CoS parameters. As with the simple
FIFO example, queue sizes change to 32 packets. In addition, though, the number of queues
changes to 4 for each port in zhp0. Furthermore, the weights have changed for each queue, as
have the queue mappings.
To test the strict priority case, simply remove the wrr 1 2 4 6 options from the first tc
command. Note that all queue disciplines in this test may be cleared by deleting the root
discipline, as before:
tc qdisc del dev zhp0 root
Ethernet Switch Blade User's Guide
release 3.2.2j
page 68
The U32 Filter
The U32 filter provides the capability to match on fields in the L2, L3 or L4 header of a packet.
Each match rule gives the location of the field to be tested, which is always a 32 bit word, a mask
selecting the bits to be tested, and a value which is to be matched by the packet field. Many
matches can be specified in one tc filter command. Only if all matches succeed does the filter
match. In that case, the flowid field identifies the classid of the class this packet belongs in.
The following tc commands put all icmp packets in class 100:10, packets from IP address
1.2.3.4 in class 100:20. Packets for IP address 1.2.3.4 in class 100:20, and arp reply packets in
class 100:30. The last filter illustrates using an offset from the beginning of the protocol header,
along with a mask, to locate the field to be matched
tc filter add dev zhp0 protocol ip parent 100:0 u32 match ip
protocol 1 0xff flowid 100:10
tc filter add dev zhp0 protocol ip parent 100:0 u32 match ip
src 1.2.3.4/32 flowid 100:20
tc filter add dev zhp0 protocol ip parent 100:0 u32 match ip
dst 1.2.3.4/32 flowid 100:20
tc filter add dev zhp0 protocol arp parent 100:0 u32 match u32
2 0xffff at +4 flowid 100:30
Combining Queuing Disciplines
Any of the queue length limiting disciplines can be used with the bandwidth management queue
disciplines, by defining them with the handle of one of the classes as their parent. For the htb
queueing discipline, each class has an explicit handle specified when it is defined. For the prio
queueing discipline, including wrr, each band is a class; their handles are formed from the
handle of the prio qdisc by appending a minor number of 1 to n for the n bands. For
example, the following commands define two strict priority queues for port zre5, with the lower
priority queue limited to 32 kb and the higher priority queue limited to 32 kb:
tc qdisc add dev zre5 root handle 100:0 prio bands 2 priomap 0
0 0 0 1 1 1 1
tc qdisc add dev zre5 parent 100:1 handle 110:0 bfifo limit
32kb
tc qdisc add dev zre5 parent 100:2 handle 120:0 bfifo limit
32kb
These translation rules handle conversions of individual rules from tc entries into hardware
entries. They do not explain the results of creating rules that are individually supported; but
which do not make sense in conjunction.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 69
Although the translation rules handle some inconsistency between software and hardware, a user
must define a combination of rules that is reasonable in hardware, to ensure predictable results.
Handle Semantics
All examples have illustrated zqosd copying tc rules into hardware. In fact, the zqosd utility
also enables the user to add tc rules that remain only in software. This selection is based on
handles. zqosd processes all supported queue disciplines and filters with handles between 100:0
and 200:FFFF.
COPS: Common Open Policy Service
The Common Open Policy Service (COPS) is a protocol for distributing networking policy to
devices such as switches and routers. COPS allows a single Policy Decision Point (PDP) to
distribute policy to multiple Policy Enforcement Points (PEPs). A PDP acts as a server for PEP
clients. Figure 4.3 Provides an illustration of the COPS Network Architecture.
PDP
PEP
PEP
PEP
Figure 4.3: COPS Network
Architecture
A PDP contains all of the policy rulers for its associated PEPs. A PDP typically stores rules in a
data and is a dedicated server, not a forwarding device.
A PEP is any network device that has to enforce policy decisions. For example, a switch that
restricts network access or prioritizes traffic fits the definition of a Policy Enforcement Point. A
PEP makes no policy decision. It simply applies policy that receives from its PDP.
COPS uses a connection-based query and response mechanism. The following scenario illustrates
PEP-PDP communication:
•
A PEP comes online and opens a connection to its PDP.
After a connection has been established, the PEP transmits state information to the PDP.
•
The PDP uses that state information to determine what policy is applicable for the PEP.
•
Ethernet Switch Blade User's Guide
release 3.2.2j
page 70
•
The PDP sends that policy to the PEP.
•
The PEP installs the policy and applies it to future traffic.
As long as COPS is running, a connection between the PEP and PDP should stay open. A PEP
could query a PDP at any time asking for a policy decision. Alternatively, an administrator could
modify the policy on a PDP, which would then push any policy changes to its PEPs.
Protocol Architecture
The COPS protocol is broken into several components. The base layer is the COPS protocol
itself, which defines the messaging format. This protocol defines how communication is handled
without specifying the details of the message data.
The base COPS protocol is then used by different client types. These client types apply the COPS
messaging scheme to particular types of data. The currently standardized client types deal with
the RSVP model (COPS-RSVP) and provisioning model (COPS-PR).
The COPS-RSVP scheme is designed around the requirement that a PEP will have to query a
PDP in response to events. An RSVP PEP is constantly listening for resource reservation requests
and relaying those requests to its PDP.
By contrast, the provisioning model is based on longer lasting policy. The expectation is that
policy should be administratively defined at the PDP and pushed to the PEPs as needed.
OpenArchitect is a COPS-PR client.
The most common use of COPS-PR is for distributing Differentiated Services (Diffserv) policy.
Diffserv is concerned with such Quality of Service elements as queues and schedulers.
OpenArchitect PEP
The OpenArchitect PEP implementation is known as pepd. The pepd utility is based on:
RFC 2478: Common Open Policy Service (COPS)
RFC 3084: COPS Usage for Policy Provisioning
RFC 3159: Structure of Policy Provisioning Information
RFC 3289: Management Information Base (MIB) for the Differentiated Services Architecture
Internet Draft: Differentiated Services Quality of Service Policy Information Base (latest version
draft-ietf-diffserv-pib-09)
Internet Draft: Framework Policy Information Base (latest version draft-ietf-rap-frameworkpib09)
A Policy Information Base (PIB) defines the representation of a particular data set. For example,
the Diffserv PIB specifies the structures used to represent all Diffserv elements. PIBs are
functionally equivalent to Management Information Bases (MIBs) such as those used by SNMP.
The OA PEP has implemented those portions of the Diffserv and Framework PIBs that are
supported by the underlying switch architecture.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 71
The pepd utility requires a PDP that has implemented the above RFCs and drafts. Until all draft
standards are approved, the certain COPS-PR data types will not be assigned OIDs. pepd uses
non-standard OIDs for the unassigned values.
Using pepd
The pepd utility works by connection to a PDP, informing the PDP of its roles, and installing
any rules that the PDP has for those roles. Configuration information should be specified in a
configuration file, specified on the command line with the –f option.
pepd –f <full_path_and_filename>
A sample configuration file is listed below:
PDP address: 10.0.0.11
PDP port: 3288
PEPID: some-id
Role-If: a zre1,zre2,zre3,zre4
where,
PDP address: The IP address of the PDP. Default is loopback (127.0.0.1)
PDP port:
The destination port on which to open a COPS connection. Default is 3288.
PEPID:
The PEP Identifier
Role-If:
A mapping of roles to interfaces. The name of the role is followed by a commadelineated list of interfaces. Multiple role-interface mappings are defined through multiple RoleIf declarations.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 72
Chapter 5 Fabric Switch Administration
One of the main benefits of the OpenArchitect switch is that it runs Linux, so much of the switch
administration is already familiar to most network or system administrators. It is a good idea to
complement these instructions with a standard Linux reference guide, such as Linux Network
Administrator’s Guide available from O’Reilly. Below are brief descriptions of some of the more
routine administrative task pertinent to the switch.
Setting the Root Password
The switch is shipped with a default user root and no password. To set the root password, use the
password command:
ZX7100-OA<release no.># passwd
Changing password for root
Enter the new password (minimum of 5, maximum of 8 characters)
Please use a combination of upper and lower case letters and
numbers.
Enter new password:
Re-enter new password:
Password changed.
ZX7100-OA<release no.>#
NOTE: Even when just changing the password, you need to save the file system overlay
with the zsync command, or you will lose your changes upon reboot.
Adding Additional Users
Additional users can be added with the adduser command. Additional users are desirable for
connecting to the switch via ftpd and other daemons that require a login other than root and a
password. To create a user named guest, run adduser
ZX7100-OA<release no.># adduser guest
Changing password for guest
Enter the new password (minimum of 5, maximum of 8 characters)
Please use a combination of upper and lower case letters and
numbers.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 73
Enter new password:
Re-enter new password:
Password changed.
ZX7100-OA<release no.># zsync
ZX7100-OA<release no.>#
Setting up a Default Route
If you wish to access the switch from some place other than a directly attached network, you may
want to setup a default route. Use the route command to set a default gateway.
route add default gw 10.0.0.254
Put the entry into the /etc/init.d/rcS startup script to automatically set a default route upon
reboot.
Name Service Resolution
Name service lookups will be done locally using /etc/hosts. You can also tell the switch
which name server to use by including an entry in /etc/resolv.conf.
DHCP Client Configuration
A utility is included to dynamically determine the IP address of the OpenArchitect switch
interfaces. To set the the IP address dynamically, execute the command,
dhclient zhp0
The default device name, zhp0, works with the default configuration of the OpenArchitect
switch and will attempt to obtain an IP address from the local DHCP server. To use DHCP to
set your IP addresses automatically on boot up, uncomment the the following line in
/etc/init.d/rcS by removing the # sign
/usr/sbin/dhclient zhp0
DHCP Server Configuration
The OpenArchitect switch includes a DHCP server. To start the DHCP server, configure
/etc/dhcpd.conf for your network, and run
Ethernet Switch Blade User's Guide
release 3.2.2j
page 74
dhcpd
Consult Linux Network administration manuals for more information on DHCP and
configuration options.
To use DHCP to set your IP addresses automatically on boot up, uncomment the the following
line in /etc/init.d/rcS by removing the # sign
dhcpd
Network Time Protocol (NTP) Client Configuration
NTP is a protocol for setting the real time clock on a system. There are numerous primary and
secondary servers available on the network. For more NTP information, and a list of available
NTP servers, see the following URL:
http://www.ntp.org/
You will need to have your network settings properly configured to reach an available NTP
server on your local network or the internet. To set the time and date, execute ntpdate with the
server of your choice. For example,
ntpdate –u ntp.ucsd.edu
The –u is required if the OpenArchitect switch is operating behind some types of firewalls.
If you wish for ntpdate to set your date and time automatically each time you boot,
uncomment the example ntpdate command line in /etc/init.d/rcS by removing the # sign.
ntpdate returns the Universal Time (UTC, formerly Greenwich Mean Time, or GMT). To
display the localtime, set the TZ variable to the appropriate name and the number of hours offset
from UTC. For instance,
export TZ=PST8
for Pacific Standard Time offset from UTC by 8 hours. To set an environment variable, add the
entry to /etc/profile. Remember to zsync to make your changes permanent.
Network File System (NFS) Client Configuration
The OpenArchitect switch includes an NFS client for mounting remote file systems. You will
need to start NFS server processes in order to use NFS. You will need to start the following
servers:
/sbin/portmap
Ethernet Switch Blade User's Guide
release 3.2.2j
page 75
/sbin/rpc.statd
/usr/sbin/rpc.mountd -r
Once the above servers are started, you can mount a remote NFS file system.
mount rhost:nfs_file_system local_mount_point
If the remote NFS file system you’re mounting is on an OA switch, you should mount with
caching disabled.
mount rhost:nfs_file_system –o noac local_mount_point
All the necessary servers are included in /etc/init.d/rcS but are commented out by default.
To automatically start all NFS client services each time you boot, uncomment the NFS Client
servers. Go to the /etc/init.d/rcS file. Uncomment the following command lines by
removing the # sign.
/sbin/portmap
/sbin/rpc.statd
/usr/sbin/rpc.mountd -r
You can also include commands to mount remote NFS file systems at boot time. There is an
example line included at the appropriate location in /etc/init.d/rcS. Uncomment and alter
the mount command included for your particular configuration.
NOTE: A “sleep” of 5 seconds is included to allow time for the links to come up prior to
attempting the mount.
sleep 5
mount 10.0.0.1:/nfs –t nfs –o noac /mnt
NFS Server Configuration
The switch also contains an NFS server so that you can mount the switch file system from other
systems. To enable the NFS server, first follow the steps to enable the NFS client. Then, edit
/etc/exports to include the file systems you wish to export. Consult a standard Linux Network
Administrator’s Guide (or man pages) regarding options for exported file systems. Generally, an
entry in /etc/exports looks like the following:
/nfs
*.localdomain.com(ro)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 76
Now start nfsd to export the mount points and begin answering requests from remote clients.
/sbin/rpc.nfsd –r
To export file systems automatically on boot, edit /etc/init.d/rcS, uncomment the
/sbin/rpc.nfsd command line by removing the #.
/sbin/rpc.nfsd -r
Connecting to the Switch Using FTP
Use ftp to transfer files to or from the switch. See the Linux Reference Guide for details of the
ftp command. In general, you can use ftp to connect to any system running an ftp server,
including other OpenArchitect switches, to either get (transfer files from the remote host to the
switch) or put (transfer files from the switch to the remote host) files.
ftp <remote_host>
ftpd Server Configuration
The switch itself can also be configured to run an FTP server (ftpd). See the Linux Reference
Guide for details of the ftpd command. You will need to add a user to the switch in order to
connect via ftp from a remote host, since root is not allowed ftp access. See the earlier section
in this chapter regarding how to add a user. The ftp daemon is started by default. If you wish to
shutdown the ftp daemon, comment out the betaftpd line in /etc/init.d/rcS.
Connecting to the Switch Using TFTP
Trivial File Transfer Protocol or tftp, is a very simple protocol used to transfer files. It is
designed to be small and easy to implement. Therefore, it lacks most of the features of a regular
FTP, like user authentication. You can use ftpd to connect to any system running a tftp server
(tftpd) including other OpenArchitect switches.
tftp <remote_host>
TFTPD Server Configuration
The tftp server is started by inetd(8) using the configuration set up in /etc/inetd.conf.
The use of tftp(1) does not require an account or password on the remote system. Due to the
lack of authentication information, tftpd will allow only publicly readable files to be accessed.
The default location of these files is /tftpboot.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 77
SNMP Agent
Simple Network Management Protocol (SNMP) is the defacto standard for network management.
An SNMP agent maintains a structure of data for a network device in a virtual information
database, called a Management Information Base (MIB). A network management station is
capable of accessing the MIB of the network device to monitor and configure the network device.
The OpenArchitect switch utilizes the NET-SNMP (formerly UCD-SNMP) agent core.
Additional information on the agent can be found at: http://www.net-snmp.com. The
OpenArchitect switch agent will respond to SNMPv1, SNMPv2, and SNMPv3 requests.
Protocols supported on the OpenArchitect switch by gated, such as RIP and OSPF communicate
with SNMP agent via the SNMP Multiplexing (SMUX) protocol.
Supported MIBS
OpenArchitect includes MIB support as documented by each of the RFCs listed. The MIBs
themselves are located on the switch in the /usr/share/snmp/mibs directory.
Supported MIBs
RFC 1155:
Structure and Identification of Management Information for TCP/IP-based
Internets
RFC 1227:
SNMP MUX Protocol and MIB
RFC 1493:
Definitions of Managed Objects for Bridges (obsoletes RFC 1286)
RFC 1657:
Definitions of Managed Objects for the Fourth Version of the Border
Gateway Protocol (BGP-4) using SMI-V2
RFC 1724:
RIP Version 2 MIB Extension (obsoletes RFC 1389)
RFC 1850:
OSPF Version 2 Management Information Base (obsoletes RFC 1253,
which obsoletes RFC 1252, which obsoletes RFC 1248)
RFC 2011:
SNMPv2 Management Information Base for the Internet Protocol Using
SMIv2
RFC 2012:
SNMPv2 Management Information Base for the Transmission Control
Protocol Using SMIv2
RFC 2012:
SNMPv2 Management Information Base for the User Datagram Protocol
Using SMIv2
RFC 2013:
Management Information Base for Network Management of TCP/IPbased internets: MIB-II (obsoletes RFC 1213, which obsoletes RFC
1158)
RFC 2021:
Remote Network Monitoring Management Information Base Version 2
RFC 2096:
IP Forwarding Table MIB
RFC 2571:
An Architecture for Describing SNMP Management Frameworks
RFC 2572:
Message Processing and Dispatching for the Simple Network
Management Protocol (SNMP)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 78
Supported MIBs
RFC 2573:
SNMP Applications
RFC 2574:
User-based Security Model (USM) for version 3 of the Simple Network
Management Protocol (SNMPv3)
RFC 2575:
View-based Security Model (VACM) for version 3 of the Simple Network
Management Protocol (SNMP)
RFC 2576:
Coexistence between Version 1, Version 2 and Version 3 of the Internetstandard Network Management Framework
RFC 2665:
Definitions of Managed Objects for Ethernet-like Interfaces
RFC 2674:
Definitions of Managed Objects for Bridges with Traffic Classes,
Multicast Filtering and Virtual LAN Extensions
RFC 2742:
Definitions of Managed Objects for Extensible SNMP Agents
RFC 2787:
Definitions of Managed Objects for the Virtual Router Redundancy
Protocol
RFC 2819:
Remote Network Monitoring Management Information Base
RFC 2863:
The Interfaces Group MIB (obsoletes RFC 2233, which obsoletes RFC
1573, which obsoletes RFC1229)
RFC 2932:
IPv4 Multicast Routing MIB
RFC 3165:
Definitions of Managed Objects for the Delegation of Management
Scripts
RFC 3231:
Definitions of Managed Objects for Scheduling Management Operations
ZNYX Networks Private
MIB
Custom ZNYX MIB to support software and hardware features not
covered by standard MIBs. The Private MIBs are ZX7100BASE.MIB AND
ZX7100FABRIC.MIB, pointed to by ZNYX-H.MIB.
UCD-SNMP Enterprise
MIB
UCD-SNMP MIB related to management and monitoring of the LINUX
host
Table 5.1: Supported MIBs
Supported Traps
Upon certain events, the OpenArchitect switch can be configured to send notification of the
event, called an SNMP Trap out to a defined recipient/manager or managers. Traps are not issued
in real time. OpenArchitect will send SNMP traps for the following conditions:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 79
Supported Traps
SNMPv2-MIB:
coldStart
SNMPv2-MIB:
authenticationFailure
IF-MIB:
linkUp
IF-MIB:
linkDown
UCD-SNMP-MIB:
ucdShutdown
RMON-MIB:
risingAlarm
RMON-MIB:
fallingAlarm
VRRP:
vrrpTrapNewMaster
VRRP:
vrrpTrapAuthFailure
EGP (rfc1213):
egpNeighborLoss
BGP4-MIB:
bgpEstablished
BGP4-MIB:
bgpBackwardTransition
Table 5.2: Supported Traps
SNMP and OpenArchitect Interface Definitions
OpenArchitect, defines three types of devices:
zre
physical port
zrl
trunk of ports
zhp
interface consisting of ports (zres) and trunks of ports
(zrls)
A zrl (trunk device) is treated as an aggregate of its constituent zres (ports). A zhp is an
aggregate of its immediately contributing sub-interfaces (zre's and zrl's). The ports that make
up a trunk do not contribute to the zhp.
The administrative status of a zre and zhp are independent of each other. If the administrative
status is down, then the operational status will be down independent of the underlying link state.
You must ifconfig up the zres to see the operational link status for a zre. When the
administrative status is up, the operational status is dependent on the underlying physical state.
For example, if zhp0 contains zre1 and zre2 the following would be true for the operational
status given the administrative status is up on zre1, zre2, and zhp0:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 80
Link and SNMP Status
Physical Link Status
SNMP Operational Status
zre1
zre2
zre1
zre2
zhp0
down
down
down
down
down
down
up
down
up
up
up
down
up
down
up
up
up
up
up
up
Table 5.3: Link and SNMP Status
The administrative status is directly controlled by ifconfig up/down. The administrative status
of the zhps and zres do not affect each other.
ifStackTable Entries
In the actual ifStackTable as shown in the MIB walk the following two OIDs (which denote
ifIndexes) show the relationships.
ifMIB.ifMIBObjects.ifStackTable.ifStackEntry.ifStackStatus.0.1 =
active(1)
ifMIB.ifMIBObjects.ifStackTable.ifStackEntry.ifStackStatus.0.2 =
active(1)
If they are X.Y then
if X = 0 there is nothing above this interface
if Y = 0 there is nothing below this interface
otherwise interface X has interface Y as a logical constituent.
•
•
SNMP Configuration
The SNMP agent is called snmpd and is started by default from the Linux boot up script
/etc/rcZ.d/S75snmpd. If you do not wish to start snmpd, remove etc/rcZ.d/S75snmpd.
Configuration of the OpenArchitect switch SNMP agent is the same as configuration of any
standard Linux host that uses the NET-SNMP agent. Configuration information for persistent
data and security information is kept in snmpd.conf under the default SNMP configuration
location, which for the OpenArchitect switch is /usr/share/snmp. snmpd.conf is the
location to change sys information such as the syslocation and syscontact, as well as
permissions such as the rocommunity or rwcommunity.
NOTE: For NET-SNMP agents, these objects (sysLocation.0, sysContact.0 and
sysName.0) ordinarily are read-write. However, specifying the value for one of these
objects by giving the appropriate token in snmpd.conf makes the corresponding object
read-only, and attempts to set the value of the object will result in a notWritable error
Ethernet Switch Blade User's Guide
release 3.2.2j
page 81
response.
The processing for link up and link down traps is now user configurable. As the default, traps
conform to RFC2863, meaning the trap contents will include:
ifIndex, ifAdminStatus and ifOperstatus
You can alter this behavior by specifying:
cisco_link_traps on
If cisco_link_traps are turned on as described then link up and link down traps will have a
cisco-like format and the trap contents will include:
ifDescr and ifType
Examine and edit /usr/share/snmp/snmpd.conf appropriately for your configuration.
Information in /usr/share/snmp/snmpd.conf is only read at startup - or when the daemon is
forced to reread its configuration. See the standard Linux man page for snmpd.conf for more
details.
SNMP Applications
The OpenArchitect switch includes the snmpget, snmpwalk, and snmpset applications you
can use these standard Linux utilities to test your SNMP agent. For example,
snmpwalk localhost –c public
walks the entire MIB of the localhost (that is, OpenArchitect switch) starting at the top of the
MIB. See the Linux Reference Man Pages for the usage of the SNMP utilities.
MIB values are decoded from their numerical representations into readable text by parsing MIBs
located in the /usr/share/snmp/mibs/ directory. If you need to add a MIB, add it to that
directory and zsync to save across reboots.
Port Mirroring
zmirror sets packet mirroring from a given set of ports to a given port. Turning on packet
mirroring causes a copy of the packet to be sent to the mirror_to port. There is only one
mirror_to port, and no limitation on mirror_from ports. Use the zmirror command in
the following way,
zmirror mirror_from mirror_to
After executing the following three commands, packets received on ports 0, 1 and 2 would be
Ethernet Switch Blade User's Guide
release 3.2.2j
page 82
mirrored (copied and transmitted) to port 12. This mirroring would be in addition to any Layer 3
or Layer 2 switching.
zmirror zre0 zre12
zmirror zre1 zre12
zmirror zre2 zre12
To clear the current mirroring use the -t option. The -e option can be used to indicate that
packets being sent on a given port should be copied to the mirror_to port. For example if the
-e option is used as follows, the packets transmitted, as opposed to received, on ports 0, 1 or 2
would be mirrored to port 12.
zmirror -e zre0 zre12
zmirror -e zre1 zre12
zmirror -e zre2 zre12
Link and LED Control
The zlc application sets the link speed and state of individual ports of the switch, or display their
current state. It can also set or clear the extract led or the internal fault led, or to set a port down
or up. To force the link on port 0 down,
zlc zre1 down
To check the status of a link,
zlc zre1 query
To check the status of all links,
zlc zre0..51 query
Link Event Monitoring
The zlmd application is intended to run as a daemon, waiting for a configured event to occur and
then running the program configured for that event. The events monitored are changes in the link
status at any of the in-band ports of the switch, the start of removal of the switch from the ATCA
backplane, or the cancellation of the removal before it actually takes place. The program can be a
shell script that initiates appropriate actions to respond to the event.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 83
Chapter 6 Fabric Switch Maintenance
This chapter includes basic information about the OpenArchitect switch environment including
an overview of the file system structure, modifying and updating switch files, upgrading the
switch driver and kernel, and implementing a system recovery.
Overview of the OpenArchitect switch boot process
The OpenArchitect switch is equipped with a Random Access Memory (RAM) disk and three
Read-Only Memory (ROM) devices, including, a boot ROM and two application flash devices.
Offset 0
Offset 0
zmon
Free space
Application
Flash 2 on
Device 2
Application
Flash 1 on
Device 1
Boot ROM
on Device 0
initrd
(exact copy
as in
Application
Flash 1)
initrd
Linux and
its file
system
Linux and its
file system
Offset 7f000
dev
bootstring
Free space
Free space
overlay
file system
Figure 6.1: ROM Devices in Open Architect
The boot ROM is located on device 0 and contains the OpenArchitect zmon application that
operates as a boot loader and includes a device bootstring. Device 1 contains the application
flash 1 image of the Linux operating system and the OpenArchitect overlay file system.
Application flash 1 is the primary working image for the switch. Device 2 contains the
application flash 2 that is an exact copy of application flash 1. You would only boot from this
device if application flash 1 is corrupted and you need to restore the switch to the factory-shipped
configuration.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 84
Bootloader examines the
bootstring in the boot
ROM
Determines
if the boot string
is dev1
Yes
Loads image
from Flash 1
to RAM
Yes
Loads image
from Flash 2
to RAM
No
Determines
if the boot String
is dev2
No
Begins
execution of
RAM image
Boot into
zmon bootloader
Figure 6.2: Boot Flow Chart
Under normal circumstances, the booting up process follows the process outlined in Figure 6.2.
During boot up, the zmon bootloader reads the device bootstring to locate and validate the
correct application image to load. The bootstring command is in the following format:
boot : X | [<options>]
X represents the device value 0, 1 or 2
The boot process opens and uncompresses the initrd image onto the RAM disk. Then zmon
begins booting the Linux image. After Linux boots, the init process executes the
/etc/init.d/rcS script which, in turn, executes /etc/rcZ.d/rc (see Figure 6.3: Init Script
Flow). The /etc/rcZ.d/rc script runs S* files in /etc/rcZ.d, with the start parameter. The S*
files are the switch configuration files (for example, S50layer2).
Ethernet Switch Blade User's Guide
release 3.2.2j
page 85
/etc/init.d/rcS
/etc/rcZ.d/rc
S*
S*
S*
Figure 6.3: Init Script
Flow
Saving Changes
Any modifications made to the scripts for your particular configuration must be properly saved or
your changes are lost when you reboot. The file system for the switch only exists in memory. A
rewritable overlay is contained within the upper four megabytes of the first application flash.
Modifying Files and Updating the Switch
Any file in OpenArchitect can be added, deleted or modified, with the exception of /sbin/init,
/usr/sbin/zmnt, /lib/modules/zfm_c.o, and the /tmp directory. Files are saved across a system
reboot by running the script zsync.
A directory /.zsync contains database files used by zsync for managing the file system
overlaying process. The user should not modify the files in this directory or unpredictable results
may occur.
Recovering from a System Failure
If the switch does not function after you initially change or reconfigure the image, you have
several options for recovering from an error. First, try to telnet into the switch. If you are
successful, remember to run zsync after fixing your problem.
If you cannot telnet, attach a console cable to the switch. Bring down the system and properly
attach the console cable, see Connecting to the Console Port .
System Boots with a Console Cable
After attaching the system console cable, if the system boots, fix the problem that does not allow
you to telnet to the box, run zsync, and reboot. The problem is likely to be in the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 86
configuration files contained in /e t c / r c Z . d In order to telnet into the box, there must be a
configured interface with a proper IP address. For example, zhp0 is configured with the IP
address 10.0.0.43 in the factory default configuration.
Booting with the –i option
If you cannot telnet into the switch and Linux fails to boot, it is likely that a change saved by
zsync has left the switch in an inaccessible state. To allow users to recover from mistakes saved
in the overlay file system, a boot argument of –i passed to the init process will stop the
untarring of the saved overlay files. As a result, the system boots to the factory-shipped
configuration.
•
Connect through the console port. During boot up, the system displays the Linux boot string.
Linux/PPC load: for 5 seconds. During the 5 second pause, enter the boot option -i
and press Return
Linux/PPC load: root=/dev/ram init=/sbin/init -i
•
Initiating the –i option of zbootcfg.
zbootcfg –d 1 –i
•
Reboot the system. After the reboot, clear the –i option from the boot string. Enter the
following command:
zbootcfg –d 1
The reboot command will also take -i as an option and pass it to the Linux boot,
reboot -i
When the system boots, the overlay file system is returned to the factory-installed
configuration. At this point, you have a few options.
• Run zsync and the factory-installed system will be restored to your flash.
•
CAUTION: All changes you have made and saved prior to the zsync command will
be lost.
•
Restore particular files from the existing overlay. Use the zmnt command to mount the
overlay in a designated directory and copy back just the changes you want to keep from the
existing overlay. For example, if you wanted to recover your /etc/hosts file from the
existing overlay, use zmnt to mount the overlay in a designated directory, like /tmp, then
copy /tmp/etc/hosts to /etc/hosts. Lastly, use zsync to save your changes.
zmnt /tmp
cp /tmp/etc/hosts /etc/hosts
Ethernet Switch Blade User's Guide
release 3.2.2j
page 87
zsync /etc/hosts
•
Reboot the system.
System Hangs During Boot
After attaching the system console cable, if the system hangs during boot, try booting with the –i
option as described in the previous section. It is possible that important Linux system files
became corrupted and incorrectly saved in the flash overlay. Use zmnt as described in the
previous section to fix or remove the problem files from the overlay. If the system will not boot
with the –i option, refer to Booting the Duplicate Flash Image section in this chapter.
Booting the Duplicate Flash Image
Another recovery method, if Linux fails to boot, is to temporarily boot the factory-installed
duplicate image located in the second flash device.
Connect through the console port.
When you see the number counter appear after the zmonitor … banner, press any key on the
console keyboard to enter the zmon application.
At the monitor prompt, type:
boot:2
You should see the counter again, but the system should boot into the secondary kernel. If you
have difficulties booting, contact Hewlett-Packard technical support.
At this point, follow the Upgrading the OpenArchitect Image section to put a new RAM disk
image in the application flash 1.
IMPORTANT: Be sure not to program flash 2, since currently this is your only
bootable image.
The command to program flash 1 should be similar to the following command. The image name
may be slightly different depending on the model of switch and version of the image:
zflash –d 1 rdr7100.zImage.initrd
Upgrading the OpenArchitect Image
Use telnet, or preferably, attach a console cable to the switch, and login to the switch. If you
are connecting via telnet, be aware that the upgrade process will reset the switch to the default
IP address of 10.0.0.43, so you will have to be able to reach 10.0.0.43.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 88
Download the OpenArchitect image to a local system.
The OpenArchitect image is very close to the limit of free space available on a default system so
you may need to clear some space prior to downloading the OpenArchitect image to the switch.
Check for free space with the df command.
One of the easiest ways to create free space is to remove /usr/sbin/gated. The application
will be replaced during the update procedure. Once you have enough free space, proceed.
From the switch console, ftp the new OpenArchitect (rdr) image from the local system to
your switch.
• The switch has two flash available: Device 1 and device 2. Use the zflash command to
write the new OpenArchitect image into the first flash device.
•
NOTE: Make sure that Surviving Partner is not running before using zflash. The delays
incurred while zflash writes the flash can cause the Surviving Partner daemons to think
there is a failure, resulting in link oscillation.
zflash -d 1 <image_file>
The image file will be something named similar to the following,
zflash -d 1 rdr7000.zImage.initrd
Upgrading or Adding Files
Follow the procedure below to upgrade or add a new file to the switch. Place the file you are
adding or upgrading into the appropriate location in the file system. Save the file in the overlay
directory area on the application flash by running zsync.
zsync
After running zsync, the file is saved to the flash for future reboots.
Excluding Saving Files to Flash
Specific files or directories can be excluded from saving to flash by zsync by including an entry
in /etc/exclude. Likewise, existing entries in /etc/exclude such as /tmp can be removed in
order to save those files to flash with zsync.
Upgrading the Switch Driver
The switch driver upgrade process is the same as a file upgrade. However, more caution should
be taken since the driver module is likely to be the method by which you are logging into the
system. If the switch driver has a problem, you will need to have a console cable to recover. To
upgrade a switch driver, replace the file /lib/modules/if_zxe.o, run zsync and reboot.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 89
Using apt-get
apt-get is a utility created by the Debian Linux community to allow remote fetching and
installation of software stored in a repository in Debian package format. It allows users to keep
their software up-to-date with the latest binaries, and install new software without the need to
recompile.
Users may create their own repositories and add entries in /etc/apt/sources.list
( empty by default ) for their private access methods to their private repository. See
http://www.debian.org for complete APT documentation.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 90
Chapter 7 Base Switch Configuration
At this point, the OpenArchitect Ethernet Switch Blade should be installed and powered up for
the first time. This chapter helps you connect and configure the base switch by presenting
command line examples as well as a discussion of the example configuration scripts. You may
configure the fabric switch independently from the base switch.
Two switches, two consoles
There are two separate switches in the Ethernet Switch Blade. The base switch handles traffic
among base ports 0-23. These ports are reserved for control functions on the ATCA rack such as
connecting to IPMI (shelf managers), and connecting each node card to control and monitoring
devices.
Connecting to the Base Switch Console
You can connect to the switch console using a telnet connection or with a console cable. Use
the procedure below for a telnet connection. See Connecting to the Console Port, for
instructions.
•
Connect an Ethernet cable to the host and the switch.
•
Configure a host on the 10.0.0.0 network.
•
The OpenArchitect switch is pre-configured with address 10.0.0.42. telnet to 10.0.0.42.
telnet 10.0.0.42
After you are connected, enter the login name root. No password is required.
ZX6000-OA login: root
ZX6000-OA<release no.>#
OpenArchitect Configuration Procedure
Switch configurations can be accomplished with a few simple commands. Once you’ve
configured your switch, the commands should be placed into a start up configuration script. Like
most Linux systems, the OpenArchitect switch boot process runs initialization commands and
scripts in /etc/init.d/. In particular, OpenArchitect runs /etc/init.d/rcS which in turn
executes all scripts located in /etc/rcZ.d starting with an uppercase “S” in alphabetical order.
Any configuration scripts you create should be named in the standard Linux/Unix manner,
starting with an uppercase “S” and numbered in the sequence you would like them executed. The
final step once the switch has been properly configured is to use the zsync command to save all
Ethernet Switch Blade User's Guide
release 3.2.2j
page 91
files into flash for reloading.
Changing the Shell Prompt
You may use standard bash shell procedures to change the prompts on your base switches. Many
sites choose a system that distinguishes among the individual switches at their location. The
same rules apply for saving your choice (zsync) as for all other configuration changes.
Default Configuration Scripts
As shipped the following scripts are run from /etc/rcZ.d as the switch boots up:
•
S20stack - Script that calls zstack to combine the two BCM5695 twelve-port
switch fabric chips into a single 24 port virtual switch. zstack must be run before
any other switch configuration.
•
S30e1000 - Script that loads the e1000 driver module for the Out-of-Band Ethernet
ports.
•
S40vpd - Script that checks the current OA version, and loads into the Vital Product
Data (VPD) area if necessary.
•
S50layer2 - Script that sets up a basic Layer 2 switch. All 24 10/100/1000 ports are
set up on one IP network (VLAN). The ISL is set up in its own vlan.
Example Configuration Scripts
Example scripts are supplied that can be used as templates. Use one of the scripts located in the
switch /etc/rcZ.d/examples directory to help you configure the switch. The default
configuration for the switch is located in the script file /etc/rcZ.d/S50layer2.
The following scripts are included. Each is examined in more detail later in the appropriate
section describing common Layer 2 and Layer 3 configurations:
•
S50layer2 - Script which sets up a basic Layer 2 switch. All 24 10/100/1000 ports are set
up on one IP network (VLAN). This is a copy of the switch in /etc/rcZ.d that is
loaded in the default configuration.
•
S50layer2sp - Script which sets up a basic Layer 2 switch. All 24 10/100/1000 ports
are set up on one IP network (VLAN), and turns on bridge support for Spanning Tree.
•
S50layer3 - Script which sets up a basic Layer 3 switch. All 24 10/100/1000 are set
up on individual IP networks (VLANs). Layer 3 switching is enabled.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 92
•
S50multivlan - Script which sets up multiple untagged VLANs. The first VLAN
includes the first ten 10/100/1000 ports, the next contains the last ten 10/100/1000
ports, the third VLAN contains two 10/100/1000 ports, the last VLAN contains the
last two 10/100/1000 ports. Layer 3 switching is enabled.
•
S55gatedRip1 - Script which is used with a Layer 3 switch and calls the GateD
daemon to enable RIP 1 routing protocol.
•
S55gatedRip2 - Script which is used with a Layer 3 switch and calls the GateD
daemon to enable RIP 2 routing protocol.
•
S55gatedOspf - Script which is used with a Layer 3 switch and calls the GateD
daemon to enable OSPF routing protocol.
Overview of OpenArchitect VLAN Interfaces
When you initially boot up the switch, one virtual host port is automatically created by
OpenArchitect to enable interaction between the software and hardware. This initial host port,
called ZNYX Host Port (zhp), is a network interface that provides communication between all
24 in-band ports. Therefore, linking to any port on the switch enables you to connect with
OpenArchitect.
A zhp device is associated with one Virtual Local Area Network (VLAN). A virtual local area
network (VLAN) is a logical mapping of workstations and network devices on some basis other
than geographic location (for example, by department, type of user, or primary application). The
primary purpose of a VLAN is to isolate traffic and enable communication to flow more
efficiently within groups of mutual interest. VLANs reduce the time it takes to implement
workstation and network moves, adds and changes. The switch is used to bridge from one VLAN
to another. Figure 7.1 is an illustration of multiple VLANs.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 93
Figure 7.1: Multiple VLANs
Tagging and Untagging VLANs
The OpenArchitect switch is capable of switching VLAN tagged and untagged data packets.
VLAN tagged packets conform to the 802.1q specification and the packet header contains an
additional four bytes of VLAN tag information. A given port can be specified to accept VLAN
tagged or untagged traffic. Internally, all traffic for a particular VLAN is treated as tagged traffic.
Switch Port Interfaces
For each switch port, OpenArchitect creates a separate interface with its own MAC address called
a ZNYX raw Ethernet (zre). After the initial power up, 24 zre interfaces are created, one for
each in band port. You cannot directly access or modify the zre interfaces.
During the initial power up of the switch, the default configuration creates a Layer 2 switch. The
Layer 2 configuration places all of the zre interfaces in the same zhp interface. The number
after zre represents the corresponding switch port number (that is, zre1 represents port 1 on the
switch).
Layer 2 Switch Configuration
The steps to build a Layer 2 switch involve creating a group of switch ports in a VLAN (or Layer
2 switching domain) and bringing that interface up. zconfig creates the VLAN group of switch
ports as well as a network interface. Use ifconfig(1M) on the network interface to bring up
the VLAN group. Figure 7.2 provides an illustration of a Layer 2 Switch connection.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 94
Linux IP
zhp0
10.0.0.42
VLAN 1
zre0
zre1
zre2
...
...
zre20 zre22 zre23
24 10/100/1000 Ports
Figure 7.2: Layer 2 Switch
During the initial power up, a startup script called /etc/rcZ.d/S50layer2 is executed at boot
time creating a single untagged VLAN (IP interface labeled as zhp0) which includes all Ethernet
and gigabit ports as one Layer2 switch. The interface to the host is then assigned the IP address of
10.0.0.42 to allow access to the switch. The S50layer2 script does the following:
•
Uses zconfig to create and configure a single, untagged VLAN that contains all 24 switch
ports.
/usr/sbin/zconfig zhp0: vlan1=zre0..23
/usr/sbin/zconfig zre0..23=untag1
•
Uses ifconfig(1M) to assign the IP address 10.0.0.42 to the interface.
/usr/sbin/ifconfig zhp0 10.0.0.42 up
To create another VLAN that only contained the two ports, first use zconfig from the
command to build the VLAN and create a network interface for the host.
zconfig zhp1: vlan2=zre20,zre21
Then, bring up the interface with ifconfig(1M):
ifconfig zhp1 193.08.1.1 up
Note that ports zre20 and zre21 are members of both vlan1 and vlan2, and that they are
tagged for vlan2. A port cannot be untagged for more than one VLAN. You can view the
configured VLANs with zconfig.
zconfig -a
Ethernet Switch Blade User's Guide
release 3.2.2j
page 95
Using the S50layer2 Script
The S50layer2 script can be used and example, or edited to customize your Layer2 setup. For
example, to reconfigure the IP address on your Layer 2 switch,
•
Open the S50Layer2 file in the Linux vi editor.
•
Change the IP address value listed under the Linux ifconfig(1M) command line.
•
Save your changes by running OpenArchitect zsync.
•
Reboot the switch.
Rapid Spanning Tree
The Rapid Spanning Tree Protocol (RSTP) configures a simply connected active topology from
the arbitrarily connected components of a Bridged Local Area Network. RSTP participants use a
simple dialog carried in packets called Bridge Protocol Data Units (BPDUs) for finding the
shortest path between two networks and for eliminating loops from the topology. If nodes
attached to ports fail or are added or deleted, the topology dynamically changes to accommodate
the new configuration. If your network topology is such that there is no real redundancy or
chance for loops, you do not need to turn on Spanning Tree.
zl2d is a shell script used to create Linux bridges consisting of the name of the previously
created zhp device or devices preceded with a "b" (for example, if you are creating a Bridge
device from zhp0, the resulting device would be bzhp0). zl2d then starts a background task
that monitors the port information of the Linux bridge at a specified interval and updates the
Spanning Tree state fields in the hardware when necessary.
brctl(8) is called by zl2d for configuring certain RSTP parameters. For an explanation of
these parameters, see the IEEE 802.1d specification, or reference the brctl(8) man page in
Appendix A. The following demonstrates a simple example of setting up a Layer 2 switch and
starting RSTP.
To Enable Rapid Spanning Tree:
Create a VLAN containing the ports that will be a part of the Linux bridge running Rapid
Spanning Tree. This example will use ports 0-3 (untagged):
zconfig zhp0: vlan1=zre0..3
zconfig zre0..3=untag1
Create a bridge device from the zhp device,
zl2d start zhp0
A Bridge device named bzhp0 should now exist consisting of ports zre0 through zre3 with
Spanning Tree enabled. To view the bridge device, use the brctl command,
Ethernet Switch Blade User's Guide
release 3.2.2j
page 96
brctl show
brctl showbr bzhp0
Port Path Cost
Each port has an associated cost that contributes to the total cost of the path to the Root Bridge
when the port is the root port. The smaller the cost, the better the path. The Ethernet Switch Blade
uses the following IEEE 802.1D recommendations based on the connection speed of your port:
Port Path Cost
Link
Speed
Recommended
Value
Recommended Range
10 Mb/s
100
50-600
100 Mb/s
19
10-60
1 Gb/s
4
3-10
Table 7.1: Port Path Cost
To change the port path, use the brctl setpathcost option. For example, to set the port
priority to a value consistent with a gigabit interface,
brctl setpathcost bzhp0 zre1 4
Layer 3 Switch Configuration
The previous section outlines the Layer 2 switch configuration that is automatically configured
when you initially bring up the OpenArchitect switch. In order to communicate between Layer2
interfaces, you must properly setup routing.
The steps to build a Layer 2 switch involve creating a group of switch ports in a VLAN (or Layer
2 switching domain) and bringing that interface up. zconfig creates the VLAN group of switch
ports as well as a network interface. Use ifconfig(1M) on the network interface to bring up
the VLAN group with Layer 2 switching. Layer3 routing information is then used to route
between the Layer2 network devices.
Take a simple example of two VLANs configured on the switch, each with four ports. First
teardown any existing configuration,
zconfig –t
Use zconfig to create two new VLANs, each with four ports, and untag them,
zconfig zhp0: vlan1=zre1..4
zconfig zre1..4=untag1
Ethernet Switch Blade User's Guide
release 3.2.2j
page 97
zconfig zhp1: vlan2=zre5..8
zconfig zre5..8=untag2
Now, use ifconfig to assign each zhp interface an IP address,
ifconfig zhp0 10.0.0.1
ifconfig zhp1 11.0.0.1
At this point, the Linux host has enough information to route between the networks of the directly
attached interfaces, 10.0.0.0 via zhp0, and 11.0.0.0 via zhp1.
The next step is to enable the ZNYX zl3d daemon to move that routing information from the
host to the base switch switching tables in silicon. Once enabled, zl3d will monitor the Linux
routing tables for changes in configuration and update the switch silicon tables. Start zl3d to
update the switch tables:
zl3d zhp0 zhp1
The base switch switch is now configured as a Layer3 switch that can route between two Layer2
devices in silicon.
Using the S50layer3 Script
To modify the configuration to a Layer 3 switch, remove the S50layer2 file from the
/etc/rcZ.d directory, and replace it with the example script file, S50layer3.
In the S50layer3 file, each port is assigned its own Virtual Local Area Network (VLAN)
interface (port interfaces are labeled as zhpN, where N is an integer). Each VLAN is associated
with an individual zhp interface. Remember, zre and zhp interfaces can begin with a zero value
but a VLAN cannot. Each zre interface is assigned a separate IP address in the example script
(see Figure 7.3).
Ethernet Switch Blade User's Guide
release 3.2.2j
page 98
Linux IP
zhp0 - zhp23
VLAN 2
zre1
VLAN 4 VLAN 6
zre3
VLAN 1 VLAN 3
zre0
zre2
zre5
VLAN 8 VLAN 10 VLAN 12 VLAN 14
zre7
VLAN 5
VLAN 7
zre4
zre6
zre9
zre11
zre13
VLAN16
zre15
VLAN18 VLAN20 VLAN22 VLAN24
zre17
VLAN 9 VLAN 11 VLAN 13 VLAN 15 VLAN17
zre8
zre10
zre12
zre14
zre16
zre19
VLAN19
zre18
zre21
zre23
VLAN21 VLAN23
zre20
zre22
Each vlan interface (zhp) has only one switch port (zre)
Figure 7.3: Layer 3 Switch
The S50layer3 script executes the following commands:
•
Runs zconfig command to create 24 untagged VLANs (one for each switch port).
/usr/sbin/zconfig zhp0..23: vlan1..24=zre0+
/usr/sbin/zconfig zre0..23=untag1+
NOTE: Double periods (..) after vlan1 and untag1 are used to indicate a range of values.
The plus (+) sign after zre1 is a wildcard character that means auto-incremented and
causes each zhp interface to hold only one zre (that is, zhp0 has zre1 on vlan1,
zhp1 has zre1 on vlan2).
•
Runs the Linux ifconfig(1M) command for each interface to assign default IP
addresses (10.0.0.42-10.0.23.42), sets the netmask and brings up the interfaces.
ifconfig zhp0 10.0.00.42 netmask 255.255.255.0 up
ifconfig zhp1 10.0.01.42 netmask 255.255.255.0 up
ifconfig zhp2 10.0.02.42 netmask 255.255.255.0 up
.
.
.
ifconfig zhp21 10.0.21.42 netmask 255.255.255.0 up
ifconfig zhp22 10.0.22.42 netmask 255.255.255.0 up
ifconfig zhp23 10.0.23.42 netmask 255.255.255.0 up
Ethernet Switch Blade User's Guide
release 3.2.2j
page 99
Runs the OpenArchitect zl3d. The zl3d application monitors the Linux routing tables
and updates the switch routing tables for each interface configured above.
/usr/sbin/zl3d zhp0..23
•
zl3d initially creates and adds each zhp interface (VLAN) to the switch routing tables. The
zhp0..zhp23 is shorthand for the list of interfaces (zhp0, zhp1, …, zhp23) to monitor with
zl3d.
To Modify the Layer 3 Script
•
Modify the example script you copied into the /etc/rcZ.d directory. Adjust and
assign the number of IP addresses as applicable. In the example below, the IP address is
changed for the interface in the ifconfig command line of the script.
From:
ifconfig zhp0 10.0.0.42 netmask 255.255.255.0 broadcast 10.0.0.255 up
To:
ifconfig zhp0 193.08.1.1 netmask 255.255.255.0 broadcast
193.08.1.255 up
•
•
Adjust the number of zhp interfaces, that are added to the routing tables, depending on
the number of VLANs you are adding for your network. Include any other details, as
applicable.
Run the OpenArchitect zsync command to save your changes.
zsync
•
•
Reboot the switch.
After rebooting, your switch works from your customized Layer 3 configuration.
Layer 3 Switch Using Multiple VLANs
An example script is also provided for setting up multiple VLANs each with multiple ports.
Using the S50multivlan Script
The Layer 3 switch example file, S50multivlan, is included to help you configure multiple
VLANs to a Layer 3 switch. A VLAN can include one or more switch ports. In the
S50multivlan file, four VLANs are created (see 4):
•
•
•
VLAN 1, zhp0: for the first set of six ports, zre0-zre5
VLAN 2, zhp1: for the second set of six ports, zre6-zre11
VLAN 3, zhp2: for the third set of six ports, zre12-zre17
Ethernet Switch Blade User's Guide
release 3.2.2j
page 100
VLAN 4, zhp3: for last set of six ports, zre18-zre23
Each VLAN interface is labeled zh p N in the file, where N is a value from 0-3. Each interface is
untagged and assigned its own IP address (see Figure 7.4).
•
Linux IP
zhp0
zhp1
zhp2
VLAN1
zhp3
VLAN3
zre1 zre3 zre5
zre13 zre15 zre17
zre0 zre2 zre4
zre12 zre14 zre16
VLAN2
VLAN4
zre7 zre9 zre11
zre6 zre8 zre10
zre19 zre21 zre23
zre18 zre20 zre22
Each VLAN (zhp) contains 6 ports (zre's)
Figure 7.4: Multiple VLAN Configuration
The S50multivlan script executes the following commands:
•
Runs zconfig to create and start four VLANs. Switch ports 0-9 are placed in the first
VLAN, ports 10-19 in the second VLAN, ports 20-21 are placed in the third VLAN, and
ports 22-23 are placed in the fourth VLAN.
/usr/sbin/zconfig zhp0: vlan1=zre0..5
/usr/sbin/zconfig zre0..5=untag1
/usr/sbin/zconfig zhp1: vlan2=zre6..11
/usr/sbin/zconfig zre6..11=untag2
/usr/sbin/zconfig zhp2: vlan3=zre12..17
/usr/sbin/zconfig zre12..17=untag3
/usr/sbin/zconfig zhp3: vlan4=zre18..23
/usr/sbin/zconfig zre18..23=untag4
NOTE: Double periods (..) after vlan1 and untag1 are used to indicate a range of values.
•
Runs the Linux ifconfig command for each interface to assign default IP addresses
Ethernet Switch Blade User's Guide
release 3.2.2j
page 101
(10.0.0.42-10.0.3.42), assigns the netmask and brings them up.
ifconfig zhp0 10.0.0.42 netmask 255.255.255.0 broadcast 10.0.0.255 up
ifconfig zhp1 10.0.1.42 netmask 255.255.255.0 broadcast 10.0.1.255 up
ifconfig zhp2 10.0.2.42 netmask 255.255.255.0 broadcast 10.0.2.255 up
ifconfig zhp3 10.0.3.42 netmask 255.255.255.0 broadcast 10.0.3.255 up
•
Runs the OpenArchitect zl3d command. The zl3d application monitors the Linux
routing tables and updates the switch routing tables for each interface configured above.
/usr/sbin/zl3d zhp0..3
zhp0 zhp1 zhp2 zhp3
The zl3d application initially creates and adds each zhp interface (VLAN) to the switch routing
tables.
To Modify the Layer 3 Multivlan Script
Modify the example script you copied into the /etc/rcZ.d directory. Adjust and assign
the number of IP addresses as applicable. In the example below, the IP address is changed
for the interface in the ifconfig command line of the script.
ifconfig zhp0 10.0.0.42 netmask 255.255.255.0 up
ifconfig zhp0 193.08.1.1 netmask 255.255.255.0 up
•
•
•
•
•
Adjust the number of zhp interfaces depending on the number of VLANs you are adding
for your network.
Include any other details, as applicable.
Run the OpenArchitect zsync command to save your changes.
Reboot the switch.
After rebooting, your switch works from your customized Layer 3 configuration with
multiple VLANs per port.
Layer 3 Routing Protocols with GateD
An advanced networking configuration may require using the GateD software platform for
deployment of Routing Information Protocols (RIP 1 or RIP 2) and Open Shortest Path First
(OSPF) protocols. Once you’ve configured your Layer2 and Layer3 devices, start gated.
Using the Provided S55gatedRip1 Script
To use GateD protocol with the switch, you need to copy two files into the same directory as your
Layer 3 configuration file. From the /etc/rcZ.d/examples folder, copy the example script file
and its corresponding GateD configuration file (for example, S55gatedRip1 and
gated.conf.rip1).
The example startup script executes the following commands (S55gatedRip1 is used as an
Ethernet Switch Blade User's Guide
release 3.2.2j
page 102
example):
•
Starts GateD with Rip1 using gated.conf.rip1 as the configuration file:
/usr/sbin/gated –f /etc/rcZ.d/gated.conf.rip1
The GateD conf file specifies the following configuration commands:
Implements the passive function so GateD is prevented from rerouting information to a
different interface if insufficient information is received.
interface 10.0.0.42 passive
•
interface 10.0.1.42 passive
interface 10.0.2.42 passive
.
.
.
interface 10.0.13.42 passive
interface 10.0.14.42 passive
interface 10.0.15.42 passive
•
Defines the netmask used in the interface.
define 10.0.0.42 netmask 255.255.255.0;
define 10.0.1.42 netmask 255.255.255.0;
define 10.0.2.42 netmask 255.255.255.0;
.
.
.
define 10.0.13.42 netmask 255.255.255.0;
define 10.0.14.42 netmask 255.255.255.0;
define 10.0.15.42 netmask 255.255.255.0;
•
Sets the RIP1 protocol to open.
};
rip1 yes{
•
Shuts off sending and receiving packets from all interfaces.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 103
interface all noripin noripout
Opens sending and receiving packets for selected interfaces.
interface 10.0.0.42 ripin ripout version 1;
•
interface 10.0.1.42 ripin ripout version 1;
interface 10.0.2.42 ripin ripout version 1;
.
.
.
interface 10.0.13.42 ripin ripout version 1;
interface 10.0.14.42 ripin ripout version 1;
interface 10.0.15.42 ripin ripout version 1;
Imports routes learned through the RIP protocol.
import proto rip {
•
all;
};
Exports all directly connected routes and routes learned from the RIP protocol.
export proto rip {
•
proto direct }
all;
};
proto rip {
all;
};
To Modify the GateD Scripts:
Copy two GateD files, the OpenArchitect "S" file and its corresponding conf file, into the rcZ.d
directory (that is, S55gatedRip1 and gated.conf.rip1). Notice the files are placed in the
same directory as the Layer 3 configuration file.
For RIP1:
cp /etc/rcZ.d/examples/S55gatedRip1 /etc/rcZ.d
Ethernet Switch Blade User's Guide
release 3.2.2j
page 104
cp /etc/rcZ.d/examples/gated.conf.rip1 /etc/rcZ.d
Or for RIP2:
cp /etc/rcZ.d/examples/S55gatedRip2 /etc/rcZ.d
cp /etc/rcZ.d/examples/gated.conf.rip2 /etc/rcZ.d
Or for OSPF:
cp /etc/rcZ.d/examples/S55gatedOspf /etc/rcZ.d
cp /etc/rcZ.d/examples/gated.conf.ospf /etc/rcZ.d
•
•
•
•
Open and make configuration changes to the listed co n f file to coincide with the current
Layer 3 configuration (that is, adjust IP addresses and number of interfaces available).
See GateD documentation if you have questions regarding the co n f file.
Run the OpenArchitect zsync command to save your changes. Be sure your changes are
correct.
zsync
Reboot the switch. After rebooting, your switch operates as a Layer 3 switch with GateD
routing.
Class of Service (COS)
This following section provides information on using the ZNYX Networks OpenArchitect switch
to provide Class of Service (COS) support. The switching fabric architecture defines the scope of
the COS parameters. Some apply to an individual port, some apply to a group of ports (known as
a block) and others apply to the whole switch. It is important for the user to understand the scope
of the parameters to ensure that the expected behavior occurs.
Egress Queues
The base switch provides 1 to 8 COS queues per egress port, and for packets destined to the CPU
from the switching fabric. By default, a freshly booted OpenArchitect switch has a single queue
per egress port (and the CPU).
Ingress Classification
Incoming packets are mapped to queues based on their priority tags. The built-in behavior of the
base switch uses the 802.1p tag within a packet as the queue selector. There is one COS to queue
selector map per port.
By using the Linux iptables utility and zfilterd with ztmd, the queue selection can be
based on any information in the first 64 bytes of the IP packet header. The default OpenArchitect
switch behavior has all COS values mapping to a single queue on each of the egress ports.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 105
Marking and Re-marking
The OpenArchitect switch can mark or remark packets using the TOS field or 802.1p tag. This is
also controlled through the Linux iptables utility.
Scheduling
The servicing of configured queues by the switching fabric is referred to as scheduling. The
OpenArchitect switch has three built-in scheduling algorithms. The type of scheduling algorithm
used is implied, rather than being explicitly specified, based on the number of queues and which
options are configured. The following scheduling algorithms are provided:
First In First Out (FIFO) – When only one queue is configured per port, packets are serviced in
the order in which they arrive. This is the default for the OpenArchitect switch.
Strict Priority – This algorithm is used when more than one queue is provisioned on the port. The
highest priority queue, which is also the highest numbered one, is always serviced first (Example:
If four queues are configured, queue three is of higher priority than queue zero). As long as there
are packets in the highest priority queue, the lower priority queues are not serviced. The danger is
that higher priority traffic could block lower priority traffic.
Weighted Round Robin (WRR) – This algorithm is similar to Strict Priority scheduling, but it
provides fairness with quanta for each queue. Each queue is assigned a number of packets, known
as weight, that it is allowed to transmit before it yields to a lower priority queue. Note that with
WRR, the priorities of the queues are dependent on the weights allocated. A higher priority queue
with a smaller weight will get less wire-time than a lower priority queue configured with a larger
weight. Note that the same weight applies to all queues of that priority on all ports (this is a
switch-wide parameter).
The zfilterd and iptables utilities are required to map packets to queues using
information other than the 802.1p tags.
zcos
zcos is a tool for examining queue and scheduling settings. It provides a means to set many of the
hardware features of the switch related to class of service and differentiated services processing,
including scheduling and bandwidth management. The current settings can also be examined.
See the zcos man page in Appendix B for details on all options.
zfilterd
zfilterd is a daemon that intercepts filtering rules entered by the user via iptables, checks
them for validity and then passes them on to ztmd for entry in the switch. See the zfilterd
man page in Appendix B for details on all options.
ztmd
ztmd is traffic management daemon which accepts messages from traffic filtering and
quality of service applications and sets up the hardware.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 106
Running zfilterd
Before starting zfilterd, ztmd must be running. Your can start both from within a script, or
directly from the command line. For example,
ztmd
zfilterd
iptables rules can be entered at any time. If your iptables filtering rules set is extensive,
you may want to move your set of iptables commands to a start up script to run upon
initialization. This could be accomplished by creating a standalone "S" script and placing that
script into /e t c / r c Z . d .
Restrictions on Implementation
Several restrictions exist on the rules that can be implemented on the FFP hardware. These
include:
Actions
DROP the packet.
ACCEPT the packet.
Output Port
Should be specified if the action is ACCEPT, if no output port is
specified, an IRULE table entry is generated for every port.
Field values
If specified as ranges, they must be on power of two boundaries.
Negation
Can only be used for icmp, tcp, or udp fields.
Fields supported are: Source IP address, destination IP address, IP protocol, TCP or UDP source
port or destination port, ICMP type, and TCP flags bits (such as SYN).
The input port and output port may also be specified as either zre<n>, where <n> is one of the
24 physical ports, or as zhp<n>, where the zhp interface used must be previously defined using
zconfig.
A restriction on the fields supported is the size of the IMASK table. There are only 16 entries per
port available, which means only 16 combinations of fields can be used at any time.
Conflict Resolution
There are differences from the expected behavior of implementing iptables in a host:
Although the rules are taken from the FORWARD and INPUT chains, they are applied to all
packets, including those destined for the local CPU. The order of application of the rules is not
necessarily the order in which they appear in the chains. If a rule uses a mask that is less
restrictive than another rule, it will be applied first. The last rule that is matched determines the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 107
action that will take place. For example, the rules:
iptables -a FORWARD -i zhp3 -j DROP
iptables -a FORWARD -i zhp3 -o zhp1 -p tcp --dport smtp -j ACCEPT
result in SMTP packets received on any port in zhp3 to be sent for any port in zhp1; all other
packets from zhp3 would be dropped. The order of the two rules in the FORWARD chain does
not matter.
On the other hand, in the following sequence of rules, the position of the rule that drops SYN
packets is important. Since the set of fields it examines is not a subset of the fields examined by
the ACCEPT rules, and visa versa, the ordering rule given above does not apply. In this case, the
order it is applied will be the same as its position in the FORWARD chain, and all packets which
are TCP SYN packets from zhp5 for zhp3 will be DROPPED, even if they also match one of the
ACCEPT rules.
iptables -a FORWARD -i zhp5 -o zhp3 -j DROP
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --sport smtp -j ACCEPT
iptables -a FORWARD -i zhp5 -o zhp3 -p udp --sport domain -j ACCEPT
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --sport domain -j ACCEPT
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --sport www -j ACCEPT
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --sport 23 -j ACCEPT # rsync
iptables -a FORWARD -i zhp5 -o zhp3 -p tcp --syn -j DROP
iptables and filtering
iptables is a firewall management user-space utility used in conjunction with the Linux 2.4
kernels. iptables takes advantage of the netfilter 2.4 kernel code.
In addition, the iptables utility is extended with a few more targets to support the hardware
filtering functionality used in the Broadcom BCM5695 silicon on the base switch. Generally, all
of the iptables functionality is usable with a few minor extensions.
A more detailed source on IPtables can be found at:
http://www.netfilter.org/
Almost all the contents described here are derived from there.
There are also many tutorials and iptables manipulation tools, both graphical and command
line. This is expressive of the Open Architect concept. A good place to start is:
http://freshmeat.net/search/?q=iptables
Ethernet Switch Blade User's Guide
release 3.2.2j
page 108
Introduction
Firewall rules are stored in tables. These tables are sometimes also known as firewall chains or
just chains. Tables normally store rules for what are known as hooks, which can be looked as
packet-path junctions. There are five defined hooks: PRE-ROUTE, POST-ROUTE, INPUT,
OUTPUT and FORWARDING. The example below illustrates the default chains on boot up.
By default, INPUT, FORWARD and OUTPUT chains are installed on boot up. Additional rules
can be installed for the other chains. Additionally, one can write software extensions to add more
chains. Figure 7.5 provides an illustration of firewall flow.
In
c
o
m
in
g
P
re
ro
u
te
In
p
u
t
R
o
u
tin
g
D
e
c
is
io
n
F
o
rw
a
rd
L
o
c
a
l
P
ro
c
e
s
s
P
o
s
t
R
o
u
te
O
u
tg
o
in
g
O
u
tp
u
t
Figure 7.5: Firewall Flow
When a packet reaches a circle in the diagram, that chain is examined to decide the fate of the
packet. Two basic fates of a packet are defined as DROP and ACCEPT. If the chain says to
DROP the packet, it is killed there; however, if the chain says to ACCEPT the packet, it
continues traversing the diagram, ultimately terminating at an application or getting forwarded
out of the box. There are additional actions that can be applied to packets. These are described in
the "Supported Targets" section.
A chain is a checklist of rules. Each rule is checked against the packet header and if a rule
matches, action is taken. If the rule doesn't match the packet, then the next rule in the chain is
consulted. Finally, if there are no more rules to consult, then the kernel looks at the chain default
policy to decide what to do. In a security-conscious system, this policy usually tells the kernel to
DROP the packet.
In the base switch, both the FORWARD chain hook, and the INPUT chain hook (packets
destined for the CPU) are implemented in hardware. The rest of the hooks are in software in the
Linux kernel. An extension of the FORWARD hook also resides in software. It is important to
note that this is in sync with routing being implemented in hardware with software assist for
exception handling. Under general circumstances, when routing happens in hardware, only the
FORWARD chain is traversed. Under exceptional handling of an incoming packet, one can force
the full software traversal. As a router you do not really care about the other hooks except in the
situation where you have some special handling., in which case a policy would force the packet to
be sent to the CPU for further processing.
NOTE: This is also how one would extend the OA packet munging capabilities (for
example, introduce NAT).
Ethernet Switch Blade User's Guide
release 3.2.2j
page 109
Packet Walk
When a packet comes in via one of the interface ports, the base switch makes a routing decision.
If the packet was destined for the base switch itself or if the send to CPU action is specified, it is
sent to the INPUT chain for further processing. If there is no valid way to forward the packet, it is
dropped. If the switch is configured to forward the packet, it is sent to the FORWARD chain.
Next the hardware FORWARD chain is walked. If there is a rule inserted that matches the packet
headers, then it is looked up next. The inserted policy will decide the packets fate.
In essence, a filter rule will be used to scan the packet data for certain characteristics. Upon a
match a selected 'target' is executed. The target decides what should happen to the packet.
Filter Rules Specifications
A rule could be added (-a) to a chain, deleted (-D) from a chain, replaced (-R) from a chain or
inserted (-I) in a specific position in a chain. Each rule specifies a set of conditions the packet
must meet, and what to do if it meets them ('what to do' is referred to as a `target').
Here's an example filter rule:
iptables -a FORWARD -p UDP -s 0/0 -d 10.0.0.1/32 --source-port 53 -j
DROP
This adds to the FORWARD chain the rule: "If you see UDP packets (-p UDP) from anywhere
(-s 0/0) going to host 10.0.0.1 (-d 10.0.0.1/32) with a source port number 53 (--source-port 53)
then the target is to DROP (-j DROP). More details on rule specifications follow.
Specifying Source and Destination IP Addresses
Source (-s, --source or --src) and destination (-d, --destination or --dst) IP addresses can
be specified in four ways. The most common way is to use the full name, such as localhost or
www.linuxhq.com. The second way is to specify the IP address such as 127.0.0.1.
Netmasks can be applied to IP addresses to specify ranges, like199.95.207.0/24 or
199.95.207.0/255.255.255.0 Both specify any IP address from 199.95.207.0 to 199.95.207.255
inclusive. To specify an all-inclusive IP address /0 can be used, like: -s or -d 0/0. The example
rule we use above applies this trick. Note, however, that the effect above is the same as not
specifying the - s option at all.
Specifying Protocol
The protocol can be specified with the -p (or --protocol) flag. Protocol can be a number (if you
know the numeric protocol values for IP) or a name for the special cases of TCP, UDP or ICMP.
Case doesn't matter, so tcp works as well as TCP.
Specifying an ICMP Message Type
If the protocol is ICMP, the --icmp-type option can be used to match a specific message type, for
example:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 110
--icmp-type ping
The type can be preceded by ! to match any message except the type listed, for example:
--icmp-type ! 1
Specifying TCP or UDP ports
If the protocol is TCP or UDP, the -s ( or --sport) and -d (or --dport) options specify the
TCP or UDP ports to match.
A range of ports can be specified by giving the first and last ports separated by a :, as in -dport 0:1023. It is also possible to precede the port specification with a ! to match all ports
which are not included in the range, for example, --sport ! 0:1023. However, the range of
ports must be a power of two, starting with a port number which is a multiple of the range.
Specifying TCP flags
If the protocol is TCP, a match on particular TCP flags is specified by listing the flag names; for
example, -p tcp --syn.
Specifying an Interface
The -i (or --in-interface) and -o (or --out-interface) options specify the name of an
interface to match. An interface is the physical device the packet came in on (-i) or is going out
on (-o). You can use the ifconfig command to list the `up' interfaces (that is, working at the
moment).
As a special case, an interface name ending with a + will match all interfaces, whether they
currently exist or not, which begin with that string. For example, to specify a rule which matches
all zhp interfaces, the -i zhp+ option would be used.
Filter Rule Targets
As mentioned above the -j construct within a rule specifies which target is to be used in filter rule
to define a target.
Supported Targets
The following are the supported targets. The switch has many additional targets that are software
based (example Network Address Translation or generic connection tracking). Please contact HP
Technical support if you have additional questions on additional features.
Classical Targets
DROP
This drops the packet.
ACCEPT
Accepts the packet
Ethernet Switch Blade User's Guide
release 3.2.2j
page 111
ZNYX Targets
ZACTION
This is the ZNYX Action target.
Parameters for ZACTION:
--drop
Drops the packet
--accept
Accepts the packet
--set-prio <val>
Set the 802.1p priority to <val>
--use-prio <val>
Use queue priority <val>
--copy-cpu Send the packet to the CPU.
installed chains traversal in software
This will force the full
--set-eport <val> Redirect the packet to port <val>
--set-mport <val> Mirror the packet to port <val>
--set-tos <val>
header to <val>
Set the IP-Precedence bits in the TOS field of the IP
--set-dscp <val>
to <val>.
Set the 6-bit DSCP
in the TOS field of the IP header
Options with any of these ZACTION parameters:
--counter <val>
--arp
Increment classifier hit counter <val>
Not an action,
match only ARP packets.
-i option can be used to specify ingress port or VLAN,
-d specifies target IP address,
-p specifies arp operation as request (1) or response (2).
For arp response, the -o field can be used to specify the egress port.
ZACTION Examples
Send all tcp packets arriving on zhp5 out port 2:
iptables -a FORWARD -i zhp5 -p tcp -j ZACTION --set-eport 2
Send all tcp packets arriving on zhp5 to the CPU (software).
iptables -a FORWARD -i zhp5 -p tcp -j ZACTION --copy-cpu
Set the 802.1p priority to 3 on all tcp packets arriving on zhp5.
iptables -a FORWARD -i zhp5 -p tcp -j ZACTION --set-prio 3
Ethernet Switch Blade User's Guide
release 3.2.2j
page 112
Extensions to the default matches
These are described in the Linux packet filtering HOWTO at:
http://netfilter.org/documentation/index.html#documentation-howto
ZNYX FORWARDING Chain supports all of them.
tc: Traffic Control
The switch supports up to eight queues for each port, including the cpu port. These queues hold
packets waiting to be transmitted for a given port. A scheduler selects the next packet to be
transmitted from one of these queues based on one of three scheduling algorithms: strict priority,
round robin, and weighted round robin.
tc, which stands for Traffic Control, is a mechanism for enabling Quality of Service on Linux.
tc supports the strict priority and weighted round robin algorithms, which it refers to as queuing
disciplines . tc uses three functional objects: queuing disciplines (qdiscs), which comprise
queuing and scheduling algorithms such as FIFO queues, priority queues, RED queues, and token
buckets; classes, which are leafs in queuing discipline hierarchies; and filters, such as u32 filters
and route filters. In addition to these three building blocks, tc also includes policers and meters,
which may be associated with filters.
The functional elements of tc may be combined to produce complex QoS rules. For example, a
packet may be matched to a filter, metered, policed as in-profile or out-of-profile, remarked,
mapped to a FIFO queue, and transmitted by a priority scheduler. tc is very flexible in the data
paths that it allows.
The utility zqosd is a daemon that monitors Linux QoS policy and shadows the policy rules into
a hardware configuration. When zqosd is running, tc rules are translated into hardware rules.
NOTE: This document does not detail all of the capabilities of the tc command, rather it
explicitly mentions only features that are supported by OpenArchitect-based switches.
The examples that follow assume that the switch is running the standard Layer 2 start-up script,
/etc/rcZ.d/examples/S50layer2, with all ports placed in a single VLAN, zhp0. Note that
this assumption is implied only by the fact that changes to zhp0 are shown to configure all ports.
Neither tc nor zqosd is limited by the interface setup. Each utility works on either VLANs
(zhp) or ports (zre).
Strict Priority Qdisc
A typical tc definition of a strict priority qdisc is:
tc qdisc add dev zre5 handle 105: prio bands 8 priomap 0 1 2 3 4 5 6
7 0 0 0 0 0 0 0 0
This defines a strict priority qdisc for port zre5 with 8 priority queues and a default mapping
from 802.1p packet priority to queue. A strict priority scheduler takes packets from the highest
numbered queue which is not empty. The handle is used to reference the qdisc and the individual
queues which have been declared. The handle for a priority queue is formed by appending the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 113
queue number + 1 after the qdisc handle. So the highest priority queue in this example is 105:8.
NOTE: 16 values must be provided for the priomap list. This is a feature of the Linux
priority system, which uses 16 priority levels. The last eight values given will be ignored.
Weighted Round Robin Qdisc
A weighted round robin qdisc builds on the above definition by adding the list of weights which
determine the order of scheduling from the queues:
tc qdisc add dev zre5 handle 105: prio bands 8 priomap 0 1 2 3 4 5 6
7 0 0 0 0 0 0 0 0 wrr 1 1 4 2 3 3 3 3
The weights apply to the eight queues in order, with the lowest numbered queue being given the
first weight. The weights determine the relative number of packets which will be sent from each
queue. In this example, assuming each queue has the necessary packets, the order of transmission
by queue number is:
7 6 5 4 3 2 1 0 7 6 5 4 3 2 7 6 5 4 2 2
This pattern repeats as long as there are packets to be transmitted in the queues.
Although the weights can be any integer value, they will be scaled so that the largest value is 15
or less and the smallest is at least 1.
FIFO Queues (pfifo and bfifo disciplines)
The simplest configuration for tc involves no classes or filters, and only a single FIFO queue.
With tc, queue sizes may be specified in bytes or packets. The first example defines a packetlimited FIFO. This example begins with only tc and then illustrates tc in conjunction with
zqosd.
As a first step, confirm that no tc configuration is active on the switch, by listing any queue
disciplines:
tc qdisc ls
The command should return nothing. Now, add a single packet-limited FIFO queue to zhp0 and
confirm that it has been installed to software:
tc qdisc add dev zhp0 handle 100:0 root pfifo limit 32
tc qdisc ls
The output should display the following,
qdisc pfifo 100: dev zhp0 limit 32p
The tc command is applied to a device, so dev zhp0 must be specified. Note that a VLAN, such
as zhp0, and a port, such as zre0, are each treated as devices. Breakdown of the options:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 114
handle 100:0
Defines the handle for the queuing discipline. This handle may be used to
reference the pfifo queue. Note that the handle is included with the output of
the qdisc ls command. (100:0 and 100: are equivalent in tc.) The choice of
handle is significant for zqosd.
root
Tells tc that this is the base queuing discipline for the device, not a child of
another queuing discipline.
pfifo limit 32
Specifies a packet-limited FIFO queue with an upper bound of 32 packets.
Now, delete the queuing discipline from zhp0 and confirm that it has been removed:
tc qdisc del dev zhp0 root
tc qdisc ls
Fifo Qdiscs
The length of the queues for each port may need to be limited to avoid filling the available
memory with packets which cannot be transmitted as fast as they are received. To limit the
length of a particular queue, a bfifo qdisc is defined.
tc qdisc add dev zre5 parent 105:1 bfifo limit 5kb
This limits queue 0 on zre5 to no more than 5k bytes of data.
Using Filters to Direct Packets to a COS Queue
Once the queues are defined for a port, filters can be added to direct the desired packets into the
queue. The target queue is identified by the classid parameter, which is the same as the
handle of the cos queue. For example, to send unicast packets with a destination IP address of
10.91.100.5 to cos queue 3 created above, the filter is:
tc filter add dev zre5 parent 105: protocol ip u32 match ip dst
10.91.100.5/32 classid 105:4
Protocol ip
The u32 filter allows matching many fields of the IP packet, including: ip src, ip dst,
ip tos, ip protocol, ip sport, ip dport, ip icmp_type and ip
icmp_code. Each field to match except the IP addresses is specified as:
match <field name> <value> <mask>
where <value> and <mask> are numerical values. The ip tos field is an 8 bit field, with the
ip precedence in the upper 3 bits, the type of service in the next 3 bits, and the ecn bits in the
lower 2 bits. The same field can be selected as dsfiled followed by the DSCP in the upper six bits
and the ecn in the remaining two bits. The mask specifies which bits are to match, so
Ethernet Switch Blade User's Guide
release 3.2.2j
page 115
match ip tos 0xa0 0xe0
would match an IP precedence of 5.
Specific fields can also be specified by giving their offset from the beginning of the IP header and
a field name of u8, u16, or u32, depending on the width of the field. For example, to match the
SYN bit in the TCP flags, the specification is:
match u8 2 0x02 at 33
Several IP fields can be matched in the same filter by specifying multiple match operations. The
filter will be satisfied only if all matches are true. For example, to put all UDP packets from a
particular IP subnet into cos queue 1, the filter might be:
tc filter add dev zre5 parent 105: protocol ip u32 match ip src
10.90.90.0/24 match ip protocol 17 0xFF classid 105:2
Protocol arp
In addition to IP packets, there is a limited capability to match other types of packets. To match
an arp packet, specify protocol arp. In this case the fields which can be matched are limited to the
arp operation, specified by match u16 <operation> 0xffff at 6, and the target IP
address, specified by match u32 <ip address> <mask> at 24. For example:
tc filter add dev zre5 parent 105: protocol arp u32 match u16 1
0xFFFF at 6 match u32 0x0A5A5A65 0xFFFFFF at 24 classid 105:3
Protocol all
Packets with IEEE 802.3/802.2 (LLC) encapsulation can be recognized based on their
DSAP/SSAP values, using protocol all. It is also possible to match the source or destination
MAC address, or the VLAN. For this protocol, displacements are measured from the beginning
of the MAC header, which always includes a VLAN tag after the source MAC address, so a
match for DSAP 0x42 and SSAP 0x42 would be:
tc filter add dev zre5 parent 105: protocol all u32 match u16 0x4242
0xFFFF at 18 classid 105:5
To match a full MAC address, two matches are needed, since no more than 32 bit can be matched
with one specification. This filter matches a source MAC address and VLAN:
tc filter add dev zre5 parent 105: protocol all u32 match u16 0x00c0
0xffff at 6 match u32 0x95123456 0xffffffff at 8 match u16 5 0x0fff
at 14 classid 105:7
Matching Specific Ingress Ports
The filters shown so far applied to all packets arriving at the switch from any of the switch ports.
To restrict the filter to only apply to packets from a specific port or ports, or only arriving on a
specific VLAN, an ingress queue discipline can be defined for those ports and the filter defined
on that qdisc. The classid of the target then identifies both the destination port and traffic class.
An ingress qdisc is very minimal:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 116
tc qdisc add dev zre1 ingress //ingress qdisc for zre1
tc qdisc add dev zhp2 ingress //ingress qdisc for vlan
The filter add command changes slightly, the parent is now a special handle ffff:fff1, so using the
same filter as the first example:
tc filter add dev zre1 parent ffff:fff1 protocol ip u32 match ip dst
10.91.100.5/32 classid 105:2
This filter will match packets arriving on port zre1, destined for port zre5, with destination IP
address 10.91.100.5. The packets will be put in cos queue 1 on port zre5.
If the filter was defined for dev zhp2 it would be applied to packets arriving on any port which
is included in zhp 2, and require that they be in the VLAN associated with zhp2.
tc filter add dev zhp2 parent ffff:fff1 protocol ip u32 match ip
protocol 6 0xff match tcp src 0 0xf000 classid 105:3
This filter illustrates matching a range of values; any tcp packet on the VLAN associated with
zhp2 with a source port below 4096 will be matched.
Advanced Filtering – Policing
In addition to using filters to direct packets into particular egress queues, it is possible to measure
the rate at which matching packets are arriving and specify actions to take place if the rate is “out
of profile” or “in profile”. This is called policing. It provides a means for limiting the bandwidth
used by matching packets.
The rate threshold is specified in bytes per second, with a burst size which is to be allowed when
the previous rate has been below the threshold. An action is specified to be taken only if the
packet is “out of profile”, that is, the rate has exceeded the threshold and burst size. A second
action can be specified if the packet is “in profile”; the default is to accept the packet. A separate
set of meters are used for policing on each ingress port. This means that the rates given are for
each ingress port, even if the matching packets are going into a single COS queue.
A policing specification follows the match rules in a filter, and precedes the classid. The
following policing specification will drop matching packets when the rate exceeds 10 million
bytes per second after a burst of 20 kilobytes:
police rate 10mbps burst 20kb drop
To specify actions for in-profile packets as well as those out-of-profile, separate the actions by a
“/”:
police rate 100mbit burst 10mbit action drop/reclassify
The reclassify action marks the packet for dropping if the cos queue is above its congestion
threshold. It would apply in this case to packets which were in profile, the out of profile packets
would be dropped immediately.
The classid parameter is not required, and may not be needed for some policing filters. If it is
Ethernet Switch Blade User's Guide
release 3.2.2j
page 117
omitted, and the packet is not dropped, the egress queue will be determined by the priority of the
packet, either from the 802.1p priority for tagged packets or the default priority for untagged
packets for the ingress port.
Examples
The following commands set up priority queues for packets sent to the CPU and then use filters
with policing to direct packets into these queues and limit their bandwidth.
Set up 8 strict priority queues for the cpu port (zrm)
tc qdisc add dev zrm root handle 124:0 prio bands 8 priomap 0 1 2 3
4 5 6 7
Set up bfifo qdiscs to limit the lowest 3 priority queues to 10240 bytes each
tc qdisc add dev zrm parent 124:1 bfifo limit 10240
tc qdisc add dev zrm parent 124:2 bfifo limit 10240
tc qdisc add dev zrm parent 124:3 bfifo limit 10240
Put “BPDU” packets into queue 5, limit rate to 250 kilobits/second
tc filter add dev zrm parent 124:0 protocol all u32 match u32
0x0180c200 0xffffffff at 0 match u16 0 0xffe0 at 4 police rate
250kbit burst 5200 action drop flowid 124:6
Spanning tree BPDU packets go in COS queue 6, no limit
tc filter add dev zrm parent 124:0 protocol all u32 match u32
0x0180c200 0xffffffff at 0 match u16 0x0001 0xffff at 4 match u16
0x4242 0xffff at 18 flowid 124:7
VRRP heartbeat packets go in COS queue 7, limited to 500kbits/second
tc filter add dev zrm parent 124:0 protocol all u32 match u32
0x0100c200 0xffffffff at 0 match u16 0x0012 0xffff at 4 police rate
500kbit burst 5200 action dropok flowid 124:8
Limit unicast traffic to the CPU to 20 Mbits/second, put in COS queue 0
tc filter add dev zrm parent 124:0 protocol all u32 match u8 0x00
0x01 at 0 police rate 20000kbit burst 250kb action dropok flowid
124:1
Allow pings to be received by the CPU with higher priority than other unicast packets. Give
them a much lower rate
tc filter add dev zrm parent 124: protocol ip u32match ip protocol 1
0xFF match icmp type 8 0xFF police rate 64kbit burst 2560 drop
flowid 124:3
Policing Actions
Besides drop, reclassify, and ok, other actions which are permitted by the hardware can be
Ethernet Switch Blade User's Guide
release 3.2.2j
page 118
specified numerically for either out-of-profile or in-profile actions. The numeric value is a
decimal integer action code shown in the table below. If the action requires a parameter, the
parameter value is multiplied by 256 and added to the action code. Only a few of the actions are
possible for out-of-profile. All can be used for in-profile.
Policing Actions
Action
Code
Out Action
Set 802.1p priority for packet to value
64+
No
Set IP TOS field to value (IPV4 only)
66+
No
Send a copy of the packet to CPU
67
Yes
Redirect the packet to port given
69+
No
Mirror the packet to the port given
70+
No
Copy the IP TOS precedence to 802.1p priority (IPV4 only)
72
No
Copy the 802.1p priority to the IP TOS precedence (IPV4
only)
73
No
Set the Differentiated Service field to value (IPV4 only)
74+
Yes
Set the ECN to value (IPV4 only)
81+
Yes
Set the VLAN ID to value
82+
No
Table 7.2: Policing Actions
To set the DS value to 0 for out-of-profile packets and 20 for in-profile packets, use
action 74/5194
// 74+256*20=5194
To mirror only in-profile packets to port 3, use
action ok/838
// 70+256*3=838
u32 match selectors used in filters
A u32 filter can specify multiple matches. The general form of a match is:
match {u32,u16,u8} <value> <mask> at <offset>
where <value> is a decimal or hexadecimal value preceded by 0x, <mask> is always
hexadecimal with an optional 0x, and <offset> is a decimal or hex value preceded by 0x, it is
measured from the beginning of the layer 3 header or the beginning of the layer 2 header
depending on the protocol. The offset must be 32 bit aligned for a u32 match, or 16 bit aligned
Ethernet Switch Blade User's Guide
release 3.2.2j
page 119
for a u16 match. In many cases, there is a field name that can be used for the match, eliminating
the need to specify the offset.
U match selectors
Field Match
Equivalent
ip src a.b.c.d/n
u32 <value> <mask> at 12
ip dst a.b.c.d/n
u32 <value> <mask> at 16
ip tos <value> <mask>
u8 <value> <mask> at 1
ip dsfield <value> <mask>
u8 <value> <mask> at 1
ip protocol <value> <mask>
u8 <value> <mask> at 9
ip precedence <val> <mask>
u8 <val> <mask> at 1
ip dport <value> <mask>
u16 <value> <mask> at 22
ip sport <value> <mask>
u16 <value> <mask> at 20
ip icmp_type <val> <mask>
u8 <val> <mask> at 20
ip icmp_code <val> <mask>
u8 <val> <mask> at 21
udp src <value> <mask>
u16 <value> <mask> at 20
udp dst <value> <mask>
u16 <value> <mask> at 22
tcp src <value> <mask>
u16 <value> <mask> at 20
tcp dst <value> <mask>
u16 <value> <mask> at 22
icmp type <value> <mask>
u8 <value> <mask> at 20
icmp code <value> <mask>
u8 <value> <mask> at 21
Table 7.3: U Match Selectors
Note that specifying a tcp, udp, or icmp field does not automatically include a match for the
appropriate IP protocol. To insure that only the desired packets are filtered, include a match for
IP protocol as well as the port or type. For example:
u32 match ip protocol 6 0xff match tcp dst 8 0xffff
Or
u32 match ip protocol 17 FF match ip sport 20 0xffff
zqosd
Thus far, tc has been used without zqosd. It is not sufficient to install software rules on the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 120
OpenArchitect switch though, because the normal case is for packets to be switched in hardware.
For that reason, zqosd must be used to shadow tc configuration into hardware. Like
zfilterd, zqosd works with ztmd, which provides the actual hardware interaction.
If ztmd is not already running, start it:, then initiate the zqosd daemon with no parameters:
ztmd
zqosd
Now, repeat the same tc command as before, to install a packet-limited FIFO queue:
tc qdisc add dev zhp0 handle 100:0 root pfifo limit 32
When this command is processed, zqosd detects the state change and generates output.
For each port belonging to zhp0, the queue size has changed to 32 packets. Under the default
switch configuration, all ports other than the CPU port belong to zhp0; so all queues other than
the CPU queue are affected.
As before, remove the tc configuration with the command:
tc qdisc del dev zhp0 root
Note that zqosd detects this state change. In fact, examining the CoS configuration on the
switch reveals that the queue sizes have reverted to their default values.
The byte-limited FIFO queue case differs only slightly from the packet-limited FIFO case. The
syntax is almost identical. In hardware the limit is based on 128 byte cells. The specified byte
limit is divided by 128 to determine the cell limit. Always specify a byte limit of at least 128
bytes to avoid setting the queue length to zero.
For example, to set the byte limit for zhp0 to 4096,
tc qdisc add dev zhp0 handle 100:0 root bfifo limit 4096
Tear down any installed rules before proceeding with the next example:
tc qdisc del dev zhp0 root
PRIO and WRR queues
The FIFO examples used a single queue for each interface. In fact, the Ethernet Switch Blade
fabric switch is capable of attaching 1 to 8 queues to each port, with either priority or weighted
round robin (WRR) scheduling, and classification based on a priority map.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 121
In tc, the prio queuing discipline establishes multiple queues and specifies their associated
priority map. Although WRR support is not part of the standard tc distribution, it has been
added to the prio disciplinE.
The following example illustrates WRR. A strict priority scheduler is a simpler case that can be
constructed easily from this example.
Examine the existing CoS settings on the switch, noting the number of queues per port, queue
sizes, scheduling parameters, and priority map. Each of these values changes with this test.
The full set of commands to install four queues, a priority map, and weights is as follows:
tc qdisc add dev zhp0 handle 100:0 root prio bands 4 priomap 1 2 2 2
3 3 3 3 1 1 1 1 1 1 1 1 wrr 1 2 4 6
tc qdisc add dev zhp0 parent 100:1 pfifo limit 120
tc qdisc add dev zhp0 parent 100:2 pfifo limit 100
tc qdisc add dev zhp0 parent 100:3 pfifo limit
80
tc qdisc add dev zhp0 parent 100:4 pfifo limit
60
The first command attaches a queuing discipline as the root discipline for zhp0, with a handle of
“100:0,” as in the FIFO cases. The “prio” option identifies the type of queuing discipline.
Priority scheduling implies multiple queues and the “bands 4” parameters specify that there are
four queues.
The priority map may be read from left to right as Priority n maps to Queue q, where
n is the index of the list element (numbering from 0) and q is the value specified by that element.
So, this example would read:
Priority 0 maps to Queue 1
Priority 1 maps to Queue 2
Priority 2 maps to Queue 2
Priority 3 maps to Queue 2
Priority 4 maps to Queue 3
NOTE: The tc priority map applies to a 4-bit field. With the Ethernet Switch Blade, the
priority map refers to the 802.1p tag, which is a 3-bit field. When translating this tc
rule to hardware, only Priorities 0 through 7 are significant; the other eight priorities
are ignored.
The parameters wrr 1 2 4 6 specify that WRR scheduling is being used and assigns a
relative weight to each queue. The weights are treated as numbers of packets to be sent from
each queue. In this example, if the queues have sufficient packets, queue 1 will have twice as
Ethernet Switch Blade User's Guide
release 3.2.2j
page 122
many packets sent as queue 0, queue 2 will have four times as many, and queue 3 will have six
times as many. wrr parameters are scaled such that the maximum value is no more than 15.
values which would be 0 are set to 1:
•
Queue 0 has a weight of 1000 bytes
•
Queue 1 has a weight of 2000 bytes
•
Queue 2 has a weight of 4000 bytes
•
Queue 3 has a weight of 6000 bytes
The remaining commands each define a packet-limited FIFO queue. As with all previous tc
examples, these queues are created on device zhp0. However, unlike all previous examples,
they are not created as root disciplines for the device. Instead, the “parent” option identifies them
as child queues of the prio discipline.
For example, “parent 100:1” identifies that queue as the first child of the prio discipline (Queue
0), because the prio discipline’s handle is 100:0.
After running each of those commands, again examine the CoS parameters. As with the simple
FIFO example, queue sizes change to 32 packets. In addition, though, the number of queues
changes to 4 for each port in zhp0. Furthermore, the weights have changed for each queue, as
have the queue mappings.
To test the strict priority case, simply remove the wrr 1 2 4 6 options from the first tc
command. Note that all queue disciplines in this test may be cleared by deleting the root
discipline, as before:
tc qdisc del dev zhp0 root
The U32 Filter
The U32 filter provides the capability to match on fields in the L2, L3 or L4 header of a packet.
Each match rule gives the location of the field to be tested, which is always a 32 bit word, a mask
selecting the bits to be tested, and a value which is to be matched by the packet field. Many
matches can be specified in one tc filter command. Only if all matches succeed does the filter
match. In that case, the flowid field identifies the classid of the class this packet belongs in.
The following tc commands put all icmp packets in class 100:10, packets from IP address
1.2.3.4 in class 100:20. Packets for IP address 1.2.3.4 in class 100:20, and arp reply packets in
class 100:30. The last filter illustrates using an offset from the beginning of the protocol header,
along with a mask, to locate the field to be matched
tc filter add dev zhp0 protocol ip parent 100:0 u32 match ip
protocol 1 0xff flowid 100:10
tc filter add dev zhp0 protocol ip parent 100:0 u32 match ip src
1.2.3.4/32 flowid 100:20
tc filter add dev zhp0 protocol ip parent 100:0 u32 match ip dst
1.2.3.4/32 flowid 100:20
Ethernet Switch Blade User's Guide
release 3.2.2j
page 123
tc filter add dev zhp0 protocol arp parent 100:0 u32 match u32 2
0xffff at +4 flowid 100:30
Combining Queuing Disciplines
Any of the queue length limiting disciplines can be used with the bandwidth management queue
disciplines, by defining them with the handle of one of the classes as their parent. For the htb
queueing discipline, each class has an explicit handle specified when it is defined. For the prio
queueing discipline, including wrr, each band is a class; their handles are formed from the
handle of the prio qdisc by appending a minor number of 1 to n for the n bands. For example, the
following commands define two strict priority queues for port zre5, with the lower priority
queue limited to 32 kb and the higher priority queue limited to 32 kb:
tc qdisc add dev zre5 root handle 100:0 prio bands 2 priomap 0 0 0 0
1 1 1 1
tc qdisc add dev zre5 parent 100:1 handle 110:0 bfifo limit 32kb
tc qdisc add dev zre5 parent 100:2 handle 120:0 bfifo limit 32kb
These translation rules handle conversions of individual rules from tc entries into hardware
entries. They do not explain the results of creating rules that are individually supported; but
which do not make sense in conjunction.
Although the translation rules handle some inconsistency between software and hardware, a user
must define a combination of rules that is reasonable in hardware, to ensure predictable results.
Handle Semantics
All examples have illustrated zqosd copying tc rules into hardware. In fact, the zqosd utility
also enables the user to add tc rules that remain only in software. This selection is based on
handles. zqosd processes all supported queue disciplines and filters with handles between 100:0
and 200:FFFF.
COPS: Common Open Policy Service
The Common Open Policy Service (COPS) is a protocol for distributing networking policy to
devices such as switches and routers. COPS allows a single Policy Decision Point (PDP) to
distribute policy to multiple Policy Enforcement Points (PEPs). A PDP acts as a server for PEP
clients. Figure 7.6 provides an illustration of the COPS Network Architecture.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 124
PDP
PEP
PEP
PEP
Figure 7.6: COPS Network Architecture
A PDP contains all of the policy rulers for its associated PEPs. A PDP typically stores rules in a
data and is a dedicated server, not a forwarding device.
A PEP is any network device that has to enforce policy decisions. For example, a switch that
restricts network access or prioritizes traffic fits the definition of a Policy Enforcement Point. A
PEP makes no policy decision. It simply applies policy that receives from its PDP.
COPS uses a connection-based query and response mechanism. The following scenario illustrates
PEP-PDP communication:
A PEP comes online and opens a connection to its PDP.
• After a connection has been established, the PEP transmits state information to the PDP.
•
•
The PDP uses that state information to determine what policy is applicable for the PEP.
•
The PDP sends that policy to the PEP.
•
The PEP installs the policy and applies it to future traffic.
As long as COPS is running, a connection between the PEP and PDP should stay open. A PEP
could query a PDP at any time asking for a policy decision. Alternatively, an administrator could
modify the policy on a PDP, which would then push any policy changes to its PEPs.
Protocol Architecture
The COPS protocol is broken into several components. The base layer is the COPS protocol
itself, which defines the messaging format. This protocol defines how communication is handled
without specifying the details of the message data.
The base COPS protocol is then used by different client types. These client types apply the COPS
messaging scheme to particular types of data. The currently standardized client types deal with
the RSVP model (COPS-RSVP) and provisioning model (COPS-PR).
The COPS-RSVP scheme is designed around the requirement that a PEP will have to query a
PDP in response to events. An RSVP PEP is constantly listening for resource reservation requests
Ethernet Switch Blade User's Guide
release 3.2.2j
page 125
and relaying those requests to its PDP.
By contrast, the provisioning model is based on longer lasting policy. The expectation is that
policy should be administratively defined at the PDP and pushed to the PEPs as needed.
OpenArchitect is a COPS-PR client.
The most common use of COPS-PR is for distributing Differentiated Services (Diffserv) policy.
Diffserv is concerned with such Quality of Service elements as queues and schedulers.
OpenArchitect PEP
The OpenArchitect PEP implementation is known as pepd. The pepd utility is based on:
RFC 2478: Common Open Policy Service (COPS)
RFC 3084: COPS Usage for Policy Provisioning
RFC 3159: Structure of Policy Provisioning Information
RFC 3289: Management Information Base (MIB) for the Differentiated Services Architecture
Internet Draft: Differentiated Services Quality of Service Policy Information Base (latest
version draft-ietf-diffserv-pib-09)
Internet Draft: Framework Policy Information Base (latest version draft-ietf-rapframeworkpib-09)
A Policy Information Base (PIB) defines the representation of a particular data set. For example,
the Diffserv PIB specifies the structures used to represent all Diffserv elements. PIBs are
functionally equivalent to Management Information Bases (MIBs) such as those used by SNMP.
The OA PEP has implemented those portions of the Diffserv and Framework PIBs that are
supported by the underlying switch architecture.
The pepd utility requires a PDP that has implemented the above RFCs and drafts. Until all draft
standards are approved, the certain COPS-PR data types will not be assigned OIDs. pepd uses
non-standard OIDs for the unassigned values.
Using pepd
The pepd utility works by connection to a PDP, informing the PDP of its roles, and installing
any rules that the PDP has for those roles. Configuration information should be specified in a
configuration file, specified on the command line with the –f option.
pepd –f <full_path_and_filename>
A sample configuration file is listed below:
PDP address: 10.0.0.11
PDP port: 3288
PEPID: some-id
Role-If: a zre1,zre2,zre3,zre4
Ethernet Switch Blade User's Guide
release 3.2.2j
page 126
where,
PDP address:
The IP address of the PDP. Default is loopback (127.0.0.1)
PDP port:
The destination port on which to open a COPS connection. Default is
3288.
PEPID:
The PEP Identifier
Role-If:
A mapping of roles to interfaces. The name of the role is
followed by a comma-delineated list of interfaces. Multiple roleinterface mappings are defined through multiple Role-If declarations.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 127
Chapter 8 Base Switch Administration
One of the main benefits of the OpenArchitect switch is that it runs Linux, so much of the switch
administration is already familiar to most network or system administrators. It is a good idea to
complement these instructions with a standard Linux reference guide, such as Linux Network
Administrator’s Guide available from O’Reilly. Below are brief descriptions of some of the more
routine administrative task pertinent to the switch.
Setting the Root Password
The switch is shipped with a default user root and no password. To set the root password, use the
password command:
ZX6000-OA<release no.># passwd
Changing password for root
Enter the new password (minimum of 5, maximum of 8 characters)
Please use a combination of upper and lower case letters and numbers.
Enter new password:
Re-enter new password:
Password changed.
ZX6000-OA<release no.>#
CAUTION: Even when just changing the password, you need to save the file system
overlay with the zsync command, or you will lose your changes upon reboot.
Adding Additional Users
Additional users can be added with the adduser command. Additional users are desirable for
connecting to the switch via ftpd and other daemons that require a login other than root and a
password. To create a user named guest, run adduser
ZX6000-OA<release no.># adduser guest
Changing password for guest
Enter the new password (minimum of 5, maximum of 8 characters)
Please use a combination of upper and lower case letters and numbers.
Enter new password:
Re-enter new password:
Password changed.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 128
ZX6000-OA<release no.># zsync
ZX6000-OA<release no.>#
Setting up a Default Route
If you wish to access the switch from some place other than a directly attached network, you may
want to setup a default route. Use the route command to set a default gateway.
route add default gw 10.0.0.254
Put the entry into the /etc/init.d/rcS startup script to automatically set a default route upon
reboot.
Name Service Resolution
Name service lookups will be done locally using /etc/hosts. You can also tell the switch
which name server to use by including an entry in /etc/resolv.conf.
DHCP Client Configuration
A utility is included to dynamically determine the IP address of the OpenArchitect switch
interfaces. To set the the IP address dynamically, execute the command,
dhclient zhp0
The default device name, zhp0, works with the default configuration of the OpenArchitect
switch and will attempt to obtain an IP address from the local DHCP server. To use DHCP to set
your IP addresses automatically on boot up, uncomment the the following line in
/etc/init.d/rcS by removing the # sign:
/usr/sbin/dhclient zhp0
DHCP Server Configuration
The OpenArchitect switch includes a DHCP server. To start the DHCP server, configure
/etc/dhcpd.conf for your network, and run dhcpd
Consult Linux Network administration manuals for more information on DHCP and
configuration options.
To use DHCP to set your IP addresses automatically on boot up, uncomment the the following
line in /etc/init.d/rcS by removing the # sign:
dhcpd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 129
Network Time Protocol (NTP) Client Configuration
NTP is a protocol for setting the real time clock on a system. There are numerous primary and
secondary servers available on the network. For more NTP information, and a list of available
NTP servers, see the following URL:
http://www.ntp.org/
You will need to have your network settings properly configured to reach an available NTP
server on your local network or the Internet. To set the time and date, execute ntpdate with the
server of your choice. For example:
ntpdate –u ntp.ucsd.edu
The –u is required if the OpenArchitect switch is operating behind some types of firewalls.
If you wish for ntpdate to set your date and time automatically each time you boot,
uncomment the example ntpdate command line in /etc/init.d/rcS by removing the #
sign. ntpdate returns the Universal Time (UTC, formerly Greenwich Mean Time, or GMT). To
display the local time, set the TZ variable to the appropriate name and the number of hours offset
from UTC. For instance:
export TZ=PST8
for Pacific Standard Time offset from UTC by 8 hours. To set an environment variable, add the
entry to /etc/profile. Remember to zsync to make your changes permanent.
Network File System (NFS) Client Configuration
The OpenArchitect switch includes an NFS client for mounting remote file systems. You will
need to start NFS server processes in order to use NFS. You will need to start the following
servers:
/sbin/portmap
/sbin/rpc.statd
/usr/sbin/rpc.mountd -r
Once the above servers are started, you can mount a remote NFS file system.
mount rhost:nfs_file_system local_mount_point
If the remote NFS file system you’re mounting is on an OA switch, you should mount with
caching disabled.
mount rhost:nfs_file_system –o noac local_mount_point
Ethernet Switch Blade User's Guide
release 3.2.2j
page 130
All the necessary servers are included in /etc/init.d/rcS but are commented out by default.
To automatically start all NFS client services each time you boot, uncomment the NFS Client
servers. Go to the /etc/init.d/rcS file. Uncomment the following command lines by
removing the # sign.
/sbin/portmap
/sbin/rpc.statd
/usr/sbin/rpc.mountd -r
You can also include commands to mount remote NFS file systems at boot time. There is an
example line included at the appropriate location in /etc/init.d/rcS. Uncomment and alter
the mount command included for your particular configuration.
NOTE: A “sleep” of 5 seconds is included to allow time for the links to come up prior to
attempting the mount.
sleep 5
mount 10.0.0.1:/nfs –t nfs –o noac /mnt
NFS Server Configuration
The switch also contains an NFS server so that you can mount the switch file system from other
systems. To enable the NFS server, first follow the steps to enable the NFS client. Then, edit
/etc/exports to include the file systems you wish to export. Consult a standard Linux Network
Administrator’s Guide (or man pages) regarding options for exported file systems. Generally, an
entry in /etc/exports looks like the following:
/nfs
*.localdomain.com(ro)
Now start nfsd to export the mount points and begin answering requests from remote clients.
/sbin/rpc.nfsd –r
To export file systems automatically on boot, edit /etc/init.d/rcS, uncomment the
/sbin/rpc.nfsd command line by removing the #.
/sbin/rpc.nfsd -r
Connecting to the Switch Using FTP
Use ftp to transfer files to or from the switch. See the Linux Reference Guide for details of the ftp
command. In general, you can use ftp to connect to any system running an ftp server, including
other OpenArchitect switches, to either get (transfer files from the remote host to the switch) or
put (transfer files from the switch to the remote host) files.
ftp <remote_host>
Ethernet Switch Blade User's Guide
release 3.2.2j
page 131
ftpd Server Configuration
The switch itself can also be configured to run an FTP server (ftpd). See the Linux Reference
Guide for details of the ftpd command. You will need to add a user to the switch in order to
connect via ftp from a remote host, since root is not allowed ftp access. See the earlier section in
this chapter regarding how to add a user. The ftp daemon is started by default. If you wish to
shutdown the ftp daemon, comment out the betaftpd line in /etc/init.d/rcS.
Connecting to the Switch Using TFTP
Trivial File Transfer Protocol or TFTP, is a very simple protocol used to transfer files. It is
designed to be small and easy to implement. Therefore, it lacks most of the features of a regular
FTP, like user authentication. You can use tftp to connect to any system running a TFTP server
(tftp) including other OpenArchitect switches.
tftp <remote_host>
TFTPD Server Configuration
The TFTP server is started by inetd(8) using the configuration set up in /etc/inetd.conf.
The use of tftp(1) does not require an account or password on the remote system. Due to the
lack of authentication information, tftpd will allow only publicly readable files to be accessed.
The default location of these files is /tftpboot.
SNMP Agent
Simple Network Management Protocol (SNMP) is the defacto standard for network management.
An SNMP agent maintains a structure of data for a network device in a virtual information
database, called a Management Information Base (MIB). A network management station is
capable of accessing the MIB of the network device to monitor and configure the network device.
The OpenArchitect switch utilizes the NET-SNMP (formerly UCD-SNMP) agent core.
Additional information on the agent can be found at: http://www.net-snmp.com. The
OpenArchitect switch agent will respond to SNMPv1, SNMPv2, and SNMPv3 requests.
Protocols supported on the OpenArchitect switch by gated, such as RIP and OSPF communicate
with SNMP agent via the SNMP Multiplexing (SMUX) protocol.
Supported MIBS
OpenArchitect includes MIB support as documented by each of the RFCs listed. The MIBs
themselves are located on the switch in the /usr/share/snmp/mibs directory.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 132
Supported MIBS
RFC 1155:
Structure and Identification of Management Information for TCP/IP-based
internets
RFC 1227:
SNMP MUX Protocol and MIB
RFC 1493:
Definitions of Managed Objects for Bridges (obsoletes RFC 1286)
RFC 1657:
Definitions of Managed Objects for the Fourth Version of the Border Gateway
Protocol (BGP-4) using SMI-V2
RFC 1724:
RIP Version 2 MIB Extension (obsoletes RFC 1389)
RFC 1850:
OSPF Version 2 Management Information Base (obsoletes RFC 1253, which
obsoletes RFC 1252, which obsoletes RFC 1248)
RFC 2011:
SNMPv2 Management Information Base for the Internet Protocol Using SMIv2
RFC 2012:
SNMPv2 Management Information Base for the Transmission Control Protocol
Using SMIv2
RFC 2012:
SNMPv2 Management Information Base for the User Datagram Protocol Using
SMIv2
RFC 2013:
Management Information Base for Network Management of TCP/IP-based
internets: MIB-II (obsoletes RFC 1213, which obsoletes RFC 1158)
RFC 2021:
Remote Network Monitoring Management Information Base Version 2
RFC 2096:
IP Forwarding Table MIB
RFC 2571:
An Architecture for Describing SNMP Management Frameworks
RFC 2572:
Message Processing and Dispatching for the Simple Network Management
Protocol (SNMP)
RFC 2573:
SNMP Applications
RFC 2574:
User-based Security Model (USM) for version 3 of the Simple Network
Management Protocol (SNMPv3)
RFC 2575:
View-based Security Model (VACM) for version 3 of the Simple Network
Management Protocol (SNMP)
RFC 2576:
Coexistence between Version 1, Version 2 and Version 3 of the Internetstandard Network Management Framework
RFC 2665:
Definitions of Managed Objects for Ethernet-like Interfaces
RFC 2674:
Definitions of Managed Objects for Bridges with Traffic Classes, Multicast
Filtering and Virtual LAN Extensions
RFC 2742:
Definitions of Managed Objects for Extensible SNMP Agents
RFC 2787:
Definitions of Managed Objects for the Virtual Router Redundancy Protocol
RFC 2819:
Remote Network Monitoring Management Information Base
RFC 2863:
The Interfaces Group MIB (obsoletes RFC 2233, which obsoletes RFC 1573,
which obsoletes RFC1229)
RFC 2932:
IPv4 Multicast Routing MIB
RFC 3165:
Definitions of Managed Objects for the Delegation of Management Scripts
RFC 3231:
Definitions of Managed Objects for Scheduling Management Operations
Ethernet Switch Blade User's Guide
release 3.2.2j
page 133
Supported MIBS
ZNYX Networks Private
MIB
Custom ZNYX MIB to support software and hardware features not covered by
standard MIBs. The Private MIBs are ZX7100BASE.MIB AND
ZX7100FABRIC.MIB, pointed to by ZNYX-H.MIB.
UCD-SNMP Enterprise
MIB
UCD-SNMP MIB related to management and monitoring of the LINUX host
Table 8.1: Supported MIBs
Supported Traps
Upon certain events, the OpenArchitect switch can be configured to send notification of the
event, called an SNMP Trap out to a defined recipient/manager or managers. Traps are not issued
in real time. OpenArchitect will send SNMP traps for the following conditions:
Supported Traps
SNMPv2-MIB:
coldStart
SNMPv2-MIB:
authenticationFailure
IF-MIB:
linkUp
IF-MIB:
linkDown
UCD-SNMP-MIB:
ucdShutdown
RMON-MIB:
risingAlarm
RMON-MIB:
fallingAlarm
VRRP:
vrrpTrapNewMaster
VRRP:
vrrpTrapAuthFailure
EGP (rfc1213):
egpNeighborLoss
BGP4-MIB:
bgpEstablished
BGP4-MIB:
bgpBackwardTransition
Table 8.2: Supported Traps
SNMP and OpenArchitect Interface Definitions
OpenArchitect, defines three types of devices:
zre
physical port
zrl
trunk of ports
zhp
interface consisting of ports (zres) and trunks of ports (zrls)
A zrl (trunk device) is treated as an aggregate of its constituent zres (ports). A zhp is an
aggregate of its immediately contributing sub-interfaces (zres and zrls). The ports that make
up a trunk do not contribute to the zhp.
The administrative status of a zre and zhp are independent of each other. If the administrative
Ethernet Switch Blade User's Guide
release 3.2.2j
page 134
status is down, then the operational status will be down independent of the underlying link state.
You must ifconfig up the zres to see the operational link status for a zre. When the
administrative status is up, the operational status is dependent on the underlying physical state.
For example, Table 8.3 shows that if zhp0 contains zre1 and zre2, the it would also be true for
the operational status (given the administrative status is up on zre1, zre2, and zhp0).
Link and SNMP Status
Physical Link Status
SNMP Operational Status
zre1
zre2
zre1
zre2
zhp0
down
down
down
down
down
down
up
down
up
up
up
down
up
down
up
up
up
up
up
up
Table 8.3: Physical Link Status on Base Switch
The administrative status is directly controlled by ifconfig up/down. The administrative status
of the zhps and zres do not affect each other.
ifStackTable Entries
In the actual ifStackTable as shown in the MIB walk the following two OIDs (which denote
ifIndexes) show the relationships.
ifMIB.ifMIBObjects.ifStackTable.ifStackEntry.ifStackStatus.0.1 =
active(1)
ifMIB.ifMIBObjects.ifStackTable.ifStackEntry.ifStackStatus.0.2 =
active(1)
If they are X.Y then
if X = 0 there is nothing "above" this interface
if Y = 0 there is nothing "below" this interface
otherwise interface X has interface Y as a logical constituent.
•
•
SNMP Configuration
The SNMP agent is called snmpd and is started by default from the Linux boot up script
/etc/rcZ.d/S75snmpd. If you do not wish to start snmpd, remove /etc/rcZ.d/S75snmpd.
Configuration of the OpenArchitect switch SNMP agent is the same as configuration of any
standard Linux host that uses the NET-SNMP agent. Configuration information for persistent
data and security information is kept in snmpd.conf under the default SNMP configuration
location, which for the OpenArchitect switch is /usr/share/snmp. snmpd.conf is the
location to change “sys” information such as the syslocation and syscontact, as well as
permissions such as the rocommunity or rwcommunity.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 135
IMPORTANT: For NET-SNMP agents, these objects (sysLocation.0,
sysContact.0 and sysName.0) ordinarily are read-write. However, specifying the
value for one of these objects by giving the appropriate token in snmpd.conf makes the
corresponding object read-only, and attempts to set the value of the object will result in a
notWritable error response.
The processing for link up and link down traps is now user configurable. As the default, traps
conform to RFC2863, meaning the trap contents will include:
ifIndex, ifAdminStatus and ifOperstatus
You can alter this behavior by specifying:
cisco_link_traps on
If cisco_link_traps are turned on as described then link up and link down traps will have a
cisco-like format and the trap contents will include:
ifDescr and ifType
Examine and edit /usr/share/snmp/snmpd.conf appropriately for your configuration.
Information in /usr/share/snmp/snmpd.conf is only read at startup - or when the daemon is
forced to reread its configuration. See the standard Linux man page for snmpd.conf for more
details.
SNMP Applications
The OpenArchitect switch includes the snmpget, snmpwalk, and snmpset applications you
can use these standard Linux utilities to test your SNMP agent. For example,
snmpwalk localhost –c public
walks the entire MIB of the localhost (OpenArchitect switch) starting at the top of the MIB. See
the Linux Reference Man Pages for the usage of the SNMP utilities.
MIB values are decoded from their numerical representations into readable text by parsing MIBs
located in the /usr/share/snmp/mibs/ directory. If you need to add a MIB, add it to that
directory and zsync to save across reboots.
Port Mirroring
zmirror sets packet mirroring from a given set of ports to a given port. Turning on packet
mirroring causes a copy of the packet to be sent to the mirror_to port. There can be only a
single mirror_to port, but there can be multiple mirror_from ports. The zmirror
command overwrites the mirror_to port. The zmirror command accumulates the
mirror_from ports. Note, there are performance issues when trying to mirror more bandwidth
than is available on the mirror_to port. Use the zmirror command in the following way:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 136
zmirror mirror_from mirror_to
After executing the following three commands, packets received on ports 0, 1 and 2 would be
mirrored (copied and transmitted) to port 12. This mirroring would be in addition to any Layer 3
or Layer 2 switching.
zmirror zre0 zre12
zmirror zre1 zre12
zmirror zre2 zre12
To clear the current mirroring use the -t option. The -e option can be used to indicate that
packets being sent on a given port should be copied to the mirror_to port. For example if the
-e option is used as follows, the packets transmitted, as opposed to received, on ports 0, 1 or 2
would be mirrored to port 12.
zmirror -e zre0 zre12
zmirror -e zre1 zre12
zmirror -e zre2 zre12
Link and LED Control
The zlc application sets the link speed and state of individual ports of the switch, or display their
current state. It can also set or clear the extract led or the internal fault led, or to set a port down
or up. To force the link on port 0 down,
zlc zre1 down
To check the status of a link,
zlc zre1 query
To check the status of all links,
zlc zre0..23 query
Link Event Monitoring
The zlmd application is intended to run as a daemon, waiting for a configured event to occur and
then running the program configured for that event. The events monitored are changes in the link
status at any of the 24 in-band ports of the switch, the start of removal of the switch from the
backplane, or the cancellation of the removal before it actually takes place. The program can be a
shell script that initiates appropriate actions to respond to the event.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 137
Chapter 9 Base Switch Maintenance
This chapter includes basic information about the OpenArchitect switch environment including
an overview of the file system structure, modifying and updating switch files, upgrading the
switch driver and kernel, and implementing a system recovery.
Overview of the OpenArchitect switch boot process
The OpenArchitect switch is equipped with a Random Access Memory (RAM) disk and three
Read Only Memory (ROM) devices, including, a boot ROM and two application flash
Offset 0
Offset 0
zmon
Free space
Application
Flash 2 on
Device 2
Application
Flash 1 on
Device 1
Boot ROM
on Device 0
initrd
(exact copy
as in
Application
Flash 1)
initrd
Linux and
its file
system
Linux and its
file system
Offset 7f000
dev
bootstring
Free space
Free space
overlay
file system
Figure 9.1: ROM Devices in OpenArchitect
The boot ROM is located on device 0 and contains the OpenArchitect zmon application that
operates as a boot loader and includes a device bootstring. Device 1 contains the application
flash 1 image of the Linux operating system and the OpenArchitect overlay file system.
Application flash 1 is the primary working image for the switch. Device 2 contains the
application flash 2 that is an exact copy of application flash 1. You would only boot from this
device if application flash 1 is corrupted and you need to restore the switch to the factory-shipped
configuration.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 138
Bootloader examines the
bootstring in the boot
ROM
Determines
if the boot string
is dev1
Yes
Loads image
from Flash 1
to RAM
Yes
Loads image
from Flash 2
to RAM
No
Determines
if the boot String
is dev2
No
Begins
execution of
RAM image
Boot into
zmon bootloader
Figure 9.2: Booting up Process Flow
Under normal circumstances, the booting up process follows the process outlined in Figure 6-2.
During boot up, the zmon bootloader reads the device bootstring to locate and validate the
correct application image to load. The bootstring command is in the following format:
boot : X | [<options>] X represents the device value 0, 1 or 2
The boot process opens and uncompresses the initrd image onto the RAM disk. Then zmon
begins booting the Linux image. After Linux boots, the init process executes the
/etc/init.d/rcS script which, in turn, executes /etc/rcZ.d/rc (see Figure 9.3: Init Script
Flow)). The /etc/rcZ.d/rc script runs S* files in /etc/rcZ.d, with the start parameter. The S*
files are the switch configuration files (for example, S50layer2).
Ethernet Switch Blade User's Guide
release 3.2.2j
page 139
/etc/init.d/rcS
/etc/rcZ.d/rc
S*
S*
S*
Figure 9.3: Init Script
Flow
Saving Changes
Any modifications made to the scripts for your particular configuration must be properly saved or
your changes are lost when you reboot. The file system for the switch only exists in memory. A
rewritable overlay is contained within the upper four megabytes of the first application flash.
Modifying Files and Updating the Switch
Any file in OpenArchitect can be added, deleted or modified, with the exception of /sbin/init,
/usr/sbin/zmnt, /lib/modules/zfm_c.o, and the /tmp directory. Files are saved across a
system reboot by running the script zsync.
A directory /.zsync contains database files used by zsync for managing the file system
overlaying process. The user should not modify the files in this directory or unpredictable results
may occur.
Recovering from a System Failure
If the switch does not function after you initially change or reconfigure the image, you have
several options for recovering from an error. First, try to telnet into the switch. If you are
successful, remember to run zsync after fixing your problem.
If you cannot telnet, attach a console cable to the switch. Bring down the system and properly
attach the console cable, see Chapter 2, Connecting to the Console Port .
System Boots with a Console Cable
After attaching the system console cable, if the system boots, fix the problem that does not allow
you to telnet to the box, run zsync, and reboot. The problem is likely to be in the configuration
files contained in /etc/rcZ.d In order to telnet into the box, there must be a configured interface
with a proper IP address. For example, zhp0 is configured with the IP address 10.0.0.42 in the
factory default configuration.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 140
Booting with the –i option
If you cannot telnet into the switch and Linux fails to boot, it is likely that a change saved by
zsync has left the switch in an inaccessible state. To allow users to recover from mistakes saved
in the overlay file system, a boot argument of –i passed to the init process will stop the
untarring of the saved overlay files. As a result, the system boots to the factory-shipped
configuration.
•
Connect through the console port. During boot up, the system displays the Linux boot
string. “Linux/PPC load: ” for 5 seconds. During the 5 second pause, enter the boot
option “-i” and press Return
Linux/PPC load: root=/dev/ram init=/sbin/init -i
• Initiating the –i option of zbootcfg.
zbootcfg –d 1 –i
•
Reboot the system. After the reboot, clear the –i option from the boot string. Enter the
following command:
zbootcfg –d 1
The reboot command will also take “-i” as an option and pass it to the Linux boot,
reboot -i
•
•
When the system boots, the overlay file system is returned to the factory-installed
configuration. At this point, you have a few options.
Run zsync and the factory-installed system will be restored to your flash.
NOTE: All changes you have made and saved prior to the zsync command will
be lost.
•
Restore particular files from the existing overlay. Use the zmnt command to mount
the overlay in a designated directory and copy back just the changes you want to keep
from the existing overlay. For example, if you wanted to recover your /etc/hosts
file from the existing overlay, use zmnt to mount the overlay in a designated
directory, like /tmp, then copy /tmp/etc/hosts to /etc/hosts. Lastly, use zsync
to save your changes.
zmnt /tmp
cp /tmp/etc/hosts /etc/hosts
zsync /etc/hosts
•
Reboot the system.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 141
System Hangs During Boot
After attaching the system console cable, if the system hangs during boot, try booting with the –i
option as described in the previous section. It is possible that important Linux system files
became corrupted and incorrectly saved in the flash overlay. Use zmnt as described in the
previous section to fix or remove the problem files from the overlay. If the system will not boot
with the –i option, refer to Booting the Duplicate Flash Image section in this chapter.
Booting the Duplicate Flash Image
Another recovery method, if Linux fails to boot, is to temporarily boot the factory-installed
duplicate image located in the second flash device.
Connect through the console port.
When you see the number counter appear after the “zmonitor …” banner, press any key on the
console keyboard to enter the zmon application.
At the monitor prompt, type:
boot:2
You should see the counter again, but the system should boot into the secondary kernel. If you
have difficulties booting, contact Hewlett-Packard Technical Support.
At this point, follow the Upgrading the OpenArchitect Image section to put a new RAM disk
image in the application flash 1.
IMPORTANT: Be sure not to program flash 2, since this is your only current bootable
image.
The command to program flash 1 should be similar to the following command. The image name
may be slightly different depending on the model of switch and version of the image:
zflash –d 1 rdr6000.zImage.initrd
Upgrading the OpenArchitect Image
1. Refer to the HP bh5700 14-Slot Blade Server Installation Guide, Chapter 6, 14-Slot Shelf
Startup, Validating and Updating Your Firmware for instructions on how to gain access
to firmware updates for the HP bh5700 14-Slot Blade Server.
2. Use telnet, or preferably, attach a console cable to the switch, and login to the switch.
IMPORTANT: If you are connecting via telnet, be aware that the upgrade
process will reset the switch to the default IP address of 10.0.0.42, so you will have to
be able to reach 10.0.0.42.
3. Using the procedures referenced in Step 1, above, download the available OpenArchitect
image upgrade to a local system.
4. Check for free space with the df command. The OpenArchitect image is very close to
Ethernet Switch Blade User's Guide
release 3.2.2j
page 142
the limit of free space available on a default system, so you may need to clear some space
prior to downloading the new OpenArchitect image to the switch.
CAUTION: Do not remove the existing copy of /usr/sbin/gated (as suggested
in Step 5, below) until you have, in fact, determined that an OpenArchitect upgrade
version is available for downloading.
5. One of the easiest ways to create free space is to remove /usr/sbin/gated, as the
application will be replaced during the update procedure. Once you have enough free
space, proceed.
6. From the switch console, ftp the new OpenArchitect (rdr) image from the local system
to your switch.
7. The switch has two flash available: Device 1 and device 2. Use the zflash command to
write the new OpenArchitect image into the first flash device.
IMPORTANT: Make sure that Surviving Partner is not running before using zflash. The
delays incurred while zflash writes the flash can cause the Surviving Partner daemons to
think there is a failure, resulting in link oscillation.
zflash -d 1 <image_file>
The image file will be something named similar to the following,
zflash -d 1 rdr6000.zImage.initrd
Upgrading or Adding Files
Follow the procedure below to upgrade or add a new file to the switch. Place the file you are
adding or upgrading into the appropriate location in the file system. Save the file in the overlay
directory area on the application flash by running zsync.
zsync
After running zsync, the file is saved to the flash for future reboots.
Excluding Saving Files to Flash
Specific files or directories can be excluded from saving to flash by zsync by including an entry
in /etc/exclude. Likewise, existing entries in /etc/exclude such as /tmp can be removed
in order to save those files to flash with zsync.
Upgrading the Switch Driver
The switch driver upgrade process is the same as a file upgrade. However, more caution should
be taken since the driver module is likely to be the method by which you are logging into the
system. If the switch driver has a problem, you will need to have a console cable to recover. To
upgrade a switch driver, replace the file /lib/modules/if_zxe.o, run zsync and reboot.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 143
Using apt-get
apt-get is a utility created by the Debian Linux community to allow remote fetching and
installation of software stored in a repository in Debian package format. It allows users to keep
their software up-to-date with the latest binaries, and install new software without the need to
recompile.
Users may create their own repositories and add entries in /etc/apt/sources.list ( empty
by default ) for their private access methods to their private repository. See
http://www.debian.org for complete APT documentation.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 144
Chapter 10 Connecting to the Ethernet Switch Blade
The Ethernet Switch Blade has two completely separate switching subsystems within one ATCA
blade supporting both Base Interface and Fabric Interfaces
Figure 10.1: Fabric and Base
The Ethernet Switch Blade implements an independent control processor and software
environment for both Base and Fabric Interface switching subsystems. Troubleshooting problems
are similar in both environments. The following sections provides instruction on how to connect
to the Serial Port or Out-of-Band (OOB) Ethernet Interface for each backplane interface type if
user-intervention is needed.
Base Interface Hub System:
A 24 port Gigabit Ethernet Switch that provides service for a full 14-slot ATCA chassis. All
connectors for the base interface hub and it’s processor are labeled “base”.
Ethernet Interfaces:
The 3.0 Base Interface switching system provides 24 ports of Gigabit Ethernet service for up to
14 line cards with support for dual-shelf manager connections. Three ingress/egress ports are
available on the front panel. An additional Gigabit Ethernet port (ISL) on the backplane
interconnects the switches together for high availability configurations. If the OpenArchitect
environment is running, any in-band port can be used to establish a Telnet session.
Management Interfaces:
The Ethernet Switch Blade features an RS-232 console port located on the front panel that allows
communication with the switch when the Out-of-Band Ethernet port is not available, or in-band
Ethernet service cannot be established with the switch. A hyperterminal application is
recommended to contact the Ethernet Switch Blade through a Telnet session when using the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 145
console port. An RS-232 to RJ-45 adapter is required.
Fabric Interface Hub System:
A 48-port Gigabit Ethernet Switch that provides PICMG 3.1 Option 2 (2.0 Gb/s) Ethernet service
for a full 14-slot ATCA chassis. All connectors for the fabric interface hub and it’s processor are
labeled “fabric”.
Ethernet Interfaces:
The 3.1 Fabric Interface switching system provides 48 ports of Gigabit Ethernet service with
PICMG 3.1 option 2 (2.0 Gb/s) links for all line cards installed and option 3 (4.0 Gb/s) for up to 6
slots. Four ingress/egress ports are available on the front panel. An additional Gigabit Ethernet
port (ISL) on the backplane interconnects the switches together for high availability
configurations. If the OpenArchitect environment is running, any in-band port can be used to
establish a telnet session.
Management Interfaces:
The Ethernet Switch Blade features a RS-232 console port located on the front panel that allows
communication with the switch when the Out-of-Band Ethernet port is not available or in-band
Ethernet service cannot be established with the switch. A hyperterminal application is
recommended to contact the Ethernet Switch Blade through a telnet session when using the
console port. An RS-232 to RJ-45 adapter is required.
Connecting to the Base Interface
Base Interface Serial Port Connection
The switch console can be accessed via a RJ-45 10/100 service port located on the front panel of
the Ethernet Switch Blade. The RS-232 RJ-45 console port may be used to recover from a system
failure. It is used for maintenance only and is generally not connected.
An RS-232 to RJ-45 adapter cable is required to connect to the console port of the switch.
Figure 10.2 Shows the RJ-45 serial console (1) and Out-of-Band (OOB) ports (2) for the Base
Interface.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 146
Figure 10.2: Base Interface
Serial Port
To attach the console cable to the Ethernet Switch Blade switch:
1. Plug the RJ-45 end of the console cable (P/N 6900-63006, shipped with the HP bh5700
ATCA 14-Slot Blade Server) into the RJ-45 Console Port (1) on the front panel.
2. Connect the DB-9 end of console cable into a standard Modem Eliminator Cable
(normally locally available).
3. Connect the DB-9 header on the other end of the Modem Eliminator Cable to a standard
COM port on your PC or laptop computer (9600, n, 8, 1).
4. Reinsert the switch into the system and power up.
5. Use a terminal emulation program to access the switch console.
Base Interface Out-of-Band Ethernet Connection
Connect an Ethernet cable from the Ethernet Switch Blade front panel MGMT OOB (2 in Figure
10.2) to your PC.
1. Configure a host on the 10.0.0.0 network.
2. The OpenArchitect switch is pre-configured with address 10.0.0.42. telnet to
10.0.0.42.
telnet 10.0.0.42
3. After you are connected, enter the login name root. No password is required.
OpenArchitect login: root
4. You are now logged in and should see the following shell prompt:
[ZX6000-OA3.2.2h]#
Ethernet Switch Blade User's Guide
release 3.2.2j
page 147
NOTE: The OOB port is not active by default with the factory configured configuration.
The first time you log into the switch either in-band or through the console cable you must
use the ifconfig command to make the port active.
Connecting to the Fabric Interface
Fabric Interface Serial Port Connection
The switch console can be accessed via one RJ-45 10/100 serial port (3) located on the front panel
of the Ethernet Switch Blade. The RS-232 RJ-45 console port may be used to recover from a
system failure. It is used for maintenance only and is generally not connected.
An RS-232 to RJ-45 adapter cable is required to connect to the console port of the switch. See the
Users Guide for more information.
Figure 10.3 showsthe RJ-45 serial console (3) and Out-of-Band (OOB) ports (4) for the Fabric
Interface.
Figure 10.3: Fabric Interface Serial
Ports
To attach the console cable to the Ethernet Switch Blade switch:
6. Plug the RJ-45 end of the console cable (P/N 6900-63006, shipped with the HP by5700
ATCA 14-Slot Blade Server) into the RJ-45 Console Port (1) on the front panel.
7. Connect the DB-9 end of console cable into a standard Modem Eliminator Cable
(normally locally available).
8. Connect the DB-9 header on the other end of the Modem Eliminator Cable to a standard
COM port on your PC or laptop computer (9600, n, 8, 1).
Ethernet Switch Blade User's Guide
release 3.2.2j
page 148
9. Reinsert the switch into the system and power up.
10. Use a terminal emulation program to access the switch console.
Fabric Interface Out of Band Ethernet Connection
Connect an Ethernet cable from the Ethernet Switch Blade front panel MGMT OOB (4 in Figure
10.3) to your PC.
1. Configure a host on the 10.0.0.0 network.
2. The OpenArchitect switch is preconfigured with address 10.0.0.42. telnet to 10.0.0.42.
telnet 10.0.0.42
3. After you are connected, enter the login name root. No password is required.
OpenArchitect login: root
4.
You are now logged in and should see the following shell prompt:
[Ethernet Switch Blade-OA3.2.2h]#
NOTE: The OOB port is not active by default with the factory configured configuration.
The first time you log into the switch, either in-band or through the console cable, you must
use the ifconfig command to make the port active.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 149
Chapter 11 Diagnosing a Failed Ethernet Switch Blade
Activation
Figure 11.1: Ethernet Switch Blade Activation States
The Ethernet Switch Blade must transition through a series of states (M0–M4) to become active
in an ATCA shelf. After the Ethernet Switch Blade has reached the M4 state, it will become
active and start the boot process of the OpenArchitect Switch Management environment. If a
failure occurs during the Shelf Manager activation stage, the Ethernet Switch Blade has to be
diagnosed through the ShMM as the OpenArchitect environment is not booted.
Table 11.1 lists the solutions to problems that may occur at the different stages the Ethernet
Switch Blade transitions through to become active and steps that should be taken if it fails during
activation.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 150
FRU State
HotSwap
LED Status
M0
OFF
Healthy
LED Status
OFF
Solution
No power. Board not inserted correctly.
1. Remove and re-insert board.
2. If board does not power-up after re-insertion,
try a different slot. If board continues to fail
in the new slot and the problem does not
affect other boards running in the chassis,
return the Ethernet Switch Blade board for
repair.
M1
ON
ON
No Communication with the Shelf Manager. Make
sure the hot swap ejector handle is securely closed.
The Hot Swap LED may remain lit or blinking if
awaiting permission to activate from the ShMM
If the Hot Swap handle is closed, retrieve the Mstate
information from the ShMM. Use the tips in the
Accessing the ShMM section below to determine if
other FRUs are also encountering difficulty (could
indicate a chassis-wide failure of the IPMB bus)
M2
LONG
BLINK
OFF
M3
OFF
OFF
If the switch has reported critical sensor data for
temperature or voltage, the ShMM can prevent the
switch from booting. To determine if the critical
sensor events persist, it may be necessary to alter the
rules enforced by the ShMM to allow the switch to
boot (receive back-end power). See Accessing the
ShMM section, below, for more information
Check the Shelf Managers to see if the voltage and
temperature sensors are within threshold (see the
Accessing the ShMM section below for how to
retrieve these values or detect critical thresholds
exceeded events), this indicates an internal hardware
fault and inoperable IPMC. Replace the switch.
Follow the other troubleshooting tips in the
Accessing the ShMM section below to determine if
exceeded thresholds are specific to the switch, or
might indicate a chassis-level problem (if other FRUs
are also reporting similar events).
M4
OFF
ON
Ethernet Switch Blade User's Guide
Switch Operational State. Try connecting into the
release 3.2.2j
page 151
FRU State
HotSwap
LED Status
Healthy
LED Status
Solution
switch through a console cable.
If OpenArchitect is running, and abnormal behavior
is occurring, please see Network Configuration
Problems for information on network issues.
If OpenArchitect cannot be accessed through the
console port, please see Troubleshooting a Failed
OpenArchitect Load.
Table 11.1: Troubleshooting States
Accessing the ShMM
If the Ethernet Switch Blade has not successfully booted, it will not be accessible via a remote
connection. Some of the following procedures will require local access to the switch to examine
the hardware, and possibly a locally attached serial console connection.
Access to the ShMM (either via a remote telnet session or a local console) will also be used to
gather some additional information about the state of the switch and other FRUs in the chassis. If
a remote connection to the ShMM can be established, it is possible to collect some preliminary
troubleshooting data.
Consult your Chassis user’s guide for more information on logging into the Shelf Manager, and
see Verifying Communications Between the ShMM and Switch, below.
Verifying Communications Between the ShMM and Switch
Verify that the communications between the IPMC on the switch and the ShMM have been
established. Try to see if the switch will respond to the following command requests:
GetSensorReading
GedDevID
If the ShMM cannot communicate with the Ethernet Switch Blade, check other devices in the
chassis. if the Ethernet Switch Blade is the only device not responding, replace the switch.
If the ShMM can communicate with the Ethernet Switch Blade, see the Critical Threshold Error
Reported section, below.
Critical Threshold Error Reported
The ShMM may have been configured to remove back-end power from a FRU reporting critical
Ethernet Switch Blade User's Guide
release 3.2.2j
page 152
sensor information. Examine the System Event Log (SEL) on the ShMM and determine if critical
sensor events have been logged for the switch in question.
If the switch has reported critical sensor data for temperature or voltage, the ShMM can prevent
it from booting. To determine if the critical sensor events persist, it may be necessary to alter the
rules enforced by the ShMM to allow the switch to receive back-end power and boot (see the
ShMM documentation for instruction).
If other FRUs, in addition to the failed switch, are reporting similar critical sensors, such as
temperature or voltage, this may indicate a chassis-related failure (such as fans or power supply).
Voltage: If the Ethernet Switch Blade continues to report voltage critical threshold error after
changing the rules in the ShMM to allow it to receive power, then return the switch for repair.
Temperature: If the Ethernet Switch Blade continues to report a temperature critical temperature
error after changing the rules, check the fans to make sure that there is sufficient airflow to the
switch. If airflow is sufficient, and the temperature threshold is still reported, then return the
switch for repair.
Analyzing Mstate information for the switch
The SEL will also contain Mstate information for the switch that can be useful in determining
conditions related to a failed boot. Knowing the state change transition history in the SEL can
help to narrow down activation problems with the switch. The states are defined as follows:
M0 – No power and hot swap handle open
M1 – No communications. Wait in M1 until hot swap ejector is closed.
M2 – FRU announces its presence to the ShMM and awaits activation permission
M3 – Activation
M4 - Operational state; command issued to enable back-end power.
M5 – Deactivation request, such as hotswap ejector opened
M6 – Deactivation granted by ShMM
M7 – Unexpected loss of communication between FRU and ShMM
The information in the SEL will mostly reflect problems that can be related to the IPMC functions
of the switch. Other problems related to loading the switch software environment during boot
might require further analysis of the switch itself.
Checking the ekey Status From the Shelf Manager
You can check the ekey status from the shelf manager with the clia board <x> command.
Shelf Manager commands:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 153
clia board -v 7
or
clia board -v 8
These commands generate an output that reports if the ShMM thinks it has granted access to
ports on the switches. Check the Shelf Manager User’s Guide for the expected output.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 154
Chapter 12 Troubleshooting a Failed OpenArchitect
Load
The OpenArchitect operating system is loaded from the FlashROM memory into RAM when the
Ethernet Switch Blade is activated by the Shelf Manager. If there is a problem with the loading
of OpenArchitect due to a hardware failure or corrupt file system, the back-up image can help to
troubleshoot the condition. The following chapter provides tips to troubleshooting a failed
OpenArchitect load.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 155
Ethernet Switchblade has
been enabled by the
ShMM and starts to boot
Bootloader examines the
bootstring in the Boot
ROM
Determines if
the bootstring
is dev 1
Determines if
the bootstring
is dev 2
Loads image from Flash
device 1
Loads image from Flash
device 2
Cannot boot
OpenArchitect
Begins execution of RAM
image
Figure 12.1: OpenArchitect Boot Process
The Ethernet Switch Blade is equipped with a Random Access Memory (RAM) disk and three
Read-Only Memory (ROM) devices, including, a boot ROM and two application flash
Ethernet Switch Blade User's Guide
release 3.2.2j
page 156
Figure 12.2: ROM Devices in OpenArchitect
The boot ROM is located on device 0 and contains the OpenArchitect zmon application that
operates as a boot loader and includes a device bootstring. Device 1 contains the application flash
1 image of the Linux operating system and the OpenArchitect overlay file system. Application
flash 1 is the primary working image for the switch. Device 2 contains the application flash 2 that
is an exact copy of application flash 1. You would only boot from this device if application flash
1 is corrupted and you need to restore the switch to the factory shipped configuration.
Recovering from a System Failure
If the switch does not function after you initially change or reconfigure the image, you have
several options for recovering from an error. First, try to telnet into the switch. If you are
successful, remember to run zsync after fixing the problem.
After attaching the system console cable, if the system boots, fix the problem that does not allow
you to telnet to the box, run zsync, and reboot. The problem is likely to be in the configuration
files contained in /etc/rcZ.. In order to telnet into the box, there must be a configured
interface with a proper IP address. For example, zhp0 is configured with the IP address 10.0.0.42
in the factory default configuration.
If you cannot telnet, attach a console cable and Modem Eliminator Cable to the switch. A
console cable ( PN A6900-63006) is included with each HP bh5700 ATCA 14-Slot Blade Server,
and a Modem Eliminator Cable should be obtainable locally. You can also obtain the correct
console cable from your Hewlett-Packard sales representative. Bring down the system and
Ethernet Switch Blade User's Guide
release 3.2.2j
page 157
properly attach the console cable.
Booting Without the Overlay File
If you cannot telnet into the switch and Linux fails to boot, it is likely that a change saved by
zsync has left the switch in an inaccessible state. To allow users to recover from mistakes saved
in the overlay file system, a boot argument of –i passed to the init process will stop the
untarring of the saved overlay files. As a result, the system boots to the factory-shipped
configuration.
1. Connect through the console port. During boot up, the system displays the Linux boot
string. Linux/PPC load: for 5 seconds. During the 5 second pause, enter the boot option
-i and press Return
Linux/PPC load: root=/dev/ram init=/sbin/init -i
2. Initiating the –i option of zbootcfg.
zbootcfg –d 1 –i
3. Reboot the system. After the reboot, clear the –i option from the boot string. Enter the
following command:
zbootcfg –d 1
4. The reboot command will also take -i as an option and pass it to the Linux boot,
reboot -i
5. When the system boots, the overlay file system is returned to the factory-installed
configuration. At this point, you have a few options.
Caution: All changes you have made and saved prior to the zsync command
will be lost when the command executes.
a) Enter zsync, and the factory-installed system will be restored to your flash.
b) Restore particular files from the existing overlay. Use the zmnt command to mount
the overlay in a designated directory and copy back just the changes you want to keep
from the existing overlay.
For example, if you wanted to recover your /etc/hosts file from the existing
overlay, use zmnt to mount the overlay in a designated directory, like /tmp,
then copy /tmp/etc/hosts to /etc/hosts. Lastly, use zsync to save your
changes (as follows).
zmnt /tmp
cp /tmp/etc/hosts /etc/hosts
zsync /etc/hosts
6. Reboot the system.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 158
If the switch still is unable to boot, see Booting the Duplicate Flash Image, below.
Booting the Duplicate Flash Image
Another recovery method, if Linux fails to boot, is to temporarily boot the factory-installed
duplicate image located in the second flash device.
1. Connect through the console port.
2. When you see the number counter appear after the zmonitor ... banner, press any key
on the console keyboard to enter the zmon application.
3. At the monitor prompt, type: boot:2
4. You should see the counter again, but the system should boot into the secondary kernel. If
you have difficulties booting, contact Hewlett-Packard technical support.
5. At this point, follow the Upgrading the OpenArchitect Image section to put a new RAM
disk image in the application flash 1.
IMORTANT: Do not program flash 2, since this is currently your only bootable
image. The command to program flash 1 should be similar to the following
command. The image name may be slightly different depending on the model of
switch and version of the image.
Base Interface: zflash -d 1 rdr6000.zImage.initrd
Fabric Interface: zflash –d 1 rdr7100.zImage.initrd
NOTE: If the duplicate flash image cannot be loaded, remove and return switch for
repair.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 159
Chapter 13
Network Configuration Problems
Many reported problems on a booted switch will ultimately be traced back to user errors in the
layer 2 or layer 3 switch configuration. In some cases, symptoms from an improperly configured
switch can masquerade as potential hardware problems.
Interface Overview
On startup OpenArchitect creates interfaces for all Ethernet ports on the Ethernet Switch Blade.
The three types of interfaces within the OpenArchitect environment are:
•
zhp - Host Port: A zhp interface is associated with one VLAN (Virtual Local Area
Network). zhps can contain one or more physical interfaces (zres) to create a private
network that does not let traffic cross interfaces outside of the VLAN.
•
zre - Raw Ethernet: An interface that represents a physical port on the OpenArchitect
switch.
•
zrl – Trunk of Ports (Link Aggregation)
•
eth - Each switch, fabric and base, in a Ethernet Switch Blade Series unit has Out-ofBand (OOB) Ethernet port on the front panel. These are an alternative maintenance port
supplying Ethernet connectivity instead of serial connectivity and are connected only
when performing switch maintenance activities.
Physical Interfaces
The tables below provides a translation guide for mapping the zre ports to the termination point.
Table 13.1 details the zre interface to the slot in the backplane that it terminates. Table 13.2 lists
all other Ethernet Switch Blade interfaces including management, and egress ports.
Table 13.1: Ethernet Switch Blade Backplane Interfaces (zre Ports)
Physical
Slot
1
2
3
4
5
6
Base
20
10
8
6
4
3
Fabric
Port 0
40
36
32
28
16
8
-
Fabric
Port 1
41
37
33
29
17
9
18
10
Fabric
Port 2
Ethernet Switch Blade User's Guide
7
8
9
10
11
12
13
14
15
16
0
1
3
5
7
9
11
21
-
0
4
12
24
30
34
38
42
-
-
1
5
13
25
31
35
49
43
-
-
2
6
14
26
23*
release 3.2.2j
page 160
Physical
Slot
1
2
Fabric
Port 3
3
4
5
6
7
8
9
10
11
12
19
11
-
-
3
7
15
27
13
14
15
16
51**
Fabric
* Base Interface Inter-Switch Link (ISL)
** 10 Gigabit Ethernet Fabric Interface - Update Channel
Table 13.2: Additional Interfaces
Additional Interfaces
Base
Fabric
12
20
-
21
14
22
15
23
Shelf Manager 1 (zre)
22
-
Shelf Manager 2 (zre)
13
-
Front Panel Out-of-Band Management Port (eth)
0
0
Front Panel Egress (zre)
NOTE: The Out-of-Band (eth) ports are not enabled by default. Edit the S50layer2
script or ifconfig eth0 for front panel access or eth1 for rear panel access (not
implemented this release) on either the Base or Fabric Interfaces.
Default Base Interface Configuration
Editing the S50layer2 script can change the Ethernet Switch Blade Base Interface default
configuration. The S50Layer2 script and included example scripts
(/etc/rcZ.d/examples) can be used as templates to create custom scripts. The default
S50layer2 scripts configures the switch accordingly:
24 port, Layer 2 Switching, single VLAN
1. S20stack - Script that calls zstack to combine the two BCM5695 twelve-port switch
fabric chips into a single 24 port virtual switch. zstack must be run before any other
switch configuration. (Editing this script is not recommended.)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 161
2. S30e1000 - Script that loads the e1000 driver module for the Out-of-Band Ethernet
ports. (Editing this script is not recommended.)
S40vpd - Script that checks the current OA version, and loads into the Vital Product
Data (VPD) area if necessary. (Editing this script is not recommended.)
3. S50layer2 - Script that sets up a basic Layer 2 switch. All 24 10/100/1000 ports are set
up on one IP network (VLAN).
Figure 13.1: Default Base Interface Network Diagram
Ethernet Switch Blade User's Guide
release 3.2.2j
page 162
OpenArchitect login: root
sh-2.04# ifconfig
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16144 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
zhp0 Link encap:Ethernet HWaddr 00:11:65:09:E0:18
inet addr:10.0.0.42 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:281 errors:0 dropped:0 overruns:0 frame:0
TX packets:29 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:30018 (29.3 Kb) TX bytes:2027 (1.9 Kb)
Base address:0xc000
sh-2.04#
Default Fabric Interface Configuration
Editing the S50layer2 script can change the Ethernet Switch Blade Fabric Interface default
configuration. The S50Layer2 script and included example scripts
(/etc/rcZ.d/examples) can be used as templates to create custom scripts. The default
S50layer2 script configures the switch accordingly:
1. S20stack - Script that calls zstack to combine the two BCM56504 24-port switch fabric
chips into a single 48 port virtual switch. zstack must be run before any other switch
configuration. (Editing this script is not recommended.)
2. S50layer2 - Script that sets up a basic Layer 2 switch. All 48 ports are set up on one VLAN.
This configuration script is appropriate for an Ethernet Switch Blade. It may need to be modified
for other models.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 163
Figure 13.2: Linux Networking Environment Interfaces
ifconfig Default Screen Output for the Base Interface
[ZX7100-OA3.2.2h]# ifconfig
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16144 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)
zhp0 Link encap:Ethernet HWaddr 00:11:65:0B:C0:38
inet addr:10.0.0.42 Bcast:10.0.0.255 Mask:255.255.255.0
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:488 errors:0 dropped:0 overruns:0 frame:0
TX packets:377 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:32716 (31.9 Kb) TX bytes:92079 (89.9 Kb)
Base address:0x5000
[ZX7100-OA3.2.2h]#
Ethernet Switch Blade User's Guide
release 3.2.2j
page 164
Configuration Troubleshooting
Problem
Solution
No Connection
Physical Link problem. Check to see if the port LED is lit. If
the LED port is not lit, then you may have a bad cable
connection.
OR
Configuration Error. Connect through the console port (See
Chapter 10). Use the ifconfig command to see all of the
configured interfaces on the Ethernet Switch Blade and their
IP addresses. Check to make sure that the IP address of the
switch and the subnet mask are properly set. Also, the section
on No Connection for more information.
Decreased throughput
You should make sure that your network topology contains no
data path loops. Between any two end nodes, there should be
only one active cabling path at any time. Data path loops will
cause broadcast storms that will severely impact your network
performance. See the section on Diminished Network
Throughput.
High Error Rates, Decreased
Throughput
If the interfaces on the Ethernet Switch Blade and the node
board are not set to auto by default, the link may not negotiate
correctly. See the section on Connecting to Devices with Fixed
Port Speeds.
Ext FLT LED on
The EXT FLT LED indicates that communications could not
be established with one or more remote partner devices on an
active port or ports. Ports which were configured to be up (via
ifconfig), but do not have remote partner devices attached,
can cause the EXT FLT LED to be lit, even if there are no
hardware problems with the switch. See the Network
Configuration Problems chapter for more information
Ekey Disabled
By default, if there is no device plugged into a node slot in the
ATCA chassis, the Shelf Manager will ekey disable the port.
If a node board is inserted and the ports on the Ethernet Switch
Blade are still ekey disabled by the Shelf Manager, check to
see if the node board has been properly activated by the Shelf
Manager.
Determining ekey status for a specific slot
You can determine the current state of any link by using the zlc command. The zlc command
will output the current status of any/all Ethernet ports and their ekey status that was set by the
Shelf Manager.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 165
The following table will translate the zlc output to link status.
Link
Zre
(x)
Port Status
EKEY_DISA
BLED
Link Speed
Auto
Pause
Enable
EKEY_ENAB 1000fd
LED
Faults
Internal Fault
OK
ON
Disable External Fault
UP
1000hd
DOWN
100fd
100hd
10fd
10hd
Link: zre(X) – physical interface
Shelf Manager Status: EKEY_DISABLED - A slot or device that has been disabled by the Shelf
Manager.
EKEY_ENABLED - A slot or device that has been enabled by the shelf
manager and enabled by the Ethernet Switch Blade
switch.
UP – The port has been configured to be active and has established a link
with another network device.
DOWN – The port has been configured to be active but has not establish a
link with another network device.
Link Speed: Auto – Auto Negotiate with the other device. The node device
must be configured to Auto Negotiate as well or network connectivity errors
will occur.
1000fd – Full Duplex Gigabit Ethernet
1000hd – Half Duplex Gigabit Ethernet
100fd – Fast Ethernet Full Duplex
100hd – Fast Ethernet Half Duplex
10fd – Ethernet Full Duplex
Ethernet Switch Blade User's Guide
release 3.2.2j
page 166
10hd – Ethernet Half Duplex
Pause: Enable: a port that can temporarily suspend the data transmission
between two network devices in the event that one of the devices becomes
congested. Pause enabled devices can reduce bottlenecks by making the
network more efficient.
Disabled: The pause feature is not enabled and will continue to
transmit traffic when even when the receiving device is
busy.
Faults: Internal: An internal fault indicates a severe hardware failure
External: An external fault indicates that a port has been configured to active,
but a link has not been established.
OK: ON: Indicates that the Ethernet Switch Blade has successfully loaded
OpenArchitect.
Querying Base Interface ekey Status
Link Status for a single port
To query a link status for a single port type zre<x> query for example:
zlc zre13 query
Example Output:
sh-2.04# zlc 13 query
zre13: <UP, 100FD, PAUSE ENABLE OFF, OK ON>
sh-2.04#
Link Status for a range of ports
To query the link status for a range of ports type zre<x>..<x> query for
example:
zlc zre0..23 query
Example Output:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 167
sh-2.04# zlc zre0..23 query
zre0: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre1: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre2: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre3: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre4: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre5: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre6: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre7: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre8: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre9: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre10: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre11: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre12: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre13: <UP, 100FD, PAUSE ENABLE OFF, OK ON>
zre14: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre15: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre16: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre17: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre18: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre19: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre20: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre21: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre22: <UP, 100FD, PAUSE ENABLE OFF, OK ON>
zre23: <UP, 1000FD, PAUSE ENABLE ON, OK ON>
sh-2.04#
NOTE: this is the zlc output for a single Ethernet Switch Blade Base Interface in the
default configuration with no line cards installed in the chassis.
Querying Fabric Interface ekey Status
Link Status for a single port
To query a link status for a single port type zre<x> query. For example:
zlc zre13 query
Example Output:
[ZX7100-OA3.2.2h]# zlc zre13 query
zre13: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
[ZX7100-OA3.2.2h]#
Link Status for a Range of Ports
To query the link status for a range of ports type zre<x>..<x> query. For example:
zlc zre0..51 query
Example Output:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 168
[ZX7100-OA3.2.2h]# zlc zre0..51 query
zre0: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre1: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre2: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre3: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre4: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre5: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre6: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre7: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre8: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre9: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre10: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre11: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre12: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre13: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre14: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre15: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre16: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre17: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre18: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre19: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre20: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre21: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre22: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre23: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre24: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre25: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre26: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre27: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre28: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre29: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre30: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre31: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre32: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre33: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre34: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre35: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
zre36: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
zre37: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
zre38: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
zre39: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
zre40: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
zre41: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
zre42: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
zre43: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
zre44: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre45: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre46: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre47: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre48: <STOPPED, 10GFD, PAUSE ENABLE, EXT_FLT ON, INT_FLT
zre49: <STOPPED, 10GFD, PAUSE ENABLE, EXT_FLT ON, INT_FLT
zre50: <STOPPED, 10GFD, PAUSE ENABLE, EXT_FLT ON, INT_FLT
zre51: <DOWN, 10GFD, PAUSE ENABLE, EXT_FLT ON, OK ON>
[ZX7100-OA3.2.2h]#
Ethernet Switch Blade User's Guide
release 3.2.2j
ON>
ON>
ON>
ON>
ON>
ON>
ON>
ON>
ON>
ON, OK ON>
ON, OK ON>
ON, OK ON>
page 169
Network Connectivity Troubleshooting
No Connection
If the port LED is lit on the front panel, the switch has established a physical connection and the
problem is a network configuration error. Check to see if both devices are configured to be on the
same network (ex. 10.0.0.xxx) and that the subnet mask is set correctly.
Diminished Network Throughput
Depending on how the switch is configured, throughput problems can reflect configuration errors
in the network topology.
If the Spanning Tree Protocol (STP) is not enabled, it is possible for broadcast loops to occur
which can stress the network. The tcpdump (See the Ethernet Switch Blade User’s Guide for
more information) utility is a widely used network troubleshooting tool which can display
network traffic according to user-defined filters, which can help to troubleshoot problems such as
broadcast loops. The tcpdump utility is included with the switch’s OA environment.
If STP is enabled, be sure to identify ports which were automatically blocked by the STP daemon
to prevent broadcast loops.
A network diagram can be useful in isolating network loops.
Connecting to Devices with Fixed Port Speeds
Verify that switch and the connected devices are set to the same port speed setting, otherwise
diminished or no connections can be made. If devices connected to the Ethernet Switch Blade are
connected at a fixed Full Duplex, high error rates or sporadic connectivity can be observed. It is
recommended that all devices connected to the Ethernet Switch Blade be set to auto-negotiate.
External Fault LED
The EXT FLT LED indicates that communications could not be established with one or more
remote partner devices on an active port or ports. Ports which were configured to be up (via
ifconfig), but do not have remote partner devices attached, can cause the EXT FLT LED to be
lit, even if there are no hardware problems with the switch.
The OA zlc command can be used to check the status of individual ports and also to manipulate
how the EXT FLT LED is managed (globally or on a per-port basis). By default, the EXT FLT
LED is a global indicator. Use the zlc command to change how the LED is managed, to help
isolate external fault indications to a particular port.
Before acting on an EXT FLT LED indication, make sure the configuration for the switch reflects
how the chassis is populated with expected device configurations. For example, if a “default”
configuration was used on the switch to bring up all ports on startup, but not all of those ports
Ethernet Switch Blade User's Guide
release 3.2.2j
page 170
have an active remote device attached, then first bring down the ports which do not have active
connections expected to make sure there is a legitimate EXT FLT condition.
If loss of communications is suspected on an externally wired port, make sure to check and test
affected cables.
Network Tests
Ping Test
It is possible to test a network connection by using the ping command. The ping command will
send a network packet to the specified IP address and wait for a reply.
To initiate a ping test, type
ping <ip address>
If you have a network connection, a normal output would look like the following:
sh-2.04# ping 10.0.0.43
PING (10.0.0.43): 56 data bytes
64 bytes from 10.0.0.43: icmp_seq=0
64 bytes from 10.0.0.43: icmp_seq=1
64 bytes from 10.0.0.43: icmp_seq=2
64 bytes from 10.0.0.43: icmp_seq=3
64 bytes from 10.0.0.43: icmp_seq=4
64 bytes from 10.0.0.43: icmp_seq=5
64 bytes from 10.0.0.43: icmp_seq=6
64 bytes from 10.0.0.43: icmp_seq=7
64 bytes from 10.0.0.43: icmp_seq=8
64 bytes from 10.0.0.43: icmp_seq=9
ttl=109
ttl=109
ttl=109
ttl=109
ttl=109
ttl=109
ttl=109
ttl=109
ttl=109
ttl=109
time=69.094
time=69.341
time=69.363
time=69.109
time=69.859
time=68.865
time=69.180
time=69.496
time=69.373
time=70.680
ms
ms
ms
ms
ms
ms
ms
ms
ms
ms
10 packets transmitted, 10 packets received, 0% packet loss
round-trip min/avg/max/stddev = 68.865/69.436/70.680/0.486 ms
sh-2.04#
results:
sh-2.04# ping 10.0.0.43
ping: cannot resolve 10.0.0.43: Unknown host
sh-2.04#
Ethernet Switch Blade User's Guide
release 3.2.2j
page 171
Traceroute Test
It’s possible to trace a network path using the traceroute command. The
following is an example of a Layer 2 traceroute with only two devices.
sh-2.04# traceroute 192.168.1.101
traceroute to 192.168.1.101 (192.168.1.101), 64 hops max, 40 byte
packets 1 192.168.1.101 (192.168.1.101) 1.888 ms 1.135 ms 0.814 ms
sh-2.04#
Ethernet Switch Blade User's Guide
release 3.2.2j
page 172
Chapter 14 Isolating Hardware Failures
Figure 14.1: ATCA Base Inside View
1.
Flash
10. Switch Chip (U69)
2.
EEPROM
11. Zone 3 ATCA Connector
3.
PHY
12. Isolation Transformers
4.
CPU
13. 4-port PHY
5.
SDRAM
14. Zone 2 ATCA Connector
6.
Isolation Transformer
15. Zone 1 ATCA Connector
7.
IPMI Controller
16. Isolation Transformers
8.
Power Supply
17. 4 port PHY
9.
Switch Chip (U56)
18. Fuses
Ethernet Switch Blade User's Guide
release 3.2.2j
page 173
Figure 14.2: ZMC Daughter Card Outside View
1. Isolation Transformer
2. Zone 3 ATCA Connector
3. Isolation Transformer
4. Switch Chip (U60)
5. SDRAM
6. Switch Chip (U59)
7. Isolation Transformer
Ethernet Switch Blade User's Guide
release 3.2.2j
page 174
Figure 14.3: ZMC Daughter Board Inside View
1.
Isolation Transformer
8.
Flash ROMs
2.
4 Port PHY
9.
FPGA
3.
CPU (U22)
10. ZMC Connector
4.
10 Gigabit XFP
11. Zone 3 ATCA Connector
5.
10 Gigabit PHY
12. Power Supply
Ethernet Switch Blade User's Guide
release 3.2.2j
page 175
6.
Isolation Transformer
13. Isolation Transformers
7.
Power Supply
14. 4 Port PHY
Hardware Subsystem
In the following tables, refer to the identified component-area numbers on indicated in the
pictures in the proceeding section. The indications of malfunction may be identified either during
normal operation, or in response to a specific test. The various tests that may be initiated are
shown in subsequent sections.
The information is equally applicable to both the Base Interface and the Fabric Interface switch
subsystems unless otherwise noted.
Base
4
ZMC 0 ZMC 1
#
#
3
5
1
9, 10
Hardware
Subsystem
CPU
3
8
6, 7
Indications of Malfunction
A CPU failure may be indicated by any of the
following:
•
A failure to run the Power-On-SelfTest (POST)
•
A failure to boot the OpenArchitect
kernel
•
Kernel panics
Loss of CPU response sometime after
operation is initiated
RAM
The Ethernet Switch Blade uses SDRAM for
the primary CPU memory system. A failure of
RAM will generally cause any of the
following:
•
Kernel panics
•
Loss of CPU response
•
Unexplained software failures
ROM
The Flash ROM subsystem on the Ethernet
Switch Blade is used only on boot-up. The
contents are decompressed and copied to RAM
memory for further use. A ROM failure will
generally cause a failure in the boot process.
Switch Fabric
The Ethernet Switch Blade has four switch
fabric chips: 2 for the Base Interface with 24
ports, and 2 for the Fabric Interface with 48
ports. A Switch Fabric failure may result in
Ethernet Switch Blade User's Guide
release 3.2.2j
page 176
Base
ZMC 0 ZMC 1
#
#
Hardware
Subsystem
Indications of Malfunction
any of the following indications:
•
Error message via OpenArchitect due
to inability to access the registers
within the switch chip, or a failure of
DMA transfers.
•
Loss of switch functionality, such as
the inability to forward packets, or
forwarding packets in error.
8
Power Supply
12
3, 6, 12, 2, 4, 6,
13, 16, 13, 15
17
9
1, 2, 4
A power-supply failure will generally result in
lack of boot activity.
Ethernet PHYs An error in the PHY will generally result in
& Transformers loss of link, or Ethernet data transfer errors
such as Checksum and Frame Alignment.
Note that for ATCA 3.1 Fabric ports, there is
no separate PHY devices or transformers; the
SerDes interface in the switch chips are used
instead.
FPGA
Most FPGA failures will result in the CPU
failing the boot process.
BOOT ROM
If the Boot ROM fails or is not programmed,
there will be no boot activity on the console
port after power up.
Network Cable Network cable failures will result in loss of
link or loss of data packets.
7
IPMI Controller If the IPMI controller is not functioning, the
Ethernet Switch Blade board will not power up
when inserted into a powered-up chassis.
This condition may also result from a failure
within the ATCA Shelf Manager (ShMM) or if
the ShMM determines that the chassis cannot
support the Ethernet Switch Blade.
Testing the FlashROMs
The FlashROMs (device 1 and 2) contain compressed images of the OpenArchitect operating
system. FlashROM device 1 is the primary operating system image. If OpenArchitect has
successfully booted, then FlashROM device 1 is fully operational.
To test FlashROM device 2, set the Ethernet Switch Blade to load the backup image on reboot.
For instructions on temporarily booting from FlashROM device 2, see the section on Booting the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 177
Duplicate Flash Image. If the switch can successfully boot from FlashROM device 2, then
FlashROM device 2 is fully operational.
Testing the Switch Fabric
You can test the functionality of the switch fabric by running the zlc command. The zlc
command outputs the link status for any Ethernet Switch Blade interface.
Link Status for a single port
To query a link status for a single port type zre<x> query for example:
zlc zre13 query
Example Output:
sh-2.04# zlc zre13 query
zre13: <UP, 100FD, PAUSE ENABLE OFF, OK ON>
sh-2.04#
Link Status for a range of ports
To query the link status for a range of ports, type zre<x>..<x> query. For example:
zlc zre0..23 query
Ethernet Switch Blade User's Guide
release 3.2.2j
page 178
Example Output:
sh-2.04# zlc zre0..23 query
zre0: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre1: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre2: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre3: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre4: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre5: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre6: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre7: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre8: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre9: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre10: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
ON>
zre11: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
ON>
zre12: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre13: <UP, 100FD, PAUSE ENABLE OFF, OK ON>
zre14: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre15: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre16: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre17: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre18: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre19: <DOWN, AUTO, PAUSE ENABLE, EXT_FLT ON, OK ON>
zre20: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
ON>
zre21: <EKEY_DISABLED, AUTO, PAUSE ENABLE, EXT_FLT ON, OK
ON>
zre22: <UP, 100FD, PAUSE ENABLE OFF, OK ON>
zre23: <UP, 1000FD, PAUSE ENABLE ON, OK ON>
sh-2.04#
NOTE: This is the zlc output for a single Ethernet Switch Blade Base Interface in the
default configuration with no line cards installed in the chassis.
Testing the onboard RAM
You can test the onboard memory by running the free command. The free command will
output the current memory usage.
h-2.04#
Total
Used
Free
Shared
Buffers
Cached
254716
46816
207900
5088
10000
22664
-/+ buffers/cache:
14152
240564
wsp:
0
0
em:
0
h-2.04#
Ethernet Switch Blade User's Guide
release 3.2.2j
page 179
If the “Used” and “Free” memory statistics do not add up to the Total memory, the software
environment may have a memory leak caused by a software error. Reboot the switch.
If the problem persists after a reboot. Run the top command to list the memory utilization of all
current processes.
sh-2.04# top
The top command can help you isolate software related memory problems to specific processes.
Example output:
Testing the Control Processor
The Base Interface and Fabric Interface control processors are critical components in the
operation of the Ethernet Switch Blade. The control processors host the OpenArchitect operating
system and manage the switch fabric devices.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 180
To test the operational status of the control processors you can do the following:
Hardware Fault
Connect to the console port of either the Base or Fabric Interface control processor (See Chapter
10 for more information).
If you cannot communicate with the Ethernet Switch Blade, the control processor may have
encountered a software error. Reboot the switch to clear the error. If the problem persists, and you
are not able to communicate with the Ethernet Switch Blade, replace and return the Ethernet
Switch Blade for repair.
Software Error
If you can successfully contact the Ethernet Switch Blade through the console port (See Chapter
10 for more information), enter the top command at the command prompt. The top command
lists the processor utilization of all current processes.
sh-2.04# top
The top command can help you isolate software related problems to specific processes.
Example output:
6.1.5
INT FLT LED activity
The INT FLT LED indicates that internal communications may have failed on the board. If the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 181
INT FLT LED is illuminated, replace the switch and return it for repair.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 182
Chapter 15 High Availability Troubleshooting
The ATCA environment will usually contain a high-availability failover configuration between
two ATCA switches in the chassis. Note that the failover features are configurable and a switch
can be directed to fail over all of its processing when a single port or link goes down, or it can
perform a port-to-port or VLAN-to-VLAN failover where both partner switches are still
processing a portion of the network traffic.
Before replacing a switch that has gone out of service because of a switch-level failover, you
need to understand how the high-availability features have been configured. If the switch failover
was triggered by a port or link failure, make sure to isolate the cause for the link failure first, to
make sure the problem is not external to the switch (for example, a bad or loose cable for a wired
port).
Spontaneous Failover Activity
If while rebooting the inactive switch in a chassis causes the active switch to reboot and/or an
unexpected failover, you can try setting the zsp.conf file vrrp_msg_rate to 500.
The VRRP_msg_rate is the time in milliseconds between transmissions VRRP messages on the
inter-switch link (ISL). The VRRP protocol requires the absence of three VRRP messages before
concluding that the remote switch has failed. The msg_rate must match the msg_rate of all
siblings. Anything other than multiples of seconds does not conform to the VRRP specification,
and will only run with the vrrpd.
Unexpected Fail-back Activity
If unexpected fail-back activity is observed check to make sure that only one switch is setup as
the Master switch (vrrpd –M option) or the switches will oscillate. See the Ethernet Switch
Blade User’s Guide for more information on setting the failover priority level.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 183
Chapter 16 Switch Firmware Overview
There are three components to the firmware on the Ethernet Switch Blade:
1. Bootloader firmware (zmon)
2. OpenArchitect firmware
3. IPMI firmware
Some hardware and software problems can be resolved by updating the firmware to the latest
version. Check the Hewlett-Packard website for the latest version (see the HP 5700 ATCA 14Slot Blade Server Installation Guide).
Checking the switch firmware version
Use the zstats –V command to output the Vital Product Data from switch memory.
zstats –V
The following output is shown for the 3.0 Base Interface:
3.0 Base Interface
sh-2.04# zstats -V
VITAL PRODUCT DATA: Open Architect Advanced TCA Switch
PN = 700-0170-003
SN = 01BZ0X041HWP
V0 = 00116509E000
V1 = 1021
V2 = 3
V3 = Z
V4 = 1
V5 = 1
V9 = 2
EC = 00000000
VA = 3800
RV = 0x73
V7 = 0
V8 = 0
V6 = 3.2.2 build g
VS = 5
VP = 1.70
VM = 19
VR = 4
VZ = 4.42
RW = 192 bytes
VITAL PRODUCT DATA[1]: Port map ZX7120-HP
VC = 030405060708091011121314H1
VD = F0S2F2F3R0R1R2R31516S102H2
VE = 224223432444254526462747H3
VF = 284829493031323350513435H4
VG = H3NCH1NCNCH2NCH4
V9 = 2
RV = 0x3b
RW = 9 bytes
sh-2.04#
Ethernet Switch Blade User's Guide
release 3.2.2j
page 184
Key:
PN: Base Interface Switch Assembly Number
SN: Base Interface Switch Serial Number
V6: OpenArchitect Version Number
VP: IPMI Firmware Version
VZ: BootLoader Version Number
The following output is shown for the 3.1 Fabric Interface:
3.1 Fabric Interface
[ZX7100-OA3.2.2h]# zstats -V
VITAL PRODUCT DATA: Open Architect Advanced TCA Fabric
Switch
PN = 700-0174-002
SN = 01CS00029HWP
V0 = 0011650BC000
V1 = 1021
V2 = 3
V3 = Z
V4 = 1
V5 = 1
V9 = 2
EC = 00000000
VA = 3C00
VE =
2242628223436383244464842545658526466686F4F5F6F7H1H2F8R8
VF =
2747678728482949305031513252335334543555R4R5R6R7H1H2UCF9
Y6 = US45490039
Y7 = A-4546
RV = 0xba
V7 = 0
V8 = 0
V6 = 3.2.2 build h
VM = 3
VZ = 4.42
RW = 61 bytes
[ZX7100-OA3.2.2h]#
Key:
PN: Base Interface Switch Assembly Number
SN: Base Interface Switch Serial Number
V6: OpenArchitect Version
VZ: BootLoader Version
Ethernet Switch Blade User's Guide
release 3.2.2j
page 185
Updating the Switch Firmware
Currently, the OpenArchitect and bootloader components are the only upgradeable firmware on
the Ethernet Switch Blade. Upgrading the IPMI software is not currently supported.
BootLoader Firmware Upgrade:
1. Download the bootloader image to a local system.
2. FTP the bootloader image from the local system to your switch.
3. Use the zflash command to write the new elgoro/zmon image into the boot flash
device. Be sure and use device 0, not device 1 or 2.
zflash -d 0 <bootloader image name>
OpenArchitect Firmware Upgrade:
1. Refer to the HP bh5700 14-Slot Blade Server Installation Guide, Chapter 6, 14-Slot Shelf
Startup, Validating and Updating Your Firmware for instructions on how to gain access
to firmware updates for the HP bh5700 14-Slot Blade Server.
2. Use telnet, or preferably, attach a console cable to the switch, and login to the switch.
IMPORTANT: If you are connecting via telnet, be aware that the upgrade
process will reset the switch to the default IP address of 10.0.0.42, so you will have to
be able to reach 10.0.0.42.
3. Using the procedures referenced in Step 1, above, download the OpenArchitect image
upgrade to a local system.
4. Check for free space with the df command. The OpenArchitect image is very close to
the limit of free space available on a default system, so you may need to clear some space
prior to downloading the new OpenArchitect image to the switch.
CAUTION: Do not remove the existing copy of /usr/sbin/gated (as suggested
below) until you have, in fact, determined that an OpenArchitect upgrade version is
available for downloading.
One of the easiest ways to create free space is to remove /usr/sbin/gated, as the
application will be replaced during the update procedure. Once you have enough free
space, proceed.
5. From the switch console, ftp the new OpenArchitect image from the local system to
your switch.
The switch has two flash devices available: device 1 and device 2. Use the zflash
command to write the new OpenArchitect image into the first flash device.
IMPORTANT: Make sure that Surviving Partner is not running before using
zflash. The delays incurred while zflash writes the flash can cause the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 186
Surviving Partner daemons to think there is a failure, resulting in link oscillation.
Base Interface: zflash -d 1 rdr6000.zImage.initrd
Fabric Interface: zflash –d 1 rdr7100.zImage.initrd
IPMC Firmware Upgrade:
Upgrading the IPMC Firmware through OpenArchitect is not currently supported.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 187
Chapter 17 Restoring the Factory Default
Configuration
You should use this procedure if the contents in Flash Device 1 are corrupt and you need to
restore the switch to the factory default configuration. By restoring the factory default
configuration, you will overwrite your main file system in Flash Device 1 and lose all previous
configuration changes.
IMPORTANT: Make sure that Surviving Partner is not running before using zflash. The
delays incurred while zflash writes the flash can cause the Surviving Partner daemons to
think there is a failure, resulting in link oscillation.
1. Connect through the console port. For more information, see Chapter 10, Connecting to
the Ethernet Switch Blade.
2. When you see the number counter appear after the zmonitor ... banner, press any key
on the console keyboard to enter the zmon application.
3. At the monitor prompt, type:
boot:2
You should see the counter again, but the system should boot into the secondary kernel.
NOTE: If you have difficulties booting into Flash Device 2, the Flash Devices may
have failed. Return the switch for repair.
Use the zflash command to write the new OpenArchitect image into the first flash
device:
Base Interface: zflash -d 1 rdr6000.zImage.initrd
Fabric Interface: zflash –d 1 rdr7100.zImage.initrd
CAUTION: DO NOT program flash 2, since currently this is your only bootable image.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 188
Chapter 18 Before Calling Support
Because of the highly customized configurations that can be applied by customers to their ATCA
switch environment, the focus must be on data collection to get a snapshot of the current switch
configuration and network traffic activity. If support is needed, it is necessary to gather the
following information for further diagnosis before calling support:
1. Get a network diagram showing key devices, VLANs, ports, IP addresses, packet flow,
and amount and type of traffic (TCP, UDP, Broadcast, Multicast).
2. MAC addresses are useful as well.
3. Document a repeatable test case to reproduce the problem.
4. Obtain configuration scripts (the S* scripts run from the /etc/rcZ.d directory when
the switch first boots).
5. Run /etc/rcZ.d/tools/support_script.sh and capture the output before and
after the problem occurs. Navigate to the /etc/rcZ.d/tools directory and run the
following script:.
/support_script.sh
NOTE: The support script may take up to 10 minutes to output a report.
6. Copy and paste the output into a .txt file for support personnel.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 189
Offset 0
Offset 0
zmon
Free space
Application
Flash 2 on
Device 2
Application
Flash 1 on
Device 1
Boot ROM
on Device 0
initrd
Linux and
its file
system
initrd
(exact copy
as in
Application
Flash 1)
Linux and its
file system
Offset 7f000
dev
bootstring
Free space
Free space
overlay
file system
Figure 18.1: ROM Devices in OpenArchitect
The boot ROM is located on device 0 and contains the OpenArchitect zmon application that
operates as a boot loader and includes a device bootstring. Device1 contains the application
flash1 image of the Linux operating system and the OpenArchitect overlay file system.
Application flash1 is the primary working image for the switch. Device 2 contains the
application flash 2 that is an exact copy of application flash 1. You would only boot from this
device if application flash1 is corrupted and you need to restore the switch to the factoryshipped configuration.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 190
Appendix A Fabric Switch Command Man Pages
OpenArchitect applications are implemented above the OpenArchitect libraries and the RMAPI
interface. OpenArchitect applications are used for normal operation of the switch, for runtime
status and diagnostics, and for prototyping new applications development.
For runtime operation, the OpenArchitect applications perform initialization and configuration,
and real-time control and maintenance of the switching tables in the switch silicon. Protocol
support is performed by the Linux operating system. In turn the OpenArchitect applications
communicate with Linux to determine the appropriate switch table setup.
The initialization of the switch is completed by the zconfig application. Through configuration
scripts, the user can setup any combination of Layer 2 and Layer 3 switching configurations with
VLAN support. Running the zconfig command causes network interfaces to be presented to
the Linux operating system. These interfaces can be setup for Layer 2 bridging functions such as
Spanning Tree Protocol, or Layer 3 routing through the Linux operating system.
zl2d is run as a daemon to monitor the Linux operating system bridging function and update the
switch silicon accordingly.
zl3d is run as a daemon to monitor the Linux operating system routing table information and
update the switch silicon switching tables accordingly.
For gathering statistics or prototyping applications, there are OpenArchitect applications that
allow any register or table in the switch to be read or written. These applications include zreg,
ztats, and zarl and all of the different table equivalents.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 191
vrrpconfig
NAME
vrrpconfig – Configure and control the running vrrpd
SYNOPSIS
vrrpconfig [-d <level>] -- <vrrpd parameters>
vrrpconfig [-d <level>] [-k] [-a] [-p] [-s <vid>]
DESCRIPTION
vrrpconfig provides communication with a running vrrpd daemon. The -- option for
vrrpconfig will pass all parameters to vrrpd as would be done when starting the vrrpd.
Any output generated by vrrpd is displayed on the vrrpconfig controlling tty. Any action
normally taken by vrrpd for the given parameter is done so by vrrpd. Reference vrrpd for
vrrpd parameters and their usage.
OPTIONS
vrrpconfig also has a set of local options that are not passed to vrrpd directly. Many do,
however, retrieve information from the running vrrpd. The local options are as follows:
-d <level> Set the debug level. The default debug level is 1.
The higher the level, the more debugging output is produced.
Debugging output is sent to the controlling tty. This
debugging output is from vrrpconfig. To set the debug level
of vrrpd, one would use the vrrpd debug level setting option
placed after the -- in the vrrpconfig command line.
-a
Display in a user readable format, information about the
current state of all the Virtual Routers controlled by vrrpd.
-k
Kill vrrpd. The entire daemon is killed.
command will require that vrrpd be restarted.
-p
Running this
Display relevant SNMP table values.
-s <vid>
Print a numeric representation of the state of the
Virtual Router associated with the Virtual Router Identifier
<vid>. The numeric representations are 1 = INIT, 2 = BACKUP,
and 3 = MASTER
Ethernet Switch Blade User's Guide
release 3.2.2j
page 192
EXAMPLES
Here is an example of using the -- invocation method that changes the priority to 99 for the
Virtual Router associated with the Virtual Router Identifier 1:
vrrpconfig -- -v 1 –p 99
SEE ALSO
vrrpd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 193
vrrpd
NAME
vrrpd – Virtual Router Redundancy Protocol Daemon
SYNOPSIS
vrrpd -i ifname -v vrid [-f piddir] [-s] [-a auth] [-p prio]
[-nhb] [-I ifname] [-d delay] [-m address] [-M ] [-B]
[-S script] [-c conf_file] [-D level] ipaddr
DESCRIPTION
vrrpd is an implementation of Virtual Redundant Routing Protocol (VRRPv2) as specified in
RFC2338. It runs in Linux user space. In short, VRRP is a protocol that elects a Master server on
a LAN to which the Master answers to a virtual IP address. If it fails, a Backup server takes over
the IP address.
VRRP specifies an election protocol that dynamically assigns responsibility for a virtual router to
one of the VRRP routers on a LAN. The VRRP router controlling the IP address(es) associated
with a virtual router is called the Master, and forwards packets sent to these IP addresses. The
election process provides dynamic failover in the forwarding responsibility should the Master
become unavailable. This allows any of the virtual router IP addresses on the LAN to be used as
the default first hop router by end-hosts. The advantage gained from using VRRP is a higher
availability default path without requiring configuration of dynamic routing or router discovery
protocols on every end-host.
OPTIONS
The following options are supported by vrrpd:
-h
display the usage line
-n
Don’t use the virtual MAC address
-b
Run vrrpd in foreground
-i <ifname>
the interface name on which to run the Virtual
Router. Machines connected with the named interface will see
the Virtual Router Address move with the Master switch
-I <ifname>
the interface name on which to communicate
with other VRRP routers for management of the Virtual Router
(default is the -i interface)
-v <vrid> the id of the Virtual Router Identifier [1-255].
This value must be a unique value, one per Virtual Router. In
other words there is a unique vrid to ifname associated with
Ethernet Switch Blade User's Guide
release 3.2.2j
page 194
the –i option.
-s
Toggle preemption mode (Enabled by default). Preemption
means that a Master switch will go to Backup if a current
Backup has higher priority.
-M
Become MASTER when priority is equal. Be sure it is only
set on one host or the switches will oscillate. Must set –B
option on other hosts (requires preemption mode ! -s)
-B
Become BACKUP when priority is equal.
-S <script>
See -M option
script to be called when state change occurs.
-a <auth> (not yet implemented) set the authentication type
auth=(none|pass/hexkey|ah/hexkey) hexkey=0x[0-9a-fA-F]+
-p <prio> Set the priority of this host in the virtual server
(default is 100)
-f <piddir>
specify the directory where the pid file is
stored (default is /var/run)
-c <conf> Configuration read from conf file (required when
managing multiple Virtual Routers). The contents of the conf
file are lines of command line options. Each line represents
a Virtual Router. Parameters given on the command line apply
to all Virtual Routers defined by the conf file. So for
example, if the command line reads:
vrrpd –d 50 –c vrrpd.conf
And the vrrpd.conf file contains:
-v 1 –i zhp0 –I zhp3 10.0.0.43
-v 2 –i zhp1 –I zhp3 11.0.0.42
vrrpd would be started controlling two Virtual Routers; one
for 10.0.0.43 and the other for 11.0.0.42. They would both
get a –d 50 option.
-d <delay> Set the advertisement interval. Default is 1
second. By default time is specified in seconds. If the
delay value ends in a lower case ‘m’ the time is specified in
milliseconds. The millisecond specification results in a
proprietary use of the VRRP Adver Int field.
-m <address>
Change the virtual MAC address from
00:00:5E:00:01:<vid> to the provided addr. The addr should be
input as 6 two digit hex numbers that are colon delimited with
no spaces. The –n option overrides the change made with –m.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 195
The result of which to use the native MAC address of the
interface. Using the –n option is not recommended.
-D <level> Set debugging output to the supplied level
<ipaddr>
the ip address(es) of the virtual server
SEE ALSO
vrrpconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 196
zbootcfg
NAME
zbootcfg − Modifies the boot parameters of the OpenArchitect switch.
SYNOPSIS
zbootcfg -a | -d <device number> [<boot_string>]
DESCRIPTION
zbootcfg is used to display or modify the boot parameters on the switch. The boot parameters
are utilized by the minof boot loader application to indicate on which device to find a boot image.
Care should be taken when changing the boot string. Incorrect procedures can result in a switch
that cannot boot.
-d <device_number>
Three ROM boot devices are available in the switch. The
factory-shipped boot device is 1. The following describes each
boot device:
-d 1 Boot image located at offset 0 in the application flash
1. This is the factory-shipped location of the primary
OpenArchitect image.
-d 2 Loads an image located at offset 0 in the application
flash 2. This is the factory-shipped location for the
alternate OpenArchitect image.
Any characters after the -d <dev> parameters are saved in flash memory and passed unchanged
to the booting kernel.
OPTIONS
-a
Displays the current boot string. The default factory
shipping string is “dev1.”
-d <dev>
Specifies the ROM device from which to boot. The
<dev> value must be the number 1 or 2 corresponding to
application flash 1 or application flash 2, respectively.
<boot string>
Optionally, a boot string may be provided that
is passed to the booting kernel. All characters after the -d
<dev> are passed unchanged to the booting kernel.
EXAMPLES
The following example illustrates a command for making the image boot from the second
Ethernet Switch Blade User's Guide
release 3.2.2j
page 197
application flash. Typically this is required before updating application flash 1. By booting the
alternative image, if a failure occurs during the programming of application flash 1, recovery is
easier.
zbootcfg -d 2
The next example passes the -i option to the booting kernel. This is useful when recovering from
a mistake saved to the read-write file system or after updating the application flash 1 and doing
the first boot. The -i option prevents the read-write file system from overwriting the initial RAM
disk image.
zbootcfg -d 1 –i
SEE ALSO
zflash, reboot(8)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 198
zconfig
NAME
zconfig - Configures the OpenArchitect switch.
SYNOPSIS
zconfig [-h <host_name>] [-d <level>] [-a] [-t] [{-f <file>} |
<configuration>]
DESCRIPTION
zconfig creates Virtual Local Area Network (VLAN) groups of switch ports or trunks. Each
VLAN group forms a Layer 2 switching domain. Each VLAN group has a VLAN Identification
number (VID) that can be carried in a tag field, located in the header of packets traveling on that
VLAN. The configuration of a port determines whether a packet transmitted from that port
includes the VLAN tag. A set of up to eight ports may be configured as a trunk, with all links
from these ports connect to the same link partner. For each VLAN group created by zconfig, a
network interface is also created. After the network interface is started by ifconfig(1M), the
VLAN group performs Layer 2 switching. The network interface can be used for Layer 3 routing
between VLAN groups.
A network interface uses the following format:
zhpN (e.g., zhp0)
N-is an integer between 0 and 9999. The value of N is not required to be the same as any of the
port(s) that are its members. The range 0-4999 is reserved for network interfaces created by users.
The range 5000-9999 is reserved for network interfaces created by switch applications.
A trunk uses the format zrlK, where K is an integer between 0 and 31.
OPTIONS
-h <hostname>
Specifies the hostname to configure. By
default, zconfig configures the local OpenArchitect switch.
-d <level> Sets the level of debugging output produced by
zconfig. The default level is 1. Setting the debug level
higher produces more output. The maximum output level is
currently four (4).
-a
Displays the current configuration of the switch.
-t
Tears down the entire switch configuration.
{-f <file>} | <configuration>
Gets configuration information from the specified file. A
<file> name of ‘+’ reads configuration data from standard
Ethernet Switch Blade User's Guide
release 3.2.2j
page 199
input. If the -f flag is not used, a single line of
configuration data can be entered as parameters to zconfig.
CONFIGURATION SYNTAX
zconfig takes configuration data from standard input or from a file with the -f option. In
either case, the configuration syntax is the same. The zconfig configuration data consists of a
list of semicolon-delimited statements. Each statement specifies an action to take globally or on
an interface. An interface is one of three types: a network interface called ZNYX host port
(zhp); a switch port interface called ZNYX Raw Ethernet (zre); or a trunk interface called
ZNYX RainLink (zrl).
Comments, spaces and new lines are ignored. Comments begin with the # character and include
characters through the next new line.
Global Statements
Global statements can be used to set modes of operation on a switch-wide basis. The only
supported global statement is to set and teardown Double VLAN tag mode.
Global Statement Syntax:
Double VLAN tag mode is set and removed on a global basis with the following syntax.
dvlan 0x8100 | 0x9100;
(or other unused ethertype)
dvlan teardown;
The first option sets double VLAN tag mode on all ports and establishes the outer tag id. The
second tears down double VLAN tag mode.
Trunk Interface Statements
A trunk interface statement begins with the trunk name followed by an equals sign and an action.
Trunk interface statements are used to create or tear down trunks or define the rules to determine
which member of the trunk should be used to transmit a packet
Trunk interface syntax:
zrl0 = <Trunk Interface Action>;
Trunk interface actions:
List of ports
Creates a trunk interface with the
port members. All of the ports specified must not
of any other trunk, or be individually included in
interface. Up to eight ports can be included in a
specified
be a part
any network
trunk.
A port member is identified with the zre<X> format, where x
represents a port number between 0 and 23 for the in-band
ports. The Out of Band ports cannot be included in the List of
Ethernet Switch Blade User's Guide
release 3.2.2j
page 200
ports.
teardown
Removes the trunk interface, making the ports which
were part of the trunk available for configuration in other
trunks or VLANs.
all
mac [ source_address | destination_address ]
ip [ source_address | destination_address ]
port [ source_port | destination_port ]
Further specifies the rules for selecting which port in
the trunk a packet should be transmitted out of. A comma
delimited list is valid to specify more than one criterion.
Specifying a particular option only uses that layer’s source
and destination information. The default is all, which combines all
criteria for determining the transmit port of the trunk. Specifying both source and destination
for a given layer is the same as specifying that layer itself, i.e,
zrl0=ip source_address, ip destination_address
is the same as,
zrl0=ip
NOTE: The Ethernet Switch Blade supports destination_address and/or
source_address for MAC and IP. It cannot combine MAC and IP settings, nor does it
support port settings.
Examples of trunk interface statements:
This statement creates a trunk containing three ports:
zrl5 = zre11,zre15,zre17;
The following statement specifies that packets will be sent out over this trunk using the exclusive
OR of the last four bits of their MAC source and destination addresses to select the port:
zrl5 = mac source_address, mac destination_address;
The teardown statement uses a colon instead of an equals sign:
zrl5: teardown
Network Interface Statements
Ethernet Switch Blade User's Guide
release 3.2.2j
page 201
The syntax for a network interface statement is the interface name followed by a colon and an
action. Network interface statements are used to create or tear down a VLAN group and can
consist of one or a list of network interface names; followed by a colon and then an action. For
example:
zhp0: <Network Interface Action>;
Network interface actions may include:
vlan<N> = list of ports or trunks
Creates a network interface and a VLAN group with a VLAN
identification number (VID) consisting of specified port
members.
<N>
is an integer between 1-4095.
list of ports or trunks
A port member is identified with the zre<X> format, where x
represents a port number between 0-23 for the in-band ports. A
trunk is identified with the zrl<Y> format, where Y is a
number between 0-31.
If the network interface and VLAN group already exist, the
specified ports or trunks are added to the network interface
and VLAN group.
teardown
Deletes the network interface and the associated
VLAN group.
zre_list = multicast <mac_address>
Register the multicast <mac_address> on the zre_list ports
associated with the given VLAN
multicast_clear Clear all registered multicast address on all
the ports in the VLAN
<list of ports or trunks> teardown
Deletes the specified ports or trunks from the network
interface and the VLAN group associated with it. If there are
no remaining port or trunk members, then also deletes the
network interface and VLAN group.
Examples of Network Interface Statements:
The statement below creates a VLAN group with the VID number 1 and the network interface
named zhp5. This VLAN includes a single switch port, zre1.
zhp5: vlan1=zre1;
The next statement creates a VLAN group with the VID number 100 and the network interface
Ethernet Switch Blade User's Guide
release 3.2.2j
page 202
named zhp1. This VLAN includes four switch ports, zre1, zre10, zre11, zre13.
zhp0: vlan100 = zre1,zre10,zre11,zre13;
The next statement adds two switch ports, zre1, zre2 and zre3, to an existing network
interface and VLAN.
zhp0: vlan100 = zre1..3;
The next statement deletes two switch ports, zre1 and zre2, to an existing network interface
and VLAN.
zhp0: zre1..2 teardown;
The final example is a teardown action that deletes the VLAN group defined in the previous
example, including the network interface.
zhp1: teardown;
Port Interface Statements
Port interface statements specify a port or trunk name or a list of such names; followed by an
equal sign (=) and then the action. Port interface actions may include:
SYNTAX
zconfig <zre_list>=untag<n>
untag<N>
Packets sent from this port or trunk for VLAN <N>
are transmitted without a VLAN tag. The port or trunk
specified must have previously been included in the VLAN group
with VID<N>.
zconfig <zre_list> multicast=<forward_type>
<forward_type> Set the ports specified to act as defined by
the forward_type for multicast traffic. Possible
<forward_types> are:
forward_unregistered (default)
forward_all
filter_unregistered
Examples of Port Interface Statements:
Assuming that zre1 has been assigned to VLAN 1, to specify that packets sent from port 1 for
VLAN 1 are transmitted without a VLAN tag, and packets arriving on this port without a VLAN
tag are given the VLAN tag with the VID number 1, enter:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 203
zre1=untag1;
If port 0 is also a member of VLAN 100, packets for VLAN 100 are sent from this port with a
VLAN tag as part of their header.
In the next example, the switch ports 10, 11, and trunk 2 are configured as untagged members of
VLAN 100.
zre10,zre11,zrl2=untag100;
This statement is equivalent to the following three lines:
zre10=untag100;
zre11=untag100;
zrl2=untag100;
In the examples above, since port interfaces can only be untagged for one VLAN group, zre1
cannot also be untagged for VLAN 100. A port or trunk can be a member of multiple VLANs but
can only be designated untagged on one VLAN.
WILDCARDS
Wild card characters can be included to simplify the process of creating larger, more complex
configurations. Wild card characters for zconfig include:
,
(comma)
Use for creating lists
..
(dot-dot)
Specifies an inclusive range
+
(plus)
Specifies auto-incrementing
Below are some examples for the correct usage of the comma (,) and dot-dot (..). Each line below
produces the same results:
zhp0: vlan1 = zre1, zre2, zre3, zre4;
zhp0: vlan1 = zre1..4;
zhp0: vlan1 = zre1, zre2..4;
zhp0: vlan1 = zre1..2, zre3..4;
The following examples create multiple VLAN groups using a single statement. A list of network
interface names are followed by a colon (:) and a list of VLAN actions which are followed by an
equal sign (=) and a port list. Each VLAN group is created in turn, along with the corresponding
network interface, and all ports listed after the equal sign are included in each group.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 204
The following statement creates 14 VLAN groups with VID numbers 1-14. Each VLAN contains
the same switch port, port 1, represented as zre1.
zhp0..13: vlan1..14 = zre1;
The plus (+) wildcard can be used with the last port listed to auto-increment that port number
before each VLAN group is created. The following network interface statement creates 14 VLAN
groups, with the first group containing port 1, the second group port 2, and so on. The second
statement configures all ports as untagged in their respective VLANs.
zhp0..13: vlan1..14=zre1+;
zre1..13=untag1+;
This is equivalent to:
zhp0: vlan1=zre1;
zhp1: vlan2=zre2;
zhp2: vlan3=zre3;
.
.
zhp13: vlan14=zre14;
zre1=untag1;
zre2=untag2;
zre3=untag3;
.
.
zre14=untag14;
The previous configuration can be used for creating a 14 port Layer 3 switch, with each port
assigned to its own VLAN.
In the next example, one VLAN group, with VID number 1, is created that contains 14 ports. The
second statement designates the 14 ports as untagged for the VLAN 1 group.
zhp0: vlan1 = zre1..13;
zre1..13 = untag1;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 205
The previous configuration can be used for creating a 14 port Layer 2 switch, all 14 ports
assigned to the same VLAN.
SEE ALSO
zl3d
Ethernet Switch Blade User's Guide
release 3.2.2j
page 206
zcos
NAME
zcos -
class of service queue control
SYNOPSIS
zcos
[-h <hostname>] [-d <level>]
[ -u <default priority> ]
[ -m q0,q1,q2,q3,q4,q5,q6,q7 ]
[-n <queue length list in packets for each queue> |
-b <Reserved space in bytes for each queue> |
-s <limit on dynamic pool usage, in bytes>, <reset %>]
[ -k PRI | RR | WRR | DRR]
[ -w <queue weight list> ]
[ -g <max>,<burst> ]
[ -r <guaranteed bandwidth in Kbps for each queue> [ -t <burst
size list in Kbytes> ]
[ -l <maximum bandwidth in Kbps for each queue>
size list in Kbytes> ]
[ -t <burst
[ -q all | qmap | qinfo | scheduler ]
[<port list>]
DESCRIPTION
zcos provides a means to set many of the hardware features of the switch related to class of
service and differentiated services processing, including scheduling and bandwidth management.
The current settings can also be examined.
The OpenArchitect switch supports up to eight class of service queues for packets to be sent out
each of the Ethernet ports or forwarded to the CPU. Normally, packets are placed in these queues
based on their 802.1p priority for tagged packets or the default priority for the port on which they
arrive. The queue destination for each priority is determined by a map. A separate map is used
for each ingress port. For additional means of setting the cos queue for a packet, see the sections
on filtering and traffic control.
Packets are selected from the cos queues at a port based on a scheduler, which may be configured
in a variety of modes. The scheduler can provide minimum bandwidth guarantees and limit the
bandwidth used for packets from each cos queue. The total egress bandwidth for a port can also
be limited.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 207
Each cos queue is limited in the number of packets it can hold waiting scheduling; the memory
used by each queue is managed to provide a guaranteed space with additional space shared
among all queues for a port.
OPTIONS
Most options are optionally followed by a <port list>, which may include zre port ranges,
like zre0..5, individual ports, such as zre51, or cpu, to indicate the queues and scheduling
for packets to be transferred to the cpu. The priority and queue mapping options do not apply to
the cpu, these settings are provided by the host.
General Options
-h <hostname>
Specifies the hostname of the OpenArchitect switch to be configured. Ignore this option if
you are configuring on the OpenArchitect switch on which the zcos command is being run.
-d <level>
Sets the debug level. The default is 1. The maximum is 4.
Priority and Queue Mapping
-u <default priority> [<port-list>]
Packets which arrive without a tag have no 802.1p priority. This option assigns a default
priority for untagged packets arriving on each port in the <port-list>. The default
priority ranges from 0 (lowest) to 7 (highest).
-m q0,..,q7 [<port-list>]
Specifies the priority to COS queue map. The first parameter maps priority 0 to queue q0,
second maps priority 1 to queue q1, etc. (the queues are numbered 0 to 7). The <portlist> identifies which ingress ports will use this map. If no <port-list> is given, the
same map will be used for packets arriving on any of the input ports.
Queue Limits
(These limits should only be changed when the ports are idle)
-b <Reserved space in bytes for each queue> [<port-list>]
Specifies the dedicated memory for each cos queue of the ports listed.
-n <queue length list in packets> [<port list>]
Sets the number of packets allowed on each cos queue of the ports listed. The total number of
packets for all 8 cos queues is limited to 2048.
-s <limit on dynamic pool usage, in bytes>, <reset %> [<port list>]
Sets the limit on dynamic memory pool usage by all cos queues for each port listed.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 208
Packets are first counted against the reserved space for a queue. When that space is occupied,
additional memory is used from the dynamic memory pool until the dynamic pool usage limit for
the port is reached. Any additional packets received for the queue on this port are dropped.
Metering and Scheduling
-r <list of bandwidth guarantees in Kbps for each cos queue> [
-t <list of burst sizes in Kbytes>]
[<port-list>]
Sets up minimum rate meters for each cos queue. All queues which have not exceeded their
minimum transmission rate are scheduled before the other queues.
-l <list of bandwidth limits in Kbps for each queue> [ -t
<list of burst sizes in Kbytes>] [<port-list>]
Sets up maximum rate meters for each cos queue. Queues which have exceeded their
maximum transmission rate will not be scheduled.
-k PRIO | RR | WRR | DRR [<port list>]
Selects the scheduler mode for the ports listed:
PRIO – strict priority, cos queue 7 is highest priority, queue 0 is lowest
RR – Round robin, a single packet is scheduled from each backlogged COS queue.
WRR – Weighted round robin, a configurable number of packets are scheduled from each queue
before moving on to the next.
DRR – Deficit Round Robin, packets are scheduled from a backlogged queue until the configured
number of bytes for that queue have been sent.
-w <queue weight list> [<port list>]
Provides the weights for WRR and DRR scheduling. For WRR, the weights are the number of
packets, scaled such that all weights are between 1 and 15. For DRR, the weights are the
number of bytes, with a range of 10KB to 160 MB of data.
-g <max Kbps>,<burst size in KBytes> [<port list>]
Sets a maximum bandwidth meter for all packets transmitted from a port.
The guaranteed and maximum rate meters influence the four scheduling modes. First, those
queues which have not met their guaranteed rate and have packets to send are serviced according
to the scheduling mode. Then those queues which have not met their maximum rate and have
packets to send are serviced. If all queues have met their maximum rate, or the maximum
bandwidth for the port has been reached, no packets are sent. Each of the meters is implemented
as a separate leaky bucket.
Queries of the Current Settings
Ethernet Switch Blade User's Guide
release 3.2.2j
page 209
-q all | qmap | qinfo | scheduler [<port list>]
Queries the current COS/QOS Settings.
all - Displays all of the queue mappings, queue limits, metering and scheduling settings
qmap - Displays the priority to COS queue mappings.
qinfo - Displays queue limits for the COS queues.
scheduler - Displays the traffic metering and shaping settings and the scheduler mode.
EXAMPLES
1. To set Ethernet ports zre0 to zre19 to allow up to 50 packets in priority queues 0-3 and up
to 75 packets in queues 4-7:
zcos –n 50,50,50,50,75,75,75,75 zre0..19
2. To map packet priorities 1-1 to COS queues for packets received on all ports:
zcos –m 0,1,2,3,4,5,6,7
3. To set up weighted round robin scheduling on ports zre10 to zre14 and the CPU with a
weight of 2 for queue 0, 3 for queue 1, and 1 for all other queues:
zcos –k WRR –w 1,3,1,1,1,1,1,1 zre10..14,cpu
4. To limit the rate of packets sent to the CPU to 15 Megabits/sec., with bursts of no more than
20,000 bytes:
zcos –g 15000,20 cpu
5. To guarantee CPU cos queue 5 500 kbps, queue 6 200 kbps, and queue 7 1 mbps, and all other
queues no guarantee:
zcos –r 0,0,0,0,0,500,200,1000 cpu
6. To limit CPU cos queues 0 – 4 to 1000 kbps, with a burst of 20 Kbytes:
zcos –l 1000,1000,1000,1000,1000 –t 20,20,20,20,20 cpu
SEE ALSO
zfilterd, zqosd, ztmd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 210
zdog
NAME
zdog - Configure and send heartbeats to watch dog enabled drivers.
SYNOPSIS
zdog [-d <level>] -h | -i <interval> | -n <heartbeats>
zdog [-d <level>] -b
zdog [-d <level>] -a
DESCRIPTION
zdog is used to configure the Ethernet Switch Blade watchdog timer functions and to send
heartbeats to the Ethernet Switch Blade watchdog drivers. There are two components to the
Ethernet Switch Blade watchdog timer: A hardware component and a software component. The
two components are independent from each other in implementation, but work together to
provide safety against zombie hardware and software. The hardware component requires
attention on a predefined 1.5 second interval. The driver acts on this at interrupt level to ensure
that spurious reboots do not occur. The software component allows for a user programmable
interval on which lack of application to driver communication will cause a reboot. Both
components can be turned on with zdog.
The options -i and -n are used to configure the expected interval of heartbeats and the number
of missed heartbeats of the software component before the Ethernet Switch Blade should be
rebooted. If either the interval or number of heartbeats is 0, the software component is off. The
-h option is used to toggle on and off the hardware component of the watchdog timer. The
hardware component is off by default.
Once the software component of the watchdog timer is turned on, a heartbeat must be sent with
the -b option within that interval or the system will reboot. For example after issuing the
following command:
zdog -i 5000 -n 3
A heartbeat must be sent at least every 3*5000=15000 milliseconds (every 15 seconds). This can
be accomplished with something as simple as a polling script with a sleep, or started with a
higher level function like monit. The driver checks for heartbeat timeout approximately 3 times
per second. So (<heartbeat intervals>*<number of heartbeats>) faster then
330 milliseconds will have diminishing returns.
Combining monit and zdog allows multiple levels of insuring system integrity. The hardware
Ethernet Switch Blade User's Guide
release 3.2.2j
page 211
component of zdog insures that the CPU is functioning well enough to execute something. The
software component of zdog when launched from monit insures that monit is running to
perform higher level tasks. And finally monit can be used to monitor any or all critical system
resources and processes in the system.
OPTIONS
-d
set debug level to <level>
-h
Toggle use of the hardware watchdog timer.
default.
Off by
-i
Time interval in milliseconds between zdog to driver
heartbeats
-n
Number of missed heartbeats before system reboot
-b
Send a single heartbeat to the driver
-a
Display current configuration
SEE ALSO
http://www.tildeslash.com/monit/
Ethernet Switch Blade User's Guide
release 3.2.2j
page 212
zfilterd
NAME
zfilterd - A daemon to use the filter hardware of the OpenArchitect switch for filtering based
on iptables(8) rules.
SYNOPSIS
zfilterd [-d <level>] [-p <port>] [-f] [-l] [-i <pid>]
[-o <pid>]
DESCRIPTION
zfilterd is a daemon that intercepts filtering rules entered by the user, using iptables(8),
checks them for validity and then prepares messages for the traffic management daemon ztmd,
which is responsible for setting up the switch hardware for the filtering rules and actions.
OPTIONS
-d <level> Sets the level of debugging output required by
zconfig. The default level is one (1). Setting the debug level
higher produces more output. Four (4) is currently the maximum
output level.
-p <port>
sent.
Set the multicast port to which messages will be
-f
Run zfilterd in the foreground, by default, it runs in
the background.
-l
Log all diagnostic output to /var/log/zfilterd.log.
-I <pid>
Set our pid used in identifying ourselves to ztmd
-o <pid>
with.
set the pid of the ztmd process we will communicate
SEE ALSO
ztmd, zrule, iptables(8)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 213
zflash
NAME
zflash − Loads images into the flash ROMs on the OpenArchitect switch.
SYNOPSIS
zflash -d <dev> [-o|-O <offset>] <image_file>
<upgradeipmi.img>
DESCRIPTION
zflash enables you to program the flash ROMs on the switch. The switch contains 3 flash
ROM devices: the boot ROM flash, application flash 1 and application flash 2. Care should be
taken when flashing new images into the switch. Incorrect procedures can result in a switch that
cannot boot; especially when flashing the boot ROM referred to as device 0. See Switch
Maintenance for updating the flash ROM images.
OPTIONS
-d <dev>
Specifies the ROM device being programmed. Dev must be
a number 0, 1, or 2 corresponding to the boot ROM, application
flash 1, or application flash 2 respectively.
-i <upgradeipmi.img> will load the file <upgradeipmi.img> into
the IPMI controller flash memory. This updates the program
version, it does not affect the FRU data.
Progress indicators will be printed during the update. It may
take four minutes to flash.
Once the update is complete, the IPMI controller is rebooted,
which may cause the shelf manager to temporarily disable
fabric ports until the reboot is complete.
EXAMPLES
The following example loads a new initial RAM image into application flash 1 at the default
offset of 0:
zflash -d 1 rdr6000.zImage.initrd
The following example loads a new boot image into the boot ROM at the default offset of 0:
zflash -d 0 zx6000.img
Ethernet Switch Blade User's Guide
release 3.2.2j
page 214
Exercise caution when using this command, as an error can render your switch
inoperable. Do not interrupt this process until complete.
SEE ALSO
zbootcfg
Ethernet Switch Blade User's Guide
release 3.2.2j
page 215
zl2, zl2mc, zl3host, zl3net, zvlan
NAME
zl2, zl2mc, zl3host, zl3mc, zl3net, zvlan – Formatted display of
OpenArchitect generic tables.
zl2 displays the abstraction API’s layer 2 table.
zl2mc displays the abstraction API’s layer 2 multicast table.
zl3host displays the abstraction API’s layer 3 host route table.
zl3mc displays the abstraction API’s layer 3 multicast table.
zl3net displays the abstraction API’s layer 3 network route table.
zvlan displays the abstraction API’s VLAN table.
SYNOPSIS
zl2 [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-P port] [-h <host_name>] [-d <level>]
zl2mc [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-P port] [-h <host_name>] [-d <level>]
zl3host [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-P port] [-h <host_name>] [-d <level>]
zl3mc [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-P port] [-h <host_name>] [-d <level>]
zl3net [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-P port] [-h <host_name>] [-d <level>]
zvlan [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-h <host_name>] [-d <level>]
DESCRIPTION
The generic table display functions produce formatted output of the abstraction API’s tables for
Ethernet Switch Blade User's Guide
release 3.2.2j
page 216
display on the user console. The format of the output is table-dependent. Port mapping affects the
ports referenced in the generic tables. (Ports listed in order from 1 to 15)
Headers describing the column being displayed are printed after every 22 lines of output, which
makes it easy to pipe through more(1). The abstraction layer tables grow and shrink as entries are
added and deleted.
Several options are available which enable the user to display only selected entries. Additionally,
there is an option that clears user-specified entries in the table.
OPTIONS
-i <index> Displays the entry at the <index> position in the
table. Valid for all tables. Cannot be combined with -m, -P or
-v.
-m <mac_address>
Displays entries whose MAC address field matches
<mac_address>. Only valid for tables that have a MAC address
field. Cannot be combined with -i, -P, or -v.
-a
Displays the entire table.
-v <vlan_id>
Displays entries whose VLAN ID field matches
<vlan_id>. Only valid for tables that have a VLAN ID field.
Cannot be combined with -i, -m, or -P.
-P <port> Displays the entries whose port field matches
<port>. Only valid for tables that have a PORT ID field.
Cannot be combined with -i, -m, or -v.
-h <host_name> Specifies which hostname to connect. By
default, zgr connects to the locally connected OpenArchitect
switch (i.e., the one that is on the local PCI bus).
-d <level>
Sets the level of debugging output required by
zgr. The default level is one (1). Setting the debug level
higher produces more output. Four (4) is currently the maximum
output level.
EXAMPLES
The following example searches for and displays an entry in the zl2 table with the specified
MAC address:
zl2 -m 00:c0:95:45:00:00
If there is an entry in the ZL2 table with the MAC address, 00:c0:95:45:00:00, all the fields of
that entry will be displayed.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 217
The following command deletes the above entry:
zl2 -c -m 00:c0:95:45:00:00
The following command displays all entries of the zl2 table:
zl2
Be careful, the -c option does not ask. The following command deletes all entries in the zl2
table:
zl2 -c
SEE ALSO
zal
Ethernet Switch Blade User's Guide
release 3.2.2j
page 218
zgvrpd
NAME
zgvrpd - GARP VLAN Registration Protocol (GVRP) daemon for the OpenArchitect
switch.
SYNOPSIS
zgvrpd [-d <level>] [-f] [-h <hostname>] [-p <ppa>] [-t
<target>]
DESCRIPTION
zgvrpd is run after the network interfaces are created and initialized with zconfig, and started
with ifconfig(1M).zgvrpd starts a background task that implements the GARP VLAN
Registration Protocol (GVRP) protocol for a specified zhp interface. GVRP provides a Layer-2
mechanism for dynamically managing port membership in VLANs, including adding and
deleting ports, and creating and deleting VLANs. The background task started by zgvrpd
continues throughout the life of the Layer 2 network. GVRP is specified in ANSI/IEEE Std
802.1Q, 1998 Edition. The GARP protocol on which GVRP is based is specified in ANSI/IEEE
Std 802.1D, 1998 Edition.
zgvrpd updates the switch’s VLAN configuration, based on GVRP packets received on the
target interface. Specifically, it adds a port to or deletes a port from the VLAN specified in the
GVRP packet. If the VLAN does not exist, zgvrpd creates it. If zgvrpd deletes the last port
from a dynamically-created VLAN, it also deletes the VLAN.
When a VLAN is dynamically created, a corresponding zhpN interface is also created, where N is
an integer between 5000 and 9999. The value of N is equal to 5000 plus the VLAN identification
number (VID). For example, if zgvrpd creates VLAN 5, it also creates a zhp5005.
zgvrpd learns the existing (static) VLAN configuration of its target when starting up. When
shutting down, zgvrpd deletes only the dynamic changes to the VLAN configuration that it has
made. Manual changes to the VLAN configuration can be made while zgvrpd is running.
However, all such changes will be deleted when zgvrpd is terminated.
Only the GARP normal registration mode is currently supported. Multiple instances of zgvrpd
may run concurrently provided the targets are unique. If zgvprd’s target is a zhp, it’s
recommended that the zhp contain all the ports on the switch.
zgvrpd cannot be run concurrently with zsnoopd, because zsnoopd assumes static VLAN
membership.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 219
OPTIONS
-d <level>
Sets the level of debugging output required by zgvrpd. The
default level is zero (0). Setting the debug level higher
produces more output. Five (5) is currently the maximum output
level.
-f
Run zgvrpd in foreground. Default is to run it in
background.
-h <hostname>
Connect to remote host <hostname>.
-p <ppa>
Start zgvrpd on switch <ppa>. Default is 0.
-t <target>
Enable GVRP on the set of ports specified by the target zhp
interface. There is no default. A target must be specified.
EXAMPLES
In the following example, zgvrpd starts a background task that enables the GVRP protocol for
the ports in the zhp0 interface. zgvrpd receives and sends GVRP packets, and updates the
VLAN configuration accordingly. This background task continues throughout the life of the
Layer 2 network, or until manually terminated.
zgvrpd –t zhp0
Once you run zgvrpd, use zconfig -a to display the current VLAN configuration.
SEE ALSO
zconfig, zsnoopd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 220
zl2d
NAME
zl2d - Layer 2 daemon for the OpenArchitect switch.
SYNOPSIS
zl2d [start | stop] [-t <msecs>] [-d <level>] [-f]
[-p <priority>] <iface..>
DESCRIPTION
zl2d is run after the network interfaces are created and initialized with zconfig. zl2d
creates a Linux bridge for each interface using brctl(8). The bridge name is the interface
name with a ‘b’ pre-pended to it. This command is primarily used for Rapid Spanning Tree
Protocol (RSTP). Each port associated with the interface is included within the bridge. zl2d
starts a background task that continues throughout the life of the Layer 2 network. zl2d is a
script that can be modified to include brctl commands when started and stopped. Examine
/usr/sbin/zl2d for examples of how to change common options.
OPTIONS
start | stop
Starts or stops the zl2d daemon.
-t <msec> Cause zl2d to monitor the Spanning Tree state of
each port on each bridge every <msec> milliseconds. If
unspecified, the default is 500 milliseconds.
-f
Enables Fast Forward on bridge(s) using 0x4000 (16384) as
the dynamic root priority.
-d <level> Sets the level of debugging output required by
zl2d. The default level is one (1). Setting the debug level
higher produces more output. Four (4) is currently the maximum
output level.
-p <priority>
Sets the dynamic root priority. <priority>
should be specified as a decimal number. A priority of 0
disables root priority change.
<iface…>
The network interfaces on which zl2d should
operate. These network interfaces must first be created by
zconfig. zl2d does not operate with standard network interface
cards. It only works on switch network interfaces created by
zconfig.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 221
OPERATIONS
zl2d manages the Spanning Tree state fields in the switch of each port within the bridge(s).
Based on a timer, zl2d reads the port information for each Linux bridge and updates the switch
when necessary.
EXAMPLES
In the following example, zl2d creates a Linux bridge named bzhp0 which includes all of the
zre<n> devices previously associated with the zhp0 device. zl2d then starts a background
task that monitors the port information of the Linux bridge every 500 ms. and updates the
Spanning Tree state fields in the hardware when necessary.
zl2d -t 500
zhp0
Once you run zl2d, use brctl(8) to display and alter your Spanning Tree settings.
SEE ALSO
zconfig, brctl(8)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 222
zl3d
NAME
zl3d - Layer 3 daemon for the OpenArchitect switch.
SYNOPSIS
zl3d [-h <host_name>] [-t <msecs>] [-b] [-e] [-l] [-n] [-d <level>]
<iface ..>
DESCRIPTION
zl3d is run after the network interfaces are created and initialized with zconfig, and started
with ifconfig(1M).zl3d listens for Netlink messages from the kernel and monitors the
Linux network routing tables for routing updates. When an update occurs, if appropriate, zl3d
updates the corresponding routing tables in the silicon so that Layer 3 forwarding is done in the
switch at line speed. zl3d also ages the switch host route entries.
zl3d attempts to keep the Linux route cache and the switch host route table consistent. When a
host route table entry is in use, zl3d updates the Linux route cache. Host route table entries are
deleted when Linux removes the corresponding entry from the route cache. Similarly, network
route entries are removed when the corresponding Linux FIB table entry is deleted.
OPTIONS
-h <hostname>
Specifies which host to monitor. By default,
zl3d monitors the OpenArchitect switch that is locally connected
(i.e., the one that is on the local PCI bus).
-t <msec> Sets the timeout value. By default, zl3d wakes up
every 15 seconds (15000 ms) to look for updates to the Linux
routing tables and to do aging processing of route table entries
on the switch.
-b
Do not background the zl3d process.
-e
Enables additions of a default route to the L3 network route
table.
-l
Leave hardware tables intact on exit.
-n
Writes the proc table entry. zl3d writes 3/2 of it’s
timeout value to /proc/net/sys/ipv4/route/route_expires. The Linux kernel uses
this value as an expiration time for cache routes updated by
zl3d.
-d <level> Sets the level of debugging output required by zl3d.
The default level is one (1). Setting the debug level higher
Ethernet Switch Blade User's Guide
release 3.2.2j
page 223
produces more output. Four (4) is currently the maximum output
level.
<iface…>
The network interfaces on which zl3d should operate.
These network interfaces must first be created by zconfig. zl3d
does not operate with standard network interface cards. It only
works on switch network interfaces created by zconfig. It uses
the same syntax as zconfig.
OPERATIONS
zl3d manages the host route and network route tables located in the switch. Based on a
trigger condition, zl3d reads the Linux FIB table and the Linux route cache. Each table
entry is filtered by zl3d according to the following:
If an entry from the route cache is not a broadcast address or a local address, and zl3d is
able to resolve the MAC address of the destination, then zl3d inserts the entry into the
switch host route table.
If an entry in the Linux FIB table is a host entry and zl3d is able to resolve the MAC
address of the destination host, then the entry is inserted into the switch host route table. If an
entry from the FIB table is a gateway entry, is not local, and zl3d is able to resolve the
MAC address of the gateway, then the entry is inserted into the switch network route table.
EXAMPLES
Normally, zl3d is run as a background task that continues throughout the life of the Layer 3
network.
In the following example, for the three interfaces specified, zl3d continuously monitors the
Linux FIB and route cache tables looking for updates every three seconds. Entries in the
switch host route table are checked for activity every 15 seconds and aged accordingly.
zl3d zhp1 zhp2 zhp3
SEE ALSO
zconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 224
zlc
NAME
zlc − link and LED control
SYNOPSIS
zlc [-h <hostname>][-d <level>][-x] <port_list> <action> [on | off ]
zlc [-h <hostname>][-d <level>][-x] <action> [on | off |clear]
zlc [-h <hostname>][-d <level>][-x] [state|query]
DESCRIPTION
The zlc application sets the link speed and state of individual ports of the switch, or displays
the current state. It can also turn on or off the extract LED or the internal fault LED.
OPTIONS
-h <hostname>
Connect to remote host <hostname>.
-d <level> Set debug level to <level>
-x
Expected query value. Creates no output, exit code only.
If the port_list contains more than one port, returns the
number of ports that match the option.
<port_list>
Port or list of ports on which to take action.
Port lists are supplied in zconfig syntax (e.g. zre1, zre2..4,
etc.)
<action>
Set link speed or state to up, down, auto, 1000fd,
1000hd, 100fd, 100hd, 10fd or 10hd.
The interface must be down to
change the port speed. Set intfault or
<option>.
extfault. Must supply
Use query to return present settings.
on | off
Turn on or off the specified LED. Only valid for
actions intfault, extfault or extract.
clear No longer globally control the specified LED
state Return the list of LEDs currently illuminated.
query Return the list of LEDs currently controlled globally.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 225
EXAMPLES
In the following example, zlc forces the line speed of port 1 to 100 Full duplex. The interface
must be down to change the speed. Assuming zre1 is part of interface zhp0,
ifconfig zhp0 down
zlc zre1 100fd
The external fault, internal fault, and ok LEDs can be set on a per port basis or globally. To set
the external fault LED for a particular port,
zlc zre1 extfault on
To query the settings of a particular port,
zlc zre1 query
Global Settings
The external fault, internal fault, and extract LEDs are set as a logical OR of all the ports. The
LEDs can also be set globally to on, off, or other. If globally set to on or off, the LED will not
change when links go up or down ,or interfaces are configured. If set to other, the LED resumes
its normal operation.
The next example globally turns on the Pull (extract) LED.
zlc extract on
Additional capabilities are also available by supplying an additional led action:
zlc <led_name> on
The <led_name> LED is turned on. The LED will not change when links go up or down or
interfaces are configured. If other is used, the LED resumes its normal operation. <led_name>
can be intfault, extfault, extract, or ok.
zlc query
zlc state
query lists the LEDs which have been set globally on or off. state shows which LEDs are on at the
moment. All LEDs are shown, including the clk LED.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 226
SEE ALSO
ifconfig(8)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 227
zlmd
NAME
zlmd − monitor link changes or hot swap events.
SYNOPSIS
zlmd [-h <hostname>] [-b] [-d <level>] {-f <file>} |
<configuration>
DESCRIPTION
The zlmd application is intended to run as a daemon, waiting for a configured event to occur and
then running the program configured for that event. The events monitored are changes in the link
status of any of the ports of the switch, the start of removal of the switch from the back plane, or
the cancellation of the removal before it actually takes place. The program can be a shell script
that initiates appropriate actions to respond to the event.
OPTIONS
-h <host_name>
-b
Connect to remote host <host_name>
Do not background the zlmd process.
-d <level> Set debug level to <level>.
{-f <configuration_file>} | <configuration>
Read configuration from <file>. If file is a ‘+’,
configuration is read from stdin. Without the -f option, a
single line of input can precede the last flag. (i.e.
<Configuration>)
CONFIGURATION SYNTAX
zlmd takes configuration data from standard input or from a file with the -f option. In either
case, the configuration syntax is the same. The zlmd configuration data consists of a list of
semicolon-delimited statements. Each statement specifies an event to monitor followed by a
response action.
Configuration commands:
<port-list> = <program>
Run <program> when a fault occurs or clears on a port.
hotswap = <program>
Run <program> when a hot-swap extraction or insertion event
occurs.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 228
<port-list>
A list of ports in the same forms supported by
zconfig, e.g. zre1,zre2 or zre10..14
<program> Path to an executable program or script to be run
when the event occurs. Note: An absolute path to <program> is
required. The program will be called with the following
parameters:
For Link Changes:
<program> <ppa> <port> {external(0)|internal(1)} {off(0)|
on(1)}
For Hot-swap Events:
<program> <ppa> {extraction(1)|insertion(2)}
Note: The <ppa> parameter is undefined and should be ignored.
EXAMPLES
In the following example, zlmd monitors ports 1 through 4 and runs a script called
prt_change upon a link change event.
zlmd zre1..4=/usr/sbin/prt_change
Suppose port 2 were UP and you disconnected the cable, zlmd would call prt_change with
the following parameters:
/usr/sbin/prt_change 0 2 0 1
where 0 is the ppa, 2 is the port, 0 is an external fault, 1 is ON.
SEE ALSO
zconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 229
zlogrotate
NAME
zlogrotate − Rotates log files.
SYNOPSIS
zlogrotate [-b] [-t time] [-s segment size] [-n # of files] [-f file
to rotate]
DESCRIPTION
zlogrotate rotates the selected file every [time] seconds if the file is larger than [segment
size]. It will keep only the number of files selected. zlogrotate is called from
/etc/init.d/rcS by default with no parameters.
OPTIONS
-b
Do not background the process - i.e. run in foreground.
-t <time>
60)
the time between logfile checks in seconds (default
-s <size> the targeted file segment size, in kilobytes
(default 256)
-n <# of files> The number of segments kept on the system
(default 4)
-f
<file> The file to rotate (default /var/log/messages)
EXAMPLES
To start zlogrotate with the default values,
zlogrotate
Ethernet Switch Blade User's Guide
release 3.2.2j
page 230
zmirror
NAME
zmirror - Set packet mirroring on an ingress or egress port.
SYNOPSIS
zmirror
-a | -t
zmirror
[-e] <from_list> <to_port>
DESCRIPTION
zmirror sets packet mirroring from a given set of ports to a given port. Turning on packet
mirroring causes a copy of the packet to be sent to the to port. Any number of from ports can be
mirrored to one to ports.
NOTE: There may be a significant performance impact when trying to mirror more
bandwidth than is available on the to port.
After executing the following command, packets received on ports 1, 2 and 3 would be mirrored
(copied and transmitted) to port 12. This mirroring would be in addition to any Layer 3 or Layer
2 switching.
zmirror zre1, zre2, zre3 zre12
To clear the current mirroring, use the -t option.
The -e option can be used to indicate that packets being sent on a given port should be copied to
the to port. For example if the -e option is used as follows, the packets transmitted, as opposed
to received, on ports 1, 2 or 3 would be mirrored to port 12.
zmirror -e zre1, zre2, zre3 zre12
The to port can also be the keyword cpu to indicate that packets should be forwarded to the onboard processor. The following example would mirror the contents of port 1, 2 or 3 to the onboard processor:
zmirror zre1, zre2, zre3 cpu
The to port can be a single port or the keyword cpu. The from port can be a list consisting of one
or more ports. The from port cannot be the cpu. See the section on wildcards for discussion of
from port lists.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 231
zmirror is cumulative:
zmirror zre1, zre2, zre3 cpu
Is the same as:
zmirror zre1 cpu
zmirror zre2 cpu
zmirror zre3 cpu
Setting a different to port will overwrite the previous setting and direct previously mirrored ports
to a new to port. Given the last setup the following will change port 1 traffic to be forwarded to
port 10.
zmirror zre1 zre10
OPTIONS
-a
Display the current mirroring setup
-e
Set egress port mirroring for the specified from port
-t
Teardown or disable the mirroring
WILDCARDS
Wildcard characters can be included to simplify the process of creating larger, more complex
configurations. Wild card characters for zconfig include:
,
(comma)
Use for creating lists
..
(dot-dot)
Specifies an inclusive range
Below are some examples for the correct usage of the comma (,) and dot-dot (..). Each line below
produces the same results:
zre1, zre2, zre3, zre4
zre1..4
zre1, zre2..4
zre1..2, zre3..4
SEE ALSO
tcpdump(1M)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 232
zmnt
NAME
zmnt − Expands the read/write files onto the RAM disk.
SYNOPSIS
zmnt [-c] <directory>
zmnt [-c] -t <file>
zmnt [-c] –l
DESCRIPTION
zmnt expands files from flash onto the RAM disk that have been previously saved with zsync.
The init process runs zmnt to expand the files in flash onto RAM file system. The user may
use zmnt to expand the files at anytime and may place them in any directory. For example,
zmnt /tmp
will expand the files under the directory /tmp.
Booting the kernel with -i instructs the init process to skip the zmnt stage After booting in
this way, zmnt can be used for correcting a problem file in the read-write file system.
The -t option can be used to save the configuration of a switch to a tar file. A tar file can be
copied to another switch and saved with zsync -t. As a result, the configuration of a switch
may be cloned to other switches.
The -c option is used to mount the custom overlay. See zsync for a description of custom
verses dynamic overlay.
OPTIONS
-c
Read saved files from the custom overlay
-t <file>
Save the overlay into the specified <file> in tar format.
<directory>
The directory under which the overlay files
are expanded, or the file to which the tar image is saved.
-l
List files in selected overlay, do not unpack.
EXAMPLES
Ethernet Switch Blade User's Guide
release 3.2.2j
page 233
In the following example, zmnt the current overlay into a tar file called overlay.tar
zmnt –t overlay.tar
The resulting tar file can now be saved on a different host as a snapshot of the overlay at that
point in time. Use zsync to restore the overlay on the switch:
zsync –t overlay.tar
The restored overlay will be used upon the next reboot.
SEE ALSO
zsync
Ethernet Switch Blade User's Guide
release 3.2.2j
page 234
zpeer
NAME
zpeer – Application for High Availability communication between the Fabric and Data
switches.
SYNOPSIS
zpeer [-d <level>] local|peer <command> <value>|query
zpeer [-d <level>][-a][-r]
DESCRIPTION
zpeer is used to pass bidirectional High Availability(HA) state and priority information
between the base and fabric switches in the Ethernet Switch Blade. zpeer uses the concept of
local and peer information. The local information is written, and the peer information is read. As
an example:
On the base switch:
zpeer local priority 203
On the fabric switch the following command would return 203:
zpeer peer priority query
The communication is bidirectional so the example above can be reversed allowing the fabric
switch to pass state and priority information to the base switch.
NOTE: Local information can also be read as confirmation and for debugging purposes.
zpeer is part of the HA software suite. It is called as part of the scripts that are generated by the
zspconfig application. With the exception of querying information for debugging and
validation, the zpeer application would not need to be executed by the user. Following is a
quick overview of the values communicated between switches. See documentation on HA and
zspconfig for better understanding of how zpeer fits in with the HA software suite.
Following is an example of setting the state to “backup”:
zpeer local state backup
Possible states are:
unhealthy
Set by hardware at power up or at system reboot. Set by
software when the HA software suite is stopped. Indicates that
the peer is not properly functioning
healthy
Set by software when HA is started. This state is never
Ethernet Switch Blade User's Guide
release 3.2.2j
page 235
displayed by query, but must be set at initialization. After setting
the healthy state, the query will return the backup state.
backup
Used to reflect the backup state of vrrpd
master
Used to reflect the master state of vrrpd
The priority value is a value between 0 and 255. In the HA suite, the value is set to 254 minus
the number of ports that are link down.
The state or priority value for the local or peer can be displayed with the query command. The
following will query the state of the peer switch:
zpeer peer state query
The output from the above command during the boot process would be “unhealthy”
The -a option can be used to display a complete listing of all state and priority information and
internal information that can be used for debugging. Here is example output from the -a option.
Local/Write
203
master
priority
state
data
position byte|bit
status
2|cb
2|8
50(ACK)
231
Peer/Read
backup
2|e7
2|0
0()
The priority and state rows are the same values returned by the query command. The information
in the data, position and status rows are internal debugging information that is useful to support
engineers when diagnosing problems in the field.
The -r option should not be used while HA is running. It can cause loss of coordination between
the two switch planes. The -r option will reset all internal communication values, and possibly
require the peer to be reset also.
OPTIONS
-d
set debug level to <level>
-a
Display complete status of zpeer
-r
Reset all values in the zpeer communication. Can cause
loss of communication with the peer requiring the peer to
Ethernet Switch Blade User's Guide
release 3.2.2j
page 236
be also reset.
SEE ALSO
zspconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 237
zqosd
NAME
zqosd – monitors tc(8) commands to implement classification filters and queuing disciplines
in hardware.
SYNOPSIS
zqosd [-d <level>] [-p <port>] [-f] [-l] [-i <pid>] [-o <pid>]
DESCRIPTION
zqosd monitors commands entered by tc which set up queuing disciplines and classification
filters for managing traffic in the switch. It supports a variety of queuing disciplines which allow
distributing available bandwidth at each output port of the switch among different classes of
traffic as well as selecting which packets to drop when bandwidth limits are exceeded.
zqosd does not directly set up the hardware. It prepares messages describing the queuing
disciplines and filters and sends them to a hardware specific daemon, ztmd. ztmd should be
started before zqosd. Both programs normally run as background processes.
OPTIONS
-d <level> Set the level of diagnostic information logged.
<level> may be 0-4; higher levels produce more output.
- p <port> Use <port> as the multicast listening port for
communication with ztmd. Default is 2345.
-f
Run zqosd in the foreground.
run as a daemon.
-l
Without this option, it is
Log diagnostic output to /var/log/zqosd.log
-i <pid>
Set the pid for this process.
-o <pid>
Set the pid of the destination process (ztmd)
EXAMPLES
Start the traffic management daemon, ztmd, then start zqosd to monitor tc output. Both
daemons are run as background processes and log their messages.
ztmd –l
zqosd -l
Ethernet Switch Blade User's Guide
release 3.2.2j
page 238
SEE ALSO
ztmd, tc(8), zfilterd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 239
zrc
NAME
zrc - Packet rate control
SYNOPSIS
zrc -b | -m | -d | -t | -a [-p <port>] [-v <vlan>] [-g
<group>] [-M <mac_addr>] [-T <timeout>] [-D <level>] <rate>
DESCRIPTION
zrc sets rate control on Broadcast, Multicast and/or Destination Lookup Failure (DLF) packets.
The rate is measured in the number of packets per time period. If the number of packets received
of the specified type exceeds the specified rate limit, packets are discarded at the ingress port.
OPTIONS
-b
Enable rate control for Broadcast packets
-m
Enable rate control for Multicast packets
-d
Enable rate control for DLF packets
-t
Teardown or disable all rate control
-a
Display the current rate control settings
-p <port>
Enable rate control on this <port>
-T <timeout>
Set time period to <timeout> milliseconds.
Default is 1000 (one second).
-D <level> Set debugging output to <level> when running the
program.
<rate_limit>
The number of packets per time period above
which Broadcasts, Multicasts and/or DLFs will be discarded.
Valid rate limits are any number between 0 and 262143.
SEE ALSO
ztats
Ethernet Switch Blade User's Guide
release 3.2.2j
page 240
zreg
NAME
zreg - Read and write registers and tables on the OpenArchitect switch switching hardware.
SYNOPSIS
zreg [-p <ppa>] [-w] [-i <index>] [-t <index>] [-k] [-h
<hostname>] [-d <level>] [-r 10] <reg>
DESCRIPTION
zreg allows a user to read and write direct and indirect registers and tables on the resident
switch chip. zreg is commonly used for debug, or for prototyping when creating applications
that control the OpenArchitect switch. It is also useful when put into shell scripts for displaying
hardware status or statistics. Although the -t option allows tables to be displayed, one might find
the formatted output of the table functions more useful. See zal.
OPTIONS
-p <ppa>
Specify the Physical Point of Attachment (PPA).
Each OpenArchitect switch that is controlled by the CPU on
which zreg is running is a unique PPA. If there is only one
OpenArchitect switch, as would be the case when zreg is
running on the embedded processor the PPA would be 0. The
default PPA is 0.
-w
Causes zreg to write to the register or table. Data to be
written is read from standard input.
-i <index> Causes zreg to access at the indexed register
specified by <reg>. See OPERANDS for usage of <reg> with
indexed registers. The <index> parameter is used to determine
which entry i
-t <index> Causes zreg to access the memory specified by
<reg>. The <index> parameter is used as the index into the
specified table. Only content addressable memories are
accessed using the –t option. All other tables and memories
are accessed with the –i option.
-k
Causes zreg to access the memory specified by <reg>. The
entry accessed is determined using the data from standard
input as the search key. Enter 0 for fields that are not part
of the search key.
-h <hostname>
Specify the <hostname> to configure. By
Ethernet Switch Blade User's Guide
release 3.2.2j
page 241
default zreg configures the OpenArchitect switch that is
locally connected (i.e., the one that is on the local PCI
bus).
-r 10 Sets numeric radix for registers to 10. Default is 16.
-d <level> Set the level of debugging output produced by zreg.
The default level is 1. Setting the debug level higher
produces more output. The maximum level of output is currently
4.
OPERANDS
TBD
EXAMPLES
There are three types of accesses performed by zreg; scalar register, indexed register, and table.
For each of theses access types, values can be read or written. Content addressable memory
register access is the default. The following is an example of reading the CONFIG Register:
zreg 1
When running zreg on the embedded processor of the OpenArchitect switch, the <ppa> is
always 0, since the embedded CPU processor only controls the directly attached switch chip.
To write a value to a register the -w flag is used and the data is read from standard input. The
following example writes 0x10003c to the Aging Time Register:
echo 0x10003c | zreg -i 0 131 -w -p 0
echo 0x10003c | zreg -i 0 131 -w -p 1
echo 0x10003c | zreg -i 0 131 -w -p 2
echo 0x10003c | zreg -i 0 131 -w -p 3
If the zreg command is typed at the prompt, it waits for input from the user. You may also use
File I/O or shell scripts. The following example reads the MAC Packet Length Register for port
7:
zreg -i 7 2
SEE ALSO
zal, ztats
Ethernet Switch Blade User's Guide
release 3.2.2j
page 242
zrld
NAME
zrld – ZNYX redirector daemon
SYNOPSIS
zrld [-d <level>] [-p <port>] [-f]
DESCRIPTION
zrld is used for remote management of OA/HA applications. OA/HA applications capable of
remote management include zlc, ztats, zlmd.
zrld only allows requests from hosts listed in /etc/rcZ.d/zrld_trusted_hosts.
OPTIONS
-d <level> Set debug level to <level>
-p <port>
7000.
Specifies TCP port to listen on. Default port is
-f
Run the daemon in the foreground
EXAMPLES
In the following example, zrld starts a background task that listens on the default port 7000 for
incoming TCP requests and passes along the request to the OA/HA application,
zrld
Once started, you can issue supported commands to the host running zrld from a remote host.
For example, if the host running zrld had an IP address of 10.0.0.43, you could use zlc to
remotely query the status of ports 1 through 5. Remember, the IP address of the switch you are on
must be listed in /etc/rcZ.d/zrld_trusted_hosts on the switch running zrld.
zlc –h 10.0.0.43 zre1..5 query
SEE ALSO
zlc, zlmd, ztats
Ethernet Switch Blade User's Guide
release 3.2.2j
page 243
zsnoopd
NAME
zsnoopd - IGMP Snooping daemon for the OpenArchitect switch.
SYNOPSIS
zsnoopd [-d <level>] [-f] [-h <hostname>] [-p <ppa>]
[-r <sec>] [-t <sec>] [-u <sec>] [-v <vlan_id>]
DESCRIPTION
zsnoopd is run after the network interfaces are created and initialized with zconfig, and
started with ifconfig(1M).zsnoopd starts a background task that monitors incoming IGMP
traffic in order to learn which hosts in a VLAN are listening to which IP multicast addresses.
zsnoopd updates the multicast table in silicon with this information, so that IP multicast traffic
is forwarded only to the ports in a VLAN to which listening hosts are attached. This optimizes
packet flow through the switch. Traffic on all VLANs is monitored by default. IGMP snooping
can be restricted to specific VLANs using the –v option. The background task started by
zsnoopd continues throughout the life of the Layer 2 network.
zsnoopd manages the switch multicast table (MARL). Based on monitored IGMP traffic,
zsnoopd creates or updates an entry in the MARL. The key to each MARL entry is a source
Ethernet multicast address combined with a VLAN ID. Two port bitmaps are maintained: one
that identifies the untagged members of the VLAN, and one which identifies which ports of the
VLAN have listening hosts attached.
When the maximum number of entries in the MARL is reached, zsnoopd deletes a random
entry prior to adding the next entry.
Traffic from IP multicast addresses not found in the multicast table is forwarded to all ports
within a VLAN. Traffic from reserved IP multicast addresses (224.0.0.X) is forwarded on all
ports within a VLAN.
To operate correctly, traffic from unregistered IP multicast addresses should be forwarded on all
ports in a VLAN. To do this, the multicast port filtering mode must be set to
FORWARD_UNREGISTERED. See zconfig.
Neither zgmrpd nor zgvrpd can run concurrently with zsnoopd.
OPTIONS
-d <level> Sets the level of debugging output required by
zsnoopd. The default level is zero (0). Setting the debug
level higher produces more output. Four (4) is currently the
maximum output level.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 244
-f
Run zsnoopd in foreground. Default is to run it in
background.
-h <hostname>
-p <ppa>
Connect to remote host <hostname>.
Start zsnoopd on switch <ppa>. Default is 0.
-r <sec>
Time to wait, in seconds, before removing a port
with no router multicast traffic. Default is 260 seconds.
-t <sec>
Time to wait, in seconds, before removing a port
with no host multicast traffic. Default is 260 seconds.
-u <sec>
timeouts.
Time to wait, in seconds, before checking port
-v <vlan_id>
Enable zsnoopd for VLAN <vlan_id>. Default is
to enable zsnoopd on all VLANs. This option may be entered
more than once.
EXAMPLES
In the following example, zsnoopd starts a background task that monitors incoming IGMP
packets and updates the Multicast Table (MARL) accordingly. This background task continues
throughout the life of the Layer 2 network.
zsnoopd
Once you run zsnoopd, use zmarl to display the contents of the MARL. zsnoopd deletes all
entries in the MARL when starting up, and when shutting down. Manual changes to the MARL
are not recommended while zsnoopd is running
SEE ALSO
zconfig, zgmrpd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 245
zspconfig
NAME
zspconfig - configure and start surviving partner
SYNOPSIS
zspconfig [-d <level>] [-p <directory_path>] [-u
<dhcp_interface>] [-c <dhclient.conf>] [-t <timeout>] [-s]
[-v] -f <file>
DESCRIPTION
zspconfig is used to configure and start the Surviving Partner software. With the -f
option a configuration file is provided that completely describes the network setup and
desired behavior of all of the switches participating in the Surviving Partner. With the -u
option the interface on which to run dhclient to retrieve a configuration file must be
provided. The configuration file format retrieved by a -u is identical to that supplied with
-f. It is envisioned that the -f option is used for initial configuration, and all subordinate
and replacement switches run zspconfig with the -u option. The -v option prints the
current version of zspconfig and performs no actions.
OPTIONS
-d <level> Set the debug level. The default debug level is 1.
The higher the level, the more debugging output is produced.
Debugging output is sent to the console.
-p <directory_path>
Set the directory path for where zspconfig places the scripts
it generates. The default location is
/etc/rcZ.d/surviving_partner.
-u < dhcp_interface>
Come up as an unconfigured slave. Use the specified
<dhcp_interface> to retrieve the configuration. User
confirmation is not required unless the –s option is also
used.
-c <dhclient.conf>
Use file <dhclient.conf> as the configuration file to dhclient
when retrieving configuration information. If –c is not used,
a default configuration file is created and used. Only valid
with the –u option.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 246
-t <timeout>
Time to wait in seconds before giving up on
finding a Surviving Partner to retrieve configuration
information from. Only valid with the –u option.
-s
Do not ask for confirmation. Run from a script.
-v
Prints the current version of zspconfig.
-f <file> The provided <file> is used as input to configure
the Surviving Partner. See the next section on CONFIGURATION
FILE for the syntax of the configuration file.
CONFIGURATION FILE
The configuration file contains commands for controlling the Surviving Partner setup. Commands
are single lines end -delimited with a semicolon. Comments, spaces and new lines are ignored.
Comments begin with the # character and include characters through the next new line.
Comments may be placed on the same line as a command after the semicolon.
All configurations must include the VLAN configuration first by use of zconfig commands.
zconfig commands can be put in the configuration file and will be passed directly to the
zconfig application. zconfig commands start with the keyword zconfig and are of the
same format as described in the zconfig manual page. Here is an example of zconfig
commands in a zspconfig configuration file.
zconfig zhp0: vlan1 = zre1..4;
zconfig zhp1: vlan2 = zre5..8;
zconfig zhp2: vlan100 = zre14;
In the above example, three VLANs are created: zhp0 and zhp1 will be used as connections to
high availability nodes; zhp2 will be used as the inter-connect between two Surviving Partner
switches to run the VRRP heartbeat. All VLANs must be created before other zspconfig
commands may operate on them.
The next section of a zspconfig configuration file sets up the IP addresses of the created
VLANs. There are two types of addresses in a Surviving Partner setup: Physical Addresses, and
Virtual Addresses. The Virtual Addresses are those that VRRP manages and moves to the current
Master switch. The Physical Addresses are the real addresses of the switch, and are used for
management only. Physical Addresses are setup using the sibling_addresses command.
Virtual addresses are setup using the virtual_address command. The sibling addresses
name comes from the fact that we are setting up the addresses for all of the siblings in the
Surviving Partner group. So if we have two Surviving Partner switches, the
sibling_addresses statement might look like this:
sibling_addresses: zhp0 = 10.0.0.30, 10.0.0.31;
sibling_addresses: zhp1 = 11.0.0.30, 11.0.0.31;
sibling_addresses: zhp100 = 100.0.0.30, 100.0.0.31;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 247
A sibling_addresses statement is required for each VLAN created with the zconfig commands.
The two addresses in the list indicate there are two switches in the Surviving Partner group. The
first address 10.0.0.30 and 11.0.0.30 are assigned to the switch on which the configuration is
being run. The remaining addresses are distributed to the switches that run zspconfig -u on a
first come, first serve basis.
The sibling_addresses command may also take a netmask operand similar to that given to
ifconfig. For example:
sibling_addresses: zhp0 = 10.0.0.30, 10.0.0.31 netmask
255.255.255.0;
sibling_addresses: zhp1 = 11.0.0.30, 11.0.0.31 netmask
255.255.255.0;
Virtual addresses should be setup for all VLANs that connect to High Availability nodes.
Usually this would include all VLANs except the interconnect VLAN or VLANs connected to
upstream routers. It is possible that a setup would have VLANs that are used for management
only. Virtual addresses are setup as follows:
virtual_address: zhp0 = 10.0.0.43 netmask 255.255.255.0;
virtual_address: zhp1 = 11.0.0.42 netmask 255.255.255.0;
Only a single address per VLAN is provided because this single address will move with the
current Master switch, and the netmask must be the same as that provided in the
sibling_addresses statement.
The last required section for the configuration is description of the ports. Particularly we need to
specify one of the following for all of the ports participating in the high availability setup. The
possible port types are:
interconnect - Ports connected between groups of Surviving Partner switches. VRRP
heartbeat messages are sent on the interconnect ports.
Crossconnect - Crossconnect ports are ports that are connected to other Surviving Partner
switches, that are not part of this Surviving Partner group. Crossconnect ports behave
differently then bonding ports. The links are not brought down temporarily, and VRRP runs
with the native MAC addresses to avoid MAC address duplication with the other VRRP
group.
RAINLink - Ports connected to rain link or bonding driver nodes. These ports contain
virtual addresses managed by VRRP. And during a failover event, the links are toggled
down to force failover to the Master switch.
Route - Ports connected to upstream routers. VRRP does not manage virtual IP addresses
for these links. Routing protocols must be used to instruct up stream routers of a different
path to get to the VRRP managed networks.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 248
monitor_only - Ports that are monitored but do not have a virtual address managed on them.
They will not have their links brought down temporarily during a failover scenario. These
ports are only monitored. If a problem occurs on this type of link it will cause a failover
scenario.
configure_only - Ports are configured as per the zconfig commands, but do not participate
in the high availability network. Problems on these links will not cause a switch failover.
NOTE: The zhp specified for interconnect is important, and will be the zhp
interface/VLAN where zspconfig/HA on the the master will start the DHCP daemon.
Zre51 on the slave swtiches should be configured up into this same VLAN so that
zspconfig -u can connect to the master.
Each port that is setup in a VLAN by the zconfig commands must have its port type specified.
The port type is specified on a physical port bases. That is on a zre basis, but zhp names can be
used as a quick way to setup the port type for all ports that are a member of that VLAN. It is
possible to make a port a member of more then one VLAN. That is a zre can be a member of
more then one zhp. In such cases, configuring the zhps as different port types would cause a
conflict, and will not work. To handle this setup the individual zre commands would be used to
setup the port types. Here is an example of setting up the port types as a continuation of our
current configuration:
interconnect:
zhp2; # Could also use zre14
rain link: zhp0, zhp1;
# Could also use zhp0..1,
# or list the zres
The ".." wild card is supported as in zconfig to indicate a range of numbers. The comma is
used to indicate a list. The zres that are part of the zhp could also be used. Here is a more
complex setup. The zconfig commands are also shown to understand the VLAN setup:
zconfig zhp0: vlan1 = zre1..4, zre23;
zconfig zhp1: vlan2 = zre5..8, zre23;
zconfig zhp100: vlan100 = zre23;
zconfig zhp0: vlan1 = zre1..4;
zconfig zhp1: vlan2 = zre5..8;
zconfig zhp2: vlan100 = zre23;
#
sibling and virtual address setup omitted
interconnect:
zhp100;
rain link: zre1..8;
# Use zre definition to
# exclude zre23
If zhp0 and zhp100 are setup as different port types, there would be a conflict for port zre23.
In the particular example above, the zre23 is shared. It is used to pass VRRP interconnect
traffic and as a means to pass VLAN 1 and 2 traffic between switches. Since zre23 is an
Ethernet Switch Blade User's Guide
release 3.2.2j
page 249
interconnect, it is not a bonding driver enabled port, and therefore should be setup as an
interconnect port type. To accomplish this, the zre ports are listed to avoid conflicting port
types.
Note that a single line cannot contain both zhp and zre definitions. Therefore
rain link: zhp1, zre1..4
does not work and the definition zre1..8 is equivalent.
Optional zspconfig commands are listed below.
vrrp_msg_rate:
100; # Time in milliseconds.
# Default is 100 milliseconds
The message rate is the interval between VRRP messages sent over the interconnect link. The
time given is in milliseconds. It takes the lack of 3 VRRP messages for the Backup to assume the
role as Master. It is recommended with faster message rates to increase the default priority to
254. The higher priority will decrease the latency of the failover.
vrrp_def_priority:
254; # default is 100.
Value from 1 to 254
The default virtual MAC address for VRRP is 00:00:5E:00:01:<vid>, where <vid> the
virtual router ID. Using this virtual address is problematic in network environments where
multiple VRRP instances might be running that are using the same <vid>. To overcome this
problem, zspconfig, uses a default MAC address derived from the physical address of the
switch on which it is running. For the slave switch, the vrrp_virtual_mac_addr command
is used to set the MAC address to the same as the Master. This statement is typically not used
within the Master switch’s configuration. It is used in the zspconfig generated Slave switch
configuration. And is retrieved by the Slave switch with zspconfig using the –u option. If in
doubt, don’t use this command.
vrrp_virtual_mac_addr: 00:01:02:03:04:05
There are three failover_modes supported; switch, vlan and port. For switch failover, if
any managed link fails, the entire switch is failed over to the backup switch. In vlan failover, if
a link fails in a VLAN, only the links associated with that VLAN are failed over. In port
failover, only the port that fails is moved to the backup switch. For both vlan and port
failover, the interconnect link will need to be used to maintain connectivity between ports that
have failed over and those that have not. For vlan and port failover modes, the interconnect
link must be an in-band port, and must be included in the managed VLANs running with VLAN
tagging on.
failover_mode:
failover_mode:
failover_mode:
switch;
vlan;
port;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 250
Additional startup scripts may be included in the configuration using the start_script command.
The files in the start_script command will be placed in a location for tftp transfer to sibling
switches that initialize using the –u option. A common use of the start_script command might be
to propagate gated configurations to all members of the Surviving Partner group. Absolute path
names must be used. Using multiple commands allows inclusion of multiple scripts. For
example:
start_script: /etc/rcZ.d/S75gated;
start_script: /etc/rcZ.d/S80static_routes;
The vrrpd_script command allows a user defined script to be run when vrrpd changes
state. This script is called at the end of the zspconfig created vrrpd.script. See vrrpd –s for a
description of when the scripts are called. zspconfig sets up vrrpd to call vrrpd.script. The
vrrpd_script command in zspconfig places a call to the user-defined script at the end of
vrrpd.script file. The following example would call the my_vrrpd_script each time
vrrpd calls its –s provided script:
vrrpd_script:
/etc/rcZ.d/surviving_partner/my_vrrpd_script;
NOTE: The my_vrrpd_script is not called from a different process thread. Therefore if
my_vrrpd_script crashes or has long delays, it will crash the vrrpd, or cause delays in
the Surviving Partner failover. To protect against this, write the script to launch a second
script in a background shell. The advantage to calling the user provided script in the same
process thread is that it gives synchronized control over the failover process for those who
want it.
OUTPUT FILES
The output of zspconfig is a set of configuration and script files. The configuration files
configure vrrpd and zlmd daemons. The vrrpd and zlmd daemons combined with the script
files run the Surviving Partner. This is a list of all configuration and script files:
/etc/rcZ.d/S70Surviving_partner
The main startup script that starts the Surviving Partner by
running zconfig, ifconfig, zlmd and vrrpd. zspconfig prompts
the user to run this script. This file can be saved with
zsync to automatically start the Surviving Partner at switch
boot.
/etc/rcZ.d/surviving_partner/vrrpd.conf
Configuration script for the VRRP daemon. This configuration
is used when the S70Surviving_partner script launches vrrpd.
There is a line in this file for each router address vrrpd
will manage. Or stated another way, each virtual_address
command in the zspconfig configuration file results in a line
in vrrpd.conf.
/tftpboot/zsp.conf<n>
zspconfig configuration file that contains the configuration
of the sibling backup switches. The <n> is used to
Ethernet Switch Blade User's Guide
release 3.2.2j
page 251
distinguish potentially more than one backup switch. This
configuration file is placed in /tftpboot, and is retrieved
via DHCP by a replacement switch on boot up.
/etc/rcZ.d/surviving_partner/dhcpd.conf
Configuration script used by dhcpd when the switch becomes
master. dhcpd is used to serve replacement switches their
configuration scripts. Namely a zsp_DC.conf file that can be
input to the zspconfig with the -u flag.
/etc/rcZ.d/surviving_partner/dhclient.conf
If zspconfig is executed with the -u flag, a dhclient.conf
file is created, and then dhclient is used to retrieve a
zspconfig configuration file from the /tftpboot area of the
Master switch.
/etc/rcZ.d/surviving_partner/vrrpd.script
Runtime script that executes each time the vrrpd changes
state. This script starts and stops dhcpd, and toggles down
bonding driver/rain link ports to force the nodes to a new
Master switch.
/etc/rcZ.d/surviving_partner/zlmd.script
Runtime script executed by zlmd when a link goes up or down.
This script modifies the priority of vrrpd, which in turn may
cause the VRRP Master to move from one sibling switch to
another.
SEE ALSO
zconfig, ifconfig, vrrpd, dhclient, dhcpd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 252
zstack
NAME
zstack - Configures the OpenArchitect switch stacking.
SYNOPSIS
zstack [-h <host_name>] [-d <level>] [-a] [-t] [{-f <file>} |
<configuration>]
DESCRIPTION
zstack combines multiple switch fabric chips into a single virtual switch. zstack must be run
before any other switch configuration. Specifically it must be run before zconfig. zstack is
typically run from an S20stack script prior to the S50xxx scripts. zstack currently only
supports directly connected switch chips as are present on the Ethernet Switch Blade. Directly
connected means that the local CPU can directly access and control the switch fabric chips being
stacked. zstack does not yet support network based stacking where there are separate boards
with separate CPUs controlling the switch fabric chips.
OPTIONS
-h <hostname>
Specifies the remote hostname to configure.
By default, zstack configures stacking on the local
OpenArchitect switch. This option should only be used for
displaying the configuration, if at all.
-d <level> Sets the level of debugging output produced by
zstack. The default level is 1. Setting the debug level
higher produces more output. The maximum output level is
currently four (4).
-a
Displays the current stacking configuration of the
switch.
-t
Tears down the entire switch stacking configuration.
{-f <file>} | <configuration>
Gets configuration information from the specified file. A
<file> name of '+' reads configuration data from standard
input. If the -f flag is not used, a single line of
configuration data can be entered as parameters to zstack.
CONFIGURATION SYNTAX
zstack takes configuration data from standard input or from a file with the -f option. In either
case, the configuration syntax is the same. The zstack configuration data consists of a list of
Ethernet Switch Blade User's Guide
release 3.2.2j
page 253
semicolon-delimited statements. Each statement specifies an action to take on a stack. A stack is
a group of ports on a single switch fabric chip. Actions include stack creation, stack port
association, stack configuration and stack control.
Comments, spaces and new lines are ignored. Comments begin with the # character and include
characters through the next new line.
Stack Creation
The first step in creating a stack is to define its location. Each stack is assigned a unique small
integer by the user. On the Ethernet Switch Blade this integer must be a value from 0 to 31. The
location is defined with two values; a Physical Point of Attachment (ppa) and a network location.
The ppa is defined by the keyword "ppa" followed by an integer value. The integer value is a 0
based contiguous value representing the physical switch fabric chip as it was discovered by the
Linux operating system. In the case of the Ethernet Switch Blade there are two chips directly
controlled. The network location specifies an IP address of the CPU that controls the physical
switch fabric chips. If the CPU that is running zstack controls the physical switch fabric chip,
the key word "local" is used in place of the IP address. Currently only "local" CPU control is
supported.
Stack creation example for a Ethernet Switch Blade:
stack0: ppa0 local;
stack1: ppa1 local;
The above statements indicate that there are two switch fabric chips that are controlled by the
local CPU.
Stack Port Association
After stack creation, the physical ports must be associated with a virtual port name. One might
think of this as mapping the ports from their physical association to a virtual name. The physical
port numbers are usually 0 based, but are dependent on how the ports are physically configured in
the switch fabric silicon and how those ports are labeled at the physical connector. At a
minimum the port association is used to move the ports of a second, third, or more switch silicon
chip to a different virtual port name then the others. In this way, the ports can be built into a
unique linear port list.
Stack port association syntax:
stack<N>: <zre_list> = <zre_list>;
The port association statement begins with the stack and number representing the group of ports
being mapped. The stack must be previously created with a stack creation command. After the
semicolon are two zre_lists separated by an equal sign. The first is the list of virtual port
names, the second is the physical port names. The assignment is done in order, and there must be
Ethernet Switch Blade User's Guide
release 3.2.2j
page 254
an equal number of ports in each list. Wild cards may be used in the zre_lists. See below.
Stack port association syntax for a Ethernet Switch Blade:
stack0: zre0..11 = zre0..11;
stack1: zre12..23 = zre0..11;
The first statement above configures the first switch silicon chip, represented by stack0, to have
no translation between its physical port numbering and its virtual port numbering.
NOTE: The statement must be made even if the mapping is one-to-one.
The second statement above configures the second switch silicon chip to have its physical ports 0
through 11 map to virtual ports 12 through 23. The mapping is done in a linear fashion. zre0
maps to zre12. zre1 maps to zre1 and so on.
Stack Configuration Statements
After stack creation and port association, the configuration of the stack must be defined. The
stack configuration provides the network map of how inter-switch fabric communication is
performed. It specifies which physical port or should be used to communicate with a different
group of stacked ports. The syntax is as follows:
stack<N>: stack<M> = zre<n>;
The above syntax indicates that stack N should use zre n to access stack M. The zre value n is
a physical port number as seen by Stack N. It is not a virtual port number as mapped by a port
association command. Multiple configuration statements for Stack N can be used to indicate how
to get to other stacks.
NOTE: stack<M> can be a list of comma delimited or range of stacks as described below
in the section on wild cards.
Stack port association example for a Ethernet Switch Blade:
stack0: stack1 = zre12;
stack1: stack0 = zre12;
The above example indicates that stack0 is connected to stack1 through the port 12 and stack1 is
connected to stack0 through port 12. Zre12 on the Ethernet Switch Blade switch fabric chips is
the HIGIG port and are directly connected between the two devices.
Stack Control Statements
Finally after creating the stack, associating the ports, and setting the stack configuration, the stack
can be enabled using one of the Stack Control statements. The following stack control statements
Ethernet Switch Blade User's Guide
release 3.2.2j
page 255
are supported.
enable;
The enable statement turns on stacking that has been previously configured.
cannot be made until configuration is complete.
This statement
disable;
The disable statement turns off stacking. Before disabling stacking, all Ethernet Switch Blade
daemons must be stopped, and the VLAN configurations must be torn down using zconfig.
EXAMPLES
zstack stack0: ppa0 local
zstack stack1: ppa1 local
zstack stack0: zre0..23, zre48..49 = zre0..25
zstack stack1: zre24..47, zre50..51 = zre0..25
zstack stack0: trunk0 = zre26..27
zstack stack1: trunk0 = zre26..27
zstack stack0: stack1 = trunk0;
zstack stack1: stack0 = trunk0;
zstack enable
WILDCARDS
Wild card characters can be included to simplify the process of creating larger, more complex
configurations. Wild card characters for zconfig include:
,
(comma)
Use for creating lists
..
(dot-dot)
Specifies an inclusive range
Below are some examples for the correct usage of the comma (,) and dot-dot (..). Each line below
produces the same results:
stack0: zre4..7 = zre0, zre1, zre2, zre3;
stack0: zre4, zre5..7 = zre0..3;
stack0: zre4..7 = zre0, zre1..3;
stack0: zre4, zre5..7 = zre0..1, zre2..3;
The stack may also be in list form in the Stack Configuration command in similar fashion to the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 256
zre lists. Example of stack0..3 representing stacks 0, 1, 2 and 3.
SEE ALSO
zconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 257
ztats
NAME
ztats − Display statistics and information about switch
SYNOPSIS
ztats [-d <level>] [-i <unit>] | [-m <port>] |
[-v <vlan id>] | [-t <tgid>] | [-v]
DESCRIPTION
ztats displays MIB counters for a selected physical port, trunk group or VLAN. It can also
display information about the configuration of the switch and bridge to the PCI bus or the Vital
Product Data memory. All output is formatted.
OPTIONS
-m <port>
MIB statistics for specified <port>
-v <vlan id>
-i <unit>
MIB statistics for specified <vlan id>
Information for specified <unit>:
0 is
BCM56504 ports 0-23, 48, 49
1 is
BCM56504 ports 24-47, 50, 51
-d <level> Set debug level to <level>
-t <tgid>
MAC layer statistics for all ports in trunk <tgid>.
-v
Vital Product Data (not currently supported in Ethernet
Switch Blade)
EXAMPLES
To display statistics for a particular port on the switch, such as port 0
ztats –m 0
SEE ALSO
zreg, zal
Ethernet Switch Blade User's Guide
release 3.2.2j
page 258
zsync
NAME
zsync − Saves changes to the flash.
SYNOPSIS
zsync [-c][-f][<dir_or_file>]
zsync [-c][-f][-t <file>]
zsync [-c][-f][-z]
zsync [-c][-l]
DESCRIPTION
zsync is used to save a snapshot of the current file system to flash ROM. By default, zsync
creates a compressed tar image of the files that have changed and saves the image in the flash
ROM. The saved image is expanded on reboot. The saved compressed tar image is called an
“overlay”.
If a directory parameter is given to zsync, the contents of the directory are saved instead of
searching for updated files. The specific purpose of the <directory> parameter is for saving
files that have been mounted with zmnt. Using the -t option allows a tar image created by
zmnt -t to be saved.
To correct a corrupted file that is saved to flash ROM with zsync, first reboot with the -i
option (see Switch Maintenance). Use zmnt to put the corrupted file in the /mnt directory,
open and correct the file, then zsync to the /mnt directory to save your changes and reboot.
There are two overlay areas: dynamic and custom. The dynamic overlay is where the switch's
current configuration is stored. It will boot with a simple reboot command. The custom
overlay is where the customization to the standard software is stored. It is used to create new,
customer-specific default configurations that differ from the generic configuration. The intent of
the custom overlay is to make it possible for customers to customize the switch for their end
users. To return to the custom overlay, (that is, default configuration) use the reboot -i
option. The custom overlay is written by using the -c option.
The -z option zeros the overlay area, returning the switch to the factory configuration.
Specific files or directories can be excluded from saving to flash by zsync by including an entry
in /etc/exclude. Likewise, existing entries in /etc/exclude such as /tmp can be removed
in order to save those files to flash with zsync.
OPTIONS
-c
Save files to the custom overlay
Ethernet Switch Blade User's Guide
release 3.2.2j
page 259
-t <file>
-z
Read files to be saved from a tar file.
Zero the overlay area.
-f
Do not confirm with user and do not warn if saving
failed. Exit code can be examined to determine success or
failure.
<dir_or_file>
Save only the named file, or save the named
directory to the overlay. Contents of directories must be
created with zmnt.
-l
List files that would be written.
Do not flash.
EXAMPLES
To zsync only the hosts file:
cd /etc
zsync hosts
If you previously created a snapshot of an overlay to a tar file using zmnt,
zmnt –t overlay.tar
You can use zsync to restore the overlay on the switch directly from the tar file,
zsync –t overlay.tar
The restored overlay will be loaded upon the next reboot.
FILES
/etc/exclude, /.zsync
SEE ALSO
zmnt
Ethernet Switch Blade User's Guide
release 3.2.2j
page 260
ztmd
NAME
ztmd – traffic management daemon which accepts messages from traffic filtering and
quality of service applications and sets up hardware.
SYNOPSIS
ztmd [-d <level>] [-p <port>] [-f] [-i <pid>]
[-o <pid>] [-a <addr>] [-l]
DESCRIPTION
ztmd listens for messages on a multicast port. These messages describe packet filters and
queuing disciplines that are to be installed in the switch hardware. ztmd interprets these
messages to set up the switch hardware.
OPTIONS
-d <level> Set the level of diagnostic information logged.
<level> may be 0-4; higher levels produce more output.
.p <port> Use <port> as the multicast listening port for
communication with ztmd. Default is 2345.
-f
Run ztmd in the foreground.
run as a daemon.
Without this option, it is
-i <pid>
Set the PID for this process (default is 1)
-o <pid>
Set expected client PID.
-a <addr>
Bind multicast socket to <addr>
-l
Log diagnostic output to /var/log/ztmd.log
EXAMPLES
Start the traffic management daemon, ztmd, then start zqosd to monitor tc output and
zfilterd to monitor iptables(8) output. All daemons are run as background processes
and log their messages to files in /var/log.
ztmd –l
zqosd –l
zfilterd -l
Ethernet Switch Blade User's Guide
release 3.2.2j
page 261
SEE ALSO
zqosd, iptables(8), tc(8), zfilterd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 262
brctl(8)
NAME
brctl - Bridge and Spanning Tree Protocol administration.
SYNOPSIS
brctl [options]
DESCRIPTION
brctl is used to set up, maintain, and display the bridge configuration in the Linux kernel.
brctl is a standard command included with Linux bridge support which includes Rapid
Spanning Tree Protocol (RSTP) support.
A bridge is a device commonly used to connect different networks together, so that these
networks will appear as one network to the participants.
Each of the networks being connected corresponds to one physical interface, or port in the bridge.
These individual networks are bundled into one bigger logical network. This bigger network
corresponds to the bridge network interface.
Multiple bridges can work together to create even larger networks using the IEEE 802.1d
Spanning Tree Protocol and 802.1w Rapid Spanning Tree Protocol. This protocol is used for
finding the shortest path between two networks as well as eliminating loops from the topology.
Bridges communicate with each other by sending and receiving Bridge Protocol Data Units
(BPDUs).
brctl(8) can be used for configuring certain spanning tree protocol parameters. For an
explanation of these parameters, see the IEEE 802.1d specification for detailed information.
OPTIONS
show shows all current bridges.
showbr <bridge>
shows information for the bridge and its attached ports. Check
the priority using this command.
showmacs <bridge>
shows a list of learned MAC addresses for the bridge.
setgcint <bridge> <time>
sets the garbage collection interval for the bridge to <time>
seconds. This means that the bridge will check the forwarding
database for timed out entries every <time> seconds.
stp <bridge> <state>
controls this bridge's participation in the Spanning Tree
Protocol. <state> can be “off” or “on”. When turned off, the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 263
bridge will not send or receive BPDUs, and will thus not
participate in the Spanning Tree Protocol. If your bridge
isn't the only bridge on the LAN, or if there are loops in the
LAN's topology, DO NOT turn this option off.
Turning this
option off may impair network traffic, so be careful.
setbridgeprio <bridge> <priority>
sets the bridge's priority to <priority>. The priority value
is an unsigned 16-bit quantity (a number between 0 and 65535),
and has no dimension. Lower priority values are better. The
bridge with the lowest priority will be elected Root Bridge.
setfd <bridge> <time>
sets the bridge's bridge forward delay to <time> seconds.
sethello <bridge> <time>
sets the bridge's bridge hello time to <time> seconds.
setmaxage <bridge> <time>
sets the bridge's maximum message age to <time> seconds.
setpathcost <bridge> <zre#> <cost>
sets the port cost of the port (zre#) to <cost>. This is a
dimensionless metric. The path cost is set to 100 for all
OpenArchitect switch ports by default. For the OpenArchitect
switch a port is zre1, zre2, …
IEEE 802.d recommends the
following:
Link
Speed
Recommended
Value
Recommended
Range
10 Mb/s
100
50-600
100 Mb/s
19
10-60
1 Gb/s
4
3-10
setportprio <bridge> <zre#> <priority>
sets the port's priority to <priority>. The priority value (a
number between 0 and 255), and has no dimension. This metric
is used in the designated port and root port selection
algorithms. For the OpenArchitect switch a port is zre1, zre2,
…
NOTES
brctl(8) replaces the older brcfg tool.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 264
SEE ALSO
zconfig, zl2d
Ethernet Switch Blade User's Guide
release 3.2.2j
page 265
Appendix B Base Switch Command Man Pages
OpenArchitect applications are implemented above the OpenArchitect libraries and the RMAPI
interface. OpenArchitect applications are used for normal operation of the switch, for runtime
status and diagnostics, and for prototyping new applications development.
For runtime operation, the OpenArchitect applications perform initialization and configuration,
and real-time control and maintenance of the switching tables in the switch silicon. Protocol
support is performed by the Linux operating system. In turn the OpenArchitect applications
communicate with Linux to determine the appropriate switch table setup.
The initialization of the switch is completed by the zconfig application. Through configuration
scripts, the user can setup any combination of Layer 2 and Layer 3 switching configurations with
VLAN support. Running the zconfig command causes network interfaces to be presented to
the Linux operating system. These interfaces can be setup for Layer 2 bridging functions such as
Spanning Tree Protocol, or Layer 3 routing through the Linux operating system.
zl2d is run as a daemon to monitor the Linux operating system bridging function and update the
switch silicon accordingly.
zl3d is run as a daemon to monitor the Linux operating system routing table information and
update the switch silicon switching tables accordingly.
For gathering statistics or prototyping applications, there are OpenArchitect applications that
allow any register or table in the switch to be read or written. These applications include zreg,
ztats, and zarl and all of the different table equivalents.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 266
vrrpconfig
NAME
vrrpconfig – Configure and control the running vrrpd
SYNOPSIS
vrrpconfig [-d <level>] -- <vrrpd parameters>
vrrpconfig [-d <level>] [-k] [-a] [-p] [-s <vid>]
DESCRIPTION
vrrpconfig provides communication with a running vrrpd daemon. The -- option for
vrrpconfig will pass all parameters to vrrpd as would be done when starting the vrrpd.
Any output generated by vrrpd is displayed on the vrrpconfig controlling tty. Any action
normally taken by vrrpd for the given parameter is done so by vrrpd. Reference vrrpd for
vrrpd parameters and their usage.
OPTIONS
vrrpconfig also has a set of local options that are not passed to vrrpd directly. Many do,
however, retrieve information from the running vrrpd. The local options are as follows:
-d <level> Set the debug level. The default debug level is 1.
The higher the level, the more debugging output is produced.
Debugging output is sent to the controlling tty. This debugging
output is from vrrpconfig. To set the debug level of vrrpd, one
would use the vrrpd debug level setting option placed after the
-- in the vrrpconfig command line.
-a
Display in a user readable format, information about the
current state of all the Virtual Routers controlled by vrrpd.
-k
Kill vrrpd. The entire daemon is killed.
command will require that vrrpd be restarted.
-p
Running this
Display relevant SNMP table values.
-s <vid>
Print a numeric representation of the state of the
Virtual Router associated with the Virtual Router Identifier
<vid>. The numeric representations are 1 = INIT, 2 = BACKUP, and
3 = MASTER
Ethernet Switch Blade User's Guide
release 3.2.2j
page 267
EXAMPLES
Here is an example of using the -- invocation method that changes the priority to 99 for the
Virtual Router associated with the Virtual Router Identifier 1:
vrrpconfig -- -v 1 –p 99
SEE ALSO
vrrpd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 268
vrrpd
NAME
vrrpd – Virtual Router Redundancy Protocol Daemon
SYNOPSIS
vrrpd -i ifname -v vrid [-f piddir] [-s] [-a auth] [-p prio]
[-nhb] [-I ifname] [-d delay] [-m address] [-M ] [-B]
[-S script] [-c conf_file] [-D level] ipaddr
DESCRIPTION
vrrpd is an implementation of Virtual Redundant Routing Protocol (VRRPv2) as specified in
RFC2338. It runs in Linux user space. In short, VRRP is a protocol that elects a Master server on
a LAN to which the Master answers to a virtual IP address. If it fails, a Backup server takes over
the IP address.
VRRP specifies an election protocol that dynamically assigns responsibility for a virtual router to
one of the VRRP routers on a LAN. The VRRP router controlling the IP address(es) associated
with a virtual router is called the Master, and forwards packets sent to these IP addresses. The
election process provides dynamic failover in the forwarding responsibility should the Master
become unavailable. This allows any of the virtual router IP addresses on the LAN to be used as
the default first hop router by end-hosts. The advantage gained from using VRRP is a higher
availability default path without requiring configuration of dynamic routing or router discovery
protocols on every end-host.
OPTIONS
The following options are supported by vrrpd:
-h
display the usage line
-n
Don’t use the virtual MAC address
-b
Run vrrpd in foreground
-i <ifname>
the interface name on which to run the Virtual
Router. Machines connected with the named interface will see the
Virtual Router Address move with the Master switch
-I <ifname>
the interface name on which to communicate with
other VRRP routers for management of the Virtual Router (default
is the -i interface)
-v <vrid> the id of the Virtual Router Identifier [1-255].
This value must be a unique value, one per Virtual Router. In
Ethernet Switch Blade User's Guide
release 3.2.2j
page 269
other words there is a unique vrid to ifname associated with the
–i option.
-s
Toggle preemption mode (Enabled by default). Preemption
means that a Master switch will go to Backup if a current Backup
has higher priority.
-M
Become MASTER when priority is equal. Be sure it is only
set on one host or the switches will oscillate. Must set –B
option on other hosts (requires preemption mode ! -s)
-B
Become BACKUP when priority is equal.
-S <script>
See -M option
script to be called when state change occurs.
-a <auth> (not yet implemented) set the authentication type
auth=(none|pass/hexkey|ah/hexkey) hexkey=0x[0-9a-fA-F]+
-p <prio> Set the priority of this host in the virtual server
(default is 100)
-f <piddir>
specify the directory where the pid file is
stored (default is /var/run)
-c <conf> Configuration read from conf file (required when
managing multiple Virtual Routers). The contents of the conf
file are lines of command line options. Each line represents a
Virtual Router. Parameters given on the command line apply to
all Virtual Routers defined by the conf file. So for example, if
the command line reads:
vrrpd –d 50 –c vrrpd.conf
And the vrrpd.conf file contains:
-v 1 –i zhp0 –I zhp3 10.0.0.42
-v 2 –i zhp1 –I zhp3 11.0.0.42
vrrpd would be started controlling two Virtual Routers; one for
10.0.0.42 and the other for 11.0.0.42. They would both get a –d
50 option.
-d <delay> Set the advertisement interval. Default is 1 second.
By default time is specified in seconds. If the delay value ends
in a lower case ‘m’ the time is specified in milliseconds. The
millisecond specification results in a proprietary use of the
VRRP Adver Int field.
-m <address>
Change the virtual MAC address from
00:00:5E:00:01:<vid> to the provided addr. The addr should be
input as 6 two digit hex numbers that are colon delimited with no
Ethernet Switch Blade User's Guide
release 3.2.2j
page 270
spaces. The –n option overrides the change made with –m. The
result of which to use the native MAC address of the interface.
Using the –n option is not recommended.
-D <level> Set debugging output to the supplied level
<ipaddr>
the ip address(es) of the virtual server
SEE ALSO
vrrpconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 271
zbootcfg
NAME
zbootcfg − Modifies the boot parameters of the OpenArchitect switch.
SYNOPSIS
zbootcfg -a | -d <device number> [<boot_string>]
DESCRIPTION
zbootcfg is used to display or modify the boot parameters on the switch. The boot parameters
are utilized by the minof boot loader application to indicate on which device to find a boot image.
Care should be taken when changing the boot string. Incorrect procedures can result in a switch
that cannot boot.
-d <device_number>
Three ROM boot devices are available in the switch. The factoryshipped boot device is 1. The following describes each boot
device:
-d 1 Boot image located at offset 0 in the application flash 1.
This is the factory-shipped location of the primary OpenArchitect
image.
-d 2 Loads an image located at offset 0 in the application flash
2. This is the factory-shipped location for the alternate
OpenArchitect image.
Any characters after the -d <dev> parameters are saved in flash memory and passed unchanged
to the booting kernel.
OPTIONS
-a
Displays the current boot string. The default factory
shipping string is “dev1.”
-d <dev>
Specifies the ROM device from which to boot. The <dev>
value must be the number 1 or 2 corresponding to application
flash 1 or application flash 2, respectively.
<boot string>
Optionally, a boot string may be provided that is
passed to the booting kernel. All characters after the -d <dev>
are passed unchanged to the booting kernel.
EXAMPLES
The following example illustrates a command for making the image boot from the second
Ethernet Switch Blade User's Guide
release 3.2.2j
page 272
application flash. Typically this is required before updating application flash 1. By booting the
alternative image, if a failure occurs during the programming of application flash 1, recovery is
easier.
zbootcfg -d 2
The next example passes the -i option to the booting kernel. This is useful when recovering
from a mistake saved to the read-write file system or after updating the application flash 1 and
doing the first boot. The -i option prevents the read-write file system from overwriting the initial
RAM disk image.
zbootcfg -d 1 –i
SEE ALSO
zflash, reboot(8)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 273
zconfig
NAME
zconfig - Configures the OpenArchitect switch.
SYNOPSIS
zconfig [-h <host_name>] [-d <level>] [-a] [-t] [{-f <file>} |
<configuration>]
DESCRIPTION
zconfig creates VLAN groups of switch ports or trunks. Each VLAN group forms a Layer 2
switching domain. Each VLAN group has a VLAN Identification number (VID) that can be
carried in a tag field, located in the header of packets traveling on that VLAN. The configuration
of a port determines whether a packet transmitted from that port includes the VLAN tag. A set of
up to eight ports may be configured as a trunk, with all links from these ports connect to the same
link partner. For each VLAN group created by zconfig, a network interface is also created.
After the network interface is started by ifconfig(1M), the VLAN group performs Layer 2
switching. The network interface can be used for Layer 3 routing between VLAN groups.
A network interface uses the following format:
zhpN (for example, zhp0)
N-is an integer between 0 and 9999. The value of N is not required to be the same as any of the
port(s) that are its members. The range 0-4999 is reserved for network interfaces created by users.
The range 5000-9999 is reserved for network interfaces created by switch applications.
A trunk uses the format zrlK, where K is an integer between 0 and 31, though only 24 ports
actually may be used on the base board.
OPTIONS
-h <hostname>
Specifies the hostname to configure.
zconfig configures the local OpenArchitect switch.
By default,
-d <level> Sets the level of debugging output produced by
zconfig. The default level is 1. Setting the debug level higher
produces more output. The maximum output level is currently
four (4).
-a
Displays the current configuration of the switch.
-t
Tears down the entire switch configuration.
{-f <file>} | <configuration>
Gets configuration information from the specified file. A <file>
name of ‘+’ reads configuration data from standard input. If the
-f flag is not used, a single line of configuration data can be
Ethernet Switch Blade User's Guide
release 3.2.2j
page 274
entered as parameters to zconfig.
CONFIGURATION SYNTAX
zconfig takes configuration data from standard input or from a file with the -f option. In
either case, the configuration syntax is the same. The zconfig configuration data consists of a
list of semicolon-delimited statements. Each statement specifies an action to take globally or on
an interface. An interface is one of three types: a network interface called ZNYX host port (zhp);
a switch port interface called ZNYX Raw Ethernet (zre); or a trunk interface called ZNYX
RainLink (zrl).
Comments, spaces and new lines are ignored. Comments begin with the #
character and include characters through the next new line.
Global Statements
Global statements can be used to set modes of operation on a switch-wide basis.
The only supported global statement is to set and teardown Double VLAN tag
mode.
Global Statement Syntax:
Double VLAN tag mode is set and removed on a global basis with the following syntax.
dvlan 0x8100 | 0x9100;
(or other unused ethertype)
dvlan teardown;
The first option sets double vlan tag mode on all ports and establishes the outer
tag id. The second tears down double vlan tag mode.
Trunk Interface Statements
A trunk interface statement begins with the trunk name followed by an equals
sign and an action. Trunk interface statements are used to create or tear down
trunks or define the rules to determine which member of the trunk should be used
to transmit a packet
Trunk interface syntax:
zrl0 = <Trunk Interface Action>;
Trunk interface actions:
List of ports
Creates a trunk interface with the specified port
members. All of the ports specified must not be a part of any
other trunk, or be individually included in any network
interface. Up to eight ports can be included in a trunk.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 275
A port member is identified with the zre<X> format, where x
represents a port number between 0 and 23 for the in-band ports.
The Out-of-Band ports cannot be included in the List of ports.
teardown
Removes the trunk interface, making the ports which
were part of the trunk available for configuration in other
trunks or VLANs.
all
mac [ source_address | destination_address ]
ip [ source_address | destination_address ]
port [ source_port | destination_port ]
Further specifies the rules for selecting which port in the
trunk a packet should be transmitted out of. A comma delimited
list is valid to specify more than one criterion. Specifying a
particular option only uses that layer’s source and destination
information. The default is all, which combines all criteria for
determining the transmit port of the trunk. Specifying both
source and destination for a given layer is the same as
specifying that layer itself, that is,
zrl0=ip source_address, ip destination_address
is the same as,
zrl0=ip
NOTE: The base switch supports destination_address and/or source_address for
MAC and IP. It cannot combine MAC and IP settings, nor does it support port settings.
Examples of trunk interface statements:
This statement creates a trunk containing three ports:
zrl5 = zre11,zre15,zre17;
The following statement specifies that packets will be sent out over this trunk
using the exclusive OR of the last three bits of their MAC source and destination
addresses to select the port:
zrl5 = mac source_address, mac destination_address;
The teardown statement uses a colon instead of an equals sign:
zrl5: teardown
Ethernet Switch Blade User's Guide
release 3.2.2j
page 276
Network Interface Statements
The syntax for a network interface statement is the interface name followed by a colon and an
action. Network interface statements are used to create or tear down a VLAN group and can
consist of one or a list of network interface names; followed by a colon and then an action. For
example:
zhp0: <Network Interface Action>;
Network interface actions may include:
vlan<N> = list of ports or trunks
Creates a network interface and a VLAN group with a VLAN
identification number (VID) consisting of specified port members.
<N>
is an integer between 1-4095.
list of ports or trunks
A port member is identified with the zre<X> format, where x
represents a port number between 0-23 for the in-band ports. A
trunk is identified with the zrl<Y> format, where Y is a number
between 0-31.
If the network interface and VLAN group already exist, the
specified ports or trunks are added to the network interface and
VLAN group.
teardown
group.
Deletes the network interface and the associated VLAN
zre_list = multicast <mac_address>
Register the multicast <mac_address> on the zre_list ports
associated with the given VLAN
multicast_clear Clear all registered multicast address on all the
ports in the VLAN
<list of ports or trunks> teardown
Deletes the specified ports or trunks from the network interface
and the VLAN group associated with it. If there are no remaining
port or trunk members, then also deletes the network interface
and VLAN group.
Examples of Network Interface Statements:
The statement below creates a VLAN group with the VID number 1 and the network interface
named zhp5. This VLAN includes a single switch port, zre1.
zhp5: vlan1=zre1;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 277
The next statement creates a VLAN group with the VID number 100 and the
network interface named zhp1. This VLAN includes four switch ports, zre1,
zre10, zre11, zre13.
zhp0: vlan100 = zre1,zre10,zre11,zre13;
The next statement adds two switch ports, zre1, zre2 and zre3, to an existing network interface
and VLAN.
zhp0: vlan100 = zre1..3;
The next statement deletes two switch ports, zre1 and zre2, to an existing network interface and
VLAN.
zhp0: zre1..2 teardown;
The final example is a teardown action that deletes the VLAN group defined in the previous
example, including the network interface.
zhp1: teardown;
Port Interface Statements
Port interface statements specify a port or trunk name or a list of such names; followed by an
equal sign (=) and then the action. Port interface actions may include:
SYNTAX
zconfig <zre_list>=untag<n>
untag<N>
Packets sent from this port or trunk for VLAN <N> are
transmitted without a VLAN tag. The port or trunk specified must
have previously been included in the VLAN group with VID<N>.
zconfig <zre_list> multicast=<forward_type>
<forward_type> Set the ports specified to act as defined by the
forward_type for multicast traffic. Possible <forward_types>
are:
forward_unregistered (default)
forward_all
filter_unregistered
Examples of Port Interface Statements:
Assuming that zre1 has been assigned to VLAN 1, to specify that packets sent from port 1 for
VLAN 1 are transmitted without a VLAN tag, and packets arriving on this port without a VLAN
Ethernet Switch Blade User's Guide
release 3.2.2j
page 278
tag are given the VLAN tag with the VID number 1, enter:
zre1=untag1;
If port 0 is also a member of VLAN 100, packets for VLAN 100 are sent from this port with a
VLAN tag as part of their header.
In the next example, the switch ports 10, 11, and trunk 2 are configured as untagged members of
VLAN 100.
zre10,zre11,zrl2=untag100;
This statement is equivalent to the following three lines:
zre10=untag100;
zre11=untag100;
zrl2=untag100;
In the examples above, since port interfaces can only be untagged for one VLAN group, zre1
cannot also be untagged for VLAN 100. A port or trunk can be a member of multiple VLANs but
can only be designated untagged on one VLAN.
WILDCARDS
Wild card characters can be included to simplify the process of creating larger, more complex
configurations. Wild card characters for zconfig include:
,
(comma)
Use for creating lists
..
(dot-dot)
Specifies an inclusive range
+
(plus)
Specifies auto-incrementing
Below are some examples for the correct usage of the comma (,) and dot-dot (..). Each line below
produces the same results:
zhp0: vlan1 = zre1, zre2, zre3, zre4;
zhp0: vlan1 = zre1..4;
zhp0: vlan1 = zre1, zre2..4;
zhp0: vlan1 = zre1..2, zre3..4;
The following examples create multiple VLAN groups using a single statement. A list of network
interface names are followed by a colon (:) and a list of VLAN actions which are followed by an
equal sign (=) and a port list. Each VLAN group is created in turn, along with the corresponding
Ethernet Switch Blade User's Guide
release 3.2.2j
page 279
network interface, and all ports listed after the equal sign are included in each group.
The following statement creates 14 VLAN groups with VID numbers 1-14. Each VLAN contains
the same switch port, port 1, represented as zre1.
zhp0..13: vlan1..14 = zre1;
The plus (+) wildcard can be used with the last port listed to auto-increment that port number
before each VLAN group is created. The following network interface statement creates 14 VLAN
groups, with the first group containing port 1, the second group port 2, and so on. The second
statement configures all ports as untagged in their respective VLANs.
zhp0..13: vlan1..14=zre1+;
zre1..13=untag1+;
This is equivalent to:
zhp0: vlan1=zre1;
zhp1: vlan2=zre2;
zhp2: vlan3=zre3;
.
.
zhp13: vlan14=zre14;
zre1=untag1;
zre2=untag2;
zre3=untag3;
.
.
zre14=untag14;
The previous configuration can be used for creating a 14 port Layer 3 switch, with each port
assigned to its own VLAN.
In the next example, one VLAN group, with VID number 1, is created that contains 14 ports. The
second statement designates the 14 ports as untagged for the VLAN 1 group.
zhp0: vlan1 = zre1..13;
zre1..13 = untag1;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 280
The previous configuration can be used for creating a 14 port Layer 2 switch, all 14 ports
assigned to the same VLAN.
SEE ALSO
zl3d
Ethernet Switch Blade User's Guide
release 3.2.2j
page 281
zcos
NAME
zcos - class of service queue control
SYNOPSIS
zcos
[-h <hostname>] [-d <level>]
[ -u <default priority> ]
[ -m q0,q1,q2,q3,q4,q5,q6,q7 ]
[-n <queue length list in packets for each queue> |
-b <Reserved space in bytes for each queue> |
-s <limit on dynamic pool usage, in bytes>, <reset %>]
[ -k PRI | RR | WRR | DRR]
[ -w <queue weight list> ]
[ -g <max>,<burst> ]
[ -r <guaranteed bandwidth in Kbps for each queue> [ -t <burst
size list in Kbytes> ]
[ -l <maximum bandwidth in Kbps for each queue>
size list in Kbytes> ]
[ -t <burst
[ -q all | qmap | qinfo | scheduler ]
[<port list>]
DESCRIPTION
zcos provides a means to set many of the hardware features of the switch related to class of
service and differentiated services processing, including scheduling and bandwidth management.
The current settings can also be examined.
The OpenArchitect switch supports up to eight class of service queues for packets to be sent out
each of the Ethernet ports or forwarded to the CPU. Normally, packets are placed in these queues
based on their 802.1p priority for tagged packets or the default priority for the port on which they
arrive. The queue destination for each priority is determined by a map. A separate map is used
for each ingress port. For additional means of setting the cos queue for a packet, see the sections
on filtering and traffic control.
Packets are selected from the cos queues at a port based on a scheduler, which may be configured
in a variety of modes. The scheduler can provide minimum bandwidth guarantees and limit the
bandwidth used for packets from each cos queue. The total egress bandwidth for a port can also
be limited.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 282
Each cos queue is limited in the number of packets it can hold waiting scheduling; the memory
used by each queue is managed to provide a guaranteed space with additional space shared
among all queues for a port.
OPTIONS
Most options are optionally followed by a <port list>, which may include zre port ranges, like
zre0..5, individual ports, such as zre51, or cpu, to indicate the queues and scheduling for
packets to be transferred to the CPU. The priority and queue mapping options do not apply to the
CPU, these settings are provided by the host.
General Options
-h <hostname>
Specifies the hostname of the OpenArchitect switch to be configured. Ignore this option if
you are configuring on the OpenArchitect switch on which the zcos command is being run.
-d <level>
Sets the debug level. The default is 1. The maximum is 4.
Priority and Queue Mapping
-u <default priority> [<port-list>]
Packets which arrive without a tag have no 802.1p priority. This option assigns a default
priority for untagged packets arriving on each port in the <port-list>. The default priority
ranges from 0 (lowest) to 7 (highest).
-m q0,..,q7 [<port-list>]
Specifies the priority to COS queue map. The first parameter maps priority 0 to queue q0,
second maps priority 1 to queue q1, etc. (the queues are numbered 0 to 7). The <port-list>
identifies which ingress ports will use this map. If no <port-list> is given, the same map will
be used for packets arriving on any of the input ports.
Queue Limits
(These limits should only be changed when the ports are idle)
-b <Reserved space in bytes for each queue> [<port-list>]
Specifies the dedicated memory for each cos queue of the ports listed.
-n <queue length list in packets> [<port list>]
Sets the number of packets allowed on each cos queue of the ports listed. The total number of
packets for all 8 cos queues is limited to 2048.
-s <limit on dynamic pool usage, in bytes>, <reset %> [<port
list>]
Sets the limit on dynamic memory pool usage by all cos queues for each port listed.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 283
Packets are first counted against the reserved space for a queue. When that space is occupied,
additional memory is used from the dynamic memory pool until the dynamic pool usage limit for
the port is reached. Any additional packets received for the queue on this port are dropped.
Metering and Scheduling
-r <list of bandwidth guarantees in Kbps for each cos queue> [
-t <list of burst sizes in Kbytes>]
[<port-list>]
Sets up minimum rate meters for each cos queue. All queues which have not exceeded their
minimum transmission rate are scheduled before the other queues.
-l <list of bandwidth limits in Kbps for each queue> [ -t
<list of burst sizes in Kbytes>] [<port-list>]
Sets up maximum rate meters for each cos queue. Queues which have exceeded their
maximum transmission rate will not be scheduled.
-k PRIO | RR | WRR | DRR [<port list>]
Selects the scheduler mode for the ports listed:
PRIO – strict priority, cos queue 7 is highest priority, queue 0 is lowest
RR – Round robin, a single packet is scheduled from each backlogged COS queue.
WRR – Weighted round robin, a configurable number of packets are scheduled from each queue
before moving on to the next.
DRR – Deficit Round Robin, packets are scheduled from a backlogged queue until the configured
number of bytes for that queue have been sent.
-w <queue weight list> [<port list>]
Provides the weights for WRR and DRR scheduling. For WRR, the weights are the number of
packets, scaled such that all weights are between 1 and 15. For DRR, the weights are the
number of bytes, with a range of 10KB to 160 MB of data.
-g <max Kbps>,<burst size in KBytes> [<port list>]
Sets a maximum bandwidth meter for all packets transmitted from a port.
The guaranteed and maximum rate meters influence the four scheduling modes. First, those
queues which have not met their guaranteed rate and have packets to send are serviced according
to the scheduling mode. Then those queues which have not met their maximum rate and have
packets to send are serviced. If all queues have met their maximum rate, or the maximum
bandwidth for the port has been reached, no packets are sent. Each of the meters is implemented
as a separate leaky bucket.
Queries of the Current Settings
Ethernet Switch Blade User's Guide
release 3.2.2j
page 284
-q all | qmap | qinfo | scheduler [<port list>]
Queries the current COS/QOS Settings.
all - Displays all of the queue mappings, queue limits, metering and scheduling settings
qmap - Displays the priority to COS queue mappings.
qinfo - Displays queue limits for the COS queues.
scheduler - Displays the traffic metering and shaping settings and the scheduler mode.
EXAMPLES
1. To set Ethernet ports zre0 to zre19 to allow up to 50 packets in priority queues 0-3 and up
to 75 packets in queues 4-7:
zcos –n 50,50,50,50,75,75,75,75 zre0..19
2. To map packet priorities 1-1 to COS queues for packets received on all ports:
zcos –m 0,1,2,3,4,5,6,7
3. To set up weighted round robin scheduling on ports zre10 to zre14 and the CPU with a
weight of 2 for queue 0, 3 for queue 1, and 1 for all other queues:
zcos –k WRR –w 1,3,1,1,1,1,1,1 zre10..14,cpu
4. To limit the rate of packets sent to the CPU to 15 Megabits/sec., with bursts of no more than
20,000 bytes:
zcos –g 15000,20 cpu
5. To guarantee CPU cos queue 5 500 kbps, queue 6 200 kbps, and queue 7 1 mbps, and all other
queues no guarantee:
zcos –r 0,0,0,0,0,500,200,1000 cpu
6. To limit CPU cos queues 0 – 4 to 1000 kbps, with a burst of 20Kbytes:
zcos –l 1000,1000,1000,1000,1000 –t 20,20,20,20,20 cpu
SEE ALSO
zfilterd, zqosd, ztmd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 285
zdog
NAME
zdog - Configure and send heartbeats to watchdog enabled drivers.
SYNOPSIS
zdog [-d <level>] -h | -i <interval> | -n <heartbeats>
zdog [-d <level>] -b
zdog [-d <level>] -a
DESCRIPTION
zdog is used to configure the base switch watchdog timer functions and to send heartbeats to the
base switch watchdog drivers. There are two components to the base switch watchdog timer: A
hardware component and a software component. The two components are independent from each
other in implementation, but work together to provide safety against zombie hardware and
software. The hardware component requires attention on a predefined 1.5 second interval. The
driver acts on this at interrupt level to ensure that spurious reboots do not occur. The software
component allows for a user programmable interval on which lack of application to driver
communication will cause a reboot. Both components can be turned on with zdog.
The options -i and -n are used to configure the expected interval of heartbeats and the number
of missed heartbeats of the software component before the base switch should be rebooted. If
either the interval or number of heartbeats is 0, the software component is off. The -h option is
used to toggle on and off the hardware component of the watchdog timer. The hardware
component is off by default.
Once the software component of the watchdog timer is turned on, a heartbeat must be sent with
the -b option within that interval or the system will reboot. For example after issuing the
following command:
zdog -i 5000 -n 3
A heartbeat must be sent at least every 3*5000=15000 milliseconds (every 15 seconds). This can
be accomplished with something as simple as a polling script with a sleep, or started with a
higher level function like monit. The driver checks for heartbeat timeout approximately 3 times
per second. So (<heartbeat intervals>*<number of heartbeats>) faster then
330 milliseconds will have diminishing returns.
Combining monit and zdog allows multiple levels of insuring system integrity. The hardware
component of zdog insures that the CPU is functioning well enough to execute something. The
Ethernet Switch Blade User's Guide
release 3.2.2j
page 286
software component of zdog when launched from monit insures that monit is running to
perform higher level tasks. And finally monit can be used to monitor any or all critical system
resources and processes in the system.
OPTIONS
-d
set debug level to <level>
-h
Toggle use of the hardware watchdog timer.
Off by default.
-i
Time interval in milliseconds between zdog to driver
heartbeats
-n
Number of missed heartbeats before system reboot
-b
Send a single heartbeat to the driver
-a
Display current configuration
Ethernet Switch Blade User's Guide
release 3.2.2j
page 287
zffpcounter
NAME
zffpcounter—Query or clear one or more Fast Filter Processor (FFP) counters.
SYNOPSIS
zffpcounter -P <zre_port> [-p <ppa>] [-i <index>]
<hostname>] [-c] [-d <level>]
[-h
DESCRIPTION
The switch enforces filtering rules through the FFP. Each filtering rule may specify an FFP
counter, to be incremented for every packet that matches that rule.
The zffpcounter command is used: (a) for querying the current value(s) of one or more
counters; or (b) for resetting one or more counters to zero. If a rule includes metering if the rule is
matched,
only packets which are in-profile increment the counter. Out of profile packets are counted in the
FFP out of profile counters, which are displayed using zffppacketcounter, which has
exactly the same format and syntax as zffpcounter.
The iptables or zirule utilities may be queried to see which rules, if any, are using FFP
counters.
OPTIONS
-h <hostname>
Specifies the hostname to query/clear. By
default, zffpcounter uses the local OpenArchitect switch.
-p <ppa>
Specifies the Physical Point of Attachment (PPA) on
which to query/clear. By default, zffpcounter uses the value
0.
-c
Specifies that zffpcounter should clear the specified
counter(s), rather than querying.
-d
Sets the level of debugging output for zffpcounter. The
default level is one (1), which reports all errors. Setting
the debug level higher produces more output. Three (3) is
currently the maximum output level.
-i <index> An index whose counters should be queried/cleared.
Individual values (such as 5) are acceptable. If no index is
given, zffpcounter uses all counters.
-P <zre_port>
(required) Print table entry that contains
<zre_port> (zre<n>).
Ethernet Switch Blade User's Guide
release 3.2.2j
page 288
EXAMPLES
The first example queries all FFP counter values.
zffpcounter
The output displays the initial state of the counters. Note that the counters are not initialized on
startup,
Counter 0: 59602801
Counter 1: 83360091
Counter 2: 83361262
.
.
.
Counter 29: 83074779
Counter 30: 81723249
Counter 31: 71007391
The next example clears all FFP counter values.
zffpcounter -P –c
Now using zffpcounter to display,
zffpcounter
Counter 0: 0
Counter 1: 0
Counter 2: 0
.
.
.
Counter 29: 0
Ethernet Switch Blade User's Guide
release 3.2.2j
page 289
Counter 30: 0
Counter 31: 0
iptables(8) is used to setup a rule, and associate that rule with a counter. For instance, add a
rule to accept all packets from 10.0.0.11 and associate that rule with FFP Counter 1.
iptables –A FORWARD –s 10.0.0.11 –j ZACTION -–accept
–-counter 1
Start zfilterd to move the rule entered with iptables(8) down into the switching silicon.
zfilterd
Counter 1 will now increment as traffic is sent to the switch from host 10.0.0.11.
zffpcounter
Counter 0: 0
Counter 1: 98
Counter 2: 0
.
.
.
The next example queries ports 2-7, 15, and 19-21.
zffpcounter -P 1..4, 15, 19..21
Counter 1: 98
Counter 2:
0
Counter 3:
0
Counter 4:
0
Counter 15: 0
Ethernet Switch Blade User's Guide
release 3.2.2j
page 290
Counter 19: 0
Counter 20: 0
Counter 21: 0
SEE ALSO
zirule, iptables(8)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 291
zfilterd
NAME
zfilterd - A daemon to use the filter hardware of the OpenArchitect switch for filtering based
on iptables(8) rules.
SYNOPSIS
zfilterd [-d <level>] [-p <port>] [-f] [-l] [-i <pid>] [-o <pid>]
DESCRIPTION
zfilterd is a daemon that intercepts filtering rules entered by the user, using iptables(8),
checks them for validity and then prepares messages for the traffic management daemon ztmd,
which is responsible for setting up the switch hardware for the filtering rules and actions.
OPTIONS
-d <level> Sets the level of debugging output required by
zconfig. The default level is one (1). Setting the debug level
higher produces more output. Four (4) is currently the maximum
output level.
-p <port>
Set the multicast port to which messages will be sent.
-f
Run zfilterd in the foreground, by default, it runs in the
background.
-l
Log all diagnostic output to /var/log/zfilterd.log.
-I <pid>
Set our pid used in identifying ourselves to ztmd
-o <pid>
with.
set the pid of the ztmd process we will communicate
SEE ALSO
ztmd, zrule, iptables(8)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 292
zflash
NAME
zflash − Loads images into the flash ROMs on the OpenArchitect switch.
SYNOPSIS
zflash -d <dev> [-o|-O <offset>] <image_file>
<upgradeipmi.img>
DESCRIPTION
zflash enables you to program the flash ROMs on the switch. The switch contains 3 flash
ROM devices: the boot ROM flash, application flash 1 and application flash 2. Care should be
taken when flashing new images into the switch. Incorrect procedures can result in a switch that
cannot boot; especially when flashing the boot ROM referred to as device 0. See Switch
Maintenance for updating the flash ROM images.
OPTIONS
-d <dev>
Specifies the ROM device being programmed. Dev must be
a number 0, 1, or 2 corresponding to the boot ROM, application
flash 1, or application flash 2 respectively.
-i <upgradeipmi.img> will load the file <upgradeipmi.img> into
the IPMI controller flash memory. This updates the program
version, it does not affect the FRU data.
Progress indicators will be printed during the update. It may
take four minutes to flash.
Once the update is complete, the IPMI controller is rebooted,
which may cause the shelf manager to temporarily disable
fabric ports until the reboot is complete.
EXAMPLES
The following example loads a new initial RAM image into application flash 1 at the default
offset of 0:
zflash -d 1 rdr6000.zImage.initrd
The following example loads a new boot image into the boot ROM at the default offset of 0:
zflash -d 0 zx6000.img
Ethernet Switch Blade User's Guide
release 3.2.2j
page 293
Exercise caution when using this command, as an error can render your switch
inoperable. Do not interrupt this process until complete.
SEE ALSO
zbootcfg
Ethernet Switch Blade User's Guide
release 3.2.2j
page 294
zgmrpd
NAME
zgmrpd - GARP Multicast Registration Protocol (GMRP) daemon for the OpenArchitect switch.
(Partially supported in this release.)
SYNOPSIS
zgmrpd [-d <level>] [-f] [-h <hostname>] [-p <ppa>] [-t <target>]
DESCRIPTION
zgmrpd is run after the network interfaces are created and initialized with zconfig, and started
with ifconfig(1M).zgmrpd starts a background task that implements the GARP Multicast
Registration Protocol (GMRP) protocol for a specified interface, either a zhp or bzhp. GMRP
provides a Layer-2 mechanism for determining which ports in a VLAN are listening to which
multicast addresses. zgmrpd updates the multicast table in silicon with this information, so that
the multicast traffic is forwarded only to the ports in a VLAN to which listening hosts are
attached. This optimizes packet flow through the switch. The background task started by
zgmrpd continues throughout the life of the Layer 2 network. GMRP is specified in ANSI/IEEE
Std 802.1D, 1998 Edition.
zgmrpd manages the switch multicast table (MARL). Based on GMRP packets received on the
target interface, zgmrpd creates, updates, or deletes an entry in the MARL. The key to each
MARL entry is a source Ethernet multicast address combined with a VLAN ID. Two port
bitmaps are maintained: one that identifies the untagged members of the VLAN, and one which
identifies which ports of the VLAN have listening hosts attached.
NOTE: OpenArchitect does not tag BPDU packets. This means that if a port belongs to
multiple VLANs, exactly which Multicast MAC/VLAN entry to create, modify, or delete
cannot be determined. GMRP should not be used when ports belong to multiple VLANs. If
the target is a zhp, it’s recommended that the zhp contain all the ports on the switch.
When the maximum number of entries in the MARL is reached, zgmrpd deletes a random entry
prior to adding the next entry.
By default, while GMRP is running, unregistered multicast traffic is filtered on the target
interface. The GMRP protocol provides a mechanism for changing the filtering behavior on a perport basis. Filtering behavior can be set to filter unregistered multicast traffic, forward
unregistered multicast traffic, or forward all traffic. zconfig can be used to set the filtering
mode of each port. See zconfig . Ports connected to routers should be set to “Forward All”
filtering mode manually with zconfig after zgmrpd is up and running. When zgmrpd is
terminated, filtering behavior is set to forward unregistered multicast traffic on each port in the
target interface.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 295
Only the GARP normal registration mode is currently supported.
Multiple instances of zgmrpd may run concurrently provided the targets are unique. However,
zgmrpd cannot run concurrently with zsnoopd. See zsnoopd.
OPTIONS
-d <level>
Sets the level of debugging output required by zgmrpd. The
default level is zero (0). Setting the debug level higher
produces more output. Five (5) is currently the maximum output
level.
-f
Run zgmrpd in foreground. Default is to run it in
background.
-h <hostname>
Connect to remote host <hostname>.
-p <ppa>
Start zgmrpd on switch <ppa>. Default is 0.
-t <target>
Enable GMRP on the set of ports specified by the target, either a
zhp or bzhp interface. There is no default. A target must be
specified.
EXAMPLES
In the following example, zgmrpd starts a background task that enables the GMRP protocol for
the ports in the zhp0 interface. zgmrpd receives and sends GMRP packets, and updates the
Multicast Table (MARL) accordingly. This background task continues throughout the life of the
Layer 2 network, or until manually terminated.
zgmrpd –t zhp0
Once you run zgmrpd, use zmarl to display the contents of the MARL. zgmrpd deletes all
entries in the MARL when starting up, and when shutting down. Manual changes to the MARL
can be made while zgmrpd is running. However, all such changes will be deleted when zgmrpd
is terminated.
SEE ALSO
zconfig, zsnoopd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 296
zgr
NAME
zl2, zl2mc, zl3host, zl3net, zvlan – Formatted display of OpenArchitect generic
tables.
zl2 displays the abstraction API’s layer 2 table.
zl2mc displays the abstraction API’s layer 2 multicast table.
zl3host displays the abstraction API’s layer 3 host route table.
zl3net displays the abstraction API’s layer 3 network route table.
zvlan displays the abstraction API’s VLAN table.
SYNOPSIS
zl2 [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-P port] [-h <host_name>] [-d <level>]
zl2mc [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-P port] [-h <host_name>] [-d <level>]
zl3host [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-P port] [-h <host_name>] [-d <level>]
zl3net [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-P port] [-h <host_name>] [-d <level>]
zvlan [-i <index>] [-m <mac_address>] [-a]
[-v <vlan_id>] [-h <host_name>] [-d <level>]
DESCRIPTION
The generic table display functions produce formatted output of the abstraction API’s tables for
display on the user console. The format of the output is table-dependent. Port mapping affects the
ports referenced in the generic tables. (Ports listed in order from 1 to 15)
Headers describing the column being displayed are printed after every 22 lines of output, which
makes it easy to pipe through more(1). The abstraction layer tables grow and shrink as entries are
added and deleted.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 297
Several options are available which enable the user to display only selected entries. Additionally,
there is an option that clears user-specified entries in the table.
OPTIONS
-i <index> Displays the entry at the <index> position in the
table. Valid for all tables. Cannot be combined with -m, -P or -v
.
-m <mac_address>
Displays entries whose MAC address field matches <mac_address>.
Only valid for tables that have a MAC address field. Cannot be
combined with -i, -P, or -v.
-a
Displays the entire table.
-v <vlan_id>
Displays entries whose VLAN ID field matches
<vlan_id>. Only valid for tables that have a VLAN ID field.
Cannot be combined with -i, -m, or -P.
-P <port> Displays the entries whose port field matches <port>.
Only valid for tables that have a PORT ID field. Cannot be
combined with -i, -m, or -v.
-h <host_name> Specifies which hostname to connect. By default,
zgr connects to the locally connected OpenArchitect switch (that
is, the one that is on the local PCI bus).
-d <level>
Sets the level of debugging output required by
zgr. The default level is one (1). Setting the debug level higher
produces more output. Four (4) is currently the maximum output
level.
EXAMPLES
The following example searches for and displays an entry in the zl2 table with the specified MAC
address:
zl2 -m 00:c0:95:45:00:00
If there is an entry in the ZL2 table with the MAC address, 00:c0:95:45:00:00, all the fields of
that entry will be displayed.
The following command deletes the above entry:
zl2 -c -m 00:c0:95:45:00:00
The following command displays all entries of the zl2 table:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 298
zl2
Be careful, the -c option does not ask. The following command deletes all entries in the zl2
table:
zl2 -c
SEE ALSO
zal
Ethernet Switch Blade User's Guide
release 3.2.2j
page 299
zgvrpd
NAME
zgvrpd - GARP VLAN Registration Protocol (GVRP) daemon for the OpenArchitect switch.
SYNOPSIS
zgvrpd [-d <level>] [-f] [-h <hostname>] [-p <ppa>] [-t <target>]
DESCRIPTION
zgvrpd is run after the network interfaces are created and initialized with zconfig, and started
with ifconfig(1M).zgvrpd starts a background task that implements the GARP VLAN
Registration Protocol (GVRP) protocol for a specified zhp interface. GVRP provides a Layer-2
mechanism for dynamically managing port membership in VLANs, including adding and
deleting ports, and creating and deleting VLANs. The background task started by zgvrpd
continues throughout the life of the Layer 2 network. GVRP is specified in ANSI/IEEE Std
802.1Q, 1998 Edition. The GARP protocol on which GVRP is based is specified in ANSI/IEEE
Std 802.1D, 1998 Edition.
zgvrpd updates the switch’s VLAN configuration, based on GVRP packets received on the
target interface. Specifically, it adds a port to or deletes a port from the VLAN specified in the
GVRP packet. If the VLAN does not exist, zgvrpd creates it. If zgvrpd deletes the last port
from a dynamically-created VLAN, it also deletes the VLAN.
When a VLAN is dynamically created, a corresponding zhpN interface is also created, where N is
an integer between 5000 and 9999. The value of N is equal to 5000 plus the VLAN identification
number (VID). For example, if zgvrpd creates VLAN 5, it also creates a zhp5005.
zgvrpd learns the existing (static) VLAN configuration of its target when starting up. When
shutting down, zgvrpd deletes only the dynamic changes to the VLAN configuration that it has
made. Manual changes to the VLAN configuration can be made while zgvrpd is running.
However, all such changes will be deleted when zgvrpd is terminated.
Only the GARP normal registration mode is currently supported. Multiple instances of zgvrpd
may run concurrently provided the targets are unique. If zgvprd’s target is a zhp, it’s
recommended that the zhp contain all the ports on the switch.
zgvrpd cannot be run concurrently with zsnoopd, because zsnoopd assumes static VLAN
membership.
OPTIONS
-d <level>
Sets the level of debugging output required by zgvrpd. The
Ethernet Switch Blade User's Guide
release 3.2.2j
page 300
default level is zero (0). Setting the debug level higher
produces more output. Five (5) is currently the maximum output
level.
-f
Run zgvrpd in foreground. Default is to run it in
background.
-h <hostname>
Connect to remote host <hostname>.
-p <ppa>
Start zgvrpd on switch <ppa>. Default is 0.
-t <target>
Enable GVRP on the set of ports specified by the target zhp
interface. There is no default. A target must be specified.
EXAMPLES
In the following example, zgvrpd starts a background task that enables the GVRP protocol for
the ports in the zhp0 interface. zgvrpd receives and sends GVRP packets, and updates the
VLAN configuration accordingly. This background task continues throughout the life of the
Layer 2 network, or until manually terminated.
zgvrpd –t zhp0
Once you run zgvrpd, use zconfig -a to display the current VLAN configuration.
SEE ALSO
zconfig, zsnoopd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 301
zl2d
NAME
zl2d - Layer 2 daemon for the OpenArchitect switch.
SYNOPSIS
zl2d [start | stop] [-t <msecs>] [-d <level>] [-f]
[-p <priority>] <iface..>
DESCRIPTION
zl2d is run after the network interfaces are created and initialized with zconfig. zl2d
creates a Linux bridge for each interface using brctl(8). The bridge name is the interface
name with a ‘b’ pre-pended to it. This command is primarily used for Spanning Tree Protocol
(STP). Each port associated with the interface is included within the bridge. zl2d starts a
background task that continues throughout the life of the Layer 2 network. zl2d is a script that
can be modified to include brctl commands when started and stopped. Examine
/usr/sbin/zl2d for examples of how to change common options.
OPTIONS
start | stop
Starts or stops the zl2d daemon.
-t <msec> Cause zl2d to monitor the Spanning Tree state of each
port on each bridge every <msec> milliseconds. If unspecified,
the default is 500 milliseconds.
-f
Enables Fast Forward on bridge(s) using 0x4000 (16384) as
the dynamic root priority.
-d <level> Sets the level of debugging output required by zl2d.
The default level is one (1). Setting the debug level higher
produces more output. Four (4) is currently the maximum output
level.
-p <priority>
Sets the dynamic root priority. <priority> should
be specified as a decimal number. A priority of 0 disables root
priority change.
<iface…>
The network interfaces on which zl2d should operate.
These network interfaces must first be created by zconfig. zl2d
does not operate with standard network interface cards. It only
works on switch network interfaces created by zconfig.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 302
OPERATIONS
zl2d manages the Spanning Tree state fields in the switch of each port within the bridge(s).
Based on a timer, zl2d reads the port information for each Linux bridge and updates the switch
when necessary.
EXAMPLES
In the following example, zl2d creates a Linux bridge named bzhp0 which includes all of the
zre<n> devices previously associated with the zhp0 device. zl2d then starts a background
task that monitors the port information of the Linux bridge every 500 ms. and updates the
Spanning Tree state fields in the hardware when necessary.
zl2d -t 500
zhp0
Once you run zl2d, use brctl(8) to display and alter your Spanning Tree settings.
SEE ALSO
zconfig,
brctl(8)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 303
zl3d
NAME
zl3d - Layer 3 daemon for the OpenArchitect switch.
SYNOPSIS
zl3d [-h <host_name>] [-t <msecs>] [-b] [-e] [-l] [-n] [-d
<level>] <iface ..>
DESCRIPTION
zl3d is run after the network interfaces are created and initialized with zconfig, and started
with ifconfig(1M). zl3d listens for Netlink messages from the kernel and monitors the
Linux network routing tables for routing updates. When an update occurs, if appropriate, zl3d
updates the corresponding routing tables in the silicon so that Layer 3 forwarding is done in the
switch at line speed. zl3d also ages the switch host route entries.
zl3d attempts to keep the Linux route cache and the switch host route table consistent. When a
host route table entry is in use, zl3d updates the Linux route cache. Host route table entries are
deleted when Linux removes the corresponding entry from the route cache. Similarly, network
route entries are removed when the corresponding Linux FIB table entry is deleted.
OPTIONS
-h <hostname>
Specifies which host to monitor. By default, zl3d
monitors the OpenArchitect switch that is locally connected
(i.e., the one that is on the local PCI bus).
-t <msec> Sets the timeout value. By default, zl3d wakes up
every 15 seconds (15000 ms) to look for updates to the Linux
routing tables and to do aging processing of route table entries
on the switch.
-b
Do not background the zl3d process.
-e
Enables additions of a default route to the L3 network route
table.
-l
Leave hardware tables intact on exit.
-n
Writes the proc table entry. zl3d writes 3/2 of it’s timeout
value to /proc/net/sys/ipv4/route/route_expires. The Linux kernel uses
this value as an expiration time for cache routes updated by
zl3d.
-d <level> Sets the level of debugging output required by zl3d.
The default level is one (1). Setting the debug level higher
Ethernet Switch Blade User's Guide
release 3.2.2j
page 304
produces more output. Four (4) is currently the maximum output
level.
<iface…>
The network interfaces on which zl3d should operate.
These network interfaces must first be created by zconfig. zl3d
does not operate with standard network interface cards. It only
works on switch network interfaces created by zconfig. It uses
the same syntax as zconfig.
OPERATIONS
zl3d manages the host route and network route tables located in the switch. Based on a trigger
condition, zl3d reads the Linux FIB table and the Linux route cache. Each table entry is filtered
by zl3d according to the following:
If an entry from the route cache is not a broadcast address or a local address, and zl3d is
able to resolve the MAC address of the destination, then zl3d inserts the entry into the
switch host route table.
• If an entry in the Linux FIB table is a host entry and zl3d is able to resolve the MAC
address of the destination host, then the entry is inserted into the switch host route table. If
an entry from the FIB table is a gateway entry, is not local, and zl3d is able to resolve the
MAC address of the gateway, then the entry is inserted into the switch network route table.
•
EXAMPLES
Normally, zl3d is run as a background task that continues throughout the life of the Layer 3
network.
In the following example, for the three interfaces specified, zl3d continuously monitors the
Linux FIB and route cache tables looking for updates every three seconds. Entries in the switch
host route table are checked for activity every 15 seconds and aged accordingly.
zl3d zhp1 zhp2 zhp3
SEE ALSO
zconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 305
zlc
NAME
zlc − link and LED control
SYNOPSIS
zlc [-h <hostname>][-d <level>][-x] <port_list> <action> [on |
off ]
zlc [-h <hostname>][-d <level>][-x] <action> [on | off |clear]
zlc [-h <hostname>][-d <level>][-x] [state|query]
DESCRIPTION
The zlc application sets the link speed and state of individual ports of the switch, or displays the
current state. It can also turn on or off the extract LED or the internal fault LED.
OPTIONS
-h <hostname>
Connect to remote host <hostname>.
-d <level> Set debug level to <level>
-x
Expected query value. Creates no output, exit code only. If
the port_list contains more than one port, returns the number of
ports that match the option.
<port_list>
Port or list of ports on which to take action.
Port lists are supplied in zconfig syntax (e.g. zre1, zre2..4,
etc.)
<action>
Set link speed or state to up, down, auto, 1000fd,
1000hd, 100fd, 100hd, 10fd or 10hd. The interface must be down to
change the port speed. Set intfault or extfault. Must supply
<option>.
Use query to return present settings.
on | off
Turn on or off the specified LED. Only valid for
actions intfault, extfault or extract.
clear No longer globally control the specified LED
state Return the list of LEDs currently illuminated.
query Return the list of LEDs currently controlled globally.
EXAMPLES
In the following example, zlc forces the line speed of port 1 to 100 Full duplex. The interface
Ethernet Switch Blade User's Guide
release 3.2.2j
page 306
must be down to change the speed. Assuming zre1 is part of interface zhp0,
ifconfig zhp0 down
zlc zre1 100fd
The external fault, internal fault, and ok LEDs can be set on a per port basis or globally. To set
the external fault LED for a particular port,
zlc zre1 extfault on
To query the settings of a particular port,
zlc zre1 query
Global Settings
The external fault, internal fault, and extract LEDs are set as a logical OR of all the ports. The
LEDs can also be set globally to on, off, or other. If globally set to on or off, the LED will not
change when links go up or down ,or interfaces are configured. If set to other, the LED resumes
its normal operation.
The next example globally turns on the Pull (extract) LED.
zlc extract on
Additional capabilities are also available by supplying an additional led action:
zlc <led_name> on
The <led_name> LED is turned on. The LED will not change when links go up or down or
interfaces are configured. If other is used, the LED resumes its normal operation.
<led_name> can be intfault, extfault, extract, or ok.
zlc query
zlc state
query lists the LEDs which have been set globally on or off. state shows which LEDs are on at the
moment. All LEDs are shown, including the clk LED.
SEE ALSO
ifconfig(8)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 307
zlmd
NAME
zlmd − monitor link changes or hot swap events.
SYNOPSIS
zlmd [-h <hostname>] [-b] [-d <level>] {-f <file>} |
<configuration>
DESCRIPTION
The zlmd application is intended to run as a daemon, waiting for a configured event to occur and
then running the program configured for that event. The events monitored are changes in the link
status of any of the 24 inband ports of the switch, the start of removal of the switch from the back
plane, or the cancellation of the removal before it actually takes place. The program can be a
shell script that initiates appropriate actions to respond to the event.
OPTIONS
-h <host_name>
Connect to remote host <host_name>
-b Do not background the zlmd process.
-d <level> Set debug level to <level>.
{-f <configuration_file>} | <configuration>
Read configuration from <file>. If file is a ‘+’,
configuration is read from stdin. Without the -f option, a
single line of input can precede the last flag. (i.e.
<Configuration>)
CONFIGURATION SYNTAX
zlmd takes configuration data from standard input or from a file with the -f option. In either
case, the configuration syntax is the same. The zlmd configuration data consists of a list of
semicolon-delimited statements. Each statement specifies an event to monitor followed by a
response action.
Configuration commands:
<port-list> = <program>
Run <program> when a fault occurs or clears on a port.
hotswap = <program>
Run <program> when a hot-swap extraction or insertion event
occurs.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 308
<port-list>
A list of ports in the same forms supported by
zconfig, e.g. zre1,zre2 or zre10..14
<program> Path to an executable program or script to be run when
the event occurs. Note: An absolute path to <program> is
required. The program will be called with the following
parameters:
For Link Changes:
<program> <ppa> <port> {external(0)|internal(1)} {off(0)|on(1)}
For Hot-swap Events:
<program> <ppa> {extraction(1)|insertion(2)}
NOTE: The <ppa> parameter is undefined and should be ignored.
EXAMPLES
In the following example, zlmd monitors ports 1 through 4 and runs a script called
prt_change upon a link change event.
zlmd zre1..4=/usr/sbin/prt_change
Suppose port 2 were UP and you disconnected the cable, zlmd would call prt_change with
the following parameters:
/usr/sbin/prt_change 0 2 0 1
where 0 is the ppa, 2 is the port, 0 is an external fault, 1 is ON.
SEE ALSO
zconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 309
zlogrotate
NAME
zlogrotate − Rotates log files.
SYNOPSIS
zlogrotate [-b] [-t time] [-s segment size] [-n # of files] [-f
file to rotate]
DESCRIPTION
zlogrotate rotates the selected file every [time] seconds if the file is larger than [segment
size]. It will keep only the number of files selected. zlogrotate is called from /etc/init.d/rcS
by default with no parameters.
OPTIONS
-b
Do not background the process - i.e. run in foreground.
-t <time>
60)
the time between logfile checks in seconds (default
-s <size>
256)
the targeted file segment size, in kilobytes (default
-n <# of files> The number of segments kept on the system
(default 4)
-f
<file> The file to rotate (default /var/log/messages)
EXAMPLES
To start zlogrotate with the default values,
zlogrotate
Ethernet Switch Blade User's Guide
release 3.2.2j
page 310
zmirror
NAME
zmirror - Set packet mirroring on an ingress or egress port
SYNOPSIS
zmirror
-a | -t
zmirror
[-e] <from_list> <to_port>
DESCRIPTION
zmirror sets packet mirroring from a given set of ports to one given port. Turning on packet
mirroring causes a copy of the packet to be sent to the to port. Any number of from ports can be
mirrored to one to ports.
NOTE: There are performance issues when trying to mirror more bandwidth than is
available on the to port.
After executing the following command, packets received on ports 1, 2 and 3 would be mirrored
(copied and transmitted) to port 12. This mirroring would be in addition to any Layer 3 or Layer
2 switching.
zmirror zre1, zre2, zre3 zre12
To clear the current mirroring, use the -t option.
The -e option can be used to indicate that packets being sent on a given port should be copied to
the to port. For example if the -e option is used as follows, the packets transmitted, as opposed
to received, on ports 1, 2 or 3 would be mirrored to port 12.
zmirror -e zre1, zre2, zre3 zre12
The to port can also be the keyword cpu to indicate that packets should be forwarded to the onboard processor. The following example would mirror the contents of port 1, 2 or 3 to the onboard processor:
zmirror zre1, zre2, zre3 cpu
The to port can be a single port or the keyword cpu. The from port can be a list consisting of one
or more ports. The from port cannot be the cpu. See the section on wildcards for discussion of
from port lists.
zmirror is cumulative:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 311
zmirror zre1, zre2, zre3 cpu
Is the same as:
zmirror zre1 cpu
zmirror zre2 cpu
zmirror zre3 cpu
Multiple mirroring setups can be made. The following example will mirror port 1 traffic to port
11 and port 2 traffic to port 12.
zmirror zre1 zre10
zmirror zre2 zre11
Setting a different to port will overwrite the previous setting. Given the last setup the following
will change port 1 traffic to be forwarded to port 10.
zmirror zre1 zre10
The following example results in ingress mirroring from port 0 to port 1and from port 2 and 3 to
port 18.
zmirror zre0 zre1
zmirror zre2 zre12
zmirror zre3 zre18
The to port of 12 was over written with the to port 18 .
Use zmirror –a to query the hardware to display the current configuration.
zmirror -a
OPTIONS
-a
Display the current mirroring setup
-e
Set egress port mirroring for the specified from port
-t
Teardown or disable the mirroring
WILDCARDS
Wildcard characters can be included to simplify the process of creating larger, more complex
configurations. Wild card characters for zconfig include:
Ethernet Switch Blade User's Guide
release 3.2.2j
page 312
,
(comma)
Use for creating lists
..
(dot-dot)
Specifies an inclusive range
Below are some examples for the correct usage of the comma (,) and dot-dot (..). Each line below
produces the same results:
zre1, zre2, zre3, zre4
zre1..4
zre1, zre2..4
zre1..2, zre3..4
SEE ALSO
tcpdump(1M)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 313
zmnt
NAME
zmnt − Expands the read/write files onto the RAM disk.
SYNOPSIS
zmnt [-c] <directory>
zmnt [-c] -t <file>
zmnt [-c] –l
DESCRIPTION
zmnt expands files from flash onto the RAM disk that have been previously saved with zsync.
The init process runs zmnt to expand the files in flash onto RAM file system. The user may
use zmnt to expand the files at anytime and may place them in any directory. For example,
zmnt /tmp
will expand the files under the directory /tmp.
Booting the kernel with -i instructs the init process to skip the zmnt stage After booting in
this way, zmnt can be used for correcting a problem file in the read-write file system.
The -t option can be used to save the configuration of a switch to a tar file. A tar file can be
copied to another switch and saved with zsync -t. As a result, the configuration of a switch
may be cloned to other switches.
The -c option is used to mount the custom overlay. See zsync for a description of custom
verses dynamic overlay.
OPTIONS
-c
Read saved files from the custom overlay
-t <file>
Save the overlay into the specified <file> in tar format.
<directory>
The directory under which the overlay files are
expanded, or the file to which the tar image is saved.
-l
List files in selected overlay, do not unpack.
EXAMPLES
In the following example, zmnt the current overlay into a tar file called overlay.tar
Ethernet Switch Blade User's Guide
release 3.2.2j
page 314
zmnt –t overlay.tar
The resulting tar file can now be saved on a different host as a snapshot of the overlay at that
point in time. Use zsync to restore the overlay on the switch:
zsync –t overlay.tar
The restored overlay will be used upon the next reboot.
SEE ALSO
zsync
Ethernet Switch Blade User's Guide
release 3.2.2j
page 315
zpeer
NAME
zpeer – Application for High Availability communication between the Fabric and Data
switches.
SYNOPSIS
zpeer [-d <level>] local|peer <command> <value>|query
zpeer [-d <level>][-a][-r]
DESCRIPTION
zpeer is used to pass bidirectional High Availability(HA) state and priority information
between the base and fabric switches in the Ethernet Switch Blade. zpeer uses the concept of
local and peer information. The local information is written, and the peer information is read. As
an example:
On the base switch:
zpeer local priority 203
On the fabric switch the following command would return 203:
zpeer peer priority query
The communication is bidirectional so the example above can be reversed allowing the fabric
switch to pass state and priority information to the base switch.
NOTE: Local information can also be read as confirmation and for debugging purposes.
zpeer is part of the HA software suite. It is called as part of the scripts that are generated by the
zspconfig application. With the exception of querying information for debugging and
validation, the zpeer application would not need to be executed by the user. Following is a
quick overview of the values communicated between switches. See documentation on HA and
zspconfig for better understanding of how zpeer fits in with the HA software suite.
Following is an example of setting the state to “backup”:
zpeer local state backup
Possible states are:
unhealthy
Set by hardware at power up or at system reboot. Set by
software when the HA software suite is stopped. Indicates that
the peer is not properly functioning
healthy
Set by software when HA is started. This state is never
Ethernet Switch Blade User's Guide
release 3.2.2j
page 316
displayed by query, but must be set at initialization. After setting
the healthy state, the query will return the backup state.
backup
Used to reflect the backup state of vrrpd
master
Used to reflect the master state of vrrpd
The priority value is a value between 0 and 255. In the HA suite, the value is set to 254 minus
the number of ports that are link down.
The state or priority value for the local or peer can be displayed with the query command. The
following will query the state of the peer switch:
zpeer peer state query
The output from the above command during the boot process would be “unhealthy”
The -a option can be used to display a complete listing of all state and priority information and
internal information that can be used for debugging. Here is example output from the -a option.
Local/Write
203
master
priority
state
data
position byte|bit
status
2|cb
2|8
50(ACK)
231
Peer/Read
backup
2|e7
2|0
0()
The priority and state rows are the same values returned by the query command. The information
in the data, position and status rows are internal debugging information that is useful to support
engineers when diagnosing problems in the field.
The -r option should not be used while HA is running. It can cause loss of coordination between
the two switch planes. The -r option will reset all internal communication values, and possibly
require the peer to be reset also.
OPTIONS
-d
set debug level to <level>
-a
Display complete status of zpeer
-r
Reset all values in the zpeer communication. Can cause
loss of communication with the peer requiring the peer to
Ethernet Switch Blade User's Guide
release 3.2.2j
page 317
be also reset.
SEE ALSO
zspconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 318
zqosd
NAME
zqosd – monitors tc(8) commands to implement classification filters and queuing disciplines
in hardware.
SYNOPSIS
zqosd [-d <level>] [-p <port>] [-f] [-l] [-i <pid>] [-o <pid>]
DESCRIPTION
zqosd monitors commands entered by tc which set up queuing disciplines and classification
filters for managing traffic in the switch. It supports a variety of queuing disciplines which allow
distributing available bandwidth at each output port of the switch among different classes of
traffic as well as selecting which packets to drop when bandwidth limits are exceeded.
zqosd does not directly set up the hardware. It prepares messages describing the queuing
disciplines and filters and sends them to a hardware specific daemon, ztmd. ztmd should be
started before zqosd. Both programs normally run as background processes.
OPTIONS
-d <level> Set the level of diagnostic information logged.
<level> may be 0-4; higher levels produce more output.
.p <port> Use <port> as the multicast listening port for
communication with ztmd. Default is 2345.
-f
Run zqosd in the foreground.
as a daemon.
-l
Without this option, it is run
Log diagnostic output to /var/log/zqosd.log
-i <pid>
Set the pid for this process.
-o <pid>
Set the pid of the destination process (ztmd)
EXAMPLES
Start the traffic management daemon, ztmd, then start zqosd to monitor tc output. Both
daemons are run as background processes and log their messages.
ztmd –l
zqosd -l
Ethernet Switch Blade User's Guide
release 3.2.2j
page 319
SEE ALSO
ztmd, tc(8), zfilterd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 320
zrc
NAME
zrc - Packet rate control
SYNOPSIS
zrc -b | -m | -d | -t | -a [-p <port>] [-v <vlan>] [-g <group>]
[-M <mac_addr>] [-T <timeout>] [-D <level>] <rate>
DESCRIPTION
zrc sets rate control on Broadcast, Multicast and/or Destination Lookup Failure (DLF) packets.
The rate is measured in the number of packets per time period. If the number of packets received
of the specified type exceeds the specified rate limit, packets are discarded at the ingress port.
OPTIONS
-b
Enable rate control for Broadcast packets
-m
Enable rate control for Multicast packets
-d
Enable rate control for DLF packets
-t
Teardown or disable all rate control
-a
Display the current rate control settings
-p <port>
Enable rate control on this <port>
-v <vlan>
Enable rate control for this <vlan_id>
-g <group> Enable rate control for this Multicast <group>
-M <mac_addr>
Enable rate control for this Mac <mac_addr>
-T <timeout>
Set time period to <timeout> milliseconds.
Default is 1000 (one second).
-D <level> Set debugging output to <level> when running the
program.
<rate_limit>
The number of packets per time period above which
Broadcasts, Multicasts and/or DLFs will be discarded. Valid rate
limits are any number between 0 and 262143.
SEE ALSO
ztats
Ethernet Switch Blade User's Guide
release 3.2.2j
page 321
zreg
NAME
zreg - Read and write registers and tables on the OpenArchitect switch switching hardware.
SYNOPSIS
zreg [-p <ppa>] [-w] [-i <index>] [-t <index>] [-k] [-h
<hostname>] [-d <level>] [-r 10] <reg>
DESCRIPTION
zreg allows a user to read and write direct and indirect registers and tables on the resident
switch chip. zreg is commonly used for debug, or for prototyping when creating applications
that control the OpenArchitect switch. It is also useful when put into shell scripts for displaying
hardware status or statistics. Although the -t option allows tables to be displayed, one might find
the formatted output of the table functions more useful. See zal.
OPTIONS
-p <ppa>
Specify the Physical Point of Attachment (PPA). Each
OpenArchitect switch that is controlled by the CPU on which zreg
is running is a unique PPA. If there is only one OpenArchitect
switch, as would be the case when zreg is running on the embedded
processor the PPA would be 0. The default PPA is 0.
-w
Causes zreg to write to the register or table. Data to be
written is read from standard input.
-i <index> Causes zreg to access at the indexed register
specified by <reg>. See OPERANDS for usage of <reg> with indexed
registers. The <index> parameter is used to determine which entry
i
-t <index> Causes zreg to access the memory specified by <reg>.
The <index> parameter is used as the index into the specified
table. Only content addressable memories are accessed using the –
t option. All other tables and memories are accessed with the –i
option.
-k
Causes zreg to access the memory specified by <reg>. The
entry accessed is determined using the data from standard input
as the search key. Enter 0 for fields that are not part of the
search key.
-h <hostname>
Specify the <hostname> to configure. By default
zreg configures the OpenArchitect switch that is locally
Ethernet Switch Blade User's Guide
release 3.2.2j
page 322
connected (i.e., the one that is on the local PCI bus).
-r 10 Sets numeric radix for registers to 10. Default is 16.
-d <level> Set the level of debugging output produced by zreg.
The default level is 1. Setting the debug level higher produces
more output. The maximum level of output is currently 4.
OPERANDS
<reg> If no –i, -t, or –k option is specified, <reg> is chosen
from the en um ix54_reg_e. With the –i option, <reg> is chosen
from the enum ix54_xreg_e. With the –t and –k options, <reg> is
chosen from ix54_mem_e.
EXAMPLES
There are three types of accesses performed by zreg; scaler register, indexed register, and table.
For each of theses access types, values can be read or written. Content addressable memory
register access is the default. The following is an example of reading the CONFIG Register:
zreg 1
When running zreg on the embedded processor of the OpenArchitect switch, the <ppa> is
always 0, since the embedded CPU processor only controls the directly attached switch chip.
Since the default <ppa> is 0 the -p option is not needed.
To write a value to a register the -w flag is used and the data is read from standard input. The
following example writes 0x800000640 to the Aging Time Register:
echo 0x80000640 | zreg -w 49
If the zreg command is typed at the prompt, it waits for input from the user. You may also use
File I/O or shell scripts. The following example reads the MAC Packet Length Register for port
7:
zreg -i 7 2
SEE ALSO
zal, ztats
Ethernet Switch Blade User's Guide
release 3.2.2j
page 323
zrld
NAME
zrld – ZNYX redirector daemon
SYNOPSIS
zrld [-d <level>] [-p <port>] [-f]
DESCRIPTION
zrld is used for remote management of OA/HA applications. OA/HA applications capable of
remote management include zlc, ztats, zlmd.
zrld only allows requests from hosts listed in /etc/rcZ.d/zrld_trusted_hosts.
OPTIONS
-d <level> Set debug level to <level>
-p <port>
Specifies TCP port to listen on. Default port is 7000.
-f
Run the daemon in the foreground
EXAMPLES
In the following example, zrld starts a background task that listens on the default port 7000 for
incoming TCP requests and passes along the request to the OA/HA application,
zrld
Once started, you can issue supported commands to the host running zrld from a remote host.
For example, if the host running zrld had an IP address of 10.0.0.42, you could use zlc to
remotely query the status of ports 1 through 5. Remember, the IP address of the switch you are on
must be listed in /etc/rcZ.d/zrld_trusted_hosts on the switch running zrld.
zlc –h 10.0.0.42 zre1..5 query
SEE ALSO
zlc, zlmd, ztats
Ethernet Switch Blade User's Guide
release 3.2.2j
page 324
zsnoopd
NAME
zsnoopd - IGMP Snooping daemon for the OpenArchitect switch.
SYNOPSIS
zsnoopd [-d <level>] [-f] [-h <hostname>] [-p <ppa>]
[-r <sec>] [-t <sec>] [-u <sec>] [-v <vlan_id>]
DESCRIPTION
zsnoopd is run after the network interfaces are created and initialized with zconfig, and
started with ifconfig(1M). zsnoopd starts a background task that monitors incoming
IGMP traffic in order to learn which hosts in a VLAN are listening to which IP multicast
addresses. zsnoopd updates the multicast table in silicon with this information, so that IP
multicast traffic is forwarded only to the ports in a VLAN to which listening hosts are attached.
This optimizes packet flow through the switch. Traffic on all VLANs is monitored by default.
IGMP snooping can be restricted to specific VLANs using the –v option. The background task
started by zsnoopd continues throughout the life of the Layer 2 network.
zsnoopd manages the switch multicast table. Based on monitored IGMP traffic, zsnoopd
creates or updates an entry in the table. The key to each entry is a source Ethernet multicast
address combined with a VLAN ID. Two port bitmaps are maintained: one that identifies the
untagged members of the VLAN, and one which identifies which ports of the VLAN have
listening hosts attached.
When the maximum number of entries in the table is reached, zsnoopd deletes a random entry
prior to adding the next entry.
The Ethernet Switch Blade does not perform Multicast routing, but traffic from IP multicast
addresses not found in the multicast table is forwarded to all ports within a VLAN. Traffic from
reserved IP multicast addresses (224.0.0.X) is forwarded on all ports within a VLAN.
To operate correctly, traffic from unregistered IP multicast addresses should be forwarded on all
ports in a VLAN. To do this, the multicast port filtering mode must be set to
FORWARD_UNREGISTERED. See zconfig.
Neither zgmrpd nor zgvrpd can run concurrently with zsnoopd.
OPTIONS
-d <level> Sets the level of debugging output required by
zsnoopd. The default level is zero (0). Setting the debug level
higher produces more output. Four (4) is currently the maximum
output level.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 325
-f
Run zsnoopd in foreground. Default is to run it in
background.
-h <hostname>
-p <ppa>
Connect to remote host <hostname>.
Start zsnoopd on switch <ppa>. Default is 0.
-r <sec>
Time to wait, in seconds, before removing a port with
no router multicast traffic. Default is 260 seconds.
-t <sec>
Time to wait, in seconds, before removing a port with
no host multicast traffic. Default is 260 seconds.
-u <sec>
timeouts.
Time to wait, in seconds, before checking port
-v <vlan_id>
Enable zsnoopd for VLAN <vlan_id>. Default is to
enable zsnoopd on all VLANs. This option may be entered more than
once.
EXAMPLES
In the following example, zsnoopd starts a background task that monitors incoming IGMP
packets and updates the Multicast Table (MARL) accordingly. This background task continues
throughout the life of the Layer 2 network.
zsnoopd
Once you run zsnoopd, use zmarl to display the contents of the MARL. zsnoopd deletes all
entries in the MARL when starting up, and when shutting down. Manual changes to the MARL
are not recommended while zsnoopd is running.
SEE ALSO
zconfig, zgmrpd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 326
zpeer peer state query
The output from the above command during the boot process would be “unhealthy”
The -a option can be used to display a complete listing of all state and priority information and
internal information that can be used for debugging. Here is example output from the -a option.
Local/Write
203
master
priority
state
data
position byte|bit
status
2|cb
2|8
50(ACK)
Peer/Read
231
backup
2|e7
2|0
0()
The priority and state rows are the same values returned by the query command. The information
in the data, position and status rows are internal debugging information that is useful to support
engineers when diagnosing problems in the field.
The -r option should not be used while HA is running. It can cause loss of coordination between
the two switch planes. The -r option will reset all internal communication values, and possibly
require the peer to be reset also.
OPTIONS
-d
set debug level to <level>
-a
Display complete status of zpeer
-r
Reset all values in the zpeer communication. Can cause
loss of communication with the peer requiring the peer to
be also reset.
SEE ALSO
zspconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 327
zspconfig
NAME
zspconfig - configure and start surviving partner
SYNOPSIS
zspconfig [-d <level>] [-p <directory_path>] [-u
<dhcp_interface>] [-c <dhclient.conf>] [-t <timeout>] [-s]
[-v] -f <file>
DESCRIPTION
zspconfig is used to configure and start the Surviving Partner software. With the -f option a
configuration file is provided that completely describes the network setup and desired behavior of
all of the switches participating in the Surviving Partner. With the -u option the interface on
which to run dhclient to retrieve a configuration file must be provided. The configuration file
format retrieved by a -u is identical to that supplied with -f. It is envisioned that the -f option
is used for initial configuration, and all subordinate and replacement switches run zspconfig
with the -u option. The -v option prints the current version of zspconfig and performs no
actions.
OPTIONS
-d <level> Set the debug level. The default debug level is 1.
The higher the level, the more debugging output is produced.
Debugging output is sent to the console.
-p <directory_path>
Set the directory path for where zspconfig places the scripts
it generates. The default location is
/etc/rcZ.d/surviving_partner.
-u < dhcp_interface>
Come up as an unconfigured slave. Use the specified
<dhcp_interface> to retrieve the configuration. User
confirmation is not required unless the –s option is also
used.
-c <dhclient.conf>
Use file <dhclient.conf> as the configuration file to dhclient
when retrieving configuration information. If –c is not used,
a default configuration file is created and used. Only valid
with the –u option.
-t <timeout>
Time to wait in seconds before giving up on
Ethernet Switch Blade User's Guide
release 3.2.2j
page 328
finding a Surviving Partner to retrieve configuration
information from. Only valid with the –u option.
-s
Do not ask for confirmation. Run from a script.
-v
Prints the current version of zspconfig.
-f <file> The provided <file> is used as input to configure
the Surviving Partner. See the next section on CONFIGURATION
FILE for the syntax of the configuration file.
CONFIGURATION FILE
The configuration file contains commands for controlling the Surviving Partner setup. Commands
are single lines end -delimited with a semicolon. Comments, spaces and new lines are ignored.
Comments begin with the # character and include characters through the next new line.
Comments may be placed on the same line as a command after the semicolon.
All configurations must include the VLAN configuration first by use of zconfig commands.
zconfig commands can be put in the configuration file and will be passed directly to the
zconfig application. zconfig commands start with the keyword zconfig and are of the
same format as described in the zconfig manual page. Here is an example of zconfig
commands in a zspconfig configuration file.
zconfig zhp0: vlan1 = zre1..4;
zconfig zhp1: vlan2 = zre5..8;
zconfig zhp2: vlan100 = zre14;
In the above example, three VLANs are created: zhp0 and zhp1 will be used as connections to
high availability nodes; zhp2 will be used as the inter-connect between two Surviving Partner
switches to run the VRRP heartbeat. All VLANs must be created before other zspconfig
commands may operate on them.
The next section of a zspconfig configuration file sets up the IP addresses of the created
VLANs. There are two types of addresses in a Surviving Partner setup: Physical Addresses, and
Virtual Addresses. The Virtual Addresses are those that VRRP manages and moves to the current
Master switch. The Physical Addresses are the real addresses of the switch, and are used for
management only. Physical Addresses are setup using the sibling_addresses command.
Virtual addresses are setup using the virtual_address command. The sibling addresses
name comes from the fact that we are setting up the addresses for all of the siblings in the
Surviving Partner group. So if we have two Surviving Partner switches, the
sibling_addresses statement might look like this:
sibling_addresses: zhp0 = 10.0.0.30, 10.0.0.31;
sibling_addresses: zhp1 = 11.0.0.30, 11.0.0.31;
sibling_addresses: zhp100 = 100.0.0.30, 100.0.0.31;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 329
A sibling_addresses statement is required for each VLAN created with the zconfig
commands. The two addresses in the list indicate there are two switches in the Surviving Partner
group. The first address 10.0.0.30 and 11.0.0.30 are assigned to the switch on which the
configuration is being run. The remaining addresses are distributed to the switches that run
zspconfig -u on a first come, first serve basis.
The sibling_addresses command may also take a netmask operand similar to that given
to ifconfig. For example:
sibling_addresses: zhp0 = 10.0.0.30, 10.0.0.31 netmask
255.255.255.0;
sibling_addresses: zhp1 = 11.0.0.30, 11.0.0.31 netmask
255.255.255.0;
Virtual addresses should be setup for all VLANs that connect to
High Availability nodes. Usually this would include all VLANs
except the interconnect VLAN or VLANs connected to upstream
routers. It is possible that a setup would have VLANs that are
used for management only. Virtual addresses are setup as
follows:
virtual_address: zhp0 = 10.0.0.42 netmask 255.255.255.0;
virtual_address: zhp1 = 11.0.0.42 netmask 255.255.255.0;
Only a single address per VLAN is provided because this single address will move with the
current Master switch, and the netmask must be the same as that provided in the
sibling_addresses statement.
The last required section for the configuration is description of the ports. Particularly we need to
specify one of the following for all of the ports participating in the high availability setup. The
possible port types are:
interconnect - Ports connected between groups of Surviving Partner switches. VRRP
heartbeat messages are sent on the interconnect ports.
Crossconnect - Crossconnect ports are ports that are connected to other Surviving Partner
switches, that are not part of this Surviving Partner group. Crossconnect ports behave
differently then bonding ports. The links are not brought down temporarily, and VRRP runs
with the native MAC addresses to avoid MAC address duplication with the other VRRP
group.
RAINlink - Ports connected to RAINLink or bonding driver nodes. These ports contain
virtual addresses managed by VRRP. And during a failover event, the links are toggled
down to force failover to the Master switch.
Route - Ports connected to upstream routers. VRRP does not manage virtual IP addresses for
these links. Routing protocols must be used to instruct up stream routers of a different path
to get to the VRRP managed networks. These port types are currently not implemented.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 330
monitor_only - Ports that are monitored but do not have a virtual address managed on them.
They will not have their links brought down temporarily during a failover scenario. These
ports are only monitored. If a problem occurs on this type of link it will cause a failover
scenario.
configure_only - Ports are configured as per the zconfig commands, but do not participate in
the high availability network. Problems on these links will not cause a switch failover.
NOTE: The zhp specified for interconnect is important, and will be the zhp
interface/VLAN where zspconfig/HA on the the master will start the DHCP daemon.
Zre51 on the slave swtiches should be configured up into this same VLAN so that
zspconfig -u can connect to the master.
Each port that is setup in a VLAN by the zconfig commands must have its port type specified.
The port type is specified on a physical port bases. That is on a zre basis, but zhp names can be
used as a quick way to setup the port type for all ports that are a member of that VLAN. It is
possible to make a port a member of more then one VLAN. That is a zre can be a member of
more then one zhp. In such cases, configuring the zhps as different port types would cause a
conflict, and will not work. To handle this setup the individual zre commands would be used to
setup the port types. Here is an example of setting up the port types as a continuation of our
current configuration:
interconnect:
RAINlink:
zhp2; # Could also use zre14
zhp0, zhp1;
# Could also use zhp0..1,
# or list the zres
The ".." wild card is supported as in zconfig to indicate a range of numbers. The comma is
used to indicate a list. The zres that are part of the zhp could also be used. Here is a more
complex setup. The zconfig commands are also shown to understand the VLAN setup:
zconfig zhp0: vlan1 = zre1..4, zre23;
zconfig zhp1: vlan2 = zre5..8, zre23;
zconfig zhp100: vlan100 = zre23;
zconfig zhp0: vlan1 = zre1..4;
zconfig zhp1: vlan2 = zre5..8;
zconfig zhp2: vlan100 = zre23;
#
sibling and virtual address setup omitted
interconnect:
zhp100;
RAINlink: zre1..8;
# Use zre definition to
# exclude zre23
If zhp0 and zhp100 are setup as different port types, there would be a conflict for port zre23.
In the particular example above, the zre23 is shared. It is used to pass VRRP interconnect
traffic and as a means to pass VLAN 1 and 2 traffic between switches. Since zre23 is an
Ethernet Switch Blade User's Guide
release 3.2.2j
page 331
interconnect, it is not a bonding driver enabled port, and therefore should be setup as an
interconnect port type. To accomplish this, the zre ports are listed to avoid conflicting port
types.
Note that a single line cannot contain both zhp and zre definitions. Therefore
RAINlink:
zhp1, zre1..4
does not work and the definition zre1..8 is equivalent.
Optional zspconfig commands are listed below.
vrrp_msg_rate:
100; # Time in milliseconds.
# Default is 100 milliseconds
The message rate is the interval between VRRP messages sent over the interconnect link. The
time given is in milliseconds. It takes the lack of 3 VRRP messages for the Backup to assume the
role as Master. It is recommended with faster message rates to increase the default priority to
254. The higher priority will decrease the latency of the failover.
vrrp_def_priority:
254; # default is 100.
Value from 1 to 254
The default virtual MAC address for VRRP is 00:00:5E:00:01:<vid>, where <vid> the
virtual router ID. Using this virtual address is problematic in network environments where
multiple VRRP instances might be running that are using the same <vid>. To overcome this
problem, zspconfig, uses a default MAC address derived from the physical address of the
switch on which it is running. For the slave switch, the vrrp_virtual_mac_addr command
is used to set the MAC address to the same as the Master. This statement is typically not used
within the Master switch’s configuration. It is used in the zspconfig generated Slave switch
configuration. And is retrieved by the Slave switch with zspconfig using the –u option. If in
doubt, don’t use this command.
vrrp_virtual_mac_addr: 00:01:02:03:04:05
There are three failover_modes supported; switch, vlan and port. For switch failover,
if any managed link fails, the entire switch is failed over to the backup switch. In vlan failover,
if a link fails in a VLAN, only the links associated with that VLAN are failed over. In port
failover, only the port that fails is moved to the backup switch. For both vlan and port
failover, the interconnect link will need to be used to maintain connectivity between ports that
have failed over and those that have not. For vlan and port failover modes, the interconnect
link must be an in-band port, and must be included in the managed VLANs running with VLAN
tagging on.
failover_mode:
failover_mode:
failover_mode:
switch;
vlan;
port;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 332
Coordination between the data and fabric switches can be enabled by setting the
board_synchronization_mode. Possible modes are “off” and “basic”. Board
synchronization is off by default. When board synchronization is put into basic mode, HA events
on the base switch are coordinated with the HA events on the fabric switch. The behavior of
board synchronization is dependent on the failover_mode. In switch failover_mode all VLANs
are moved to the HA partner with the most up links in unison. With board synchronization the
concept is extended so that all VLANs in both switching planes are moved to the HA partner with
the most up links in both switch planes. So, both switches failover together. In vlan and port
failover modes the number of up links is not considered in the board synchronization. Each vlan
or port can move independently across the data or fabric planes between HA partners according
to the failover_mode rules. In all failover_modes, if the data plane or fabric switch reboots or
power cycles, the HA partner will take mastership for all VLANs in both planes. The syntax for
setting the board_synchronization_mode is:
board_synchronization_mode:
basic;
Additional startup scripts may be included in the configuration using the start_script
command. The files in the start_script command will be placed in a location for tftp
transfer to sibling switches that initialize using the –u option. A common use of the
start_script command might be to propagate gated configurations to all members of the
Surviving Partner group. Absolute path names must be used. Using multiple commands allows
inclusion of multiple scripts. For example:
start_script: /etc/rcZ.d/S75gated;
start_script: /etc/rcZ.d/S80static_routes;
The vrrpd_script command allows a user defined script to be run when vrrpd changes
state. This script is called at the end of the zspconfig created vrrpd.script. See vrrpd –s
for a description of when the scripts are called. zspconfig sets up vrrpd to call vrrpd.script.
The vrrpd_script command in zspconfig places a call to the user-defined script at the
end of vrrpd.script file. The following example would call the my_vrrpd_script each
time vrrpd calls its –s provided script:
vrrpd_script:
/etc/rcZ.d/surviving_partner/my_vrrpd_script;
CAUTION: The my_vrrpd_script is not called from a different process thread.
Therefore if my_vrrpd_script crashes or has long delays, it will crash the vrrpd, or
cause delays in the Surviving Partner failover. To protect against this, write the script to
launch a second script in a background shell. The advantage to calling the user provided
script in the same process thread is that it gives synchronized control over the failover
process for those who want it.
OUTPUT FILES
The output of zspconfig is a set of configuration and script files. The configuration files
configure vrrpd and zlmd daemons. The vrrpd and zlmd daemons combined with the script
Ethernet Switch Blade User's Guide
release 3.2.2j
page 333
files run the Surviving Partner. This is a list of all configuration and script files:
/etc/rcZ.d/S70Surviving_partner
The main startup script that starts the Surviving Partner by
running zconfig, ifconfig, zlmd and vrrpd. zspconfig prompts
the user to run this script. This file can be saved with
zsync to automatically start the Surviving Partner at switch
boot.
/etc/rcZ.d/surviving_partner/vrrpd.conf
Configuration script for the VRRP daemon. This configuration
is used when the S70Surviving_partner script launches vrrpd.
There is a line in this file for each router address vrrpd
will manage. Or stated another way, each virtual_address
command in the zspconfig configuration file results in a line
in vrrpd.conf.
/tftpboot/zsp.conf<n>
zspconfig configuration file that contains the configuration
of the sibling backup switches. The <n> is used to
distinguish potentially more than one backup switch. This
configuration file is placed in /tftpboot, and is retrieved
via DHCP by a replacement switch on boot up.
/etc/rcZ.d/surviving_partner/dhcpd.conf
Configuration script used by dhcpd when the switch becomes
master. dhcpd is used to serve replacement switches their
configuration scripts. Namely a zsp_DC.conf file that can be
input to the zspconfig with the -u flag.
/etc/rcZ.d/surviving_partner/dhclient.conf
If zspconfig is executed with the -u flag, a dhclient.conf
file is created, and then dhclient is used to retrieve a
zspconfig configuration file from the /tftpboot area of the
Master switch.
/etc/rcZ.d/surviving_partner/vrrpd.script
Runtime script that executes each time the vrrpd changes
state. This script starts and stops dhcpd, and toggles down
bonding driver/RAINlink ports to force the nodes to a new
Master switch.
/etc/rcZ.d/surviving_partner/zlmd.script
Runtime script executed by zlmd when a link goes up or down.
This script modifies the priority of vrrpd, which in turn may
cause the VRRP Master to move from one sibling switch to
another.
SEE ALSO
zconfig, ifconfig, vrrpd, dhclient, dhcpd, zpeer
Ethernet Switch Blade User's Guide
release 3.2.2j
page 334
Ethernet Switch Blade User's Guide
release 3.2.2j
page 335
zstack
NAME
zstack - Configures the OpenArchitect switch stacking.
SYNOPSIS
zstack [-h <host_name>] [-d <level>] [-a] [-t] [{-f <file>} |
<configuration>]
DESCRIPTION
zstack combines multiple switch fabric chips into a single virtual switch. zstack must be run
before any other switch configuration. Specifically it must be run before zconfig. zstack is
typically run from an S20stack script prior to the S50xxx scripts. zstack currently only
supports directly connected switch chips as are present on the base switch. Directly connected
means that the local CPU can directly access and control the switch fabric chips being stacked.
zstack does not yet support network based stacking where there are separate boards with
separate CPUs controlling the switch fabric chips.
OPTIONS
-h <hostname>
Specifies the remote hostname to configure. By
default, zstack configures stacking on the local OpenArchitect
switch. This option should only be used for displaying the
configuration, if at all.
-d <level> Sets the level of debugging output produced by zstack.
The default level is 1. Setting the debug level higher produces
more output. The maximum output level is currently four (4).
-a
Displays the current stacking configuration of the switch.
-t
Tears down the entire switch stacking configuration.
{-f <file>} | <configuration>
Gets configuration information from the specified file. A <file>
name of '+' reads configuration data from standard input. If the
-f flag is not used, a single line of configuration data can be
entered as parameters to zstack.
CONFIGURATION SYNTAX
zstack takes configuration data from standard input or from a file with the -f option. In either
case, the configuration syntax is the same. The zstack configuration data consists of a list of
Ethernet Switch Blade User's Guide
release 3.2.2j
page 336
semicolon-delimited statements. Each statement specifies an action to take on a stack. A stack is
a group of ports on a single switch fabric chip. Actions include stack creation, stack
port association, stack configuration and stack control.
Comments, spaces and new lines are ignored. Comments begin with the #
character and include characters through the next new line.
Stack Creation
The first step in creating a stack is to define its location. Each stack is assigned a
unique small integer by the user. On the base switch this integer must be a
value from 0 to 31. The location is defined with two values; a Physical Point of
Attachment (ppa) and a network location. The ppa is defined by the keyword
"ppa" followed by an integer value. The integer value is a 0 based contiguous
value representing the physical switch fabric chip as it was discovered by the
Linux operating system. In the case of the base switch there are two chips
directly controlled. The network location specifies an IP address of the CPU that
controls the physical switch fabric chips. If the CPU that is running zstack
controls the physical switch fabric chip, the key word "local" is used in place of
the IP address. Currently only "local" CPU control is supported.
Stack creation example for a base switch:
stack0: ppa0 local;
stack1: ppa1 local;
The above statements indicate that there are two switch fabric chips that are
controlled by the local CPU.
Stack Port Association:
After stack creation, the physical ports must be associated with a virtual port name. One might
think of this as mapping the ports from their physical association to a virtual name. The physical
port numbers are usually 0 based, but are dependent on how the ports are physically configured in
the switch fabric silicon and how those ports are labeled at the physical connector. At a
minimum the port association is used to move the ports of a second, third, or more switch silicon
chip to a different virtual port name then the others. In this way, the ports can be built into a
unique linear port list.
Stack port association syntax:
stack<N>: <zre_list> = <zre_list>;
The port association statement begins with the stack and number representing the group of ports
being mapped. The stack must be previously created with a stack creation command. After the
Ethernet Switch Blade User's Guide
release 3.2.2j
page 337
semicolon are two zre_lists separated by an equal sign. The first is the list of virtual port
names, the second is the physical port names. The assignment is done in order, and there must be
an equal number of ports in each list. Wild cards may be used in the zre_lists. See below.
Stack port association syntax for a base switch:
stack0: zre0..11 = zre0..11;
stack1: zre12..23 = zre0..11;
The first statement above configures the first switch silicon chip, represented by stack0, to have
no translation between its physical port numbering and its virtual port numbering.
NOTE: The statement must be made even if the mapping is one to one.
The second statement above configures the second switch silicon chip to have its physical ports 0
through 11 map to virtual ports 12 through 23. The mapping is done in a linear fashion. zre0
maps to zre12. zre1 maps to zre1 and so on.
Stack Configuration Statements
After stack creation and port association, the configuration of the stack must be defined. The
stack configuration provides the network map of how inter-switch fabric communication is
performed. It specifies which physical port or should be used to communicate with a different
group of stacked ports. The syntax is as follows:
stack<N>: stack<M> = zre<n>;
The above syntax indicates that stack N should use zre n to access stack M. The zre value n is
a physical port number as seen by Stack N. It is not a virtual port number as mapped by a port
association command. Multiple configuration statements for Stack N can be used to indicate how
to get to other stacks.
NOTE: stack<M> can be a list of comma delimited or range of stacks as described below
in the section on wild cards.
Stack port association example for a base switch:
stack0: stack1 = zre12;
stack1: stack0 = zre12;
The above example indicates that stack0 is connected to stack1 through the port 12 and stack1 is
connected to stack0 through port 12. Zre12 on the base switch switch fabric chips is the HIGIG
port and are directly connected between the two devices.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 338
Stack Control Statements
Finally after creating the stack, associating the ports, and setting the stack configuration, the stack
can be enabled using one of the Stack Control statements. The following stack control statements
are supported.
enable;
The enable statement turns on stacking that has been previously configured.
cannot be made until configuration is complete.
This statement
disable;
The disable statement turns off stacking. Before disabling stacking, all ZNYX daemons must be
stopped, and the VLAN configurations must be torn down using zconfig.
WILDCARDS
Wild card characters can be included to simplify the process of creating larger, more complex
configurations. Wild card characters for zconfig include:
,
(comma)
Use for creating lists
..
(dot-dot)
Specifies an inclusive range
Below are some examples for the correct usage of the comma (,) and dot-dot (..). Each line below
produces the same results:
stack0: zre4..7 = zre0, zre1, zre2, zre3;
stack0: zre4, zre5..7 = zre0..3;
stack0: zre4..7 = zre0, zre1..3;
stack0: zre4, zre5..7 = zre0..1, zre2..3;
The stack may also be in list form in the Stack Configuration command in similar fashion to the
zre lists. Example of stack0..3 representing stacks 0, 1, 2 and 3.
SEE ALSO
zconfig
Ethernet Switch Blade User's Guide
release 3.2.2j
page 339
ztats
NAME
ztats − Display statistics and information about switch
SYNOPSIS
ztats [-d <level>] [-i <unit>] | [-m <port>] |
[-v <vlan id>] | [-t <tgid>] | [-v]
DESCRIPTION
ztats displays MIB counters for a selected physical port, trunk group or VLAN. It can also
display information about the configuration of the switch and bridge to the PCI bus or the Vital
Product Data memory. All output is formatted.
OPTIONS
-m <port>
MIB statistics for specified <port>
-v <vlan id>
-i <unit>
MIB statistics for specified <vlan id>
Information for specified <unit>:
0 is
BCM5695 ports 0-11,
1 is
BCM5695 ports 12-23.
-d <level> Set debug level to <level>
-t <tgid>
-v
MAC layer statistics for all ports in trunk <tgid>.
Vital Product Data (not currently supported in base switch)
EXAMPLES
To display statistics for a particular port on the switch, such as port 0.
ztats –m 0
SEE ALSO
zreg, zal
Ethernet Switch Blade User's Guide
release 3.2.2j
page 340
zsync
NAME
zsync − Saves changes to the flash.
SYNOPSIS
zsync [-c][-f][<dir_or_file>]
zsync [-c][-f][-t <file>]
zsync [-c][-f][-z]
zsync [-c][-l]
DESCRIPTION
zsync is used to save a snapshot of the current file system to flash ROM. By default, zsync
creates a compressed tar image of the files that have changed and saves the image in the flash
ROM. The saved image is expanded on reboot. The saved compressed tar image is called an
“overlay”.
If a directory parameter is given to zsync, the contents of the directory are saved instead of
searching for updated files. The specific purpose of the <directory> parameter is for saving
files that have been mounted with zmnt. Using the -t option allows a tar image created by
zmnt -t to be saved.
To correct a corrupted file that is saved to flash ROM with zsync, first reboot with the -i
option (see Switch Maintenance). Use zmnt to put the corrupted file in the /mnt directory, open
and correct the file, then zsync to the /mnt directory to save your changes and reboot.
There are two overlay areas: dynamic and custom. The dynamic overlay is the default. The
custom overlay is used to create base configurations that are different then the base configuration.
The custom overlay is written by using the -c option.
The -z option zeros the overlay area, returning the switch to the factor configuration.
Specific files or directories can be excluded from saving to flash by zsync by including an entry
in /etc/exclude. Likewise, existing entries in /etc/exclude such as /tmp can be removed
in order to save those files to flash with zsync.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 341
OPTIONS
-c
Save files to the custom overlay
-t <file>
-z
Read files to be saved from a tar file.
Zero the overlay area.
-f
Do not confirm with user and do not warn if saving failed.
Exit code can be examined to determine success or failure.
<dir_or_file>
Save only the named file, or save the named
directory to the overlay. Contents of directories must be created
with zmnt.
-l
List files that would be written.
Do not flash.
EXAMPLES
To zsync only the hosts file:
cd /etc
zsync hosts
If you previously created a snapshot of an overlay to a tar file using zmnt,
zmnt –t overlay.tar
You can use zsync to restore the overlay on the switch directly from the tar file,
zsync –t overlay.tar
The restored overlay will be loaded upon the next reboot.
FILES
/etc/exclude, /.zsync
SEE ALSO
zmnt
Ethernet Switch Blade User's Guide
release 3.2.2j
page 342
ztmd
NAME
ztmd – traffic management daemon which accepts messages from traffic filtering and quality of
service applications and sets up hardware.
SYNOPSIS
ztmd [-d <level>] [-p <port>] [-f] [-i <pid>]
[-o <pid>] [-a <addr>] [-l]
DESCRIPTION
ztmd listens for messages on a multicast port. These messages describe packet filters and
queuing disciplines that are to be installed in the switch hardware. ztmd interprets these
messages to set up the switch hardware.
OPTIONS
-d <level> Set the level of diagnostic information logged.
<level> may be 0-4; higher levels produce more output.
.p <port> Use <port> as the multicast listening port for
communication with ztmd. Default is 2345.
-f
Run ztmd in the foreground.
as a daemon.
Without this option, it is run
-i <pid>
Set the PID for this process (default is 1)
-o <pid>
Set expected client PID.
-a <addr>
Bind multicast socket to <addr>
-l
Log diagnostic output to /var/log/ztmd.log
EXAMPLES
Start the traffic management daemon, ztmd, then start zqosd to monitor zstats output and
zfilterd to monitor iptables(8) output. All daemons are run as background processes
and log their messages to files in /var/log.
ztmd –l
zqosd –l
zfilterd -l
Ethernet Switch Blade User's Guide
release 3.2.2j
page 343
SEE ALSO
zqosd, iptables(8), tc(8), zfilterd
Ethernet Switch Blade User's Guide
release 3.2.2j
page 344
brctl(8)
NAME
brctl - Bridge and Spanning Tree Protocol administration.
SYNOPSIS
brctl [options]
DESCRIPTION
brctl is used to set up, maintain, and display the bridge configuration in the Linux kernel.
brctl is a standard command included with Linux bridge support including Rapid Spanning
Tree Protocol (RSTP).
A bridge is a device commonly used to connect different networks together, so that these
networks will appear as one network to the participants.
Each of the networks being connected corresponds to one physical interface, or port in the bridge.
These individual networks are bundled into one bigger logical network. This bigger network
corresponds to the bridge network interface.
Multiple bridges can work together to create even larger networks using the IEEE 802.1d
Spanning Tree Protocol. This protocol is used for finding the shortest path between two networks
as well as eliminating loops from the topology. Bridges communicate with each other by sending
and receiving Bridge Protocol Data Units (BPDUs).
brctl(8) can be used for configuring certain spanning tree protocol parameters. For an
explanation of these parameters, see the IEEE 802.1d specification for detailed information.
OPTIONS
addbr <bridge>
creates a new instance of a bridge. The network interface
corresponding to the bridge will be called <bridge>. For the
OpenArchitect switch, bridges are named bzhp0, bzhp1, etc.
IMPORTANT: This option must only be executed by zl2d.
delbr <bridge>
deletes the instance <bridge> of an Ethernet bridge. The network
interface corresponding to the bridge must be down before it can
be deleted.
IMPORTANT: This option must only be executed by zl2d.
show shows all current bridges.
addif
<bridge>
<interface>
Ethernet Switch Blade User's Guide
release 3.2.2j
page 345
makes the interface a port of the bridge. This means that all
frames received on the interface will be processed as if destined
for the bridge. Also, when sending frames on the bridge, the
interface will be considered as a potential output interface. For
the OpenArchitect switch, <interface> is zhp0, zhp1, …
IMPORTANT: This option must only be executed by zl2d.
delif <bridge> <interface>
detaches the interface from the bridge.
IMPORTANT: This option must only be executed by zl2d.
showbr <bridge>
shows information for the bridge and its attached ports. Check
the priority using this command.
showmacs <bridge>
shows a list of learned MAC addresses for the bridge.
setageingtime <bridge> <time>
sets the Ethernet (MAC) address aging time, in seconds. After
<time> seconds of not having seen a frame coming from a certain
address, the bridge will time out (delete) that address from the
Forwarding DataBase (fdb).
setgcint <bridge> <time>
sets the garbage collection interval for the bridge to <time>
seconds. This means that the bridge will check the forwarding
database for timed out entries every <time> seconds.
stp <bridge> <state>
controls this bridge's participation in the Spanning Tree
Protocol. <state> can be “off” or “on”. When turned off, the
bridge will not send or receive BPDUs, and will thus not
participate in the Spanning Tree Protocol.
CAUTION: If your bridge isn't the only bridge on the LAN, or if
there are loops in the LAN's topology, DO NOT turn this option
off. If you turn this option off, please know what you are doing.
setbridgeprio <bridge> <priority>
sets the bridge's priority to <priority>. The priority value is
an unsigned 16-bit quantity (a number between 0 and 65535), and
has no dimension. Lower priority values are better. The bridge
with the lowest priority will be elected Root Bridge.
setfd <bridge> <time>
sets the bridge's bridge forward delay to <time> seconds.
sethello <bridge> <time>
sets the bridge's bridge hello time to <time> seconds.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 346
setmaxage <bridge> <time>
sets the bridge's maximum message age to <time> seconds.
setpathcost <bridge> <port> <cost>
sets the port cost of the port to <cost>. This is a dimensionless
metric. The path cost is set to 100 for all OpenArchitect switch
ports by default. IEEE 802.d recommends the following:
Link
Speed
Recommended
Value
Recommended
Range
10 Mb/s
100
50-600
100 Mb/s
19
10-60
1 Gb/s
4
3-10
setportprio <bridge> <port> <priority>
sets the port's priority to <priority>. The priority value is an
unsigned 8-bit quantity (a number between 0 and 255), and has no
dimension. This metric is used in the designated port and root
port selection algorithms. For the OpenArchitect switch a port is
zre1, zre2, …
NOTES
brctl(8) replaces the older brcfg tool.
SEE ALSO
zconfig, zl2d
Ethernet Switch Blade User's Guide
release 3.2.2j
page 347
Appendix C Intelligent Platform Management Interface
The Ethernet Switch Blade provides Intelligent Platform Management Interface (IPMI) support.
IPMI circuitry provides:
The communication channel between the Baseboard Management Controller (BMC) and the
CPU for management.
• Data storage, SDRR, FRU, access.
• Sensor readings.
IPMI circuitry on the Ethernet Switch Blade peripheral management controller (PMC) provides
three I2C compatible serial interfaces:
•
I2C Port 0
Connects to the BMC on an Alarm or System card. The bus is called
IPMB. Via this connection the PMC can send status events, SDR ’s and
FRU information to the BMC. It can also receive commands and control
information from the BMC.
I2C Port 1
Connects to the Ethernet Switch Blade CPU and the “spare” SEEPROM.
I2C Port 2
Connects to the boot and run time SEEPROMs as well as the switch
silicon.
ISwitch-ShMC Interaction
•
Switch reports state changes to ShMC.
•
If event reporting is enabled, an event will be generated by the switch whenever a sensor
threshold is crossed.
•
Policy is enforced by the ShMC.
•
Typical ShMC policy is to remove back-end power from a FRU that has crossed a threshold.
M States
M0
No power and hot swap handle open
M1
No communications. (Wait in M1 until hot swap ejector is closed)
M2
FRU announces its presence to the ShMC and awaits activation permission
M3
Activation
M4
Operational state (command issued to enable back-end power)
Ethernet Switch Blade User's Guide
release 3.2.2j
page 348
M States
M5
Deactivation request (e.g. hot swap ejector opened)
M6
Deactivation granted by ShMC
M7
Unexpected loss of communication between FRU and ShMC
Table C.1.: IPMI M States
Peripheral Management Controller Functional Support
The following IPMI commands are implemented in version 1.00:
PMC Controller Support
Command
GetDevID
Code
Sensor #
Status
0x01
Mandatory
BroadcastGetDeviceID 0x01
Mandatory
ColdReset
0x02
Optional
GetSelfTestResult
0x04
Mandatory
GetSensorReading
0x2D
Mandatory
Notes
TempSensor
60
Returned in Celsius
A2D_0
41
A2D_1
42
A2D_2
43
Full 8-bit value (255)
represents 3.3 Volts;
scale returned value
accordingly
A2D_3
Not used
A2D_4
45
A2D_5
Not used
SetSlotPower
0x12
Vendor
specific
SetSlotBlueLed
0x13
Vendor
specific
SetSlotReset
0x15
Vendor
specific
Table C.2: PMC Controller Support
Ethernet Switch Blade User's Guide
release 3.2.2j
page 349
Sensor Reading Example
This is an example of how to structure a command and response to determine a sensor value. In
this example, a GetSensorReading command is sent from BMC (address 20h in this example), to
the switch in slot 2 (geographical address is B2h) to read the temperature sensor, which is
assigned to sensor number 60h.
Standard IPMI Command: GetSensorReading
Byte
Data Field
Description
1
rsAddr
B2h
2
netFn/Lun
10h
3
check1
3Eh
4
rqAddr
20h
5
seq no
06 (random pick)
6
command
2Dh
7
sensor number
60h
8
checksum2
4Dh
Table C.3: GetSensorReading
Ethernet Switch Blade User's Guide
release 3.2.2j
page 350
Standard IPMI Response: GetSensorReading
Byte
Data Field
Description
1
rqAddr
20h
2
netFn/Lun
16h
3
check1
CAh
4
rsAddr
B2h
5
seq no
06
6
command
2Dh
7
completion code
00h
8
sensor reading
1Bh -> 27 Celsius degree
9
optional data
byte
C0h scanning is enabled
10
optional data
byte
C0h
11
optional data
byte
00
12
checksum2
80h
Table C.4: GetSensorResonse
Only scanning is supported and enabled for the optional bytes.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 351
Structure of Standard IPMI Commands: From BMC to PMC
Structure of Standard IPMI Commands BMC - PMC
Byte
1
Data Field
rsAddr
Description
<slot’s IPMB addr>
2
netFn/Lun
<netFn>
3
check1
<chksm1>
4
rqAddr
<sw_id>
5
seq no
<seq>
6
command
<cmd>
7
optional data byte
<arg1>
7+x
optional data
bytes
<argN>
7+x+1
check2
<chksm2>
Table C.5: Standard IPMI Commands
Structure of Standard IPMI Responses: From PMC to BMC
Structure of Standard IPMI Responses PMC - BMC
Byte
1
Data Field
rqAddr
Description
<sw_id>
2
netFn/Lun
<netFn>
3
check1
<chksm1>
4
rsAddr
<slot’s IPMB addr>
5
seq no
<seq>
6
command
<cmd>
7
completion code
<ccode>
8
optional data byte
<arg1>
8+x
optional data bytes
<argN>
8+x+1
check2
<chksm2>
Table C.6: Standard IPMI Responses
Ethernet Switch Blade User's Guide
release 3.2.2j
page 352
Event Generator
The PMC’s event generator is disabled until it receives a SetEventReceiver command from
BMC for Event Receiver’s slave address and LUN. If the event generator is enabled, PMC
reports significant events to the BMC asynchronously.
The standard IPMI platform event message format is used.
IPMB Event message format
Structure of event messages sent to the BMC by PM device (PMC) is shown below.
IPMB Event Message Format
Byte
1
Data Field
EvMRev
Byte Description
<event_msg_rev>
2
Sensor Type
<sensor_type>
3
Sensor #
<sensor_number>
4
Event Dir | Type
<direction, type> bit: 7
4
Event Type
<type> bits: 0-6
5
Event Data 1
<arg1>
6
Event Data 2
<arg2>
7
Event Data 3
<arg3>
Table C.7: Event Message Format
IPMI Event Message Definitions
The following tables describe the IPMI event messages to be generated by the PMC. The basic
requirement is that when a monitored sensor changes state, an event must be generated and sent
to BMC.
Field Replaceable Unit Inventory Device
A Field Replaceable Unit Inventory Device (FRU ID) file is generated for the Ethernet Switch
Blade in binary format. At manufacturing, the file is loaded at the same time as the boot and
runtime files to a specified space, which is reserved in the spare SEEPROM on I2C bus number 1
at device address 0xA2. The FRU information is product specific and, in addition, may be
customer specific. At boot time, the IPMI firmware will read the product serial number from the
board’s VPD file to FRU’s product Information Area.
Ethernet Switch Blade User's Guide
release 3.2.2j
page 353
Version 2.x supports three FRU Inventory Device Commands:
Get FRU Inventory Area Info
Read FRU Data
Write FRU Data
The spare SEEPROM space is allocated as follows:
Spare SEEPROM Space Allocation
Space for
VPD
Start address
0
End address
0x3FF
Size
0x400 (1Kbytes)
FRU
0x400
0x13FF
0x1000 (4kbytes)
Parameters
0x1400
0x17FF
0x400 (1 Kbytes)
Attribute
0x1800
0x37FF
0x2000 (8
Kbytes)
Notes
Table C.8: SEEPROM Space
IPMB Override/Local Status - Event Data 3 for the IPMB link
IPMB Override Status Data
Bit(s)
7
6:4
3
2:0
Type
Notes
IPMB B Override State
0=Override state, bus isolated; 1= Local Control state - IPM
Controller determines state of bus
IPMB B Local Status
0=No Failure. Bus enabled if no override in effect;
1=Unable to drive clock HI; 2=Unable to drive data HI;
3=Unable to drive clock LO; 4= Unable to drive data LO;
5= Clock low time out; 6=under test; 7=Undiagnosed
communications failure
IPMB A Override State
0=Override state, bus isolated; 1= Local Control state - IPM
Controller determines state of bus
IPMB A Local Status
0=No Failure. Bus enabled if no override in effect;
1=Unable to drive clock HI; 2=Unable to drive data HI;
3=Unable to drive clock LO; 4= Unable to drive data LO;
Ethernet Switch Blade User's Guide
release 3.2.2j
page 354
IPMB Override Status Data
5= Clock low time out; 6=under test; 7=Undiagnosed
communications failure
Table C.9.: IPMB Override Status Data
Ethernet Switch Blade User's Guide
release 3.2.2j
page 355
Index
Index
adduser...........................................................................................................................................................................................
...................................................................................................................................................................................... 70, 129
application flash 1............................................................................................................................................................. 84, 142
Apt-Get.......................................................................................................................................................................... 90, 147
boot process.......................................................................................................................................................................84, 142
boot ROM..........................................................................................................................................................................84, 142
Booting..........................................................................................................................................................................................
Duplicate Flash Image................................................................................................................................................. 88, 146
-i option........................................................................................................................................................................87, 145
brctl......................................................................................................................................................................................48, 96
brctl(8)..................................................................................................................................................................................... 303
Central Authority.......................................................................................................................................................................40
Class of Service................................................................................................................................................................. 54, 105
Combining Queueing Disciplines..................................................................................................................................... 66, 124
Common Open Policy Service.......................................................................................................................................... 67, 125
COPS-PR..................................................................................................................................................................... 68, 126
COPS-RSVP................................................................................................................................................................ 68, 126
Policy Decision Point.................................................................................................................................................. 67, 125
Policy Enforcement Points.......................................................................................................................................... 67, 125
console cable.................................................................................................................................................................................
Booting........................................................................................................................................................................ 86, 144
Console Port.............................................................................................................................................................................. 21
COPS................................................................................................................................................................................. 67, 125
Default Route.................................................................................................................................................................... 71, 130
dhclient.............................................................................................................................................................................. 71, 130
DHCP........................................................................................................................................................................................ 40
DHCP............................................................................................................................................................................................
Client............................................................................................................................................................................71, 130
Server...........................................................................................................................................................................71, 130
dhcpd................................................................................................................................................................................. 72, 130
Diffserv..............................................................................................................................................................................68, 126
flash................................................................................................................................................................................... 88, 146
FTPD Server......................................................................................................................................................................74, 133
gated.................................................................................................................................................................................. 51, 102
ifconfig................................................................................................................................................................................ 49, 97
Layer 2 Switch.....................................................................................................................................................................46, 94
Layer 3 Switch.....................................................................................................................................................................49, 97
Using Multiple VLANs..................................................................................................................................................... 100
LED Control...................................................................................................................................................................... 80, 138
Link Event Monitoring......................................................................................................................................................80, 138
monit................................................................................................................................................................................168, 247
Name Service Resolution.................................................................................................................................................. 71, 130
Network File System......................................................................................................................................................... 72, 131
7100 User's Guide
release 3.2.2j
page 356
Index
Network Time Protocol..................................................................................................................................................... 72, 131
NFS....................................................................................................................................................................................72, 131
NTP................................................................................................................................................................................... 72, 131
ntpdate.......................................................................................................................................................................... 72, 131
pepd............................................................................................................................................................................. 68p., 126p.
PIB.....................................................................................................................................................................................69, 127
Policy Information Base....................................................................................................................................................69, 127
Port Mirroring..........................................................................................................................................................................137
Port Path Cost......................................................................................................................................................................48, 97
RAIN Management................................................................................................................................................................... 18
RFC 2478.......................................................................................................................................................................... 68, 126
RFC 3084.......................................................................................................................................................................... 68, 127
RFC 3159.......................................................................................................................................................................... 68, 127
RFC 3289.......................................................................................................................................................................... 68, 127
RMAPI...................................................................................................................................................................................... 18
Root Password...................................................................................................................................................................70, 129
route...................................................................................................................................................................................71, 130
routing protocol......................................................................................................................................................................... 17
RSVP................................................................................................................................................................................. 68, 126
S60SP_startup........................................................................................................................................................................... 40
Saving Changes................................................................................................................................................................. 86, 144
Scripts............................................................................................................................................................................................
rcS................................................................................................................................................................................ 71, 130
S20stack.........................................................................................................................................................................44, 92
S30e1000............................................................................................................................................................................. 92
S40vpd................................................................................................................................................................................. 92
S50layer2....................................................................................................................................................................... 44, 92
S50layer2sp..........................................................................................................................................................................92
S50layer3........................................................................................................................................................... 44, 50, 92, 98
S50multivlan..........................................................................................................................................................44, 93, 100
S55gatedOSPF...............................................................................................................................................................44, 93
S55gatedRip1...................................................................................................................................................44, 51, 93, 102
S55gatedRip2.................................................................................................................................................................44, 93
Scripts, example.................................................................................................................................................................. 44, 92
Scripts, examples.................................................................................................................................................................44, 92
snmp.................................................................................................................................................................................. 75, 133
applications.................................................................................................................................................................. 79, 137
interface details............................................................................................................................................................77, 135
MIBS............................................................................................................................................................................75, 133
traps..............................................................................................................................................................................76, 135
Software Development Kit........................................................................................................................................................19
Spanning Tree Protocol....................................................................................................................................................... 47, 96
Switch Console....................................................................................................................................................................43, 91
System Failure Recovery...................................................................................................................................................86, 144
System Hangs During Boot...............................................................................................................................................88, 145
7100 User's Guide
release 3.2.2j
page 357
Index
tc 62, 113
The U32 Filter................................................................................................................................................................... 66, 124
thttpd..................................................................................................................................................................................81, 139
trunking.........................................................................................................................................................................................
configuring with zconfig............................................................................................................................................156, 235
Updating the Switch.......................................................................................................................................................... 86, 144
Upgrading or Adding Files................................................................................................................................................89, 147
Users..............................................................................................................................................................................................
Adding Additional....................................................................................................................................................... 70, 129
VLAN..................................................................................................................................................................................45, 93
vrrpconfig................................................................................................................................................................................ 228
vrrpd........................................................................................................................................................................................ 230
Web management.............................................................................................................................................................. 81, 139
zbootcfg..................................................................................................................................................................... 87, 145, 233
zconfig............................................................................................................................................................... 97, 148, 227, 235
zdog......................................................................................................................................................................................... 247
zfilterd....................................................................................................................................................................... 56, 107, 253
zflash................................................................................................................................................................. 88, 146, 171, 254
zgmrpd.....................................................................................................................................................................................256
zgr............................................................................................................................................................................................258
zgvrpd......................................................................................................................................................................................261
zl2179, 258
zl2d............................................................................................................................................................... 148, 184, 227, 263p.
zl2mc............................................................................................................................................................................... 179, 258
zl3d...................................................................................................................................................... 49, 98, 102, 148, 227, 266
zl3host............................................................................................................................................................................. 179, 258
zl3net............................................................................................................................................................................... 179, 258
zlc80, 138, 268
zlmd........................................................................................................................................................................... 80, 138, 270
zlogrotate................................................................................................................................................................................. 272
zmirror....................................................................................................................................................................... 79, 137, 273
zmnt......................................................................................................................................................................................... 276
zmonitor.............................................................................................................................................................................84, 142
ZNYX-H.MIB......................................................................................................................................................................... 327
zqosd..................................................................................................................................................................................62, 278
zrc280
zreg.......................................................................................................................................................................................... 281
zrld...........................................................................................................................................................................................283
zrld_trusted_hosts............................................................................................................................................................206, 283
zsnoopd....................................................................................................................................................................................284
zstack....................................................................................................................................................................................... 294
zstats........................................................................................................................................................................................ 298
zsync........................................................................................................................................................................................299
7100 User's Guide
release 3.2.2j
page 358
Index
ztmd......................................................................................................................................................................................... 301
zvlan................................................................................................................................................................................ 179, 258
ZX4920.MIB........................................................................................................................................................................... 333
7100 User's Guide
release 3.2.2j
page 359