Posts

Showing posts from September, 2010

Configuring the virtual path devices

We recommend that you start with a "fresh" disk configuration, so it is a good idea to delete all previously configured FC adapters and their child (disk) devices. On node1, we checked which disks are still defined:

{node1:root}/-> lspv
hdisk0 0022be2ab1cd11ac rootvg active
hdisk1 0022be2a3d02ead0 None
hdisk2 0022be2a4cbbafd8 None
hdisk3 none None

These are the internal SCSI disk drives:

{node1:root}/-> lscfg | grep disk
+ hdisk3 U1.9-P2/Z2-A8 16 Bit LVD SCSI Disk Drive (36400 MB)
+ hdisk2 U1.9-P2/Z1-A8 16 Bit LVD SCSI Disk Drive (36400 MB)
+ hdisk1 U1.9-P1/Z2-A8 16 Bit LVD SCSI Disk Drive (36400 MB)
+ hdisk0 U1.9-P1/Z1-A8 16 Bit LVD SCSI Disk Drive (36400 MB)

In order to include the ESS disks, run the configuration manager on each node:

{node1:root}/-> cfgmgr -v

Since ESS was configured with two host paths for each node (node1a and node1b), this results in two hdisks on the nodes. Actually, those two logical hdisks represent the same physical disk, accessed via the two ...
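A minimal sketch of cleaning out the old definitions before rerunning cfgmgr, assuming the FC adapters are named fcs0 and fcs1 (the adapter names are illustrative, not taken from the original example):

# remove the previously configured FC adapters together with their child devices (-R)
rmdev -dl fcs0 -R
rmdev -dl fcs1 -R
# then rediscover the ESS disks over the FC paths
cfgmgr -v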

ESS Subsystem Device Driver setup

In a high availability environment, there is a special device driver designed for ESS, named Subsystem Device Driver (SDD). This device driver allows redundant links and load sharing for storage traffic when multiple fiber connections exist between the nodes and the ESS storage subsystem. SDD comes as an AIX installable fileset, named ibm2105.rte. This has to be installed on all cluster nodes, even if not every node in the cluster has more than one FC adapter. In our configuration, since each node is connected to the ESS using two optical cables, each disk can be accessed via either of the two paths. When SDD is installed, a virtual path (vpath) device is created. This virtual path represents the same storage space, but is accessible via both fiber links. There are two versions of the SDD driver for AIX:

ibmSdd_510.rte - suitable for non-HACMP configurations or for concurrent HACMP (HACMP/ESCRM).
ibmSdd_510nchacmp.rte - must be used in nonconcurrent HACMP environments.

In our...
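A quick way to confirm which of these filesets is already present on a node, and to install a missing one from installation media, might look like the following sketch (the /dev/cd0 install device and the fileset selection are assumptions to adapt to your environment):

# check for the ESS host attachment and SDD filesets on this node
lslpp -l | grep -E "ibm2105|ibmSdd"
# install from CD if missing (device and fileset names are illustrative)
installp -acgX -d /dev/cd0 ibm2105.rte ibmSdd_510nchacmp.rte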

Dynamic tracking of Fibre Channel adapters

AIX 5.2 provides support for Dynamic Tracking of Fibre Channel devices. In previous AIX releases, the user was required to unconfigure the FC storage device and adapter device instances before making changes on the SAN that might result in an N_Port ID (SCSI ID) change of any remote storage ports. If Dynamic Tracking for FC devices is enabled, the FC adapter driver detects when the Fibre Channel N_Port ID of a device changes, and reroutes traffic destined for that device to the new address while the devices remain online. Examples of events that can cause an N_Port ID to change are moving a cable between a switch and storage device from one switch port to another, connecting two separate switches via an Inter-Switch Link (ISL), and possibly rebooting a switch. Dynamic tracking of FC devices is controlled by a new fscsi device attribute, named dyntrk. The default setting for this attribute is no. Setting this attribute to yes enables dynamic tracking: {node1:root}/-...
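The excerpt is cut off here, but the attribute change it describes would normally be made with chdev against the fscsi protocol device; a hedged sketch, with fscsi0 as an assumed device name:

# enable dynamic tracking on the FC protocol device (fscsi0 is assumed)
chdev -l fscsi0 -a dyntrk=yes
# verify the setting
lsattr -El fscsi0 -a dyntrk

If the protocol device has child devices in use, the change may have to be applied to the ODM only with chdev -P and take effect after a reboot.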

Enable Fast I/O Failure for FC adapters

AIX 5.2 supports the Fast I/O Failure feature for Fibre Channel devices in case of link events in a switched environment. If the FC adapter driver detects a link event (for example, a lost link between a storage device and a switch), the FC adapter driver waits a short period of time, about 15 seconds, to allow the SAN fabric to stabilize. At that point, if the FC adapter driver detects that the device is not on the SAN fabric, it begins failing all I/O requests at the adapter driver level. Any new I/O requests, or future retries of the previously failed I/Os, are failed immediately by the adapter until the adapter driver detects that the device has rejoined the SAN fabric. Fast I/O Failure is controlled by a new fscsi device attribute, fc_err_recov. The default setting for this attribute is delayed_fail, which is the I/O failure behavior that existed in previous versions of AIX. Setting this attribute to fast_fail enables Fast I/O Failure, as shown in Example 3-12. Example 3-...
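The referenced Example 3-12 is truncated in this excerpt; the attribute change it illustrates would typically look like the following sketch (fscsi0 is an assumed device name):

# enable Fast I/O Failure on the FC protocol device (fscsi0 is assumed)
chdev -l fscsi0 -a fc_err_recov=fast_fail
# confirm the new value
lsattr -El fscsi0 -a fc_err_recov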

Configuring RAC logical disks on AIX

There are two FC protocols, depending on whether an FC switch (fabric configuration) or an FC hub (arbitrated loop) is used:

If the nodes are connected directly to the ESS, excluding any network equipment, choose Arbitrated Loop (al protocol). This is the default value.
If the nodes are connected to a SAN using a switch, select Point-to-Point (pt2pt protocol).

Because our platform is using a switched SAN, the point-to-point protocol is selected on all nodes and ESS Fibre Channel adapters. To view the actual value set on the FC adapter:

{node3:root}/-> lsattr -El fcs0 | grep init_link
init_link al INIT Link flags True

To change this value, use "smitty devices", select the FC Adapter menu, and change the INIT Link flags field. Or, you can use the command:

{node3:root}/-> chdev -l fcs0 -a init_link=pt2pt

If you are unsure about the way your nodes are connected to the disk subsystems, choose init_link=al. This setting first tries to detect a switch. If that fails, it...
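To apply the same setting to every FC adapter on a node in one pass, a loop like the following could be used; this is a sketch, not from the original example. The -P flag defers the change to the next boot, which is typically needed when child devices are already configured:

# set point-to-point on every fcs adapter; change is applied at the next boot (-P)
for a in $(lsdev -Cc adapter -F name | grep '^fcs'); do
    chdev -l $a -a init_link=pt2pt -P
done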

The HACMP clstat command does not work

Clinfo daemon started?

The clstat command retrieves its information from the clinfo daemon. This daemon is not started automatically; it is an option in the SMIT HACMP startup menu (smitty clstart). If it is not running, stop HACMP on this node and start it up again with clinfo activated. To check the status of the HACMP processes:

lssrc -a | grep -E "ES|svcs"

SNMP version in AIX 5L Version 5.2

AIX 5L Version 5.2 uses SNMP version 3 agents, whereas HACMP uses an SNMP version 1 configuration. Both clinfo and C-SPOC require SNMP version 1. With the default SNMP agent, the clinfo daemon is not able to supply cluster information, so clstat will not work properly. Run the script provided in Example 3-38 to switch to the version 1 agents.

Example 3-38 Switch SNMP agents to version 1

#!/bin/ksh
LOG=/var/adm/chg_snmpd.log
touch $LOG
/usr/es/sbin/cluster/utilities/clstop -N -g >> $LOG 2>&1
sleep 5
for i in muxatmd aixmibd snmpmibd hostmibd snmpd
do
sto...
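Two quick checks before running the switch script, offered here as a sketch: confirm whether the clinfo subsystem is running, and look at which SNMP agent /usr/sbin/snmpd points to (on AIX 5.2 it is a symbolic link to the active agent):

# is the clinfo daemon running on this node?
lssrc -s clinfoES
# which SNMP agent is active? (a link to snmpdv3ne means the version 3 agent)
ls -l /usr/sbin/snmpd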

Oracle RAC does not start

HACMP started

Before starting Oracle9i RAC instances, the HACMP cluster must be up and running. Check the cluster state with the /usr/sbin/cluster/clstat command.

Hagsuser group

These tests must be performed on all nodes:

Check that the Oracle user is part of the hagsuser group. The name of this group is mandatory and cannot be changed.

Check, and change if necessary, the permissions on the cldomain executable. This program must be executable by everybody (user, group, other):

# chmod a+x /usr/es/sbin/cluster/utilities/cldomain

Check, and change the group to hagsuser if necessary, for the grpsvcsdsocket.oracle9irac socket file (assuming that oracle9irac is the name of your cluster, as returned by the cldomain command):

{node2:root}/-> chgrp hagsuser /var/ha/soc/grpsvcsdsocket.oracle9irac

Check, and change if necessary, the group permissions for the grpsvcsdsocket.oracle9irac socket:

{node2:root}/-> chmod g+w /var/ha/soc/grpsvcsdsocket.oracle9irac

The HAGS sock...
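A compact way to run the group-membership and permission checks on each node might be the following sketch (the cluster name oracle9irac is taken from the example above; the Oracle user name is assumed to be oracle):

# confirm the oracle user belongs to hagsuser
lsuser -a groups oracle
# confirm cldomain is executable by everybody and inspect the socket file ownership/permissions
ls -l /usr/es/sbin/cluster/utilities/cldomain
ls -l /var/ha/soc/grpsvcsdsocket.oracle9irac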

ESS Specialist does not list the WWPN

Each Fibre Channel adapter on the nodes has a unique World Wide Name (identified in ESS Specialist as the World Wide Port Name - WWPN). When creating a new host with ESS Specialist, you select the appropriate WWPN from a scroll list. If you do not see this WWPN, ensure the following:

The link is correct. You should see the red laser light emitted by the host adapter on the fiber connected to the Storage Area Network - SAN (FC switch).
The FC adapters on the nodes and the ESS have the proper link type value of point-to-point or arbitrated loop. If you are unsure about your network topology, leave the default, which is arbitrated loop.
The command cfgmgr -vsl fcs0 has been run on the nodes.
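To find the WWPN that should appear in the ESS Specialist list, it can be read from the adapter's vital product data on the node, as in Example 3-11 further down; a short sketch, assuming the adapter is fcs0:

# the Network Address field is the adapter's WWPN
lscfg -vl fcs0 | grep "Network Address"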

FC adapter microcode

Always check your FC adapter microcode level and resolve any issues with your IBM service representative.

Example 3-11 Fibre Channel adapter microcode (on the nodes)

{node1:root}/-> lscfg -vl fcs0
fcs0 U1.9-P1-I1/Q1 FC Adapter
Part Number.................00P2995
EC Level....................A
Serial Number...............1A238006CD
Manufacturer................001A
FRU Number..................00P2996
Network Address.............10000000C92F4E44 <--- World Wide Number
ROS Level and ID............02C03891
Device Specific.(Z0)........2002606D
Device Specific.(Z1)........00000000
Device Specific.(Z2)........00000000
Device Specific.(Z3)........02000909
Device Specific.(Z4)........FF401050
Device Specific.(Z5)........02C03891
Device Specific.(Z6)........06433891
Device Specific.(Z7)........07433891
Device Specific.(Z8)........20000000C92F4E44
Device Specific.(Z9)........CS3.82A1
Device Specific.(ZA)........C1D3.82A1
Device Specific.(ZB)........C2D3.82A1
Device S...
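The microcode level is encoded in the Device Specific fields shown above; a more direct way to read it, if supported for your adapter, is the lsmcode command. A sketch, assuming the adapter is fcs0:

# display the current microcode level of the adapter non-interactively
lsmcode -cd fcs0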

Oracle Transparent Network Failover Failback (TNFF)

Oracle offers a high availability feature called Oracle Transparent Network Failover Failback (TNFF). In an IBM pSeries environment, this works closely with HACMP/ES.

Tip: The Oracle9i RAC network selection mechanism must be well understood before planning and configuring HACMP/ES.

Interconnect network selection

The Oracle cluster interconnect networks can be configured using either a "private" or a "public" network type defined in HACMP/ES. The Oracle IPC traffic is sent over one of these networks. Oracle RAC chooses the network interface according to the following rules:

1. HACMP "service" networks are required. Oracle chooses only HACMP/ES networks configured as "service".
2. Oracle uses the HACMP cllsif utility to determine which network interface to use. Oracle chooses the network to use according to the cllsif alphanumeric...
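Since the selection is driven by the cllsif output, it is useful to look at that output directly when checking which network Oracle will pick; a sketch of how it is commonly invoked:

# list the cluster network interfaces as HACMP (and therefore Oracle) sees them
/usr/es/sbin/cluster/utilities/cllsif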

Typical Oracle RAC configurations

All configurations are based on the following building blocks:

Hardware: server nodes, storage, networking
Software: operating system, cluster software, Oracle RAC (the application)

Application architecture

Oracle9i RAC on RAW devices is based on a shared disk architecture. Figure 2-1 shows a two-node cluster. The lower solid line is the primary Oracle interconnect; the middle dashed line is the secondary Oracle interconnect. For high availability, both of these networks should be defined in HACMP as "private". HACMP/ESCRM provides Oracle9i RAC with the infrastructure for concurrent access to disks. Although HACMP provides concurrent access and a disk locking mechanism, this mechanism is not used; Oracle instead provides its own locking mechanism for concurrent data access, integrity, and consistency. Volume groups are varied on (activated) on all the nodes, thus ensuring a short failover time. This type of concurrent access can only be provided for RAW ...
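As an illustration of the concurrent access described above (not taken from the original example), a concurrent-capable volume group would be activated in concurrent mode on every node roughly like this, with datavg as an assumed volume group name:

# activate the shared volume group in concurrent mode on this node
varyonvg -c datavg
# list the varied-on volume groups to confirm
lsvg -o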

Various Site-to-Site IPSec VPN: Cisco, Juniper, Checkpoint, Sonicwall, Zywall

Introduction

Setting up a site-to-site IPSec VPN connection generally involves two phases. Phase 1 is called ISAKMP SA (Security Association) establishment and Phase 2 is called IPSec SA establishment.

Phase 1

In general, Phase 1 deals with confirmation among the sites that are about to establish a secure connection across an unsecure network. This process verifies that each site is authorized to establish such a connection. Further description follows.

Phase 1 establishes the ISAKMP key matching with the remote site. One popular technique for this ISAKMP key matching is to use a preshared key. This key is basically a string (a combination of letters, numbers, and special characters) that both sites agree to use. The key is then stored (and encrypted) within each VPN device's configuration.

Phase 1 of IPSec VPN connection establishment also involves the remote VPN device IP address (the peer). A popular technique is to specifically set the remote peer IP address (for security purposes); know...